The subfunctions that make up a stitching function can be evaluated
with the wrong number of inputs/outputs, so there is no need to throw
an exception when the number of inputs/outputs does not match while
loading subfunctions.
|
|
This is purely a lexical change with no semantic effect, as
the MAXN and MAXM constants are equal.
|
|
Previously, sampled pdf functions whose number of samples overflowed
were not caught until the memory allocator was triggered. Now there
is an upper bound of 100 Mbyte (the same as for fz_read_all()).
|
|
Any pdf function with an incorrect number of subfunction boundaries
will either cause an exception or have the excess boundaries
ignored.
|
|
Any pdf functions with an incorrect number of constants will have
the missing constant values set to default values. Any excess
constants are ignored.
|
|
Exponential pdf functions have constraints on their input values, so
warn about out-of-range values when loading those functions. When
evaluating the functions, assume zero without warning.
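The default constants and the input-range warnings can be sketched together for an exponential (Type 2) function. The defaults C0 = [0.0] and C1 = [1.0] and the input rules (x >= 0 when N is not an integer, x != 0 when N is negative) come from the PDF specification; the struct, the function names and `EXAMPLE_MAXN` are hypothetical. At evaluation time, a predicate like `example_exp_input_invalid` is the kind of check that could decide when to assume zero.

```c
#include <stdio.h>

#define EXAMPLE_MAXN 32 /* hypothetical output limit */

typedef struct {
	int n;                  /* number of outputs */
	float N;                /* exponent */
	float c0[EXAMPLE_MAXN]; /* output at x = 0 */
	float c1[EXAMPLE_MAXN]; /* output at x = 1 */
} example_exp_func;

/* Nonzero if v is an integer (every float of magnitude >= 2^23 is). */
static int example_is_integer(float v)
{
	if (v != v)
		return 0; /* NaN */
	if (v >= 8388608.0f || v <= -8388608.0f)
		return 1;
	return v == (float)(long)v;
}

/* Nonzero if x is outside what the PDF specification allows for exponent N. */
static int example_exp_input_invalid(float N, float x)
{
	return (!example_is_integer(N) && x < 0) || (N < 0 && x == 0);
}

/* Load-time setup: warn if the Domain [d0, d1] admits invalid inputs, then
 * fill C0/C1, using the defaults 0.0 and 1.0 for missing entries. Entries
 * beyond n outputs are ignored. */
static void example_load_exp(example_exp_func *f, float N, float d0, float d1,
	int n, const float *c0, int c0_len, const float *c1, int c1_len)
{
	int i;

	if (!example_is_integer(N) && d0 < 0)
		fprintf(stderr, "warning: exponential function input must be >= 0\n");
	if (N < 0 && d0 <= 0 && d1 >= 0)
		fprintf(stderr, "warning: exponential function input must be non-zero\n");

	if (n > EXAMPLE_MAXN)
		n = EXAMPLE_MAXN;
	f->n = n;
	f->N = N;
	for (i = 0; i < n; i++)
	{
		f->c0[i] = i < c0_len ? c0[i] : 0.0f;
		f->c1[i] = i < c1_len ? c1[i] : 1.0f;
	}
}
```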
|
|
Functions that have too many dimension sizes will have the extra
sizes ignored, whereas functions that have too few dimension sizes
will cause an exception.
|
|
After this, pdf functions that have malformed Decode/Encode arrays
with too few/many entries compared to the number of inputs/outputs
will be handled more gracefully. Any missing mappings will be set to
default values.
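A sketch of filling the Encode/Decode mappings of a sampled (Type 0) function with per-entry defaults. The defaults (Encode = [0, Size[i] - 1] for each input, Decode = Range for each output) are the ones given by the PDF specification; the function name and signature are hypothetical, and the caller is assumed to supply output arrays with room for m and n pairs.

```c
#include <stddef.h>

/* Fill per-input Encode and per-output Decode pairs for a sampled function
 * with m inputs and n outputs from PDF arrays that may be too short or too
 * long. Each missing entry gets its PDF default; entries beyond 2*m (or
 * 2*n) are ignored. */
static void example_fill_encode_decode(int m, int n, const int *size,
	const float *range, const float *enc, int enc_len,
	const float *dec, int dec_len, float encode[][2], float decode[][2])
{
	int i;

	for (i = 0; i < m; i++)
	{
		encode[i][0] = 2 * i < enc_len ? enc[2 * i] : 0.0f;
		encode[i][1] = 2 * i + 1 < enc_len ? enc[2 * i + 1] : (float)(size[i] - 1);
	}
	for (i = 0; i < n; i++)
	{
		decode[i][0] = 2 * i < dec_len ? dec[2 * i] : range[2 * i];
		decode[i][1] = 2 * i + 1 < dec_len ? dec[2 * i + 1] : range[2 * i + 1];
	}
}
```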
|
|
Functions requiring more inputs than there are available input
values will have the missing inputs set to zero. Similarly, functions
producing too few outputs will have the remaining output values set
to zero. Any excess input or output values will be ignored.
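A minimal sketch of tolerant evaluation, assuming a function object that records how many inputs it expects (`m`) and how many outputs it produces (`n`). All names are hypothetical. With `m` set to 1, the same wrapper also models exponential and stitching functions, where only the first input is used and the rest are ignored.

```c
#define EXAMPLE_MAX 32 /* hypothetical bound on inputs/outputs */

typedef struct {
	int m; /* inputs the function expects, assumed <= EXAMPLE_MAX */
	int n; /* outputs the function produces, assumed <= EXAMPLE_MAX */
	void (*eval)(const float *in, float *out);
} example_func;

/* Evaluate f with inlen available inputs into outlen requested outputs.
 * Missing inputs are zero and extra inputs are ignored; requested outputs
 * the function does not produce are zero and extra results are dropped. */
static void example_eval(const example_func *f, const float *in, int inlen,
	float *out, int outlen)
{
	float fin[EXAMPLE_MAX], fout[EXAMPLE_MAX];
	int i;

	for (i = 0; i < f->m; i++)
		fin[i] = i < inlen ? in[i] : 0.0f;
	f->eval(fin, fout);
	for (i = 0; i < outlen; i++)
		out[i] = i < f->n ? fout[i] : 0.0f;
}

/* Demo function with two inputs and one output. */
static void example_sum2(const float *in, float *out)
{
	out[0] = in[0] + in[1];
}
```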
|
|
Both exponential and stitching functions are limited to one input.
Make sure that any excess inputs are ignored.
|
|
This will allow the loading of pdf functions to validate that a pdf
function has the correct number of inputs/outputs. Additionally, it
will allow pdf functions with an incorrect number of inputs/outputs
to be handled.
|
|
Previously a pdf function having too many inputs or outputs would
cause an exception; now they will be handled silently.
There are two places pdf functions are used: for shadings and for
colorspace tint transforms. In both cases the number of inputs/outputs
may never be more than the number of components, i.e. limited to MAXN.
Additionally the number of inputs/outputs may never be less than
the number of components, and there is always at least one component.
|
|
BitsPerSample is already screened later in the code for invalid
values, including the default value 0 returned by pdf_to_int().
|
|
Instead of using macros for min/max/abs/clamp, we move to using
inline functions. These are more typesafe, and should produce
equivalent code on compilers that support inline (i.e. pretty much
everything we care about these days).
People can always do their own macro versions if they prefer.
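A sketch of the macro-versus-inline difference. The function names are illustrative and may not match the helpers MuPDF actually added. Besides type checking, the inline versions evaluate each argument exactly once, whereas the macro evaluates one of its arguments twice.

```c
/* Illustrative names; not necessarily the exact helpers MuPDF provides. */
static inline int example_mini(int a, int b) { return a < b ? a : b; }
static inline int example_maxi(int a, int b) { return a > b ? a : b; }
static inline int example_absi(int a) { return a < 0 ? -a : a; }
static inline float example_clamp(float f, float lo, float hi)
{
	return f < lo ? lo : (f > hi ? hi : f);
}

/* The macro form: whichever argument is selected gets evaluated twice. */
#define EXAMPLE_MIN(a, b) ((a) < (b) ? (a) : (b))
```

For example, starting from `int i = 0`, `EXAMPLE_MIN(i++, 10)` yields 1 and leaves `i` at 2, while `example_mini(i++, 10)` yields 0 and leaves `i` at 1.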
|
|
Currently pdf_lexbufs use a static scratch buffer for parsing. In
the main case this is 64K in size, but in other cases it can be
just 256 bytes; this causes problems when parsing long strings.
Even the 64K limit is an implementation limit of Acrobat, not an
architectural limit of PDF.
Change here to allow dynamic buffers. This means a slightly more
complex setup and destruction for each buffer, but more importantly
requires correct cleanup on errors. To avoid having to insert
lots more try/catch clauses, this commit includes various changes to
the code so that we reuse pdf_lexbufs where possible. This also keeps
the speed up.
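A minimal sketch of a growable lexer buffer, assuming a small embedded buffer that is replaced by heap storage once a token outgrows it. The type and function names are hypothetical, not MuPDF's pdf_lexbuf API. The `example_lexbuf_fin` call is what must run on every error path, which is why initialising one buffer and reusing it (resetting `len` between tokens) is attractive.

```c
#include <stdlib.h>
#include <string.h>

/* Note: scratch may point into the struct itself, so do not copy it by value. */
typedef struct {
	size_t size;      /* capacity of scratch */
	size_t len;       /* bytes currently used */
	char *scratch;    /* either buffer or a heap block */
	char buffer[256]; /* initial fixed storage */
} example_lexbuf;

static void example_lexbuf_init(example_lexbuf *lb)
{
	lb->size = sizeof lb->buffer;
	lb->len = 0;
	lb->scratch = lb->buffer;
}

/* Double the capacity; returns 0 on allocation failure (buffer unchanged). */
static int example_lexbuf_grow(example_lexbuf *lb)
{
	size_t newsize = lb->size * 2;
	char *p;

	if (lb->scratch == lb->buffer)
	{
		p = malloc(newsize);
		if (!p)
			return 0;
		memcpy(p, lb->buffer, lb->len);
	}
	else
	{
		p = realloc(lb->scratch, newsize);
		if (!p)
			return 0;
	}
	lb->scratch = p;
	lb->size = newsize;
	return 1;
}

/* Append one character, growing as needed; returns 0 on failure. */
static int example_lexbuf_push(example_lexbuf *lb, char c)
{
	if (lb->len == lb->size && !example_lexbuf_grow(lb))
		return 0;
	lb->scratch[lb->len++] = c;
	return 1;
}

/* Release any heap storage; must also be called on error paths. */
static void example_lexbuf_fin(example_lexbuf *lb)
{
	if (lb->scratch != lb->buffer)
		free(lb->scratch);
	example_lexbuf_init(lb);
}
```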
|
|
Don't reset the size of arrays until we have successfully resized them.
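A small sketch of the pattern, with hypothetical names: the new pointer is checked before any bookkeeping is updated, so a failed `realloc` leaves the array's pointer and recorded capacity describing the old, still-valid block.

```c
#include <stdlib.h>

typedef struct {
	int len;    /* elements in use */
	int cap;    /* elements allocated */
	int *items;
} example_array;

/* Grow the capacity; the recorded size changes only after realloc succeeds. */
static int example_array_grow(example_array *a, int newcap)
{
	int *p = realloc(a->items, (size_t)newcap * sizeof *p);

	if (!p)
		return 0; /* a->items and a->cap still describe the old block */
	a->items = p;
	a->cap = newcap;
	return 1;
}

static int example_array_push(example_array *a, int v)
{
	if (a->len == a->cap && !example_array_grow(a, a->cap ? a->cap * 2 : 4))
		return 0;
	a->items[a->len++] = v;
	return 1;
}
```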
|
|
Put the logf call in its own statement to fix a stupid header file
bug.
|
|
When bitshifting by a negative amount, we should shift right; thanks
to Sebras' work in this area, I spotted that we are attempting to
shift right by a negative number.
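A sketch of the sign handling, assuming this concerns the `bitshift` operator of PostScript calculator functions. The function name is hypothetical, and the right shift here is logical (zero-filling), which is a choice made for the sketch.

```c
#include <limits.h>

#define EXAMPLE_INT_BITS ((int)(sizeof(int) * CHAR_BIT))

/* Positive shift moves bits left; negative shift moves them right by
 * -shift. Shifting a C int by a negative amount (or by its width or more)
 * is undefined, so the sign is handled explicitly on an unsigned copy. */
static int example_bitshift(int value, int shift)
{
	unsigned int u = (unsigned int)value;

	if (shift >= EXAMPLE_INT_BITS || shift <= -EXAMPLE_INT_BITS)
		return 0;
	if (shift >= 0)
		return (int)(u << shift);
	return (int)(u >> -shift);
}
```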
|
|
Floating point numbers are now clamped, division by zero is
approximated by the minimum or maximum value, and NaN results in 1.0.
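A sketch of these rules, with hypothetical names. Reading "minimum or maximum value" as the most negative or most positive finite float, chosen by the sign of the numerator, is an assumption; 0 / 0 is treated as NaN and therefore yields 1.0.

```c
#include <float.h>

/* Clamp a result to the finite float range and turn NaN into 1.0. */
static float example_fixup(float f)
{
	if (f != f)
		return 1.0f;    /* NaN */
	if (f > FLT_MAX)
		return FLT_MAX;  /* +infinity */
	if (f < -FLT_MAX)
		return -FLT_MAX; /* -infinity */
	return f;
}

/* Division where x / 0 becomes +/-FLT_MAX by the sign of x (assumption). */
static float example_div(float a, float b)
{
	if (b == 0.0f)
	{
		if (a > 0.0f)
			return FLT_MAX;
		if (a < 0.0f)
			return -FLT_MAX;
		return 1.0f; /* 0 / 0 or NaN / 0 */
	}
	return example_fixup(a / b);
}
```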
|
|
Attempt to separate public API from internal functions.
|
|
Currently, we are in the slightly strange position of having
the PDF specific object types as part of fitz. Here we pull
them out into the pdf layer instead. This has been made possible
by the recent changes to make the store no longer be tied to
having fz_obj's as keys.
Most of this work is a simple huge rename; to help customers who
may have code that uses such functions, we have provided a sed
script to do the renaming: scripts/rename2.sed.
Various other small tweaks are required; the store used to have
some debugging code that still required knowledge of fz_obj
types - we extract that into a nicer 'type' based function
pointer. Also, the type 3 font handling used to have an fz_obj
pointer for type 3 resources, and therefore needed to know how
to free this; this has become a void * with a function to free
it.
|
|
A huge amount (20%+ on some files) of our runtime is spent in
fz_atof. A survey of results on the net suggests we will get
much better speed by writing our own atof.
Part of the job of doing this involves parsing the string to
identify the component parts of the number - ludicrously, we
are already doing this as part of the lexing process, so it
would make sense to do the atoi/atof as part of this process.
In order to do this, we need somewhere to store the lexed
results; rather than add a float * and an int * to every single
pdf_lex call, we generalise the calls to pass a pdf_lexbuf *
pointer instead of separate buffer/max/string length pointers.
This should help us overall.
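A sketch of computing a number's value while lexing it, so no separate atof pass is needed. The `example_number` struct stands in for the idea of passing a single lexbuf-style pointer that carries the results; it is hypothetical and not the real pdf_lexbuf layout.

```c
#include <limits.h>
#include <stddef.h>

typedef struct {
	int is_real; /* nonzero if a '.' was seen */
	int i;       /* integer value (integer part for reals) */
	float f;     /* value as a float */
	size_t len;  /* characters consumed */
} example_number;

/* Lex a PDF-style number (optional sign, digits, optional '.', digits; no
 * exponent) and compute its value during the scan. The integer part
 * saturates near INT_MAX, and repeated multiplication by 0.1f trades a
 * little precision for speed. */
static size_t example_lex_number(const char *s, example_number *num)
{
	size_t k = 0;
	int neg = 0, seen_dot = 0, ipart = 0;
	float frac = 0.0f, scale = 1.0f;

	if (s[k] == '+' || s[k] == '-')
		neg = (s[k++] == '-');
	for (;; k++)
	{
		char c = s[k];
		if (c >= '0' && c <= '9')
		{
			int d = c - '0';
			if (seen_dot)
			{
				scale *= 0.1f;
				frac += d * scale;
			}
			else if (ipart > (INT_MAX - 9) / 10)
				ipart = INT_MAX; /* saturate rather than overflow */
			else
				ipart = ipart * 10 + d;
		}
		else if (c == '.' && !seen_dot)
			seen_dot = 1;
		else
			break;
	}
	num->len = k;
	num->is_real = seen_dot;
	num->i = neg ? -ipart : ipart;
	num->f = neg ? -((float)ipart + frac) : (float)ipart + frac;
	return k;
}
```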
|
|
Introduce a new 'fz_image' type; this type contains rudimentary
information about images (such as native size, colorspace etc.)
and a function to call to get a pixmap of that image (with a
size hint).
Instead of passing pixmaps through the device interface (and
holding pixmaps in the display list) we now pass images instead.
The rendering routines therefore call fz_image_to_pixmap to get
pixmaps to render, and fz_pixmap_drop those afterwards.
The file format handling routines therefore need to produce
images rather than pixmaps; xps and cbz currently just wrap
pixmaps as images. PDF is more involved.
The stream handling routines in PDF have been altered so that
they can recognise when the last stream entry in a filter
dictionary is an image decoding filter. Rather than applying
this filter, they read and store the parameters into a
pdf_image_params structure, and stop decoding at that point.
This allows us to read the compressed data for an image into
memory as a block. We can then restart the image decode process
later.
pdf_images therefore consist of the compressed image data for
images. When a pixmap is requested for such an image, the code
checks to see if we have one (of an appropriate size), and if
not, decodes it.
The size hint is used to determine whether it is possible to
subsample the image; currently this is only supported for
JPEGs, but we could add generic subsampling code later.
In order to handle caching the produced images, various changes
have been made to the store and the underlying hash table.
Previously the store was indexed purely by fz_obj keys; we don't
have an fz_obj key any more, so have extended the store by adding
a concept of a key 'type'. A key type is a pointer to a set of
functions that keep/drop/compare and make a hashable key from
a key pointer.
We make a pdf_store.c file that contains functions to offer the
existing fz_obj based functions, and add a new 'type' for keys
(based on the fz_image handle, and the subsample factor) in the
pdf_image.c file.
While working on this, a problem became apparent in the existing
store code; fz_obj objects had no protection on their reference
counts, hence an interpreter thread could try to alter a ref count
at the same time as a malloc caused an eviction from the store.
This has been solved by using the alloc lock as protection. This in
turn requires some tweaks to the code to make sure we don't try
to keep/drop fz_obj's from the store code while the alloc lock is
held.
A side effect of this work is that when a hash table is created, we
inform it what lock should be used to protect its innards (if any).
If the alloc lock is used, the insert method knows to drop/retake it
to allow it to safely expand the hash table. Callers of the hash
functions are responsible for taking/dropping the appropriate lock,
and for ensuring that they cope with the possibility that insert
might drop the alloc lock, causing race conditions.
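A sketch of a key 'type' as a table of function pointers, with a key built from an image handle plus a subsample factor. All names are hypothetical; the 16-byte hash key size is an assumption for the sketch, and a real store would hold the appropriate lock around reference count changes.

```c
#include <stdlib.h>
#include <string.h>

/* Lets a generic store keep, drop, compare and hash keys it cannot see into. */
typedef struct {
	int (*make_hash_key)(unsigned char hash[16], void *key);
	void *(*keep_key)(void *key);
	void (*drop_key)(void *key);
	int (*cmp_key)(void *a, void *b); /* 0 means equal, like memcmp */
} example_store_type;

typedef struct {
	int refs;
	void *image; /* stands in for an fz_image pointer */
	int factor;  /* subsample factor */
} example_image_key;

static int example_image_key_hash(unsigned char hash[16], void *key_)
{
	example_image_key *key = key_;

	if (sizeof key->image + sizeof key->factor > 16)
		return 0; /* cannot make a hashable key */
	memset(hash, 0, 16); /* keep unused bytes deterministic */
	memcpy(hash, &key->image, sizeof key->image);
	memcpy(hash + sizeof key->image, &key->factor, sizeof key->factor);
	return 1;
}

static void *example_image_key_keep(void *key_)
{
	example_image_key *key = key_;
	key->refs++;
	return key;
}

static void example_image_key_drop(void *key_)
{
	example_image_key *key = key_;
	if (--key->refs == 0)
		free(key);
}

static int example_image_key_cmp(void *a_, void *b_)
{
	example_image_key *a = a_, *b = b_;
	return a->image != b->image || a->factor != b->factor;
}

static const example_store_type example_image_key_type = {
	example_image_key_hash, example_image_key_keep,
	example_image_key_drop, example_image_key_cmp
};
```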
|
|
This is a significant change to the use of locks in MuPDF.
Previously, the user had the option of passing us lock/unlock
functions for a single mutex as part of the allocation struct.
Now we remove these entries from the allocation struct, and
make a separate 'locks' struct. This enables people to use
fz_alloc_default with locking.
If multithreaded operation is required, then the user is
required to create FZ_LOCK_MAX mutexes, which will be locked
or unlocked by MuPDF calling the lock/unlock functions within
the new fz_locks_context structure passed in at context creation.
These mutexes are not required to be recursive (they may be, but
MuPDF should never call them in this way). MuPDF avoids deadlocks
by imposing a locking ordering on itself; a thread will never take
lock n, if it already holds any lock i for which 0 <= i <= n.
Currently, there are 4 locks used within MuPDF.
Lock 0: The alloc lock; taken around all calls to user supplied
(or default) allocation functions. Also taken around all accesses
to the refs field of storable items.
Lock 1: The store lock; taken whenever the store data structures
(specifically the linked list pointers) are accessed.
Lock 2: The file lock; taken whenever a thread is accessing the raw
file. We use the debugging macros to insist that this is held
whenever we do a file based seek or read. We also insist that this
is never held when we resolve an indirect reference, as this can
have the effect of moving the file pointer.
Lock 3: The glyphcache lock; taken whenever a thread calls freetype,
or accesses the glyphcache data structures. This introduces some
complexities w.r.t type3 fonts.
Locking can be hugely problematic, so to ease our minds as to
the correctness of this code, we introduce some debugging macros.
These compile away to nothing unless FITZ_DEBUG_LOCKING is defined.
fz_assert_lock_held(ctx, lock) checks that we hold lock.
fz_assert_lock_not_held(ctx, lock) checks that we do not hold lock.
In addition fz_lock_debug_lock and fz_lock_debug_unlock are used
on every fz_lock/fz_unlock to check the validity of the operation
we are performing - in particular it checks that we do/do not already
hold the lock we are trying to take/drop, and that by taking this
lock we are not violating our defined locking order.
The RESOLVE macro (used throughout the code to check whether we need
to resolve an indirect reference) calls fz_assert_lock_not_held to
ensure that we aren't about to resolve an indirect reference (and
hence move the stream pointer) when the file is locked.
In order to implement the file locking properly, pdf_open_stream
(and friends) now lock the file as a side effect (because they
fz_seek to the start of the stream). The lock is automatically
dropped on an fz_close of such streams.
Previously, the glyph cache was created in a context when it was first
required; this presents problems as it can be shared between several
contexts or not, depending on whether it is created before the
contexts are cloned. We now always create it at startup, so it is
always shared.
This means that we need reference counting for the glyph caches.
Added here.
In fz_render_glyph, we take the glyph cache lock, and check to see
whether the glyph is in the cache. If it is, we bump the refcount,
drop the lock and return the cached character. If it is not, we
need to render the character.
For freetype based fonts we keep the lock throughout the rendering
process, thus ensuring that freetype is only called in a single
threaded manner.
For type3 fonts, however, we need to invoke the interpreter again
to render the glyph streams. This can require reentrance to this
routine. We therefore drop the glyph cache lock, call the
interpreter to render us our pixmap, and take the lock again.
This dropping and retaking of the lock introduces a possible race
condition; 2 threads may try to render the same character at the
same time. We therefore modify our hash table insert routines to
behave differently when they come to insert an entry only to find
that an entry with the same key is already there.
We spot this case; if we have just rendered a type3 glyph and when
we try to insert it into the cache discover that someone has beaten
us to it, we just discard our entry and use the cached one.
Hopefully this will seldom be a problem in practice; to solve it
properly would require greater complexity (probably involving
spotting that another thread is already working on the desired
rendering, and sleeping on a semaphore until it completes).
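A sketch of the lock-ordering check described above: taking lock n is an error if any lock i with i <= n is already held (i == n would be recursive locking). The names are hypothetical, and keeping the `held` flags per context assumes each thread works in its own cloned context; MuPDF's real debugging code may differ.

```c
#include <stdio.h>

/* The four locks described above, numbered in locking order (hypothetical names). */
enum { EX_LOCK_ALLOC, EX_LOCK_STORE, EX_LOCK_FILE, EX_LOCK_GLYPHCACHE, EX_LOCK_MAX };

/* User-supplied locking callbacks, in the spirit of fz_locks_context. */
typedef struct {
	void *user;
	void (*lock)(void *user, int lock);
	void (*unlock)(void *user, int lock);
} example_locks;

typedef struct {
	example_locks locks;
	int held[EX_LOCK_MAX]; /* debug bookkeeping for this context's thread */
} example_ctx;

/* Nonzero if taking `lock` is allowed: no lock i <= lock may be held. */
static int example_lock_ok(const example_ctx *ctx, int lock)
{
	int i;

	for (i = 0; i <= lock; i++)
		if (ctx->held[i])
			return 0; /* i == lock: recursive; i < lock: ordering violation */
	return 1;
}

static void example_lock(example_ctx *ctx, int lock)
{
	if (!example_lock_ok(ctx, lock))
		fprintf(stderr, "lock ordering violation taking lock %d\n", lock);
	ctx->locks.lock(ctx->locks.user, lock);
	ctx->held[lock] = 1;
}

static void example_unlock(example_ctx *ctx, int lock)
{
	if (!ctx->held[lock])
		fprintf(stderr, "unlocking lock %d, which is not held\n", lock);
	ctx->held[lock] = 0;
	ctx->locks.unlock(ctx->locks.user, lock);
}

/* Demo callbacks that do nothing (single-threaded example). */
static void example_noop(void *user, int lock)
{
	(void)user;
	(void)lock;
}
```

For instance, a thread holding the glyphcache lock (3) may still take the alloc lock (0) to allocate, but a thread holding the alloc lock may not go on to take the file or glyphcache lock.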
|
|
The operator list in parse_code is binary searched through, hence
operators must be in alphabetical order. Thanks to Brian Adams for
pointing out the mistake here.
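A sketch of why the ordering matters: `bsearch` from the C standard library only finds entries in a table sorted by the same comparison it is given. The table lists the PostScript calculator operator names from the PDF specification; the function names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* PostScript calculator operator names, sorted in strcmp order. */
static const char *const example_ops[] = {
	"abs", "add", "and", "atan", "bitshift", "ceiling", "copy", "cos",
	"cvi", "cvr", "div", "dup", "eq", "exch", "exp", "false", "floor",
	"ge", "gt", "idiv", "if", "ifelse", "index", "le", "ln", "log", "lt",
	"mod", "mul", "ne", "neg", "not", "or", "pop", "roll", "round", "sin",
	"sqrt", "sub", "true", "truncate", "xor"
};

#define EXAMPLE_NOPS (sizeof example_ops / sizeof example_ops[0])

static int example_op_cmp(const void *key, const void *elem)
{
	return strcmp((const char *)key, *(const char *const *)elem);
}

/* Return the index of name in example_ops, or -1 if it is not found.
 * A single out-of-order entry can make bsearch miss valid names. */
static int example_find_op(const char *name)
{
	const char *const *p = bsearch(name, example_ops, EXAMPLE_NOPS,
		sizeof example_ops[0], example_op_cmp);
	return p ? (int)(p - example_ops) : -1;
}
```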
|
|
Add 2 missing cases to parse_code.
|
|
Remove remnants of old tests that are no longer required.
|
|
When we moved over to a context based system, we laid the foundation
for a thread-safe mupdf. This commit should complete that process.
Firstly, fz_clone_context is properly implemented so that it
makes a new context, but shares certain sections (currently
just the allocator, and the store).
Secondly, we add locking (to parts of the code that have
previously just had placeholder LOCK/UNLOCK comments). Functions
to lock and unlock a mutex are added to the allocator structure;
omit these (as is the case today) and no multithreading is
(safely) possible. The context will refuse to clone if these are
not provided.
Finally we flesh out the LOCK/UNLOCK comments to be real calls of
the functions - unfortunately this requires us to plumb fz_context
into the fz_keep_storable function (and all the fz_keep_xxx
functions that call it). This is the largest section of the patch.
No changes expected to any test files.
|
|
The new fz_malloc_struct(A,B) macro allocates sizeof(B) bytes using
fz_malloc, and then passes the resultant pointer to Memento_label
to label it with "B".
This costs nothing in non-memento builds, but gives much nicer
listings of leaked blocks when memento is enabled.
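A sketch of such a macro, with hypothetical names standing in for fz_malloc and Memento_label. The `#B` stringizing operator turns the type argument into its label text, and in builds without the debugging allocator the label call is defined away to the bare pointer, so it costs nothing.

```c
#include <stdlib.h>

#ifdef EXAMPLE_MEMENTO
/* Provided by the debugging allocator in a Memento-style build. */
void *example_memento_label(void *ptr, const char *label);
#else
/* In ordinary builds labelling is free: it is just the pointer. */
#define example_memento_label(ptr, label) (ptr)
#endif

/* Hypothetical context-taking allocator, in the style of fz_malloc(ctx, size). */
static void *example_malloc(void *ctx, size_t size)
{
	(void)ctx;
	return malloc(size);
}

/* Allocate sizeof(B) bytes and label the block with the text "B";
 * the cast yields a correctly typed pointer. */
#define example_malloc_struct(CTX, B) \
	((B *)example_memento_label(example_malloc(CTX, sizeof(B)), #B))
```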
|
|
Fix warnings/errors thrown up by the last few commits (which were
only tested on windows).
|
|
Firstly, we rename pdf_store to fz_store, reflecting the fact that
there are no pdf specific dependencies on it.
Next, we rework it so that all the objects that can be stored in
the store start with an fz_storable structure. This consists of
a reference count, and a function used to free the object when
the reference count reaches zero.
All the keep/drop functions are then reimplemented by calling
fz_keep_sharable/fz_drop_sharable. The 'drop' functions as supplied
by the callers are thus now 'free' functions, only called if
the reference count drops to 0.
The store is changed to keep all its items in the linked list
(which becomes a doubly linked one). We still make use of
the hashtable to index into this list quickly, but we now have
the objects in an LRU ordering within the list.
Every object is put into the store, with a size record; this is
an estimate of how much memory would be freed by freeing that
object.
The store is moved into the context and given a maximum size;
when new things are inserted into the store, care is taken to
ensure that we do not expand beyond this size. We evict any
stored items (that are not in use) starting from the least
recently used.
Finding an object in the store now takes a reference to it already.
LOCK and UNLOCK comments are used to indicate where locks need to
be taken and released to ensure thread safety.
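A sketch of the storable idea: a header holding a reference count and a free function, placed first in every storable object so that generic keep/drop code can work on any of them. The names are hypothetical, and the LOCK/UNLOCK comments mark where a real implementation would take a lock.

```c
#include <stdlib.h>

typedef struct example_storable example_storable;

/* Every storable object starts with this header. */
struct example_storable {
	int refs;                            /* reference count */
	void (*free_fn)(example_storable *); /* called when refs reaches 0 */
};

static example_storable *example_keep_storable(example_storable *s)
{
	/* LOCK */
	if (s && s->refs > 0)
		s->refs++;
	/* UNLOCK */
	return s;
}

static void example_drop_storable(example_storable *s)
{
	int do_free = 0;

	if (!s)
		return;
	/* LOCK */
	if (s->refs > 0 && --s->refs == 0)
		do_free = 1;
	/* UNLOCK */
	if (do_free)
		s->free_fn(s); /* free outside the lock */
}

/* A concrete storable object; the header must be the first member. */
typedef struct {
	example_storable storable;
	int w, h;
} example_pixmap;

static int example_frees; /* counts frees, for demonstration only */

static void example_free_pixmap(example_storable *s)
{
	example_frees++;
	free(s); /* s is also the address of the enclosing example_pixmap */
}

static example_pixmap *example_new_pixmap(int w, int h)
{
	example_pixmap *pix = malloc(sizeof *pix);

	if (!pix)
		return NULL;
	pix->storable.refs = 1;
	pix->storable.free_fn = example_free_pixmap;
	pix->w = w;
	pix->h = h;
	return pix;
}
```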
|
|
When using exceptions (which are implemented using setjmp/longjmp), we
need to be careful to ensure that variable values get written back
before any exception happens.
Previously we've done that using volatile, but that produces nasty
warnings (and unduly limits the compiler's freedom to optimise). Here
we introduce a new macro fz_var that passes the address of the variable
out of scope. This means that the compiler has to ensure that any
changes to its value are written back to memory before calling any
out of scope function.
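A sketch of the idea using hypothetical names. The helper is called through a volatile function pointer so that, even in a single file, the compiler cannot see that it does nothing and must treat the variable's address as escaped. This relies on how compilers handle escaped addresses; the C standard itself only guarantees well-defined values after `longjmp` for `volatile` objects.

```c
#include <setjmp.h>

/* Does nothing, but is reached through a volatile pointer so the compiler
 * must assume the address it receives may be stored and read later. */
static void example_var_imp(void *addr) { (void)addr; }
static void (*volatile example_var_hook)(void *) = example_var_imp;

#define example_var(v) example_var_hook((void *)&(v))

static jmp_buf example_env;

static void example_may_throw(int fail)
{
	if (fail)
		longjmp(example_env, 1);
}

/* Returns the value of stage seen after the simulated exception. Because
 * its address has escaped, stage is kept in memory and written back
 * before each out-of-scope call, including the one that longjmps. */
static int example_run(void)
{
	int stage = 0;

	example_var(stage);
	if (setjmp(example_env) == 0)
	{
		stage = 1;
		example_may_throw(0);
		stage = 2;
		example_may_throw(1); /* "throws" */
		stage = 3;            /* never reached */
	}
	return stage;
}
```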
|
|
This frees us from passing errors back everywhere, and hence enables us
to pass results back as return values.
Rather than having to explicitly check for errors everywhere and bubble
them up, we now allow exception handling to do the work for us; the
downside to this is that we no longer emit as much debugging information
as we did before (though this could be put back in). For now, the
debugging information we have lost has been retained in comments
with 'RJW:' at the start.
This code needs fuller testing, but is being committed as a work in
progress.