Age | Commit message (Collapse) | Author |
|
Fixes http://bugs.ghostscript.com/show_bug.cgi?id=693314
|
|
Conflicts:
Makefile
apps/mudraw.c
pdf/pdf_write.c
win32/libmupdf-v8.vcproj
|
|
|
|
Previously, during repair of a pdf, an object stream was loaded and an
attempt was made at repairing the objects stored inside the object
stream. Failure to repair the stream caused an exception which was not
handled by the code loading the object stream; it was just passed on.
This meant that the loaded object stream caused a memory leak. Now we
catch that exception, free the object stream and rethrow the exception.
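The catch, free and rethrow pattern can be sketched in plain C with
setjmp/longjmp. This is a minimal illustration, not the MuPDF code:
every sketch_* name is hypothetical, and the handler "stack" is a single
pointer where the real library keeps its state in the context.

```c
#include <setjmp.h>
#include <stdlib.h>

/* Hypothetical stand-in for an object stream; only its lifetime matters. */
typedef struct { int id; } sketch_stream;

static jmp_buf *sketch_top = NULL;   /* innermost active handler */
static int sketch_live_streams = 0;  /* streams opened but not yet freed */

static void sketch_throw(void) { longjmp(*sketch_top, 1); }

static sketch_stream *sketch_open_stream(void)
{
	sketch_stream *stm = malloc(sizeof *stm);
	if (stm == NULL)
		abort();
	stm->id = 0;
	sketch_live_streams++;
	return stm;
}

static void sketch_close_stream(sketch_stream *stm)
{
	free(stm);
	sketch_live_streams--;
}

/* Stand-in for repairing the objects inside the stream; fails on request. */
static void sketch_repair_objects(int fail)
{
	if (fail)
		sketch_throw();
}

static void sketch_load_obj_stm(int fail)
{
	jmp_buf here;
	jmp_buf *outer = sketch_top;
	sketch_stream *stm = sketch_open_stream();

	sketch_top = &here;
	if (setjmp(here) == 0)
	{
		sketch_repair_objects(fail);
		sketch_top = outer;
	}
	else
	{
		/* Catch: free the stream, then pass the error on to the caller. */
		sketch_top = outer;
		sketch_close_stream(stm);
		sketch_throw();
	}
	sketch_close_stream(stm);
}

/* Returns 1 if an error reached the caller, 0 otherwise. */
static int sketch_run(int fail)
{
	jmp_buf here;
	int caught = 0;

	sketch_top = &here;
	if (setjmp(here) == 0)
		sketch_load_obj_stm(fail);
	else
		caught = 1;
	sketch_top = NULL;
	return caught;
}
```

Whether or not the repair step fails, sketch_live_streams is back to
zero once sketch_run returns, which is the property the fix restores.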
|
|
|
|
|
|
Out of range object numbers cause the repaired object to be
ignored. Out of range generation numbers are clamped to the
permitted range.
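A minimal sketch of that policy follows. The function name and both
limits are assumptions made for illustration: 65535 is the largest
generation number an xref entry can carry, and the object number bound
is an arbitrary placeholder rather than the value used in the repair
code.

```c
#include <stdbool.h>

/* Illustrative limits only; the real bounds live in the repair code. */
enum {
	SKETCH_MAX_OBJECT_NUMBER = 8388607,
	SKETCH_MAX_GENERATION = 65535
};

/*
 * Returns false when the object number is out of range, meaning the
 * repaired object should be ignored. Otherwise clamps *gen in place
 * and returns true.
 */
static bool sketch_sanitize_entry(int num, int *gen)
{
	if (num < 0 || num > SKETCH_MAX_OBJECT_NUMBER)
		return false;
	if (*gen < 0)
		*gen = 0;
	else if (*gen > SKETCH_MAX_GENERATION)
		*gen = SKETCH_MAX_GENERATION;
	return true;
}
```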
|
|
|
|
Instead of using macros for min/max/abs/clamp, we move to using
inline functions. These are more typesafe, and should produce
equivalent code on compilers that support inline (i.e. pretty much
everything we care about these days).
People can always do their own macro versions if they prefer.
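To illustrate the difference, here is a small sketch. The sketch_*
names are made up for this example and are not the names used in the
tree. The macro evaluates its argument twice, while the inline function
evaluates it once and checks the argument types.

```c
/* Classic macro version: each argument may be evaluated more than once. */
#define SKETCH_MAX_MACRO(a, b) ((a) > (b) ? (a) : (b))

/* Inline versions: arguments are evaluated exactly once and type checked. */
static inline int sketch_mini(int a, int b) { return a < b ? a : b; }
static inline int sketch_maxi(int a, int b) { return a > b ? a : b; }
static inline int sketch_absi(int a) { return a < 0 ? -a : a; }
static inline int sketch_clampi(int x, int lo, int hi)
{
	return x < lo ? lo : (x > hi ? hi : x);
}

/* A side-effecting argument that counts how often it is evaluated. */
static int sketch_calls = 0;
static int sketch_next(void) { return ++sketch_calls; }
```

Passing sketch_next() to SKETCH_MAX_MACRO bumps the counter twice when
the first argument wins the comparison; passing it to sketch_maxi bumps
it once.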
|
|
Currently pdf_lexbufs use a static scratch buffer for parsing. In
the main case this is 64K in size, but in other cases it can be
just 256 bytes; this causes problems when parsing long strings.
Even the 64K limit is an implementation limit of Acrobat, not an
architectural limit of PDF.
Change here to allow dynamic buffers. This means a slightly more
complex setup and destruction for each buffer, but more importantly
requires correct cleanup on errors. To avoid having to insert
lots more try/catch clauses this commit includes various changes to
the code so we reuse pdf_lexbufs where possible. This keeps the
speed up.
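A rough sketch of such a buffer: it starts out using a small array
embedded in the struct and moves to the heap only when a token outgrows
it. The struct layout and the sketch_* names are assumptions for
illustration and do not claim to match pdf_lexbuf.

```c
#include <stdlib.h>
#include <string.h>

enum { SKETCH_LEXBUF_SMALL = 256 };

typedef struct
{
	size_t size;    /* capacity of scratch */
	size_t len;     /* bytes currently used */
	char *scratch;  /* points at base until the buffer has to grow */
	char base[SKETCH_LEXBUF_SMALL];
} sketch_lexbuf;

static void sketch_lexbuf_init(sketch_lexbuf *lb)
{
	lb->size = sizeof lb->base;
	lb->len = 0;
	lb->scratch = lb->base;
}

/* Must run on every exit path, including error paths, once grown. */
static void sketch_lexbuf_fin(sketch_lexbuf *lb)
{
	if (lb->scratch != lb->base)
		free(lb->scratch);
	sketch_lexbuf_init(lb);
}

/* Appends one byte. Returns 0 on success, -1 if memory ran out. */
static int sketch_lexbuf_push(sketch_lexbuf *lb, char c)
{
	if (lb->len == lb->size)
	{
		size_t newsize = lb->size * 2;
		char *p;
		if (lb->scratch == lb->base)
		{
			/* First growth: copy out of the embedded array. */
			p = malloc(newsize);
			if (p != NULL)
				memcpy(p, lb->base, lb->len);
		}
		else
			p = realloc(lb->scratch, newsize);
		if (p == NULL)
			return -1;
		lb->scratch = p;
		lb->size = newsize;
	}
	lb->scratch[lb->len++] = c;
	return 0;
}
```

Reusing one buffer across many tokens (resetting len to 0 between them)
means the init/fin pair, and the error handling around it, is needed
only once per parse rather than once per token.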
|
|
Previously, before interpreting a page's content stream we would
load it entirely into a buffer. Then we would interpret that
buffer. This has a cost in memory use.
Here, we update the code to read from a stream on the fly.
This has required changes in various different parts of the code.
Firstly, we have removed all use of the FILE lock - as stream
reads can now safely be interrupted by resource (or object) reads
from elsewhere in the file, the file lock becomes a very hard
thing to maintain, and doesn't actually benefit us at all. The
choices were to either use a recursive lock, or to remove it
entirely; I opted for the latter.
The file lock enum value remains as a placeholder for future use in
extendable data streams.
Secondly, we add a new 'concat' filter that concatenates a series of
streams together into one, optionally putting whitespace between each
stream (as the pdf parser requires this).
Finally, we change page/xobject/pattern content streams to work
on the fly, but we leave type3 glyphs using buffers (as presumably
these will be run repeatedly).
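The idea behind the concat filter can be sketched over in-memory chunks.
This is an illustration only: the real filter works on fz_streams, and
all names below are hypothetical.

```c
#include <stddef.h>

typedef struct { const char *data; size_t len; } sketch_chunk;

typedef struct
{
	const sketch_chunk *chunks;
	size_t count;
	size_t current;  /* index of the chunk being read */
	size_t offset;   /* read position within that chunk */
	int pad;         /* non-zero: emit a space between chunks */
} sketch_concat;

static void sketch_concat_init(sketch_concat *c, const sketch_chunk *chunks,
	size_t count, int pad)
{
	c->chunks = chunks;
	c->count = count;
	c->current = 0;
	c->offset = 0;
	c->pad = pad;
}

/* Returns the next byte of the concatenation, or -1 at the end. */
static int sketch_concat_read_byte(sketch_concat *c)
{
	while (c->current < c->count)
	{
		const sketch_chunk *ch = &c->chunks[c->current];
		if (c->offset < ch->len)
			return (unsigned char)ch->data[c->offset++];
		/* Current chunk exhausted: move on to the next one. */
		c->current++;
		c->offset = 0;
		if (c->pad && c->current < c->count)
			return ' ';
	}
	return -1;
}

/* Reads up to max bytes into out; returns the number of bytes read. */
static size_t sketch_concat_read(sketch_concat *c, char *out, size_t max)
{
	size_t n = 0;
	int b;
	while (n < max && (b = sketch_concat_read_byte(c)) >= 0)
		out[n++] = (char)b;
	return n;
}
```

With the chunks "1 0 0 RG" and "0 0 m", an unpadded concatenation runs
"RG" and "0" together into one token; the padded form keeps them apart.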
|
|
While debugging Bug 692943, I spotted a case where we can attempt to
unlock the file while we don't hold the file lock due to an error
being thrown while we momentarily drop that lock. Simple solution
is to add a new fz_try()/fz_catch() to retake the lock in such
an error circumstance.
|
|
Attempt to separate public API from internal functions.
|
|
Currently, we are in the slightly strange position of having
the PDF specific object types as part of fitz. Here we pull
them out into the pdf layer instead. This has been made possible
by the recent changes to make the store no longer be tied to
having fz_obj's as keys.
Most of this work is a simple huge rename; to help customers who
may have code that uses such functions we have provided a sed
script to do the renaming: scripts/rename2.sed.
Various other small tweaks are required; the store used to have
some debugging code that still required knowledge of fz_obj
types - we extract that into a nicer 'type' based function
pointer. Also, the type 3 font handling used to have an fz_obj
pointer for type 3 resources, and therefore needed to know how
to free this; this has become a void * with a function to free
it.
|
|
A huge amount (20%+ on some files) of our runtime is spent in
fz_atof. A survey of results on the net suggests we will get
much better speed by writing our own atof.
Part of the job of doing this involves parsing the string to
identify the component parts of the number - ludicrously, we
are already doing this as part of the lexing process, so it
would make sense to do the atoi/atof as part of this process.
In order to do this, we need somewhere to store the lexed
results; rather than add a float * and an int * to every single
pdf_lex call, we generalise the calls to pass a pdf_lexbuf *
pointer instead of separate buffer/max/string length pointers.
This should help us overall.
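As a sketch of the idea, the function below classifies a numeric token
and converts it in the same pass, leaving the result in a small struct
that stands in for the int and float slots of the lexbuf. All names are
hypothetical, and the conversion is deliberately simple (no exponents,
and it gives up on integer overflow).

```c
#include <limits.h>

typedef enum { SKETCH_TOK_ERROR, SKETCH_TOK_INT, SKETCH_TOK_REAL } sketch_token;

/* Stand-in for the int/float result slots carried by the lexbuf. */
typedef struct { int i; float f; } sketch_lexnum;

static sketch_token sketch_lex_number(const char *s, sketch_lexnum *out)
{
	int neg = 0, digits = 0, i = 0;

	if (*s == '-') { neg = 1; s++; }
	else if (*s == '+') s++;

	/* Integer part, converted as it is scanned. */
	while (*s >= '0' && *s <= '9')
	{
		int d = *s - '0';
		if (i > (INT_MAX - d) / 10)
			return SKETCH_TOK_ERROR; /* overflow: this sketch gives up */
		i = i * 10 + d;
		s++;
		digits++;
	}

	if (*s == '.')
	{
		double frac = 0, div = 1;
		s++;
		/* Fractional part, accumulated as digits / 10^n. */
		while (*s >= '0' && *s <= '9')
		{
			frac = frac * 10 + (*s - '0');
			div *= 10;
			s++;
			digits++;
		}
		if (digits == 0 || *s != '\0')
			return SKETCH_TOK_ERROR;
		out->f = (float)(i + frac / div);
		if (neg)
			out->f = -out->f;
		return SKETCH_TOK_REAL;
	}

	if (digits == 0 || *s != '\0')
		return SKETCH_TOK_ERROR;
	out->i = neg ? -i : i;
	return SKETCH_TOK_INT;
}
```

The caller gets the token type as the return value and reads the value
from the struct, so no second pass over the text is needed.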
|
|
This is a significant change to the use of locks in MuPDF.
Previously, the user had the option of passing us lock/unlock
functions for a single mutex as part of the allocation struct.
Now we remove these entries from the allocation struct, and
make a separate 'locks' struct. This enables people to use
fz_alloc_default with locking.
If multithreaded operation is required, then the user is
required to create FZ_LOCK_MAX mutexes, which will be locked
or unlocked by MuPDF calling the lock/unlock functions within
the new fz_locks_context structure passed in at context creation.
These mutexes are not required to be recursive (they may be, but
MuPDF should never call them in this way). MuPDF avoids deadlocks
by imposing a locking ordering on itself; a thread will never take
lock n, if it already holds any lock i for which 0 <= i <= n.
Currently, there are 4 locks used within MuPDF.
Lock 0: The alloc lock; taken around all calls to user supplied
(or default) allocation functions. Also taken around all accesses
to the refs field of storable items.
Lock 1: The store lock; taken whenever the store data structures
(specifically the linked list pointers) are accessed.
Lock 2: The file lock; taken whenever a thread is accessing the raw
file. We use the debugging macros to insist that this is held
whenever we do a file based seek or read. We also insist that this
is never held when we resolve an indirect reference, as this can
have the effect of moving the file pointer.
Lock 3: The glyphcache lock; taken whenever a thread calls freetype,
or accesses the glyphcache data structures. This introduces some
complexities w.r.t type3 fonts.
Locking can be hugely problematic, so to ease our minds as to
the correctness of this code, we introduce some debugging macros.
These compile away to nothing unless FITZ_DEBUG_LOCKING is defined.
fz_assert_lock_held(ctx, lock) checks that we hold lock.
fz_assert_lock_not_held(ctx, lock) checks that we do not hold lock.
In addition fz_lock_debug_lock and fz_lock_debug_unlock are used
on every fz_lock/fz_unlock to check the validity of the operation
we are performing - in particular it checks that we do/do not already
hold the lock we are trying to take/drop, and that by taking this
lock we are not violating our defined locking order.
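The ordering rule and the debug checks can be sketched as follows. This
is a single-threaded illustration with hypothetical sketch_* names: the
struct only mimics the shape of a locks structure with user-supplied
callbacks, and the held[] bookkeeping is one global array where real
code would have to track it per thread or per context.

```c
enum {
	SKETCH_LOCK_ALLOC = 0,
	SKETCH_LOCK_STORE,
	SKETCH_LOCK_FILE,
	SKETCH_LOCK_GLYPHCACHE,
	SKETCH_LOCK_MAX
};

/* User-supplied callbacks, one mutex per lock index behind them. */
typedef struct
{
	void *user;
	void (*lock)(void *user, int lock);
	void (*unlock)(void *user, int lock);
} sketch_locks_context;

static int sketch_held[SKETCH_LOCK_MAX];

/* Lock n may not be taken while any lock i with 0 <= i <= n is held. */
static int sketch_may_take(int lock)
{
	int i;
	for (i = 0; i <= lock; i++)
		if (sketch_held[i])
			return 0;
	return 1;
}

/* Returns 1 on success, 0 if the request violates the ordering. */
static int sketch_lock(sketch_locks_context *locks, int lock)
{
	if (!sketch_may_take(lock))
		return 0;
	locks->lock(locks->user, lock);
	sketch_held[lock] = 1;
	return 1;
}

/* Returns 1 on success, 0 if the lock was not held. */
static int sketch_unlock(sketch_locks_context *locks, int lock)
{
	if (!sketch_held[lock])
		return 0;
	sketch_held[lock] = 0;
	locks->unlock(locks->user, lock);
	return 1;
}

/* No-op callbacks for trying the checks out without real mutexes. */
static int sketch_user_calls = 0;
static void sketch_user_lock(void *user, int lock)
{
	(void)user; (void)lock;
	sketch_user_calls++;
}
static void sketch_user_unlock(void *user, int lock)
{
	(void)user; (void)lock;
	sketch_user_calls++;
}
```

Under this rule a thread holding the glyphcache lock may still take the
alloc lock, but a thread holding the store lock may not take the file
lock.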
The RESOLVE macro (used throughout the code to check whether we need
to resolve an indirect reference) calls fz_assert_lock_not_held to
ensure that we aren't about to resolve an indirect reference (and
hence move the stream pointer) when the file is locked.
In order to implement the file locking properly, pdf_open_stream
(and friends) now lock the file as a side effect (because they
fz_seek to the start of the stream). The lock is automatically
dropped on an fz_close of such streams.
Previously, the glyph cache was created in a context when it was first
required; this presents problems as it can be shared between several
contexts or not, depending on whether it is created before the
contexts are cloned. We now always create it at startup, so it is
always shared.
This means that we need reference counting for the glyph caches.
Added here.
In fz_render_glyph, we take the glyph cache lock, and check to see
whether the glyph is in the cache. If it is, we bump the refcount,
drop the lock and return the cached character. If it is not, we
need to render the character.
For freetype based fonts we keep the lock throughout the rendering
process, thus ensuring that freetype is only called in a single
threaded manner.
For type3 fonts, however, we need to invoke the interpreter again
to render the glyph streams. This can require reentrance to this
routine. We therefore drop the glyph cache lock, call the
interpreter to render us our pixmap, and take the lock again.
This dropping and retaking of the lock introduces a possible race
condition; 2 threads may try to render the same character at the
same time. We therefore modify our hash table insert routines to
behave differently if they come to insert an entry only to find
that an entry with the same key is already there.
We spot this case; if we have just rendered a type3 glyph and when
we try to insert it into the cache discover that someone has beaten
us to it, we just discard our entry and use the cached one.
Hopefully this will seldom be a problem in practice; to solve it
properly would require greater complexity (probably involving
spotting that another thread is already working on the desired
rendering, and sleeping on a semaphore until it completes).
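The "someone has beaten us to it" case can be sketched with a tiny
open-addressed table. Names, table size and the integer key are all
assumptions for illustration. The sketch shows only the insert
behaviour; the locking around it is left out.

```c
#include <stddef.h>

enum { SKETCH_TABLE_SIZE = 8 };

typedef enum { SKETCH_INSERTED, SKETCH_EXISTS, SKETCH_FULL } sketch_status;

typedef struct { int used; unsigned key; void *val; } sketch_entry;
typedef struct { sketch_entry ents[SKETCH_TABLE_SIZE]; } sketch_table;

/*
 * Inserts key/val if the key is absent. If the key is already present
 * the table is left unchanged and *existing receives the stored value.
 */
static sketch_status sketch_insert(sketch_table *t, unsigned key, void *val,
	void **existing)
{
	unsigned i;
	for (i = 0; i < SKETCH_TABLE_SIZE; i++)
	{
		sketch_entry *e = &t->ents[(key + i) % SKETCH_TABLE_SIZE];
		if (!e->used)
		{
			e->used = 1;
			e->key = key;
			e->val = val;
			return SKETCH_INSERTED;
		}
		if (e->key == key)
		{
			*existing = e->val;
			return SKETCH_EXISTS;
		}
	}
	return SKETCH_FULL;
}

/* Counts discarded renderings so the behaviour can be observed. */
static int sketch_dropped = 0;
static void sketch_count_drop(void *p) { (void)p; sketch_dropped++; }

/*
 * Called after rendering a glyph with the lock dropped: if another
 * thread cached the same glyph meanwhile, discard ours and use theirs.
 */
static void *sketch_cache_glyph(sketch_table *cache, unsigned key,
	void *rendered, void (*drop)(void *))
{
	void *existing = NULL;
	if (sketch_insert(cache, key, rendered, &existing) == SKETCH_EXISTS)
	{
		drop(rendered);
		return existing;
	}
	return rendered;
}
```

The cost of the race is one wasted rendering, which is the trade-off
this commit accepts in place of a wait-for-the-other-thread scheme.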
|
|
|
|
|
|
|
|
Add missing newline and remove excessive arguments.
|
|
When repairing, if we hit a problem after having found a root object,
live with that root object rather than giving up completely.
Also fix a memory leak, and cope better with trailing crap.
Thanks to Zeniko for these.
|
|
gcc 4.4.5 gives helpful warnings about variables that can become
unset due to setjmp/longjmp usage. Fix that here.
Thanks to Sebras.
|
|
|
|
|
|
When using exceptions (which are implemented using setjmp/longjmp), we
need to be careful to ensure that variable values get written back
before any exception happens.
Previously we've done that using volatile, but that produces nasty
warnings (and unduly limits the compiler's freedom to optimise). Here
we introduce a new macro fz_var that passes the address of the variable
out of scope. This means that the compiler has to ensure that any
changes to its value are written back to memory before calling any
out of scope function.
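A sketch of the trick, with hypothetical sketch_* names. The no-op
function is reached through a volatile function pointer so that the
compiler cannot see through the call. Note that ISO C itself only
promises the value of a modified local after longjmp when the local is
volatile, so this relies on how compilers treat variables whose address
has escaped, as the message above describes.

```c
#include <setjmp.h>

static jmp_buf sketch_env;

/* Does nothing; exists only so a variable's address can escape to it. */
static void sketch_var_imp(void *p) { (void)p; }

/* Calling through a volatile pointer keeps the call opaque. */
static void (*volatile sketch_var_sink)(void *) = sketch_var_imp;

#define sketch_var(v) sketch_var_sink((void *)&(v))

static void sketch_fail(void) { longjmp(sketch_env, 1); }

/* Returns the stage reached before the simulated exception. */
static int sketch_demo(void)
{
	int stage = 0;

	/* From here on the compiler must keep stage up to date in memory
	 * across calls to functions it cannot see into. */
	sketch_var(stage);

	if (setjmp(sketch_env) == 0)
	{
		stage = 1;
		sketch_fail();
		stage = 2; /* not reached */
	}
	return stage;
}
```

Without the sketch_var call, an optimising compiler would be free to
keep stage in a register, and the value seen after the longjmp could be
the stale one from before the setjmp.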
|
|
When converting to exception handling I'd messed up an error handling
case; when failing to pdf_lex in pdf_repair_xref I had allowed the
error to just carry on being thrown rather than catching it and
cleaning up. This was resulting in not getting any output for the
above file, rather than outputting as much as we could.
Simple fix.
|
|
|
|
Adopt Zeniko's patch from bug 692506; if a dict fails to parse, then
create an empty one and continue. The repaired document will be
incomplete, but we may well get something useful out of it.
|
|
Previously when parsing an object with a missing endobj, the
code would consume the header of the following object. Here
we amend the code to give up searching for an endobj if it
finds an integer (presumed to be the start of the next object).
We backtrack over that integer and carry on.
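The backtracking can be sketched over whitespace-separated words. This
is a simplification with hypothetical names; the real code works on
lexer tokens rather than raw words.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

static int sketch_is_integer(const char *tok)
{
	if (*tok == '\0')
		return 0;
	for (; *tok; tok++)
		if (!isdigit((unsigned char)*tok))
			return 0;
	return 1;
}

/*
 * Scans forward from *pos looking for "endobj".
 * Returns 1 if found, with *pos just past it.
 * Returns 0 if an integer (presumed start of the next object) or the
 * end of input is reached first; for an integer, *pos is moved back
 * to the start of that integer so the caller can parse it again.
 */
static int sketch_skip_to_endobj(const char *s, size_t *pos)
{
	char tok[32];

	for (;;)
	{
		size_t start, n = 0;

		while (isspace((unsigned char)s[*pos]))
			(*pos)++;
		if (s[*pos] == '\0')
			return 0;

		start = *pos;
		while (s[*pos] != '\0' && !isspace((unsigned char)s[*pos]))
		{
			if (n < sizeof tok - 1)
				tok[n++] = s[*pos];
			(*pos)++;
		}
		tok[n] = '\0';

		if (strcmp(tok, "endobj") == 0)
			return 1;
		if (sketch_is_integer(tok))
		{
			*pos = start; /* backtrack over the integer */
			return 0;
		}
	}
}
```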
|
|
This frees us from passing errors back everywhere, and hence enables us
to pass results back as return values.
Rather than having to explicitly check for errors everywhere and bubble
them, we now allow exception handling to do the work for us; the
downside to this is that we no longer emit as much debugging information
as we did before (though this could be put back in). For now, the
debugging information we have lost has been retained in comments
with 'RJW:' at the start.
This code needs fuller testing, but is being committed as a work in
progress.
|
|
|
|
|
|
|
|
Huge pervasive change to lots of files, adding a context for exception
handling and allocation.
In time we'll move more statics into there.
Also fix some for(i = 0; i < function(...); i++) calls.
|
|
Import exception handling code from WSS, modified to fit into the
fitz world.
With this code we have 'real' fz_try/fz_catch/fz_rethrow functions,
handling a fz_except type. We therefore rename the existing fz_throw/
fz_catch/fz_rethrow to be fz_error_make/fz_error_handle/fz_error_note.
We don't actually use fz_try/fz_catch/fz_rethrow yet...
|
|
|
|
Some particularly broken generators forget to terminate the
comment with a newline. Skip the comment character so we'll
get some garbage tokens that we can ignore, rather than
consuming the innocent objects that follow on the same line
as the %.
|
|
The file in question is missing newlines, causing the first two
objects to be hidden because we treat the %PDF-1.3 version marker
as a comment.
|
|
|
|
The run-together words are dead! Long live the underscores!
The postscript inspired naming convention of using all run-together
words has served us well, but it is now time for more readable code.
In this commit I have also added the sed script, rename.sed, that I used
to convert the source. Use it on your patches and application code.
|
|
|