path: root/pdf/pdf_repair.c
Age | Commit message | Author
2012-02-08 | Lock reworking. | Robin Watts
This is a significant change to the use of locks in MuPDF. Previously, the user had the option of passing us lock/unlock functions for a single mutex as part of the allocation struct. Now we remove these entries from the allocation struct and make a separate 'locks' struct. This enables people to use fz_alloc_default with locking.

If multithreaded operation is required, the user must create FZ_LOCK_MAX mutexes, which MuPDF locks and unlocks by calling the lock/unlock functions in the new fz_locks_context structure passed in at context creation. These mutexes are not required to be recursive (they may be, but MuPDF should never call them in this way).

MuPDF avoids deadlocks by imposing a locking order on itself: a thread will never take lock n if it already holds any lock i for which 0 <= i <= n. Currently, there are 4 locks used within MuPDF:

Lock 0: The alloc lock; taken around all calls to user-supplied (or default) allocation functions. Also taken around all accesses to the refs field of storable items.
Lock 1: The store lock; taken whenever the store data structures (specifically the linked list pointers) are accessed.
Lock 2: The file lock; taken whenever a thread is accessing the raw file. We use the debugging macros to insist that this is held whenever we do a file-based seek or read. We also insist that this is never held when we resolve an indirect reference, as this can have the effect of moving the file pointer.
Lock 3: The glyphcache lock; taken whenever a thread calls freetype, or accesses the glyphcache data structures. This introduces some complexities w.r.t. type3 fonts.

Locking can be hugely problematic, so to ease our minds as to the correctness of this code, we introduce some debugging macros. These compile away to nothing unless FITZ_DEBUG_LOCKING is defined. fz_assert_lock_held(ctx, lock) checks that we hold lock; fz_assert_lock_not_held(ctx, lock) checks that we do not hold lock.
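The ordering rule and the held/not-held checks can be illustrated with a small single-threaded model. This is a sketch, not MuPDF's fz_lock_debug_lock: the names (lock_debug, debug_lock, debug_unlock, lock_held) and the `held` array are hypothetical, and a real checker would keep one such record per thread or context. Taking lock n is rejected if any lock i with 0 <= i <= n is already held, which covers both the "already held" case (i == n) and ordering violations.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical lock numbering mirroring the four locks described above. */
enum { LOCK_ALLOC = 0, LOCK_STORE = 1, LOCK_FILE = 2, LOCK_GLYPHCACHE = 3, LOCK_MAX = 4 };

/* Record of which locks one thread currently holds. */
typedef struct {
    int held[LOCK_MAX];
} lock_debug;

/* Returns 0 if taking `lock` is legal, -1 otherwise.
   Rule: never take lock n while holding any lock i with 0 <= i <= n. */
static int debug_lock(lock_debug *d, int lock)
{
    for (int i = 0; i <= lock; i++) {
        if (d->held[i]) {
            fprintf(stderr, "cannot take lock %d while holding lock %d\n", lock, i);
            return -1;
        }
    }
    d->held[lock] = 1;
    return 0;
}

/* Returns 0 if `lock` was held and is now released, -1 otherwise. */
static int debug_unlock(lock_debug *d, int lock)
{
    if (!d->held[lock]) {
        fprintf(stderr, "releasing lock %d that is not held\n", lock);
        return -1;
    }
    d->held[lock] = 0;
    return 0;
}

/* Counterpart of an "assert lock held / not held" check. */
static int lock_held(const lock_debug *d, int lock)
{
    return d->held[lock];
}
```

In this model, a thread holding the file lock may still allocate (taking lock 0), but a thread holding the alloc lock may not then take the store lock.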
In addition, fz_lock_debug_lock and fz_lock_debug_unlock are used on every fz_lock/fz_unlock to check the validity of the operation we are performing; in particular, they check that we do/do not already hold the lock we are trying to take/drop, and that by taking this lock we are not violating our defined locking order.

The RESOLVE macro (used throughout the code to check whether we need to resolve an indirect reference) calls fz_assert_lock_not_held to ensure that we aren't about to resolve an indirect reference (and hence move the stream pointer) while the file is locked.

In order to implement the file locking properly, pdf_open_stream (and friends) now lock the file as a side effect (because they fz_seek to the start of the stream). The lock is automatically dropped on an fz_close of such streams.

Previously, the glyph cache was created in a context when it was first required; this presents problems, as it may or may not be shared between several contexts, depending on whether it is created before the contexts are cloned. We now always create it at startup, so it is always shared. This means that we need reference counting for the glyph caches, which is added here.

In fz_render_glyph, we take the glyph cache lock and check to see whether the glyph is in the cache. If it is, we bump the refcount, drop the lock and return the cached character. If it is not, we need to render the character.

For freetype-based fonts we keep the lock throughout the rendering process, thus ensuring that freetype is only called in a single-threaded manner. For type3 fonts, however, we need to invoke the interpreter again to render the glyph streams. This can require reentrance to this routine. We therefore drop the glyph cache lock, call the interpreter to render us our pixmap, and take the lock again. This dropping and retaking of the lock introduces a possible race condition: two threads may try to render the same character at the same time.
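A minimal model of the file-lock rule, assuming a single integer flag in place of the client-supplied mutex for lock 2. The types and functions (document, stream, open_stream, close_stream, resolve_indirect) are hypothetical stand-ins, not pdf_open_stream or the RESOLVE macro; resolve_indirect returns an error where the real code would trip fz_assert_lock_not_held.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical document: one flag stands in for the file lock (lock 2). */
typedef struct {
    int file_locked;
    long pos;          /* shared position in the underlying file */
} document;

typedef struct {
    document *doc;
    long start;
} stream;

/* Opening a stream seeks the shared file, so it takes the file lock
   as a side effect and keeps it for the stream's lifetime. */
static stream *open_stream(document *doc, long offset)
{
    stream *s;
    assert(!doc->file_locked);     /* must not already hold the file lock */
    s = malloc(sizeof *s);
    if (!s)
        return NULL;
    doc->file_locked = 1;          /* take the lock ... */
    doc->pos = offset;             /* ... because we seek here */
    s->doc = doc;
    s->start = offset;
    return s;
}

/* Closing the stream drops the lock taken at open. */
static void close_stream(stream *s)
{
    s->doc->file_locked = 0;
    free(s);
}

/* Resolving an indirect reference may seek, so refuse while the file is locked. */
static int resolve_indirect(document *doc, long obj_offset)
{
    if (doc->file_locked)
        return -1;
    doc->pos = obj_offset;
    return 0;
}
```

Tying the unlock to close means callers cannot forget to release the lock, at the cost that no indirect reference may be resolved while any such stream is open.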
We therefore modify our hash table insert routines to behave differently if they come to insert an entry only to find that an entry with the same key is already there. We spot this case: if we have just rendered a type3 glyph and, when we try to insert it into the cache, discover that someone has beaten us to it, we just discard our entry and use the cached one. Hopefully this will seldom be a problem in practice; to solve it properly would require greater complexity (probably involving spotting that another thread is already working on the desired rendering, and sleeping on a semaphore until it completes).
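The insert behaviour can be sketched as follows. Everything here is hypothetical (a fixed-size linear-probing table with invented names, not fitz's hash table API), and locking is left out: the functions are assumed to run with the glyph cache lock held, which after a type3 render means after the lock has been retaken. install_rendered shows the "someone beat us to it" path: our duplicate rendering is dropped and the cached entry is returned with its refcount bumped.

```c
#include <assert.h>
#include <stdlib.h>

#define CACHE_SIZE 64

/* Hypothetical reference-counted glyph keyed by glyph id. */
typedef struct {
    int refs;
    int gid;
    int pixels;   /* stands in for the rendered pixmap */
} glyph;

typedef struct {
    glyph *slots[CACHE_SIZE];   /* linear probing; assumed never to fill up */
} glyph_cache;

static glyph *keep_glyph(glyph *g) { g->refs++; return g; }

static void drop_glyph(glyph *g)
{
    assert(g->refs > 0);
    if (--g->refs == 0)
        free(g);
}

/* Hit path: return the cached glyph with an extra reference, or NULL. */
static glyph *cache_lookup(glyph_cache *c, int gid)
{
    unsigned h = (unsigned)gid % CACHE_SIZE;
    while (c->slots[h]) {
        if (c->slots[h]->gid == gid)
            return keep_glyph(c->slots[h]);
        h = (h + 1) % CACHE_SIZE;
    }
    return NULL;
}

/* Insert g, or, if an entry with the same key exists, leave the table
   unchanged and return that existing entry instead. */
static glyph *cache_insert(glyph_cache *c, glyph *g)
{
    unsigned h = (unsigned)g->gid % CACHE_SIZE;
    while (c->slots[h]) {
        if (c->slots[h]->gid == g->gid)
            return c->slots[h];          /* someone beat us to it */
        h = (h + 1) % CACHE_SIZE;
    }
    c->slots[h] = keep_glyph(g);         /* the cache owns one reference */
    return g;
}

/* After rendering `mine` (caller owns one reference), install it or
   discard it in favour of an entry another thread inserted first. */
static glyph *install_rendered(glyph_cache *c, glyph *mine)
{
    glyph *found = cache_insert(c, mine);
    if (found != mine) {
        drop_glyph(mine);                /* discard our duplicate */
        return keep_glyph(found);
    }
    return mine;
}
```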
2012-01-27 | Rename pdf_xref type to pdf_document. | Tor Andersson
2012-01-12 | Fix typo in pdf_repair.c | Tor Andersson
2012-01-10 | Fix many spelling errors. | Sebastian Rasmussen
2012-01-07 | Fix compilation warnings | Sebastian Rasmussen
Add missing newline and remove excessive arguments.
2012-01-06 | pdf_repair tweaks; fail in fewer cases. | Robin Watts
When repairing, if we hit a problem after having found a root object, live with that root object rather than giving up completely. Also fix a memory leak, and cope better with trailing crap. Thanks to Zeniko for these.
2011-12-23 | Add some fz_vars to fix exception behaviour. | Robin Watts
gcc 4.4.5 gives helpful warnings about variables that can become unset due to setjmp/longjmp usage. Fix that here. Thanks to Sebras.
2011-12-15 | Remove stray whitespace. | Tor Andersson
2011-12-08 | Remove remaining fz_error_note calls in the pdf code. | Tor Andersson
2011-12-08 | Move from volatile to fz_var. | Robin Watts
When using exceptions (which are implemented using setjmp/longjmp), we need to be careful to ensure that variable values get written back before any exception happens. Previously we've done that using volatile, but that produces nasty warnings (and unduly limits the compiler's freedom to optimise). Here we introduce a new macro fz_var that passes the address of the variable out of scope. This means that the compiler has to ensure that any changes to its value are written back to memory before calling any out-of-scope function.
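The idea behind fz_var can be shown in a few lines. This is a sketch under assumptions: MY_VAR and var_sink are hypothetical names, and calling through a volatile function pointer is just one way to make the callee opaque to the optimiser within a single file. Strictly, the C standard only guarantees the value of a modified local after longjmp if it is declared volatile; the address-escape trick relies on how compilers treat variables whose address has left the function, which is what the commit describes.

```c
#include <assert.h>
#include <setjmp.h>

/* Calling through a volatile function pointer hides the callee from the
   optimiser, so it must assume the address may be kept and used later. */
static void var_sink_impl(void *p) { (void)p; }
static void (*volatile var_sink)(void *) = var_sink_impl;

/* Hypothetical analogue of fz_var: let the variable's address escape. */
#define MY_VAR(v) var_sink((void *)&(v))

static jmp_buf env;

/* Stands in for any out-of-scope function that throws. */
static void fail(void) { longjmp(env, 1); }

/* Counts to n, then "throws". Because done's address has escaped, the
   compiler must store it to memory before calling fail(). */
static int count_then_throw(int n)
{
    int done = 0;
    MY_VAR(done);
    if (setjmp(env) == 0) {
        for (int i = 0; i < n; i++)
            done++;
        fail();
    }
    return done;   /* reached via longjmp */
}
```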
2011-12-08 | Fix lack of output with "719 - EOF incorrectly detected.pdf" | Robin Watts
When converting to exception handling I'd messed up an error handling case; when failing to pdf_lex in pdf_repair_xref I had allowed the error to just carry on being thrown rather than catching it and cleaning up. This was resulting in not getting any output for the above file, rather than outputting as much as we could. Simple fix.
2011-11-25 | Merge branch 'master' into context | Robin Watts
2011-11-24 | Bug 692506: Improve repairing by accepting broken dictionaries. | Robin Watts
Adopt Zeniko's patch from bug 692506: if a dict fails to parse, create an empty one and continue. The repaired document will be incomplete, but we may well get something useful out of it.
2011-11-17 | Bug 692424: make repair cope better with missing endobj | Robin Watts
Previously when parsing an object with a missing endobj, the code would consume the header of the following object. Here we amend the code to give up searching for an endobj if it finds an integer (presumed to be the start of the next object). We backtrack over that integer and carry on.
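The recovery rule can be modelled over an array of pre-lexed tokens, which makes "backtracking" simply a matter of not advancing past the integer. The token kinds and find_object_end are hypothetical; the real repair code lexes the file with pdf_lex.

```c
#include <assert.h>

/* Hypothetical, simplified token kinds. */
typedef enum { T_INT, T_OTHER, T_ENDOBJ, T_EOF } tok_kind;

/* Scan from toks[i] (just after an object's value) for the end of the
   object. Returns the index at which lexing should resume and stores the
   terminating token kind in *end. An integer is presumed to start the next
   "N G obj" header, so it is not consumed. */
static int find_object_end(const tok_kind *toks, int n, int i, tok_kind *end)
{
    assert(i >= 0 && i <= n);
    for (; i < n; i++) {
        switch (toks[i]) {
        case T_ENDOBJ:
            *end = T_ENDOBJ;
            return i + 1;        /* consume the endobj */
        case T_INT:
            *end = T_INT;
            return i;            /* back off: leave it for the next object */
        case T_EOF:
            *end = T_EOF;
            return i;
        case T_OTHER:
            break;               /* skip junk */
        }
    }
    *end = T_EOF;
    return n;
}
```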
2011-10-04 | Move to exception handling rather than error passing throughout. | Robin Watts
This frees us from passing errors back everywhere, and hence enables us to pass results back as return values. Rather than having to explicitly check for errors everywhere and bubble them, we now allow exception handling to do the work for us; the downside to this is that we no longer emit as much debugging information as we did before (though this could be put back in). For now, the debugging information we have lost has been retained in comments with 'RJW:' at the start. This code needs fuller testing, but is being committed as a work in progress.
2011-09-21 | Add warning context. | Tor Andersson
2011-09-21 | Rename malloc functions for arrays (fz_calloc and fz_realloc). | Tor Andersson
2011-09-21 | Don't thread ctx through safe fz_obj functions. | Tor Andersson
2011-09-15 | Add context to mupdf. | Robin Watts
Huge pervasive change to lots of files, adding a context for exception handling and allocation. In time we'll move more statics into there. Also fix some for(i = 0; i < function(...); i++) calls.
2011-09-14 | Initial import of exception handling code | Robin Watts
Import exception handling code from WSS, modified to fit into the fitz world. With this code we have 'real' fz_try/fz_catch/fz_rethrow functions, handling a fz_except type. We therefore rename the existing fz_throw/fz_catch/fz_rethrow to be fz_error_make/fz_error_handle/fz_error_note. We don't actually use fz_try/fz_catch/fz_rethrow yet...
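Here is a minimal setjmp/longjmp try/catch with a per-context handler stack, in the spirit of fz_try/fz_catch/fz_rethrow. It is a sketch, not the imported WSS code: except_ctx, TRY, CATCH, throw_msg and rethrow are hypothetical names, the nesting depth is fixed, and leaving a TRY body with return or goto would skip popping the handler.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TRY_DEPTH 8

typedef struct {
    jmp_buf stack[TRY_DEPTH];  /* one buffer per active try block */
    int top;                   /* number of active try blocks */
    char message[128];         /* text of the most recent throw */
} except_ctx;

/* Pop the innermost handler and jump to it; abort if there is none. */
static void jump_to_handler(except_ctx *ctx)
{
    if (ctx->top == 0) {
        fprintf(stderr, "uncaught exception: %s\n", ctx->message);
        abort();
    }
    longjmp(ctx->stack[--ctx->top], 1);
}

static void throw_msg(except_ctx *ctx, const char *msg)
{
    snprintf(ctx->message, sizeof ctx->message, "%s", msg);
    jump_to_handler(ctx);
}

/* Pass the current exception on to the next handler out. */
static void rethrow(except_ctx *ctx) { jump_to_handler(ctx); }

/* Usage: TRY(ctx) { ... } CATCH(ctx) { ... }
   A normal exit from the TRY body pops the handler; a throw pops it first. */
#define TRY(ctx) \
    assert((ctx)->top < TRY_DEPTH); \
    (ctx)->top++; \
    if (setjmp((ctx)->stack[(ctx)->top - 1]) == 0) { do
#define CATCH(ctx) \
    while (0); (ctx)->top--; } else

static int parse_positive(except_ctx *ctx, int v)
{
    if (v <= 0)
        throw_msg(ctx, "not positive");
    return v;
}

/* Returns 2*v, or -1 if parsing threw. */
static int doubled_or_default(except_ctx *ctx, int v)
{
    int result = -1;
    TRY(ctx) {
        result = 2 * parse_positive(ctx, v);
    }
    CATCH(ctx) {
        result = -1;
    }
    return result;
}

/* Nested handlers: the inner catch rethrows to the outer one. */
static int outer(except_ctx *ctx, int v)
{
    int caught = 0;
    TRY(ctx) {
        TRY(ctx) {
            parse_positive(ctx, v);
        }
        CATCH(ctx) {
            rethrow(ctx);
        }
    }
    CATCH(ctx) {
        caught = 1;
    }
    return caught;
}
```

Results now travel back as ordinary return values, and only code that can do something useful about a failure needs a CATCH.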
2011-08-31 | Make in_array state a local variable to cope with broken XObject streams. | Tor Andersson
2011-05-02 | pdf_repair.c: Skip first comment after version marker. | Tor Andersson
Some particularly broken generators forget to terminate the comment with a newline. Skip the comment character so we'll get some garbage tokens that we can ignore, rather than consuming the innocent objects that follow on the same line as the %.
2011-04-14 | Fix bug #692153: skip PDF version marker when repairing. | Tor Andersson
The file in question is missing newlines, causing the first two objects to be hidden because we treat the %PDF-1.3 version marker as a comment.
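Taken together with the 2011-05-02 change above, the header handling can be sketched as a function returning the offset at which object scanning should begin. skip_header is a hypothetical name, and treating digits and '.' as the version number and skipping whitespace before the '%' check are assumptions of this sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Return the offset at which object scanning should start. */
static size_t skip_header(const char *buf, size_t len)
{
    size_t i = 0;
    assert(buf != NULL || len == 0);
    if (len >= 5 && memcmp(buf, "%PDF-", 5) == 0) {
        i = 5;
        /* Skip only the version number (e.g. "1.3"), not the rest of the
           line, so objects on the same line are not swallowed as a comment. */
        while (i < len && (isdigit((unsigned char)buf[i]) || buf[i] == '.'))
            i++;
        while (i < len && isspace((unsigned char)buf[i]))
            i++;
        /* Skip just the '%' of an unterminated comment; what follows is
           lexed as garbage tokens that repair can ignore. */
        if (i < len && buf[i] == '%')
            i++;
    }
    return i;
}
```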
2011-04-07 | pdf: Purge unmaintained debug/log printing messages. | Tor Andersson
2011-04-04 | Le Roi est mort, vive le Roi! (The king is dead, long live the king!) | Tor Andersson
The run-together words are dead! Long live the underscores! The PostScript-inspired naming convention of using all run-together words has served us well, but it is now time for more readable code. In this commit I have also added the sed script, rename.sed, that I used to convert the source. Use it on your patches and application code.
2011-04-04 | pdf: Rename mupdf directory. | Tor Andersson