Age | Commit message | Author |
|
Previously, before interpreting a page's content stream we would
load it entirely into a buffer, and then interpret that buffer.
This has a cost in memory use.
Here, we update the code to read from a stream on the fly.
This has required changes in various parts of the code.
Firstly, we have removed all use of the FILE lock - as stream
reads can now safely be interrupted by resource (or object) reads
from elsewhere in the file, the file lock becomes a very hard
thing to maintain, and doesn't actually benefit us at all. The
choices were to either use a recursive lock, or to remove it
entirely; I opted for the latter.
The file lock enum value remains as a placeholder for future use in
extendable data streams.
Secondly, we add a new 'concat' filter that concatenates a series of
streams together into one, optionally putting whitespace between each
stream (as the pdf parser requires this).
Finally, we change page/xobject/pattern content streams to work
on the fly, but we leave type3 glyphs using buffers (as presumably
these will be run repeatedly).
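As a rough illustration of the 'concat' idea (a hedged sketch, not
the actual MuPDF filter; the concat_reader type and concat_read
function are invented for the example), a concatenating reader
drains each input in turn and emits a single whitespace byte
between consecutive inputs:

    #include <stddef.h>

    /* Illustrative concatenating reader: drains each input buffer in
     * turn and inserts a single space between consecutive inputs,
     * mimicking the whitespace the PDF parser needs between streams. */
    typedef struct
    {
        const char **parts;   /* input buffers */
        size_t *lens;         /* their lengths */
        int count;            /* number of inputs */
        int cur;              /* index of the input currently being drained */
        size_t pos;           /* read position within the current input */
        int need_space;       /* emit a separator before the next input */
    } concat_reader;

    static size_t concat_read(concat_reader *cr, char *buf, size_t max)
    {
        size_t n = 0;
        while (n < max && cr->cur < cr->count)
        {
            if (cr->need_space)
            {
                buf[n++] = ' ';
                cr->need_space = 0;
            }
            else if (cr->pos < cr->lens[cr->cur])
            {
                buf[n++] = cr->parts[cr->cur][cr->pos++];
            }
            else
            {
                cr->cur++;
                cr->pos = 0;
                cr->need_space = (cr->cur < cr->count);
            }
        }
        return n;
    }

The real filter operates on streams rather than in-memory buffers;
the sketch only shows the separator logic.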
|
|
Sebras pointed out that my previous (untested) fix was wrong,
as we don't resolve indirect objects until we attempt to interpret
them. The fix is just to move the interpretation (pdf_to_int())
into the unlocked section.
|
|
Divides large-format PDFs into a new PDF with multiple pages that
tile the original.
|
|
When reading the xref, if certain objects are indirect references
(Size, Prev, XrefStm) we can 'relock' a lock we already hold.
Work around that here.
|
|
We were mapping from one enum range to another, and then using
the unmapped value.
|
|
Avoid recursion in pdf_load_page_tree_node.
Avoid recursion (most of the time) in pdf_read_xref_sections.
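For the xref case, the change is essentially recursion-to-iteration
over the /Prev chain. A minimal sketch of that shape (the helper
read_one_xref_section and its signature are invented for
illustration; the real function differs):

    /* Hypothetical helper, assumed to exist elsewhere: reads one xref
     * section starting at byte offset 'ofs' and stores the offset named
     * by its /Prev entry in *prev, or -1 if there is none. */
    void read_one_xref_section(long ofs, long *prev);

    /* Follow the /Prev chain with a loop instead of recursing once per
     * section, so deeply chained xrefs cannot exhaust the C stack. */
    void read_xref_chain(long ofs)
    {
        while (ofs >= 0)
        {
            long prev = -1;
            read_one_xref_section(ofs, &prev);
            ofs = prev;
        }
    }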
|
|
Use this to reintroduce "Document Properties..." in mupdf viewer.
|
|
When we are unable to cache an object, we warn and return NULL.
An indirection cycle should probably be treated the same way.
From SumatraMuPDF.patch - Many thanks.
|
|
Debug printing functions: debug -> print.
Accessors: get noun attribute -> noun attribute.
Find -> lookup when the returned value is not reference counted.
pixmap_with_rect -> pixmap_with_bbox.
We are reserving the word "find" to mean lookups that give ownership
of objects to the caller. Lookup is used in other places where the
ownership is not transferred, or simple values are returned.
The rename is done by the sed script in scripts/rename3.sed.
|
|
Attempt to separate public API from internal functions.
|
|
Currently, we are in the slightly strange position of having
the PDF specific object types as part of fitz. Here we pull
them out into the pdf layer instead. This has been made possible
by the recent changes to make the store no longer be tied to
having fz_obj's as keys.
Most of this work is a simple huge rename; to help customers who
may have code that uses such functions we have provided a sed
script to do the renaming: scripts/rename2.sed.
Various other small tweaks are required; the store used to have
some debugging code that still required knowledge of fz_obj
types - we extract that into a nicer 'type' based function
pointer. Also, the type 3 font handling used to have an fz_obj
pointer for type 3 resources, and therefore needed to know how
to free this; this has become a void * with a function to free
it.
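The type 3 resources change might be pictured like this sketch
(struct and field names are illustrative, not the actual MuPDF
definitions): the fitz font no longer knows the concrete resource
type, it just carries an opaque pointer plus a destructor supplied
by the pdf layer.

    #include <stdlib.h>

    typedef struct example_type3_font
    {
        void *t3resources;                /* opaque to fitz; a pdf dict in practice */
        void (*t3free)(void *resources);  /* supplied by whoever created the resources */
        /* ... glyph procedures, matrix, etc. ... */
    } example_type3_font;

    static void example_drop_type3_font(example_type3_font *font)
    {
        if (font->t3resources && font->t3free)
            font->t3free(font->t3resources);
        free(font);
    }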
|
|
A huge amount (20%+ on some files) of our runtime is spent in
fz_atof. A survey of results on the net suggests we will get
much better speed by writing our own atof.
Part of the job of doing this involves parsing the string to
identify the component parts of the number - ludicrously, we
are already doing this as part of the lexing process, so it
would make sense to do the atoi/atof as part of this process.
In order to do this, we need somewhere to store the lexed
results; rather than add a float * and an int * to every single
pdf_lex call, we generalise the calls to pass a pdf_lexbuf *
pointer instead of separate buffer/max/string length pointers.
This should help us overall.
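The shape of the generalisation might look like this sketch (field
names and sizes are illustrative; the real pdf_lexbuf differs in
detail):

    #define EXAMPLE_LEXBUF_SIZE 256

    /* One struct replaces the separate buffer/max/string-length (and now
     * int/float result) parameters that every lex call used to take. */
    typedef struct example_lexbuf
    {
        int size;                          /* capacity of scratch[] */
        int len;                           /* length of the lexed string or name */
        int i;                             /* integer value, when an int was lexed */
        float f;                           /* float value, when a real was lexed */
        char scratch[EXAMPLE_LEXBUF_SIZE]; /* the lexed string bytes */
    } example_lexbuf;

    /* A lex call then becomes something like:
     *     tok = pdf_lex(stream, &buf);
     * with numbers parsed once during lexing rather than re-parsed later
     * by atoi/atof. (Call shape shown for illustration only.) */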
|
|
CLUSTER_UNTESTED.
|
|
Quartz-generated PDFs (and maybe others too) seem to use
"000000000 65536 n" to mean "free object", in defiance of the
spec. Add special-case code to MuPDF to handle this.
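A hedged sketch of the kind of special case this implies when
parsing xref table entries (the entry struct and fixup function are
invented; the real handling in MuPDF may differ):

    /* An xref table entry is "OFFSET GENERATION TYPE"; the spec's free
     * entry is "0000000000 65535 f".  Also accept the out-of-spec
     * "000000000 65536 n" form as meaning "free". */
    typedef struct { long ofs; int gen; char type; } example_xref_entry;

    static void example_fixup_entry(example_xref_entry *e)
    {
        if (e->type == 'n' && e->ofs == 0 && e->gen == 65536)
        {
            e->type = 'f';      /* reinterpret as a free object */
            e->gen = 65535;
        }
    }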
|
|
This is a significant change to the use of locks in MuPDF.
Previously, the user had the option of passing us lock/unlock
functions for a single mutex as part of the allocation struct.
Now we remove these entries from the allocation struct, and
make a separate 'locks' struct. This enables people to use
fz_alloc_default with locking.
If multithreaded operation is required, then the user is
required to create FZ_LOCK_MAX mutexes, which will be locked
or unlocked by MuPDF calling the lock/unlock functions within
the new fz_locks_context structure passed in at context creation.
These mutexes are not required to be recursive (they may be, but
MuPDF should never call them in this way). MuPDF avoids deadlocks
by imposing a locking ordering on itself; a thread will never take
lock n, if it already holds any lock i for which 0 <= i <= n.
Currently, there are 4 locks used within MuPDF.
Lock 0: The alloc lock; taken around all calls to user supplied
(or default) allocation functions. Also taken around all accesses
to the refs field of storable items.
Lock 1: The store lock; taken whenever the store data structures
(specifically the linked list pointers) are accessed.
Lock 2: The file lock; taken whenever a thread is accessing the raw
file. We use the debugging macros to insist that this is held
whenever we do a file based seek or read. We also insist that this
is never held when we resolve an indirect reference, as this can
have the effect of moving the file pointer.
Lock 3: The glyphcache lock; taken whenever a thread calls freetype,
or accesses the glyphcache data structures. This introduces some
complexities w.r.t. type3 fonts.
Locking can be hugely problematic, so to ease our minds as to
the correctness of this code, we introduce some debugging macros.
These compile away to nothing unless FITZ_DEBUG_LOCKING is defined.
fz_assert_lock_held(ctx, lock) checks that we hold lock.
fz_assert_lock_not_held(ctx, lock) checks that we do not hold lock.
In addition fz_lock_debug_lock and fz_lock_debug_unlock are used
on every fz_lock/fz_unlock to check the validity of the operation
we are performing - in particular it checks that we do/do not already
hold the lock we are trying to take/drop, and that by taking this
lock we are not violating our defined locking order.
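The locking-order rule just described can be checked mechanically;
as a simplified sketch (a real check is per-thread and lives behind
FITZ_DEBUG_LOCKING; the names here are illustrative):

    #include <assert.h>

    #define EXAMPLE_LOCK_MAX 4

    /* Which locks this thread holds (in reality tracked per thread). */
    static int held[EXAMPLE_LOCK_MAX];

    static void example_debug_lock(int n)
    {
        int i;
        /* Taking lock n is only legal if we hold no lock i with 0 <= i <= n. */
        for (i = 0; i <= n; i++)
            assert(!held[i]);
        held[n] = 1;
    }

    static void example_debug_unlock(int n)
    {
        assert(held[n]);   /* must not drop a lock we do not hold */
        held[n] = 0;
    }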
The RESOLVE macro (used throughout the code to check whether we need
to resolve an indirect reference) calls fz_assert_lock_not_held to
ensure that we aren't about to resolve an indirect reference (and
hence move the stream pointer) when the file is locked.
In order to implement the file locking properly, pdf_open_stream
(and friends) now lock the file as a side effect (because they
fz_seek to the start of the stream). The lock is automatically
dropped on an fz_close of such streams.
Previously, the glyph cache was created in a context when it was first
required; this presents problems as it can be shared between several
contexts or not, depending on whether it is created before the
contexts are cloned. We now always create it at startup, so it is
always shared.
This means that we need reference counting for the glyph caches.
Added here.
In fz_render_glyph, we take the glyph cache lock, and check to see
whether the glyph is in the cache. If it is, we bump the refcount,
drop the lock and return the cached character. If it is not, we
need to render the character.
For freetype based fonts we keep the lock throughout the rendering
process, thus ensuring that freetype is only called in a single
threaded manner.
For type3 fonts, however, we need to invoke the interpreter again
to render the glyph streams. This can require reentrance to this
routine. We therefore drop the glyph cache lock, call the
interpreter to render us our pixmap, and take the lock again.
This dropping and retaking of the lock introduces a possible race
condition; 2 threads may try to render the same character at the
same time. We therefore modify our hash table insert routines to
behave differently when they come to insert an entry only to find
that an entry with the same key is already there.
We spot this case; if we have just rendered a type3 glyph and when
we try to insert it into the cache discover that someone has beaten
us to it, we just discard our entry and use the cached one.
Hopefully this will seldom be a problem in practice; to solve it
properly would require greater complexity (probably involving
spotting that another thread is already working on the desired
rendering, and sleeping on a semaphore until it completes).
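The type 3 path described above has roughly this shape (a sketch
using pthreads and invented names such as cache_lookup, cache_insert
and render_t3_glyph, not the real MuPDF code):

    #include <pthread.h>

    /* All of the following names are placeholders for illustration only. */
    typedef struct example_glyph example_glyph;
    typedef struct { int font_id; int gid; } example_key;

    static pthread_mutex_t glyph_lock = PTHREAD_MUTEX_INITIALIZER;

    example_glyph *cache_lookup(example_key k);                  /* assumed */
    example_glyph *cache_insert(example_key k, example_glyph *); /* returns prior entry, if any */
    example_glyph *keep_glyph(example_glyph *);                  /* refcount++ */
    void drop_glyph(example_glyph *);                            /* refcount-- */
    example_glyph *render_t3_glyph(example_key k);               /* re-enters the interpreter */

    example_glyph *get_t3_glyph(example_key k)
    {
        example_glyph *pix, *existing;

        pthread_mutex_lock(&glyph_lock);
        pix = cache_lookup(k);
        if (pix)
        {
            pix = keep_glyph(pix);        /* bump refcount under the lock */
            pthread_mutex_unlock(&glyph_lock);
            return pix;
        }
        /* Drop the lock: rendering a type 3 glyph re-enters the
         * interpreter, which may need the glyph cache again. */
        pthread_mutex_unlock(&glyph_lock);
        pix = render_t3_glyph(k);

        pthread_mutex_lock(&glyph_lock);
        existing = cache_insert(k, pix);
        if (existing)
        {
            /* Another thread rendered the same glyph while we were
             * unlocked: discard ours and use the cached one. */
            drop_glyph(pix);
            pix = keep_glyph(existing);
        }
        pthread_mutex_unlock(&glyph_lock);
        return pix;
    }

In this sketch, cache_insert reports any prior entry rather than
overwriting it, which plays the role of the modified hash table
insert behaviour described above.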
|
|
This was causing a SEGV in cluster testing of Bug690724.pdf.
|
|
Require that clients call pdf_needs_password/pdf_authenticate_password
instead. For dumb clients, we still allow for decrypting a file with
a blank password without calling those functions.
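Client code would then follow roughly this shape (the two function
names come from the commit message; the opaque document type and the
exact signatures below are simplified assumptions):

    /* Sketch only: real signatures and the document type differ. */
    typedef struct example_pdf_doc example_pdf_doc;
    int pdf_needs_password(example_pdf_doc *doc);
    int pdf_authenticate_password(example_pdf_doc *doc, char *password);

    int open_checked(example_pdf_doc *doc, char *password)
    {
        if (pdf_needs_password(doc))
        {
            /* Interactive clients should prompt; dumb clients that skip
             * this still get the blank-password fallback described above. */
            if (!pdf_authenticate_password(doc, password))
                return 0;   /* wrong password */
        }
        return 1;           /* document usable */
    }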
|
|
pdf_resolve_indirect(x) = pdf_resolve_indirect(pdf_resolve_indirect(x))
now - as long as it doesn't throw an exception.
Update the rest of the code to minimise unnecessary function calls.
Previously, we were calling one function to find out if an object was
a dict, only for that to call a function to see if it needed to
resolve the object, then calling another function to actually get the
dict, only to have that call the function to check for the dict
needing resolving again!
|
|
Fixes a bug where objects remained in the store and would
keep stale pointers to the freed xref; later on those
objects could lead to false positive hits.
|
|
Fix 2 places where we were filling a stroked pattern rather than
stroking it.
Cope with being asked to run a NULL buffer.
If running a stream fails, warn and return what we have, rather than
giving up entirely. Should really set a return code for each render.
Only look at the Print flag bit for Print renders. Only look at the
View flag bit for view renders.
If we find an unexpected ) or > during object parsing, warn and continue
rather than giving up entirely.
If optional content groups are broken, render the rest of the page
anyway.
Previously, indirect objects that point to another indirect
reference would cause a failure; now we attempt to resolve these.
We set an arbitrary limit of 10 such redirections to avoid
infinite loops.
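The bounded resolution could be sketched like this (helper names and
the exact placement of the check are illustrative):

    /* Illustrative sketch of bounding indirect->indirect resolution.
     * is_indirect() and resolve_once() stand in for the real helpers. */
    typedef struct example_obj example_obj;

    int is_indirect(example_obj *obj);            /* assumed */
    example_obj *resolve_once(example_obj *obj);  /* follow one indirection */

    example_obj *resolve_bounded(example_obj *obj)
    {
        int i;
        for (i = 0; i < 10 && obj && is_indirect(obj); i++)
            obj = resolve_once(obj);
        /* after 10 hops, give up rather than loop forever on a cycle */
        return obj;
    }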
|
|
gcc 4.4.5 gives helpful warnings about variables that may be
clobbered by setjmp/longjmp usage. Fix that here.
Thanks to Sebras.
|
|
Fixes for leaks (and SEGVs, divisions by zero, etc.) seen when
memsqueezing.
|
|
Firstly, we rename pdf_store to fz_store, reflecting the fact that
there are no pdf specific dependencies on it.
Next, we rework it so that all the objects that can be stored in
the store start with an fz_storable structure. This consists of
a reference count, and a function used to free the object when
the reference count reaches zero.
All the keep/drop functions are then reimplemented by calling
fz_keep_storable/fz_drop_storable. The 'drop' functions as supplied
by the callers are thus now 'free' functions, only called if
the reference count drops to 0.
The store changes to keep all the items in the store in the linked
list (which becomes a doubly linked one). We still make use of
the hashtable to index into this list quickly, but we now have
the objects in an LRU ordering within the list.
Every object is put into the store, with a size record; this is
an estimate of how much memory would be freed by freeing that
object.
The store is moved into the context and given a maximum size;
when new things are inserted into the store, care is taken to
ensure that we do not expand beyond this size. We evict any
stored items (that are not in use) starting from the least
recently used.
Finding an object in the store now returns it with a reference
already taken.
LOCK and UNLOCK comments are used to indicate where locks need to
be taken and released to ensure thread safety.
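A minimal sketch of the storable-header idea (names are illustrative;
the real fz_storable and its locking are more involved):

    #include <stdlib.h>

    /* Every cacheable object begins with a refcount and a free function,
     * so the store can evict it generically. */
    typedef struct example_storable
    {
        int refs;
        void (*free_fn)(struct example_storable *);
    } example_storable;

    static example_storable *keep_storable(example_storable *s)
    {
        if (s)
            s->refs++;        /* in MuPDF, refs accesses happen under the alloc lock */
        return s;
    }

    static void drop_storable(example_storable *s)
    {
        if (s && --s->refs == 0)
            s->free_fn(s);    /* the caller-supplied 'free' only runs at refcount 0 */
    }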
|
|
When adding the exception handling in, I'd mis-indented some code.
This caused the whole file to be read.
|
|
Also: use 'cannot' instead of 'failed to' in error messages.
|
|
When using exceptions (which are implemented using setjmp/longjmp), we
need to be careful to ensure that variable values get written back
before any exception happens.
Previously we've done that using volatile, but that produces nasty
warnings (and unduly limits the compiler's freedom to optimise). Here
we introduce a new macro fz_var that passes the address of the variable
out of scope. This means that the compiler has to ensure that any
changes to its value are written back to memory before calling any
out of scope function.
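A sketch of how such a macro can be built (this shows the idea, not
necessarily the exact MuPDF definition):

    /* Defined out of line, in another translation unit, so the compiler
     * cannot see that it does nothing and must assume the pointed-to
     * value may be read or written there. */
    void example_var_imp(void *addr);

    /* Forces 'var' to be written back to memory before any later call
     * that might longjmp, so its value is not lost to a cached register
     * copy. */
    #define example_var(var) example_var_imp((void *)&(var))

    /* Usage shape:
     *     int closed = 0;
     *     example_var(closed);
     *     ... code that may longjmp after setting closed = 1 ...
     */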
|
|
In converting from error return to exception throwing I was
over-enthusiastic and removed more than I should have. Add the
missing code back in. Problem was seen with
"0 - password password (crypt level 5).pdf"
|
|
The code was dereferencing xref to get ctx before checking whether it
was NULL or not. Simple fix to move the test up a bit.
|
|
When opening a file, create a pdf_ocg_descriptor that lists the OCGs
in a file. Add a new function to allow us to set the configuration
in use (currently just the default one).
This sets the states of the OCGs as appropriate. When decoding the
file, respect the states of the OCGs.
This results in Invite.pdf rendering correctly.
There is more to be done in this area (with automatic setting of
OCGs by language/zoom level etc), but this is a good start.
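The bookkeeping such a descriptor needs might look something like
this sketch (illustrative only, not the actual pdf_ocg_descriptor
layout):

    typedef struct example_ocg_entry
    {
        int num, gen;   /* object number/generation of the OCG */
        int state;      /* 1 = ON, 0 = OFF, per the selected configuration */
    } example_ocg_entry;

    typedef struct example_ocg_descriptor
    {
        int len;                    /* number of OCGs found in the file */
        example_ocg_entry *ocgs;    /* their current states */
    } example_ocg_descriptor;

    /* The interpreter then consults these states (e.g. when it meets an
     * /OC reference on an XObject or marked-content section) and skips
     * content whose group is OFF. */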
|
|
This frees us from passing errors back everywhere, and hence enables us
to pass results back as return values.
Rather than having to explicitly check for errors everywhere and bubble
them, we now allow exception handling to do the work for us; the
downside to this is that we no longer emit as much debugging information
as we did before (though this could be put back in). For now, the
debugging information we have lost has been retained in comments
with 'RJW:' at the start.
This code needs fuller testing, but is being committed as a work in
progress.
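The resulting call shape uses the fz_try/fz_catch macros; a minimal
sketch of the new style (load_thing is a hypothetical stand-in for
any operation that can now throw, and the include is assumed):

    #include "fitz.h"   /* assumed: provides fz_context, fz_try/fz_catch, fz_warn */

    /* Hypothetical operation that throws on failure instead of returning
     * an error code. */
    extern void load_thing(fz_context *ctx);

    void example(fz_context *ctx)
    {
        fz_try(ctx)
        {
            load_thing(ctx);
        }
        fz_catch(ctx)
        {
            /* failure no longer bubbles an error code up by hand */
            fz_warn(ctx, "cannot load thing; continuing");
        }
    }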