mupdf - MuPDF PDF reader and library

Age	Commit message (Collapse)	Author
2013-06-20	Rearrange source files.	Tor Andersson

2013-06-19	Exception handling changes	Robin Watts
	In preparation for work on progressive loading, update the exception handling scheme slightly. Until now, exceptions (as thrown with fz_throw, and caught with fz_try/fz_catch) have merely had an informative string. They have never had anything that can be compared to see if an error is of a particular type. We now introduce error codes; when we fz_throw, we now always give an error code, and can optionally (using fz_throw_message) give both an error code and an informative string. When we fz_rethrow from within a fz_catch, both the error code and the error message is maintained. Using fz_rethrow_message we can 'improve' the error message, but the code is maintained. The error message can be read out using fz_caught_message() and the error code can be read as fz_caught(). Currently we only define a 'generic' error. This will expand in future versions to include other error types that may be tested for.
2013-06-18	Merge common and internal headers into one.	Tor Andersson

2013-06-18	Move header files into separate include directory.	Tor Andersson

2013-03-26	Make pdf_functions public as fz_functions.	Robin Watts
	Implementations remain unexposed, but this means we can safely pass functions in shades without having to 'sample' them (though we may still choose to do this for speed).
2013-02-06	Change to pass structures by reference rather than value.	Robin Watts
	This is faster on ARM in particular. The primary changes involve fz_matrix, fz_rect and fz_bbox. Rather than passing 'fz_rect r' into a function, we now consistently pass 'const fz_rect *r'. Where a rect is passed in and modified, we miss the 'const' off. Where possible, we return the pointer to the modified structure to allow 'chaining' of expressions. The basic upshot of this work is that we do far fewer copies of rectangle/matrix structures, and all the copies we do are explicit. This has opened the way to other optimisations, also performed in this commit. Rather than using expressions like: fz_concat(fz_scale(sx, sy), fz_translate(tx, ty)) we now have fz_pre_{scale,translate,rotate} functions. These can be implemented much more efficiently than doing the fully fledged matrix multiplication that fz_concat requires. We add fz_rect_{min,max} functions to return pointers to the min/max points of a rect. These can be used to in transformations to directly manipulate values. With a little casting in the path transformation code we can avoid more needless copying. We rename fz_widget_bbox to the more consistent fz_bound_widget.
2013-01-03	Bug 693503: Fix SEGV during pdf function loading from broken file.	Robin Watts
	If the Function entry does not point to either a dictionary or an array, we should give up, otherwise we deference a NULL pointer. Problem found in a test file, 1013.pdf.SIGSEGV.8a7.18 supplied by Mateusz "j00ru" Jurczyk and Gynvael Coldwind of the Google Security Team using Address Sanitizer. Many thanks!
2012-12-21	Bug 693503: Fix SEGV in pdf_function.	Robin Watts
	The pdf function code only expects a maximum of FZ_MAX_COLORS component functions in a sampling function; more functions than this causes a buffer overflow. Add some checks to avoid this. Problem found in 1219.pdf.SIGSEGV.fc0.246, a test file supplied by Mateusz "j00ru" Jurczyk and Gynvael Coldwind of the Google Security Team using Address Sanitizer. Many thanks!
2012-10-05	Make shadings hold the data streams compressed.	Robin Watts
	This reduces memory use by another 10% on the 2 testfiles mentioned in the previous commit (see bug 693330).
2012-10-01	Move to consistently refer to "Linear" shadings rather than "Axial" ones.	Robin Watts
	Thanks to Sebras for pointing out our schitzophrenia here.
2012-10-01	Use malformed shading entries as far as possible	Sebastian Rasmussen
	The rest of the shading code and source code for other cases already handles malformed entries in the same way.
2012-10-01	Give better warnings for out of bounds values in shadings	Sebastian Rasmussen

2012-10-01	Bug 693330: Change shadings to decompose to meshes at render time.	Robin Watts
	Currently, the mupdf code loads shadings at parse time, and instantly decomposes them into a mesh of triangles. This mesh of triangles is the transformed and rendered as required. Unfortunately the storage space for the mesh is typically much greater than the original representation. In this commit, we move the shading stream parsing/decomposition code into a general 'fz_process_mesh' function within res_shade. We then grab a copy of the buffer at load time, and 'process' (decompose/paint) at render time. For the test file on the bug, memory falls from the reported 660Mb to 30Mb. For another test file (txt9780547775815_ingested.pdf page 271) it reduces memory use from 750Meg to 33Meg. These figures could be further reduced by storing the compressed streams from the pdf file rather than the uncompressed ones. Incorporating typo fix and unused function removal from Sebras. Thanks. Remove unused function in shading code
2012-08-06	Remove old error mesages turned into comments when adding exceptions	Sebastian Rasmussen

2012-07-23	Provide number of inputs/outpus when loading pdf functions	Sebastian Rasmussen
	This will allow the loading of pdf functions to validate that a pdf function has the correct number of inputs/outputs. Additionally it will allow for handling pdf functions with incorrect number of inputs/outputs.
2012-07-04	Reverse order of Y decomposition of mesh shading.	Robin Watts
	Solves problem seen in torture.pdf - we now match acrobat. Problem derived from normal_262.pdf
2012-04-05	Fix potential problems on malloc failure.	Robin Watts
	Don't reset the size of arrays until we have successfully resized them.
2012-03-06	Split fitz.h/mupdf.h into internal/external headers.	Robin Watts
	Attempt to separate public API from internal functions.
2012-02-26	Move fz_obj to be pdf_obj.	Robin Watts
	Currently, we are in the slightly strange position of having the PDF specific object types as part of fitz. Here we pull them out into the pdf layer instead. This has been made possible by the recent changes to make the store no longer be tied to having fz_obj's as keys. Most of this work is a simple huge rename; to help customers who may have code that use such functions we have provided a sed script to do the renaming; scripts/rename2.sed. Various other small tweaks are required; the store used to have some debugging code that still required knowledge of fz_obj types - we extract that into a nicer 'type' based function pointer. Also, the type 3 font handling used to have an fz_obj pointer for type 3 resources, and therefore needed to know how to free this; this has become a void * with a function to free it.
2012-02-25	Rework image handling for on demand decode	Robin Watts
	Introduce a new 'fz_image' type; this type contains rudimentary information about images (such as native, size, colorspace etc) and a function to call to get a pixmap of that image (with a size hint). Instead of passing pixmaps through the device interface (and holding pixmaps in the display list) we now pass images instead. The rendering routines therefore call fz_image_to_pixmap to get pixmaps to render, and fz_pixmap_drop those afterwards. The file format handling routines therefore need to produce images rather than pixmaps; xps and cbz currently just wrap pixmaps as images. PDF is more involved. The stream handling routines in PDF have been altered so that they can recognise when the last stream entry in a filter dictionary is an image decoding filter. Rather than applying this filter, they read and store the parameters into a pdf_image_params structure, and stop decoding at that point. This allows us to read the compressed data for an image into memory as a block. We can then restart the image decode process later. pdf_images therefore consist of the compressed image data for images. When a pixmap is requested for such an image, the code checks to see if we have one (of an appropriate size), and if not, decodes it. The size hint is used to determine whether it is possible to subsample the image; currently this is only supported for JPEGs, but we could add generic subsampling code later. In order to handle caching the produced images, various changes have been made to the store and the underlying hash table. Previously the store was indexed purely by fz_obj keys; we don't have an fz_obj key any more, so have extended the store by adding a concept of a key 'type'. A key type is a pointer to a set of functions that keep/drop/compare and make a hashable key from a key pointer. We make a pdf_store.c file that contains functions to offer the existing fz_obj based functions, and add a new 'type' for keys (based on the fz_image handle, and the subsample factor) in the pdf_image.c file. While working on this, a problem became apparent in the existing store codel; fz_obj objects had no protection on their reference counts, hence an interpreter thread could try to alter a ref count at the same time as a malloc caused an eviction from the store. This has been solved by using the alloc lock as protection. This in turn requires some tweaks to the code to make sure we don't try and keep/drop fz_obj's from the store code while the alloc lock is held. A side effect of this work is that when a hash table is created, we inform it what lock should be used to protect its innards (if any). If the alloc lock is used, the insert method knows to drop/retake it to allow it to safely expand the hash table. Callers to the hash functions have the responsibility of taking/dropping the appropriate lock, and ensuring that they cope with the possibility that insert might drop the alloc lock, causing race conditions.
2012-02-08	Lock reworking.	Robin Watts
	This is a significant change to the use of locks in MuPDF. Previously, the user had the option of passing us lock/unlock functions for a single mutex as part of the allocation struct. Now we remove these entries from the allocation struct, and make a separate 'locks' struct. This enables people to use fz_alloc_default with locking. If multithreaded operation is required, then the user is required to create FZ_LOCK_MAX mutexes, which will be locked or unlocked by MuPDF calling the lock/unlock functions within the new fz_locks_context structure passed in at context creation. These mutexes are not required to be recursive (they may be, but MuPDF should never call them in this way). MuPDF avoids deadlocks by imposing a locking ordering on itself; a thread will never take lock n, if it already holds any lock i for which 0 <= i <= n. Currently, there are 4 locks used within MuPDF. Lock 0: The alloc lock; taken around all calls to user supplied (or default) allocation functions. Also taken around all accesses to the refs field of storable items. Lock 1: The store lock; taken whenever the store data structures (specifically the linked list pointers) are accessed. Lock 2: The file lock; taken whenever a thread is accessing the raw file. We use the debugging macros to insist that this is held whenever we do a file based seek or read. We also insist that this is never held when we resolve an indirect reference, as this can have the effect of moving the file pointer. Lock 3: The glyphcache lock; taken whenever a thread calls freetype, or accesses the glyphcache data structures. This introduces some complexities w.r.t type3 fonts. Locking can be hugely problematic, so to ease our minds as to the correctness of this code, we introduce some debugging macros. These compile away to nothing unless FITZ_DEBUG_LOCKING is defined. fz_assert_lock_held(ctx, lock) checks that we hold lock. fz_assert_lock_not_held(ctx, lock) checks that we do not hold lock. In addition fz_lock_debug_lock and fz_lock_debug_unlock are used on every fz_lock/fz_unlock to check the validity of the operation we are performing - in particular it checks that we do/do not already hold the lock we are trying to take/drop, and that by taking this lock we are not violating our defined locking order. The RESOLVE macro (used throughout the code to check whether we need to resolve an indirect reference) calls fz_assert_lock_not_held to ensure that we aren't about to resolve an indirect reference (and hence move the stream pointer) when the file is locked. In order to implement the file locking properly, pdf_open_stream (and friends) now lock the file as a side effect (because they fz_seek to the start of the stream). The lock is automatically dropped on an fz_close of such streams. Previously, the glyph cache was created in a context when it was first required; this presents problems as it can be shared between several contexts or not, depending on whether it is created before the contexts are cloned. We now always create it at startup, so it is always shared. This means that we need reference counting for the glyph caches. Added here. In fz_render_glyph, we take the glyph cache lock, and check to see whether the glyph is in the cache. If it is, we bump the refcount, drop the lock and returned the cached character. If it is not, we need to render the character. For freetype based fonts we keep the lock throughout the rendering process, thus ensuring that freetype is only called in a single threaded manner. For type3 fonts, however, we need to invoke the interpreter again to render the glyph streams. This can require reentrance to this routine. We therefore drop the glyph cache lock, call the interpreter to render us our pixmap, and take the lock again. This dropping and retaking of the lock introduces a possible race condition; 2 threads may try to render the same character at the same time. We therefore modify our hash table insert routines to behave differently if it comes to insert an entry only to find that an entry with the same key is already there. We spot this case; if we have just rendered a type3 glyph and when we try to insert it into the cache discover that someone has beaten us to it, we just discard our entry and use the cached one. Hopefully this will seldom be a problem in practise; to solve it properly would require greater complexity (probably involving spotting that another thread is already working on the desired rendering, and sleeping on a semaphore until it completes).
2012-01-27	Rename pdf_xref type to pdf_document.	Tor Andersson

2012-01-10	Fix many spelling errors.	Sebastian Rasmussen

2012-01-06	Various memory leak fixes.	Robin Watts
	In error cases, ensure we free objects correctly. Thanks to Zeniko for finding the problems (and many of the solutions!)
2011-12-23	Remove stray error check. Optional functions in shadings are optional.	Tor Andersson

2011-12-23	Add some fz_vars to fix exception behaviour.	Robin Watts
	gcc 4.4.5 gives helpful warnings about variables that can become unset due to setjmp/longjmp usage. Fix that here. Thanks to Sebras.
2011-12-16	More MemSqueezing fixes.	Robin Watts

2011-12-16	Add fz_malloc_struct, and make code use it.	Robin Watts
	The new fz_malloc_struct(A,B) macro allocates sizeof(B) bytes using fz_malloc, and then passes the resultant pointer to Memento_label to label it with "B". This costs nothing in non-memento builds, but gives much nicer listings of leaked blocks when memento is enabled.
2011-12-15	Rework pdf_store to fz_store, a part of fz_context.	Robin Watts
	Firstly, we rename pdf_store to fz_store, reflecting the fact that there are no pdf specific dependencies on it. Next, we rework it so that all the objects that can be stored in the store start with an fz_storable structure. This consists of a reference count, and a function used to free the object when the reference count reaches zero. All the keep/drop functions are then reimplemented by calling fz_keep_sharable/fz_drop_sharable. The 'drop' functions as supplied by the callers are thus now 'free' functions, only called if the reference count drops to 0. The store changes to keep all the items in the store in the linked list (which becomes a doubly linked one). We still make use of the hashtable to index into this list quickly, but we now have the objects in an LRU ordering within the list. Every object is put into the store, with a size record; this is an estimate of how much memory would be freed by freeing that object. The store is moved into the context and given a maximum size; when new things are inserted into the store, care is taken to ensure that we do not expand beyond this size. We evict any stored items (that are not in use) starting from the least recently used. Finding an object in the store now takes a reference to it already. LOCK and UNLOCK comments are used to indicate where locks need to be taken and released to ensure thread safety.
2011-12-08	Stylistic changes when testing pointer values for NULL.	Tor Andersson
	Also: use 'cannot' instead of 'failed to' in error messages.
2011-12-08	Move from volatile to fz_var.	Robin Watts
	When using exceptions (which are implemented using setjmp/longjmp), we need to be careful to ensure that variable values get written back before any exception happens. Previously we've done that using volatile, but that produces nasty warnings (and unduly limits the compilers freedom to optimise). Here we introduce a new macro fz_var that passes the address of the variable out of scope. This means that the compiler has to ensure that any changes to its value are written back to memory before calling any out of scope function.
2011-11-15	Merge branch 'master' into context	Robin Watts
	Mostly redoing the xps_context to xps_document change and adding contexts to newly written code. Conflicts: apps/pdfapp.c apps/pdfapp.h apps/x11_main.c apps/xpsdraw.c draw/draw_device.c draw/draw_scale.c fitz/base_object.c fitz/fitz.h pdf/mupdf.h pdf/pdf_interpret.c pdf/pdf_outline.c pdf/pdf_page.c xps/muxps.h xps/xps_doc.c xps/xps_xml.c
2011-10-04	Move to exception handling rather than error passing throughout.	Robin Watts
	This frees us from passing errors back everywhere, and hence enables us to pass results back as return values. Rather than having to explicitly check for errors everywhere and bubble them, we now allow exception handling to do the work for us; the downside to this is that we no longer emit as much debugging information as we did before (though this could be put back in). For now, the debugging information we have lost has been retained in comments with 'RJW:' at the start. This code needs fuller testing, but is being committed as a work in progress.
2011-09-21	Add warning context.	Tor Andersson

2011-09-21	Rename malloc functions for arrays (fz_calloc and fz_realloc).	Tor Andersson

2011-09-21	Don't thread ctx through safe fz_obj functions.	Tor Andersson

2011-09-15	Add context to mupdf.	Robin Watts
	Huge pervasive change to lots of files, adding a context for exception handling and allocation. In time we'll move more statics into there. Also fix some for(i = 0; i < function(...); i++) calls.
2011-09-14	Initial import of exception handling code	Robin Watts
	Import exception handling code from WSS, modified to fit into the fitz world. With this code we have 'real' fz_try/fz_catch/fz_rethrow functions, handling a fz_except type. We therefore rename the existing fz_throw/ fz_catch/fz_rethrow to be fz_error_make/fz_error_handle/fz_error_note. We don't actually use fz_try/fz_catch/fz_rethrow yet...
2011-04-08	Remove inline keyword where it is not strictly necessary for performance.	Tor Andersson
	Also put the function on the same line for inline functions, so they stick out and are easy to find with grep.
2011-04-07	pdf: Purge unmaintained debug/log printing messages.	Tor Andersson

2011-04-04	Le Roi est mort, vive le Roi!	Tor Andersson
	The run-together words are dead! Long live the underscores! The postscript inspired naming convention of using all run-together words has served us well, but it is now time for more readable code. In this commit I have also added the sed script, rename.sed, that I used to convert the source. Use it on your patches and application code.
2011-04-04	pdf: Rename mupdf directory.	Tor Andersson