mupdf - MuPDF PDF reader and library

Age	Commit message	Author
2013-06-20	Rearrange source files.	Tor Andersson

2013-06-19	Exception handling changes	Robin Watts
	In preparation for work on progressive loading, update the exception handling scheme slightly. Until now, exceptions (as thrown with fz_throw, and caught with fz_try/fz_catch) have merely had an informative string. They have never had anything that can be compared to see if an error is of a particular type. We now introduce error codes; when we fz_throw, we now always give an error code, and can optionally (using fz_throw_message) give both an error code and an informative string. When we fz_rethrow from within a fz_catch, both the error code and the error message are maintained. Using fz_rethrow_message we can 'improve' the error message, but the code is maintained. The error message can be read out using fz_caught_message() and the error code can be read as fz_caught(). Currently we only define a 'generic' error. This will expand in future versions to include other error types that may be tested for.
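A minimal sketch of the idea in this commit: an exception that carries a comparable error code alongside its message, where an inner handler may replace the message while the code survives the rethrow. It is built on plain setjmp/longjmp. Every demo_* name is hypothetical and none of this is MuPDF's actual implementation.

```c
#include <setjmp.h>
#include <stdio.h>

/* Hypothetical error codes; the commit itself only defines a 'generic' error. */
enum { DEMO_ERROR_NONE = 0, DEMO_ERROR_GENERIC = 1 };

typedef struct {
    jmp_buf *top;      /* innermost active handler */
    int code;          /* comparable error code */
    char message[64];  /* informative string */
} demo_context;

/* Record code and message, then jump to the innermost handler. */
static void demo_throw(demo_context *ctx, int code, const char *msg)
{
    ctx->code = code;
    snprintf(ctx->message, sizeof ctx->message, "%s", msg);
    longjmp(*ctx->top, 1);
}

static void demo_load_object(demo_context *ctx)
{
    demo_throw(ctx, DEMO_ERROR_GENERIC, "cannot load object");
}

/* Catches the inner error and rethrows it with an 'improved' message.
   The error code is left untouched. */
static void demo_load_page(demo_context *ctx)
{
    jmp_buf here;
    jmp_buf *outer = ctx->top;

    ctx->top = &here;
    if (setjmp(here) == 0) {
        demo_load_object(ctx);
        ctx->top = outer;
    } else {
        ctx->top = outer;
        snprintf(ctx->message, sizeof ctx->message, "%s", "cannot load page");
        longjmp(*ctx->top, 1);
    }
}

/* Outermost handler: returns the caught error code, or DEMO_ERROR_NONE. */
static int demo_open(demo_context *ctx)
{
    jmp_buf here;

    ctx->code = DEMO_ERROR_NONE;
    ctx->message[0] = '\0';
    ctx->top = &here;
    if (setjmp(here) == 0) {
        demo_load_page(ctx);
        return DEMO_ERROR_NONE;
    }
    return ctx->code;
}
```

A caller of demo_open can branch on the returned code and still read the most recent message from the context.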
2013-06-18	Merge common and internal headers into one.	Tor Andersson

2013-06-18	Move header files into separate include directory.	Tor Andersson

2013-05-06	Fix formatting.	Tor Andersson

2013-03-26	Make pdf_functions public as fz_functions.	Robin Watts
	Implementations remain unexposed, but this means we can safely pass functions in shades without having to 'sample' them (though we may still choose to do this for speed).
2013-03-22	Squash some warnings.	Robin Watts
	Some -Wshadow ones, plus some 'set but not used' ones.
2013-02-20	Bug 693639: fix typo in ps calculator.	Tor Andersson
	Thanks to zeniko.
2013-01-04	Make token enum a type to ease debugging	Sebastian Rasmussen

2013-01-04	Bug 693503: Fix stack overflows due to infinite recursion.	Robin Watts
	If a colorspace refers to itself as a base, we can get an infinite recursion and hence stack overflow. Thanks to zeniko for pointing out that this occurs in embedded CMAPs and stitching functions. Also solved here. To avoid having to keep a long list of the objects we've traversed through, extend the pdf_dict_mark functions to work on all pdf objects, and hence rename them as pdf_obj_mark etc. Thanks to zeniko again for feedback on this way of working. Problem found in a test file, 3882.pdf.SIGSEGV.99.3204 supplied by Mateusz "j00ru" Jurczyk and Gynvael Coldwind of the Google Security Team using Address Sanitizer. Many thanks!
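The marking approach described above can be sketched as follows: each object carries a mark bit that is set while the object is on the current traversal path, so a self-reference is detected without keeping a separate list of visited objects. The demo_obj type and demo_chain_length function are hypothetical illustrations, not the pdf_obj_mark API itself.

```c
#include <stddef.h>

/* Hypothetical object with a single 'base' link, e.g. a colorspace's base. */
typedef struct demo_obj {
    struct demo_obj *base;
    int marked;  /* nonzero while the object is on the current path */
} demo_obj;

/* Returns the length of the base chain, or -1 if the chain loops back
   onto an object that is still being traversed. */
static int demo_chain_length(demo_obj *obj)
{
    int n;

    if (obj == NULL)
        return 0;
    if (obj->marked)
        return -1;          /* already on the path: cycle detected */
    obj->marked = 1;
    n = demo_chain_length(obj->base);
    obj->marked = 0;        /* unmark on the way out, even on failure */
    return n < 0 ? -1 : n + 1;
}
```

Unmarking on every exit path matters: a mark left behind would make later, legitimate traversals report a cycle.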
2012-12-14	Bug 693503: Fix out of bounds memory access.	Robin Watts
	We failed to detect a PDF sample function with a size of 0 as being illegal. This led us to continue through the code, and then access out of bounds. Issue found by Mateusz "j00ru" Jurczyk and Gynvael Coldwind of the Google Security Team using Address Sanitizer. Many thanks!
2012-08-06	Remove old error messages turned into comments when adding exceptions	Sebastian Rasmussen

2012-08-06	Rewording of warning messages for PDF functions	Sebastian Rasmussen

2012-08-06	Only warn on stitching function's sub function with wrong arity	Sebastian Rasmussen
	Sub functions that make up a stitching function can be evaluated with the wrong number of inputs/outputs, so it is not necessary to throw an exception if the number of inputs/outputs does not match when loading sub functions.
2012-08-06	Fix typo in PDF function code	Sebastian Rasmussen
	This is just a lexical change, no semantic change as the MAXN and MAXM constants are equal.
2012-07-26	Assume default value for negative sample function dimension size	Sebastian Rasmussen

2012-07-24	Add upper bound on size of sampled pdf functions	Sebastian Rasmussen
	Previously sampled pdf functions having an overflow in the number of samples were never caught until the memory allocator was triggered. Now there is an upper bound of 100Mbyte (the same as for fz_read_all()).
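One way to catch such an overflow before allocating is to test each multiplication against the cap before performing it. The sketch below assumes the cap applies to the total sample data in bytes; the function name, its parameters, and the way the byte count is derived are illustrative assumptions, not the code from this commit.

```c
#include <stdint.h>

/* 100 MiB cap, the figure quoted in the commit message. */
#define DEMO_MAX_SAMPLE_BYTES (100u << 20)

/* size[0..m-1] are the per-dimension sample counts, n is the number of
   outputs and bps the bits per sample (assumed to be at most 32).
   Returns 1 and stores the byte count if it fits under the cap, else 0. */
static int demo_sample_bytes(const unsigned *size, int m, unsigned n,
                             unsigned bps, uint64_t *bytes)
{
    uint64_t count = n;
    int i;

    for (i = 0; i < m; i++) {
        if (size[i] == 0)
            return 0;                         /* zero-sized dimension is illegal */
        if (count > DEMO_MAX_SAMPLE_BYTES / size[i])
            return 0;                         /* product would exceed the cap */
        count *= size[i];
    }
    *bytes = (count * bps + 7) / 8;           /* cannot overflow: count <= cap */
    return *bytes <= DEMO_MAX_SAMPLE_BYTES;
}
```

Dividing the cap by the next factor keeps every intermediate value small, so the check itself can never wrap around.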
2012-07-24	Handle out of range pdf subfunction boundaries	Sebastian Rasmussen
	Any pdf function with incorrect number of subfunction boundaries will either cause an exception or have excessive boundaries be ignored.
2012-07-24	Handle exponential pdf functions with malformed constants	Tor Andersson
	Any pdf functions with incorrect number of constants will have their constant values set to default values. Any excessive constants are ignored.
2012-07-23	Warn about out of range values for exponential pdf functions	Sebastian Rasmussen
	Exponential pdf functions have constraints on their input values so warn about out of range values when loading those functions. When evaluating the functions, assume zero without warning.
2012-07-23	Handle sampled pdf function dimensionality	Sebastian Rasmussen
	Functions that have excessive dimension sizes will have those sizes ignored, whereas functions that have too few dimension sizes will cause an exception.
2012-07-23	Handle pdf functions with malformed input/output mappings	Sebastian Rasmussen
	After this pdf functions that have malformed Decode/Encode arrays with too few/many entries compared to the number of inputs/outputs will be handled more gracefully. Those missing mappings will be set to default values.
2012-07-23	Handle pdf function evaluation with wrong number of inputs/outputs	Sebastian Rasmussen
	Functions requiring more inputs than available input values will have those inputs set to zero. Similarly functions producing too few outputs will have the remaining output values be set to zero. Any excessive input values or output values will be ignored.
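The padding and truncation rule above can be sketched with a small wrapper that copies the caller's values into fixed-size scratch arrays. The demo_function type, the DEMO_MAX bound, and the example function are hypothetical; the sketch assumes a function's input and output counts never exceed DEMO_MAX.

```c
#define DEMO_MAX 8  /* assumed upper bound on inputs/outputs */

typedef struct {
    int m;  /* inputs the function requires */
    int n;  /* outputs the function produces */
    void (*eval)(const float *in, float *out);
} demo_function;

/* Example function with two inputs and two outputs. */
static void demo_sum_diff(const float *in, float *out)
{
    out[0] = in[0] + in[1];
    out[1] = in[0] - in[1];
}

/* Evaluate f with whatever the caller supplies: missing inputs are zero,
   extra inputs are ignored, missing outputs are zero, extra outputs are
   dropped. */
static void demo_eval(const demo_function *f, const float *in, int inlen,
                      float *out, int outlen)
{
    float fakein[DEMO_MAX] = { 0 };
    float fakeout[DEMO_MAX] = { 0 };
    int i;

    for (i = 0; i < f->m && i < inlen; i++)
        fakein[i] = in[i];
    f->eval(fakein, fakeout);
    for (i = 0; i < outlen; i++)
        out[i] = i < f->n ? fakeout[i] : 0.0f;
}
```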
2012-07-23	Handle pdf functions with too many inputs	Sebastian Rasmussen
	Both exponential and stitching functions are limited to having one input. Make sure that any excessive inputs are ignored.
2012-07-23	Provide number of inputs/outputs when loading pdf functions	Sebastian Rasmussen
	This will allow the loading of pdf functions to validate that a pdf function has the correct number of inputs/outputs. Additionally it will allow for handling pdf functions with incorrect number of inputs/outputs.
2012-07-23	Clamp number of pdf function inputs/outputs	Sebastian Rasmussen
	Previously a pdf function having too many inputs or outputs would cause an exception, now they will be handled silently. There are two places pdf functions are used: for shadings and for colorspace tint transforms. In both cases the number of inputs/outputs may never be more than the number of components, i.e. limited to MAXN. Additionally the number of inputs/outputs may never be less than the number of components, and there is always at least one component.
2012-07-23	Remove redundant check in pdf function code	Sebastian Rasmussen
	BitsPerSample is already screened later in the code for invalid values, including the default value 0 returned by pdf_to_int().
2012-07-23	Whitespace fixes in code for pdf functions.	Sebastian Rasmussen

2012-07-06	Remove debugging functions for release builds.	Sebastian Rasmussen

2012-07-05	Move to static inline functions from macros.	Robin Watts
	Instead of using macros for min/max/abs/clamp, we move to using inline functions. These are more typesafe, and should produce equivalent code on compilers that support inline (i.e. pretty much everything we care about these days). People can always do their own macro versions if they prefer.
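To illustrate the type-safety point, the sketch below puts a classic max macro next to an inline equivalent. The macro evaluates its winning argument twice, which matters when the argument has side effects. The demo_* and DEMO_* names are made up for this example and are not the helpers introduced by the commit.

```c
/* Macro version: the larger argument is evaluated twice. */
#define DEMO_MAX_MACRO(a, b) ((a) > (b) ? (a) : (b))

/* Inline versions: each argument is evaluated exactly once, and the
   parameter types are checked by the compiler. */
static inline int demo_mini(int a, int b) { return a < b ? a : b; }
static inline int demo_maxi(int a, int b) { return a > b ? a : b; }
static inline int demo_clampi(int x, int lo, int hi)
{
    return x < lo ? lo : (x > hi ? hi : x);
}

/* Helper for the demonstration: applies the macro to i++. */
static int demo_macro_with_increment(int *i)
{
    return DEMO_MAX_MACRO((*i)++, 0);
}

/* Helper for the demonstration: applies the inline function to i++. */
static int demo_inline_with_increment(int *i)
{
    return demo_maxi((*i)++, 0);
}
```

Starting from i = 1, the macro helper increments i twice and yields the second value, while the inline helper increments once and yields the original value.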
2012-06-22	Rework pdf_lexbuf to allow for dynamic parsing buffers.	Robin Watts
	Currently pdf_lexbufs use a static scratch buffer for parsing. In the main case this is 64K in size, but in other cases it can be just 256 bytes; this causes problems when parsing long strings. Even the 64K limit is an implementation limit of Acrobat, not an architectural limit of PDF. Change here to allow dynamic buffers. This means a slightly more complex setup and destruction for each buffer, but more importantly requires correct cleanup on errors. To avoid having to insert lots more try/catch clauses this commit includes various changes to the code so we reuse pdf_lexbufs where possible. This keeps the speed up.
2012-04-05	Fix potential problems on malloc failure.	Robin Watts
	Don't reset the size of arrays until we have successfully resized them.
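A minimal sketch of the pattern, assuming a simple growable int array (the demo_array type is hypothetical): the recorded capacity changes only after realloc has returned a valid pointer, so a failed allocation leaves the array exactly as it was.

```c
#include <stdlib.h>

typedef struct {
    int *items;
    size_t len;
    size_t cap;
} demo_array;

/* Appends value, growing the array if needed. Returns 1 on success and
   0 on allocation failure, in which case the array is unchanged. */
static int demo_array_push(demo_array *arr, int value)
{
    if (arr->len == arr->cap) {
        size_t newcap = arr->cap ? arr->cap * 2 : 4;
        int *p = realloc(arr->items, newcap * sizeof *p);
        if (p == NULL)
            return 0;        /* items and cap still describe the old block */
        arr->items = p;
        arr->cap = newcap;   /* update the size only after success */
    }
    arr->items[arr->len++] = value;
    return 1;
}
```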
2012-04-05	Bug 692141 - Work around bug in VS2005 Team Suite	Robin Watts
	Put the logf call in its own statement to fix a stupid header file bug.
2012-03-28	Whitespace fixes.	Tor Andersson

2012-03-12	Merge branch 'master' into header-split	Robin Watts

2012-03-12	Squash MSVC warning.	Robin Watts

2012-03-12	Fix bitshifting by a negative amount in PS functions	Robin Watts
	When bitshifting by a negative amount, we should shift right; thanks to Sebras' work in this area, I spotted that we are attempting to shift right by a negative number.
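The intended behaviour can be sketched as below: a positive amount shifts left, and a negative amount shifts right by its magnitude, so the shift count passed to the C operator is never negative. The function name is hypothetical, and the sketch assumes a 32-bit int and a logical (zero-filling) right shift, which may differ from what the PS calculator code actually does for negative values.

```c
/* Shift value left by shift bits, or right by -shift bits when shift is
   negative. Shifts of 32 bits or more in either direction yield 0. */
static int demo_ps_bitshift(int value, int shift)
{
    unsigned v = (unsigned)value;

    if (shift >= 32 || shift <= -32)
        return 0;
    if (shift >= 0)
        return (int)(v << shift);
    return (int)(v >> -shift);   /* negate so the count is positive */
}
```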
2012-03-12	Take care of boundary conditions in ps function evaluation.	Sebastian Rasmussen
	Floating point numbers are now clamped, division by zero is approximated by minimum or maximum value and NaN results in 1.0.
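A rough sketch of these rules, under stated assumptions: the clamp range is taken to be [-FLT_MAX, FLT_MAX], division by zero picks the bound matching the sign of the numerator, and 0/0 is treated as the NaN case. The function names and the exact limits are guesses for illustration, not the values used in the commit.

```c
#include <float.h>
#include <math.h>

/* Clamp infinities to the finite float range and map NaN to 1.0. */
static float demo_ps_clamp(float x)
{
    if (isnan(x))
        return 1.0f;
    if (x > FLT_MAX)
        return FLT_MAX;
    if (x < -FLT_MAX)
        return -FLT_MAX;
    return x;
}

/* Division that never produces an infinity or a NaN. */
static float demo_ps_div(float a, float b)
{
    if (b == 0.0f) {
        if (a > 0.0f)
            return FLT_MAX;
        if (a < 0.0f)
            return -FLT_MAX;
        return 1.0f;   /* 0/0 (or NaN/0) would be NaN */
    }
    return demo_ps_clamp(a / b);
}
```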
2012-03-06	Split fitz.h/mupdf.h into internal/external headers.	Robin Watts
	Attempt to separate public API from internal functions.
2012-02-29	Fix trailing whitespace and mixed tabs/spaces in indentation.	Tor Andersson

2012-02-26	Move fz_obj to be pdf_obj.	Robin Watts
	Currently, we are in the slightly strange position of having the PDF specific object types as part of fitz. Here we pull them out into the pdf layer instead. This has been made possible by the recent changes to make the store no longer be tied to having fz_obj's as keys. Most of this work is a simple huge rename; to help customers who may have code that uses such functions we have provided a sed script to do the renaming; scripts/rename2.sed. Various other small tweaks are required; the store used to have some debugging code that still required knowledge of fz_obj types - we extract that into a nicer 'type' based function pointer. Also, the type 3 font handling used to have an fz_obj pointer for type 3 resources, and therefore needed to know how to free this; this has become a void * with a function to free it.
2012-02-25	Revamp pdf lexing code	Robin Watts
	A huge amount (20%+ on some files) of our runtime is spent in fz_atof. A survey of results on the net suggests we will get much better speed by writing our own atof. Part of the job of doing this involves parsing the string to identify the component parts of the number - ludicrously, we are already doing this as part of the lexing process, so it would make sense to do the atoi/atof as part of this process. In order to do this, we need somewhere to store the lexed results; rather than add a float * and an int * to every single pdf_lex call, we generalise the calls to pass a pdf_lexbuf * pointer instead of separate buffer/max/string length pointers. This should help us overall.
2012-02-25	Rework image handling for on demand decode	Robin Watts
	Introduce a new 'fz_image' type; this type contains rudimentary information about images (such as native, size, colorspace etc) and a function to call to get a pixmap of that image (with a size hint). Instead of passing pixmaps through the device interface (and holding pixmaps in the display list) we now pass images instead. The rendering routines therefore call fz_image_to_pixmap to get pixmaps to render, and fz_pixmap_drop those afterwards. The file format handling routines therefore need to produce images rather than pixmaps; xps and cbz currently just wrap pixmaps as images. PDF is more involved. The stream handling routines in PDF have been altered so that they can recognise when the last stream entry in a filter dictionary is an image decoding filter. Rather than applying this filter, they read and store the parameters into a pdf_image_params structure, and stop decoding at that point. This allows us to read the compressed data for an image into memory as a block. We can then restart the image decode process later. pdf_images therefore consist of the compressed image data for images. When a pixmap is requested for such an image, the code checks to see if we have one (of an appropriate size), and if not, decodes it. The size hint is used to determine whether it is possible to subsample the image; currently this is only supported for JPEGs, but we could add generic subsampling code later. In order to handle caching the produced images, various changes have been made to the store and the underlying hash table. Previously the store was indexed purely by fz_obj keys; we don't have an fz_obj key any more, so have extended the store by adding a concept of a key 'type'. A key type is a pointer to a set of functions that keep/drop/compare and make a hashable key from a key pointer. 
We make a pdf_store.c file that contains functions to offer the existing fz_obj based functions, and add a new 'type' for keys (based on the fz_image handle, and the subsample factor) in the pdf_image.c file. While working on this, a problem became apparent in the existing store code; fz_obj objects had no protection on their reference counts, hence an interpreter thread could try to alter a ref count at the same time as a malloc caused an eviction from the store. This has been solved by using the alloc lock as protection. This in turn requires some tweaks to the code to make sure we don't try and keep/drop fz_obj's from the store code while the alloc lock is held. A side effect of this work is that when a hash table is created, we inform it what lock should be used to protect its innards (if any). If the alloc lock is used, the insert method knows to drop/retake it to allow it to safely expand the hash table. Callers to the hash functions have the responsibility of taking/dropping the appropriate lock, and ensuring that they cope with the possibility that insert might drop the alloc lock, causing race conditions.
2012-02-08	Lock reworking.	Robin Watts
	This is a significant change to the use of locks in MuPDF. Previously, the user had the option of passing us lock/unlock functions for a single mutex as part of the allocation struct. Now we remove these entries from the allocation struct, and make a separate 'locks' struct. This enables people to use fz_alloc_default with locking. If multithreaded operation is required, then the user is required to create FZ_LOCK_MAX mutexes, which will be locked or unlocked by MuPDF calling the lock/unlock functions within the new fz_locks_context structure passed in at context creation. These mutexes are not required to be recursive (they may be, but MuPDF should never call them in this way). MuPDF avoids deadlocks by imposing a locking ordering on itself; a thread will never take lock n, if it already holds any lock i for which 0 <= i <= n. Currently, there are 4 locks used within MuPDF. Lock 0: The alloc lock; taken around all calls to user supplied (or default) allocation functions. Also taken around all accesses to the refs field of storable items. Lock 1: The store lock; taken whenever the store data structures (specifically the linked list pointers) are accessed. Lock 2: The file lock; taken whenever a thread is accessing the raw file. We use the debugging macros to insist that this is held whenever we do a file based seek or read. We also insist that this is never held when we resolve an indirect reference, as this can have the effect of moving the file pointer. Lock 3: The glyphcache lock; taken whenever a thread calls freetype, or accesses the glyphcache data structures. This introduces some complexities w.r.t type3 fonts. Locking can be hugely problematic, so to ease our minds as to the correctness of this code, we introduce some debugging macros. These compile away to nothing unless FITZ_DEBUG_LOCKING is defined. fz_assert_lock_held(ctx, lock) checks that we hold lock. fz_assert_lock_not_held(ctx, lock) checks that we do not hold lock. 
In addition fz_lock_debug_lock and fz_lock_debug_unlock are used on every fz_lock/fz_unlock to check the validity of the operation we are performing - in particular it checks that we do/do not already hold the lock we are trying to take/drop, and that by taking this lock we are not violating our defined locking order. The RESOLVE macro (used throughout the code to check whether we need to resolve an indirect reference) calls fz_assert_lock_not_held to ensure that we aren't about to resolve an indirect reference (and hence move the stream pointer) when the file is locked. In order to implement the file locking properly, pdf_open_stream (and friends) now lock the file as a side effect (because they fz_seek to the start of the stream). The lock is automatically dropped on an fz_close of such streams. Previously, the glyph cache was created in a context when it was first required; this presents problems as it can be shared between several contexts or not, depending on whether it is created before the contexts are cloned. We now always create it at startup, so it is always shared. This means that we need reference counting for the glyph caches. Added here. In fz_render_glyph, we take the glyph cache lock, and check to see whether the glyph is in the cache. If it is, we bump the refcount, drop the lock and return the cached character. If it is not, we need to render the character. For freetype based fonts we keep the lock throughout the rendering process, thus ensuring that freetype is only called in a single threaded manner. For type3 fonts, however, we need to invoke the interpreter again to render the glyph streams. This can require reentrance to this routine. We therefore drop the glyph cache lock, call the interpreter to render us our pixmap, and take the lock again. This dropping and retaking of the lock introduces a possible race condition; 2 threads may try to render the same character at the same time. 
We therefore modify our hash table insert routines to behave differently if it comes to insert an entry only to find that an entry with the same key is already there. We spot this case; if we have just rendered a type3 glyph and when we try to insert it into the cache discover that someone has beaten us to it, we just discard our entry and use the cached one. Hopefully this will seldom be a problem in practice; to solve it properly would require greater complexity (probably involving spotting that another thread is already working on the desired rendering, and sleeping on a semaphore until it completes).
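The lock ordering rule stated in this commit (never take lock n while holding any lock i with 0 <= i <= n) lends itself to a small per-thread bookkeeping check. The sketch below only tracks which locks a thread believes it holds; it does not touch real mutexes, and the demo_* names are hypothetical rather than the fz_lock_debug_lock implementation.

```c
#define DEMO_LOCK_MAX 4  /* alloc, store, file, glyphcache */

/* Which locks the current thread holds, indexed by lock number. */
typedef struct {
    int held[DEMO_LOCK_MAX];
} demo_thread_locks;

/* Taking 'lock' is allowed only if no lock i with 0 <= i <= lock is held.
   This also rejects taking a lock that is already held. */
static int demo_may_take(const demo_thread_locks *t, int lock)
{
    int i;

    for (i = 0; i <= lock; i++)
        if (t->held[i])
            return 0;
    return 1;
}

/* Records the lock as held if the ordering allows it; returns 0 on a
   violation, where a debug build would report the error. */
static int demo_lock(demo_thread_locks *t, int lock)
{
    if (!demo_may_take(t, lock))
        return 0;
    t->held[lock] = 1;
    return 1;
}

static void demo_unlock(demo_thread_locks *t, int lock)
{
    t->held[lock] = 0;
}
```

Under this rule a thread holding the glyphcache lock (3) may still take the alloc lock (0), but not the other way round, which is what lets allocation happen inside the other critical sections.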
2012-02-03	Make fz_malloc_struct return zeroed memory.	Tor Andersson

2012-02-01	Fix alpha sort of operators in parse_code.	Robin Watts
	The operator list in parse_code is binary searched through, hence operators must be in alphabetical order. Thanks to Brian Adams for pointing out the mistake here.
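To show why the ordering matters, here is a sketch of a binary-searched operator table together with a check that the table really is sorted. The table contents are a small made-up subset and the demo_* names are hypothetical; this is not the parse_code table itself.

```c
#include <stdlib.h>
#include <string.h>

/* Must stay in strcmp order, or the binary search will miss entries. */
static const char *const demo_ops[] = {
    "abs", "add", "and", "atan", "bitshift", "ceiling", "copy", "cos"
};
#define DEMO_NOPS (sizeof demo_ops / sizeof demo_ops[0])

/* bsearch comparator: key is a string, elem points at a table entry. */
static int demo_cmp(const void *key, const void *elem)
{
    return strcmp((const char *)key, *(const char *const *)elem);
}

/* Returns the index of name in the table, or -1 if it is not present. */
static int demo_find_op(const char *name)
{
    const char *const *hit =
        bsearch(name, demo_ops, DEMO_NOPS, sizeof demo_ops[0], demo_cmp);
    return hit ? (int)(hit - demo_ops) : -1;
}

/* Sanity check suitable for a debug build or a unit test. */
static int demo_ops_sorted(void)
{
    size_t i;

    for (i = 1; i < DEMO_NOPS; i++)
        if (strcmp(demo_ops[i - 1], demo_ops[i]) >= 0)
            return 0;
    return 1;
}
```

Running a check like demo_ops_sorted whenever the table changes would catch a misplaced entry before it turns into a lookup failure.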
2012-02-01	Fix Bug 692829; add code to handle true/false in functions.	Robin Watts
	Add 2 missing cases to parse_code.
2012-02-01	Fix Bug 692830; remove unnecessary tests.	Robin Watts
	Remove remnants of old tests that are no longer required.
2012-01-27	Rename pdf_xref type to pdf_document.	Tor Andersson

2012-01-27	Whitespace fixes.	Tor Andersson