mupdf - MuPDF PDF reader and library

Age	Commit message (Collapse)	Author
2012-05-31	Add linearization to pdf_write function.	Robin Watts
	Extend mupdfclean to have a new -l file that writes the file linearized. This should still be considered experimental When writing a pdf file, analyse object use, flatten resource use, reorder the objects, generate a hintstream and output with linearisaton parameters. This is enough for Acrobat to accept the file as being optimised for Fast Web View. We ought to add more tables to the hintstream in some cases, but I doubt anyone actually uses it, the spec is so badly written. Certainly acrobat accepts the file as being optimised for 'Fast Web View'. Update fz_dict_put to allow for us adding a reference to the dictionary that is the sole owner of that reference already (i.e. don't drop then keep something that has a reference count of just 1). Update pdf_load_image_stream to use the stm_buf from the xref if there is one. Update pdf_close_document to discard any stm_bufs it may be holding. Update fz_dict_put to be pdf_dict_put - this was missed in a renaming ages ago and has been inconsistent since.
2012-05-23	Update usage messages for mubusy command line tools.	Tor Andersson

2012-05-23	Bring xref object and stream mutation functions back from the dead.	Tor Andersson
	Needs more work to use the linked list of free xref slots.
2012-05-11	Split part of fz_document interface for pdf_document into separate file.	Tor Andersson
	Make a separate constructor function that does not link in the interpreter, so we can save space in the mubusy binary by not including the font and cmap resources.
2012-05-10	Combine all small tools into mubusy and remove the separate executables.	Tor Andersson

2012-05-10	Clamp page numbers given to mupdfclean.	Sebastian Rasmussen
	Also make page specification parsing in all tools look similar.
2012-04-30	Remove 2 unused variables.	Robin Watts
	Leftovers from recent mupdfclean refactoring.
2012-04-28	Move guts of pdfclean into new pdf_write function.	Robin Watts
	Expose pdf_write function through the document interface.
2012-03-21	Fix warning where we return nothing rather than NULL in sweepref.	Tor Andersson

2012-03-19	mupdfclean: eliminate mutual recursion (sweepobj/sweepref).	Robin Watts
	Mutual recursion was blowing the stack. This will still blow the stack, but less often.
2012-03-06	Split fitz.h/mupdf.h into internal/external headers.	Robin Watts
	Attempt to separate public API from internal functions.
2012-03-06	Remove stray newlines in error messages.	Tor Andersson

2012-02-26	Move fz_obj to be pdf_obj.	Robin Watts
	Currently, we are in the slightly strange position of having the PDF specific object types as part of fitz. Here we pull them out into the pdf layer instead. This has been made possible by the recent changes to make the store no longer be tied to having fz_obj's as keys. Most of this work is a simple huge rename; to help customers who may have code that use such functions we have provided a sed script to do the renaming; scripts/rename2.sed. Various other small tweaks are required; the store used to have some debugging code that still required knowledge of fz_obj types - we extract that into a nicer 'type' based function pointer. Also, the type 3 font handling used to have an fz_obj pointer for type 3 resources, and therefore needed to know how to free this; this has become a void * with a function to free it.
2012-02-08	Lock reworking.	Robin Watts
	This is a significant change to the use of locks in MuPDF. Previously, the user had the option of passing us lock/unlock functions for a single mutex as part of the allocation struct. Now we remove these entries from the allocation struct, and make a separate 'locks' struct. This enables people to use fz_alloc_default with locking. If multithreaded operation is required, then the user is required to create FZ_LOCK_MAX mutexes, which will be locked or unlocked by MuPDF calling the lock/unlock functions within the new fz_locks_context structure passed in at context creation. These mutexes are not required to be recursive (they may be, but MuPDF should never call them in this way). MuPDF avoids deadlocks by imposing a locking ordering on itself; a thread will never take lock n, if it already holds any lock i for which 0 <= i <= n. Currently, there are 4 locks used within MuPDF. Lock 0: The alloc lock; taken around all calls to user supplied (or default) allocation functions. Also taken around all accesses to the refs field of storable items. Lock 1: The store lock; taken whenever the store data structures (specifically the linked list pointers) are accessed. Lock 2: The file lock; taken whenever a thread is accessing the raw file. We use the debugging macros to insist that this is held whenever we do a file based seek or read. We also insist that this is never held when we resolve an indirect reference, as this can have the effect of moving the file pointer. Lock 3: The glyphcache lock; taken whenever a thread calls freetype, or accesses the glyphcache data structures. This introduces some complexities w.r.t type3 fonts. Locking can be hugely problematic, so to ease our minds as to the correctness of this code, we introduce some debugging macros. These compile away to nothing unless FITZ_DEBUG_LOCKING is defined. fz_assert_lock_held(ctx, lock) checks that we hold lock. fz_assert_lock_not_held(ctx, lock) checks that we do not hold lock. In addition fz_lock_debug_lock and fz_lock_debug_unlock are used on every fz_lock/fz_unlock to check the validity of the operation we are performing - in particular it checks that we do/do not already hold the lock we are trying to take/drop, and that by taking this lock we are not violating our defined locking order. The RESOLVE macro (used throughout the code to check whether we need to resolve an indirect reference) calls fz_assert_lock_not_held to ensure that we aren't about to resolve an indirect reference (and hence move the stream pointer) when the file is locked. In order to implement the file locking properly, pdf_open_stream (and friends) now lock the file as a side effect (because they fz_seek to the start of the stream). The lock is automatically dropped on an fz_close of such streams. Previously, the glyph cache was created in a context when it was first required; this presents problems as it can be shared between several contexts or not, depending on whether it is created before the contexts are cloned. We now always create it at startup, so it is always shared. This means that we need reference counting for the glyph caches. Added here. In fz_render_glyph, we take the glyph cache lock, and check to see whether the glyph is in the cache. If it is, we bump the refcount, drop the lock and returned the cached character. If it is not, we need to render the character. For freetype based fonts we keep the lock throughout the rendering process, thus ensuring that freetype is only called in a single threaded manner. For type3 fonts, however, we need to invoke the interpreter again to render the glyph streams. This can require reentrance to this routine. We therefore drop the glyph cache lock, call the interpreter to render us our pixmap, and take the lock again. This dropping and retaking of the lock introduces a possible race condition; 2 threads may try to render the same character at the same time. We therefore modify our hash table insert routines to behave differently if it comes to insert an entry only to find that an entry with the same key is already there. We spot this case; if we have just rendered a type3 glyph and when we try to insert it into the cache discover that someone has beaten us to it, we just discard our entry and use the cached one. Hopefully this will seldom be a problem in practise; to solve it properly would require greater complexity (probably involving spotting that another thread is already working on the desired rendering, and sleeping on a semaphore until it completes).
2012-01-31	Make pdfclean more resilient to errors while parsing.	Robin Watts
	Just add some fz_try/fz_catches.
2012-01-27	Rename pdfdraw to mupdfdraw etc.	Robin Watts
	This a) improves our branding, and b) avoids conflicts with other pdf tools out there (pdfinfo etc).