mupdf - MuPDF PDF reader and library

Age	Commit message	Author
2014-05-11	Add cmap cleaning scripts.	Tor Andersson
	One to write a CMap out in expanded form ready for text processing tools. Another to write a CMap out as compactly as possible. The output is not in proper CMap format and can only be parsed by MuPDF.
2014-05-10	Fix 694698: Support 32-bit values in CMaps.	Tor Andersson
	Increasing the existing data structure to 32-bit values would bloat the data tables too much. Simplify the data structure and use three separate range tables for lookups -- one with small 16-bit to 16-bit range lookups, one with 32-bit range lookups, and a final one for one-to-many lookups. This loses the range-to-table optimization we had before, but even with the extra ranges this necessitates, the total size of the compiled binary CMap data is smaller than if we were to extend the previous scheme to 32 bits.
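	A minimal sketch of the three-table lookup described above, assuming each table is sorted by its low code point and searched with a binary search. The struct layouts and the names cmap_sketch, lookup_range and lookup_many are hypothetical and are not taken from the MuPDF source.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts for illustration; not MuPDF's actual structs. */
typedef struct { uint16_t low, high, out; } range16;   /* 16-bit to 16-bit ranges */
typedef struct { uint32_t low, high, out; } range32;   /* 32-bit ranges */
typedef struct { uint32_t low; int len; uint32_t out[4]; } one_to_many;

typedef struct {
	const range16 *r16; size_t n16;        /* each table is sorted by low */
	const range32 *r32; size_t n32;
	const one_to_many *many; size_t nmany;
} cmap_sketch;

/* Look up cpt in the two range tables; returns -1 if it is unmapped. */
static int64_t lookup_range(const cmap_sketch *cm, uint32_t cpt)
{
	size_t lo = 0, hi = cm->n16;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (cpt < cm->r16[mid].low) hi = mid;
		else if (cpt > cm->r16[mid].high) lo = mid + 1;
		else return (int64_t)cm->r16[mid].out + (cpt - cm->r16[mid].low);
	}
	lo = 0; hi = cm->n32;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (cpt < cm->r32[mid].low) hi = mid;
		else if (cpt > cm->r32[mid].high) lo = mid + 1;
		else return (int64_t)cm->r32[mid].out + (cpt - cm->r32[mid].low);
	}
	return -1;
}

/* One-to-many lookup: copies the values to out and returns their count (0 if none). */
static int lookup_many(const cmap_sketch *cm, uint32_t cpt, uint32_t out[4])
{
	size_t lo = 0, hi = cm->nmany;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (cpt < cm->many[mid].low) hi = mid;
		else if (cpt > cm->many[mid].low) lo = mid + 1;
		else {
			int i;
			for (i = 0; i < cm->many[mid].len; i++)
				out[i] = cm->many[mid].out[i];
			return cm->many[mid].len;
		}
	}
	return 0;
}
```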
2014-05-05	Fix 695105: openjpeg configuration for big-endian.	Tor Andersson
	Stupid unportable code needs stupid unportable preprocessor macros. This only works with GCC, but should be good enough since I expect anyone using a big-endian machine to also use a GCC compatible compiler.
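	For illustration, a sketch of the kind of GCC-only byte order test the message refers to. It relies on the __BYTE_ORDER__ and __ORDER_BIG_ENDIAN__ macros that GCC-compatible compilers predefine. The macro name SKETCH_BIG_ENDIAN and the runtime probe are hypothetical, and the macros the commit actually used are not shown in this log.

```c
/* GCC-compatible compilers predefine __BYTE_ORDER__; other compilers
 * fall through to the little-endian default (an assumption of this sketch). */
#if defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && \
	__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define SKETCH_BIG_ENDIAN 1
#else
#define SKETCH_BIG_ENDIAN 0
#endif

/* Runtime cross-check: on a big-endian machine the first byte of 1 is 0. */
static int runtime_is_big_endian(void)
{
	const unsigned int probe = 1;
	return *(const unsigned char *)&probe == 0;
}
```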
2014-05-02	Fix 692171: Guard against .incbin on Intel's C compiler.	Tor Andersson

2014-04-23	Fix 692986: add OpenBSD to list of systems that may have .incbin	Tor Andersson

2014-03-19	Implement our own vsnprintf variant.	Tor Andersson
	The primary motivator for this is so that we can print floating point values and get the full accuracy out, without having to print 1.5 as 1.5000000, and without getting 23e24 etc. We only support %c, %f, %d, %o, %x and %s currently. We only support the zero padding qualifier, for integers. We do support some extensions: %C turns values >=128 into UTF-8. %M prints a fz_matrix. %R prints a fz_rect. %P prints a fz_point. We also implement a fprintf variant on top of this to allow for consistent results when using fz_output.
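	To illustrate the goal (full accuracy, no padding zeros, no exponent), here is a simple stand-in technique: try increasing %f precisions until the text parses back to the same float. This is not the algorithm used in the commit, and format_float_short is a hypothetical name.

```c
#include <stdio.h>
#include <stdlib.h>

/* Writes the shortest fixed-point text that round-trips to value.
 * Returns the length written, or -1 if buf is too small or value is NaN. */
static int format_float_short(char *buf, size_t size, float value)
{
	int prec;
	for (prec = 0; prec <= 60; prec++) {
		int n = snprintf(buf, size, "%.*f", prec, value);
		if (n < 0 || (size_t)n >= size)
			return -1;
		/* Stop at the first precision that reproduces the float exactly. */
		if (strtof(buf, NULL) == value)
			return n;
	}
	return -1;
}
```

	With a 128-byte buffer, 1.5f comes out as "1.5" rather than "1.500000", and large values are written out in full digits instead of exponent notation.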
2014-01-07	Introduce 'document handlers'.	Robin Watts
	We define a document handler for each file type (2 in the case of PDF, one to handle files with the ability to 'run' them, and one without). We then register these handlers with the context at startup, and then call fz_open_document... as usual. This enables people to select the document types they want at will (and even to extend the library with more document types should they wish).
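	A rough sketch of the registration idea, assuming each handler exposes a recognizer that scores a file name and that the registry picks the highest score. All type and function names here (doc_handler, handler_registry, register_handler, find_handler) are hypothetical and do not reproduce the MuPDF API.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
	const char *name;
	int (*recognize)(const char *filename);  /* score; 0 means "cannot open" */
} doc_handler;

enum { MAX_HANDLERS = 8 };

typedef struct {
	const doc_handler *handlers[MAX_HANDLERS];
	int count;
} handler_registry;

/* Returns 0 on success, -1 if the registry is full. */
static int register_handler(handler_registry *reg, const doc_handler *h)
{
	if (reg->count >= MAX_HANDLERS)
		return -1;
	reg->handlers[reg->count++] = h;
	return 0;
}

/* Returns the handler with the highest non-zero score, or NULL. */
static const doc_handler *find_handler(const handler_registry *reg, const char *filename)
{
	const doc_handler *best = NULL;
	int best_score = 0, i;
	for (i = 0; i < reg->count; i++) {
		int score = reg->handlers[i]->recognize(filename);
		if (score > best_score) {
			best_score = score;
			best = reg->handlers[i];
		}
	}
	return best;
}

static int ends_with(const char *s, const char *suffix)
{
	size_t n = strlen(s), m = strlen(suffix);
	return n >= m && strcmp(s + n - m, suffix) == 0;
}

static int recognize_pdf(const char *f) { return ends_with(f, ".pdf") ? 100 : 0; }
static int recognize_xps(const char *f) { return ends_with(f, ".xps") ? 100 : 0; }

static const doc_handler pdf_handler = { "pdf", recognize_pdf };
static const doc_handler xps_handler = { "xps", recognize_xps };
```

	An application that only wants PDF support would register pdf_handler alone; file types without a registered handler then simply fail to open.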
2013-11-28	Put thirdparty config headers in separate directories.	Tor Andersson
	Only -I the config header directory if building the thirdparty library, not if using the system library. Fix bug 694808.
2013-11-26	Add Objective-C files to git's list of files to use tab indents.	Tor Andersson

2013-08-28	make cmapdump ignore VCS folders	Simon Bünzli
	If MuPDF is used in a project using Subversion or another VCS that adds hidden subfolders to each folder, cmapdump breaks when trying to load the subfolder as a cmap file. This fix is required starting with 643370f04348569b5e5e577660031d638537671c
2013-08-21	Update source code browser html generation scripts for new layout.	Tor Andersson

2013-07-26	Add script to create source tarballs using git-archive.	Tor Andersson

2013-06-20	Update source, makefiles and win32 projects.	Tor Andersson

2013-06-18	Merge common and internal headers into one.	Tor Andersson

2013-06-18	Move header files into separate include directory.	Tor Andersson

2013-06-08	Silence warning.	Tor Andersson

2013-06-03	Assume non-clang compilers support incbin	Sebastian Rasmussen

2013-06-03	Clean out some old renaming scripts.	Tor Andersson

2013-05-30	Generate C-includable version of Adobe CA certificate	Paul Gardiner

2013-05-29	Fix fontdump .incbin ifdef for clang.	Tor Andersson

2013-05-29	Trivial (and probably needless) simplification of git hook.	Tor Andersson

2013-05-27	Strip trailing whitespace.	Tor Andersson

2013-05-27	Add whitespace settings to .gitattributes and add commit hook script.	Tor Andersson
	Run "bash scripts/gitsetup.sh" to set up the hooks after cloning.
2013-05-24	Update build to use the latest openJPEG2	Shailesh Mistry

2013-05-22	Update OpenJPEG to v2.0.0.	Robin Watts

2013-05-17	Add colorspace context dummy functions to cmapdump.c	Tor Andersson

2013-03-01	Bug 693624: Ensure that windows copes with utf8 filenames	Robin Watts
	When running under Windows, replace fopen with our own fopen_utf8 that converts from utf8 to unicode before calling the unicode version of fopen.
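	A hedged sketch of such a wrapper, assuming the Win32 MultiByteToWideChar and _wfopen calls are used for the conversion and the wide-character open. The names utf8_to_wide and fopen_utf8_sketch are hypothetical, and the code in the commit may differ in detail. On other systems the sketch falls back to plain fopen.

```c
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
#include <stdlib.h>

/* Convert a NUL-terminated UTF-8 string to a malloc'd wide string. */
static wchar_t *utf8_to_wide(const char *s)
{
	int n = MultiByteToWideChar(CP_UTF8, 0, s, -1, NULL, 0);
	wchar_t *w;
	if (n <= 0)
		return NULL;
	w = malloc((size_t)n * sizeof *w);
	if (w && MultiByteToWideChar(CP_UTF8, 0, s, -1, w, n) <= 0) {
		free(w);
		w = NULL;
	}
	return w;
}

static FILE *fopen_utf8_sketch(const char *name, const char *mode)
{
	wchar_t *wname = utf8_to_wide(name);
	wchar_t *wmode = utf8_to_wide(mode);
	FILE *f = (wname && wmode) ? _wfopen(wname, wmode) : NULL;
	free(wname);
	free(wmode);
	return f;
}
#else
/* Elsewhere, file names are passed through unchanged. */
static FILE *fopen_utf8_sketch(const char *name, const char *mode)
{
	return fopen(name, mode);
}
#endif
```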
2012-11-26	xps: Move XML parser into fitz namespace.	Tor Andersson

2012-10-05	Bug 693355: Fix cquote.c trying to write to input file.	Robin Watts
	Simple typo; trying to write to input file. Thanks to Moritz Lipp for pointing out the problem.
2012-07-17	Fix a couple of bugs in c-string generator	Paul Gardiner

2012-07-12	Separate out the Javascript utility functions and autogenerate C string	Paul Gardiner

2012-06-25	Warning fixes and various clean ups:	Sebastian Rasmussen
	Remove unused variable, silencing compiler warning. No need to initialize variables twice. Remove initialization of unread variable. Remove unnecessary check for NULL. Close output file upon error in cmapdump.
2012-05-08	Switch to reading content streams on the fly during interpretation.	Robin Watts
	Previously, before interpreting a page's content stream we would load it entirely into a buffer. Then we would interpret that buffer. This has a cost in memory use. Here, we update the code to read from a stream on the fly. This has required changes in various different parts of the code. Firstly, we have removed all use of the FILE lock - as stream reads can now safely be interrupted by resource (or object) reads from elsewhere in the file, the file lock becomes a very hard thing to maintain, and doesn't actually benefit us at all. The choices were to either use a recursive lock, or to remove it entirely; I opted for the latter. The file lock enum value remains as a placeholder for future use in extendable data streams. Secondly, we add a new 'concat' filter that concatenates a series of streams together into one, optionally putting whitespace between each stream (as the pdf parser requires this). Finally, we change page/xobject/pattern content streams to work on the fly, but we leave type3 glyphs using buffers (as presumably these will be run repeatedly).
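	The 'concat' filter idea can be sketched over in-memory strings standing in for streams: bytes are handed out one at a time, and a single space is emitted at each boundary when padding is requested. The concat_reader type and concat_getc function are hypothetical simplifications and not the filter added by this commit.

```c
#include <stddef.h>

typedef struct {
	const char **chunks;  /* stand-ins for the individual content streams */
	int count;
	int current;          /* index of the chunk being read */
	size_t pos;           /* read position inside the current chunk */
	int pad;              /* non-zero: emit a space between chunks */
} concat_reader;

/* Returns the next byte of the concatenation, or -1 at the end. */
static int concat_getc(concat_reader *r)
{
	while (r->current < r->count) {
		const char *s = r->chunks[r->current];
		if (s[r->pos] != '\0')
			return (unsigned char)s[r->pos++];
		/* Current chunk exhausted: move on to the next one. */
		r->current++;
		r->pos = 0;
		if (r->pad && r->current < r->count)
			return ' ';
	}
	return -1;
}
```

	Without the separator, a token ending one stream could run into the token starting the next, which is why the parser needs the whitespace.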
2012-05-08	Add scripts to generate hyperlinked source in HTML.	Tor Andersson

2012-04-09	Bug 692977: Stop harmless thread debugging messages during cmapdump	Robin Watts
	If compiled with -DDEBUG, cmapdump throws a large number of warnings regarding thread locking. These are harmless and can be ignored, but are, nonetheless, not pretty. Fixed here. Thanks to Bas Weelinck for the report.
2012-03-13	Rename some functions and accessors to be more consistent.	Tor Andersson
	Debug printing functions: debug -> print. Accessors: get noun attribute -> noun attribute. Find -> lookup when the returned value is not reference counted. pixmap_with_rect -> pixmap_with_bbox. We are reserving the word "find" to mean lookups that give ownership of objects to the caller. Lookup is used in other places where the ownership is not transferred, or simple values are returned. The rename is done by the sed script in scripts/rename3.sed
2012-03-12	More API tidying.	Robin Watts
	Make fz_clone_context copy existing AA settings. Add accessor function for fz_bitmap. Add more documentation for various functions/types.
2012-03-06	Split fitz.h/mupdf.h into internal/external headers.	Robin Watts
	Attempt to separate public API from internal functions.
2012-03-01	Add some more docs to fitz.h	Robin Watts
	Add docs for fz_store, fz_image, fz_halftones. Move fz_item definition into res_store.c as it does not need to be external. Rename fz_store_context to fz_keep_store_context to be consistent.
2012-02-26	Move fz_obj to be pdf_obj.	Robin Watts
	Currently, we are in the slightly strange position of having the PDF specific object types as part of fitz. Here we pull them out into the pdf layer instead. This has been made possible by the recent changes to make the store no longer be tied to having fz_obj's as keys. Most of this work is a simple huge rename; to help customers who may have code that uses such functions we have provided a sed script to do the renaming; scripts/rename2.sed. Various other small tweaks are required; the store used to have some debugging code that still required knowledge of fz_obj types - we extract that into a nicer 'type' based function pointer. Also, the type 3 font handling used to have an fz_obj pointer for type 3 resources, and therefore needed to know how to free this; this has become a void * with a function to free it.
2012-02-13	Add locking around freetype calls.	Robin Watts
	We only open one instance of freetype per document. We therefore have to ensure that only 1 call to it takes place at a time. We introduce a lock for this purpose (FZ_LOCK_FREETYPE), and arrange to take/release it as required. We also update the font context so it is properly shared.
2012-02-08	Lock reworking.	Robin Watts
	This is a significant change to the use of locks in MuPDF. Previously, the user had the option of passing us lock/unlock functions for a single mutex as part of the allocation struct. Now we remove these entries from the allocation struct, and make a separate 'locks' struct. This enables people to use fz_alloc_default with locking. If multithreaded operation is required, then the user is required to create FZ_LOCK_MAX mutexes, which will be locked or unlocked by MuPDF calling the lock/unlock functions within the new fz_locks_context structure passed in at context creation. These mutexes are not required to be recursive (they may be, but MuPDF should never call them in this way). MuPDF avoids deadlocks by imposing a locking ordering on itself; a thread will never take lock n, if it already holds any lock i for which 0 <= i <= n. Currently, there are 4 locks used within MuPDF. Lock 0: The alloc lock; taken around all calls to user supplied (or default) allocation functions. Also taken around all accesses to the refs field of storable items. Lock 1: The store lock; taken whenever the store data structures (specifically the linked list pointers) are accessed. Lock 2: The file lock; taken whenever a thread is accessing the raw file. We use the debugging macros to insist that this is held whenever we do a file based seek or read. We also insist that this is never held when we resolve an indirect reference, as this can have the effect of moving the file pointer. Lock 3: The glyphcache lock; taken whenever a thread calls freetype, or accesses the glyphcache data structures. This introduces some complexities w.r.t type3 fonts. Locking can be hugely problematic, so to ease our minds as to the correctness of this code, we introduce some debugging macros. These compile away to nothing unless FITZ_DEBUG_LOCKING is defined. fz_assert_lock_held(ctx, lock) checks that we hold lock. fz_assert_lock_not_held(ctx, lock) checks that we do not hold lock. 
	In addition fz_lock_debug_lock and fz_lock_debug_unlock are used on every fz_lock/fz_unlock to check the validity of the operation we are performing - in particular it checks that we do/do not already hold the lock we are trying to take/drop, and that by taking this lock we are not violating our defined locking order. The RESOLVE macro (used throughout the code to check whether we need to resolve an indirect reference) calls fz_assert_lock_not_held to ensure that we aren't about to resolve an indirect reference (and hence move the stream pointer) when the file is locked. In order to implement the file locking properly, pdf_open_stream (and friends) now lock the file as a side effect (because they fz_seek to the start of the stream). The lock is automatically dropped on an fz_close of such streams. Previously, the glyph cache was created in a context when it was first required; this presents problems as it can be shared between several contexts or not, depending on whether it is created before the contexts are cloned. We now always create it at startup, so it is always shared. This means that we need reference counting for the glyph caches. Added here. In fz_render_glyph, we take the glyph cache lock, and check to see whether the glyph is in the cache. If it is, we bump the refcount, drop the lock and return the cached character. If it is not, we need to render the character. For freetype based fonts we keep the lock throughout the rendering process, thus ensuring that freetype is only called in a single threaded manner. For type3 fonts, however, we need to invoke the interpreter again to render the glyph streams. This can require reentrance to this routine. We therefore drop the glyph cache lock, call the interpreter to render us our pixmap, and take the lock again. This dropping and retaking of the lock introduces a possible race condition; 2 threads may try to render the same character at the same time.
	We therefore modify our hash table insert routines to behave differently if it comes to insert an entry only to find that an entry with the same key is already there. We spot this case; if we have just rendered a type3 glyph and when we try to insert it into the cache discover that someone has beaten us to it, we just discard our entry and use the cached one. Hopefully this will seldom be a problem in practice; to solve it properly would require greater complexity (probably involving spotting that another thread is already working on the desired rendering, and sleeping on a semaphore until it completes).
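	The locking order rule stated above (never take lock n while holding any lock i with 0 <= i <= n) can be sketched as a small debug check. The enum values mirror the four locks listed in the message. The lock_debug type and the helper names are hypothetical, and a real build would track the held locks per thread.

```c
#include <assert.h>

enum { LOCK_ALLOC, LOCK_STORE, LOCK_FILE, LOCK_GLYPHCACHE, LOCK_MAX };

/* Which locks the current thread holds (hypothetical bookkeeping). */
typedef struct { int held[LOCK_MAX]; } lock_debug;

/* Returns 1 if taking `lock` respects the ordering rule, else 0. */
static int lock_order_ok(const lock_debug *d, int lock)
{
	int i;
	for (i = 0; i <= lock; i++)
		if (d->held[i])
			return 0;  /* already hold this lock or a lower-numbered one */
	return 1;
}

static void debug_lock(lock_debug *d, int lock)
{
	assert(lock_order_ok(d, lock));
	d->held[lock] = 1;
}

static void debug_unlock(lock_debug *d, int lock)
{
	assert(d->held[lock]);  /* cannot drop a lock we do not hold */
	d->held[lock] = 0;
}
```

	Under this rule the alloc lock (lock 0) is always the innermost lock: it may be taken while any other lock is held, but no other lock may be taken while holding it.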
2012-02-06	Pass context to cmap and font descriptor functions.	Tor Andersson

2012-01-19	Multi-threading support for MuPDF	Robin Watts
	When we moved over to a context based system, we laid the foundation for a thread-safe mupdf. This commit should complete that process. Firstly, fz_clone_context is properly implemented so that it makes a new context, but shares certain sections (currently just the allocator, and the store). Secondly, we add locking (to parts of the code that have previously just had placeholder LOCK/UNLOCK comments). Functions to lock and unlock a mutex are added to the allocator structure; omit these (as is the case today) and no multithreading is (safely) possible. The context will refuse to clone if these are not provided. Finally we flesh out the LOCK/UNLOCK comments to be real calls of the functions - unfortunately this requires us to plumb fz_context into the fz_keep_storable function (and all the fz_keep_xxx functions that call it). This is the largest section of the patch. No changes expected to any test files.
2012-01-11	Use enum for FZ_STORE_DEFAULT default size.	Tor Andersson

2012-01-11	Hide glyph cache in context.	Tor Andersson

2012-01-10	Fix many spelling errors.	Sebastian Rasmussen

2011-12-15	Add scavenging functionality.	Robin Watts
	When fz_malloc (etc) are about to fail, we try to scavenge memory from the store and then retry. We repeatedly try to bin objects from the store until the malloc succeeds, or until we have nothing else to bin. This means we no longer need the 'aging' of the store, so this is removed.
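	The retry loop described above can be sketched as follows. The context type, the store stand-in (a simple count of evictable items) and the deliberately flaky demo allocator are all hypothetical and exist only to show the control flow.

```c
#include <stdlib.h>

typedef struct {
	size_t evictable;             /* unused items left in the store (stand-in) */
	void *(*alloc)(size_t size);  /* underlying allocator */
} ctx_sketch;

/* Evict one item from the store; returns 0 when nothing is left to bin. */
static int scavenge_one(ctx_sketch *ctx)
{
	if (ctx->evictable == 0)
		return 0;
	ctx->evictable--;
	return 1;
}

/* Retry the allocation until it succeeds or the store is empty. */
static void *malloc_with_scavenge(ctx_sketch *ctx, size_t size)
{
	for (;;) {
		void *p = ctx->alloc(size);
		if (p)
			return p;
		if (!scavenge_one(ctx))
			return NULL;
	}
}

/* Demo allocator that fails a configurable number of times. */
static int demo_failures_left;
static void *flaky_alloc(size_t size)
{
	if (demo_failures_left > 0) {
		demo_failures_left--;
		return NULL;
	}
	return malloc(size);
}
```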
2011-12-15	Rework pdf_store to fz_store, a part of fz_context.	Robin Watts
	Firstly, we rename pdf_store to fz_store, reflecting the fact that there are no pdf specific dependencies on it. Next, we rework it so that all the objects that can be stored in the store start with an fz_storable structure. This consists of a reference count, and a function used to free the object when the reference count reaches zero. All the keep/drop functions are then reimplemented by calling fz_keep_sharable/fz_drop_sharable. The 'drop' functions as supplied by the callers are thus now 'free' functions, only called if the reference count drops to 0. The store changes to keep all the items in the store in the linked list (which becomes a doubly linked one). We still make use of the hashtable to index into this list quickly, but we now have the objects in an LRU ordering within the list. Every object is put into the store, with a size record; this is an estimate of how much memory would be freed by freeing that object. The store is moved into the context and given a maximum size; when new things are inserted into the store, care is taken to ensure that we do not expand beyond this size. We evict any stored items (that are not in use) starting from the least recently used. Finding an object in the store now takes a reference to it already. LOCK and UNLOCK comments are used to indicate where locks need to be taken and released to ensure thread safety.
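	A minimal sketch of the storable header described above: a reference count plus a free function, placed first in every object that can live in the store. The names storable, keep_storable, drop_storable and demo_image are hypothetical. The demo free function only sets a flag so that the behaviour is easy to observe.

```c
typedef struct storable storable;
struct storable {
	int refs;
	void (*free_fn)(storable *);  /* called only when refs reaches 0 */
};

static storable *keep_storable(storable *s)
{
	if (s)
		s->refs++;
	return s;
}

static void drop_storable(storable *s)
{
	if (s && --s->refs == 0)
		s->free_fn(s);
}

/* Example object: the storable header must be the first member so that
 * a pointer to the object can be treated as a pointer to its header. */
typedef struct {
	storable base;
	int *freed_flag;
} demo_image;

static void free_demo_image(storable *s)
{
	demo_image *img = (demo_image *)s;
	*img->freed_flag = 1;  /* a real free function would release memory here */
}
```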
2011-12-08	Stylistic changes when testing pointer values for NULL.	Tor Andersson
	Also: use 'cannot' instead of 'failed to' in error messages.