mupdf - MuPDF PDF reader and library

Age	Commit message	Author
2013-04-11	Convert UTF-8 passwords to correct encoding.	Tor Andersson
	PDFDocEncoding for crypt revisions <= 4, UTF-8 for newer.
2013-02-22	Add fz_get_annot_type	Paul Gardiner

2013-02-20	Bug 693639: some convenience functions.	Tor Andersson
	Added primarily for use by SumatraPDF. Thanks to zeniko.
2013-02-06	Change to pass structures by reference rather than value.	Robin Watts
	This is faster on ARM in particular. The primary changes involve fz_matrix, fz_rect and fz_bbox. Rather than passing 'fz_rect r' into a function, we now consistently pass 'const fz_rect *r'. Where a rect is passed in and modified, we miss the 'const' off. Where possible, we return the pointer to the modified structure to allow 'chaining' of expressions. The basic upshot of this work is that we do far fewer copies of rectangle/matrix structures, and all the copies we do are explicit. This has opened the way to other optimisations, also performed in this commit. Rather than using expressions like: fz_concat(fz_scale(sx, sy), fz_translate(tx, ty)) we now have fz_pre_{scale,translate,rotate} functions. These can be implemented much more efficiently than doing the fully fledged matrix multiplication that fz_concat requires. We add fz_rect_{min,max} functions to return pointers to the min/max points of a rect. These can be used in transformations to directly manipulate values. With a little casting in the path transformation code we can avoid more needless copying. We rename fz_widget_bbox to the more consistent fz_bound_widget.
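	A minimal sketch of why a dedicated pre-scale can be cheaper than a general concatenation. The mat type and the mat_concat, mat_scale and mat_pre_scale functions are hypothetical stand-ins written for this illustration, not the library's own definitions; the row-vector convention (point times matrix) is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 2D affine matrix: [a b 0; c d 0; e f 1]. */
typedef struct { float a, b, c, d, e, f; } mat;

/* General product *dst = (*one) x (*two): twelve multiplies.
 * A temporary is used so dst may alias either input. */
mat *mat_concat(mat *dst, const mat *one, const mat *two)
{
	mat r;
	assert(dst != NULL && one != NULL && two != NULL);
	r.a = one->a * two->a + one->b * two->c;
	r.b = one->a * two->b + one->b * two->d;
	r.c = one->c * two->a + one->d * two->c;
	r.d = one->c * two->b + one->d * two->d;
	r.e = one->e * two->a + one->f * two->c + two->e;
	r.f = one->e * two->b + one->f * two->d + two->f;
	*dst = r;
	return dst;
}

/* Fill *m with a pure scaling matrix and return it for chaining. */
mat *mat_scale(mat *m, float sx, float sy)
{
	assert(m != NULL);
	m->a = sx; m->b = 0;
	m->c = 0;  m->d = sy;
	m->e = 0;  m->f = 0;
	return m;
}

/* In-place equivalent of scale(sx, sy) x (*m): four multiplies,
 * no temporary matrix, and the pointer is returned for chaining. */
mat *mat_pre_scale(mat *m, float sx, float sy)
{
	assert(m != NULL);
	m->a *= sx; m->b *= sx;
	m->c *= sy; m->d *= sy;
	return m;
}
```

	Because every function returns the pointer it modified, calls can be nested, e.g. mat_concat(&out, mat_scale(&tmp, 2, 3), &ctm), while all copies stay explicit in the caller.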
2013-01-31	Add support for annotation creation	Paul Gardiner

2013-01-30	Parts of Robin's PDF editing/page creation commit useful for annotations	Paul Gardiner

2013-01-30	Always pass value structs (rect, matrix, etc) as values not by pointer.	Tor Andersson

2013-01-04	Bug 693503: Fix stack overflows due to infinite recursion.	Robin Watts
	If a colorspace refers to itself as a base, we can get an infinite recursion and hence stack overflow. Thanks to zeniko for pointing out that this occurs in embedded CMAPs and stitching functions. Also solved here. To avoid having to keep a long list of the objects we've traversed through, extend the pdf_dict_mark functions to work on all pdf objects, and hence rename them as pdf_obj_mark etc. Thanks to zeniko again for feedback on this way of working. Problem found in a test file, 3882.pdf.SIGSEGV.99.3204 supplied by Mateusz "j00ru" Jurczyk and Gynvael Coldwind of the Google Security Team using Address Sanitizer. Many thanks!
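	The mark/unmark idea can be sketched as follows. The node type and chain_depth function are hypothetical and only illustrate the technique of marking an object while it is being traversed; they are not the pdf_obj_mark interface itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object that may name another object as its base. */
typedef struct node {
	struct node *base;
	int marked;	/* non-zero while this node is being traversed */
} node;

/* Count the nodes reachable through 'base' links, or return -1 if
 * the chain loops back onto a node that is still being traversed. */
int chain_depth(node *n)
{
	int d;
	if (n == NULL)
		return 0;
	if (n->marked)
		return -1;	/* seen on the way down: self reference */
	n->marked = 1;
	d = chain_depth(n->base);
	assert(n->marked == 1);	/* nothing below may clear our mark */
	n->marked = 0;		/* unmark on the way out */
	return d < 0 ? -1 : d + 1;
}
```

	Keeping the mark in the object itself avoids carrying a separate list of visited objects through the recursion.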
2012-10-25	Support separate rendering of the main page contents and the annotations	Paul Gardiner

2012-10-17	First steps towards supporting transitions.	Robin Watts
	Only Fade, Wipe and Blinds supported so far. Hit 'p' in the viewer to go into 'presentation' mode. Page swaps then transition from page to page. Pages auto advance until key or mouse is used.
2012-10-08	Bug 693350: Add some dreaded 'const's for string keys.	Robin Watts
	On the whole we avoid using const within MuPDF, but bug 693350 highlights cases where this can cause a problem with C++. In C, if you do: foo("bar"); then "bar" has type char *. In C++, if you do foo("bar"); then "bar" has type const char *. This means that any calls to the MuPDF library from C++ that take strings give warnings. The fix is simple, so it seems to be worthwhile adding a few consts. None of our internal data structures are affected in any way by this change. Thanks to Franz Fellner for pointing out this issue.
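	As an illustration only, a lookup that takes its string key as 'const char *' accepts a string literal from both C and C++ callers. The entry type and find_entry function are hypothetical and not part of the library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry keyed by a string. */
typedef struct { const char *key; int val; } entry;

/* The key is only read, so it is declared const. */
const entry *find_entry(const entry *tab, size_t n, const char *key)
{
	size_t i;
	assert(key != NULL);
	assert(tab != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (strcmp(tab[i].key, key) == 0)
			return &tab[i];
	return NULL;
}
```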
2012-09-06	Add pdf_dict_puts_drop function.	Robin Watts
	Does the same as pdf_dict_puts, but guarantees to always drop the value passed in (even if the function throws an error). This allows calling code to have a much simpler life in some cases. Update pdf_write to make use of this. Also, fix pdf_dict_puts so it doesn't leak the key object that it creates in the event of an error while growing the dictionary.
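	A sketch of the 'always drop' pattern, under stated assumptions: the obj and dict types and all function names below are hypothetical, and failure is reported with a return code rather than the library's error throwing, purely to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical reference-counted value. */
typedef struct { int refs; } obj;

obj *obj_new(void)
{
	obj *o = malloc(sizeof *o);
	if (o)
		o->refs = 1;
	return o;
}

obj *obj_keep(obj *o)
{
	if (o)
		o->refs++;
	return o;
}

void obj_drop(obj *o)
{
	if (o == NULL)
		return;
	assert(o->refs > 0);
	if (--o->refs == 0)
		free(o);
}

/* Hypothetical fixed-size container standing in for a dictionary. */
typedef struct { obj *slot[2]; int len; } dict;

/* Takes its own reference to val. Returns 0, or -1 if the dict is full. */
int dict_put(dict *d, obj *val)
{
	assert(d != NULL);
	if (d->len == 2)
		return -1;
	d->slot[d->len++] = obj_keep(val);
	return 0;
}

/* Same as dict_put, but the caller's reference is dropped
 * whether or not the put succeeded. */
int dict_put_drop(dict *d, obj *val)
{
	int err = dict_put(d, val);
	obj_drop(val);
	return err;
}
```

	With this shape a caller can write dict_put_drop(&d, obj_new()) without holding a temporary and without a separate cleanup path for the failure case.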
2012-09-04	Merge branch 'master' into forms	Paul Gardiner
	Conflicts: pdf/pdf_xref_aux.c
2012-08-29	Merge branch 'master' into forms	Paul Gardiner
	Conflicts: cbz/mucbz.c pdf/pdf_parse.c pdf/pdf_form.c xps/xps_zip.c
2012-08-28	Add fz_open_document_with_stream function.	Tor Andersson
	Use a "magic" string for filetype detection: filename or mime-type.
2012-08-23	Rename fz_new_name to pdf_new_name.	Robin Watts
	Should have been pdf_new_name ever since the pre 1.0 rename, but evidently we missed it.
2012-08-01	Merge branch 'master' into forms	Paul Gardiner
	Conflicts: pdf/mupdf-internal.h pdf/pdf_font.c
2012-07-18	Update pdf_to_utf8 to handle either a stream or a string	Paul Gardiner
	Also change first argument from fz_context to pdf_document in each of pdf_to_utf8, pdf_to_utf8_name, pdf_to_ucs2 and pdf_to_ucs2_name
2012-07-06	Remove debugging functions for release builds.	Sebastian Rasmussen

2012-06-20	Add better mechanism for enumerating annotation rectangles.	Robin Watts
	Rather than having a dedicated call to enumerate the rectangles for the annotations on a page, add an interface for enumerating annotations with accessor functions. Currently the only accessor function is the one to get the annotation rectangle. Use this new scheme in place of fz_bound_annots within mudraw. Also use this scheme to set the caret cursor in the viewer when over a data field.
2012-06-14	Add -j flag to mudraw; create simple mujstest scripts automatically.	Robin Watts
	We add a new fz_bound_annots function (and associated pdf_bound_annots function) that calls a given callback with the page rectangle of the annotations on a given page. This is marked as being a 'temporary' function, so we can remove it/change it in future if required. It seems likely that we'll want to have some sort of 'iterate over annotations' function eventually, and this does the job for now. Add a -j flag to mudraw that outputs a simple mujstest script. For each page with annotations, the script jumps to that page, then for each annotation on the page, it sets some text to be entered, and clicks the annotation. In the case of text fields, this will cause the text to be entered into that text field; in the case of buttons it will execute the button. At the end of each page with annotations, the script is told to snapshot the page. These test scripts are not designed to be full tests, but they do at least provide an easy way for us to generate scripts where every field in our test suite is interacted with.
2012-06-13	Remove unnecessary function and improve naming	Paul Gardiner

2012-06-12	A few general utility functions added for the sake of the forms work	Paul Gardiner

2012-05-31	Add linearization to pdf_write function.	Robin Watts
	Extend mupdfclean to have a new -l flag that writes the file linearized. This should still be considered experimental. When writing a pdf file, analyse object use, flatten resource use, reorder the objects, generate a hintstream and output with linearisation parameters. This is enough for Acrobat to accept the file as being optimised for Fast Web View. We ought to add more tables to the hintstream in some cases, but I doubt anyone actually uses it, the spec is so badly written. Certainly Acrobat accepts the file as being optimised for 'Fast Web View'. Update fz_dict_put to allow for us adding a reference to the dictionary that is the sole owner of that reference already (i.e. don't drop then keep something that has a reference count of just 1). Update pdf_load_image_stream to use the stm_buf from the xref if there is one. Update pdf_close_document to discard any stm_bufs it may be holding. Update fz_dict_put to be pdf_dict_put - this was missed in a renaming ages ago and has been inconsistent since.
2012-05-23	Bring xref object and stream mutation functions back from the dead.	Tor Andersson
	Needs more work to use the linked list of free xref slots.
2012-05-11	Split part of fz_document interface for pdf_document into separate file.	Tor Andersson
	Make a separate constructor function that does not link in the interpreter, so we can save space in the mubusy binary by not including the font and cmap resources.
2012-04-30	Simple mupdfposter app	Robin Watts
	Divides large format pdfs into a new pdf with multiple pages, that tile the original PDF.
2012-04-28	Move guts of pdfclean into new pdf_write function.	Robin Watts
	Expose pdf_write function through the document interface.
2012-03-13	Rename some functions and accessors to be more consistent.	Tor Andersson
	Debug printing functions: debug -> print. Accessors: get noun attribute -> noun attribute. Find -> lookup when the returned value is not reference counted. pixmap_with_rect -> pixmap_with_bbox. We are reserving the word "find" to mean lookups that give ownership of objects to the caller. Lookup is used in other places where the ownership is not transferred, or simple values are returned. The rename is done by the sed script in scripts/rename3.sed
2012-03-07	Splitting tweaks.	Tor Andersson

2012-03-06	Split fitz.h/mupdf.h into internal/external headers.	Robin Watts
	Attempt to separate public API from internal functions.
2012-02-26	Move fz_obj to be pdf_obj.	Robin Watts
	Currently, we are in the slightly strange position of having the PDF specific object types as part of fitz. Here we pull them out into the pdf layer instead. This has been made possible by the recent changes to make the store no longer be tied to having fz_obj's as keys. Most of this work is a simple huge rename; to help customers who may have code that uses such functions we have provided a sed script to do the renaming; scripts/rename2.sed. Various other small tweaks are required; the store used to have some debugging code that still required knowledge of fz_obj types - we extract that into a nicer 'type' based function pointer. Also, the type 3 font handling used to have an fz_obj pointer for type 3 resources, and therefore needed to know how to free this; this has become a void * with a function to free it.
2012-02-26	Continued documentation improvements.	Sebastian Rasmussen
	More changes still to come.
2012-02-26	Document the most commonly used interface functions.	Sebastian Rasmussen

2012-02-25	Revamp pdf lexing code	Robin Watts
	A huge amount (20%+ on some files) of our runtime is spent in fz_atof. A survey of results on the net suggests we will get much better speed by writing our own atof. Part of the job of doing this involves parsing the string to identify the component parts of the number - ludicrously, we are already doing this as part of the lexing process, so it would make sense to do the atoi/atof as part of this process. In order to do this, we need somewhere to store the lexed results; rather than add a float * and an int * to every single pdf_lex call, we generalise the calls to pass a pdf_lexbuf * pointer instead of separate buffer/max/string length pointers. This should help us overall.
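	A rough sketch of converting a number while it is being lexed, so no second pass over the text is needed. The token enum, the lexbuf struct and lex_number are hypothetical simplifications (no exponent, no overflow checks), not the pdf_lexbuf interface itself.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TOK_ERROR, TOK_INT, TOK_REAL } token;

/* Hypothetical buffer that carries the lexed value back to the caller. */
typedef struct { int i; float f; } lexbuf;

/* Lex an optionally signed decimal number from the NUL-terminated
 * string s. The value is accumulated during the same scan that
 * classifies the token. */
token lex_number(const char *s, lexbuf *buf)
{
	int neg = 0, ndigits = 0, is_real = 0;
	int whole = 0;
	double frac = 0, div = 1;

	assert(s != NULL && buf != NULL);
	if (*s == '-' || *s == '+')
		neg = (*s++ == '-');
	for (; *s >= '0' && *s <= '9'; s++, ndigits++)
		whole = whole * 10 + (*s - '0');
	if (*s == '.')
	{
		is_real = 1;
		for (s++; *s >= '0' && *s <= '9'; s++, ndigits++)
		{
			frac = frac * 10 + (*s - '0');
			div *= 10;
		}
	}
	/* Need at least one digit and nothing left over. */
	if (ndigits == 0 || *s != '\0')
		return TOK_ERROR;
	if (is_real)
	{
		double v = whole + frac / div;
		buf->f = (float)(neg ? -v : v);
		return TOK_REAL;
	}
	buf->i = neg ? -whole : whole;
	return TOK_INT;
}
```

	Passing one buffer pointer keeps the call signature small while still giving the lexer somewhere to store either an integer or a real result.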
2012-02-25	Rework image handling for on demand decode	Robin Watts
	Introduce a new 'fz_image' type; this type contains rudimentary information about images (such as native, size, colorspace etc) and a function to call to get a pixmap of that image (with a size hint). Instead of passing pixmaps through the device interface (and holding pixmaps in the display list) we now pass images instead. The rendering routines therefore call fz_image_to_pixmap to get pixmaps to render, and fz_pixmap_drop those afterwards. The file format handling routines therefore need to produce images rather than pixmaps; xps and cbz currently just wrap pixmaps as images. PDF is more involved. The stream handling routines in PDF have been altered so that they can recognise when the last stream entry in a filter dictionary is an image decoding filter. Rather than applying this filter, they read and store the parameters into a pdf_image_params structure, and stop decoding at that point. This allows us to read the compressed data for an image into memory as a block. We can then restart the image decode process later. pdf_images therefore consist of the compressed image data for images. When a pixmap is requested for such an image, the code checks to see if we have one (of an appropriate size), and if not, decodes it. The size hint is used to determine whether it is possible to subsample the image; currently this is only supported for JPEGs, but we could add generic subsampling code later. In order to handle caching the produced images, various changes have been made to the store and the underlying hash table. Previously the store was indexed purely by fz_obj keys; we don't have an fz_obj key any more, so have extended the store by adding a concept of a key 'type'. A key type is a pointer to a set of functions that keep/drop/compare and make a hashable key from a key pointer. 
	We make a pdf_store.c file that contains functions to offer the existing fz_obj based functions, and add a new 'type' for keys (based on the fz_image handle, and the subsample factor) in the pdf_image.c file. While working on this, a problem became apparent in the existing store code; fz_obj objects had no protection on their reference counts, hence an interpreter thread could try to alter a ref count at the same time as a malloc caused an eviction from the store. This has been solved by using the alloc lock as protection. This in turn requires some tweaks to the code to make sure we don't try and keep/drop fz_obj's from the store code while the alloc lock is held. A side effect of this work is that when a hash table is created, we inform it what lock should be used to protect its innards (if any). If the alloc lock is used, the insert method knows to drop/retake it to allow it to safely expand the hash table. Callers to the hash functions have the responsibility of taking/dropping the appropriate lock, and ensuring that they cope with the possibility that insert might drop the alloc lock, causing race conditions.
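	The notion of a key 'type' as a set of function pointers can be sketched like this. All names here (key_type, image_key, store, store_insert, store_find) are hypothetical; the sketch leaves out keep/drop, eviction and locking, and uses a tiny fixed-size table with linear probing purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* A key type: the functions the store needs to treat a key opaquely. */
typedef struct key_type {
	unsigned (*hash)(const void *key);
	int (*same)(const void *a, const void *b);
} key_type;

/* Example key: an image identity plus a subsample factor. */
typedef struct { int image_id; int factor; } image_key;

unsigned image_key_hash(const void *k)
{
	const image_key *ik = k;
	return (unsigned)ik->image_id * 31u + (unsigned)ik->factor;
}

int image_key_same(const void *a, const void *b)
{
	const image_key *x = a, *y = b;
	return x->image_id == y->image_id && x->factor == y->factor;
}

const key_type image_key_type = { image_key_hash, image_key_same };

#define STORE_SLOTS 8

typedef struct { const key_type *type; const void *key; void *val; } slot;
typedef struct { slot slots[STORE_SLOTS]; } store;

/* Insert into the first free slot on the probe path. Returns 1, or 0 if full. */
int store_insert(store *s, const key_type *t, const void *key, void *val)
{
	unsigned i, h;
	assert(s != NULL && t != NULL && key != NULL);
	h = t->hash(key) % STORE_SLOTS;
	for (i = 0; i < STORE_SLOTS; i++)
	{
		slot *sl = &s->slots[(h + i) % STORE_SLOTS];
		if (sl->type == NULL)
		{
			sl->type = t;
			sl->key = key;
			sl->val = val;
			return 1;
		}
	}
	return 0;
}

/* Keys only match when they share a type and the type says they are equal. */
void *store_find(const store *s, const key_type *t, const void *key)
{
	unsigned i, h;
	assert(s != NULL && t != NULL && key != NULL);
	h = t->hash(key) % STORE_SLOTS;
	for (i = 0; i < STORE_SLOTS; i++)
	{
		const slot *sl = &s->slots[(h + i) % STORE_SLOTS];
		if (sl->type == NULL)
			return NULL;
		if (sl->type == t && t->same(sl->key, key))
			return sl->val;
	}
	return NULL;
}
```

	Because the store only ever calls through the key type, a second kind of key can be added by defining another key_type value without touching the store code.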
2012-02-06	Pass context to cmap and font descriptor functions.	Tor Andersson

2012-02-03	Add document interface.	Tor Andersson

2012-02-01	Tweak to previous pdf_decode_cmap fix.	Robin Watts
	More aesthetically pleasing version.
2012-01-31	Fix bug 692824: incorrect application of word space.	Robin Watts
	Word space should only be applied when the codepoint is 32, and is read from a single byte encoding region. Ghostscript gets this wrong too.
2012-01-27	Rename pdf_xref type to pdf_document.	Tor Andersson

2012-01-19	Transform link rectangles by the hidden page CTM.	Tor Andersson

2012-01-19	Remove confusing optional 'password' argument to pdf_open_xref.	Tor Andersson
	Require that clients call pdf_needs_password/pdf_authenticate_password instead. For dumb clients, we still allow for decrypting a file with a blank password without calling those functions.
2012-01-19	Multi-threading support for MuPDF	Robin Watts
	When we moved over to a context based system, we laid the foundation for a thread-safe mupdf. This commit should complete that process. Firstly, fz_clone_context is properly implemented so that it makes a new context, but shares certain sections (currently just the allocator, and the store). Secondly, we add locking (to parts of the code that have previously just had placeholder LOCK/UNLOCK comments). Functions to lock and unlock a mutex are added to the allocator structure; omit these (as is the case today) and no multithreading is (safely) possible. The context will refuse to clone if these are not provided. Finally we flesh out the LOCK/UNLOCK comments to be real calls of the functions - unfortunately this requires us to plumb fz_context into the fz_keep_storable function (and all the fz_keep_xxx functions that call it). This is the largest section of the patch. No changes expected to any test files.
2012-01-18	Better handling of 'uncacheable' Type3 glyphs. Bug 692745.	Robin Watts
	Some Type 3 fonts contain glyphs that rely on inheriting various aspects of the graphics state from their calling code. (i.e. a glyph might use d0, then fill an area without setting a color first). While the spec is vague on this point, we believe that technically it is invalid. Previously mupdf defaulted all elements of the graphics state back when beginning to draw the glyph. This does not match what Acrobat does though, so we change the approach taken. We now watch (by use of bits in the device flags word) for the use of parts of the graphics state before it is set. If such use is detected, then we note that the glyph is 'uncacheable' and render it direct. This seems to match Acrobat's behaviour.
2012-01-13	Avoid infinite loops with XObjects.	Robin Watts
	Every xobject keeps a reference to the object from whence it came. This is marked/unmarked as it is executed. Thanks to Zeniko for spotting the potential problem.
2012-01-12	Use the same coordinate system for pdf and xps pages in the interface.	Tor Andersson
	Move coordinate space tweaks into pdf_ and xps_run_page, and provide neutral pdf_ and xps_bound_page functions to return the page size as a zero-origined bounding box.
2012-01-10	Automatically load page tree when accessing a page/page count.	Sebastian Rasmussen

2012-01-04	Bug 692739: Add ability to abort time consuming actions	Robin Watts
	A new 'cookie' parameter is added to page rendering/interpretation functions. Supply this as NULL to get existing behaviour. If you supply a non-NULL cookie, then this is taken as a pointer to a struct that can be used for simple, non-thread locked communication between caller and library. The entire struct should be memset to zero before entry, except for specific flags (thus coping with future extensions to this struct). The abort flag should be zero on entry. It will be checked periodically by the library - if the caller sets it non-zero (via another thread) then the current operation will be aborted. No guarantees are given as to how often this will be checked, or how fast it will be responded to. The progress_max field will be set to an integer (-1 for unknown) representing the number of 'things' to do. The progress field will count up from 0 to this number as time goes by. No guarantees are made as to the accuracy of this information, but it should be useful for offering some sort of progress bar etc. Note that progress_max may increase during the job. In general, callers should be careful to accept out of range or invalid data in this structure as this is deliberately accessed 'unlocked'.
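	A single-threaded sketch of how a worker might honour such a cookie. The cookie struct layout and the run_steps function are hypothetical; only the field names abort, progress and progress_max come from the description above. A flag that is really written from another thread would need volatile or atomic access in portable C, which this sketch deliberately omits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cookie shared, unlocked, between caller and worker. */
typedef struct {
	int abort;		/* caller sets non-zero to request a stop */
	int progress;		/* worker counts up from 0 */
	int progress_max;	/* worker's estimate of total steps, -1 if unknown */
} cookie;

/* Perform up to nsteps units of work and return how many were done.
 * A NULL cookie gives the plain, uninterruptible behaviour. */
int run_steps(int nsteps, cookie *ck)
{
	int i;
	assert(nsteps >= 0);
	if (ck)
	{
		ck->progress = 0;
		ck->progress_max = nsteps;
	}
	for (i = 0; i < nsteps; i++)
	{
		if (ck && ck->abort)
			break;	/* checked once per step in this sketch */
		/* ... one unit of work would go here ... */
		if (ck)
			ck->progress = i + 1;
	}
	return i;
}
```

	A caller would zero the whole struct before the call, then read progress and progress_max to drive a progress bar, treating both values as advisory.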
2011-12-28	Outline/link destination tweaks.	Robin Watts
	Move 'kind' into the fz_link_dest structure (as this makes more sense). Put an fz_link_dest rather than just a page number into the outlines structure. Correct parsing of actions and dests from pdf outlines.