mupdf - MuPDF PDF reader and library

Age	Commit message (Collapse)	Author
2014-05-10	Fix 694698: Support 32-bit values in CMaps.	Tor Andersson
	Increasing the existing data structure to 32-bit values would bloat the data tables too much. Simplify the data structure and use three separate range tables for lookups -- one with small 16-bit to 16-bit range lookups, one with 32-bit range lookups, and a final one for one-to-many lookups. This loses the range-to-table optimization we had before, but even with the extra ranges this necessitates, the total size of the compiled binary CMap data is smaller than if we were to extend the previous scheme to 32 bits.
2014-04-11	Add all form field flags. Check flags before marking fields dirty.	Tor Andersson
	NoExport (and ReadOnly) fields shouldn't mark the document for saving.
2014-04-02	Bump the version number to 1.4.	Tor Andersson

2014-03-25	Add va_copy/va_copy_end macros to support both C89 and C99.	Tor Andersson

2014-03-25	Break dependencies on pdf-form.c and pdf-js.c	Tor Andersson
	Split functions out of pdf-form.c that shouldn't be there, and make javascript initialization explicit.
2014-03-25	Break dependency of pdf-annot.c to graphics library.	Tor Andersson

2014-03-19	Add routine to clean pdf content streams for pages.	Robin Watts
	New routine to filter the content streams for pages, xobjects, type3 charprocs, patterns etc. The filtered streams are guaranteed to be properly matched with q/Q's, and to not have changed the top level ctm. Additionally we remove (some) repeated settings of colors etc. This filtering can be extended to be smarter later. The idea of this is to both repair after editing, and to leave the streams in a form that can be easily appended to. This is preparatory to work on Bates numbering and Watermarking. Currently the streams produced are uncompressed.
2014-03-19	Add %q and %( formatting to fz_printf to print escaped strings.	Tor Andersson
	%q escapes using C syntax and wraps the string in double quotes. %( escapes using PS/PDF syntax and wraps the string in parens.
2014-03-19	Implement our own vsnprintf variant.	Tor Andersson
	The primary motivator for this is so that we can print floating point values and get the full accuracy out, without having to print 1.5 as 1.5000000, and without getting 23e24 etc. We only support %c, %f, %d, %o, %x and %s currently. We only support the zero padding qualifier, for integers. We do support some extensions: %C turns values >=128 into UTF-8. %M prints a fz_matrix. %R prints a fz_rect. %P prints a fz_point. We also implement a fprintf variant on top of this to allow for consistent results when using fz_output. a
2014-03-18	Fix operator buffering of inline images.	Robin Watts
	Previously pdf_process buffer did not understand inline images. In order to make this work without needlessly duplicating complex code from within pdf-op-run, the parsing of inline images has been moved to happen in pdf-interpret.c. When the op_table entry for BI is called it now expects the inline image to be in csi->img and the dictionary object to be in csi->obj. To make this work, we have had to improve the handling of inline images in general. While non-inline images have been loaded and held in memory in their compressed form and only decoded when required, until now we have always loaded and decoded inline images immediately. This has been due to the difficulty in knowing how many bytes of data to read from the stream - we know the length of the stream once uncompressed, but relating this to the compressed length is hard. To cure this we introduce a new type of filter stream, a 'leecher'. We insert a leecher stream before we build the filters required to decode the image. We then read and discard the appropriate number of uncompressed bytes from the filters. This pulls the compressed data through the leecher stream, which stores it in an fz_buffer. Thus images are now always held in their compressed forms in memory. The pdf-op-run implementation is now trivial. The only real complexity in the pdf-op-buffer implementation is the need to ensure that the /Filter entry in the dictionary object matches the exact point at which we backstopped the decompression.
2014-03-17	Rework fz_streams.	Robin Watts
	Currently fz_streams have a 4K buffer within their header. The call to read from a stream fills this buffer, resulting in more data being pulled from any underlying stream than we might like. This causes problems with the forthcoming 'leech' filter. Here we simplify the fields available in the public stream header. No specific buffer is given; simply the read and write pointers. The underlying 'read' function is replaced by a 'next' function that makes the next block of data available and returns the first character of it (or EOF). A caller to the 'next' function should supply the maximum number of bytes that it knows it will need (possibly not now, but eventually). This enables the underlying stream to efficiently decode just enough. The underlying stream is free to return fewer, or a greater number if it wants to. The exact size of the 'block' of data returned will depend on the filter in use and (possibly) the data therein. Callers can get the currently available amount of data by calling fz_available (but again should pass the maximum amount of data they know they will need). The only time this will ever return 0 is if we have hit EOF.
2014-03-13	Make pdf_output_obj consistent with pdf_fprint_obj	Robin Watts
	Pass in the 'tight' flag.
2014-03-04	Bug 691691: Add way of clearing cached objects out of the xref.	Robin Watts
	We add various facilities here, intended to allow us to efficiently minimise the memory we use for holding cached pdf objects. Firstly, we add the ability to 'mark' all the currently loaded objects. Next we add the ability to 'clear the xref' - to drop all the currently loaded objects that have no other references except the ones held by the xref table itself. Finally, we add the ability to 'clear the xref to the last mark' - to drop all the currently loaded objects that have been created since the last 'mark' operation and have no other references except the ones held by the xref table. We expose this to the user by adding a new device hint 'FZ_NO_CACHE'. If set on the device, then the PDF interpreter will pdf_mark_xref before starting and pdf_clear_xref_to_mark afterwards. Thus no additional objects will be retained in memory after a given page is run, unless someone else picks them up and takes a reference to them as part of the run. We amend our simple example app to set this device hint when loading pages as part of a search.
2014-02-25	Bug 694851: pass more information to fz_load_system_font	Simon Bünzli
	The following changes allow font providers to make better choices WRT what font to provide and under what circumstances: * bold and italic flags are passed in so that implementors can decide themselves whether to ask for simulated boldening/italicising if a font claims not to be bold/italic * is_substitute is replaced with needs_exact_metrics to make the meaning of this argument hopefully clearer (that argument is set only for PDF fonts without a FontDescriptor) * the font name is always passed as requested by the document instead of the cleaned name for the standard 14 fonts which allows distinguishing e.g. Symbol and Symbol,Bold
2014-02-25	Support text (aka sticky note) annotations	Paul Gardiner

2014-02-17	Simplify shade vertex preparation and remove redundant memcpy calls.	Tor Andersson

2014-02-17	Add fz_transform_point_xy to simplify transforming a point.	Tor Andersson
	Many times, the idiom p.x = x; p.y = y; fz_transform_point() is used. This function should simplify that use case by both initializing and transforming the point in one call.
2014-02-17	Add const to colorspace source arguments and dependencies.	Tor Andersson

2014-02-14	Add function for creating form fields (widgets)	Paul Gardiner
	This feature is being implemented mostly for the purpose of permitting the addition to a page of invisible signatures. Also change pdf_create_annot to make freshly created annotations printable by default.
2014-02-14	pdf-js: Pass a name string to type constructor.	Tor Andersson

2014-02-10	Bug 695022: Add TIFF format handler	Robin Watts
	Patch from Thomas Fach-Pedersen. Many thanks! Add a new format handler that copes with TIFF files. This replaces the TIFF functionality within the image format handler, and is better because this copes with multiple images (as one image per page).
2014-02-10	Bug 695022: Add support for multuple image tiff files.	Robin Watts
	Patch from Thomas Fach-Pedersen. Many Thanks.
2014-02-10	Add pdf_is_number.	Robin Watts
	Useful utility missing from our arsenal.
2014-02-10	Add pdf_output_obj function.	Robin Watts
	Reuses the same internals as pdf_fprintf_obj etc.
2014-01-17	Bug 694896: Ensure that repairs don't lose trailer dict.	Robin Watts
	When we find certain classes of flaw in the file while attempting to read an object, we trigger an automatic repair of the file. This leaves almost all objects unchanged; the sole exception is that of the trailer object (and its sub objects) which can get dropped and recreated. To avoid leaving people holding handles to objects within the trailer dict high and dry, we introduce a 'pre_repair_trailer' object to each xref entry. On a repair, we copy the existing trailer object to this. As we only ever repair once, this is safe. The only known place where this is a problem is when setting up the pdf_crypt for a document; we adapt the code here to allow for potential problems. The example file that shows this up is: 048d14d2f5f0ae31e9a2cde0be66f16a_asan_heap-uaf_86d4ed_3961_3661.pdf Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security Team for providing the fuzzing files.
2014-01-13	Bug 694851: enhance fz_load_system_font	Simon Bünzli
	For SumatraPDF, the following changes are required: * fz_load_system_font is called from pdf_load_builtin_font as well so that Arial, Courier New, etc. can be loaded from the system instead of their Nimbus replacements. In order to distinguish between calls from pdf_load_builtin_font and pdf_load_substitute_font, an is_substitute argument is added. * fz_load_system_cjk_font is added and called from pdf_load_substitute_cjk_font so that a better replacement font can be loaded instead of DroidSansFallback. * Both fz_load_system_font and fz_load_system_cjk_font return fz_font* instead of fz_buffer* so that implementers aren't required to load fonts into memory (SumatraPDF uses fz_new_font_from_file for system fonts). In addition to that, fz_load_system_font_func is renamed to fz_load_system_font_funcs since it now accepts two functions, and the PDF_ROS_* constants are renamed to FZ_ADOBE_* (collection names aren't passed as const char* so that implementers know which collections to expect). For convenience, fz_load_*_font also never throws since currently all callers have further fallbacks available.
2014-01-07	Introduce 'document handlers'.	Robin Watts
	We define a document handler for each file type (2 in the case of PDF, one to handle files with the ability to 'run' them, and one without). We then register these handlers with the context at startup, and then call fz_open_document... as usual. This enables people to select the document types they want at will (and even to extend the library with more document types should they wish).
2014-01-06	reuse JBIG2Globals	Simon Bünzli
	Certain optimized documents use a rather large common symbol dictionary for all JBIG2 images. Caching these JBIG2Globals speeds up loading and rendering of such documents.
2014-01-06	add stub files for JPEG-XR support	Simon Bünzli
	See SumatraPDF's repo for a Windows-only implementation using WIC.
2014-01-06	fix MSVC warnings C4054 and C4152	Simon Bünzli
	These warnings are caused by casting function pointers to void* instead of proper function types.
2014-01-06	fix various MSVC warnings	Simon Bünzli
	Some warnings we'd like to enable for MuPDF and still be able to compile it with warnings as errors using MSVC (2008 to 2013): * C4115: 'timeval' : named type definition in parentheses * C4204: nonstandard extension used : non-constant aggregate initializer * C4295: 'hex' : array is too small to include a terminating null character * C4389: '==' : signed/unsigned mismatch * C4702: unreachable code * C4706: assignment within conditional expression Also, globally disable C4701 which is frequently caused by MSVC not being able to correctly figure out fz_try/fz_catch code flow. And don't define isnan for VS2013 and later where that's no longer needed.
2014-01-02	Add rebinding for fz_devices and fz_documents	Robin Watts
	The SVG device needs rebinding as it holds a file. The PDF device needs to rebind the underlying pdf document. All documents need to rebind their underlying streams.
2014-01-02	Add rebinding for fz_streams.	Robin Watts

2014-01-02	Add rebinding for fz_output.	Robin Watts

2014-01-02	Bug 694585: Further improve mesh rendering times	Robin Watts
	Add a cached color converter mechanism. Use this for rendering meshes to speed repeated conversions. This reduces a (release build to ppm at default resolution) run from 23.5s to 13.2 seconds.
2014-01-02	Bug 694585: Slow rendering of meshes	Robin Watts
	In the existing code for meshes, we decompose the mesh down into quads (or triangles) and then call a process routine to actually do the work. This process routine typically maps each vertexes position/color and plots it. As each vertex is used several times by neighbouring patches, this results in each vertex being processed several times. The fix in this commit is therefore to break the processing into 'prepare' and 'process' phases. Each vertex is 'prepared' before being used in the 'process' phase. This cuts the number of prepare operations in half. In testing, this reduced the time for a (release build, generating ppm at default resolution) run from 33.4s to 23.5s.
2014-01-02	Improve PDF repair logic.	Robin Watts
	When we meet a broken PDF file, we attempt to repair it. We do this by reading tokens from the file and attempting to interpret them as a normal PDF stream. Unfortunately, if the file is corrupt enough so that we start to read from the middle of a stream, and we happen to hit an '(' character, we can go into string reading mode. We can then end up skipping over vast swathes of file that we could otherwise repair. We fix this here by using a new version of the pdf_lex function that refuses to ever return a string. This means we may take more time over skipping things than we did before, but are less likely to skip stuff. We also tweak other parts of the pdf repair logic here. If we hit a badly formed piece of data, clear the num/gen we have stored so that the next plausible piece we get does not get assigned to a random object number.
2014-01-02	Cull code unused as a result of the "tolerate inline images..." fix.	Robin Watts
	Remove code that's not used any more as a result of the previous fix, plus some code that was unused anyway.
2013-12-24	Bug 694810: Implement late file repair for PDFs.	Robin Watts
	Currently, if we spot a bad xref as we are reading a PDF in, we can repair that PDF by doing a long exhaustive read of the file. This reconstructs the information that was in the xref, and the file can be opened (and later saved) as normal. If we hit an object that is not in the expected place however, we cannot trigger a repair at that point - so xrefs with duff offsets in (within the bounds of the file) will never be repaired. This commit solves that by triggering a repair (just once) whenever we fail to parse an object in the expected place.
2013-12-17	Remove duplicated XPS definitions from header.	Robin Watts

2013-12-17	Remove fz_context from pdf_crypt	Robin Watts
	Unused field. Also tweak some comments for clarity.
2013-11-28	Bug 694127: Valgrind fix for pdf_decode_cmap	Robin Watts
	A poorly formed string can cause us to overrun the end of the buffer. Now we check the end of the string at each stage to avoid this.
2013-11-27	track font path in fz_font	Simon Bünzli
	ft_file was removed in a2c945506ea2a2b58edbde84124094c6b4f69eac even though it might still be needed by downstream consumers (such as SumatraPDF) for allowing devices to load fonts again when a font has been loaded by fz_new_font_from_file which doesn't maintain a buffer.
2013-11-26	Add const keyword to some font function parameters.	Tor Andersson

2013-11-26	Add fz_advance_glyph and fz_encode_character functions.	Tor Andersson

2013-11-11	Add hooks to load system fonts. Use them in PDF interpreter.	Tor Andersson

2013-11-11	Add fz_new_font_from_buffer function.	Tor Andersson
	Use fz_buffer to wrap and reference count data used in font.
2013-11-08	Use an end pointer for the annotation list to avoid unnecessary iteration	Paul Gardiner

2013-11-05	Allow stroke states to be kept on the stack.	Tor Andersson
	Add a function to clone stroke states, a magic number to keep in the reference count to signal that a stroke state is stack-stored, and automatically clone stack stored stroke states in the keep function. Use fz_default_stroke_state to initialise stack stored stroke states.
2013-11-05	Add binary search tree for mapping strings to void* pointers.	Tor Andersson
	Self balancing AA-tree.