mupdf - MuPDF PDF reader and library

Age	Commit message	Author
2017-12-13	Add 'clean' option to pdfclean to clean (but not sanitize) content streams.	Tor Andersson
	This goes well with the 'mutool clean -d' decompression option to debug content streams, without doing the sanitize optimization pass.
2017-11-06	Expose text filtering through pdf_clean interface.	Robin Watts

2017-11-06	Use text state handling in pdf_filter_processor to filter text.	Robin Watts

2017-09-07	Use dict_put_drop/array_push_drop wherever possible.	Sebastian Rasmussen

2017-06-03	Add documentation for pdf_processors.	Robin Watts
	Expose pdf_new_output_processor. Remove pdf_document argument to pdf_new_filter_processor. It is only ever used when copying resources from the old resource dictionary to the new one, whereupon it must agree with the bound pdf_document in the old resource dictionary.
2017-04-27	Include required system headers.	Tor Andersson

2017-01-09	Add missing pdf_close_processor calls.	Tor Andersson

2016-07-06	Start slimming pdf_page.	Tor Andersson
	We want to turn pdf_page into a thin wrapper around a pdf_obj, so that any updates to the underlying PDF objects will be reflected without having to reload the pdf_page.
2016-05-06	Mutool clean: Fix sanitisation of pages with Content arrays.	Robin Watts
	If the Contents of a page are an array, we were forgetting to write the new singleton replacement into the dictionary.
2016-04-27	Fix 696649: remove fz_rethrow_message calls.	Tor Andersson

2016-04-26	Update mutool clean sanitize to clean annotations too.	Robin Watts

2016-03-01	Rename pdf_new_ref to pdf_add_object.	Tor Andersson

2016-02-09	Fix 696552: Double free error in mutool clean -s.	Tor Andersson
	Also fix a memory leak.
2015-04-16	ASCIIHexEncode inline images during sanitization if do_ascii is set.	Tor Andersson

2015-03-24	Rework handling of PDF names for speed and memory.	Robin Watts
	Currently, every PDF name is allocated in a pdf_obj structure, and comparisons are done using strcmp. Given that we can predict most of the PDF names we'll use in a given file, this seems wasteful. The pdf_obj type is opaque outside the pdf-object.c file, so we can abuse it slightly without anyone outside knowing.
	We collect a sorted list of names used in PDF (resources/pdf/names.txt), and we add a utility (namedump) that preprocesses this into 2 header files. The first (include/mupdf/pdf/pdf-names-table.h, included as part of include/mupdf/pdf/object.h) defines a set of "PDF_NAME_xxxx" entries. These are pdf_obj's that callers can use to mean "A PDF object that means literal name 'xxxx'". The second (source/pdf/pdf-name-impl.h) is a C array of names.
	We therefore update the code so that rather than passing "xxxx" to functions (such as pdf_dict_gets(...)) we now pass PDF_NAME_xxxx (to pdf_dict_get(...)). This is a fairly natural (if widespread) change. The pdf_dict_getp (and sibling) functions that take a path (e.g. "foo/bar/baz") are therefore supplemented with equivalents that take a list (pdf_dict_getl(... , PDF_NAME_foo, PDF_NAME_bar, PDF_NAME_baz, NULL)).
	The actual implementation of this relies on the fact that small pointer values are never valid values. For a given pdf_obj p, if NULL < (intptr_t)p < PDF_NAME__LIMIT then p is a literal entry in the name table. This enables us to do fast pointer compares and to skip expensive strcmps.
	Also, bring "null", "true" and "false" into the same style as PDF names. Rather than using full pdf_obj structures for null/true/false, use special pointer values just above the PDF_NAME_ table. This saves memory and makes comparisons easier.
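	The small-pointer trick described in this commit can be illustrated with a minimal sketch. It is not MuPDF's code: every identifier below (demo_obj, DEMO_NAME, demo_new_name and so on) is hypothetical, the table holds only three names, and it assumes a platform where small integers converted to pointers never collide with real allocations.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an opaque object type; not MuPDF's pdf_obj. */
typedef struct demo_obj { char *text; } demo_obj;

/* Index 0 is reserved so that NULL is never mistaken for a name. */
enum { DEMO_NAME_UNUSED = 0, DEMO_NAME_Contents, DEMO_NAME_Length,
       DEMO_NAME_Type, DEMO_NAME_COUNT };

/* The name table: entry i holds the text for the pointer value i. */
static const char *const demo_name_table[DEMO_NAME_COUNT] = {
    "", "Contents", "Length", "Type"
};

/* A "name object" for a known name is just its table index cast to a pointer. */
#define DEMO_NAME(n) ((demo_obj *)(uintptr_t)DEMO_NAME_##n)

/* True if p is a small sentinel value rather than a real allocation. */
static int demo_is_table_name(const demo_obj *p)
{
    uintptr_t v = (uintptr_t)p;
    return v > 0 && v < (uintptr_t)DEMO_NAME_COUNT;
}

/* Known names map to sentinels; unknown names fall back to the heap. */
static demo_obj *demo_new_name(const char *s)
{
    uintptr_t i;
    size_t n = strlen(s) + 1;
    demo_obj *obj;

    for (i = 1; i < (uintptr_t)DEMO_NAME_COUNT; i++)
        if (strcmp(s, demo_name_table[i]) == 0)
            return (demo_obj *)i;

    obj = malloc(sizeof *obj);
    assert(obj != NULL); /* a sketch: real code would report the failure */
    obj->text = malloc(n);
    assert(obj->text != NULL);
    memcpy(obj->text, s, n);
    return obj;
}

static const char *demo_to_name(const demo_obj *p)
{
    if (p == NULL)
        return "";
    if (demo_is_table_name(p))
        return demo_name_table[(uintptr_t)p];
    return p->text;
}

/* Pointer compare first; strcmp only when both names live on the heap. */
static int demo_name_eq(const demo_obj *a, const demo_obj *b)
{
    if (a == b)
        return 1;
    if (a == NULL || b == NULL)
        return 0;
    if (demo_is_table_name(a) || demo_is_table_name(b))
        return 0; /* a known name is always created as its sentinel */
    return strcmp(a->text, b->text) == 0;
}

/* Sentinels own no memory, so only heap names are freed. */
static void demo_drop_name(demo_obj *p)
{
    if (p == NULL || demo_is_table_name(p))
        return;
    free(p->text);
    free(p);
}
```

	With this sketch, demo_new_name("Length") returns the same value as DEMO_NAME(Length), so comparing against a known name is a single pointer compare, while an unlisted name such as "Custom" is still handled through the heap path.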
2015-03-24	Don't pass interpreter context to pdf_processor opcode callbacks.	Tor Andersson
	Update buffer and filter processors. Filter both colors and stroke states. Move OCG hiding logic into interpreter.
2015-03-23	Pass context to pdf_page_contents_process callback.	Tor Andersson

2015-03-20	Automatically update /Length and /Filter in pdf_update_stream.	Tor Andersson

2015-02-25	Add post processing option to page operator cleaning.	Robin Watts
	In order to be able to watermark etc, we want the ability to add more operators/resources after page cleaning. Add a post processing hook to enable this to be done more easily.
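	The idea of a post-processing hook can be sketched as a callback that runs after cleaning and may append further operators. This is only an illustration under stated assumptions: demo_buffer, demo_clean_page, demo_post_fn and demo_watermark are hypothetical names, not the MuPDF interface, and the "cleaning" step is a stand-in that merely wraps the content in q/Q.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size output buffer; always NUL-terminated. */
typedef struct { char data[256]; size_t len; } demo_buffer;

/* Hypothetical hook type: called once after cleaning, with caller data. */
typedef void (demo_post_fn)(demo_buffer *buf, void *arg);

/* Append s, truncating silently if the buffer is full. */
static void demo_append(demo_buffer *buf, const char *s)
{
    size_t n = strlen(s);
    assert(buf != NULL);
    if (buf->len + n >= sizeof buf->data)
        n = sizeof buf->data - 1 - buf->len;
    memcpy(buf->data + buf->len, s, n);
    buf->len += n;
    buf->data[buf->len] = '\0';
}

/* Stand-in for page cleaning: wrap the content in q/Q, then run the hook. */
static void demo_clean_page(demo_buffer *out, const char *content,
                            demo_post_fn *post, void *arg)
{
    out->len = 0;
    out->data[0] = '\0';
    demo_append(out, "q\n");
    demo_append(out, content);
    demo_append(out, "\nQ\n");
    if (post != NULL)
        post(out, arg); /* extra operators land after the cleaned stream */
}

/* Example hook: append a text-showing sequence for a watermark string. */
static void demo_watermark(demo_buffer *buf, void *arg)
{
    demo_append(buf, "BT /F1 12 Tf (");
    demo_append(buf, (const char *)arg);
    demo_append(buf, ") Tj ET\n");
}
```

	Calling demo_clean_page(&buf, "1 0 0 rg", demo_watermark, "DRAFT") leaves the wrapped content followed by the watermark operators in buf.data; passing NULL as the hook leaves only the wrapped content.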
2015-02-17	Add ctx parameter and remove embedded contexts for API regularity.	Tor Andersson
	Purge several embedded contexts:
	Remove embedded context in fz_output.
	Remove embedded context in fz_stream.
	Remove embedded context in fz_device.
	Remove fz_rebind_stream (since it is no longer necessary).
	Remove embedded context in svg_device.
	Remove embedded context in XML parser.
	Add ctx argument to fz_document functions.
	Remove embedded context in fz_document.
	Remove embedded context in pdf_document.
	Remove embedded context in pdf_obj.
	Make fz_page independent of fz_document in the interface. We shouldn't need to pass the document to all functions handling a page. If a page is tied to the source document, it's redundant; otherwise it's just pointless.
	Fix reference counting oddity in fz_new_image_from_pixmap.
2014-03-19	Add routine to clean pdf content streams for pages.	Robin Watts
	New routine to filter the content streams for pages, xobjects, type3 charprocs, patterns etc. The filtered streams are guaranteed to be properly matched with q/Q's, and to not have changed the top level ctm. Additionally we remove (some) repeated settings of colors etc. This filtering can be extended to be smarter later. The idea of this is to both repair after editing, and to leave the streams in a form that can be easily appended to. This is preparatory to work on Bates numbering and Watermarking. Currently the streams produced are uncompressed.
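	The q/Q matching guarantee can be illustrated with a much simplified sketch. It assumes the stream is a plain sequence of whitespace-separated tokens with no string literals or inline images, and it chooses to drop an unmatched Q and to append any missing Q at the end. The names demo_emit and demo_balance_q are hypothetical; this is not the filter MuPDF implements.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append one token to out, separated by a space; returns 0 if it won't fit. */
static int demo_emit(char *out, size_t outsize, size_t *len,
                     const char *tok, size_t n)
{
    size_t need = n + (*len > 0 ? 1 : 0);
    if (*len + need + 1 > outsize)
        return 0;
    if (*len > 0)
        out[(*len)++] = ' ';
    memcpy(out + *len, tok, n);
    *len += n;
    out[*len] = '\0';
    return 1;
}

/*
 * Copy operators from in to out so that every q has a matching Q.
 * A Q with no open q is dropped; any q still open at the end is closed.
 * Returns 1 on success, 0 if out is too small.
 */
static int demo_balance_q(const char *in, char *out, size_t outsize)
{
    size_t len = 0;
    int depth = 0;

    if (outsize == 0)
        return 0;
    out[0] = '\0';

    for (;;) {
        size_t n;
        in += strspn(in, " \t\r\n");  /* skip whitespace */
        n = strcspn(in, " \t\r\n");   /* length of the next token */
        if (n == 0)
            break;
        if (n == 1 && *in == 'q') {
            depth++;
        } else if (n == 1 && *in == 'Q') {
            if (depth == 0) {         /* unmatched Q: skip it */
                in += n;
                continue;
            }
            depth--;
        }
        if (!demo_emit(out, outsize, &len, in, n))
            return 0;
        in += n;
    }

    while (depth-- > 0)               /* close anything left open */
        if (!demo_emit(out, outsize, &len, "Q", 1))
            return 0;
    return 1;
}
```

	Because the output always ends at nesting depth zero, further operators can be appended to it without inheriting graphics state left open by the original stream, which is the property the commit relies on for later watermarking work.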