mupdf - MuPDF PDF reader and library

Age	Commit message	Author
2018-02-06	Include limits.h where INT_MAX/INT_MIN/PATH_MAX/UINT_MAX are used.	Sebastian Rasmussen

2018-02-02	Signature support: decouple mupdf from the pkcs7 implementation	Paul Gardiner
	The mupdf build included an implementation of the pkcs7 functions that are needed for signing documents and verifying signatures, the implementation being either an openssl-based one, or a stub that returned errors. This commit removes the pkcs7 functions from the main mupdf library. For the sake of verification, there wasn't really a need for the pkcs7 functions to be part of mupdf. It was only the checking function that used them. The checking function is now provided as a helper, outside of the main build. The openssl-based pkcs7 functions are also supplied as a helper. Users wishing to verify signatures can either use the checking function directly, or use the source on which to base their own. Document signing requires more integration between mupdf and pkcs7 because part of the process is performed at time of signing and part when saving the document. Mupdf already had a pdf_pkcs7_signer object that kept information between the two phases. That object has now been extended to include the pkcs7 functions involved in signing, and the signing function now requires such an object, rather than a file path to a certificate. The openssl-based pkcs7 helper provides a function that, given the path to a certificate, will return a pdf_pkcs7_signer object. The intention is that different implementations can be produced for different platforms, based on cryptographic routines built into the operating system. In each case, for the sake of document signing, the routines would be wrapped up as a pdf_pkcs7_signer object.
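	The idea of wrapping platform routines in a signer object can be sketched as a struct of function pointers. This is a minimal illustration only: the names signer, toy_signer and sign_bytes are hypothetical and do not reproduce MuPDF's actual pdf_pkcs7_signer definition, and the toy backend computes a plain byte sum, not a cryptographic signature.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical signer interface; NOT the real pdf_pkcs7_signer layout. */
typedef struct signer signer;
struct signer {
    /* Upper bound on the signature size, so space can be reserved up front. */
    size_t (*max_digest_size)(signer *self);
    /* Write the signature over data into out; returns bytes written, 0 on failure. */
    size_t (*create_digest)(signer *self, const unsigned char *data, size_t len,
                            unsigned char *out, size_t out_cap);
    /* Release backend resources. */
    void (*drop)(signer *self);
    /* Backend-private state (unused by the toy backend). */
    void *impl;
};

/* Toy backend: a 4-byte big-endian byte sum. Illustrative, not secure. */
static size_t toy_max_digest_size(signer *self) { (void)self; return 4; }

static size_t toy_create_digest(signer *self, const unsigned char *data, size_t len,
                                unsigned char *out, size_t out_cap)
{
    unsigned long sum = 0;
    size_t i;
    (void)self;
    if (out_cap < 4)
        return 0;
    for (i = 0; i < len; i++)
        sum += data[i];
    out[0] = (unsigned char)((sum >> 24) & 0xff);
    out[1] = (unsigned char)((sum >> 16) & 0xff);
    out[2] = (unsigned char)((sum >> 8) & 0xff);
    out[3] = (unsigned char)(sum & 0xff);
    return 4;
}

static void toy_drop(signer *self) { (void)self; }

signer toy_signer(void)
{
    signer s = { toy_max_digest_size, toy_create_digest, toy_drop, NULL };
    return s;
}

/* Caller side: depends only on the interface, never on a specific backend. */
size_t sign_bytes(signer *s, const unsigned char *data, size_t len,
                  unsigned char *out, size_t out_cap)
{
    assert(s != NULL);
    if (out_cap < s->max_digest_size(s))
        return 0;
    return s->create_digest(s, data, len, out, out_cap);
}
```

	A different backend would supply its own three functions and state pointer; sign_bytes would not need to change.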
2018-02-02	Signature support: separate pkcs7 specifics into a separate file.	Paul Gardiner
	Previously, pdf-pkcs7.c contained a mishmash of functions required for creating and checking signatures, with no separation between the parts relating to pdf and those relating to pkcs7. This commit introduces pdf_signature.c which contains the pdf specifics, leaving pdf-pkcs7.c to be purely pkcs7 functions. This should more easily allow the use of pkcs7 solutions other than openssl. The pkcs7 api is declared in pdf-pkcs7.h. It is entirely free of mupdf specifics, other than using an fz_stream to specify the bytes to be hashed.
2018-02-01	Bug 698908: Resize object use and renumbering lists after repair.	Sebastian Rasmussen
	Previously repair might end up increasing xref_len, but the lists were not correspondingly expanded, leading to ASAN complaints.
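	A minimal sketch of the kind of fix described, assuming two parallel per-object lists that must track the xref length. The struct write_state and the function expand_lists are hypothetical names, not MuPDF's code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical writer state holding two per-object lists. */
typedef struct {
    int list_len;       /* number of valid entries in both lists */
    int *use_list;      /* per-object "in use" flags */
    int *renumber_map;  /* per-object new object number */
} write_state;

/* Grow both lists to new_len entries, zero-filling the new tail.
   Returns 0 on success, -1 on allocation failure (list_len is then unchanged). */
int expand_lists(write_state *st, int new_len)
{
    int *use, *renum;
    size_t added;

    assert(st != NULL);
    if (new_len <= st->list_len)
        return 0; /* never shrink */

    use = realloc(st->use_list, (size_t)new_len * sizeof *use);
    if (use == NULL)
        return -1;
    st->use_list = use;

    renum = realloc(st->renumber_map, (size_t)new_len * sizeof *renum);
    if (renum == NULL)
        return -1;
    st->renumber_map = renum;

    added = (size_t)(new_len - st->list_len);
    memset(st->use_list + st->list_len, 0, added * sizeof *use);
    memset(st->renumber_map + st->list_len, 0, added * sizeof *renum);
    st->list_len = new_len;
    return 0;
}
```

	A caller would invoke expand_lists with the current xref length after any operation that may trigger a repair, before indexing into either list.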
2018-01-31	Use convenience pdf dictionary/array creation functions.	Tor Andersson

2018-01-19	Perform document signing via fz_stream and fz_output	Paul Gardiner
	This change achieves two goals. It allows signing to be performed even when the document is obtained other than from a disk file. It also restores signing of file-based documents to a working state, a feature that was broken because complete_signatures was being called after certain tables, available via the output options object, had been destroyed.
2018-01-05	Enable saving of encrypted PDF files.	Robin Watts
	We need both RC4 and AES encryption. RC4 is a straight reversible stream, and our AES library knows how to encrypt as well as decrypt, so it's "just" a matter of calling them correctly. We therefore expose a generic "encrypt this data" routine (and a matching "how long will the data be once encrypted" routine) within pdf-crypt.c. We then extend our PDF object output routines to call these. This is enough to get encrypted data preserved over calls to mutool clean. Unfortunately the created files aren't readable, due to 2 further problems, also fixed here. Firstly, mutool clean does not preserve the Encrypt entry in the trailer. This is a simple fix. Secondly, we are required NOT to encrypt the Encrypt entry. This requires us to spot the crypt entry and to special case it.
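	The remark that RC4 is a reversible stream can be illustrated with a small standalone RC4: applying the same keystream twice returns the original bytes, and the output length equals the input length. This is a textbook sketch with hypothetical names (rc4_state, rc4_init, rc4_apply), not the routine in pdf-crypt.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Standalone textbook RC4 state; not MuPDF's implementation. */
typedef struct {
    unsigned char s[256];
    unsigned char i, j;
} rc4_state;

/* Key scheduling: permute s[] according to the key. */
void rc4_init(rc4_state *st, const unsigned char *key, size_t keylen)
{
    unsigned int i;
    unsigned char j = 0, t;

    assert(keylen > 0);
    for (i = 0; i < 256; i++)
        st->s[i] = (unsigned char)i;
    for (i = 0; i < 256; i++) {
        j = (unsigned char)(j + st->s[i] + key[i % keylen]);
        t = st->s[i];
        st->s[i] = st->s[j];
        st->s[j] = t;
    }
    st->i = 0;
    st->j = 0;
}

/* XOR the keystream into buf in place. Encryption and decryption are the
   same operation, and the data length does not change. */
void rc4_apply(rc4_state *st, unsigned char *buf, size_t len)
{
    size_t n;
    unsigned char t;

    for (n = 0; n < len; n++) {
        st->i = (unsigned char)(st->i + 1);
        st->j = (unsigned char)(st->j + st->s[st->i]);
        t = st->s[st->i];
        st->s[st->i] = st->s[st->j];
        st->s[st->j] = t;
        buf[n] ^= st->s[(unsigned char)(st->s[st->i] + st->s[st->j])];
    }
}
```

	To decrypt, the state is reinitialised with the same key and rc4_apply is run over the ciphertext.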
2018-01-05	Fix "being able to search for redacted text" bug.	Robin Watts
	A customer reports that even after text has been redacted, we can still search for the redacted text. The example file supplied had many instances of the word 'words', and 4 instances of 'apple'. The 'apple' instances were redacted, and the document saved out. 2 such instances were on the first page; when we searched for 'apple' Acrobat would find the word after the first removed instance of apple, then find the word 2 after the second removed instance of apple. After much head scratching and cutting down of the file, it appears that the information genuinely isn't in the file. Acrobat is somehow remembering it. It appears to be doing this using the 'ID' entries in the trailer dict. My suspicion is that Acrobat has cached the text extraction from the original document, and is using this on all files that match the IDs. Change the IDs (or remove them) and the problem goes away. The spec says that the ID should be 2 bytestrings in an array. The first is supposed to stay the same in all versions of a file (i.e. it shows the original version of the file, and it is the one that is used by encrypt). The second bytestring is supposed to change more often, so here we simply return a new random string on each writing.
2017-12-13	Initialize generation numbers when saving a new pdf.	Tor Andersson

2017-12-13	Never write negative xref offsets when saving to PDF.	Sebastian Rasmussen

2017-12-13	Add 'clean' option to pdfclean to clean (but not sanitize) content streams.	Tor Andersson
	This goes well with the 'mutool clean -d' decompression option to debug content streams, without doing the sanitize optimization pass.
2017-11-22	Add usage for missing options to pdf-write.	Sebastian Rasmussen

2017-11-22	Skip unnecessary newline when writing ASCII streams.	Tor Andersson

2017-11-08	Silence warning.	Tor Andersson

2017-11-08	Bug 689699: Avoid buffer overrun.	Robin Watts
	When cleaning a pdf file, various lists (of pdf_xref_len length) are defined early on. If we trigger a repair during the clean, this can cause pdf_xref_len to increase causing an overrun. Fix this by watching for changes in the length, and checking accesses to the list for validity. This also appears to fix bugs 698700-698703.
2017-11-08	Bug 698689: Don't create a hint stream for a file with 0 pages.	Robin Watts

2017-11-01	Add separate fz_close_output step.	Tor Andersson
	Closing flushes output and may throw exceptions. Dropping frees the state and never throws exceptions.
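	The close/drop split can be sketched with a toy buffered output. All names here (toy_output, toy_close_output, toy_drop_output) are hypothetical, and since plain C has no exceptions, the sketch reports a failed close through a return code where MuPDF would throw.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy buffered output writing into a caller-owned sink. Hypothetical type. */
typedef struct {
    char buf[32];           /* bytes written but not yet flushed */
    size_t used;
    char *sink;             /* destination, owned by the caller */
    size_t sink_cap, sink_len;
} toy_output;

toy_output *toy_new_output(char *sink, size_t sink_cap)
{
    toy_output *o = calloc(1, sizeof *o);
    if (o != NULL) {
        o->sink = sink;
        o->sink_cap = sink_cap;
    }
    return o;
}

/* Buffer data; returns -1 if the internal buffer is full. */
int toy_write(toy_output *o, const char *data, size_t len)
{
    if (len > sizeof o->buf - o->used)
        return -1;
    memcpy(o->buf + o->used, data, len);
    o->used += len;
    return 0;
}

/* Close: flush pending bytes to the sink. This step can fail. */
int toy_close_output(toy_output *o)
{
    if (o->used > o->sink_cap - o->sink_len)
        return -1;
    memcpy(o->sink + o->sink_len, o->buf, o->used);
    o->sink_len += o->used;
    o->used = 0;
    return 0;
}

/* Drop: free the state. Never fails and never flushes; NULL is allowed. */
void toy_drop_output(toy_output *o)
{
    free(o);
}
```

	Keeping drop failure-free means cleanup paths and finalizers can always call it, while callers that care about the data check the result of close first.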
2017-11-01	Use int64_t for public file API offsets.	Tor Andersson
	Don't mess with conditional compilation with LARGEFILE -- always expose 64-bit file offsets in our public API.
2017-10-12	Some more consts.	Tor Andersson

2017-10-05	Remove unused code.	Sebastian Rasmussen

2017-09-08	Remove unnecessary fz_try()/fz_catch().	Sebastian Rasmussen

2017-09-07	Use dict_put_drop/array_push_drop wherever possible.	Sebastian Rasmussen

2017-09-07	Initialize variables to appease clang scan-build.	Sebastian Rasmussen

2017-08-31	Always add newline before 'endstream' keyword for PDF/A compliance.	Philipp Knechtges

2017-08-31	Adjust PDF header for PDF/A compliance.	Philipp Knechtges
	Remove superfluous '%' character in the comment with binary bytes.
2017-08-31	Do not deflate metadata (necessary for PDF/A compliance).	Philipp Knechtges

2017-07-06	pdf: Drop object upon error while renumbering objects.	Sebastian Rasmussen

2017-06-22	Add const to pdf_toname.	Tor Andersson

2017-05-31	Avoid double literals causing casts to float.	Sebastian Rasmussen

2017-04-27	Include required system headers.	Tor Andersson

2017-03-23	Introduce fz_new_derived_...	Robin Watts
	Instead of having fz_new_XXXX(ctx, type, ...) macros that call fz_new_XXXX_of_size etc, use fz_new_derived_... Clearer naming, and doesn't clash with fz_new_document_writer.
2017-03-23	Add fz_new_writer function.	Robin Watts
	Moves document_writers into the same style as fz_new_{image,document,page} etc.
2017-03-22	Rename fz_putc/puts/printf to fz_write_*.	Tor Andersson
	Rename fz_write to fz_write_data. Rename fz_write_buffer_* and fz_buffer_printf to fz_append_. Be consistent in naming: fz_write_ calls write to fz_output. fz_append_* calls append to fz_buffer. Update documentation.
2017-01-17	Fix typos.	Sebastian Rasmussen

2016-12-27	Strip extraneous blank lines.	Tor Andersson

2016-12-16	pdf: Don't allow incremental writes on a new document.	Tor Andersson

2016-12-12	Make more pdf functions private.	Tor Andersson

2016-12-12	Change pdf_dict_put_val to pdf_dict_put_val_null.	Tor Andersson
	It's only used to 'fix' duff indirect references when cleaning PDF files. Writing general values into dictionaries should be done by key, not by internal index.
2016-11-23	Fix pdf-write bug when ascii encoding.	Robin Watts

2016-11-14	Make fz_buffer structure private to fitz.	Robin Watts
	Move the definition of the structure contents into new fitz-imp.h file. Make all code outside of fitz access the buffer through the defined API. Add a convenience API for people that want to get buffers as null terminated C strings.
2016-09-08	Make fz_option_eq() available outside of pdf-writer.	Sebastian Rasmussen

2016-09-05	mutool clean: Fixes seen as part of bug 697092 investigation.	Robin Watts
	Firstly, we avoid compressing streams if they get bigger. Secondly, we ensure that we always update the Length field. Seen as part of the investigation into bug 697092, though not the actual cause. Thanks to Tor for the latter part of the fix.
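	Both points can be sketched together: keep the compressed form only when it is strictly smaller, and always report the length of whichever bytes were chosen. The run-length encoder below is a toy stand-in for a real compressor, and stream_choice and choose_stream are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Toy run-length encoder standing in for a real compressor: emits
   (count, byte) pairs. Returns bytes written, or 0 if out is too small. */
size_t rle_encode(const unsigned char *in, size_t len, unsigned char *out, size_t cap)
{
    size_t i = 0, o = 0;
    while (i < len) {
        unsigned char b = in[i];
        size_t run = 1;
        while (i + run < len && in[i + run] == b && run < 255)
            run++;
        if (o + 2 > cap)
            return 0;
        out[o++] = (unsigned char)run;
        out[o++] = b;
        i += run;
    }
    return o;
}

/* Result of choosing between the raw and compressed forms of a stream. */
typedef struct {
    const unsigned char *data;
    size_t length;   /* value to write as the stream's Length */
    int compressed;  /* nonzero if data is the compressed form */
} stream_choice;

/* Keep the compressed form only if it is strictly smaller than the raw data.
   length always matches the bytes in data, whichever form is chosen. */
stream_choice choose_stream(const unsigned char *raw, size_t raw_len,
                            unsigned char *scratch, size_t scratch_cap)
{
    stream_choice c;
    size_t n;

    assert(raw != NULL && scratch != NULL);
    n = rle_encode(raw, raw_len, scratch, scratch_cap);
    if (n > 0 && n < raw_len) {
        c.data = scratch;
        c.length = n;
        c.compressed = 1;
    } else {
        c.data = raw;
        c.length = raw_len;
        c.compressed = 0;
    }
    return c;
}
```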
2016-09-01	pdf: Load/open streams by indirect reference object when possible.	Tor Andersson

2016-08-02	Parse more fz_document_writer() options.	Sebastian Rasmussen

2016-07-08	Separate close and drop functionality for devices and writers.	Tor Andersson
	Closing a device or writer may throw exceptions, but much of the foreign language bindings (JNI and JS) depend on drop to never throw an exception (exceptions in finalizers are bad).
2016-07-06	Fix garbage collection and page grafting for indirect reference chains.	Tor Andersson
	The mark & sweep pass of garbage collection, and the resolving of indirect objects when grafting objects, were following the full chain of indirect references. In the unusual case where a numbered object is itself only an indirect reference to another object, this intermediate numbered object would be missed both when marking for garbage collection, and when copying objects for grafting. Add a function to resolve only one step for these two uses. The following is an example of a file that would break during garbage collection if we follow full indirect reference chains:
	%PDF-1.3
	1 0 obj <</Type/Catalog /Foo[2 0 R 3 0 R]>> endobj
	2 0 obj 4 0 R endobj
	3 0 obj 5 0 R endobj
	4 0 obj <</Length 1>> stream
	A
	endstream endobj
	5 0 obj <</Length 1>> stream
	B
	endstream endobj
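	The difference between full-chain and one-step resolution can be sketched on a toy object table modelled on the example file. The types and functions below (toy_obj, resolve_full, resolve_once, mark_chain) are hypothetical and only illustrate the marking logic, not MuPDF's object model.

```c
#include <assert.h>

/* Toy object table: each numbered object is either a stream or nothing
   but a reference to another numbered object. Hypothetical model. */
typedef enum { OBJ_NONE, OBJ_REF, OBJ_STREAM } obj_kind;

typedef struct {
    obj_kind kind;
    int target; /* meaningful only when kind == OBJ_REF */
} toy_obj;

/* Follow the whole chain; returns the number of the final non-reference object. */
int resolve_full(const toy_obj *table, int len, int num)
{
    int steps = 0;
    while (num > 0 && num < len && table[num].kind == OBJ_REF && steps++ < len)
        num = table[num].target;
    return num;
}

/* Follow at most one step of the chain. */
int resolve_once(const toy_obj *table, int len, int num)
{
    if (num > 0 && num < len && table[num].kind == OBJ_REF)
        return table[num].target;
    return num;
}

/* Mark every numbered object along the chain, one step at a time, so that
   intermediate objects are kept as well as the final one. */
void mark_chain(const toy_obj *table, int len, int num, unsigned char *used)
{
    while (num > 0 && num < len && !used[num]) {
        int next;
        used[num] = 1;
        next = resolve_once(table, len, num);
        if (next == num)
            break;
        num = next;
    }
}
```

	With objects 2 and 3 being bare references to 4 and 5, resolve_full(2) jumps straight to 4, so a marker that records only that result would leave object 2 unmarked; mark_chain records both.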
2016-07-06	pdf: Drop generation number from public interfaces.	Tor Andersson
	The generation number is only needed for decryption, and is assumed to be zero or irrelevant for all other uses. Store the original object number and generation in the xref slot, so that we can decrypt them even when the objects have been renumbered, without needing to pass the original object number around through the stream loading APIs.
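	A minimal sketch of why the original numbers are kept with the xref slot, assuming the PDF standard security handler's rule that the per-object key is derived from the low-order 3 bytes of the object number and the low-order 2 bytes of the generation, low byte first. The struct toy_xref_entry and both functions are hypothetical, not MuPDF's xref entry layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical xref slot that remembers the numbers the object was
   originally stored under. */
typedef struct {
    int64_t offset;
    int orig_num;
    int orig_gen;
} toy_xref_entry;

/* Renumbering moves the slot but leaves orig_num and orig_gen untouched. */
void renumber(toy_xref_entry *table, int from, int to)
{
    table[to] = table[from];
}

/* Build the 5 bytes appended to the file key when deriving the per-object
   key: 3 low-order bytes of the object number, then 2 low-order bytes of
   the generation, each low byte first. */
void object_key_salt(const toy_xref_entry *e, unsigned char salt[5])
{
    assert(e != 0);
    salt[0] = (unsigned char)(e->orig_num & 0xff);
    salt[1] = (unsigned char)((e->orig_num >> 8) & 0xff);
    salt[2] = (unsigned char)((e->orig_num >> 16) & 0xff);
    salt[3] = (unsigned char)(e->orig_gen & 0xff);
    salt[4] = (unsigned char)((e->orig_gen >> 8) & 0xff);
}
```

	Because the salt is read from the slot itself, a stream loader can derive the correct key after renumbering without the caller passing the original numbers along.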
2016-06-17	Add device space transform state to draw device.	Tor Andersson
	Allows us to remove the out parameter 'transform' from fz_begin_page.
2016-06-17	Use 'size_t' instead of int as appropriate.	Robin Watts
	This silences the many warnings we get when building for x64 in windows. This does not address any of the warnings we get in thirdparty libraries - in particular harfbuzz. These look (at a quick glance) harmless though.
2016-06-16	Add PNG output for mutool convert.	Tor Andersson