mupdf - MuPDF PDF reader and library

Age	Commit message	Author
2017-09-12	Fix leaks upon error while copying array/dict.	Sebastian Rasmussen

2017-09-07	Use dict_put_drop/array_push_drop wherever possible.	Sebastian Rasmussen

2017-09-07	Initialize variables to appease clang scan-build.	Sebastian Rasmussen

2017-06-22	Add const to pdf_toname.	Tor Andersson

2017-04-27	Include required system headers.	Tor Andersson

2017-03-22	Rename fz_putc/puts/printf to fz_write_*.	Tor Andersson
	Rename fz_write to fz_write_data. Rename fz_write_buffer_* and fz_buffer_printf to fz_append_*. Be consistent in naming: fz_write_* calls write to fz_output. fz_append_* calls append to fz_buffer. Update documentation.
2017-01-09	Remove some dead code.	Tor Andersson

2016-12-27	Strip extraneous blank lines.	Tor Andersson

2016-12-19	Fix typo in dictionary entry sorting.	Sebastian Rasmussen
	Commit a92f0db5987b408bef0d9b07277c8ff2329e9ce5 introduced a typo causing pdf_sort_dict() to try to sort non-dict objects. Attempting to do this for non-dict objects causes a segmentation fault. For dictionary objects this causes a performance degradation that has not been noticed. pdf_sort_dict() is called in two places: pdf_dict_get_put() and showgrep(). The reason that calling pdf_sort_dict() from pdf_dict_get_put() does not cause a segmentation fault is that pdf_dict_get_put() makes sure that the object is a dictionary before calling pdf_sort_dict(), which will then decide NOT to sort the dict keys. showgrep() on the other hand does not make sure that it is only processing dict objects before calling pdf_sort_dict() which caused a segmentation fault.
2016-12-12	pdf: Add missing prepare_object_for_alteration calls.	Tor Andersson
	pdf_array_delete and pdf_dict_put_val_null weren't calling this function.
2016-12-12	Change pdf_dict_put_val to pdf_dict_put_val_null.	Tor Andersson
	It's only used to 'fix' duff indirect references when cleaning PDF files. Writing general values into dictionaries should be done by key, not by internal index.
2016-12-08	Update pdf_array_put to allow extension.	Robin Watts
	Previously, attempting to put an object beyond the end of an array would throw an error. Here we update the code to allow objects to be placed exactly at the end (i.e. to extend the length by 1). Update js use of pdf_array_put.
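The bounds rule described above can be sketched on a toy array. This is a minimal illustration only: toy_array and toy_array_put are hypothetical names, the element type is a plain int, and errors are reported with a return code where the commit describes throwing an error.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy growable int array; a stand-in for illustration, not MuPDF's array object. */
typedef struct {
	int *items;
	int len;
	int cap;
} toy_array;

/* Store v at index i. Returns 0 on success, -1 if i is out of range or
 * allocation fails. i == len is allowed and extends the array by one. */
static int toy_array_put(toy_array *a, int i, int v)
{
	assert(a != NULL);
	if (i < 0 || i > a->len)
		return -1; /* more than one past the end is still an error */
	if (i == a->len) {
		if (a->len == a->cap) {
			int ncap = a->cap ? a->cap * 2 : 4;
			int *n = realloc(a->items, (size_t)ncap * sizeof *n);
			if (n == NULL)
				return -1;
			a->items = n;
			a->cap = ncap;
		}
		a->len++;
	}
	a->items[i] = v;
	return 0;
}
```

Starting from an empty toy_array, putting at index 0 and then index 1 grows the array to length 2, while putting at index 3 is rejected.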
2016-11-23	Fix object leak in pdf_array_put_drop() and pdf_dict_put_val_drop().	Sebastian Rasmussen

2016-09-22	Bug 697015: Avoid object references vanishing during repair.	Robin Watts
	A PDF repair can be triggered 'just in time', when we encounter a problem in the file. The idea is that this can happen without the enclosing code being aware of it. Thus the enclosing code may be holding 'borrowed' references (such as those returned by pdf_dict_get()) at the time when the repair is triggered. We are therefore at pains to ensure that the repair does not replace any objects that exist already, so that the calling code will not have these references unexpectedly invalidated. The sole exception to this is when we replace the 'Length' fields in stream dictionaries with the actual lengths. Bug 697015 shows exactly this situation causing a reference to become invalid. The solution implemented here is to add an 'orphan list' to the document, where we put these (hopefully few, small) objects. These orphans are kept around until the document is closed.
2016-09-14	Don't report addRef/dropRef events to Memento twice.	Robin Watts
	We call Memento_addRef etc in fz_keep_impXX functions, so don't call them in the callers too.
2016-08-24	Add pdf_array_find to look up the index of an object in an array.	Tor Andersson

2016-08-24	When NULL is added to PDF dicts/arrays, insert null objects.	Sebastian Rasmussen

2016-08-24	Be stricter in what can be added into arrays/dicts.	Sebastian Rasmussen

2016-08-24	Do not resolve PDF dict keys before using them.	Sebastian Rasmussen
	Only direct PDF name objects should be used as arguments, indirect PDF name objects cannot be used.
2016-08-24	Always check that PDF dict keys are names in same way.	Sebastian Rasmussen

2016-08-24	Add macros for checking PDF object type.	Sebastian Rasmussen
	This avoids resolving object references which is important for dictionary keys.
2016-08-24	Turn warnings in dict/array functions into exceptions.	Sebastian Rasmussen

2016-07-08	Use fz_keep_imp and fz_drop_imp for all reference counting.	Tor Andersson

2016-07-06	Fix garbage collection and page grafting for indirect reference chains.	Tor Andersson
	The mark & sweep pass of garbage collection, and resolving indirect objects when grafting objects, were following the full chain of indirect references. In the unusual case where a numbered object is itself only an indirect reference to another object, this intermediate numbered object would be missed both when marking for garbage collection, and when copying objects for grafting. Add a function to resolve only one step for these two uses.
	The following is an example of a file that would break during garbage collection if we follow full indirect reference chains:
	%PDF-1.3
	1 0 obj <</Type/Catalog /Foo[2 0 R 3 0 R]>> endobj
	2 0 obj 4 0 R endobj
	3 0 obj 5 0 R endobj
	4 0 obj <</Length 1>> stream
	A
	endstream endobj
	5 0 obj <</Length 1>> stream
	B
	endstream endobj
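The difference between resolving a full chain and resolving one step at a time can be sketched on a toy object table. Everything here (toy_obj, toy_mark, toy_resolve_full) is hypothetical and only models object numbers, not MuPDF's xref or its actual resolve function.

```c
#include <assert.h>
#include <stddef.h>

enum { TOY_VALUE, TOY_REF };

/* A numbered object is either a value or a bare reference to another number. */
typedef struct {
	int kind;   /* TOY_VALUE or TOY_REF */
	int target; /* object number referred to when kind == TOY_REF */
} toy_obj;

/* Mark `num` and every object on its reference chain, one step at a time,
 * so intermediate numbered objects are marked too. The used[] check also
 * stops the walk on reference cycles. */
static void toy_mark(const toy_obj *table, int count, int num, unsigned char *used)
{
	assert(table != NULL && used != NULL);
	while (num >= 0 && num < count && !used[num]) {
		used[num] = 1;
		if (table[num].kind != TOY_REF)
			break;
		num = table[num].target; /* resolve exactly one step */
	}
}

/* Follow the whole chain and return only the final object number; the
 * intermediate numbers are never seen by the caller. */
static int toy_resolve_full(const toy_obj *table, int count, int num)
{
	int steps = 0;
	while (num >= 0 && num < count && table[num].kind == TOY_REF && steps++ < count)
		num = table[num].target;
	return num;
}
```

With a table mirroring the example file (object 2 refers to 4, object 3 refers to 5), toy_mark marks 2, 3, 4 and 5, whereas a marker built on toy_resolve_full would only ever see 4 and 5.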
2016-07-06	pdf: Check ownership when adding objects to a document.	Tor Andersson

2016-06-20	Fix signed/unsigned warning.	Robin Watts

2016-06-17	Allow PDF strings to be > 16bits.	Robin Watts
	This stops Bug693111.pdf giving errors.
2016-06-17	Use 'size_t' instead of int as appropriate.	Robin Watts
	This silences the many warnings we get when building for x64 in windows. This does not address any of the warnings we get in thirdparty libraries - in particular harfbuzz. These look (at a quick glance) harmless though.
2016-03-16	Avoid unused var warnings in Memento ref counting code.	Robin Watts

2016-03-15	Make PDF objects ref changes memento-trackable.	Robin Watts

2016-03-14	Make pdf_is_stream work on loaded stream dictionary objects as well.	Tor Andersson

2016-03-01	js: Add PDF document and object access.	Tor Andersson

2016-02-29	Improve pretty-print formatting of arrays.	Tor Andersson

2015-12-11	Use fz_output instead of FILE* for most of our output needs.	Tor Andersson
	Use fz_output in debug printing functions. Use fz_output in pdfshow. Use fz_output in fz_trace_device instead of stdout. Use fz_output in pdf-write.c. Rename fz_new_output_to_filename to fz_new_output_with_path. Add seek and tell to fz_output. Remove unused functions like fz_fprintf. Fix typo in pdf_print_obj.
2015-08-27	Move objects to the incremental xref before changing them	Paul Gardiner
	This is work towards supporting several levels of incremental xref, which in turn is work towards bug #696123. When several levels are present, the operation will make a copy of the object and that needs to be done before any change to the object.
2015-08-27	Add a deep-copy function for pdf objects	Paul Gardiner
	This is work towards supporting several levels of incremental xref, which in turn, is work towards bug #696123. When several levels of incremental xref are present there can be objects that appear at multiple levels and differ between those levels. This deep-copy function will be used to create new copies before the new version is altered.
2015-08-20	Remove duplicate inclusions of headers.	Sebastian Rasmussen
	These headers are already included by mupdf/fitz/system.h.
2015-07-27	Correctly compare PDF names with null, true and false.	Sebastian Rasmussen
	Commit f533104 introduced optimized handling of pdf names, null, true and false. That commit handles most object types correctly in pdf_objcmp() but it does not correctly handle comparisons such as pdf_objcmp("/Crypt", "true") or pdf_objcmp("null", "/Crypt"). Fixes one issue from bug 696012.
2015-06-05	Fix mutool clean for FZ_LARGEFILE case.	Robin Watts
	We were allocating the ofs array as ints and then filling it with fz_off_t's.
2015-05-25	Bug 695949: Fix bug in pdf_dict_del.	Robin Watts
	Fix typo in pdf_dict_del. Issue and fix both provided by Willus (William Menninger).
2015-05-15	pdf_dict_find optimisation.	Robin Watts
	When doing pdf_dict_put, we first call pdf_dict_find to hunt for an existing entry we can just update. Recently we introduced a 'location' return from pdf_dict_find that would (in the non-found case) return the location of where such an entry should be inserted. It's just dawned on me that we don't need a separate variable for this. We continue to return negative numbers for 'not found', but these negative numbers can contain the insertion point.
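One common way to fold an insertion point into a negative "not found" result is sketched below. The encoding used here (-1 - position) is an assumption for illustration; the commit message does not state which encoding pdf_dict_find uses, and toy_find and toy_insertion_point are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* Binary search over sorted keys. Returns the index when found; otherwise a
 * negative number that encodes the insertion point as -1 - position. */
static int toy_find(const char *const *keys, int len, const char *key)
{
	int lo = 0, hi = len - 1;

	while (lo <= hi) {
		int mid = lo + (hi - lo) / 2;
		int c = strcmp(key, keys[mid]);
		if (c == 0)
			return mid;
		if (c < 0)
			hi = mid - 1;
		else
			lo = mid + 1;
	}
	return -1 - lo; /* always negative, so 'not found' stays distinguishable */
}

/* Decode the insertion point from a negative toy_find result. */
static int toy_insertion_point(int result)
{
	assert(result < 0);
	return -1 - result;
}
```

A caller that gets a negative result can insert at toy_insertion_point(result) without a second search, which is the saving the commit describes.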
2015-05-15	Fix bug in pdf_dict_find.	Robin Watts
	Sebras and Tor spotted that we could get occasional 'warning: cannot seek backwards' messages. An example command that shows this is:
	mutool show pdf_reference17.pdf grep
	They further tracked the problem down to the 'sorted' side of the pdf_dict_find function. In the binary search, I calculate c to be the comparison value between pairs of keys. In the case where both keys (names) are in the special case 'known' range below PDF_OBJ__LIMIT, I use pointer arithmetic for this. Unfortunately, I was forgetting that the compiler thinks that pdf_obj *'s are 4 (or 8) bytes in size, so was doing (a-b)/4. To work around this I cast both keys to char *'s. This solves the bug. Thanks to Sebras and Tor for doing the hard work in tracking this down.
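The scaling effect behind this bug can be shown in isolation. The sketch below uses a hypothetical toy_elem type and pointers into a single array so that the subtraction stays well defined; it does not reproduce the pdf_dict_find code.

```c
#include <assert.h>
#include <stddef.h>

/* Any element type wider than one byte will do for the demonstration. */
typedef struct {
	void *payload;
} toy_elem;

/* Typed pointer subtraction: the compiler divides the byte distance by
 * sizeof(toy_elem), so the result counts elements. */
static ptrdiff_t toy_diff_elems(const toy_elem *a, const toy_elem *b)
{
	assert(a != NULL && b != NULL);
	return a - b;
}

/* Casting to char * first gives the unscaled distance in bytes. */
static ptrdiff_t toy_diff_bytes(const toy_elem *a, const toy_elem *b)
{
	assert(a != NULL && b != NULL);
	return (const char *)a - (const char *)b;
}
```

For two elements that are two slots apart, toy_diff_elems returns 2 while toy_diff_bytes returns 2 * sizeof(toy_elem); when the byte distance is smaller than the element size, the scaled result collapses to 0 and distinct keys compare as equal.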
2015-05-15	Support pdf files larger than 2Gig.	Robin Watts
	If FZ_LARGEFILE is defined when building, MuPDF uses 64bit offsets for files; this allows us to open streams larger than 2Gig.
	The downsides to this are that:
	* The xref entries are larger.
	* All PDF ints are held as 64bit things rather than 32bit things (to cope with /Prev entries, hint stream offsets etc).
	* All file positions are stored as 64bits rather than 32.
	The implementation works by detecting FZ_LARGEFILE. Some #ifdeffery in fitz/system.h sets fz_off_t to either int or int64_t as appropriate, and sets defines for fz_fopen, fz_fseek, fz_ftell etc as required. These call the fseeko64 etc functions on linux (and so define _LARGEFILE64_SOURCE) and the explicit 64bit functions on windows.
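The shape of such a compile-time switch might look roughly like the following. This is a sketch under assumptions, not the contents of fitz/system.h: the TOY_LARGEFILE macro, toy_off_t, toy_fseek, toy_ftell and toy_file_size are all hypothetical names, and only the glibc-style fseeko64/ftello64 branch is shown for the 64-bit case.

```c
/* Build with -DTOY_LARGEFILE to select 64-bit offsets (hypothetical macro). */
#ifdef TOY_LARGEFILE
#define _LARGEFILE64_SOURCE /* must come before the first system header on glibc */
#endif

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#ifdef TOY_LARGEFILE
typedef int64_t toy_off_t;
#define toy_fseek fseeko64
#define toy_ftell ftello64
#else
typedef int toy_off_t;
#define toy_fseek fseek
#define toy_ftell ftell
#endif

/* Return the size of an open file in bytes, or -1 if seeking fails.
 * The stream is rewound to the start afterwards. */
static toy_off_t toy_file_size(FILE *f)
{
	toy_off_t size;

	assert(f != NULL);
	if (toy_fseek(f, 0, SEEK_END) != 0)
		return -1;
	size = (toy_off_t)toy_ftell(f);
	if (toy_fseek(f, 0, SEEK_SET) != 0)
		return -1;
	return size;
}
```

Code written against toy_off_t and the toy_f* wrappers compiles unchanged in both configurations, at the cost of wider offsets everywhere when the macro is defined.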
2015-04-07	Fix whitespace.	Tor Andersson

2015-03-25	Avoid calling pdf_dict_finds when we could call pdf_dict_find.	Robin Watts
	Faster, shinier, better.
2015-03-24	Reduce pdf_obj memory usage.	Robin Watts
	Historically pdf_obj was a structure with a header and a union in it. As time has gone by more stuff has been put into the header, and the different arms of the union have changed in size. We've even adopted the idea of different 'kinds' of pdf_obj's being different sizes (names and strings for example).
	Here we rework the system slightly; we minimise the header, and split out everything into different structures. Every different 'kind' of pdf_obj is now its own structure, just as big as it needs to be.
	Key changes:
	* refs is now a short rather than an int. We are never going to need more than 32767 refs (indeed, if we ever need more than about 3 (10 at the outside), something has gone very wrong!). This aids structure packing.
	* Only arrays, dicts and refs actually need the pdf_document pointer.
	* Only arrays and dicts need the parent_num pointer.
2015-03-24	Rework handling of PDF names for speed and memory.	Robin Watts
	Currently, every PDF name is allocated in a pdf_obj structure, and comparisons are done using strcmp. Given that we can predict most of the PDF names we'll use in a given file, this seems wasteful. The pdf_obj type is opaque outside the pdf-object.c file, so we can abuse it slightly without anyone outside knowing.
	We collect a sorted list of names used in PDF (resources/pdf/names.txt), and we add a utility (namedump) that preprocesses this into 2 header files.
	The first (include/mupdf/pdf/pdf-names-table.h, included as part of include/mupdf/pdf/object.h), defines a set of "PDF_NAME_xxxx" entries. These are pdf_obj *'s that callers can use to mean "A PDF object that means literal name 'xxxx'".
	The second (source/pdf/pdf-name-impl.h) is a C array of names.
	We therefore update the code so that rather than passing "xxxx" to functions (such as pdf_dict_gets(...)) we now pass PDF_NAME_xxxx (to pdf_dict_get(...)). This is a fairly natural (if widespread) change.
	The pdf_dict_getp (and sibling) functions that take a path (e.g. "foo/bar/baz") are therefore supplemented with equivalents that take a list (pdf_dict_getl(... , PDF_NAME_foo, PDF_NAME_bar, PDF_NAME_baz, NULL)).
	The actual implementation of this relies on the fact that small pointer values are never valid values. For a given pdf_obj *p, if NULL < (intptr_t)p < PDF_NAME__LIMIT then p is a literal entry in the name table. This enables us to do fast pointer compares and to skip expensive strcmps.
	Also, bring "null", "true" and "false" into the same style as PDF names. Rather than using full pdf_obj structures for null/true/false, use special pointer values just above the PDF_NAME_ table. This saves memory and makes comparisons easier.
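The small-pointer trick described in this commit can be sketched with a three-entry table. All names below (toy_obj, TOY_NAME, toy_name_table and the helpers) are hypothetical stand-ins, not the generated MuPDF headers, and the sketch relies on the implementation-defined conversion between small integers and pointers behaving as it does on mainstream platforms.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct toy_obj toy_obj; /* opaque to callers, like an object handle */

/* Index 0 is reserved so that NULL never looks like a name. */
enum { TOY_NAME_DUMMY, TOY_NAME_Count, TOY_NAME_Kids, TOY_NAME_Type, TOY_NAME_LIMIT };

static const char *const toy_name_table[TOY_NAME_LIMIT] = { "", "Count", "Kids", "Type" };

/* A built-in name is just its table index disguised as a pointer. */
#define TOY_NAME(n) ((toy_obj *)(intptr_t)TOY_NAME_##n)

/* True when p is a table index rather than the address of a real object. */
static int toy_is_builtin_name(const toy_obj *p)
{
	uintptr_t v = (uintptr_t)p;
	return v > 0 && v < (uintptr_t)TOY_NAME_LIMIT;
}

/* Recover the name text for a built-in name. */
static const char *toy_builtin_name_text(const toy_obj *p)
{
	assert(toy_is_builtin_name(p));
	return toy_name_table[(uintptr_t)p];
}

/* Two built-in names are equal exactly when the pointers are equal,
 * so no strcmp is needed. */
static int toy_builtin_name_eq(const toy_obj *a, const toy_obj *b)
{
	assert(toy_is_builtin_name(a) && toy_is_builtin_name(b));
	return a == b;
}
```

A caller passes TOY_NAME(Kids) wherever an object pointer is expected; no allocation takes place, and equality checks reduce to a pointer comparison.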
2015-03-24	Don't pass interpreter context to pdf_processor opcode callbacks.	Tor Andersson
	Update buffer and filter processors. Filter both colors and stroke states. Move OCG hiding logic into interpreter.
2015-02-27	Bug 695853: Fix pdf clean operation with invalid refs in input file.	Robin Watts
	MuPDF (and other PDF readers) treat invalid references as 'null' objects. For instance, in the supplied file, object 239 is supposedly free, but a reference is made to it. When cleaning (or linearising) a file, we renumber objects; such illegal refs then end up pointing somewhere else. The workaround here is simply to spot the invalid refs during the mark phase, and to set such references to null.
2015-02-17	Add ctx parameter and remove embedded contexts for API regularity.	Tor Andersson
	Purge several embedded contexts:
	Remove embedded context in fz_output.
	Remove embedded context in fz_stream.
	Remove embedded context in fz_device.
	Remove fz_rebind_stream (since it is no longer necessary).
	Remove embedded context in svg_device.
	Remove embedded context in XML parser.
	Add ctx argument to fz_document functions.
	Remove embedded context in fz_document.
	Remove embedded context in pdf_document.
	Remove embedded context in pdf_obj.
	Make fz_page independent of fz_document in the interface. We shouldn't need to pass the document to all functions handling a page. If a page is tied to the source document, it's redundant; otherwise it's just pointless.
	Fix reference counting oddity in fz_new_image_from_pixmap.