mupdf - MuPDF PDF reader and library

Age	Commit message	Author
2015-05-15	Support pdf files larger than 2Gig.	Robin Watts
	If FZ_LARGEFILE is defined when building, MuPDF uses 64bit offsets for files; this allows us to open streams larger than 2Gig.
	The downsides to this are that:
	* The xref entries are larger.
	* All PDF ints are held as 64bit things rather than 32bit things (to cope with /Prev entries, hint stream offsets etc).
	* All file positions are stored as 64bits rather than 32.
	The implementation works by detecting FZ_LARGEFILE. Some #ifdeffery in fitz/system.h sets fz_off_t to either int or int64_t as appropriate, and sets defines for fz_fopen, fz_fseek, fz_ftell etc as required. These call the fseeko64 etc functions on linux (and so define _LARGEFILE64_SOURCE) and the explicit 64bit functions on windows.
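The compile-time switch described in this commit can be sketched roughly as follows. This is an illustration only, not the contents of fitz/system.h: `demo_off_t`, `DEMO_OFF_MAX` and `demo_can_address` are hypothetical names, and only the FZ_LARGEFILE macro is taken from the commit message.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-in for the offset type selected in fitz/system.h. */
#ifdef FZ_LARGEFILE
typedef int64_t demo_off_t;       /* 64-bit offsets: files over 2 GiB are addressable */
#define DEMO_OFF_MAX INT64_MAX
#else
typedef int demo_off_t;           /* default build: offsets are plain ints */
#define DEMO_OFF_MAX INT_MAX
#endif

/* Returns 1 if a file of `size` bytes can be addressed with demo_off_t. */
static int demo_can_address(int64_t size)
{
	return size >= 0 && size <= (int64_t)DEMO_OFF_MAX;
}
```

Building with `-DFZ_LARGEFILE` widens `demo_off_t`, which is why every structure that stores an offset grows in such a build.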
2015-04-07	Fix whitespace.	Tor Andersson

2015-03-25	Avoid calling pdf_dict_finds when we could call pdf_dict_find.	Robin Watts
	Faster, shinier, better.
2015-03-24	Reduce pdf_obj memory usage.	Robin Watts
	Historically pdf_obj was a structure with a header and a union in it. As time has gone by more stuff has been put into the header, and the different arms of the union have changed in size. We've even adopted the idea of different 'kinds' of pdf_obj's being different sizes (names and strings for example).
	Here we rework the system slightly; we minimise the header, and split out everything into different structures. Every different 'kind' of pdf_obj is now its own structure, just as big as it needs to be.
	Key changes:
	* refs is now a short rather than an int. We are never going to need more than 32767 refs (indeed, if we ever need more than about 3 (10 at the outside), something has gone very wrong!). This aids structure packing.
	* Only arrays, dicts and refs actually need the pdf_document pointer.
	* Only arrays and dicts need the parent_num pointer.
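A rough sketch of the layout change this commit describes: a minimal shared header, with each kind of object as its own structure. All `demo_` names and field choices are hypothetical; the real definitions are private to pdf-object.c and may differ.

```c
#include <stddef.h>

typedef enum { DEMO_INT, DEMO_ARRAY } demo_kind;

/* Minimal header shared by every kind of object. */
typedef struct {
	short refs;            /* a short is assumed to be plenty for a refcount */
	unsigned char kind;
	unsigned char flags;
} demo_obj;

/* An integer carries only the header plus its value. */
typedef struct {
	demo_obj super;
	int i;
} demo_obj_int;

/* Only container kinds pay for the document pointer and parent_num. */
typedef struct {
	demo_obj super;
	void *doc;
	int parent_num;
	int len, cap;
	demo_obj **items;
} demo_obj_array;

/* The older style for comparison: one struct, sized for its largest arm. */
typedef struct {
	int refs;
	unsigned char kind;
	unsigned char flags;
	void *doc;
	int parent_num;
	union {
		int i;
		struct { int len, cap; demo_obj **items; } a;
	} u;
} demo_old_obj;

static demo_kind demo_kind_of(const demo_obj *o)
{
	return (demo_kind)o->kind;
}

/* The header is the first member, so a header pointer can be cast back. */
static int demo_to_int(const demo_obj *o)
{
	return o->kind == DEMO_INT ? ((const demo_obj_int *)o)->i : 0;
}
```

Under these assumptions an integer object shrinks to the header plus one int, while the union-based layout always costs as much as its largest arm.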
2015-03-24	Rework handling of PDF names for speed and memory.	Robin Watts
	Currently, every PDF name is allocated in a pdf_obj structure, and comparisons are done using strcmp. Given that we can predict most of the PDF names we'll use in a given file, this seems wasteful. The pdf_obj type is opaque outside the pdf-object.c file, so we can abuse it slightly without anyone outside knowing.
	We collect a sorted list of names used in PDF (resources/pdf/names.txt), and we add a utility (namedump) that preprocesses this into 2 header files.
	The first (include/mupdf/pdf/pdf-names-table.h, included as part of include/mupdf/pdf/object.h), defines a set of "PDF_NAME_xxxx" entries. These are pdf_obj *'s that callers can use to mean "A PDF object that means literal name 'xxxx'".
	The second (source/pdf/pdf-name-impl.h) is a C array of names.
	We therefore update the code so that rather than passing "xxxx" to functions (such as pdf_dict_gets(...)) we now pass PDF_NAME_xxxx (to pdf_dict_get(...)). This is a fairly natural (if widespread) change.
	The pdf_dict_getp (and sibling) functions that take a path (e.g. "foo/bar/baz") are therefore supplemented with equivalents that take a list (pdf_dict_getl(... , PDF_NAME_foo, PDF_NAME_bar, PDF_NAME_baz, NULL)).
	The actual implementation of this relies on the fact that small pointer values are never valid values. For a given pdf_obj *p, if NULL < (intptr_t)p < PDF_NAME__LIMIT then p is a literal entry in the name table. This enables us to do fast pointer compares and to skip expensive strcmps.
	Also, bring "null", "true" and "false" into the same style as PDF names. Rather than using full pdf_obj structures for null/true/false, use special pointer values just above the PDF_NAME_ table. This saves memory and makes comparisons easier.
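The small-pointer trick can be illustrated with a toy name table. This is a sketch under assumptions: the three names, the `demo_` identifiers and the lookup helper are hypothetical, and casting small integers to pointers is implementation-defined in C. In this sketch a name outside the table simply yields NULL, where a real implementation would need a heap-allocated fallback.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct demo_obj demo_obj;   /* opaque to callers */

/* Index 0 is reserved so that no name maps to NULL. */
enum {
	DEMO_NAME__DUMMY,
	DEMO_NAME_Page,
	DEMO_NAME_Pages,
	DEMO_NAME_Type,
	DEMO_NAME__LIMIT
};

/* Sorted table of names, indexed by the enum above. */
static const char *demo_name_table[DEMO_NAME__LIMIT] = { "", "Page", "Pages", "Type" };

/* A table name is just its index disguised as a pointer. */
#define DEMO_NAME(id) ((demo_obj *)(intptr_t)(id))

/* NULL < (intptr_t)p < LIMIT means p is a table entry, not a heap object. */
static int demo_is_builtin_name(const demo_obj *p)
{
	intptr_t v = (intptr_t)p;
	return v > 0 && v < DEMO_NAME__LIMIT;
}

static const char *demo_name_string(const demo_obj *p)
{
	return demo_is_builtin_name(p) ? demo_name_table[(intptr_t)p] : NULL;
}

/* Binary search over the sorted table; NULL if the name is not predefined. */
static demo_obj *demo_name_from_string(const char *s)
{
	int lo = 1, hi = DEMO_NAME__LIMIT - 1;
	while (lo <= hi) {
		int mid = (lo + hi) / 2;
		int c = strcmp(s, demo_name_table[mid]);
		if (c == 0)
			return DEMO_NAME(mid);
		if (c < 0)
			hi = mid - 1;
		else
			lo = mid + 1;
	}
	return NULL;
}
```

Once both sides are table names, equality is a plain pointer comparison such as `key == DEMO_NAME(DEMO_NAME_Type)`, with no strcmp involved.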
2015-03-24	Don't pass interpreter context to pdf_processor opcode callbacks.	Tor Andersson
	Update buffer and filter processors. Filter both colors and stroke states. Move OCG hiding logic into interpreter.
2015-02-27	Bug 695853: Fix pdf clean operation with invalid refs in input file.	Robin Watts
	MuPDF (and other PDF readers) treat invalid references as 'null' objects. For instance, in the supplied file, object 239 is supposedly free, but a reference is made to it. When cleaning (or linearising) a file, we renumber objects; such illegal refs then end up pointing somewhere else. The workaround here is simply to spot the invalid refs during the mark phase, and to set the reference to null.
2015-02-17	Add ctx parameter and remove embedded contexts for API regularity.	Tor Andersson
	Purge several embedded contexts:
	Remove embedded context in fz_output.
	Remove embedded context in fz_stream.
	Remove embedded context in fz_device.
	Remove fz_rebind_stream (since it is no longer necessary).
	Remove embedded context in svg_device.
	Remove embedded context in XML parser.
	Add ctx argument to fz_document functions.
	Remove embedded context in fz_document.
	Remove embedded context in pdf_document.
	Remove embedded context in pdf_obj.
	Make fz_page independent of fz_document in the interface. We shouldn't need to pass the document to all functions handling a page. If a page is tied to the source document, it's redundant; otherwise it's just pointless.
	Fix reference counting oddity in fz_new_image_from_pixmap.
2015-02-17	Rename fz_close_* and fz_free_* to fz_drop_*.	Tor Andersson
	Rename fz_close to fz_drop_stream. Rename fz_close_archive to fz_drop_archive. Rename fz_close_output to fz_drop_output. Rename fz_free_* to fz_drop_*. Rename pdf_free_* to pdf_drop_*. Rename xps_free_* to xps_drop_*.
2014-09-02	Add fz_snprintf and use it for formatting floating point numbers.	Tor Andersson

2014-07-18	hex-encode UTF-16 strings when writing PDF	Simon Bünzli
	fmt_obj calculates whether a string is better hex-encoded or written using escapes. Due to a bug, '\0' is treated as escapable in the same way as '\n', when it actually has to be written as '\000'. Since UTF-16 strings tend to consist of many '\0' bytes, their octal encoded form is much longer than their hex encoded form. The issue is that the first argument to strchr contains an unintended trailing '\0', which has to be special-cased first.
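The strchr pitfall behind this fix is easy to reproduce in isolation. In C the terminating null character counts as part of the string, so searching for '\0' always succeeds. The sketch below assumes a particular set of escapable characters and uses hypothetical function names; it is not the fmt_obj code.

```c
#include <string.h>

/* Assumed set of characters that have a short backslash escape. */
#define DEMO_ESCAPABLE "\n\r\t\b\f()\\"

/* Buggy: strchr also matches the string's own terminator, so '\0'
 * is wrongly reported as having a short escape. */
static int demo_has_short_escape_buggy(int c)
{
	return strchr(DEMO_ESCAPABLE, c) != NULL;
}

/* Fixed: special-case '\0' before consulting strchr. */
static int demo_has_short_escape(int c)
{
	return c != '\0' && strchr(DEMO_ESCAPABLE, c) != NULL;
}
```

With the fixed check, each '\0' byte is costed as a four-character octal escape, so a UTF-16 string is correctly judged to be shorter in hex.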
2014-06-09	Fix 695300: don't throw exception on invalid reference number.	Tor Andersson
	Return the null object rather than throwing an exception when parsing indirect object references with negative object numbers. Do range check for object numbers (1 .. length) when object numbers are used instead. Object number 0 is not a valid object number. It must always be 'free'.
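The range check can be sketched as a lookup that reports "null" instead of failing. This is an illustration only: the `demo_` types are hypothetical, NULL stands in for the PDF null object, and the table is assumed to hold `len` entries indexed from 0, so valid object numbers run from 1 to len - 1 here.

```c
#include <stddef.h>

typedef struct {
	int in_use;   /* 0 means the entry is free */
	int value;    /* stand-in for the cached object */
} demo_entry;

typedef struct {
	int len;
	demo_entry *table;
} demo_xref;

/* Returns the entry for object `num`, or NULL (the "null object") when the
 * number is negative, zero, out of range, or refers to a free entry. */
static const demo_entry *demo_lookup(const demo_xref *xref, int num)
{
	if (num < 1 || num >= xref->len)
		return NULL;   /* object 0 is always free; no exception thrown */
	if (!xref->table[num].in_use)
		return NULL;
	return &xref->table[num];
}
```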
2014-03-16	Avoid premature dropping of objects in pdf_dict_put	Robin Watts
	When inserting a new value into a dictionary, if replacing an existing entry, ensure we keep the new value before dropping the old one. This is important in the case where (for example) the existing value is "[ object ]" and the new value is "object". If we drop the array and that loses the only reference to object, we can find that we have lost the value we are adding.
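The ordering issue can be shown with a toy reference-counted value. The types and helpers below are hypothetical and much simpler than pdf_obj; the point is only that the new value is kept before the old one is dropped.

```c
#include <stdlib.h>

typedef struct demo_val {
	int refs;
	struct demo_val *child;   /* a container holds one reference to its child */
	int payload;
} demo_val;

static demo_val *demo_keep(demo_val *v)
{
	if (v)
		v->refs++;
	return v;
}

static void demo_drop(demo_val *v)
{
	if (v && --v->refs == 0) {
		demo_drop(v->child);
		free(v);
	}
}

/* Creates a value with one reference; takes its own reference to child. */
static demo_val *demo_new(int payload, demo_val *child)
{
	demo_val *v = malloc(sizeof *v);
	if (!v)
		return NULL;
	v->refs = 1;
	v->child = demo_keep(child);
	v->payload = payload;
	return v;
}

/* Replace the value in a slot: keep the new value first, then drop the old.
 * Dropping first could free `val` if the old value held its only reference. */
static void demo_slot_put(demo_val **slot, demo_val *val)
{
	demo_val *old = *slot;
	*slot = demo_keep(val);
	demo_drop(old);
}
```

If the slot holds "[ object ]" and is replaced by "object", the keep happens while the array still exists, so the object survives the array being freed.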
2014-03-13	Make pdf_output_obj consistent with pdf_fprint_obj	Robin Watts
	Pass in the 'tight' flag.
2014-03-04	Bug 691691: Add way of clearing cached objects out of the xref.	Robin Watts
	We add various facilities here, intended to allow us to efficiently minimise the memory we use for holding cached pdf objects.
	Firstly, we add the ability to 'mark' all the currently loaded objects. Next we add the ability to 'clear the xref' - to drop all the currently loaded objects that have no other references except the ones held by the xref table itself. Finally, we add the ability to 'clear the xref to the last mark' - to drop all the currently loaded objects that have been created since the last 'mark' operation and have no other references except the ones held by the xref table.
	We expose this to the user by adding a new device hint 'FZ_NO_CACHE'. If set on the device, then the PDF interpreter will pdf_mark_xref before starting and pdf_clear_xref_to_mark afterwards. Thus no additional objects will be retained in memory after a given page is run, unless someone else picks them up and takes a reference to them as part of the run.
	We amend our simple example app to set this device hint when loading pages as part of a search.
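The mark and clear operations can be modelled on a toy cache. This sketch assumes one cached object per xref entry and a per-entry mark bit; all `demo_` names are hypothetical and the real bookkeeping may differ.

```c
#include <stdlib.h>

typedef struct { int refs; } demo_obj;

typedef struct {
	demo_obj *obj;   /* cached object, or NULL if not loaded */
	int marked;      /* set if the object was loaded before the last mark */
} demo_entry;

typedef struct {
	demo_entry *table;
	int len;
} demo_xref;

/* Load entry i into the cache; the xref itself holds one reference. */
static int demo_load(demo_xref *x, int i)
{
	if (!x->table[i].obj) {
		demo_obj *o = malloc(sizeof *o);
		if (!o)
			return 0;
		o->refs = 1;
		x->table[i].obj = o;
		x->table[i].marked = 0;
	}
	return 1;
}

/* Remember which objects are loaded right now. */
static void demo_mark_xref(demo_xref *x)
{
	int i;
	for (i = 0; i < x->len; i++)
		x->table[i].marked = (x->table[i].obj != NULL);
}

/* Drop cached objects referenced only by the xref. With to_mark set,
 * objects loaded before the last mark are left alone. Returns the count. */
static int demo_clear(demo_xref *x, int to_mark)
{
	int i, freed = 0;
	for (i = 0; i < x->len; i++) {
		demo_entry *e = &x->table[i];
		if (e->obj && e->obj->refs == 1 && !(to_mark && e->marked)) {
			free(e->obj);
			e->obj = NULL;
			e->marked = 0;
			freed++;
		}
	}
	return freed;
}
```

Calling `demo_mark_xref` before running a page and `demo_clear(x, 1)` afterwards mirrors the behaviour described for the FZ_NO_CACHE hint: objects picked up during the run are released unless someone else still holds them.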
2014-02-28	Fix harmless copy/paste error in pdf_copy_dict.	Robin Watts
	Thanks to Sebastian for spotting this.
2014-02-28	Ensure that pdf_array_delete works even with indirected objects.	Robin Watts
	Add a RESOLVE(obj) call in line with other such functions.
2014-02-25	make pdf_new_obj_from_str throw on error	Simon Bünzli
	Currently, pdf_new_obj_from_str returns NULL if the object can't be parsed. This isn't consistent with how all other pdf_new_* methods behave which is to throw on errors.
2014-02-10	Add pdf_is_number.	Robin Watts
	Useful utility missing from our arsenal.
2014-02-10	Add pdf_output_obj function.	Robin Watts
	Reuses the same internals as pdf_fprint_obj etc.
2014-01-13	More fixes for PDF clean.	Robin Watts
	Avoid negative indirections. Don't make indirections to objects that aren't going to be used. Also improve pdf-write.c so that it doesn't call renumberobj on objs that are going to be dropped.
2013-12-23	Bug 694715: Fix typo in error message	Robin Watts
	Thanks to Michael Cadilhac for spotting this.
2013-09-06	Fix problem with object dirty flag	Paul Gardiner
	There is the possibility of marking an object dirty via one indirection and testing it via another. This patch ensures that is handled correctly. The scenario occurred within calc.pdf and stopped the update of the display field.
2013-08-13	Signature creation	Paul Gardiner

2013-07-19	Initial work on progressive loading	Robin Watts
	We are testing this using a new -p flag to mupdf that sets a bitrate at which data will appear to arrive progressively as time goes on. For example:
	    mupdf -p 102400 pdf_reference17.pdf
	Details of the scheme used here are presented in docs/progressive.txt.
2013-07-03	Rename pdf_set_objects_parent_num to pdf_set_obj_parent	Robin Watts

2013-06-28	Add array_insert_drop and array_delete functions.	Tor Andersson
	Also add index argument to array_insert.
2013-06-28	Ensure altered objects are moved to the incremental xref section	Paul Gardiner

2013-06-27	Move to using a flags bit rather than "Dirty" dict entries.	Robin Watts
	Correct the naming scheme for pdf_obj_xxx functions.
2013-06-27	Bug 694382: Fix problems arising from recent pdf_obj changes.	Robin Watts
	Thanks to zeniko for spotting these problems. When we close a document, purge the glyph cache to ensure that no type3 glyphs hang around with pointers to pdf_obj's that are now gone. Pass doc to pdf_new_obj_from_str rather than NULL. We believe that the reason this needed to be NULL is no longer valid. Also, revert to using an int for ref counts. In a quick test of our regression suite we only found 2 files that ever made a refcount > 256 (and they never got larger than 768), but the potential is there for issues. Reverting to an int until we can think of a better idea.
2013-06-25	Rework storing internal flags in PDF objects.	Robin Watts
	Before we render a page we need to evaluate whether we need transparency or not. To establish this, we recursively walk the resources looking for certain markers (blend modes, alpha levels, smasks etc). To avoid doing this repeatedly we'd like to stash the results somewhere. Currently we write a '.useBM' entry into the top level dictionary object, but with the recent changes to support incremental update this is not ideal - it has the effect of forcing all resources into the new section of the xref. So we avoid that horrible hack and use a different one; we make use of the new flags word in the pdf_obj structure. 1 bit is used to indicate whether we have stashed a (boolean) value here, and another bit is used to indicate what that value was.
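The two-bit scheme described here (one bit for "a value is stashed", one for the value itself) might look like the sketch below. The flag names, the `demo_obj` type and the stand-in for the expensive resource walk are all hypothetical.

```c
enum {
	DEMO_FLAGS_MEMO = 1 << 0,        /* a boolean has been stashed */
	DEMO_FLAGS_MEMO_BOOL = 1 << 1    /* the stashed boolean's value */
};

typedef struct {
	unsigned char flags;
} demo_obj;

static void demo_set_memo(demo_obj *o, int value)
{
	o->flags |= DEMO_FLAGS_MEMO;
	if (value)
		o->flags |= DEMO_FLAGS_MEMO_BOOL;
	else
		o->flags = (unsigned char)(o->flags & ~DEMO_FLAGS_MEMO_BOOL);
}

/* Returns 1 and writes *value if a result was stashed, else returns 0. */
static int demo_get_memo(const demo_obj *o, int *value)
{
	if (!(o->flags & DEMO_FLAGS_MEMO))
		return 0;
	*value = (o->flags & DEMO_FLAGS_MEMO_BOOL) != 0;
	return 1;
}

/* Counts how often the (pretend) recursive resource walk actually runs. */
static int demo_walks = 0;

static int demo_expensive_walk(const demo_obj *o)
{
	(void)o;
	demo_walks++;
	return 1;   /* pretend a blend mode was found */
}

static int demo_uses_transparency(demo_obj *o)
{
	int v;
	if (demo_get_memo(o, &v))
		return v;
	v = demo_expensive_walk(o);
	demo_set_memo(o, v);
	return v;
}
```

Because the result lives in the object's flags rather than in a dictionary entry, caching it does not modify the dictionary and so does not push the object into the incremental xref section.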
2013-06-25	Update pdf_obj's to have a pdf_document field.	Robin Watts
	Remove the fz_context field to avoid the structure growing.
2013-06-24	Shrink pdf_obj type by using a flags word, and moving refs to a short.	Robin Watts
	'marked' moves into the flags word. 'kind' becomes an unsigned char. 'sorted' moves from the dictionary-specific part of the object into the flags word. This should shrink us by 8 bytes.
2013-06-20	Rearrange source files.	Tor Andersson