path: root/source/pdf/pdf-object.c
Age  Commit message  Author
2015-05-15  Support pdf files larger than 2Gig.  Robin Watts
If FZ_LARGEFILE is defined when building, MuPDF uses 64bit offsets for files; this allows us to open streams larger than 2Gig.

The downsides to this are that:
* The xref entries are larger.
* All PDF ints are held as 64bit things rather than 32bit things (to cope with /Prev entries, hint stream offsets etc).
* All file positions are stored as 64bits rather than 32.

The implementation works by detecting FZ_LARGEFILE. Some #ifdeffery in fitz/system.h sets fz_off_t to either int or int64_t as appropriate, and sets defines for fz_fopen, fz_fseek, fz_ftell etc as required. These call the fseeko64 etc functions on linux (and so define _LARGEFILE64_SOURCE) and the explicit 64bit functions on windows.
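The compile-time switch described in this commit could look roughly like the sketch below. Only FZ_LARGEFILE and fz_off_t come from the commit message; FZ_OFF_MAX and offset_fits are hypothetical names added for illustration, and the remapping of fz_fopen, fz_fseek and fz_ftell in fitz/system.h is not shown.

```c
#include <stdint.h>
#include <limits.h>

/* Sketch only: choose the file-offset type at compile time. */
#ifdef FZ_LARGEFILE
typedef int64_t fz_off_t;
#define FZ_OFF_MAX INT64_MAX    /* hypothetical helper macro */
#else
typedef int fz_off_t;
#define FZ_OFF_MAX INT_MAX      /* hypothetical helper macro */
#endif

/* Hypothetical helper: does a 64-bit file position fit in fz_off_t? */
static int offset_fits(int64_t pos)
{
    return pos >= 0 && pos <= (int64_t)FZ_OFF_MAX;
}
```

Built without FZ_LARGEFILE on a platform with 32-bit int, offset_fits rejects positions past 2Gig, which is the limitation this commit lifts.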
2015-04-07  Fix whitespace.  Tor Andersson
2015-03-25  Avoid calling pdf_dict_finds when we could call pdf_dict_find.  Robin Watts
Faster, shinier, better.
2015-03-24  Reduce pdf_obj memory usage.  Robin Watts
Historically pdf_obj was a structure with a header and a union in it. As time has gone by more stuff has been put into the header, and the different arms of the union have changed in size. We've even adopted the idea of different 'kinds' of pdf_obj's being different sizes (names and strings for example).

Here we rework the system slightly; we minimise the header, and split out everything into different structures. Every different 'kind' of pdf_obj is now its own structure, just as big as it needs to be.

Key changes:
* refs is now a short rather than an int. We are never going to need more than 32767 refs (indeed, if we ever need more than about 3 (10 at the outside), something has gone very wrong!). This aids structure packing.
* Only arrays, dicts and refs actually need the pdf_document pointer.
* Only arrays and dicts need the parent_num pointer.
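The "small header plus one structure per kind" layout can be sketched as follows. Every type, field and constant name here is hypothetical; only the overall idea (a minimal shared header, a short refcount, and document or parent fields only where needed) is taken from the commit message, and parent_num is assumed to be a plain integer.

```c
#include <stddef.h>

/* Hypothetical kind tags. */
enum { KIND_INT = 'i', KIND_REAL = 'f', KIND_ARRAY = 'a' };

/* Minimal header shared by every kind of object. */
typedef struct {
    short refs;             /* reference count; a short aids packing */
    unsigned char kind;     /* selects the concrete structure */
    unsigned char flags;
} obj_header;

/* Scalar kinds carry only their payload. */
typedef struct { obj_header super; int i; } obj_int;
typedef struct { obj_header super; float f; } obj_real;

/* Containers also carry the document pointer and parent number. */
typedef struct {
    obj_header super;
    void *doc;
    int parent_num;
    int len, cap;
    obj_header **items;
} obj_array;

/* Each kind is allocated at exactly the size it needs. */
static size_t obj_size(const obj_header *obj)
{
    switch (obj->kind) {
    case KIND_INT:   return sizeof(obj_int);
    case KIND_REAL:  return sizeof(obj_real);
    case KIND_ARRAY: return sizeof(obj_array);
    default:         return 0;
    }
}
```

With a union, every object would be as large as the largest arm; here a real number no longer pays for the array's bookkeeping fields.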
2015-03-24  Rework handling of PDF names for speed and memory.  Robin Watts
Currently, every PDF name is allocated in a pdf_obj structure, and comparisons are done using strcmp. Given that we can predict most of the PDF names we'll use in a given file, this seems wasteful. The pdf_obj type is opaque outside the pdf-object.c file, so we can abuse it slightly without anyone outside knowing.

We collect a sorted list of names used in PDF (resources/pdf/names.txt), and we add a utility (namedump) that preprocesses this into 2 header files.

The first (include/mupdf/pdf/pdf-names-table.h, included as part of include/mupdf/pdf/object.h), defines a set of "PDF_NAME_xxxx" entries. These are pdf_obj *'s that callers can use to mean "A PDF object that means literal name 'xxxx'"

The second (source/pdf/pdf-name-impl.h) is a C array of names.

We therefore update the code so that rather than passing "xxxx" to functions (such as pdf_dict_gets(...)) we now pass PDF_NAME_xxxx (to pdf_dict_get(...)). This is a fairly natural (if widespread) change.

The pdf_dict_getp (and sibling) functions that take a path (e.g. "foo/bar/baz") are therefore supplemented with equivalents that take a list (pdf_dict_getl(... , PDF_NAME_foo, PDF_NAME_bar, PDF_NAME_baz, NULL)).

The actual implementation of this relies on the fact that small pointer values are never valid values. For a given pdf_obj *p, if NULL < (intptr_t)p < PDF_NAME__LIMIT then p is a literal entry in the name table. This enables us to do fast pointer compares and to skip expensive strcmps.

Also, bring "null", "true" and "false" into the same style as PDF names. Rather than using full pdf_obj structures for null/true/false, use special pointer values just above the PDF_NAME_ table. This saves memory and makes comparisons easier.
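The small-pointer trick can be illustrated with a toy name table. This is a sketch under assumptions: the enum, table contents, MAKE_NAME macro and function names are all hypothetical (the real table is generated from names.txt), and it relies on integer-to-pointer conversion behaving as it does on mainstream platforms.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct pdf_obj pdf_obj;   /* opaque to callers, as in the commit */

/* Hypothetical two-name table; index 0 is reserved so NULL stays distinct. */
enum { NAME__UNUSED, NAME_Length, NAME_Type, NAME__LIMIT };
static const char *name_table[] = { "", "Length", "Type" };

/* Turn a table index into a fake pdf_obj pointer. */
#define MAKE_NAME(n) ((pdf_obj *)(intptr_t)(n))

/* Non-NULL pointers below the limit are table indices, not addresses. */
static int is_builtin_name(const pdf_obj *p)
{
    return p != NULL && (uintptr_t)p < (uintptr_t)NAME__LIMIT;
}

/* Recover the text of a built-in name, or NULL for anything else. */
static const char *builtin_name_text(const pdf_obj *p)
{
    return is_builtin_name(p) ? name_table[(intptr_t)p] : NULL;
}

/* Two built-in names are equal exactly when the pointers are equal. */
static int same_builtin_name(const pdf_obj *a, const pdf_obj *b)
{
    return is_builtin_name(a) && a == b;
}
```

Comparing two predefined names is then a single pointer comparison, with strcmp needed only for names that are not in the table.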
2015-03-24  Don't pass interpreter context to pdf_processor opcode callbacks.  Tor Andersson
Update buffer and filter processors. Filter both colors and stroke states. Move OCG hiding logic into interpreter.
2015-02-27  Bug 695853: Fix pdf clean operation with invalid refs in input file.  Robin Watts
MuPDF (and other PDF readers) treat invalid references as 'null' objects. For instance, in the supplied file, object 239 is supposedly free, but a reference is made to it. When cleaning (or linearising) a file, we renumber objects; such illegal refs then end up pointing somewhere else. The workaround here is simply to spot the invalid refs during the mark phase, and to set the referencing to null.
2015-02-17  Add ctx parameter and remove embedded contexts for API regularity.  Tor Andersson
Purge several embedded contexts:
Remove embedded context in fz_output.
Remove embedded context in fz_stream.
Remove embedded context in fz_device.
Remove fz_rebind_stream (since it is no longer necessary).
Remove embedded context in svg_device.
Remove embedded context in XML parser.
Add ctx argument to fz_document functions.
Remove embedded context in fz_document.
Remove embedded context in pdf_document.
Remove embedded context in pdf_obj.

Make fz_page independent of fz_document in the interface. We shouldn't need to pass the document to all functions handling a page. If a page is tied to the source document, it's redundant; otherwise it's just pointless.

Fix reference counting oddity in fz_new_image_from_pixmap.
2015-02-17  Rename fz_close_* and fz_free_* to fz_drop_*.  Tor Andersson
Rename fz_close to fz_drop_stream.
Rename fz_close_archive to fz_drop_archive.
Rename fz_close_output to fz_drop_output.
Rename fz_free_* to fz_drop_*.
Rename pdf_free_* to pdf_drop_*.
Rename xps_free_* to xps_drop_*.
2014-09-02  Add fz_snprintf and use it for formatting floating point numbers.  Tor Andersson
2014-07-18  hex-encode UTF-16 strings when writing PDF  Simon Bünzli
fmt_obj calculates whether a string is better hex-encoded or written using escapes. Due to a bug, '\0' is considered to be escapable same as '\n' when instead it would have to be written as '\000'. Since UTF-16 strings tend to consist of many '\0' bytes, their octal encoded form is much longer than their hex encoded form. The issue is that the first argument to strchr contains an unintended trailing '\0' which has to be special-cased first.
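The strchr pitfall behind this fix is standard C behaviour: the terminating '\0' counts as part of the searched string, so looking up '\0' always succeeds. The sketch below contrasts the buggy and the special-cased check; the function names and the exact set of escapable characters are assumptions for illustration, not the code in fmt_obj.

```c
#include <string.h>

/* Buggy check: strchr also matches the terminating '\0' of the set,
 * so a NUL byte is wrongly reported as having a short escape. */
static int is_short_escape_buggy(int c)
{
    return strchr("\n\r\t\b\f()\\", c) != NULL;
}

/* Fixed check: rule out '\0' before consulting the set. */
static int is_short_escape(int c)
{
    return c != 0 && strchr("\n\r\t\b\f()\\", c) != NULL;
}
```

With the fixed check a NUL byte is costed as the four-character octal escape \000, so strings with many NUL bytes (such as UTF-16 text) come out shorter when hex-encoded.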
2014-06-09  Fix 695300: don't throw exception on invalid reference number.  Tor Andersson
Return the null object rather than throwing an exception when parsing indirect object references with negative object numbers. Do range check for object numbers (1 .. length) when object numbers are used instead. Object number 0 is not a valid object number. It must always be 'free'.
2014-03-16  Avoid premature dropping of objects in pdf_dict_put  Robin Watts
When inserting a new value into a dictionary, if replacing an existing entry, ensure we keep the new value before dropping the old one. This is important in the case where (for example) the existing value is "[ object ]" and the new value is "object". If we drop the array and that loses the only reference to object, we can find that we have lost the value we are adding.
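The keep-before-drop ordering can be shown with a toy reference-counted object. This is a minimal sketch, not the MuPDF implementation: the obj type and the obj_keep, obj_drop, obj_new and slot_put functions are hypothetical, and a single child pointer stands in for the "[ object ]" array.

```c
#include <stdlib.h>

/* Hypothetical refcounted object that may own one child. */
typedef struct obj { int refs; struct obj *child; } obj;

static obj *obj_keep(obj *o)
{
    if (o) o->refs++;
    return o;
}

static void obj_drop(obj *o)
{
    if (o && --o->refs == 0) {
        obj_drop(o->child);   /* releasing a container releases its child */
        free(o);
    }
}

static obj *obj_new(obj *child)
{
    obj *o = malloc(sizeof *o);
    if (!o) abort();
    o->refs = 1;
    o->child = obj_keep(child);
    return o;
}

/* Replace the value in a slot: take the new reference first, and only
 * then release the old value, in case the old value owns the new one. */
static void slot_put(obj **slot, obj *val)
{
    obj *old = *slot;
    *slot = obj_keep(val);
    obj_drop(old);
}
```

If slot_put dropped the old value first, replacing a wrapper with its own child would free the child before it could be kept.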
2014-03-13  Make pdf_output_obj consistent with pdf_fprint_obj  Robin Watts
Pass in the 'tight' flag.
2014-03-04  Bug 691691: Add way of clearing cached objects out of the xref.  Robin Watts
We add various facilities here, intended to allow us to efficiently minimise the memory we use for holding cached pdf objects. Firstly, we add the ability to 'mark' all the currently loaded objects. Next we add the ability to 'clear the xref' - to drop all the currently loaded objects that have no other references except the ones held by the xref table itself. Finally, we add the ability to 'clear the xref to the last mark' - to drop all the currently loaded objects that have been created since the last 'mark' operation and have no other references except the ones held by the xref table. We expose this to the user by adding a new device hint 'FZ_NO_CACHE'. If set on the device, then the PDF interpreter will pdf_mark_xref before starting and pdf_clear_xref_to_mark afterwards. Thus no additional objects will be retained in memory after a given page is run, unless someone else picks them up and takes a reference to them as part of the run. We amend our simple example app to set this device hint when loading pages as part of a search.
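The mark and clear-to-mark operations can be modelled on a flat table of cache entries. The sketch below is an assumption-laden illustration: xref_entry, its fields, and both function names are hypothetical, and obj_refs stands for the total reference count of the cached object including the one held by the table.

```c
#include <stddef.h>

/* Hypothetical cache entry for one object in the xref table. */
typedef struct {
    int loaded;     /* is an object currently cached here? */
    int obj_refs;   /* references to that object, table's own included */
    int marked;     /* was it already loaded at the last mark? */
} xref_entry;

/* Remember which objects are loaded right now. */
static void mark_xref(xref_entry *t, size_t n)
{
    for (size_t i = 0; i < n; i++)
        t[i].marked = t[i].loaded;
}

/* Drop objects loaded since the last mark that only the table still
 * references. Returns the number of entries dropped. */
static size_t clear_xref_to_mark(xref_entry *t, size_t n)
{
    size_t dropped = 0;
    for (size_t i = 0; i < n; i++) {
        if (t[i].loaded && !t[i].marked && t[i].obj_refs == 1) {
            t[i].loaded = 0;
            t[i].obj_refs = 0;
            dropped++;
        }
    }
    return dropped;
}
```

Calling mark_xref before running a page and clear_xref_to_mark afterwards leaves earlier objects, and anything a caller took a reference to, in place.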
2014-02-28  Fix harmless copy/paste error in pdf_copy_dict.  Robin Watts
Thanks to Sebastian for spotting this.
2014-02-28  Ensure that pdf_array_delete works even with indirected objects.  Robin Watts
Add a RESOLVE(obj) call in line with other such functions.
2014-02-25  make pdf_new_obj_from_str throw on error  Simon Bünzli
Currently, pdf_new_obj_from_str returns NULL if the object can't be parsed. This isn't consistent with how all other pdf_new_* methods behave, which is to throw on errors.
2014-02-10  Add pdf_is_number.  Robin Watts
Useful utility missing from our arsenal.
2014-02-10  Add pdf_output_obj function.  Robin Watts
Reuses the same internals as pdf_fprintf_obj etc.
2014-01-13  More fixes for PDF clean.  Robin Watts
Avoid negative indirections. Don't make indirections to objects that aren't going to be used. Also improve pdf-write.c so that it doesn't call renumberobj on objs that are going to be dropped.
2013-12-23  Bug 694715: Fix typo in error message  Robin Watts
Thanks to Michael Cadilhac for spotting this.
2013-09-06  Fix problem with object dirty flag  Paul Gardiner
There is the possibility of marking an object dirty via one indirection and testing it via another. This patch ensures that is handled correctly. The scenario occurred within calc.pdf and stopped the update of the display field.
2013-08-13  Signature creation  Paul Gardiner
2013-07-19  Initial work on progressive loading  Robin Watts
We are testing this using a new -p flag to mupdf that sets a bitrate at which data will appear to arrive progressively as time goes on. For example:

mupdf -p 102400 pdf_reference17.pdf

Details of the scheme used here are presented in docs/progressive.txt
2013-07-03  Rename pdf_set_objects_parent_num to pdf_set_obj_parent  Robin Watts
2013-06-28  Add array_insert_drop and array_delete functions.  Tor Andersson
Also add index argument to array_insert.
2013-06-28  Ensure altered objects are moved to the incremental xref section  Paul Gardiner
2013-06-27  Move to using a flags bit rather than "Dirty" dict entries.  Robin Watts
Correct the naming scheme for pdf_obj_xxx functions.
2013-06-27  Bug 694382: Fix problems arising from recent pdf_obj changes.  Robin Watts
Thanks to zeniko for spotting these problems.

When we close a document, purge the glyph cache to ensure that no type3 glyphs hang around with pointers to pdf_obj's that are now gone.

Pass doc to pdf_new_obj_from_str rather than NULL. We believe that the reason this needed to be NULL is no longer valid.

Also, revert to using an int for ref counts. In a quick test of our regression suite we only found 2 files that ever made a refcount > 256 (and they never got larger than 768), but the potential is there for issues. Reverting to an int until we can think of a better idea.
2013-06-25  Rework storing internal flags in PDF objects.  Robin Watts
Before we render a page we need to evaluate whether we need transparency or not. To establish this, we recursively walk the resources looking for certain markers (blend modes, alpha levels, smasks etc). To avoid doing this repeatedly we'd like to stash the results somewhere. Currently we write a '.useBM' entry into the top level dictionary object, but with the recent changes to support incremental update this is not ideal - it has the effect of forcing all resources into the new section of the xref. So we avoid that horrible hack and use a different one; we make use of the new flags word in the pdf_obj structure. 1 bit is used to indicate whether we have stashed a (boolean) value here, and another bit is used to indicate what that value was.
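The two-bit scheme (one bit for "a result is stashed", one for the result itself) can be sketched as below. The flag names, the flagged_obj type and both functions are hypothetical; only the use of two bits in a flags word comes from the commit message.

```c
/* Hypothetical flag bits inside an object's flags word. */
enum {
    FLAG_MEMO_SET = 1 << 0,   /* a boolean result has been stashed */
    FLAG_MEMO_VAL = 1 << 1    /* the stashed boolean itself */
};

/* Stand-in for an object that has a flags word. */
typedef struct { unsigned char flags; } flagged_obj;

/* Stash a boolean result on the object. */
static void memo_store(flagged_obj *o, int value)
{
    o->flags |= FLAG_MEMO_SET;
    if (value)
        o->flags |= FLAG_MEMO_VAL;
    else
        o->flags &= (unsigned char)~FLAG_MEMO_VAL;
}

/* Returns 1 and fills *value if a result was stashed, otherwise 0. */
static int memo_lookup(const flagged_obj *o, int *value)
{
    if (!(o->flags & FLAG_MEMO_SET))
        return 0;
    *value = (o->flags & FLAG_MEMO_VAL) != 0;
    return 1;
}
```

A transparency check would call memo_lookup first and walk the resources only on a miss, then record the answer with memo_store. Because the result lives in the in-memory flags rather than in a dictionary entry, the object is not modified as far as incremental update is concerned.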
2013-06-25  Update pdf_obj's to have a pdf_document field.  Robin Watts
Remove the fz_context field to avoid the structure growing.
2013-06-24  Shrink pdf_obj type by using a flags word, and moving refs to a short.  Robin Watts
'marked' moves into the flags word. 'kind' becomes an unsigned char. 'sorted' moves from the dictionary-specific part of the object into the flags word. This should shrink us by 8 bytes.
2013-06-20  Rearrange source files.  Tor Andersson