path: root/source/pdf/pdf-object.c
Age | Commit message | Author
2017-11-09 | Bug 698353: Avoid having our API depend on DEBUG/NDEBUG. | Robin Watts
Currently, our API uses static inlines for fz_lock and fz_unlock, the definitions for which depend on whether we build NDEBUG or not. This isn't ideal as it causes problems when people link a release binary with a debug lib (or vice versa). We really want to continue to use static inlines for the locking functions as used from MuPDF, as we hit them hard in the keep/drop functions. We therefore remove fz_lock/fz_unlock from the public API entirely. Accordingly, we move the fz_lock/fz_unlock static inlines into fitz-imp.h (an internal header), together with the fz_keep_.../fz_drop_... functions. We then have public fz_lock/fz_unlock functions for any external callers to use that are free of complications. At the same time, to avoid another indirection, we change from holding the locking functions as a pointer to a struct to a struct itself.
2017-11-01 | Use int64_t for public file API offsets. | Tor Andersson
Don't mess with conditional compilation with LARGEFILE -- always expose 64-bit file offsets in our public API.
2017-10-24 | Improved overprint (simulation) control. | Robin Watts
First, we add an fz_page_overprint function to detect if a page uses overprint. Only PDF implements this currently (other formats all return false). PDF looks for '/OP true' in any ExtGState entry. We make mutool check this. If it finds it, and spot rendering is not completely disabled, then it ensures that the separation object passed to the pixmap into which we draw is non-NULL. This causes the draw device to do overprint simulation. We ensure that mutool draw defaults to simulation for the spot rendering mode in builds that support it. Finally, if an output intent is set by the document, and spot rendering is not completely disabled, we ensure the seps object is non-NULL so that we render to a group in the specified output intent, and THEN convert down to the required colorspace for the output. This should make us match Acrobat's behaviour.
2017-09-12 | Fix leaks upon error while copying array/dict. | Sebastian Rasmussen
2017-09-07 | Use dict_put_drop/array_push_drop wherever possible. | Sebastian Rasmussen
2017-09-07 | Initialize variables to appease clang scan-build. | Sebastian Rasmussen
2017-06-22 | Add const to pdf_toname. | Tor Andersson
2017-04-27 | Include required system headers. | Tor Andersson
2017-03-22 | Rename fz_putc/puts/printf to fz_write_*. | Tor Andersson
Rename fz_write to fz_write_data. Rename fz_write_buffer_* and fz_buffer_printf to fz_append_*. Be consistent in naming: fz_write_* calls write to fz_output. fz_append_* calls append to fz_buffer. Update documentation.
2017-01-09 | Remove some dead code. | Tor Andersson
2016-12-27 | Strip extraneous blank lines. | Tor Andersson
2016-12-19 | Fix typo in dictionary entry sorting. | Sebastian Rasmussen
Commit a92f0db5987b408bef0d9b07277c8ff2329e9ce5 introduced a typo causing pdf_sort_dict() to try to sort non-dict objects. Attempting to do this for non-dict objects causes a segmentation fault. For dictionary objects this causes a performance degradation that has not been noticed. pdf_sort_dict() is called in two places: pdf_dict_get_put() and showgrep(). The reason that calling pdf_sort_dict() from pdf_dict_get_put() does not cause a segmentation fault is that pdf_dict_get_put() makes sure that the object is a dictionary before calling pdf_sort_dict(), which will then decide NOT to sort the dict keys. showgrep(), on the other hand, does not make sure that it is only processing dict objects before calling pdf_sort_dict(), which caused a segmentation fault.
2016-12-12 | pdf: Add missing prepare_object_for_alteration calls. | Tor Andersson
pdf_array_delete and pdf_dict_put_val_null weren't calling this function.
2016-12-12 | Change pdf_dict_put_val to pdf_dict_put_val_null. | Tor Andersson
It's only used to 'fix' duff indirect references when cleaning PDF files. Writing general values into dictionaries should be done by key, not by internal index.
2016-12-08 | Update pdf_array_put to allow extension. | Robin Watts
Previously, attempting to put an object beyond the end of an array would throw an error. Here we update the code to allow objects to be placed *exactly* at the end (i.e. to extend the length by 1). Update js use of pdf_array_put.
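The rule described in this commit (replace inside the array, append when the index equals the current length, reject anything further out) can be sketched roughly as follows. This is a standalone illustration, not MuPDF's pdf_array_put: the `int_array` type and `int_array_put` function are hypothetical, the array holds plain ints instead of pdf_obj pointers, and errors are reported through a return code where the real code throws.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable array of ints; stands in for a PDF array. */
typedef struct {
	int *items;
	int len;
	int cap;
} int_array;

/* Returns 0 on success, -1 if the index is out of range or allocation fails. */
int int_array_put(int_array *arr, int index, int value)
{
	assert(arr != NULL);
	if (index < 0 || index > arr->len)
		return -1; /* beyond the end: still an error */
	if (index == arr->len) {
		/* exactly at the end: extend the length by 1 */
		if (arr->len == arr->cap) {
			int new_cap = arr->cap ? arr->cap * 2 : 4;
			int *p = realloc(arr->items, (size_t)new_cap * sizeof *p);
			if (p == NULL)
				return -1;
			arr->items = p;
			arr->cap = new_cap;
		}
		arr->len++;
	}
	arr->items[index] = value;
	return 0;
}
```

Starting from a zero-initialised `int_array`, putting at index 0 appends, putting at index 2 of a one-element array fails, and putting at an existing index overwrites in place.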
2016-11-23 | Fix object leak in pdf_array_put_drop() and pdf_dict_put_val_drop(). | Sebastian Rasmussen
2016-09-22 | Bug 697015: Avoid object references vanishing during repair. | Robin Watts
A PDF repair can be triggered 'just in time', when we encounter a problem in the file. The idea is that this can happen without the enclosing code being aware of it. Thus the enclosing code may be holding 'borrowed' references (such as those returned by pdf_dict_get()) at the time when the repair is triggered. We are therefore at pains to ensure that the repair does not replace any objects that exist already, so that the calling code will not have these references unexpectedly invalidated. The sole exception to this is when we replace the 'Length' fields in stream dictionaries with the actual lengths. Bug 697015 shows exactly this situation causing a reference to become invalid. The solution implemented here is to add an 'orphan list' to the document, where we put these (hopefully few, small) objects. These orphans are kept around until the document is closed.
2016-09-14 | Don't report addRef/dropRef events to Memento twice. | Robin Watts
We call Memento_addRef etc in fz_keep_impXX functions, so don't call them in the callers too.
2016-08-24 | Add pdf_array_find to look up the index of an object in an array. | Tor Andersson
2016-08-24 | When NULL is added to PDF dicts/arrays, insert null objects. | Sebastian Rasmussen
2016-08-24 | Be stricter in what can be added into arrays/dicts. | Sebastian Rasmussen
2016-08-24 | Do not resolve PDF dict keys before using them. | Sebastian Rasmussen
Only direct PDF name objects should be used as arguments, indirect PDF name objects cannot be used.
2016-08-24 | Always check that PDF dict keys are names in same way. | Sebastian Rasmussen
2016-08-24 | Add macros for checking PDF object type. | Sebastian Rasmussen
This avoids resolving object references which is important for dictionary keys.
2016-08-24 | Turn warnings in dict/array functions into exceptions. | Sebastian Rasmussen
2016-07-08 | Use fz_keep_imp and fz_drop_imp for all reference counting. | Tor Andersson
2016-07-06 | Fix garbage collection and page grafting for indirect reference chains. | Tor Andersson
The mark & sweep pass of garbage collection, and the resolution of indirect objects when grafting objects, were both following the full chain of indirect references. In the unusual case where a numbered object is itself only an indirect reference to another object, this intermediate numbered object would be missed both when marking for garbage collection, and when copying objects for grafting. Add a function to resolve only one step for these two uses. The following is an example of a file that would break during garbage collection if we follow full indirect reference chains:

    %PDF-1.3
    1 0 obj <</Type/Catalog /Foo[2 0 R 3 0 R]>> endobj
    2 0 obj 4 0 R endobj
    3 0 obj 5 0 R endobj
    4 0 obj <</Length 1>> stream
    A
    endstream endobj
    5 0 obj <</Length 1>> stream
    B
    endstream endobj
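The difference between following the whole chain and resolving one step at a time can be illustrated with a toy model. Everything here is hypothetical (the `xref_entry` layout, the function names, the fixed table size); it only models which numbered objects end up marked and does not reflect MuPDF's actual data structures.

```c
#include <assert.h>

#define MAX_OBJ 8

typedef struct {
	int forwards_to; /* > 0: object body is just "N 0 R"; 0: a real object */
	int marked;      /* set by the mark pass; unmarked objects get swept */
} xref_entry;

/* Problematic approach: jump to the end of the chain and mark only that. */
void mark_following_full_chain(xref_entry *xref, int num)
{
	int hops = 0;
	assert(num > 0 && num < MAX_OBJ);
	/* the hop limit guards against reference cycles in this toy model */
	while (xref[num].forwards_to != 0 && hops++ < MAX_OBJ)
		num = xref[num].forwards_to;
	xref[num].marked = 1;
}

/* One step at a time: every numbered object on the way is marked. */
void mark_one_step_at_a_time(xref_entry *xref, int num)
{
	assert(num > 0 && num < MAX_OBJ);
	while (!xref[num].marked) {
		xref[num].marked = 1;
		if (xref[num].forwards_to == 0)
			break;
		num = xref[num].forwards_to;
	}
}
```

With object 2 forwarding to object 4, as in the example file, the first function leaves object 2 unmarked (so a sweep would discard it while object 1 still refers to it), whereas the second marks both 2 and 4.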
2016-07-06 | pdf: Check ownership when adding objects to a document. | Tor Andersson
2016-06-20 | Fix signed/unsigned warning. | Robin Watts
2016-06-17 | Allow PDF strings to be > 16bits. | Robin Watts
This stops Bug693111.pdf giving errors.
2016-06-17 | Use 'size_t' instead of int as appropriate. | Robin Watts
This silences the many warnings we get when building for x64 in windows. This does not address any of the warnings we get in thirdparty libraries - in particular harfbuzz. These look (at a quick glance) harmless though.
2016-03-16 | Avoid unused var warnings in Memento ref counting code. | Robin Watts
2016-03-15 | Make PDF objects ref changes memento-trackable. | Robin Watts
2016-03-14 | Make pdf_is_stream work on loaded stream dictionary objects as well. | Tor Andersson
2016-03-01 | js: Add PDF document and object access. | Tor Andersson
2016-02-29 | Improve pretty-print formatting of arrays. | Tor Andersson
2015-12-11 | Use fz_output instead of FILE* for most of our output needs. | Tor Andersson
Use fz_output in debug printing functions. Use fz_output in pdfshow. Use fz_output in fz_trace_device instead of stdout. Use fz_output in pdf-write.c. Rename fz_new_output_to_filename to fz_new_output_with_path. Add seek and tell to fz_output. Remove unused functions like fz_fprintf. Fix typo in pdf_print_obj.
2015-08-27 | Move objects to the incremental xref before changing them | Paul Gardiner
This is work towards supporting several levels of incremental xref, which in turn is work towards bug #696123. When several levels are present, the operation will make a copy of the object and that needs to be done before any change to the object.
2015-08-27 | Add a deep-copy function for pdf objects | Paul Gardiner
This is work towards supporting several levels of incremental xref, which in turn, is work towards bug #696123. When several levels of incremental xref are present there can be objects that appear at multiple levels and differ between those levels. This deep-copy function will be used to create new copies before the new version is altered.
2015-08-20 | Remove duplicate inclusions of headers. | Sebastian Rasmussen
These headers are already included by mupdf/fitz/system.h.
2015-07-27 | Correctly compare PDF names with null, true and false. | Sebastian Rasmussen
Commit f533104 introduced optimized handling of pdf names, null, true and false. That commit handles most object types correctly in pdf_objcmp() but it does not correctly handle comparisons such as pdf_objcmp("/Crypt", "true") or pdf_objcmp("null", "/Crypt"). Fixes one issue from bug 696012.
2015-06-05 | Fix mutool clean for FZ_LARGEFILE case. | Robin Watts
We were allocating the ofs array as ints and then filling it with fz_off_t's.
2015-05-25 | Bug 695949: Fix bug in pdf_dict_del. | Robin Watts
Fix typo in pdf_dict_del. Issue and fix both provided by Willus (William Menninger).
2015-05-15 | pdf_dict_find optimisation. | Robin Watts
When doing pdf_dict_put, we first call pdf_dict_find to hunt for an existing entry we can just update. Recently we introduced a 'location' return from pdf_dict_find that would (in the non-found case) return the location of where such an entry should be inserted. It's just dawned on me that we don't need a separate variable for this. We continue to return negative numbers for 'not found', but these negative numbers can contain the insertion point.
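One common way to fold an insertion point into a negative "not found" result is sketched below. The commit message does not spell out the exact encoding MuPDF uses, so the `-1 - insertion_point` convention here is an assumption, as are the `find_key` name and the use of plain C strings as keys.

```c
#include <assert.h>
#include <string.h>

/*
 * Binary search over a sorted array of keys.
 * Returns the index if the key is present; otherwise returns
 * -1 - insertion_point, which is always negative (assumed encoding).
 */
int find_key(const char *const *keys, int len, const char *key)
{
	int lo = 0, hi = len - 1;
	assert(keys != NULL || len == 0);
	while (lo <= hi) {
		int mid = lo + (hi - lo) / 2;
		int c = strcmp(key, keys[mid]);
		if (c < 0)
			hi = mid - 1;
		else if (c > 0)
			lo = mid + 1;
		else
			return mid;
	}
	return -1 - lo; /* lo is where the key would have to be inserted */
}
```

A caller that gets a negative result `r` can recover the insertion point as `-1 - r`. The offset by one is what keeps "insert at position 0" distinct from "found at index 0".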
2015-05-15 | Fix bug in pdf_dict_find. | Robin Watts
Sebras and Tor spotted that we could get occasional 'warning: cannot seek backwards' messages. An example command that shows this is:

    mutool show pdf_reference17.pdf grep

They further tracked the problem down to the 'sorted' side of the pdf_dict_find function. In the binary search, I calculate c to be the comparison value between pairs of keys. In the case where both keys (names) are in the special case 'known' range below PDF_OBJ__LIMIT, I use pointer arithmetic for this. Unfortunately, I was forgetting that the compiler thinks that pdf_obj *'s are 4 (or 8) bytes in size, so was doing (a-b)/4. To work around this I cast both keys to char *'s. This solves the bug. Thanks to Sebras and Tor for doing the hard work in tracking this down.
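The effect of the scaled subtraction can be shown with plain integers. The sketch below models the two "pointers" as `intptr_t` values so that no invalid pointer is ever formed; the function names and the element size passed in are illustrative assumptions, not MuPDF code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Models subtracting two typed pointers: the byte distance is divided
 * by the element size, truncating toward zero.
 */
int cmp_scaled(intptr_t a, intptr_t b, intptr_t elem_size)
{
	assert(elem_size > 0);
	return (int)((a - b) / elem_size);
}

/* Models the fix: after casting to char *, the distance is in bytes. */
int cmp_bytes(intptr_t a, intptr_t b)
{
	return (int)(a - b);
}
```

With neighbouring name values 5 and 6 and an element size of 4, `cmp_scaled` yields 0, so a binary search would treat two different keys as equal, while `cmp_bytes` yields -1 and orders them correctly.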
2015-05-15 | Support pdf files larger than 2Gig. | Robin Watts
If FZ_LARGEFILE is defined when building, MuPDF uses 64bit offsets for files; this allows us to open streams larger than 2Gig. The downsides to this are that:

* The xref entries are larger.
* All PDF ints are held as 64bit things rather than 32bit things (to cope with /Prev entries, hint stream offsets etc).
* All file positions are stored as 64bits rather than 32.

The implementation works by detecting FZ_LARGEFILE. Some #ifdeffery in fitz/system.h sets fz_off_t to either int or int64_t as appropriate, and sets defines for fz_fopen, fz_fseek, fz_ftell etc as required. These call the fseeko64 etc functions on linux (and so define _LARGEFILE64_SOURCE) and the explicit 64bit functions on windows.
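The conditional typedef described here might look roughly like the following. This is a guess at the shape of the #ifdeffery based only on the commit message, not a copy of fitz/system.h; the `offset_is_representable` helper is hypothetical and exists only to show what the narrower type cannot hold.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Pick the offset type at build time (sketch of the described scheme). */
#ifdef FZ_LARGEFILE
typedef int64_t fz_off_t;
#else
typedef int fz_off_t;
#endif

/* Hypothetical helper: can this byte position be stored in an fz_off_t? */
int offset_is_representable(int64_t pos)
{
	assert(pos >= 0);
#ifdef FZ_LARGEFILE
	return 1;
#else
	return pos <= INT_MAX;
#endif
}
```

On a platform with 32-bit int and without FZ_LARGEFILE, a position just past INT_MAX is rejected, which is the 2Gig limit the commit refers to. The 2017-11-01 entry in this log later drops the conditional and always exposes 64-bit offsets in the public API.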
2015-04-07 | Fix whitespace. | Tor Andersson
2015-03-25 | Avoid calling pdf_dict_finds when we could call pdf_dict_find. | Robin Watts
Faster, shinier, better.
2015-03-24 | Reduce pdf_obj memory usage. | Robin Watts
Historically pdf_obj was a structure with a header and a union in it. As time has gone by more stuff has been put into the header, and the different arms of the union have changed in size. We've even adopted the idea of different 'kinds' of pdf_obj's being different sizes (names and strings, for example). Here we rework the system slightly; we minimise the header, and split out everything into different structures. Every different 'kind' of pdf_obj is now its own structure, just as big as it needs to be. Key changes:

* refs is now a short rather than an int. We are never going to need more than 32767 refs (indeed, if we ever need more than about 3 (10 at the outside), something has gone very wrong!). This aids structure packing.
* Only arrays, dicts and refs actually need the pdf_document pointer.
* Only arrays and dicts need the parent_num pointer.
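The "small shared header plus one struct per kind" layout can be sketched as below. The type and field names are hypothetical and chosen for illustration; only the general idea (a short refs counter in a minimal header, with document and parent fields living only in the kinds that need them) comes from the commit message.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { KIND_INT, KIND_ARRAY } obj_kind;

/* Minimal header shared by every kind of object. */
typedef struct {
	short refs;
	unsigned char kind;
} obj;

/* Each kind is its own struct, starting with the header. */
typedef struct {
	obj super;
	long long value;
} obj_int;

typedef struct {
	obj super;
	void *doc;      /* only container kinds carry a document pointer */
	int parent_num; /* and a parent object number */
	int len;
	obj **items;
} obj_array;

/* Allocate exactly as much as an integer object needs. */
obj *new_int(long long value)
{
	obj_int *o = malloc(sizeof *o);
	if (o == NULL)
		return NULL;
	o->super.refs = 1;
	o->super.kind = KIND_INT;
	o->value = value;
	return &o->super;
}

/* Check the kind in the header before casting to the full struct. */
long long to_int(const obj *o)
{
	assert(o != NULL);
	return o->kind == KIND_INT ? ((const obj_int *)o)->value : 0;
}
```

Because the header is the first member of each struct, a pointer to the header can be cast back to the full struct once the kind has been checked, and an integer object never pays for the fields that only arrays need.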
2015-03-24 | Rework handling of PDF names for speed and memory. | Robin Watts
Currently, every PDF name is allocated in a pdf_obj structure, and comparisons are done using strcmp. Given that we can predict most of the PDF names we'll use in a given file, this seems wasteful. The pdf_obj type is opaque outside the pdf-object.c file, so we can abuse it slightly without anyone outside knowing. We collect a sorted list of names used in PDF (resources/pdf/names.txt), and we add a utility (namedump) that preprocesses this into 2 header files. The first (include/mupdf/pdf/pdf-names-table.h, included as part of include/mupdf/pdf/object.h), defines a set of "PDF_NAME_xxxx" entries. These are pdf_obj *'s that callers can use to mean "A PDF object that means literal name 'xxxx'". The second (source/pdf/pdf-name-impl.h) is a C array of names. We therefore update the code so that rather than passing "xxxx" to functions (such as pdf_dict_gets(...)) we now pass PDF_NAME_xxxx (to pdf_dict_get(...)). This is a fairly natural (if widespread) change. The pdf_dict_getp (and sibling) functions that take a path (e.g. "foo/bar/baz") are therefore supplemented with equivalents that take a list (pdf_dict_getl(... , PDF_NAME_foo, PDF_NAME_bar, PDF_NAME_baz, NULL)). The actual implementation of this relies on the fact that small pointer values are never valid values. For a given pdf_obj *p, if NULL < (intptr_t)p < PDF_NAME__LIMIT then p is a literal entry in the name table. This enables us to do fast pointer compares and to skip expensive strcmps. Also, bring "null", "true" and "false" into the same style as PDF names. Rather than using full pdf_obj structures for null/true/false, use special pointer values just above the PDF_NAME_ table. This saves memory and makes comparisons easier.
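The small-pointer trick can be illustrated with a miniature name table. The enum, table and function names below are hypothetical stand-ins for the generated PDF_NAME_ entries, and converting small integers to pointers is implementation-defined in C, so this is a sketch of the idea under those assumptions, not MuPDF's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct obj obj; /* opaque, as pdf_obj is outside pdf-object.c */

/* Hypothetical miniature name table; index 0 is unused so NULL is never a name. */
enum { NAME_UNUSED = 0, NAME_Length, NAME_Type, NAME_LIMIT };
const char *const name_table[NAME_LIMIT] = { "", "Length", "Type" };

/* Turn a table index into a fake object pointer. */
obj *name_token(int index)
{
	assert(index > 0 && index < NAME_LIMIT);
	return (obj *)(intptr_t)index;
}

/* A pointer whose value lies strictly between 0 and NAME_LIMIT is a table entry. */
int is_table_name(const obj *p)
{
	intptr_t v = (intptr_t)p;
	return v > 0 && v < NAME_LIMIT;
}

/* Look up the spelling of a table name; NULL for anything else. */
const char *table_name_text(const obj *p)
{
	return is_table_name(p) ? name_table[(intptr_t)p] : NULL;
}
```

Two table names are equal exactly when their pointer values are equal, so comparing them needs no strcmp and no allocation.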