summaryrefslogtreecommitdiff
path: root/source/pdf/pdf-op-filter.c
AgeCommit message (Collapse)Author
2016-04-18Fix broken documents after sanitizeRobin Watts
The DP and BDC operators, are used in the form: <NAME> <PROPERTIES> <OPERATOR> where <PROPERTIES> can either be a name (that can be looked up to get a dictionary) or an inline dictionary. What the spec doesn't say is that the two are not interchangeable. If you resolve the name to an inline dict, then insert it, Acrobat will give an error for some (but not all) cases. The interpreter currently resolves any references, and passes the resolved version into the operator handling function. This precludes us outputting the original form. We therefore update it to pass both the raw and the cooked versions in. This has no effect on MuPDFs own handling of anything, it just enables the buffer device to output a correct stream.
2015-03-24Rework handling of PDF names for speed and memory.Robin Watts
Currently, every PDF name is allocated in a pdf_obj structure, and comparisons are done using strcmp. Given that we can predict most of the PDF names we'll use in a given file, this seems wasteful. The pdf_obj type is opaque outside the pdf-object.c file, so we can abuse it slightly without anyone outside knowing. We collect a sorted list of names used in PDF (resources/pdf/names.txt), and we add a utility (namedump) that preprocesses this into 2 header files. The first (include/mupdf/pdf/pdf-names-table.h, included as part of include/mupdf/pdf/object.h), defines a set of "PDF_NAME_xxxx" entries. These are pdf_obj *'s that callers can use to mean "A PDF object that means literal name 'xxxx'" The second (source/pdf/pdf-name-impl.h) is a C array of names. We therefore update the code so that rather than passing "xxxx" to functions (such as pdf_dict_gets(...)) we now pass PDF_NAME_xxxx (to pdf_dict_get(...)). This is a fairly natural (if widespread) change. The pdf_dict_getp (and sibling) functions that take a path (e.g. "foo/bar/baz") are therefore supplemented with equivalents that take a list (pdf_dict_getl(... , PDF_NAME_foo, PDF_NAME_bar, PDF_NAME_baz, NULL)). The actual implementation of this relies on the fact that small pointer values are never valid values. For a given pdf_obj *p, if NULL < (intptr_t)p < PDF_NAME__LIMIT then p is a literal entry in the name table. This enables us to do fast pointer compares and to skip expensive strcmps. Also, bring "null", "true" and "false" into the same style as PDF names. Rather than using full pdf_obj structures for null/true/false, use special pointer values just above the PDF_NAME_ table. This saves memory and makes comparisons easier.
2015-03-24Don't pass interpreter context to pdf_processor opcode callbacks.Tor Andersson
Update buffer and filter processors. Filter both colors and stroke states. Move OCG hiding logic into interpreter.
2015-02-17Add ctx parameter and remove embedded contexts for API regularity.Tor Andersson
Purge several embedded contexts: Remove embedded context in fz_output. Remove embedded context in fz_stream. Remove embedded context in fz_device. Remove fz_rebind_stream (since it is no longer necessary). Remove embedded context in svg_device. Remove embedded context in XML parser. Add ctx argument to fz_document functions. Remove embedded context in fz_document. Remove embedded context in pdf_document. Remove embedded context in pdf_obj. Make fz_page independent of fz_document in the interface. We shouldn't need to pass the document to all functions handling a page. If a page is tied to the source document, it's redundant; otherwise it's just pointless. Fix reference counting oddity in fz_new_image_from_pixmap.
2014-03-17Ensure that BDC operators get both params.Robin Watts
Currently, when parsing, each time we encounter a name, we throw away the last name we had. BDC operators are called with: /Name <object> BDC If the <object> is a name, we lose the original /Name. To fix this, parsing a name when we already have a name will cause the name to be stored as an object. This has various knock on effects throughout the code to read from csi->obj rather than csi->name. Also, ensure that when cleaning, we collect a list of the object names in our new resources dictionary.
2014-03-04Fix memory leaks in pdf-filter code.Robin Watts
We were never freeing the top level filter_gstate, and we were losing a reference to each new resource type dictionary when we create them.
2014-03-04Add pdf_process for filtering operator streams.Robin Watts
Currently this knows about q/Q matching/eliding and avoiding repeated/unneccesary color/colorspace setting. It will also collect a dictionary of resources used by a page. This can be extended to be cleverer in future.