Age | Commit message (Collapse) | Author |
|
Also don't bother adding an indirect object for the top resource dict.
|
|
Previously if both cleaning and sanitizing content streams the
pages' resource dictionaries would retain the actually used
resources. If the content streams were only cleaned and not
sanitized the page's resource dictionaries were incorrectly
emptied. All resources, whether used or not, ought to be
retained, as is the case after this commit.
|
|
|
|
Add a PDF_NAME(Foo) macro that evaluates to a pdf_obj for /Foo.
Use the C preprocessor to create the enum values and string table
from one include file instead of using a separate code generator tool.
|
|
Patterns and XObjects can have their own resource dictionaries.
If they do, use those in preference to the page ones when
filtering.
|
|
This goes well with the 'mutool clean -d' decompression option to debug
content streams, without doing the sanitize optimization pass.
|
|
|
|
|
|
|
|
Expose pdf_new_output_processor.
Remove pdf_document argument to pdf_new_filter_processor. It is
only ever used when copying resources from the old resource
dictionary to the new one, whereupon it must agree with the bound
pdf_document in the old resource dictionary.
|
|
|
|
|
|
We want to turn pdf_page into a thin wrapper around a pdf_obj, so that
any updates to the underlying PDF objects will be reflected without
having to reload the pdf_page.
|
|
If the Contents of a page are an array, we were forgetting to
write the new singleton replacement into the dictionary.
|
|
|
|
|
|
|
|
Also fix a memory leak.
|
|
|
|
Currently, every PDF name is allocated in a pdf_obj structure, and
comparisons are done using strcmp. Given that we can predict most
of the PDF names we'll use in a given file, this seems wasteful.
The pdf_obj type is opaque outside the pdf-object.c file, so we can
abuse it slightly without anyone outside knowing.
We collect a sorted list of names used in PDF (resources/pdf/names.txt),
and we add a utility (namedump) that preprocesses this into 2 header
files.
The first (include/mupdf/pdf/pdf-names-table.h, included as part of
include/mupdf/pdf/object.h), defines a set of "PDF_NAME_xxxx"
entries. These are pdf_obj *'s that callers can use to mean "A PDF
object that means literal name 'xxxx'"
The second (source/pdf/pdf-name-impl.h) is a C array of names.
We therefore update the code so that rather than passing "xxxx" to
functions (such as pdf_dict_gets(...)) we now pass PDF_NAME_xxxx (to
pdf_dict_get(...)). This is a fairly natural (if widespread) change.
The pdf_dict_getp (and sibling) functions that take a path (e.g.
"foo/bar/baz") are therefore supplemented with equivalents that
take a list (pdf_dict_getl(... , PDF_NAME_foo, PDF_NAME_bar,
PDF_NAME_baz, NULL)).
The actual implementation of this relies on the fact that small
pointer values are never valid values. For a given pdf_obj *p,
if NULL < (intptr_t)p < PDF_NAME__LIMIT then p is a literal
entry in the name table.
This enables us to do fast pointer compares and to skip expensive
strcmps.
Also, bring "null", "true" and "false" into the same style as PDF names.
Rather than using full pdf_obj structures for null/true/false, use
special pointer values just above the PDF_NAME_ table. This saves
memory and makes comparisons easier.
|
|
Update buffer and filter processors.
Filter both colors and stroke states.
Move OCG hiding logic into interpreter.
|
|
|
|
|
|
In order to be able to watermark etc, we want the ability to
add more operators/resources after page cleaning.
Add a post processing hook to enable this to be done more
easily.
|
|
Purge several embedded contexts:
Remove embedded context in fz_output.
Remove embedded context in fz_stream.
Remove embedded context in fz_device.
Remove fz_rebind_stream (since it is no longer necessary).
Remove embedded context in svg_device.
Remove embedded context in XML parser.
Add ctx argument to fz_document functions.
Remove embedded context in fz_document.
Remove embedded context in pdf_document.
Remove embedded context in pdf_obj.
Make fz_page independent of fz_document in the interface.
We shouldn't need to pass the document to all functions handling a page.
If a page is tied to the source document, it's redundant; otherwise it's
just pointless.
Fix reference counting oddity in fz_new_image_from_pixmap.
|
|
New routine to filter the content streams for pages, xobjects,
type3 charprocs, patterns etc. The filtered streams are guaranteed
to be properly matched with q/Q's, and to not have changed the top
level ctm. Additionally we remove (some) repeated settings of
colors etc. This filtering can be extended to be smarter later.
The idea of this is to both repair after editing, and to leave the
streams in a form that can be easily appended to.
This is preparatory to work on Bates numbering and Watermarking.
Currently the streams produced are uncompressed.
|