Age | Commit message | Author |
|
|
|
Add a new class of errors and use them to abort interpretation when
the test device detects a color page.
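A minimal sketch of the intended flow, assuming the new error class is named
FZ_ERROR_ABORT (the class name and exact call sequence are assumptions here,
not confirmed by this log):

    int page_is_color = 0;

    fz_try(ctx)
    {
        /* Run the page through the test device. On the first colored
           object, the device's callbacks do something like:
               fz_throw(ctx, FZ_ERROR_ABORT, "color detected");     */
    }
    fz_catch(ctx)
    {
        if (fz_caught(ctx) != FZ_ERROR_ABORT)
            fz_rethrow(ctx);        /* a genuine error: pass it upwards */
        page_is_color = 1;          /* aborted early: the answer is known */
    }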
|
|
New routine to filter the content streams for pages, xobjects,
type3 charprocs, patterns etc. The filtered streams are guaranteed
to have properly matched q/Q operators and to leave the top-level
CTM unchanged. Additionally we remove (some) repeated settings of
colors etc. This filtering can be extended to be smarter later.
The idea is both to repair streams after editing and to leave them
in a form that can easily be appended to.
This is preparatory to work on Bates numbering and Watermarking.
Currently the streams produced are uncompressed.
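The q/Q balancing can be sketched as below; the type and function names
are illustrative, not the actual new routine's API:

    #include <string.h>

    typedef struct { int depth; } qq_state;

    /* emit() stands in for whatever appends an operator to the output stream */
    static void emit(qq_state *st, const char *op);

    static void qq_filter_op(qq_state *st, const char *op)
    {
        if (!strcmp(op, "q"))
        {
            st->depth++;
            emit(st, "q");
        }
        else if (!strcmp(op, "Q"))
        {
            /* drop any Q that would pop past the top level (and hence
               change the top-level CTM seen by appended content) */
            if (st->depth > 0)
            {
                st->depth--;
                emit(st, "Q");
            }
        }
        else
            emit(st, op);
    }

    static void qq_filter_end(qq_state *st)
    {
        /* close anything left open so the stream ends properly matched */
        while (st->depth > 0)
        {
            st->depth--;
            emit(st, "Q");
        }
    }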
|
|
Previously the pdf_process buffer code did not understand inline images.
In order to make this work without needlessly duplicating complex code
from within pdf-op-run, the parsing of inline images has been moved to
happen in pdf-interpret.c. When the op_table entry for BI is called
it now expects the inline image to be in csi->img and the dictionary
object to be in csi->obj.
To make this work, we have had to improve the handling of inline images
in general. While non-inline images have been loaded and held in
memory in their compressed form and only decoded when required, until
now we have always loaded and decoded inline images immediately. This
has been due to the difficulty in knowing how many bytes of data to
read from the stream - we know the length of the stream once
uncompressed, but relating this to the compressed length is hard.
To cure this we introduce a new type of filter stream, a 'leecher'.
We insert a leecher stream before we build the filters required to
decode the image. We then read and discard the appropriate number
of uncompressed bytes from the filters. This pulls the compressed
data through the leecher stream, which stores it in an fz_buffer.
Thus images are now always held in their compressed forms in memory.
The pdf-op-run implementation is now trivial. The only real complexity
in the pdf-op-buffer implementation is the need to ensure that the
/Filter entry in the dictionary object matches the exact point at
which we backstopped the decompression.
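Conceptually the leecher is just a pass-through read with a side copy.
A sketch, with stand-in types rather than the real fz_stream API:

    typedef struct stream stream;    /* stands in for the filter stream type */
    typedef struct buffer buffer;    /* stands in for a growable byte buffer */
    extern int  stream_read(stream *s, unsigned char *buf, int len);
    extern void buffer_append(buffer *b, const unsigned char *data, int len);

    typedef struct
    {
        stream *chain;      /* the underlying, still-compressed stream */
        buffer *leeched;    /* accumulates every byte read from chain */
    } leecher;

    static int leech_read(leecher *l, unsigned char *buf, int len)
    {
        int n = stream_read(l->chain, buf, len);  /* pull compressed bytes through */
        if (n > 0)
            buffer_append(l->leeched, buf, n);    /* and keep a private copy */
        return n;
    }

The decode filters are then built on top of this stream, so reading (and
discarding) the decoded image data pulls exactly the inline image's
compressed bytes through leech_read, leaving them captured in 'leeched'.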
|
|
Currently, when parsing, each time we encounter a name, we throw away
the last name we had. BDC operators are called with:
/Name <object> BDC
If the <object> is a name, we lose the original /Name.
To fix this, parsing a name when we already have one stored will cause
the new name to be stored as an object (csi->obj), preserving the
original in csi->name.
This has various knock-on effects throughout the code, which must now
read from csi->obj rather than csi->name in these cases.
Also, ensure that when cleaning, we collect a list of the object
names in our new resources dictionary.
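In the tokenizer the rule looks roughly like this (the helper names are
hypothetical; csi->name and csi->obj are as described above):

    case PDF_TOK_NAME:
        if (csi->name[0] == 0)
            store_name(csi, buf);            /* usual case: remember the /Name */
        else
            csi->obj = name_as_object(buf);  /* second name, as in "/Name /Props BDC":
                                                keep it as the object instead */
        break;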
|
|
In the event that an annot is hidden or invisible, the pdf_process
would never be freed. Solve that here.
Thanks to Simon for spotting this!
|
|
Currently the only processing we can do of PDF pages is to run
them through an fz_device. We introduce new "pdf_process"
functionality here to enable us to do more things.
We define a pdf_processor structure containing a set of function
pointers, one per PDF operator, together with functions
for processing xobjects etc. The guts of pdf_run_page_contents
and pdf_run_annot operations are then extracted to give
pdf_process_page_contents and pdf_process_annot, and the
originals implemented in terms of these.
This commit contains just one instance of a pdf_processor, namely
the "run" processor, which contains the original code refactored.
The graphical state (and device pointer) is now part of the private
data of the run operator set, rather than being in pdf_csi.
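The overall shape of the new interface is roughly as below; this is an
illustrative sketch, and the real struct layout and callback signatures
may differ:

    typedef struct pdf_csi pdf_csi;     /* interpreter state, as before */

    typedef struct
    {
        /* one callback per PDF operator */
        void (*op_q)(pdf_csi *csi, void *state);
        void (*op_Q)(pdf_csi *csi, void *state);
        void (*op_cm)(pdf_csi *csi, void *state);
        void (*op_BT)(pdf_csi *csi, void *state);
        void (*op_Tj)(pdf_csi *csi, void *state);
        /* ...and so on for the rest of the operator set... */
    } pdf_processor;

    typedef struct
    {
        const pdf_processor *processor;
        void *state;    /* processor-private data; the "run" processor keeps
                           its gstate stack and device pointer in here */
    } pdf_process;

The parser then dispatches each operator it reads through
process->processor->op_XX(csi, process->state), so the run processor and
any future processors can share a single interpretation loop.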
|
|
pdf_flush_text can cause the list of gstates to be extended. This
can in turn cause them to move in memory. This means that any
gstate pointers already held can be invalidated.
Update the code to allow for this.
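The resulting pattern is to re-derive such pointers after any call that can
grow the stack (field names here follow the description above, approximately):

    pdf_gstate *gstate = csi->gstate + csi->gtop;   /* valid now... */

    pdf_flush_text(csi);              /* ...but this may realloc csi->gstate */

    gstate = csi->gstate + csi->gtop; /* so recompute the pointer from the
                                         index before using it again */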
|
|
|
|
This makes every pdf_run_XX operator function have the same function
type. This paves the way for future changes in this area.
|
|
Acrobat honours Tc and Tw operators found while parsing TJ arrays.
We update the code here to cope. To match completely we should possibly
honour other operators too, but this will do for now.
This maintains the behaviour of
tests_private/pdf/sumatra/916_-_invalid_argument_to_TJ.pdf 916.pdf
and improves the behaviour in general.
|
|
When we call pdf_begin_group, this can go away and do lots of
drawing. This can result in the gstate stack growing, which can
involve a realloc. Any gstate pointer we are holding must therefore
be recalculated after such a call.
The neatest way to do this is to get pdf_begin_group to return
the gstate pointer, thus making it hard to forget to do.
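Schematically (parameter lists abbreviated and approximate):

    /* before: any gstate pointer held by the caller may now be stale */
    pdf_begin_group(csi, &bbox, &softmask);
    gstate->blendmode = FZ_BLEND_NORMAL;    /* potential use after realloc */

    /* after: the callee hands back a pointer recomputed from the new array */
    gstate = pdf_begin_group(csi, &bbox, &softmask);
    gstate->blendmode = FZ_BLEND_NORMAL;    /* safe */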
This solves:
e2a1dda5393f4cb8a446fd8edd9d94f9_asan_heap-uaf_b938cf_2075_2393.pdf
Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security
Team for providing the example files.
|
|
When we call to execute a pattern, we clear out the pdf_csi (the
interpreter state). This involves clearing the stack and throwing
away the record of the object we have just parsed.
Unfortunately, when filling glyphs with a pattern, that object is
still in use. We therefore amend pdf_run_contents_stream to
safely stash the object away and restore it afterwards.
This solves this problem, and protects us against any other similar
problems that might also arise.
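The stash/restore is essentially the following (field and function names
are approximate):

    pdf_obj *save_obj = csi->obj;   /* still needed by the glyph-filling caller */
    csi->obj = NULL;                /* so clearing the csi cannot free it */

    fz_try(ctx)
    {
        /* run the pattern's content stream with the freshly cleared csi */
    }
    fz_always(ctx)
    {
        csi->obj = save_obj;        /* restore it for the caller */
    }
    fz_catch(ctx)
    {
        fz_rethrow(ctx);
    }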
This solves:
b8e2b57991896bf8120215cfbf7b54bb_asan_heap-uaf_86064f_2362_2587.pdf
Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security
Team for providing the example files.
|
|
A poorly formed string can cause us to overrun the end of the buffer.
We now check for the end of the string at each stage to avoid this.
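The defensive shape is simply to re-test the limit before consuming any
extra byte, for example (illustrative, not the exact parser):

    while (s < end)
    {
        int c = *s++;
        if (c == '\\')          /* escape sequences need at least one more byte */
        {
            if (s >= end)
                break;          /* truncated escape: stop rather than overrun */
            c = *s++;
            /* handle the escaped character */
        }
        /* handle c */
    }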
|
|
|
|
When stroking and filling in a single operation, we are supposed
to form the complete stroke+fill image, then blend it back, rather
than filling and blending, then stroking and blending.
This only matters during transparency, or with non-normal blend
modes.
We fix MuPDF to push a knockout group when doing such operations.
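Conceptually the fill-and-stroke operators now render like this (device call
names and parameter order are illustrative):

    begin_group(dev, bbox, 0 /* isolated */, 1 /* knockout */, FZ_BLEND_NORMAL, 1.0f);
    fill_path(dev, path, fill_color);      /* fill drawn into the knockout group */
    stroke_path(dev, path, stroke_color);  /* the stroke knocks out, rather than
                                              blends over, the fill where they overlap */
    end_group(dev);                        /* the combined result is blended with
                                              the backdrop exactly once */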
|
|
fz_clip_path takes a rect parameter, but all its callers pass
NULL. In most cases they have a perfectly reasonable value to hand
that they could pass anyway. Update the code to pass this value, which
saves the scissor-stack-keeping code from having to recalculate it.
|
|
For Separation and DeviceN colorspaces, the initial color value is 1.0
for all components instead of 0.0 as for most other colorspaces. The
current initialization in pdf_set_colorspace initializes for CMYK which
happens to work for all non-tint colorspaces.
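A sketch of the corrected initialization; colorspace_is_tint is a hypothetical
helper standing in for a Separation/DeviceN check:

    static void pdf_init_color(fz_colorspace *cs, float v[FZ_MAX_COLORS])
    {
        int i;
        /* Separation/DeviceN components start at 1.0;
           Gray/RGB/CMYK/Lab/Indexed etc. start at 0.0. */
        float init = colorspace_is_tint(cs) ? 1.0f : 0.0f;
        for (i = 0; i < cs->n; i++)
            v[i] = init;
    }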
|
|
Required for 1879_-_Indexed_colors_wrongly_converted.pdf
Also, removing broken code in the same place (where mat->v[] is
overwritten right after being set in the L*a*b* case).
|
|
A user (sebblonline) reports a problem when rendering the 300Meg
document downloadable from:
http://www.mitsubishicarbide.com/EU/de/product/epaper/index.html
(7th icon on the bottom). Memento builds report that softmask objects
are leaked.
Tracing these, it appears that the handling of softmasks in
pdf_run_xobject is not quite correctly nested. Fixing this reveals another
problem with clipping paths being removed twice. Adding a gsave/grestore
pair solves this and leaves us with a well-behaved program.
|
|
By default an OCG is supposed to be visible (for a testcase, see
2011 - ocg without ocgs invisible.pdf). Also, the default visibility
can be overridden in either direction, so pdf_is_hidden_ocg
must check the state for being both "OFF" and "ON" (testcase was
2066 - ocg not printed.pdf rendered with event="Print").
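The decision therefore has to look roughly like this (helper names are
hypothetical):

    static int ocg_hidden(pdf_obj *ocg, pdf_obj *config)
    {
        int hidden = 0;                        /* PDF default: visible */

        if (config_lists_in_off(config, ocg))  /* e.g. in the config's /OFF array */
            hidden = 1;
        if (config_lists_in_on(config, ocg))   /* an explicit /ON must be honoured too */
            hidden = 0;

        /* usage/event overrides (e.g. /Print with event="Print") can again set
           the state to either "ON" or "OFF", so both values must be checked
           there as well. */
        return hidden;
    }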
|
|
* If fz_alpha_from_gray throws in fz_render_t3_glyph, then glyph is leaked.
* If fz_new_image throws in pdf_load_image_imp, then colorspace and mask
are leaked.
* pdf_copy_pattern_gstate overwrites font and softmask without dropping
them first.
|
|
We test progressive loading using a new -p flag to mupdf that sets a
bitrate at which data will appear to arrive progressively over time. For
example:
mupdf -p 102400 pdf_reference17.pdf
Details of the scheme used here are presented in docs/progressive.txt
|
|
Correct the naming scheme for pdf_obj_xxx functions.
|
|
For historical reasons, lots of the code uses "xref" when talking about
a PDF document. Now that pdf_xref is a separate type this has become
confusing, so replace 'xref' with 'doc' for clarity.
|
|
Remove the fz_context field to avoid the structure growing.
|
|
|