Age | Commit message (Collapse) | Author |
|
When we find certain classes of flaw in the file while attempting to
read an object, we trigger an automatic repair of the file. This
leaves almost all objects unchanged; the sole exception is that of
the trailer object (and its sub objects) which can get dropped and
recreated.
To avoid leaving people holding handles to objects within the trailer
dict high and dry, we introduce a 'pre_repair_trailer' object to
each xref entry. On a repair, we copy the existing trailer object to
this. As we only ever repair once, this is safe.
The only known place where this is a problem is when setting up the
pdf_crypt for a document; we adapt the code here to allow for
potential problems.
The example file that shows this up is:
048d14d2f5f0ae31e9a2cde0be66f16a_asan_heap-uaf_86d4ed_3961_3661.pdf
Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security
Team for providing the fuzzing files.
|
|
For SumatraPDF, the following changes are required:
* fz_load_system_font is called from pdf_load_builtin_font as well so
that Arial, Courier New, etc. can be loaded from the system instead
of their Nimbus replacements. In order to distinguish between calls
from pdf_load_builtin_font and pdf_load_substitute_font, an
is_substitute argument is added.
* fz_load_system_cjk_font is added and called from
pdf_load_substitute_cjk_font so that a better replacement font can
be loaded instead of DroidSansFallback.
* Both fz_load_system_font and fz_load_system_cjk_font return fz_font*
instead of fz_buffer* so that implementers aren't required to load
fonts into memory (SumatraPDF uses fz_new_font_from_file for system
fonts).
In addition to that, fz_load_system_font_func is renamed to
fz_load_system_font_funcs since it now accepts two functions, and the
PDF_ROS_* constants are renamed to FZ_ADOBE_* (collection names aren't
passed as const char* so that implementers know which collections to
expect). For convenience, fz_load_*_font also never throws since
currently all callers have further fallbacks available.
|
|
We define a document handler for each file type (2 in the case of PDF, one
to handle files with the ability to 'run' them, and one without).
We then register these handlers with the context at startup, and then
call fz_open_document... as usual. This enables people to select the
document types they want at will (and even to extend the library with more
document types should they wish).
|
|
Certain optimized documents use a rather large common symbol dictionary
for all JBIG2 images. Caching these JBIG2Globals speeds up loading and
rendering of such documents.
|
|
See SumatraPDF's repo for a Windows-only implementation using WIC.
|
|
These warnings are caused by casting function pointers to void*
instead of proper function types.
|
|
Some warnings we'd like to enable for MuPDF and still be able to
compile it with warnings as errors using MSVC (2008 to 2013):
* C4115: 'timeval' : named type definition in parentheses
* C4204: nonstandard extension used : non-constant aggregate initializer
* C4295: 'hex' : array is too small to include a terminating null character
* C4389: '==' : signed/unsigned mismatch
* C4702: unreachable code
* C4706: assignment within conditional expression
Also, globally disable C4701 which is frequently caused by MSVC not
being able to correctly figure out fz_try/fz_catch code flow.
And don't define isnan for VS2013 and later where that's no longer needed.
|
|
The SVG device needs rebinding as it holds a file. The PDF device needs
to rebind the underlying pdf document.
All documents need to rebind their underlying streams.
|
|
|
|
|
|
Add a cached color converter mechanism. Use this for rendering meshes
to speed repeated conversions.
This reduces a (release build to ppm at default resolution) run from
23.5s to 13.2 seconds.
|
|
In the existing code for meshes, we decompose the mesh down into
quads (or triangles) and then call a process routine to actually
do the work. This process routine typically maps each vertexes
position/color and plots it.
As each vertex is used several times by neighbouring patches, this
results in each vertex being processed several times. The fix in
this commit is therefore to break the processing into 'prepare' and
'process' phases. Each vertex is 'prepared' before being used in
the 'process' phase. This cuts the number of prepare operations in
half.
In testing, this reduced the time for a (release build, generating ppm
at default resolution) run from 33.4s to 23.5s.
|
|
When we meet a broken PDF file, we attempt to repair it. We do this by
reading tokens from the file and attempting to interpret them as a
normal PDF stream.
Unfortunately, if the file is corrupt enough so that we start to read
from the middle of a stream, and we happen to hit an '(' character,
we can go into string reading mode. We can then end up skipping over
vast swathes of file that we could otherwise repair.
We fix this here by using a new version of the pdf_lex function that
refuses to ever return a string. This means we may take more time
over skipping things than we did before, but are less likely to
skip stuff.
We also tweak other parts of the pdf repair logic here. If we hit a
badly formed piece of data, clear the num/gen we have stored so that
the next plausible piece we get does not get assigned to a random
object number.
|
|
Remove code that's not used any more as a result of the previous
fix, plus some code that was unused anyway.
|
|
Currently, if we spot a bad xref as we are reading a PDF in, we can
repair that PDF by doing a long exhaustive read of the file. This
reconstructs the information that was in the xref, and the file can
be opened (and later saved) as normal.
If we hit an object that is not in the expected place however, we
cannot trigger a repair at that point - so xrefs with duff offsets
in (within the bounds of the file) will never be repaired.
This commit solves that by triggering a repair (just once) whenever
we fail to parse an object in the expected place.
|
|
|
|
Unused field. Also tweak some comments for clarity.
|
|
A poorly formed string can cause us to overrun the end of the buffer.
Now we check the end of the string at each stage to avoid this.
|
|
ft_file was removed in a2c945506ea2a2b58edbde84124094c6b4f69eac even
though it might still be needed by downstream consumers (such as
SumatraPDF) for allowing devices to load fonts again when a font has
been loaded by fz_new_font_from_file which doesn't maintain a buffer.
|
|
|
|
|
|
|
|
Use fz_buffer to wrap and reference count data used in font.
|
|
|
|
Add a function to clone stroke states, a magic number to keep in the
reference count to signal that a stroke state is stack-stored, and
automatically clone stack stored stroke states in the keep function.
Use fz_default_stroke_state to initialise stack stored stroke states.
|
|
Self balancing AA-tree.
|
|
|
|
|
|
When putting store objects into the store, ensure that they do cannot
collide across documents.
|
|
The luminosity flag and background color are currently ignored.
The clip stack optionally held in the null device is updated here to
be a container stack, together with a flags word (currently just used
to indicate the type of the container at the current place in the
stack), and a user value (used by the SVG device to stash the id for
the mask it's generating).
|
|
If the appropriate device hint is set, the null device will keep
a scissor stack. This saves duplicating code in every device.
|
|
This accompanies the function formerly known as fz_image_as_png (now
renamed to fz_new_png_from_image).
|
|
|
|
Set the hint in mudraw when AA bits is set to 0.
|
|
The bug fix added in the previous commit fails to work in this
case (hang-9527.pdf) because the matrix is not invertible and
hence the clipping rectangle ends up infinite. Spot this case
here and return early.
|
|
SumatraPDF's testsuite uses Targa images as output because they're
compressed while still far easier to compare than PNG and have better
tool support than PCL/PWG.
|
|
* Destination names are a name and not a string
* Expose whether a /Launch action points to a path or a URI
|
|
For Separation and DeviceN colorspaces, the initial color value is 1.0
for all components instead of 0.0 as for most other colorspaces. The
current initialization in pdf_set_colorspace initializes for CMYK which
happens to work for all non-tint colorspaces.
|
|
Thanks to Zaister for reporting this.
|
|
The pdf_ versions were already correct. Probably just an
oversight that this was missed.
|
|
This is required e.g. for 2314 - jpeg tables in tiff.xps.
This folds fz_open_resized_dct back into fz_open_dct instead of adding
further variations for calls with and without the jpegtables argument.
|
|
|
|
|
|
Pull subpixel glyph adjustment calculations into fz_subpixel_adjust.
This reduces the repetition of code, and will be helpful for the
OpenGL device.
|
|
These do no caching, and are intended to be useful for the opengl device.
|
|
For large glyphs, sub pixel positioning is supremely unimportant.
Even for smaller glyphs, we don't need 5*5 possible sub pixel
positions. Base the degree of sub pixel quantisation on the size
of the glyphs.
This should result in better cache use.
We push all the glyph sub positioning logic into fz_render_glyph
(and fz_render_stroked_glyph). This simplifies the calling code.
We also tweak fz_render_glyph so that it updates the transform it
is called with to reflect the sub pixel positioning. This solves
various problems: Firstly, we can round positions both up and down
to achieve a smaller net displacement (e.g. (0.99, 0.99) can go
to (1,1) rather than (0.75, 0.75) if we have a subpixel position
resolution of 1/4 pixels).
Secondly, glyphs that are drawn from outlines will have exactly the
same subpixel changes applied. This is unlikely to be noticable, but
it does mean that baselines should avoid having any shifts in them.
Finally, it enables us to avoid lots of unnecessary copying of
matrices, hopefully reducing overhead.
|
|
Rather than generating fz_pixmaps for glyphs, we generate fz_glyphs.
fz_glyphs can either contain a pixmap, or an RLEd representation
(if it's a mask, and it's smaller).
Should take less memory in the cache, and should be faster to plot.
|
|
The most complex part here is to ensure that we can output various
bitmaps in bands.
|
|
fz_putc; this fills a hole in our fz_output functions.
fz_new_output_to_filename: This saves people having to create
a FILE * just to pass to fz_new_output_with_file and then having
to remember to close the FILE *.
|
|
|