Age | Commit message (Collapse) | Author |
|
Acrobat honours Tc and Tw operators found during parsing TJ arrays.
We update the code here to cope. Possibly to completely match we should
honour other operators too, but this will do for now.
This maintains the behaviour of
tests_private/pdf/sumatra/916_-_invalid_argument_to_TJ.pdf 916.pdf
and improves the behaviour in general.
|
|
Useful utility missing from our arsenal.
|
|
Reuses the same internals as pdf_fprintf_obj etc.
|
|
String.prototype.substr() is deprecated.
RegExp.prototype.compile() has never been part of the ECMA standard,
and is deprecated in Mozilla's Javascript since 1.5 (at least).
|
|
Luiz Henrique de Figueiredo reports that glyphs output from the
SVG device contain 'lumpy' outlines. Investigation reveals that
this is because the current code extracts the outlines from
freetype at unit scale, and then relies on SVG to scale them up.
Unfortunately, freetype insists on working in integer maths, so
any sort of scaling runs the risk of distorting the outlines.
The fix is to change the way we call freetype; we now request an
'UNSCALED' char, and set the required size to be the design size.
We then transform the results in the floating point domain
ourself.
This cures the lumpy outlines, but reveals a second problem,
namely that the bbox given for characters is inaccurate (and
sometimes too small). Investigation shows that this is again
caused by freetypes scaling, so we apply the same trick; ask
for the glyph without scaling (as far as possible), and then
scale the results down.
We also take care to spot the 'ft_hint' flag in the font. If set
this indicates that hinting must be performed to ensure that
the returned outlines are sane. We therefore take note of this
when calculating both bbox and outlines. This means that 'tricky'
fonts such as dynalab ones now render correctly.
This produces many changes in the bitmaps, the vast majority of which
are neutral. The ones that aren't are all progressions.
|
|
|
|
|
|
Arrays are intended for numeric arrays, since they have the magic
updating of their "length" property which regular objects lack.
|
|
The test file on this bug:
de53b4bd41191f02d01a3c39b4880fa8_asan_heap-oob_caba3c_9561_7427.pdf
includes a corrupt CMAP. When this is read into memory it produces
a CMAP where the table gets too large. This produces lots of warnings
from 'add_table', but the calls to add_table all assume that the
process completed fine, resulting in range entries being added
that point to nonexistent values.
The fix is to make add_table return a bool to indicate success or
failure, and to only add range entries if the add_table succeeds.
Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security
Team for providing the example files.
|
|
When we call pdf_begin_group, this can go away and do lots of
drawing. This can result in the gstate stack growing, which can
involve a realloc. Any gstate pointer we are holding must therefore
be recalculated after such a call.
The neatest way to do this is to get pdf_begin_group to return
the gstate pointer, thus making it hard to forget to do.
This solves:
e2a1dda5393f4cb8a446fd8edd9d94f9_asan_heap-uaf_b938cf_2075_2393.pdf
Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security
Team for providing the example files.
|
|
If the scale is too large, the calculation to determine the
required size of a pixmap can overflow. This can lead to negative
width/heights being passed in, which confuses the subsampling
code, leading to SEGVs.
|
|
Seen when valgrinding a memento build of mudraw on:
e0e44ed8692671b820de72c6c0a32608_asan_heap-uaf_8c2b76_1530_2026.pdf
|
|
When we find certain classes of flaw in the file while attempting to
read an object, we trigger an automatic repair of the file. This
leaves almost all objects unchanged; the sole exception is that of
the trailer object (and its sub objects) which can get dropped and
recreated.
To avoid leaving people holding handles to objects within the trailer
dict high and dry, we introduce a 'pre_repair_trailer' object to
each xref entry. On a repair, we copy the existing trailer object to
this. As we only ever repair once, this is safe.
The only known place where this is a problem is when setting up the
pdf_crypt for a document; we adapt the code here to allow for
potential problems.
The example file that shows this up is:
048d14d2f5f0ae31e9a2cde0be66f16a_asan_heap-uaf_86d4ed_3961_3661.pdf
Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security
Team for providing the fuzzing files.
|
|
If the /Version is a single character string (say "s") then the
current code for converting this in pdf_init_document reads off
the end of the string.
Simple fix is to use fz_atof instead.
Same fix for reading the PDF version normally.
This solves:
53b830f849d028fb2d528520716e157a_asan_heap-oob_478692_5259_4534.pdf
Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security
Team for providing the example files.
|
|
xps_parse_color happily reads more than FZ_MAX_COLORS values out of a
ContextColor array which overflows the passed in samples array.
Limiting the number of allowed samples to FZ_MAX_COLORS and make sure
to use that constant for all callers fixes the problem.
Thanks to Jean-Jamil Khalifé for reporting and investigating the issue
and providing a sample exploit file.
|
|
fz_new_image_from_pixmap expects that the pixmap's colorspace has two
references which is contrary to expectations. If it instead addrefs the
pixmap's colorspace, the only caller pdf_load_jpx can consistently
drop the colorspace after passing it to fz_load_jpx.
Also, if the contract is that whatever is passed into
fz_new_image_from_pixmap belongs to the new image, then the pixmap also
has to be dropped on error so that it isn't leaked.
|
|
Using JDCT_FASTEST as rendering method can produce visible artifacts
(e.g. in 1960_-_DCT_image_wrongly_decoded_regression_from_1.2_.pdf).
|
|
When we call to execute a pattern, we clear out the pdf_csi (the
interpreter state). This involves clearing the stack and throwing
away the record of the object we have just parsed.
Unfortunately, when filling glyphs with a pattern, that object is
still in use. We therefore amend the pdf_run_contents_stream to
safely stash the object away and restore it afterwards.
This solves this problem, and protects us against any other similar
problems that might also arise.
This solves:
b8e2b57991896bf8120215cfbf7b54bb_asan_heap-uaf_86064f_2362_2587.pdf
Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security
Team for providing the example files.
|
|
If we perform a linejoin that ends up being over an impossibly small
distance, we can get a rendering error. This is caused by trying to
calculate scale = linewidth/sqrtf(len), where len < FLT_EPSILON.
Avoid this by rearranging the code slightly - no extra calculations
required.
Also given that sn == bn at all times within the stroking code, just
remove bn.
Credit for spotting this problem goes to Simon for tracking the
problem with rounding_artifact_due_to_closepath.pdf. My fix just
fixes the problem at a lower level than his does.
|
|
At http://code.google.com/p/sumatrapdf/issues/detail?id=2477 , there's
a document which has an indexed colorspace whose lookup string contains
a trailing character. That character can be safely ignored without
rejecting everything depending on such a colorspace.
|
|
fz_draw_clip_text changes the value of 'state' during a loop. The
'if (glyph)' part of the loop assumes that it points to gstate[top-1]
where the 'path' part of the loop changes it to point to gstate[top].
If we render a "non glyph" glyph, then a "glyph" glyph, we will access
an invalid state. This can cause a draw_glyph call on an invalid
destination bitmap.
The fix is simply not to reset state.
Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security
Team for providing the fuzzing files.
|
|
For SumatraPDF, the following changes are required:
* fz_load_system_font is called from pdf_load_builtin_font as well so
that Arial, Courier New, etc. can be loaded from the system instead
of their Nimbus replacements. In order to distinguish between calls
from pdf_load_builtin_font and pdf_load_substitute_font, an
is_substitute argument is added.
* fz_load_system_cjk_font is added and called from
pdf_load_substitute_cjk_font so that a better replacement font can
be loaded instead of DroidSansFallback.
* Both fz_load_system_font and fz_load_system_cjk_font return fz_font*
instead of fz_buffer* so that implementers aren't required to load
fonts into memory (SumatraPDF uses fz_new_font_from_file for system
fonts).
In addition to that, fz_load_system_font_func is renamed to
fz_load_system_font_funcs since it now accepts two functions, and the
PDF_ROS_* constants are renamed to FZ_ADOBE_* (collection names aren't
passed as const char* so that implementers know which collections to
expect). For convenience, fz_load_*_font also never throws since
currently all callers have further fallbacks available.
|
|
This can be seen e.g. in:
5db811ac25ef543fd0cfa0873e155329_signal_sigsegv_c9b60f_9636_76.pdf
Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security
Team for providing the example files.
|
|
Avoid negative indirections. Don't make indirections to objects that
aren't going to be used.
Also improve pdf-write.c so that it doesn't call renumberobj on objs
that are going to be dropped.
|
|
Remember to make blocks defined before writing/reading them.
|
|
If indexed spaces are empty (or truncated) we use garbage values when
they are read. Spot this and pad with 0s to at least be consistent.
Fixes:
013b2dcbd0207501e922910ac335eb59_asan_heap-oob_a59696_5952_500.pdf
5440f8bc8af12e5f7050e59b7ee008cd_asan_heap-oob_a59dd9_5952_500.pdf
fa8c712b03a7b02d6a12856ce042a44e_signal_sigsegv_a59b06_5847_493.pdf
Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security
Team for providing the fuzzing files.
|
|
While attempting to debug a valgrind issue with:
013b2dcbd0207501e922910ac335eb59_asan_heap-oob_a59696_5952_500.pdf
I found that mutool -difggg on it failed with a SEGV. This is due to
us parsing an array with a large invalid indirection in it (e.g.
[123456789 0 R]) and then the renumbering code assuming this is valid
and accessing off the end of an array.
|
|
The ifelse and if operators require special parsing where we convert
ps function streams to bytecode. If a malformed stream presents
if or ifelse without being preceded by the appropriate { ...} blocks
then throw an error.
This avoids us potentially calling ps_run recursively in an infinite
loop as happens with the test file in this bug.
5f091df77f6600d0927dc36777db2b93_signal_sigabrt_7ffff6d59425_6762_5545.pdf
Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security
Team for providing the fuzzing files.
|
|
Problems caused by the fact that -0x8000000 = 0x80000000.
Sidestep the problem for all coords where floats cannot accurately
represent them.
|
|
In the existing code, if build_filter fails, chain will be freed. If
pdf_array_get fails however, it will leak.
Rectify this. No specific bug or example file, just observation arising
from discussions about previous commit.
|
|
pdf_open_raw_renumbered_stream and pdf_open_image_stream both have the
same issue that 98a111c8e49916f8f5ac21d11f4627540f9ddd49 fixes.
|
|
When constructing a filter chain, we pass ownership of 'chain' inwards.
This means we need to be careful not to double close chain.
This fixes:
5df97f8539d31745f1c45cc9e1468825_asan_heap-oob_a59afe_1862_225.pdf
a736faf6f4a34b7ad8eff207ba52aa57_asan_heap-oob_a59dd9_5744_4860.pdf
Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security
Team for providing the fuzzing files.
|
|
Windows doesn't like redirecting binary output, so add an explicit
filename argument.
|
|
Bad annotation appearance streams can cause font_recs to have invalid
values in. Avoid this partly by hardening the code against duff values,
and partly by setting sane defaults before the parsing.
This can be seen in:
33bfbe117bfef7fafc3f927acf50a2e7_signal_sigsegv_81dd96_6257_5205.pdf
Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security
Team for providing the example files.
|
|
The gel bbox was being stored internally as floats (despite
only holding ints). This means that as numbers get large the
bbox can become approximate, rather than exact. If the bbox
becomes smaller than it should, this causes crashes in the
scanline filling code.
This is seen with:
tests_private/fuzzing/mupdf2/17f8aee51ac776994af0b36195cdadd7_signal_sigsegv_5607be_7308_5912.pdf
The solution is simply to use ints rather than floats.
Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security
Team for providing the example files.
|
|
If a file specifies a silly number of bpp in the PNG predictor it can
overrun a buffer. This was shown by:
tests_private/fuzzing/mupdf2/013b2dcbd0207501e922910ac335eb59_*.pdf
but no longer shows up due to Simons earlier fix.
Following discussion we still think it's worth having this fix in, as
truncated data streams can cause len < bpp. Possibly we should throw
an error here, but I think that's not necessary as we will return the
short length, and the image reading code will notice that the image
is truncated already.
Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security
Team for providing the fuzzing files.
|
|
pdf_load_obj_stm may resize the xref if it finds further objects in the
stream, that might however invalidate any pdf_xref_entry hold such as
the one in pdf_cache_object. This can be seen e.g. with
7ac3ad9ddad98d10b947a43cf640062f_asan_heap-uaf_930b78_1007_1675.pdf
Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security
Team for providing the example files.
|
|
(Second part of Simons patch - apologies for missing this the first time).
This correctly enables the sanitization of the key length needed for
90db34f64037e2a8a5c3b6a518ba4153_asan_heap-oob_9b117e_1197_1802.pdf
Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security
Team for providing the example files.
|
|
If columns is quite close to INT_MAX, the column index max overflow
in find_changing which causes an access violation in the next getbits.
This happens e.g. with
0c76a20163f30ea8ec860c4e588ce337_signal_sigsegv_5e7b28_9115_7127.pdf
|
|
This correctly enables the sanitization of the key length needed for
90db34f64037e2a8a5c3b6a518ba4153_asan_heap-oob_9b117e_1197_1802.pdf
|
|
This fixes a NULL pointer dereference in
2192b04848b2d8210d1a33e3ddeb2742_asan_heap-oob_a5a57d_2745_2844.pdf
Also, replace MAXC with FZ_MAX_COLORS.
|
|
We define a document handler for each file type (2 in the case of PDF, one
to handle files with the ability to 'run' them, and one without).
We then register these handlers with the context at startup, and then
call fz_open_document... as usual. This enables people to select the
document types they want at will (and even to extend the library with more
document types should they wish).
|
|
This bug shows 2 problems with our data handling.
Firstly, if a zip file entry has less data in the stream than it
is declared to have, we would leave the end of the data uninitialised.
We now put out a warning, and blank it with zeros.
Secondly, if the PNG decompression fails to decode enough data, we
don't notice. Now we give a warning and blank the remaining pixels.
|
|
Certain optimized documents use a rather large common symbol dictionary
for all JBIG2 images. Caching these JBIG2Globals speeds up loading and
rendering of such documents.
|
|
This helps debugging issues with JBIG2 images.
Conflicts:
source/fitz/filter-jbig2.c
|
|
See SumatraPDF's repo for a Windows-only implementation using WIC.
|
|
At https://code.google.com/p/sumatrapdf/issues/detail?id=2460 , there's
a file with missing /Type keys in the page tree nodes. In that case,
leaf nodes and intermediary nodes have to be distinguished in a
different way.
|
|
These warnings are caused by casting function pointers to void*
instead of proper function types.
|
|
Some warnings we'd like to enable for MuPDF and still be able to
compile it with warnings as errors using MSVC (2008 to 2013):
* C4115: 'timeval' : named type definition in parentheses
* C4204: nonstandard extension used : non-constant aggregate initializer
* C4295: 'hex' : array is too small to include a terminating null character
* C4389: '==' : signed/unsigned mismatch
* C4702: unreachable code
* C4706: assignment within conditional expression
Also, globally disable C4701 which is frequently caused by MSVC not
being able to correctly figure out fz_try/fz_catch code flow.
And don't define isnan for VS2013 and later where that's no longer needed.
|
|
The SVG device needs rebinding as it holds a file. The PDF device needs
to rebind the underlying pdf document.
All documents need to rebind their underlying streams.
|