Age | Commit message (Collapse) | Author |
|
|
|
Same as for fz_bbox_fill_image_mask, fz_bbox_clip_image_mask must
transform the unit rectangle to get the bounding bbox.
|
|
There are two issues where variables may be used unitialized:
* extract_exif_resolution fails to set xres and yres for JPEG images if
there's no valid resolution unit (mainly affects XPS documents)
* xps_measure_font_glyph uses hadv and vadv unitialized if the glyph id
isn't valid (i.e. if FT_Get_Advance fails)
|
|
Thanks to Triet Lai.
|
|
|
|
Currently, png_read_phys always rounds the resolution down. Many images
have a resolution just slightly shy of 96 DPI and are thus rendered too
large when they're resized from 95 to match the required 96 for output.
|
|
If the reported height is 0 or too large, use the image size reported
in the PDF itself instead (in the case of height 0, the JPEG library
is supposed to read the correct value from the DNL segment, but libjpeg
doesn't support that).
|
|
If a JPEG stream is missing valid values for width/height (usually -1),
Adobe Reader substitutes these using the values read from the PDF
object. This can be done by scanning and patching the data before
passing it to libjpeg.
Thanks to zeniko for the patch.
|
|
fast_cmyk_to_rgb had a simple 1 place cache to avoid recalculating
the same conversions again and again. The implementation was broken
though, both in C and ARM code versions. This seems to fix it.
|
|
... instead convert a JPEG2000 used as a soft mask into grayscale.
This is more robust than trusting the PDF specified colorspace over
the internal JPX colorspace.
The spec implies that in a colorspace conflict, the internal JPX
colorspace should be used.
The PDF colorspace may be a DeviceN or Separation colorspace.
DeviceN and Separation colorspaces are not valid destination
colorspaces, so we may not always be able to convert the internal
JPX colorspace into the PDF specified colorspace.
Converting from the internal colorspace into grayscale is more robust,
and solves the issue that the original commit was intended to fix.
|
|
|
|
When we return the padding byte in an fz_concat stream, ensure that
we remember to increment rp to point just past in. If not, then we'll
read 2 whitespace chars out. This is fine unless we try and
fz_unread_byte the first one, when we'll leave rp pointing to
an out of buffer address.
Credit to Malc for the bisecting/debugging that got me to the fix.
Many thanks.
|
|
fts_5904.xps and fts_5905.xps use namespace prefixes.
Work around that by ignoring the namespace prefix for tag names.
A more robust solution would be to expand or record the tag and
attribute namespaces in the fz_xml node structure, but that's a
overkill for our current needs.
|
|
|
|
This fixes three instances of warning C4706, allows compilation with
VS2013 and prevents an accidental va_end for when va_end is defined
(which is the case for debug builds).
|
|
Split functions out of pdf-form.c that shouldn't be there, and make
javascript initialization explicit.
|
|
Stupid MSVC has no strtof.
|
|
%q escapes using C syntax and wraps the string in double quotes.
%( escapes using PS/PDF syntax and wraps the string in parens.
|
|
The primary motivator for this is so that we can print floating point
values and get the full accuracy out, without having to print 1.5 as
1.5000000, and without getting 23e24 etc.
We only support %c, %f, %d, %o, %x and %s currently.
We only support the zero padding qualifier, for integers.
We do support some extensions:
%C turns values >=128 into UTF-8.
%M prints a fz_matrix.
%R prints a fz_rect.
%P prints a fz_point.
We also implement a fprintf variant on top of this to allow for
consistent results when using fz_output.
a
|
|
Previously pdf_process buffer did not understand inline images.
In order to make this work without needlessly duplicating complex code
from within pdf-op-run, the parsing of inline images has been moved to
happen in pdf-interpret.c. When the op_table entry for BI is called
it now expects the inline image to be in csi->img and the dictionary
object to be in csi->obj.
To make this work, we have had to improve the handling of inline images
in general. While non-inline images have been loaded and held in
memory in their compressed form and only decoded when required, until
now we have always loaded and decoded inline images immediately. This
has been due to the difficulty in knowing how many bytes of data to
read from the stream - we know the length of the stream once
uncompressed, but relating this to the compressed length is hard.
To cure this we introduce a new type of filter stream, a 'leecher'.
We insert a leecher stream before we build the filters required to
decode the image. We then read and discard the appropriate number
of uncompressed bytes from the filters. This pulls the compressed
data through the leecher stream, which stores it in an fz_buffer.
Thus images are now always held in their compressed forms in memory.
The pdf-op-run implementation is now trivial. The only real complexity
in the pdf-op-buffer implementation is the need to ensure that the
/Filter entry in the dictionary object matches the exact point at
which we backstopped the decompression.
|
|
Currently fz_streams have a 4K buffer within their header. The call
to read from a stream fills this buffer, resulting in more data being
pulled from any underlying stream than we might like. This causes
problems with the forthcoming 'leech' filter.
Here we simplify the fields available in the public stream header.
No specific buffer is given; simply the read and write pointers.
The underlying 'read' function is replaced by a 'next' function
that makes the next block of data available and returns the first
character of it (or EOF).
A caller to the 'next' function should supply the maximum number of
bytes that it knows it will need (possibly not now, but eventually).
This enables the underlying stream to efficiently decode just enough.
The underlying stream is free to return fewer, or a greater number
if it wants to.
The exact size of the 'block' of data returned will depend on the
filter in use and (possibly) the data therein.
Callers can get the currently available amount of data by calling
fz_available (but again should pass the maximum amount of data they know
they will need). The only time this will ever return 0 is if we have
hit EOF.
|
|
Gridfitting can increase the required width/height of images by up to
2 pixels. This makes images that are rendered very small very
sensitive to over quantisation.
This can produce 'mushier' images than it should, for instance on
tests/Ghent_V3.0/090_Font-Support_x3.pdf (pgmraw, 72dpi)
|
|
If the expansion of a transformation matrix is huge, the path flatness
becomes so small that even simple paths consist of millions of edges
which easily causes MuPDF to hang quite long for simple documents. One
solution for this is to limit the allowed flatness.
|
|
The following changes allow font providers to make better choices WRT
what font to provide and under what circumstances:
* bold and italic flags are passed in so that implementors can decide
themselves whether to ask for simulated boldening/italicising
if a font claims not to be bold/italic
* is_substitute is replaced with needs_exact_metrics to make the
meaning of this argument hopefully clearer (that argument is set only
for PDF fonts without a FontDescriptor)
* the font name is always passed as requested by the document instead
of the cleaned name for the standard 14 fonts which allows
distinguishing e.g. Symbol and Symbol,Bold
|
|
and give them names more likely to be unique.
|
|
|
|
|
|
Many times, the idiom p.x = x; p.y = y; fz_transform_point() is used.
This function should simplify that use case by both initializing and
transforming the point in one call.
|
|
|
|
Patch from Thomas Fach-Pedersen. Many thanks!
Add a new format handler that copes with TIFF files. This replaces
the TIFF functionality within the image format handler, and is
better because this copes with multiple images (as one image per
page).
|
|
Patch from Thomas Fach-Pedersen. Many Thanks.
|
|
Luiz Henrique de Figueiredo reports that glyphs output from the
SVG device contain 'lumpy' outlines. Investigation reveals that
this is because the current code extracts the outlines from
freetype at unit scale, and then relies on SVG to scale them up.
Unfortunately, freetype insists on working in integer maths, so
any sort of scaling runs the risk of distorting the outlines.
The fix is to change the way we call freetype; we now request an
'UNSCALED' char, and set the required size to be the design size.
We then transform the results in the floating point domain
ourself.
This cures the lumpy outlines, but reveals a second problem,
namely that the bbox given for characters is inaccurate (and
sometimes too small). Investigation shows that this is again
caused by freetypes scaling, so we apply the same trick; ask
for the glyph without scaling (as far as possible), and then
scale the results down.
We also take care to spot the 'ft_hint' flag in the font. If set
this indicates that hinting must be performed to ensure that
the returned outlines are sane. We therefore take note of this
when calculating both bbox and outlines. This means that 'tricky'
fonts such as dynalab ones now render correctly.
This produces many changes in the bitmaps, the vast majority of which
are neutral. The ones that aren't are all progressions.
|
|
|
|
If the scale is too large, the calculation to determine the
required size of a pixmap can overflow. This can lead to negative
width/heights being passed in, which confuses the subsampling
code, leading to SEGVs.
|
|
Seen when valgrinding a memento build of mudraw on:
e0e44ed8692671b820de72c6c0a32608_asan_heap-uaf_8c2b76_1530_2026.pdf
|
|
fz_new_image_from_pixmap expects that the pixmap's colorspace has two
references which is contrary to expectations. If it instead addrefs the
pixmap's colorspace, the only caller pdf_load_jpx can consistently
drop the colorspace after passing it to fz_load_jpx.
Also, if the contract is that whatever is passed into
fz_new_image_from_pixmap belongs to the new image, then the pixmap also
has to be dropped on error so that it isn't leaked.
|
|
Using JDCT_FASTEST as rendering method can produce visible artifacts
(e.g. in 1960_-_DCT_image_wrongly_decoded_regression_from_1.2_.pdf).
|
|
If we perform a linejoin that ends up being over an impossibly small
distance, we can get a rendering error. This is caused by trying to
calculate scale = linewidth/sqrtf(len), where len < FLT_EPSILON.
Avoid this by rearranging the code slightly - no extra calculations
required.
Also given that sn == bn at all times within the stroking code, just
remove bn.
Credit for spotting this problem goes to Simon for tracking the
problem with rounding_artifact_due_to_closepath.pdf. My fix just
fixes the problem at a lower level than his does.
|
|
fz_draw_clip_text changes the value of 'state' during a loop. The
'if (glyph)' part of the loop assumes that it points to gstate[top-1]
where the 'path' part of the loop changes it to point to gstate[top].
If we render a "non glyph" glyph, then a "glyph" glyph, we will access
an invalid state. This can cause a draw_glyph call on an invalid
destination bitmap.
The fix is simply not to reset state.
Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security
Team for providing the fuzzing files.
|
|
For SumatraPDF, the following changes are required:
* fz_load_system_font is called from pdf_load_builtin_font as well so
that Arial, Courier New, etc. can be loaded from the system instead
of their Nimbus replacements. In order to distinguish between calls
from pdf_load_builtin_font and pdf_load_substitute_font, an
is_substitute argument is added.
* fz_load_system_cjk_font is added and called from
pdf_load_substitute_cjk_font so that a better replacement font can
be loaded instead of DroidSansFallback.
* Both fz_load_system_font and fz_load_system_cjk_font return fz_font*
instead of fz_buffer* so that implementers aren't required to load
fonts into memory (SumatraPDF uses fz_new_font_from_file for system
fonts).
In addition to that, fz_load_system_font_func is renamed to
fz_load_system_font_funcs since it now accepts two functions, and the
PDF_ROS_* constants are renamed to FZ_ADOBE_* (collection names aren't
passed as const char* so that implementers know which collections to
expect). For convenience, fz_load_*_font also never throws since
currently all callers have further fallbacks available.
|
|
This can be seen e.g. in:
5db811ac25ef543fd0cfa0873e155329_signal_sigsegv_c9b60f_9636_76.pdf
Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security
Team for providing the example files.
|
|
Remember to make blocks defined before writing/reading them.
|
|
Problems caused by the fact that -0x8000000 = 0x80000000.
Sidestep the problem for all coords where floats cannot accurately
represent them.
|
|
The gel bbox was being stored internally as floats (despite
only holding ints). This means that as numbers get large the
bbox can become approximate, rather than exact. If the bbox
becomes smaller than it should, this causes crashes in the
scanline filling code.
This is seen with:
tests_private/fuzzing/mupdf2/17f8aee51ac776994af0b36195cdadd7_signal_sigsegv_5607be_7308_5912.pdf
The solution is simply to use ints rather than floats.
Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security
Team for providing the example files.
|
|
If a file specifies a silly number of bpp in the PNG predictor it can
overrun a buffer. This was shown by:
tests_private/fuzzing/mupdf2/013b2dcbd0207501e922910ac335eb59_*.pdf
but no longer shows up due to Simons earlier fix.
Following discussion we still think it's worth having this fix in, as
truncated data streams can cause len < bpp. Possibly we should throw
an error here, but I think that's not necessary as we will return the
short length, and the image reading code will notice that the image
is truncated already.
Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security
Team for providing the fuzzing files.
|
|
If columns is quite close to INT_MAX, the column index max overflow
in find_changing which causes an access violation in the next getbits.
This happens e.g. with
0c76a20163f30ea8ec860c4e588ce337_signal_sigsegv_5e7b28_9115_7127.pdf
|
|
This fixes a NULL pointer dereference in
2192b04848b2d8210d1a33e3ddeb2742_asan_heap-oob_a5a57d_2745_2844.pdf
Also, replace MAXC with FZ_MAX_COLORS.
|
|
We define a document handler for each file type (2 in the case of PDF, one
to handle files with the ability to 'run' them, and one without).
We then register these handlers with the context at startup, and then
call fz_open_document... as usual. This enables people to select the
document types they want at will (and even to extend the library with more
document types should they wish).
|
|
This bug shows 2 problems with our data handling.
Firstly, if a zip file entry has less data in the stream than it
is declared to have, we would leave the end of the data uninitialised.
We now put out a warning, and blank it with zeros.
Secondly, if the PNG decompression fails to decode enough data, we
don't notice. Now we give a warning and blank the remaining pixels.
|
|
Certain optimized documents use a rather large common symbol dictionary
for all JBIG2 images. Caching these JBIG2Globals speeds up loading and
rendering of such documents.
|