Age | Commit message (Collapse) | Author |
|
fz_open_jbig2d() is called at two locations in MuPDF. At one
location a reference to the JBIG2 globals struct was taken before
passing it to fz_open_jbig2d(). At the other location no such
reference was taken, but rather ownership of the struct was
implicitly transferred to fz_open_jbig2d(). This inconsistency
led to a leak of the globals struct at the first location.
Now, passing a JBIG2 globals struct to fz_open_jbig2d() never
implictly takes ownership. Instead the JBIG2 stream will take a
reference if it needs it and drops it in case of error. As usual
it is the callers responsibility to drop the reference to the
globals struct it owns.
|
|
JBIG2 images are detected by build_compression_params() and then
always passed to fz_open_image_decomp_stream() by build_filter().
Therefore there is no chance for build_filter() at a later stage
to detect JBIG2 images, and so that check can be removed.
|
|
build_filter_chain_drop() promises to extend (according to the
fs argument) the filter chain it is given, or in case of exception
throw away the at that point potentially extended filter chain it
was given from the beginning.
Because build_filter_chain_drop() calls build_filter_drop() for
every filter it adds it doesn't need to do any cleanup of the
filter chain on its own, that's build_filter_drop()'s responsibility.
Prior to this commit fz_catch() in build_filter_chain_drop() which
would drop the filter chain one time too many (it was already dropped
by build_filter_drop()), causing the callers to use a stale pointer.
Now once the extra fz_drop_stream() has been removed the logic works
as it ought to, even in the case of exceptions. Thanks to oss-fuzz
for reporting.
|
|
Use separate functions to keep the code simpler.
Use memmem to simplify and optimize search for 'endstream' token.
Do not look for 'endobj' since that could cause a false positives in
compressed object streams that have duff lengths.
|
|
Always look for the "endstream" marker after a PDF stream to see
if we've hit the end. Allow for "endobj" to cope with producers
that omit endstream entirely.
Avoid slowing down legal files by only checking for the end marker
after the specified length has been read.
|
|
|
|
|
|
Add a PDF_NAME(Foo) macro that evaluates to a pdf_obj for /Foo.
Use the C preprocessor to create the enum values and string table
from one include file instead of using a separate code generator tool.
|
|
Previously these were not set to NULL, which caused
spurious segmentation errors.
|
|
|
|
|
|
|
|
Previously mupdf would attempt to load any indirect reference,
whether it was a stream or not.
|
|
Previously the JBIG2 globals object might be indirect and if that
reference pointed to the object containing the stream itself then
mupdf would recurse until running out of error stack. Thanks to
oss-fuzz for reporting.
|
|
It should only return true for indirect references that are actually
streams, not just any array/dict that is contained in a stream object.
|
|
|
|
Under normal conditions where fz_keep_stream() is called inside
fz_try() we may call fz_drop_stream() in fz_catch() upon exceptions.
The issue comes when fz_keep_stream() has not yet been called but is
dropped in fz_catch(). This happens in the PDF from the bug when
fz_try() runs out of exception stack, and next the code in fz_catch()
runs, dropping the caller's reference to the filter chain stream!
The simplest way of fixing this it to always keep the filter chain
stream before fz_try() is called. That way fz_catch() may drop the
stream whether an exception has occurred or if the fz_try() ran out of
exception stack.
|
|
Don't mess with conditional compilation with LARGEFILE -- always expose
64-bit file offsets in our public API.
|
|
This really should have been part of commit
0ef7cb983c4325156e08525381542ae3ada04720.
|
|
|
|
|
|
The change in 2707fa9e8e6d17d794330e719dec1b08161fb045
in build_filter_chain() allows for the variable chain
to reside in a register, which means that the bug is
likely to only be visible if built under optimization.
First the chain variable is transferred to chain2, then
set to NULL, then when an exception occurs in build_filter()
the filter chain will be freed by build_filter(). Next
the expectation is that execution proceeds to fz_catch()
where fz_drop_stream() would be called with chain == NULL.
However due to the chain variable residing in a register,
its value is not NULL as expected, but was reset to its
original value upon the exception (since they use setjmp()),
hence fz_drop_stream() is called with a non-NULL value.
Marking the chain variable with fz_var() prevents the
compiler from allowing the chain variable to reside in
a register and hence its value will remain NULL and
never be reset.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Move the definition of the structure contents into new fitz-imp.h
file. Make all code outside of fitz access the buffer through the
defined API.
Add a convenience API for people that want to get buffers as
null terminated C strings.
|
|
|
|
|
|
|
|
The generation number is only needed for decryption, and is assumed
to be zero or irrelevant for all other uses.
Store the original object number and generation in the xref slot, so
that we can decrypt them even when the objects have been renumbered,
without needing to pass the original object number around through
the stream loading APIs.
|
|
If thirdparty/luratech is populated then this decoder will be preferred
over jbig2dec (even if both are present).
|
|
Split compressed images (images based on a compressed buffer)
and pixmap images (images based on a pixmap) out into separate
subclasses.
|
|
Update the core fz_get_pixmap_from_image code to allow fetching
a subarea of a pixmap. We pass in the required subarea, together
with the transformation matrix for the whole image.
On return, we have a pixmap at least as big as was requested,
and the transformation matrix is updated to map the supplied
area to the correct place on the screen.
The draw device is updated to use this as required. Everywhere
else passes NULLs in, and so gets unchanged behaviour.
The standard 'get_pixmap' function has been updated to decode
just the required areas of the bitmaps.
This means that banded rendering of pages will decode just the
image subareas that are required for each band, limiting the
memory use. The downside to this is that each band will redecode
the image again to extract just the section we want.
The image subareas are put into the fz_store in the same way
as full images. Currently image areas in the store are only
matched when they match exactly; subareas are not identified
as being able to use existing images.
|
|
|
|
When sanitizing a file, while cleaning with decompression, I was
seeing a flate problem reported.
The issue is that pdf_open_filter was passing pdf_open_raw_filter
the orig_num as both num and orig_num. This was causing us to
find an fz_buffer attached to the (wrong) xref entry and to open
that instead of the underlying stream.
The fix is to propogate num a bit further.
|
|
|
|
|
|
Ensure that subsampling and caching happen in the generic image
code, not in the specific.
Previously, the subsampling happened only for images that were
decoded from streams. Images that were loaded direct were never
subsampled and hence were always cached at full size. After this
change both classes of image are correctly subsampled, and
the subsampled version kept in the cache.
This produces various image diffs in the cluster, none of which
are noticable to the naked eye.
|
|
If FZ_LARGEFILE is defined when building, MuPDF uses 64bit offsets
for files; this allows us to open streams larger than 2Gig.
The downsides to this are that:
* The xref entries are larger.
* All PDF ints are held as 64bit things rather than 32bit things
(to cope with /Prev entries, hint stream offsets etc).
* All file positions are stored as 64bits rather than 32.
The implementation works by detecting FZ_LARGEFILE. Some #ifdeffery
in fitz/system.h sets fz_off_t to either int or int64_t as appropriate,
and sets defines for fz_fopen, fz_fseek, fz_ftell etc as required.
These call the fseeko64 etc functions on linux (and so define
_LARGEFILE64_SOURCE) and the explicit 64bit functions on windows.
|
|
pdf_load_image_stream is supposed to return a buffer containing the
uncompressed stream from an object (or, in the case of image streams
where an fz_compression_params structure is supplied, a stream
decompressed up to the point of the image format compression).
We have an optimisation in pdf_load_image_stream to allow it to
return the existing buffer from a cached object rather than
reloading it again, but as bug 695549 points out, this breaks in
the case where the cached stream is compressed.
The suggested fix by the bug reporter (Stefan Klein) would work
in that it would stop compressed streams being returned as
uncompressed ones, but it is not perfect as it could lead to
several copies of shortstoppable image streams being loaded (and
for streams with null or empty array filters being mistaken for
compressed ones).
The fix here solves these cases too.
|
|
Currently, every PDF name is allocated in a pdf_obj structure, and
comparisons are done using strcmp. Given that we can predict most
of the PDF names we'll use in a given file, this seems wasteful.
The pdf_obj type is opaque outside the pdf-object.c file, so we can
abuse it slightly without anyone outside knowing.
We collect a sorted list of names used in PDF (resources/pdf/names.txt),
and we add a utility (namedump) that preprocesses this into 2 header
files.
The first (include/mupdf/pdf/pdf-names-table.h, included as part of
include/mupdf/pdf/object.h), defines a set of "PDF_NAME_xxxx"
entries. These are pdf_obj *'s that callers can use to mean "A PDF
object that means literal name 'xxxx'"
The second (source/pdf/pdf-name-impl.h) is a C array of names.
We therefore update the code so that rather than passing "xxxx" to
functions (such as pdf_dict_gets(...)) we now pass PDF_NAME_xxxx (to
pdf_dict_get(...)). This is a fairly natural (if widespread) change.
The pdf_dict_getp (and sibling) functions that take a path (e.g.
"foo/bar/baz") are therefore supplemented with equivalents that
take a list (pdf_dict_getl(... , PDF_NAME_foo, PDF_NAME_bar,
PDF_NAME_baz, NULL)).
The actual implementation of this relies on the fact that small
pointer values are never valid values. For a given pdf_obj *p,
if NULL < (intptr_t)p < PDF_NAME__LIMIT then p is a literal
entry in the name table.
This enables us to do fast pointer compares and to skip expensive
strcmps.
Also, bring "null", "true" and "false" into the same style as PDF names.
Rather than using full pdf_obj structures for null/true/false, use
special pointer values just above the PDF_NAME_ table. This saves
memory and makes comparisons easier.
|
|
Update buffer and filter processors.
Filter both colors and stroke states.
Move OCG hiding logic into interpreter.
|
|
Purge several embedded contexts:
Remove embedded context in fz_output.
Remove embedded context in fz_stream.
Remove embedded context in fz_device.
Remove fz_rebind_stream (since it is no longer necessary).
Remove embedded context in svg_device.
Remove embedded context in XML parser.
Add ctx argument to fz_document functions.
Remove embedded context in fz_document.
Remove embedded context in pdf_document.
Remove embedded context in pdf_obj.
Make fz_page independent of fz_document in the interface.
We shouldn't need to pass the document to all functions handling a page.
If a page is tied to the source document, it's redundant; otherwise it's
just pointless.
Fix reference counting oddity in fz_new_image_from_pixmap.
|
|
Rename fz_close to fz_drop_stream.
Rename fz_close_archive to fz_drop_archive.
Rename fz_close_output to fz_drop_output.
Rename fz_free_* to fz_drop_*.
Rename pdf_free_* to pdf_drop_*.
Rename xps_free_* to xps_drop_*.
|
|
The recent change to holding pdf xrefs in a sparse format has resulted
in a significant decrease in speed (x10). Malc points out that some of
this (2x) can be recovered simply by making pdf_cache_object return the
entry which it found the object in.
This saves us having to immediately call pdf_get_xref_entry again
afterwards.
I am still thinking about ways to try and get the remaining time back.
|
|
|
|
In load_sample_func, the stream is not closed and thus leaked if one of
the fz_read_byte or fz_read_bits calls throws (which might happen e.g.
on a Deflate data error).
In pdf_load_compressed_inline_image, the allocated buffer is not freed
if one of the stream initializers or the tile creation throws
(fz_open_leecher does not take ownership of the stream).
|