Add a PDF_NAME(Foo) macro that evaluates to a pdf_obj for /Foo.
Use the C preprocessor to create the enum values and string table
from one include file instead of using a separate code generator tool.
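
A minimal sketch of the X-macro technique this describes; the file
name names.inc, the PDF_MAKE_NAME macro, and the identifiers are
illustrative stand-ins, not the exact ones in the tree:

```c
#include <stdint.h>

/* names.inc -- the single include file (hypothetical name/content):
 *   PDF_MAKE_NAME("Filter", Filter)
 *   PDF_MAKE_NAME("Length", Length)
 *   PDF_MAKE_NAME("Type", Type)
 */

typedef struct pdf_obj pdf_obj;

/* Expansion 1: an enum of small integer tokens, one per name. */
enum
{
	PDF_ENUM_NULL,
#define PDF_MAKE_NAME(str, label) PDF_ENUM_NAME_##label,
#include "names.inc"
#undef PDF_MAKE_NAME
	PDF_ENUM_LIMIT
};

/* Expansion 2: a parallel string table, in the same order. */
static const char *pdf_name_table[] =
{
	"",
#define PDF_MAKE_NAME(str, label) str,
#include "names.inc"
#undef PDF_MAKE_NAME
};

/* PDF_NAME(Foo) evaluates to a pdf_obj * encoding the token for /Foo. */
#define PDF_NAME(label) ((pdf_obj *)(intptr_t)PDF_ENUM_NAME_##label)
```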
Previously these were not set to NULL, which caused
spurious segmentation faults.
Previously mupdf would attempt to load any indirect reference,
whether it was a stream or not.
Previously the JBIG2 globals object might be indirect, and if that
reference pointed back to the object containing the stream itself,
mupdf would recurse until it ran out of error stack. Thanks to
oss-fuzz for reporting.
It should only return true for indirect references that are actually
streams, not just any array/dict that is contained in a stream object.
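
Sketched, the check described above (helper names hypothetical):

```c
int
is_stream_ref(fz_context *ctx, pdf_document *doc, pdf_obj *obj)
{
	if (!pdf_is_indirect(ctx, obj))
		return 0; /* arrays/dicts merely *inside* a stream don't count */
	/* object_has_stream_data: hypothetical lookup of the xref entry */
	return object_has_stream_data(ctx, doc, pdf_to_num(ctx, obj));
}
```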
Under normal conditions, where fz_keep_stream() is called inside
fz_try(), we may call fz_drop_stream() in fz_catch() upon exceptions.
The issue comes when fz_keep_stream() has not yet been called but the
stream is dropped in fz_catch() anyway. This happens in the PDF from
the bug when fz_try() runs out of exception stack and the code in
fz_catch() then runs, dropping the caller's reference to the filter
chain stream!
The simplest way of fixing this is to always keep the filter chain
stream before fz_try() is called. That way fz_catch() may drop the
stream whether an exception has occurred or the fz_try() ran out of
exception stack.
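
A sketch of the pattern under MuPDF's fz_try/fz_catch exception
macros (simplified; wrap_with_filters is a hypothetical helper, not
the committed code):

```c
static fz_stream *
open_filtered(fz_context *ctx, fz_stream *chain)
{
	fz_stream *out = NULL;

	/* Keep the caller's reference *before* fz_try(), so fz_catch()
	 * may drop it unconditionally: safe both when the try body threw
	 * and when fz_try() itself failed because the exception stack
	 * was exhausted and the body never ran. */
	fz_keep_stream(ctx, chain);
	fz_try(ctx)
		out = wrap_with_filters(ctx, chain);
	fz_catch(ctx)
	{
		fz_drop_stream(ctx, chain);
		fz_rethrow(ctx);
	}
	return out;
}
```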
Don't mess with conditional compilation with LARGEFILE -- always expose
64-bit file offsets in our public API.
This really should have been part of commit
0ef7cb983c4325156e08525381542ae3ada04720.
The change in 2707fa9e8e6d17d794330e719dec1b08161fb045
in build_filter_chain() allows the variable chain
to reside in a register, which means that the bug is
likely to be visible only when built under optimization.
First the chain variable is transferred to chain2 and
set to NULL; then, when an exception occurs in build_filter(),
the filter chain is freed by build_filter(). The
expectation is that execution then proceeds to fz_catch(),
where fz_drop_stream() would be called with chain == NULL.
However, because the chain variable resides in a register,
its value is not NULL as expected but is reset to its
original value upon the exception (since fz_try/fz_catch
use setjmp()), so fz_drop_stream() is called with a
non-NULL value.
Marking the chain variable with fz_var() prevents the
compiler from keeping the chain variable in a register,
so its value remains NULL and is never reset.
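
The hazard and the fix, sketched (simplified from the description
above; chain is the function's filter-chain variable and the
build_filter() argument list is abridged):

```c
fz_stream *chain2 = NULL;

/* Without fz_var(), 'chain' may live in a register; the longjmp()
 * behind the exception then rolls it back to its pre-fz_try() value
 * instead of the NULL we stored. */
fz_var(chain);

fz_try(ctx)
{
	chain2 = chain;
	chain = NULL;                      /* ownership passes to build_filter() */
	chain = build_filter(ctx, chain2); /* frees chain2 on failure */
}
fz_catch(ctx)
{
	/* Correct only if 'chain' really is NULL here, i.e. it was not
	 * rolled back to its original value by the longjmp(). */
	fz_drop_stream(ctx, chain);
	fz_rethrow(ctx);
}
```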
Move the definition of the structure contents into new fitz-imp.h
file. Make all code outside of fitz access the buffer through the
defined API.
Add a convenience API for people who want to get buffers as
null-terminated C strings.
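
A sketch of going through the API only; fz_buffer_storage and
fz_string_from_buffer are the present-day public names, which may not
be exactly those introduced here:

```c
unsigned char *data;
size_t len = fz_buffer_storage(ctx, buf, &data);    /* raw bytes and length */
const char *text = fz_string_from_buffer(ctx, buf); /* NUL-terminated view */
```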
The generation number is only needed for decryption, and is assumed
to be zero or irrelevant for all other uses.
Store the original object number and generation in the xref slot, so
that we can decrypt them even when the objects have been renumbered,
without needing to pass the original object number around through
the stream loading APIs.
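
An illustrative (not verbatim) shape for the xref slot this
describes; field names are hypothetical:

```c
#include <stdint.h>

typedef struct pdf_obj pdf_obj;

typedef struct
{
	char type;     /* 'f' free, 'n' in file, 'o' in object stream */
	int64_t ofs;   /* file offset */
	int num;       /* original object number, kept for decryption */
	int gen;       /* original generation number, kept for decryption */
	pdf_obj *obj;  /* cached, possibly renumbered, object */
} xref_entry_sketch;
```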
If thirdparty/luratech is populated then this decoder will be preferred
over jbig2dec (even if both are present).
Split compressed images (images based on a compressed buffer)
and pixmap images (images based on a pixmap) out into separate
subclasses.
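
A hedged sketch of the split, using the usual embed-the-base-struct
subclassing idiom; the type names are illustrative:

```c
typedef struct
{
	fz_image super;     /* common fields, incl. the get_pixmap hook */
	fz_buffer *buffer;  /* compressed data (JPEG, CCITT, flate, ...) */
} compressed_image_sketch;

typedef struct
{
	fz_image super;
	fz_pixmap *tile;    /* already-decoded samples */
} pixmap_image_sketch;
```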
Update the core fz_get_pixmap_from_image code to allow fetching
a subarea of a pixmap. We pass in the required subarea, together
with the transformation matrix for the whole image.
On return, we have a pixmap at least as big as was requested,
and the transformation matrix is updated to map the supplied
area to the correct place on the screen.
The draw device is updated to use this as required. Everywhere
else passes NULLs in, and so gets unchanged behaviour.
The standard 'get_pixmap' function has been updated to decode
just the required areas of the bitmaps.
This means that banded rendering of pages will decode just the
image subareas that are required for each band, limiting the
memory use. The downside to this is that each band will redecode
the image again to extract just the section we want.
The image subareas are put into the fz_store in the same way
as full images. Currently image areas in the store are only
matched when they match exactly; subareas are not identified
as being able to use existing images.
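
A sketch of the extended call (signature abridged and possibly not
exact): pass the wanted subarea and the whole-image transform, and on
return the matrix has been adjusted to map the returned area to the
right place.

```c
fz_irect subarea = { x0, y0, x1, y1 };  /* band's area in image space */
fz_matrix ctm = whole_image_ctm;        /* transform for the whole image */
fz_pixmap *pix = fz_get_pixmap_from_image(ctx, image, &subarea, &ctm, NULL, NULL);
/* Passing NULL for the subarea keeps the old whole-image behaviour. */
```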
When sanitizing a file, while cleaning with decompression, I was
seeing a flate problem reported.
The issue is that pdf_open_filter was passing pdf_open_raw_filter
the orig_num as both num and orig_num. This was causing us to
find an fz_buffer attached to the (wrong) xref entry and to open
that instead of the underlying stream.
The fix is to propagate num a bit further.
Ensure that subsampling and caching happen in the generic image
code, not in the format-specific code.
Previously, subsampling happened only for images that were
decoded from streams. Images that were loaded directly were never
subsampled and hence were always cached at full size. After this
change both classes of image are correctly subsampled, and
the subsampled version is kept in the cache.
This produces various image diffs in the cluster, none of which
are noticeable to the naked eye.
If FZ_LARGEFILE is defined when building, MuPDF uses 64-bit offsets
for files; this allows us to open streams larger than 2 GB.
The downsides to this are that:
* The xref entries are larger.
* All PDF ints are held as 64-bit values rather than 32-bit ones
(to cope with /Prev entries, hint stream offsets etc).
* All file positions are stored as 64 bits rather than 32.
The implementation works by detecting FZ_LARGEFILE. Some #ifdeffery
in fitz/system.h sets fz_off_t to either int or int64_t as appropriate,
and sets defines for fz_fopen, fz_fseek, fz_ftell etc as required.
These call the fseeko64 etc functions on linux (and so define
_LARGEFILE64_SOURCE) and the explicit 64-bit functions on windows.
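
Roughly the shape of that #ifdeffery (details simplified from the
message above; not the verbatim header):

```c
#include <stdint.h>
#include <stdio.h>

#ifdef FZ_LARGEFILE
typedef int64_t fz_off_t;
#define fz_fseek fseeko64   /* linux; needs _LARGEFILE64_SOURCE */
#define fz_ftell ftello64   /* on windows: _fseeki64/_ftelli64 instead */
#else
typedef int fz_off_t;
#define fz_fseek fseek
#define fz_ftell ftell
#endif
```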
pdf_load_image_stream is supposed to return a buffer containing the
uncompressed stream from an object (or, in the case of image streams
where an fz_compression_params structure is supplied, a stream
decompressed up to the point of the image format compression).
We have an optimisation in pdf_load_image_stream to allow it to
return the existing buffer from a cached object rather than
reloading it again, but as bug 695549 points out, this breaks in
the case where the cached stream is compressed.
The suggested fix by the bug reporter (Stefan Klein) would work
in that it would stop compressed streams being returned as
uncompressed ones, but it is not perfect: it could lead to
several copies of short-stoppable image streams being loaded, and
to streams with null or empty-array filters being mistaken for
compressed ones.
The fix here solves these cases too.
Currently, every PDF name is allocated in a pdf_obj structure, and
comparisons are done using strcmp. Given that we can predict most
of the PDF names we'll use in a given file, this seems wasteful.
The pdf_obj type is opaque outside the pdf-object.c file, so we can
abuse it slightly without anyone outside knowing.
We collect a sorted list of names used in PDF (resources/pdf/names.txt),
and we add a utility (namedump) that preprocesses this into 2 header
files.
The first (include/mupdf/pdf/pdf-names-table.h, included as part of
include/mupdf/pdf/object.h) defines a set of "PDF_NAME_xxxx"
entries. These are pdf_obj *'s that callers can use to mean "a PDF
object for the literal name 'xxxx'".
The second (source/pdf/pdf-name-impl.h) is a C array of names.
We therefore update the code so that rather than passing "xxxx" to
functions (such as pdf_dict_gets(...)) we now pass PDF_NAME_xxxx (to
pdf_dict_get(...)). This is a fairly natural (if widespread) change.
The pdf_dict_getp (and sibling) functions that take a path (e.g.
"foo/bar/baz") are therefore supplemented with equivalents that
take a list (pdf_dict_getl(... , PDF_NAME_foo, PDF_NAME_bar,
PDF_NAME_baz, NULL)).
The actual implementation of this relies on the fact that small
pointer values are never valid pointers. For a given pdf_obj *p,
if 0 < (intptr_t)p < PDF_NAME__LIMIT then p is a literal
entry in the name table.
This enables us to do fast pointer compares and to skip expensive
strcmps.
Also, bring "null", "true" and "false" into the same style as PDF names.
Rather than using full pdf_obj structures for null/true/false, use
special pointer values just above the PDF_NAME_ table. This saves
memory and makes comparisons easier.
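
The pointer trick, sketched (illustrative constant and helpers, not
the exact source; pdf_name_table is the generated string table from
the earlier sketch):

```c
#include <stdint.h>

#define PDF_NAME__LIMIT 512 /* hypothetical: one past the last table slot */

/* True if p encodes a table name rather than pointing at a real object. */
static int is_name_literal(pdf_obj *p)
{
	return p != NULL && (intptr_t)p < PDF_NAME__LIMIT;
}

/* For table names, the text comes from the generated string table and
 * equality is a plain pointer compare -- no strcmp needed. */
static const char *literal_name_text(pdf_obj *p)
{
	return pdf_name_table[(intptr_t)p];
}
```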
Update buffer and filter processors.
Filter both colors and stroke states.
Move OCG hiding logic into interpreter.
Purge several embedded contexts:
Remove embedded context in fz_output.
Remove embedded context in fz_stream.
Remove embedded context in fz_device.
Remove fz_rebind_stream (since it is no longer necessary).
Remove embedded context in svg_device.
Remove embedded context in XML parser.
Add ctx argument to fz_document functions.
Remove embedded context in fz_document.
Remove embedded context in pdf_document.
Remove embedded context in pdf_obj.
Make fz_page independent of fz_document in the interface.
We shouldn't need to pass the document to all functions handling a page.
If a page is tied to the source document, it's redundant; otherwise it's
just pointless.
Fix reference counting oddity in fz_new_image_from_pixmap.
Rename fz_close to fz_drop_stream.
Rename fz_close_archive to fz_drop_archive.
Rename fz_close_output to fz_drop_output.
Rename fz_free_* to fz_drop_*.
Rename pdf_free_* to pdf_drop_*.
Rename xps_free_* to xps_drop_*.
The recent change to holding pdf xrefs in a sparse format has resulted
in a significant decrease in speed (10x). Malc points out that some of
this (2x) can be recovered simply by making pdf_cache_object return the
entry in which it found the object.
This saves us having to immediately call pdf_get_xref_entry again
afterwards.
I am still thinking about ways to try and get the remaining time back.
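
The shape of the optimisation (signatures hypothetical for this
sketch):

```c
/* Before: resolve the object, then look its entry up again. */
pdf_cache_object(ctx, doc, num);
entry = pdf_get_xref_entry(ctx, doc, num);

/* After: pdf_cache_object hands back the entry it filled in. */
entry = pdf_cache_object(ctx, doc, num);
```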
In load_sample_func, the stream is not closed and thus leaked if one of
the fz_read_byte or fz_read_bits calls throws (which might happen e.g.
on a Deflate data error).
In pdf_load_compressed_inline_image, the allocated buffer is not freed
if one of the stream initializers or the tile creation throws
(fz_open_leecher does not take ownership of the stream).
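
The usual MuPDF shape for such fixes, sketched with present-day names
(open_the_stream and parse_samples are stand-ins): fz_always() runs
on both the normal and the exception path, so the stream cannot leak.

```c
stm = open_the_stream(ctx, doc, num); /* stand-in for the real open call */
fz_try(ctx)
	parse_samples(ctx, stm, func);    /* may throw, e.g. on Deflate errors */
fz_always(ctx)
	fz_drop_stream(ctx, stm);         /* runs on success and on throw */
fz_catch(ctx)
	fz_rethrow(ctx);
```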
Return the null object rather than throwing an exception when parsing
indirect object references with negative object numbers.
Do a range check for object numbers (1 .. length) where object
numbers are used instead.
Object number 0 is not a valid object number; it must always be 'free'.
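
The two checks, roughly (PDF_NULL and pdf_xref_len are the current
public names; error text illustrative):

```c
/* When parsing "R" references: negative numbers give the null object. */
if (num < 0)
	return PDF_NULL;

/* Where an object number is used: valid numbers are 1 .. xref len - 1. */
if (num <= 0 || num >= pdf_xref_len(ctx, doc))
	fz_throw(ctx, FZ_ERROR_GENERIC, "object out of range (%d)", num);
```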
After rushing to get in the fix for a crash, I realised the
routine could be simplified a bit.
Michael spotted that double closing an fz_stream on an inline image
does bad things. The simple fix is not to double close.
Previously the pdf_process buffer implementation did not understand
inline images.
In order to make this work without needlessly duplicating complex code
from within pdf-op-run, the parsing of inline images has been moved to
happen in pdf-interpret.c. When the op_table entry for BI is called
it now expects the inline image to be in csi->img and the dictionary
object to be in csi->obj.
To make this work, we have had to improve the handling of inline images
in general. While non-inline images have been loaded and held in
memory in their compressed form and only decoded when required, until
now we have always loaded and decoded inline images immediately. This
has been due to the difficulty in knowing how many bytes of data to
read from the stream - we know the length of the stream once
uncompressed, but relating this to the compressed length is hard.
To cure this we introduce a new type of filter stream, a 'leecher'.
We insert a leecher stream before we build the filters required to
decode the image. We then read and discard the appropriate number
of uncompressed bytes from the filters. This pulls the compressed
data through the leecher stream, which stores it in an fz_buffer.
Thus images are now always held in their compressed forms in memory.
The pdf-op-run implementation is now trivial. The only real complexity
in the pdf-op-buffer implementation is the need to ensure that the
/Filter entry in the dictionary object matches the exact point at
which we backstopped the decompression.
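
A self-contained model of the leecher idea (plain stdio, not MuPDF's
fz_stream plumbing; all names here are stand-ins): a pass-through
reader that records every byte it serves, so pulling the decoded
image through the real filters leaves exactly the consumed compressed
bytes in the side buffer.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct
{
	FILE *src;            /* stands in for the compressed PDF stream */
	unsigned char *copy;  /* captured compressed bytes */
	size_t len, cap;
} leecher;

/* Forward a read from the source, teeing every byte into l->copy. */
static size_t
leech_read(leecher *l, unsigned char *buf, size_t n)
{
	size_t got = fread(buf, 1, n, l->src);
	if (got > 0)
	{
		if (l->len + got > l->cap)
		{
			size_t cap = l->cap ? l->cap * 2 : 256;
			unsigned char *p;
			if (cap < l->len + got)
				cap = l->len + got;
			p = realloc(l->copy, cap);
			if (!p)
				return 0; /* treat out-of-memory as end of data */
			l->copy = p;
			l->cap = cap;
		}
		memcpy(l->copy + l->len, buf, got);
		l->len += got;
	}
	return got;
}
/* A decoder that reads through leech_read() and discards its decoded
 * output leaves l->copy holding exactly the compressed bytes consumed. */
```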
In the existing code, if build_filter fails, chain will be freed. If
pdf_array_get fails, however, it will leak.
Rectify this. No specific bug or example file; just an observation
arising from discussions about the previous commit.