path: root/source/pdf/pdf-stream.c
Age | Commit message | Author
2018-08-30 | Bug 699694: Fix reference counting for JBIG2 globals. | Sebastian Rasmussen
fz_open_jbig2d() is called at two locations in MuPDF. At one location a reference to the JBIG2 globals struct was taken before passing it to fz_open_jbig2d(). At the other location no such reference was taken; instead, ownership of the struct was implicitly transferred to fz_open_jbig2d(). This inconsistency led to a leak of the globals struct at the first location. Now, passing a JBIG2 globals struct to fz_open_jbig2d() never implicitly takes ownership. Instead, the JBIG2 stream takes its own reference if it needs one and drops it in case of error. As usual, it is the caller's responsibility to drop the reference to the globals struct that it owns.
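A minimal sketch of the ownership rule described above, using hypothetical `globals` and `decoder` types rather than MuPDF's real JBIG2 structures: the callee takes its own reference with a keep call, and every owner, caller included, drops only the reference it holds.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted JBIG2 globals struct. */
typedef struct globals {
    int refs;
} globals;

static globals *new_globals(void)
{
    globals *g = malloc(sizeof *g);
    if (g)
        g->refs = 1; /* the creator owns the first reference */
    return g;
}

static globals *keep_globals(globals *g)
{
    if (g)
        g->refs++;
    return g;
}

/* Returns 1 if this call released the last reference and freed the struct. */
static int drop_globals(globals *g)
{
    if (g && --g->refs == 0) {
        free(g);
        return 1;
    }
    return 0;
}

/* Hypothetical decoder: it never steals the caller's reference,
 * it keeps one of its own instead. */
typedef struct decoder {
    globals *gctx;
} decoder;

static decoder *open_decoder(globals *g)
{
    decoder *d = malloc(sizeof *d);
    if (!d)
        return NULL; /* error before keeping: nothing of ours to drop */
    d->gctx = keep_globals(g);
    return d;
}

static void close_decoder(decoder *d)
{
    if (d) {
        drop_globals(d->gctx); /* drop only the reference the decoder kept */
        free(d);
    }
}
```

With this convention both call sites look the same: open the decoder, then drop your own reference whenever you are done with it, regardless of what the decoder does internally.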
2018-08-30 | Remove unreachable code. | Sebastian Rasmussen
JBIG2 images are detected by build_compression_params() and are then always passed to fz_open_image_decomp_stream() by build_filter(). Therefore build_filter() can never encounter a JBIG2 image at a later stage, and that check can be removed.
2018-08-22 | Bug 699653: Avoid dropping filter chain once too often in case of error. | Sebastian Rasmussen
build_filter_chain_drop() promises to extend the filter chain it is given (according to the fs argument) or, if an exception occurs, to drop the chain it was originally given, including any filters already added to it at that point. Because build_filter_chain_drop() calls build_filter_drop() for every filter it adds, it doesn't need to clean up the filter chain on its own; that is build_filter_drop()'s responsibility. Prior to this commit, the fz_catch() in build_filter_chain_drop() called fz_drop_stream() on the filter chain, dropping it one time too many (build_filter_drop() had already dropped it) and leaving callers with a stale pointer. Now that the extra fz_drop_stream() has been removed, the logic works as intended, even when exceptions occur. Thanks to oss-fuzz for reporting.
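The `_drop` convention can be sketched as follows. This assumes a hypothetical reference-counted `stream` type and uses NULL returns in place of fz_try()/fz_catch(). A `_drop` function consumes the reference it is handed. On success that reference lives on inside the returned filter, and on failure the function has already dropped it, so the outer loop must not drop it again.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted stream; a filter holds one reference to its chain. */
typedef struct stream {
    int refs;
    struct stream *chain; /* underlying stream this filter reads from */
} stream;

static int live_streams = 0; /* counts allocated streams, to detect leaks */

static stream *new_stream(stream *chain)
{
    stream *s = malloc(sizeof *s);
    if (!s)
        return NULL;
    s->refs = 1;
    s->chain = chain;
    live_streams++;
    return s;
}

/* Drops a reference; freeing a filter also drops its reference to the chain. */
static void drop_stream(stream *s)
{
    while (s && --s->refs == 0) {
        stream *next = s->chain;
        free(s);
        live_streams--;
        s = next;
    }
}

/* Consumes the reference to chain: on failure it is dropped right here. */
static stream *add_filter_drop(stream *chain, int fail)
{
    stream *f = fail ? NULL : new_stream(chain);
    if (!f)
        drop_stream(chain); /* cleanup is this function's job */
    return f;
}

/* Adds n filters, failing at index fail_at (pass -1 to never fail). */
static stream *add_filter_chain_drop(stream *chain, int n, int fail_at)
{
    for (int i = 0; i < n; i++) {
        chain = add_filter_drop(chain, i == fail_at);
        if (!chain)
            return NULL; /* already dropped; dropping again was the bug */
    }
    return chain;
}
```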
2018-08-10 | Clean up null/range/endstream filter. | Tor Andersson
Use separate functions to keep the code simpler. Use memmem() to simplify and speed up the search for the 'endstream' token. Do not look for 'endobj', since that could cause false positives in compressed object streams that have duff lengths.
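A sketch of the 'endstream' search, under stated assumptions. memmem() is a GNU/BSD extension rather than ISO C, so this uses a small portable stand-in built on memchr() and memcmp(). `find_bytes()` and `find_endstream()` are hypothetical helpers, not MuPDF functions. Only "endstream" is searched for, and "endobj" is deliberately ignored, as the commit describes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Portable stand-in for memmem(): first occurrence of needle in hay, or NULL. */
static const unsigned char *find_bytes(const unsigned char *hay, size_t hay_len,
                                       const unsigned char *needle, size_t needle_len)
{
    if (needle_len == 0)
        return hay;
    while (hay_len >= needle_len) {
        /* only positions where the whole needle could still fit are scanned */
        const unsigned char *p = memchr(hay, needle[0], hay_len - needle_len + 1);
        if (!p)
            return NULL;
        if (memcmp(p, needle, needle_len) == 0)
            return p;
        hay_len -= (size_t)(p - hay) + 1;
        hay = p + 1;
    }
    return NULL;
}

/* Hypothetical helper: offset of the "endstream" keyword, or -1 if absent. */
static long find_endstream(const unsigned char *buf, size_t len)
{
    static const char kw[] = "endstream";
    const unsigned char *p = find_bytes(buf, len, (const unsigned char *)kw, sizeof kw - 1);
    return p ? (long)(p - buf) : -1;
}
```

Because the search works on byte counts rather than NUL-terminated strings, it is safe on binary stream data that contains zero bytes.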
2018-07-06 | Bug 699308: Fix stream reading logic to better cope with duff Lengths. | Robin Watts
Always look for the "endstream" marker after a PDF stream to see if we've hit the end. Allow for "endobj" to cope with producers that omit endstream entirely. Avoid slowing down legal files by only checking for the end marker after the specified length has been read.
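A rough sketch of checking for the end marker only once the declared /Length has been consumed. `stream_body_length()` is hypothetical and works on an in-memory buffer rather than MuPDF's incremental stream reader. It only handles a /Length that is too short, and it omits the "endobj" fallback, which a later commit removed. In the slow path, the end-of-line byte before the keyword is counted as data. A real reader would trim it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of body bytes, given the buffer that follows "stream" and /Length. */
static size_t stream_body_length(const unsigned char *buf, size_t len, size_t declared)
{
    static const char kw[] = "endstream";
    const size_t kw_len = sizeof kw - 1;
    size_t pos = declared < len ? declared : len;
    size_t p = pos;

    /* Fast path for well-formed files: the keyword follows the declared
     * length, possibly after an end-of-line marker. */
    while (p < len && (buf[p] == '\r' || buf[p] == '\n'))
        p++;
    if (len - p >= kw_len && memcmp(buf + p, kw, kw_len) == 0)
        return pos;

    /* Duff (too short) /Length: keep scanning forward from the declared end. */
    for (p = pos; p + kw_len <= len; p++)
        if (memcmp(buf + p, kw, kw_len) == 0)
            return p;
    return len; /* no marker at all: take everything that is left */
}
```

Legal files never reach the scanning loop, which is the point of the commit's "only check after the specified length" rule.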
2018-05-16 | Keep JBIG2 image data compressed in fz_compressed_buffer. | Tor Andersson
2018-04-27 | Use pdf_dict_get_int, etc. | Tor Andersson
2018-04-24 | Remove need for namedump by using macros and preprocessor. | Tor Andersson
Add a PDF_NAME(Foo) macro that evaluates to a pdf_obj for /Foo. Use the C preprocessor to create the enum values and string table from one include file instead of using a separate code generator tool.
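The preprocessor technique described above, where one list of names drives both the enum and the string table, is usually called the "X macro" idiom. The sketch below inlines the list instead of keeping it in a separate include file so that it is self-contained. `NAME_LIST`, `lookup_name()` and the three names are hypothetical, not MuPDF's actual identifiers. The bsearch() lookup assumes the list is kept sorted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One sorted list drives both the enum and the string table. */
#define NAME_LIST(X) \
    X(Filter)        \
    X(Length)        \
    X(Type)

enum name_id {
    NAME__NONE = 0, /* reserved: "not a predefined name" */
#define MAKE_ENUM(n) NAME_##n,
    NAME_LIST(MAKE_ENUM)
#undef MAKE_ENUM
    NAME__LIMIT
};

static const char *const name_table[] = {
#define MAKE_STRING(n) #n,
    NAME_LIST(MAKE_STRING)
#undef MAKE_STRING
};

static int cmp_name(const void *key, const void *elem)
{
    return strcmp((const char *)key, *(const char *const *)elem);
}

/* Maps a parsed name to its predefined id, or NAME__NONE if it is not listed. */
static enum name_id lookup_name(const char *s)
{
    const char *const *hit = bsearch(s, name_table,
                                     sizeof name_table / sizeof name_table[0],
                                     sizeof name_table[0], cmp_name);
    return hit ? (enum name_id)(hit - name_table + 1) : NAME__NONE;
}
```

Adding a name means editing one line in the list; the enum value and its string can no longer drift apart, which is what made the separate code generator unnecessary.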
2018-04-08 | Set pointers to NULL so they can be safely dropped. | Sebastian Rasmussen
Previously these pointers were not set to NULL, which caused spurious segmentation faults when they were dropped.
2018-04-04 | Fix silly typo in pdf_load_compressed_inline_image. | Tor Andersson
2018-04-03 | Don't implicitly drop in fz_open_* chained filters. | Tor Andersson
2018-03-16 | Do not warn if there are no JBIG2 globals. | Sebastian Rasmussen
2018-02-12 | jbig2 globals are streams; this implies indirect references. | Sebastian Rasmussen
Previously mupdf would attempt to load any indirect reference, whether it was a stream or not.
2018-02-12 | Bug 698998: Avoid recursion when opening jbig2 image streams. | Sebastian Rasmussen
Previously the JBIG2 globals object might be an indirect reference, and if that reference pointed to the object containing the stream itself, MuPDF would recurse until it ran out of error stack. Thanks to oss-fuzz for reporting.
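One common way to break such a cycle is to mark an object while it is being opened and refuse to re-enter it. The sketch below is a hypothetical model of that idea, using an `object` type whose `globals` field stands in for the /JBIG2Globals reference. It is not necessarily the exact check MuPDF uses.

```c
#include <assert.h>
#include <stddef.h>

typedef struct object {
    int num;
    int marked;             /* set while this object's stream is being opened */
    struct object *globals; /* hypothetical /JBIG2Globals reference, may be NULL */
} object;

/* Returns 0 on success, or -1 if the globals reference leads back to an
 * object that is already being opened (unbounded recursion otherwise). */
static int open_image_stream(object *o, int *depth_seen)
{
    if (o->marked)
        return -1; /* cycle detected */
    o->marked = 1;
    (*depth_seen)++;
    int rc = 0;
    if (o->globals)
        rc = open_image_stream(o->globals, depth_seen);
    o->marked = 0; /* unmark on the way out so later opens still work */
    return rc;
}
```

The self-referencing case from the bug report fails after a single level instead of exhausting the error stack.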
2018-02-08 | Fix 698991: The pdf_is_stream call is too generous. | Tor Andersson
It should only return true for indirect references that are actually streams, not just any array/dict that is contained in a stream object.
2018-02-06 | Bug 698986: Remember to fz_var() variable dropped in fz_catch(). | Sebastian Rasmussen
2018-02-01 | Bug 698830: Don't drop unkept stream if running out of error stack. | Sebastian Rasmussen
Under normal conditions, where fz_keep_stream() is called inside fz_try(), we may call fz_drop_stream() in fz_catch() upon exceptions. The problem arises when fz_keep_stream() has not yet been called but the stream is dropped in fz_catch() anyway. This happens with the PDF from the bug report when fz_try() runs out of exception stack and the code in fz_catch() runs next, dropping the caller's reference to the filter chain stream! The simplest fix is to always keep the filter chain stream before fz_try() is called. That way fz_catch() may drop the stream whether an exception occurred or fz_try() ran out of exception stack.
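A simplified model of the fix: keep before entering the "try", so the "catch" path can always drop. It uses a hypothetical refcounted `stream` and a `try_ok` flag instead of real fz_try()/fz_catch(). When `try_ok` is 0 the try body never runs, which mimics fz_try() running out of exception stack. Either way, the caller's own reference survives.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted stream; a filter holds one reference to its chain. */
typedef struct stream {
    int refs;
    struct stream *chain;
} stream;

static stream *keep_stream(stream *s)
{
    if (s)
        s->refs++;
    return s;
}

static void drop_stream(stream *s)
{
    while (s && --s->refs == 0) {
        stream *next = s->chain;
        free(s);
        s = next;
    }
}

/* Opens a filter over chain without consuming the caller's reference. */
static stream *open_filter(stream *chain, int try_ok)
{
    stream *f;
    keep_stream(chain); /* keep BEFORE the "try", so the "catch" may always drop */
    if (try_ok) {       /* the "try" body */
        f = malloc(sizeof *f);
        if (f) {
            f->refs = 1;
            f->chain = chain; /* the filter now owns the reference kept above */
            return f;
        }
    }
    /* "catch": reached on error, or when the try body never ran at all */
    drop_stream(chain); /* balances our keep, never the caller's reference */
    return NULL;
}
```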
2017-11-01 | Use int64_t for public file API offsets. | Tor Andersson
Don't mess with conditional compilation with LARGEFILE -- always expose 64-bit file offsets in our public API.
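One way to expose 64-bit offsets unconditionally is to make the public functions always take `int64_t` and map them to platform calls internally. The sketch below assumes POSIX fseeko()/ftello() with `_FILE_OFFSET_BITS=64` and MSVC's _fseeki64()/_ftelli64() on Windows. `seek64()` and `tell64()` are hypothetical names, not MuPDF's API.

```c
#define _POSIX_C_SOURCE 200809L /* declares fseeko()/ftello() */
#define _FILE_OFFSET_BITS 64    /* 64-bit off_t on 32-bit glibc; must precede headers */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/* Public API: offsets are always int64_t, with no conditional types. */
#ifdef _WIN32
static int seek64(FILE *f, int64_t off, int whence) { return _fseeki64(f, off, whence); }
static int64_t tell64(FILE *f) { return _ftelli64(f); }
#else
static int seek64(FILE *f, int64_t off, int whence) { return fseeko(f, (off_t)off, whence); }
static int64_t tell64(FILE *f) { return (int64_t)ftello(f); }
#endif
```

Callers never see `off_t` or any build-dependent offset type, so the public headers look the same on every platform.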
2017-10-04 | Mark another variable fz_var(), avoiding optimization. | Sebastian Rasmussen
This really should have been part of commit 0ef7cb983c4325156e08525381542ae3ada04720.
2017-10-02 | Drop stream upon error in inline stream. | Sebastian Rasmussen
2017-10-02 | Make sure to drop chain upon error in raw and crypto filters. | Sebastian Rasmussen
2017-09-25 | Bug 698592: Mark variable fz_var(), avoiding optimization. | Sebastian Rasmussen
The change in 2707fa9e8e6d17d794330e719dec1b08161fb045 to build_filter_chain() allows the variable chain to reside in a register, which means the bug is likely to be visible only in optimized builds. First the chain variable is transferred to chain2 and then set to NULL; when an exception then occurs in build_filter(), the filter chain is freed by build_filter(). Execution is then expected to proceed to fz_catch(), where fz_drop_stream() would be called with chain == NULL. However, because the chain variable resides in a register, its value is not NULL as expected: it was reset to its original value when the exception was thrown (since fz_try()/fz_catch() are implemented with setjmp()/longjmp()), so fz_drop_stream() is called with a non-NULL value. Marking the chain variable with fz_var() prevents the compiler from keeping it in a register, so its value remains NULL and is never reset.
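The same hazard can be reproduced with plain setjmp()/longjmp(). This sketch uses the standard C `volatile` qualifier to get the guarantee that fz_var() provides. The C standard leaves non-volatile locals that are modified between setjmp() and longjmp() indeterminate, so only the volatile version below is well-defined. `fail_after_freeing()` is a hypothetical stand-in for build_filter().

```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>

static jmp_buf env;
static int frees;

/* Stands in for build_filter(): frees what it was handed, then "throws". */
static void fail_after_freeing(int *p)
{
    free(p);
    frees++;
    longjmp(env, 1);
}

/* Returns how many times the resource was freed; the correct answer is 1. */
static int run_once(void)
{
    /* volatile plays the role of fz_var(): the NULL written after setjmp()
     * is still what the "catch" branch sees after longjmp(). */
    int *volatile chain = malloc(sizeof(int));
    frees = 0;
    if (setjmp(env) == 0) {
        int *chain2 = chain; /* hand ownership to the callee ... */
        chain = NULL;        /* ... and forget it locally */
        fail_after_freeing(chain2);
    } else {
        /* "catch": chain is reliably NULL, so nothing is freed twice */
        if (chain) {
            free(chain);
            frees++;
        }
    }
    return frees;
}
```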
2017-09-13 | Consistently drop filter chain upon error. | Sebastian Rasmussen
2017-09-13 | Remove old workaround. | Sebastian Rasmussen
2017-09-07 | Initialize variables to appease clang scan-build. | Sebastian Rasmussen
2017-06-22 | Add const to pdf_toname. | Tor Andersson
2017-04-27 | Include required system headers. | Tor Andersson
2017-01-17 | Fix typos. | Sebastian Rasmussen
2016-11-14 | Make fz_buffer structure private to fitz. | Robin Watts
Move the definition of the structure contents into new fitz-imp.h file. Make all code outside of fitz access the buffer through the defined API. Add a convenience API for people that want to get buffers as null terminated C strings.
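A sketch of the private-struct pattern together with a null-terminated-string convenience accessor. All names here (`buffer`, `buf_append()`, `buf_string()`) are hypothetical, not MuPDF's API. In a real split, only the typedef and prototypes would live in the public header, while the struct body would live in a private header such as fitz-imp.h. Both are kept in one file here so the sketch compiles on its own.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct buffer buffer; /* all a public header would expose */

struct buffer {               /* private: only the owning module sees the fields */
    unsigned char *data;
    size_t len, cap;
};

static buffer *buf_new(void) { return calloc(1, sizeof(buffer)); }

static int buf_reserve(buffer *b, size_t need)
{
    if (need <= b->cap)
        return 0;
    size_t cap = b->cap ? b->cap : 16;
    while (cap < need)
        cap *= 2;
    unsigned char *p = realloc(b->data, cap);
    if (!p)
        return -1;
    b->data = p;
    b->cap = cap;
    return 0;
}

static int buf_append(buffer *b, const void *src, size_t n)
{
    if (n == 0)
        return 0;
    if (buf_reserve(b, b->len + n))
        return -1;
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}

static size_t buf_len(const buffer *b) { return b->len; }

/* Convenience accessor: NUL-terminates in spare capacity; len is unchanged. */
static const char *buf_string(buffer *b)
{
    if (buf_reserve(b, b->len + 1))
        return NULL;
    b->data[b->len] = 0;
    return (const char *)b->data;
}

static void buf_drop(buffer *b)
{
    if (b) {
        free(b->data);
        free(b);
    }
}
```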
2016-10-21 | Clean up FZ_IMAGE_XXX enums and purge unused FZ_IMAGE_JBIG2. | Tor Andersson
2016-09-26 | Fix memory leak when opening html/loading raw stream. | Sebastian Rasmussen
2016-09-01 | pdf: Load/open streams by indirect reference object when possible. | Tor Andersson
2016-07-06 | pdf: Drop generation number from public interfaces. | Tor Andersson
The generation number is only needed for decryption, and is assumed to be zero or irrelevant for all other uses. Store the original object number and generation in the xref slot, so that we can decrypt them even when the objects have been renumbered, without needing to pass the original object number around through the stream loading APIs.
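Decryption needs the original numbers because the per-object key is derived from them. The PDF standard security handler ("Algorithm 1" in the PDF specification) appends the low 3 bytes of the object number and the low 2 bytes of the generation number, low-order byte first, to the file key before hashing. The sketch below builds that input from a hypothetical xref slot that stores the original numbers. The MD5 step, and the "sAlT" suffix used for AES, are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical xref slot: current number plus the numbers found in the file. */
typedef struct xref_entry {
    int num;      /* current, possibly renumbered, object number */
    int orig_num; /* object number in the original file */
    int orig_gen; /* generation number in the original file */
} xref_entry;

/* Writes file_key || num[0..2] || gen[0..1] to out; returns its length. */
static size_t object_key_input(const unsigned char *file_key, size_t key_len,
                               const xref_entry *e, unsigned char *out)
{
    memcpy(out, file_key, key_len);
    out[key_len + 0] = (unsigned char)(e->orig_num & 0xff);
    out[key_len + 1] = (unsigned char)((e->orig_num >> 8) & 0xff);
    out[key_len + 2] = (unsigned char)((e->orig_num >> 16) & 0xff);
    out[key_len + 3] = (unsigned char)(e->orig_gen & 0xff);
    out[key_len + 4] = (unsigned char)((e->orig_gen >> 8) & 0xff);
    return key_len + 5;
}
```

Since only `orig_num` and `orig_gen` feed the key, renumbering the object (changing `num`) does not affect decryption, so the stream-loading APIs no longer need a generation parameter.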
2016-06-14 | Add optional support for Luratech JBIG2 decoder. | Sebastian Rasmussen
If thirdparty/luratech is populated then this decoder will be preferred over jbig2dec (even if both are present).
2016-04-28 | Refactor fz_image code cases. | Robin Watts
Split compressed images (images based on a compressed buffer) and pixmap images (images based on a pixmap) out into separate subclasses.
2016-04-28 | Partial image decode. | Robin Watts
Update the core fz_get_pixmap_from_image code to allow fetching a subarea of a pixmap. We pass in the required subarea, together with the transformation matrix for the whole image. On return, we have a pixmap at least as big as was requested, and the transformation matrix is updated to map the supplied area to the correct place on the screen. The draw device is updated to use this as required. Everywhere else passes NULLs in, and so gets unchanged behaviour. The standard 'get_pixmap' function has been updated to decode just the required areas of the bitmaps. This means that banded rendering of pages will decode just the image subareas that are required for each band, limiting memory use. The downside is that each band decodes the image again to extract just the section it needs. The image subareas are put into the fz_store in the same way as full images. Currently, image areas in the store are only matched when they match exactly; subareas are not recognized as being able to reuse existing images.
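The matrix update can be sketched as follows. Assume, as in fitz, that the image matrix maps the unit square onto the page and that points are row vectors (p' = p * M). Then pixels [x0,x1) x [y0,y1) of a w x h image correspond to the unit-square subrectangle [x0/w, x1/w] x [y0/h, y1/h]. The adjusted matrix scales and translates into that subrectangle first, then applies the whole-image matrix. The `matrix` type and `subarea_matrix()` are hypothetical stand-ins, not MuPDF's functions.

```c
#include <assert.h>

/* Row-vector affine matrix [a b; c d; e f], laid out like fz_matrix. */
typedef struct { float a, b, c, d, e, f; } matrix;

/* Row-vector convention: p' = p * m, so concat(l, r) applies l first. */
static matrix concat(matrix l, matrix r)
{
    matrix m;
    m.a = l.a * r.a + l.b * r.c;
    m.b = l.a * r.b + l.b * r.d;
    m.c = l.c * r.a + l.d * r.c;
    m.d = l.c * r.b + l.d * r.d;
    m.e = l.e * r.a + l.f * r.c + r.e;
    m.f = l.e * r.b + l.f * r.d + r.f;
    return m;
}

/* Maps the unit square onto pixels [x0,x1) x [y0,y1) of a w x h image whose
 * whole-image matrix is `whole`. */
static matrix subarea_matrix(matrix whole, int w, int h, int x0, int y0, int x1, int y1)
{
    matrix s = { (float)(x1 - x0) / w, 0, 0, (float)(y1 - y0) / h,
                 (float)x0 / w, (float)y0 / h };
    return concat(s, whole);
}
```

For example, a 100x50 image drawn at 200x100 has whole = {200, 0, 0, 100, 0, 0}. Its pixel subarea (25,10)-(75,30) then gets {100, 0, 0, 40, 50, 20}, which places the decoded piece exactly where those pixels fall on the page.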
2016-04-27 | Fix 696649: remove fz_rethrow_message calls. | Tor Andersson
2016-04-18 | Fix corruption of file using sanitize. | Robin Watts
When sanitizing a file while cleaning with decompression, I was seeing a flate problem reported. The issue is that pdf_open_filter was passing orig_num to pdf_open_raw_filter as both num and orig_num. This caused us to find an fz_buffer attached to the (wrong) xref entry and to open that instead of the underlying stream. The fix is to propagate num a bit further.
2016-03-14 | Make pdf_is_stream work on loaded stream dictionary objects as well. | Tor Andersson
2016-03-14 | Take pdf_obj argument to pdf_is_stream. | Tor Andersson
2015-06-29 | Further tweaks to fz_image handling. | Robin Watts
Ensure that subsampling and caching happen in the generic image code, not in the specific image classes. Previously, subsampling happened only for images that were decoded from streams. Images that were loaded directly were never subsampled and hence were always cached at full size. After this change both classes of image are correctly subsampled, and the subsampled version is kept in the cache. This produces various image diffs in the cluster, none of which are noticeable to the naked eye.
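A generic subsampling step of the kind described can be sketched as two class-independent helpers. The first picks a power-of-two reduction that still leaves the image at least as large as its destination. The second box-filters the decoded samples. Both are hypothetical and assume an 8-bit single-channel image. Partial blocks at the right and bottom edges are dropped for brevity, which a real implementation would not do.

```c
#include <assert.h>

/* Largest log2 reduction that keeps the image at least dst_w x dst_h. */
static int choose_l2factor(int img_w, int img_h, int dst_w, int dst_h, int max_l2)
{
    int l2 = 0;
    while (l2 < max_l2 &&
           (img_w >> (l2 + 1)) >= dst_w &&
           (img_h >> (l2 + 1)) >= dst_h)
        l2++;
    return l2;
}

/* Averages each 2^l2 x 2^l2 block of src (w x h) into one pixel of dst. */
static void subsample_gray(const unsigned char *src, int w, int h, int l2,
                           unsigned char *dst)
{
    int f = 1 << l2, dw = w >> l2, dh = h >> l2;
    for (int y = 0; y < dh; y++)
        for (int x = 0; x < dw; x++) {
            unsigned sum = 0;
            for (int j = 0; j < f; j++)
                for (int i = 0; i < f; i++)
                    sum += src[(y * f + j) * w + (x * f + i)];
            dst[y * dw + x] = (unsigned char)(sum / (unsigned)(f * f));
        }
}
```

Because neither helper knows how the pixels were produced, streamed and directly loaded images take the same path, and the smaller result is what ends up in the cache.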
2015-05-15 | Support pdf files larger than 2Gig. | Robin Watts
If FZ_LARGEFILE is defined when building, MuPDF uses 64-bit offsets for files; this allows us to open streams larger than 2Gig. The downsides to this are that:
* The xref entries are larger.
* All PDF ints are held as 64-bit values rather than 32-bit ones (to cope with /Prev entries, hint stream offsets, etc.).
* All file positions are stored as 64 bits rather than 32.
The implementation works by detecting FZ_LARGEFILE. Some #ifdeffery in fitz/system.h sets fz_off_t to either int or int64_t as appropriate, and sets defines for fz_fopen, fz_fseek, fz_ftell, etc. as required. These call the fseeko64 family of functions on Linux (and so define _LARGEFILE64_SOURCE) and the explicit 64-bit functions on Windows.
2015-03-30 | Bug 695549: Avoid returning compressed buffer as uncompressed. | Robin Watts
pdf_load_image_stream is supposed to return a buffer containing the uncompressed stream from an object (or, in the case of image streams where an fz_compression_params structure is supplied, a stream decompressed up to the point of the image format compression). We have an optimisation in pdf_load_image_stream that allows it to return the existing buffer from a cached object rather than reloading it, but as bug 695549 points out, this breaks when the cached stream is compressed. The fix suggested by the bug reporter (Stefan Klein) would work, in that it would stop compressed streams being returned as uncompressed ones, but it is not perfect: it could lead to several copies of shortstoppable image streams being loaded, and to streams with null or empty-array filters being mistaken for compressed ones. The fix here solves these cases too.
2015-03-24 | Rework handling of PDF names for speed and memory. | Robin Watts
Currently, every PDF name is allocated in a pdf_obj structure, and comparisons are done using strcmp. Given that we can predict most of the PDF names we'll use in a given file, this seems wasteful. The pdf_obj type is opaque outside the pdf-object.c file, so we can abuse it slightly without anyone outside knowing. We collect a sorted list of names used in PDF (resources/pdf/names.txt), and we add a utility (namedump) that preprocesses this into two header files. The first (include/mupdf/pdf/pdf-names-table.h, included as part of include/mupdf/pdf/object.h) defines a set of "PDF_NAME_xxxx" entries. These are pdf_obj *'s that callers can use to mean "a PDF object that means the literal name 'xxxx'". The second (source/pdf/pdf-name-impl.h) is a C array of names. We therefore update the code so that rather than passing "xxxx" to functions (such as pdf_dict_gets(...)), we now pass PDF_NAME_xxxx (to pdf_dict_get(...)). This is a fairly natural (if widespread) change. The pdf_dict_getp (and sibling) functions that take a path (e.g. "foo/bar/baz") are therefore supplemented with equivalents that take a list (pdf_dict_getl(..., PDF_NAME_foo, PDF_NAME_bar, PDF_NAME_baz, NULL)). The actual implementation relies on the fact that small pointer values are never valid values. For a given pdf_obj *p, if 0 < (intptr_t)p && (intptr_t)p < PDF_NAME__LIMIT, then p is a literal entry in the name table. This enables us to do fast pointer compares and to skip expensive strcmps. Also, bring "null", "true" and "false" into the same style as PDF names. Rather than using full pdf_obj structures for null/true/false, use special pointer values just above the PDF_NAME_ table. This saves memory and makes comparisons easier.
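A minimal sketch of the small-pointer scheme, assuming a hypothetical three-entry name table and a hypothetical heap name object (`struct obj` holding a string). MuPDF's real pdf_obj layout is different and opaque. Converting small integers to pointers is implementation-defined in C. The scheme depends, as the commit says, on such small addresses never being valid objects.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct obj obj;
struct obj { const char *name; }; /* hypothetical heap-allocated name object */

enum { NAME_Filter = 1, NAME_Length, NAME_Type, NAME__LIMIT };
/* Special values just above the name table stand for null, true and false. */
enum { OBJ_NULL = NAME__LIMIT, OBJ_TRUE, OBJ_FALSE, OBJ__LIMIT };

#define AS_OBJ(v) ((obj *)(intptr_t)(v))

static const char *const static_names[] = { "", "Filter", "Length", "Type" };

static int is_static_name(obj *p)
{
    intptr_t v = (intptr_t)p;
    return v > 0 && v < NAME__LIMIT;
}

static int is_null(obj *p) { return p == NULL || p == AS_OBJ(OBJ_NULL); }

static const char *name_string(obj *p)
{
    if (is_static_name(p))
        return static_names[(intptr_t)p];
    if ((intptr_t)p >= NAME__LIMIT && (intptr_t)p < OBJ__LIMIT)
        return NULL; /* null/true/false are not names */
    return p ? p->name : NULL;
}

static int name_eq(obj *a, obj *b)
{
    if (a == b)
        return 1; /* fast path: identical pointers */
    if (is_static_name(a) && is_static_name(b))
        return 0; /* distinct table entries can never be equal */
    const char *sa = name_string(a), *sb = name_string(b);
    return sa && sb && strcmp(sa, sb) == 0; /* slow path for heap names */
}
```

Most comparisons in practice are between two table entries, so they resolve with a pointer comparison and never reach strcmp().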
2015-03-24 | Don't pass interpreter context to pdf_processor opcode callbacks. | Tor Andersson
Update buffer and filter processors. Filter both colors and stroke states. Move OCG hiding logic into interpreter.
2015-02-17 | Add ctx parameter and remove embedded contexts for API regularity. | Tor Andersson
Purge several embedded contexts:
* Remove embedded context in fz_output.
* Remove embedded context in fz_stream.
* Remove embedded context in fz_device.
* Remove fz_rebind_stream (since it is no longer necessary).
* Remove embedded context in svg_device.
* Remove embedded context in XML parser.
* Add ctx argument to fz_document functions.
* Remove embedded context in fz_document.
* Remove embedded context in pdf_document.
* Remove embedded context in pdf_obj.
Make fz_page independent of fz_document in the interface. We shouldn't need to pass the document to all functions handling a page. If a page is tied to the source document, it's redundant; otherwise it's just pointless.
Fix reference counting oddity in fz_new_image_from_pixmap.
2015-02-17 | Rename fz_close_* and fz_free_* to fz_drop_*. | Tor Andersson
* Rename fz_close to fz_drop_stream.
* Rename fz_close_archive to fz_drop_archive.
* Rename fz_close_output to fz_drop_output.
* Rename fz_free_* to fz_drop_*.
* Rename pdf_free_* to pdf_drop_*.
* Rename xps_free_* to xps_drop_*.
2014-12-29 | Performance optimisation with pdf_cache_object/pdf_get_xref_entry | Robin Watts
The recent change to holding pdf xrefs in a sparse format has resulted in a significant decrease in speed (x10). Malc points out that some of this (2x) can be recovered simply by making pdf_cache_object return the entry which it found the object in. This saves us having to immediately call pdf_get_xref_entry again afterwards. I am still thinking about ways to try and get the remaining time back.
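The optimisation amounts to returning the entry the lookup already found, so the caller does not search the sparse table a second time. The sketch below models a sparse xref as a hypothetical sorted array searched with bsearch(), with a counter to make the saved lookup visible. MuPDF's real sparse xref is organised differently.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct entry {
    int num;
    int loaded; /* stands in for the parsed object being cached */
} entry;

/* Hypothetical sparse xref: only present objects, sorted by number. */
typedef struct xref {
    entry *entries;
    size_t count;
    int lookups; /* instrumentation: number of searches performed */
} xref;

static int cmp_entry(const void *key, const void *elem)
{
    int k = *(const int *)key, n = ((const entry *)elem)->num;
    return (k > n) - (k < n);
}

static entry *get_xref_entry(xref *x, int num)
{
    x->lookups++;
    return bsearch(&num, x->entries, x->count, sizeof *x->entries, cmp_entry);
}

/* Loads the object if needed and returns the entry it was found in, so the
 * caller does not have to call get_xref_entry() again. */
static entry *cache_object(xref *x, int num)
{
    entry *e = get_xref_entry(x, num);
    if (e && !e->loaded)
        e->loaded = 1; /* real code would parse the object here */
    return e;
}
```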
2014-12-03 | Add ZIP file and directory reading module. | Tor Andersson
2014-10-28 | fix memory leaks in load_sample_func and pdf_load_compressed_inline_image | Simon Bünzli
In load_sample_func, the stream is not closed and thus leaked if one of the fz_read_byte or fz_read_bits calls throws (which might happen e.g. on a Deflate data error). In pdf_load_compressed_inline_image, the allocated buffer is not freed if one of the stream initializers or the tile creation throws (fz_open_leecher does not take ownership of the stream).