path: root/source/pdf/pdf-stream.c
Age  Commit message  Author
2018-05-16  Keep JBIG2 image data compressed in fz_compressed_buffer.  Tor Andersson
2018-04-27  Use pdf_dict_get_int, etc.  Tor Andersson
2018-04-24  Remove need for namedump by using macros and preprocessor.  Tor Andersson
Add a PDF_NAME(Foo) macro that evaluates to a pdf_obj for /Foo. Use the C preprocessor to create the enum values and string table from one include file instead of using a separate code generator tool.
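The single-list technique described above can be sketched with an X-macro. This is only an illustration: the names `EXAMPLE_NAME_LIST`, `EXAMPLE_NAME` and the helper functions are hypothetical, the list is kept inline rather than in a separate include file, and the macro here evaluates to an enum value rather than to a pdf_obj as the real PDF_NAME(Foo) is said to.

```c
#include <string.h>

/* Hypothetical name list; the commit keeps its list in one include file. */
#define EXAMPLE_NAME_LIST(X) \
	X(Filter) \
	X(Length) \
	X(JBIG2Globals)

/* First expansion: one enum value per name. */
#define EXAMPLE_AS_ENUM(NAME) EXAMPLE_NAME_##NAME,
enum example_name { EXAMPLE_NAME_LIST(EXAMPLE_AS_ENUM) EXAMPLE_NAME_COUNT };

/* Second expansion: the matching string table, in the same order. */
#define EXAMPLE_AS_STRING(NAME) #NAME,
static const char *example_name_strings[] = { EXAMPLE_NAME_LIST(EXAMPLE_AS_STRING) };

/* Illustrative accessor, loosely modelled on the PDF_NAME(Foo) spelling. */
#define EXAMPLE_NAME(NAME) (EXAMPLE_NAME_##NAME)

/* Map an enum value back to its spelling; NULL if out of range. */
static const char *example_name_to_string(enum example_name n)
{
	return (unsigned)n < EXAMPLE_NAME_COUNT ? example_name_strings[n] : NULL;
}

/* Linear lookup by spelling; returns EXAMPLE_NAME_COUNT when not found. */
static enum example_name example_name_from_string(const char *s)
{
	int i;
	for (i = 0; i < EXAMPLE_NAME_COUNT; i++)
		if (strcmp(example_name_strings[i], s) == 0)
			return (enum example_name)i;
	return EXAMPLE_NAME_COUNT;
}
```

Because both expansions walk the same list, the enum and the string table cannot drift out of step, which is the property a separate generator tool would otherwise have to guarantee.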
2018-04-08  Set pointers to NULL so they can be safely dropped.  Sebastian Rasmussen
Previously these were not set to NULL, which caused spurious segmentation errors.
2018-04-04  Fix silly typo in pdf_load_compressed_inline_image.  Tor Andersson
2018-04-03  Don't implicitly drop in fz_open_* chained filters.  Tor Andersson
2018-03-16  Do not warn if there are no JBIG2 globals.  Sebastian Rasmussen
2018-02-12  jbig2 globals are streams, this implies indirect references.  Sebastian Rasmussen
Previously mupdf would attempt to load any indirect reference, whether it was a stream or not.
2018-02-12  Bug 698998: Avoid recursion when opening jbig2 image streams.  Sebastian Rasmussen
Previously the JBIG2 globals object might be indirect and if that reference pointed to the object containing the stream itself then mupdf would recurse until running out of error stack. Thanks to oss-fuzz for reporting.
2018-02-08  Fix 698991: The pdf_is_stream call is too generous.  Tor Andersson
It should only return true for indirect references that are actually streams, not just any array/dict that is contained in a stream object.
2018-02-06  Bug 698986: Remember to fz_var() variable dropped in fz_catch().  Sebastian Rasmussen
2018-02-01  Bug 698830: Don't drop unkept stream if running out of error stack.  Sebastian Rasmussen
Under normal conditions where fz_keep_stream() is called inside fz_try() we may call fz_drop_stream() in fz_catch() upon exceptions. The issue comes when fz_keep_stream() has not yet been called but the stream is dropped in fz_catch(). This happens in the PDF from the bug when fz_try() runs out of exception stack, and next the code in fz_catch() runs, dropping the caller's reference to the filter chain stream! The simplest way of fixing this is to always keep the filter chain stream before fz_try() is called. That way fz_catch() may drop the stream whether an exception has occurred or fz_try() ran out of exception stack.
2017-11-01  Use int64_t for public file API offsets.  Tor Andersson
Don't mess with conditional compilation with LARGEFILE -- always expose 64-bit file offsets in our public API.
2017-10-04  Mark another variable fz_var(), avoiding optimization.  Sebastian Rasmussen
This really should have been part of commit 0ef7cb983c4325156e08525381542ae3ada04720.
2017-10-02  Drop stream upon error in inline stream.  Sebastian Rasmussen
2017-10-02  Make sure to drop chain upon error in raw and crypto filters.  Sebastian Rasmussen
2017-09-25  Bug 698592: Mark variable fz_var(), avoiding optimization.  Sebastian Rasmussen
The change in 2707fa9e8e6d17d794330e719dec1b08161fb045 in build_filter_chain() allows for the variable chain to reside in a register, which means that the bug is likely to only be visible if built under optimization. First the chain variable is transferred to chain2, then set to NULL, then when an exception occurs in build_filter() the filter chain will be freed by build_filter(). Next the expectation is that execution proceeds to fz_catch() where fz_drop_stream() would be called with chain == NULL. However due to the chain variable residing in a register, its value is not NULL as expected, but was reset to its original value upon the exception (since they use setjmp()), hence fz_drop_stream() is called with a non-NULL value. Marking the chain variable with fz_var() prevents the compiler from allowing the chain variable to reside in a register and hence its value will remain NULL and never be reset.
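The register hazard described above is a general property of setjmp()/longjmp() and can be sketched without MuPDF. In this illustration all names are hypothetical, and the standard C `volatile` qualifier stands in for the effect the message attributes to fz_var(); it is not a claim about how fz_var() itself is implemented.

```c
#include <setjmp.h>
#include <stdlib.h>

static jmp_buf example_env;

/* Stand-in for a callee that takes ownership, frees, then "throws". */
static void example_consume_and_throw(void *p)
{
	free(p);
	longjmp(example_env, 1);
}

/* Returns 1 if the cleanup path saw NULL (nothing left to free), else 0. */
static int example_cleanup_sees_null(void)
{
	/* volatile guarantees that the value stored before longjmp() is the
	 * one read after setjmp() returns a second time. Without it, the
	 * value of a modified local is indeterminate at that point. */
	void *volatile chain = malloc(16);
	int was_null;

	if (setjmp(example_env) == 0)
	{
		void *chain2 = chain;
		chain = NULL; /* ownership handed over to the callee */
		example_consume_and_throw(chain2);
		return -1; /* not reached */
	}

	/* "catch" path: chain must be NULL here, or we would double free. */
	was_null = (chain == NULL);
	free(chain); /* free(NULL) is a no-op */
	return was_null;
}
```

If the `volatile` qualifier is removed, an optimizing compiler is free to keep `chain` in a register, and the cleanup path may then see the stale pointer, which mirrors the double drop the commit fixes.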
2017-09-13  Consistently drop filter chain upon error.  Sebastian Rasmussen
2017-09-13  Remove old workaround.  Sebastian Rasmussen
2017-09-07  Initialize variables to appease clang scan-build.  Sebastian Rasmussen
2017-06-22  Add const to pdf_toname.  Tor Andersson
2017-04-27  Include required system headers.  Tor Andersson
2017-01-17  Fix typos.  Sebastian Rasmussen
2016-11-14  Make fz_buffer structure private to fitz.  Robin Watts
Move the definition of the structure contents into new fitz-imp.h file. Make all code outside of fitz access the buffer through the defined API. Add a convenience API for people that want to get buffers as null terminated C strings.
2016-10-21  Clean up FZ_IMAGE_XXX enums and purge unused FZ_IMAGE_JBIG2.  Tor Andersson
2016-09-26  Fix memory leak when opening html/loading raw stream.  Sebastian Rasmussen
2016-09-01  pdf: Load/open streams by indirect reference object when possible.  Tor Andersson
2016-07-06  pdf: Drop generation number from public interfaces.  Tor Andersson
The generation number is only needed for decryption, and is assumed to be zero or irrelevant for all other uses. Store the original object number and generation in the xref slot, so that we can decrypt them even when the objects have been renumbered, without needing to pass the original object number around through the stream loading APIs.
2016-06-14  Add optional support for Luratech JBIG2 decoder.  Sebastian Rasmussen
If thirdparty/luratech is populated then this decoder will be preferred over jbig2dec (even if both are present).
2016-04-28  Refactor fz_image code cases.  Robin Watts
Split compressed images (images based on a compressed buffer) and pixmap images (images based on a pixmap) out into separate subclasses.
2016-04-28  Partial image decode.  Robin Watts
Update the core fz_get_pixmap_from_image code to allow fetching a subarea of a pixmap. We pass in the required subarea, together with the transformation matrix for the whole image. On return, we have a pixmap at least as big as was requested, and the transformation matrix is updated to map the supplied area to the correct place on the screen. The draw device is updated to use this as required. Everywhere else passes NULLs in, and so gets unchanged behaviour. The standard 'get_pixmap' function has been updated to decode just the required areas of the bitmaps. This means that banded rendering of pages will decode just the image subareas that are required for each band, limiting the memory use. The downside to this is that each band will redecode the image again to extract just the section we want. The image subareas are put into the fz_store in the same way as full images. Currently image areas in the store are only matched when they match exactly; subareas are not identified as being able to use existing images.
2016-04-27  Fix 696649: remove fz_rethrow_message calls.  Tor Andersson
2016-04-18  Fix corruption of file using sanitize.  Robin Watts
When sanitizing a file, while cleaning with decompression, I was seeing a flate problem reported. The issue is that pdf_open_filter was passing pdf_open_raw_filter the orig_num as both num and orig_num. This was causing us to find an fz_buffer attached to the (wrong) xref entry and to open that instead of the underlying stream. The fix is to propagate num a bit further.
2016-03-14  Make pdf_is_stream work on loaded stream dictionary objects as well.  Tor Andersson
2016-03-14  Take pdf_obj argument to pdf_is_stream.  Tor Andersson
2015-06-29  Further tweaks to fz_image handling.  Robin Watts
Ensure that subsampling and caching happen in the generic image code, not in the specific. Previously, the subsampling happened only for images that were decoded from streams. Images that were loaded direct were never subsampled and hence were always cached at full size. After this change both classes of image are correctly subsampled, and the subsampled version kept in the cache. This produces various image diffs in the cluster, none of which are noticeable to the naked eye.
2015-05-15  Support pdf files larger than 2Gig.  Robin Watts
If FZ_LARGEFILE is defined when building, MuPDF uses 64bit offsets for files; this allows us to open streams larger than 2Gig. The downsides to this are that:
* The xref entries are larger.
* All PDF ints are held as 64bit things rather than 32bit things (to cope with /Prev entries, hint stream offsets etc).
* All file positions are stored as 64bits rather than 32.
The implementation works by detecting FZ_LARGEFILE. Some #ifdeffery in fitz/system.h sets fz_off_t to either int or int64_t as appropriate, and sets defines for fz_fopen, fz_fseek, fz_ftell etc as required. These call the fseeko64 etc functions on linux (and so define _LARGEFILE64_SOURCE) and the explicit 64bit functions on windows.
2015-03-30  Bug 695549: Avoid returning compressed buffer as uncompressed.  Robin Watts
pdf_load_image_stream is supposed to return a buffer containing the uncompressed stream from an object (or, in the case of image streams where an fz_compression_params structure is supplied, a stream decompressed up to the point of the image format compression). We have an optimisation in pdf_load_image_stream to allow it to return the existing buffer from a cached object rather than reloading it again, but as bug 695549 points out, this breaks in the case where the cached stream is compressed. The suggested fix by the bug reporter (Stefan Klein) would work in that it would stop compressed streams being returned as uncompressed ones, but it is not perfect as it could lead to several copies of shortstoppable image streams being loaded (and for streams with null or empty array filters being mistaken for compressed ones). The fix here solves these cases too.
2015-03-24  Rework handling of PDF names for speed and memory.  Robin Watts
Currently, every PDF name is allocated in a pdf_obj structure, and comparisons are done using strcmp. Given that we can predict most of the PDF names we'll use in a given file, this seems wasteful. The pdf_obj type is opaque outside the pdf-object.c file, so we can abuse it slightly without anyone outside knowing. We collect a sorted list of names used in PDF (resources/pdf/names.txt), and we add a utility (namedump) that preprocesses this into 2 header files. The first (include/mupdf/pdf/pdf-names-table.h, included as part of include/mupdf/pdf/object.h), defines a set of "PDF_NAME_xxxx" entries. These are pdf_obj *'s that callers can use to mean "A PDF object that means literal name 'xxxx'". The second (source/pdf/pdf-name-impl.h) is a C array of names. We therefore update the code so that rather than passing "xxxx" to functions (such as pdf_dict_gets(...)) we now pass PDF_NAME_xxxx (to pdf_dict_get(...)). This is a fairly natural (if widespread) change. The pdf_dict_getp (and sibling) functions that take a path (e.g. "foo/bar/baz") are therefore supplemented with equivalents that take a list (pdf_dict_getl(... , PDF_NAME_foo, PDF_NAME_bar, PDF_NAME_baz, NULL)). The actual implementation of this relies on the fact that small pointer values are never valid values. For a given pdf_obj *p, if NULL < (intptr_t)p < PDF_NAME__LIMIT then p is a literal entry in the name table. This enables us to do fast pointer compares and to skip expensive strcmps. Also, bring "null", "true" and "false" into the same style as PDF names. Rather than using full pdf_obj structures for null/true/false, use special pointer values just above the PDF_NAME_ table. This saves memory and makes comparisons easier.
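The small-pointer trick described above can be sketched as follows. Everything here is hypothetical (`example_obj`, the `EX_NAME_` enum, the helper functions); it only illustrates treating pointer values below a limit as indices into a name table, and it assumes a mainstream platform where small integers round-trip through pointer casts and never collide with real object addresses.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an allocated name object. */
typedef struct example_obj { const char *name; } example_obj;

/* Slot 0 is reserved so that NULL is never mistaken for a name. */
enum { EX_NAME_UNUSED = 0, EX_NAME_Filter, EX_NAME_Length, EX_NAME_LIMIT };

static const char *ex_name_table[] = { "", "Filter", "Length" };

/* A "pointer" whose value is just the table index. */
#define EX_NAME(N) ((example_obj *)(uintptr_t)EX_NAME_##N)

/* True when p is a table entry rather than a real allocation. */
static int ex_is_table_name(const example_obj *p)
{
	uintptr_t v = (uintptr_t)p;
	return v > 0 && v < EX_NAME_LIMIT;
}

/* Spelling of a non-NULL name, whichever representation it uses. */
static const char *ex_to_name(const example_obj *p)
{
	if (ex_is_table_name(p))
		return ex_name_table[(uintptr_t)p];
	return p->name;
}

static int ex_name_eq(const example_obj *a, const example_obj *b)
{
	if (a == NULL || b == NULL)
		return a == b;
	if (ex_is_table_name(a) && ex_is_table_name(b))
		return a == b; /* fast path: pointer compare, no strcmp */
	return strcmp(ex_to_name(a), ex_to_name(b)) == 0;
}
```

Only names that are not in the table fall through to strcmp, so the common case of comparing two predicted names costs a single pointer comparison.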
2015-03-24  Don't pass interpreter context to pdf_processor opcode callbacks.  Tor Andersson
Update buffer and filter processors. Filter both colors and stroke states. Move OCG hiding logic into interpreter.
2015-02-17  Add ctx parameter and remove embedded contexts for API regularity.  Tor Andersson
Purge several embedded contexts:
Remove embedded context in fz_output.
Remove embedded context in fz_stream.
Remove embedded context in fz_device.
Remove fz_rebind_stream (since it is no longer necessary).
Remove embedded context in svg_device.
Remove embedded context in XML parser.
Add ctx argument to fz_document functions.
Remove embedded context in fz_document.
Remove embedded context in pdf_document.
Remove embedded context in pdf_obj.
Make fz_page independent of fz_document in the interface. We shouldn't need to pass the document to all functions handling a page. If a page is tied to the source document, it's redundant; otherwise it's just pointless.
Fix reference counting oddity in fz_new_image_from_pixmap.
2015-02-17  Rename fz_close_* and fz_free_* to fz_drop_*.  Tor Andersson
Rename fz_close to fz_drop_stream. Rename fz_close_archive to fz_drop_archive. Rename fz_close_output to fz_drop_output. Rename fz_free_* to fz_drop_*. Rename pdf_free_* to pdf_drop_*. Rename xps_free_* to xps_drop_*.
2014-12-29  Performance optimisation with pdf_cache_object/pdf_get_xref_entry  Robin Watts
The recent change to holding pdf xrefs in a sparse format has resulted in a significant decrease in speed (x10). Malc points out that some of this (2x) can be recovered simply by making pdf_cache_object return the entry which it found the object in. This saves us having to immediately call pdf_get_xref_entry again afterwards. I am still thinking about ways to try and get the remaining time back.
2014-12-03  Add ZIP file and directory reading module.  Tor Andersson
2014-10-28  fix memory leaks in load_sample_func and pdf_load_compressed_inline_image  Simon Bünzli
In load_sample_func, the stream is not closed and thus leaked if one of the fz_read_byte or fz_read_bits calls throws (which might happen e.g. on a Deflate data error). In pdf_load_compressed_inline_image, the allocated buffer is not freed if one of the stream initializers or the tile creation throws (fz_open_leecher does not take ownership of the stream).
2014-06-09  Fix 695300: don't throw exception on invalid reference number.  Tor Andersson
Return the null object rather than throwing an exception when parsing indirect object references with negative object numbers. Do range check for object numbers (1 .. length) when object numbers are used instead. Object number 0 is not a valid object number. It must always be 'free'.
2014-04-01  Tidy up code in pdf_load_compressed_inline_image  Robin Watts
After rushing to get the fix for a crash in, I realised the routine could be simplified a bit.
2014-03-25  Avoid double closing a stream.  Robin Watts
Michael spotted that double closing an fz_stream on an inline image does bad things. Simple fix is not to double close.
2014-03-18  Fix operator buffering of inline images.  Robin Watts
Previously pdf_process buffer did not understand inline images. In order to make this work without needlessly duplicating complex code from within pdf-op-run, the parsing of inline images has been moved to happen in pdf-interpret.c. When the op_table entry for BI is called it now expects the inline image to be in csi->img and the dictionary object to be in csi->obj. To make this work, we have had to improve the handling of inline images in general. While non-inline images have been loaded and held in memory in their compressed form and only decoded when required, until now we have always loaded and decoded inline images immediately. This has been due to the difficulty in knowing how many bytes of data to read from the stream - we know the length of the stream once uncompressed, but relating this to the compressed length is hard. To cure this we introduce a new type of filter stream, a 'leecher'. We insert a leecher stream before we build the filters required to decode the image. We then read and discard the appropriate number of uncompressed bytes from the filters. This pulls the compressed data through the leecher stream, which stores it in an fz_buffer. Thus images are now always held in their compressed forms in memory. The pdf-op-run implementation is now trivial. The only real complexity in the pdf-op-buffer implementation is the need to ensure that the /Filter entry in the dictionary object matches the exact point at which we backstopped the decompression.
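The leecher idea described above can be sketched in isolation. This is a toy under stated assumptions: the types and functions are hypothetical, the capture buffer is a fixed array rather than an fz_buffer, and a trivial run-length format of (count, value) pairs stands in for the real image filters.

```c
#include <stddef.h>

/* Hypothetical leecher: sits between the raw source and the decoder and
 * remembers every compressed byte the decoder pulls through it. */
typedef struct {
	const unsigned char *src;
	size_t src_len;
	size_t pos;
	unsigned char captured[64];
	size_t captured_len;
} example_leecher;

/* Hand out the next source byte and record it; returns -1 at end of input.
 * In this toy, bytes beyond the fixed capture capacity are not recorded. */
static int example_leecher_read_byte(example_leecher *l)
{
	unsigned char c;
	if (l->pos >= l->src_len)
		return -1;
	c = l->src[l->pos++];
	if (l->captured_len < sizeof l->captured)
		l->captured[l->captured_len++] = c;
	return c;
}

/* Toy decoder: each (count, value) pair expands to count bytes. Decode and
 * discard output until at least `want` uncompressed bytes were produced. */
static size_t example_discard_decoded(example_leecher *l, size_t want)
{
	size_t produced = 0;
	while (produced < want)
	{
		int count = example_leecher_read_byte(l);
		int value = example_leecher_read_byte(l);
		if (count < 0 || value < 0)
			break; /* input exhausted */
		produced += (size_t)count;
	}
	return produced;
}
```

After discarding the known uncompressed length, `captured` holds exactly the compressed bytes that belonged to the image, which is how the compressed length is recovered without knowing it in advance.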
2014-01-10  Fix build_filter_chain not to leak if pdf_array_get fails.  Robin Watts
In the existing code, if build_filter fails, chain will be freed. If pdf_array_get fails however, it will leak. Rectify this. No specific bug or example file, just observation arising from discussions about previous commit.