summaryrefslogtreecommitdiff
path: root/source/pdf
AgeCommit message (Collapse)Author
2014-02-13pdf-util.js: Always use strict equality comparisons.Tor Andersson
2014-02-13pdf-util.js: Use explicit type conversions in AFParseDateEx and AFParseTime.Tor Andersson
2014-02-12pdf-util.js: Simplify MuPDF object.Tor Andersson
2014-02-11tolerate streamed xrefs where object 0 is missingSimon Bünzli
see https://code.google.com/p/sumatrapdf/issues/detail?id=2517 for a document which is broken to the point where it fails to load using reparation but loads successfully if object 0 is implicitly defined.
2014-02-11fix memory leak in 08c632046474e72f2e08e54f31e31a343808f6cbSimon Bünzli
2014-02-10Bug 695021: Fix pdf_insert_page operation with empty page tree.Robin Watts
Patch from Thomas Fach-Pedersen to fix the operation of pdf_insert_page when called with an empty page tree. Many thanks! As noted in the code with a FIXME this currently throws an error. Also, cope with being told to add a page "at" INT_MAX as meaning to add it at the end of the document. Possibly this code should cope with a Root without a Pages entry, or a Pages without a Kids too, but we can fix this in future if it ever becomes a problem.
2014-02-10Move rdb and file entries into pdf_csi.Robin Watts
This makes every pdf_run_XX operator function have the same function type. This paves the way for future changes in this area.
2014-02-10Tweak handling of PDF arrays during text object operator stream parsing.Robin Watts
Acrobat honours Tc and Tw operators found during parsing TJ arrays. We update the code here to cope. Possibly to completely match we should honour other operators too, but this will do for now. This maintains the behaviour of tests_private/pdf/sumatra/916_-_invalid_argument_to_TJ.pdf 916.pdf and improves the behaviour in general.
2014-02-10Add pdf_is_number.Robin Watts
Useful utility missing from our arsenal.
2014-02-10Add pdf_output_obj function.Robin Watts
Reuses the same internals as pdf_fprintf_obj etc.
2014-02-04Don't use deprecated and/or non-standard Javascript functions.Tor Andersson
String.prototype.substr() is deprecated. RegExp.prototype.compile() has never been part of the ECMA standard, and is deprecated in Mozilla's Javascript since 1.5 (at least).
2014-01-22Handle cmap table overflow gracefully in range-to-table mappings.Tor Andersson
2014-01-22Use object literals rather than "new Array" objects.Tor Andersson
Arrays are intended for numeric arrays, since they have the magic updating of their "length" property which regular objects lack.
2014-01-21Bug 694900: Avoid valgrind problems when cmap tables fill up.Robin Watts
The test file on this bug: de53b4bd41191f02d01a3c39b4880fa8_asan_heap-oob_caba3c_9561_7427.pdf includes a corrupt CMAP. When this is read into memory it produces a CMAP where the table gets too large. This produces lots of warnings from 'add_table', but the calls to add_table all assume that the process completed fine, resulting in range entries being added that point to nonexistent values. The fix is to make add_table return a bool to indicate success or failure, and to only add range entries if the add_table succeeds. Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security Team for providing the example files.
2014-01-17Bug 694899: Avoid using invalid gstate pointer.Robin Watts
When we call pdf_begin_group, this can go away and do lots of drawing. This can result in the gstate stack growing, which can involve a realloc. Any gstate pointer we are holding must therefore be recalculated after such a call. The neatest way to do this is to get pdf_begin_group to return the gstate pointer, thus making it hard to forget to do. This solves: e2a1dda5393f4cb8a446fd8edd9d94f9_asan_heap-uaf_b938cf_2075_2393.pdf Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security Team for providing the example files.
2014-01-17Bug 694896: Ensure that repairs don't lose trailer dict.Robin Watts
When we find certain classes of flaw in the file while attempting to read an object, we trigger an automatic repair of the file. This leaves almost all objects unchanged; the sole exception is that of the trailer object (and its sub objects) which can get dropped and recreated. To avoid leaving people holding handles to objects within the trailer dict high and dry, we introduce a 'pre_repair_trailer' object to each xref entry. On a repair, we copy the existing trailer object to this. As we only ever repair once, this is safe. The only known place where this is a problem is when setting up the pdf_crypt for a document; we adapt the code here to allow for potential problems. The example file that shows this up is: 048d14d2f5f0ae31e9a2cde0be66f16a_asan_heap-uaf_86d4ed_3961_3661.pdf Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security Team for providing the fuzzing files.
2014-01-17Bug 694897: Fix valgrind issues with versionsRobin Watts
If the /Version is a single character string (say "s") then the current code for converting this in pdf_init_document reads off the end of the string. Simple fix is to use fz_atof instead. Same fix for reading the PDF version normally. This solves: 53b830f849d028fb2d528520716e157a_asan_heap-oob_478692_5259_4534.pdf Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security Team for providing the example files.
2014-01-16fix memory leaks in pdf_load_jpx and fz_new_image_from_pixmapSimon Bünzli
fz_new_image_from_pixmap expects that the pixmap's colorspace has two references which is contrary to expectations. If it instead addrefs the pixmap's colorspace, the only caller pdf_load_jpx can consistently drop the colorspace after passing it to fz_load_jpx. Also, if the contract is that whatever is passed into fz_new_image_from_pixmap belongs to the new image, then the pixmap also has to be dropped on error so that it isn't leaked.
2014-01-16Bug 694894: Avoid throwing away an object while in use.Robin Watts
When we call to execute a pattern, we clear out the pdf_csi (the interpreter state). This involves clearing the stack and throwing away the record of the object we have just parsed. Unfortunately, when filling glyphs with a pattern, that object is still in use. We therefore amend the pdf_run_contents_stream to safely stash the object away and restore it afterwards. This solves this problem, and protects us against any other similar problems that might also arise. This solves: b8e2b57991896bf8120215cfbf7b54bb_asan_heap-uaf_86064f_2362_2587.pdf Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security Team for providing the example files.
2014-01-13tolerate overlong colorspace lookup stringsSimon Bünzli
At http://code.google.com/p/sumatrapdf/issues/detail?id=2477 , there's a document which has an indexed colorspace whose lookup string contains a trailing character. That character can be safely ignored without rejecting everything depending on such a colorspace.
2014-01-13Bug 694851: enhance fz_load_system_fontSimon Bünzli
For SumatraPDF, the following changes are required: * fz_load_system_font is called from pdf_load_builtin_font as well so that Arial, Courier New, etc. can be loaded from the system instead of their Nimbus replacements. In order to distinguish between calls from pdf_load_builtin_font and pdf_load_substitute_font, an is_substitute argument is added. * fz_load_system_cjk_font is added and called from pdf_load_substitute_cjk_font so that a better replacement font can be loaded instead of DroidSansFallback. * Both fz_load_system_font and fz_load_system_cjk_font return fz_font* instead of fz_buffer* so that implementers aren't required to load fonts into memory (SumatraPDF uses fz_new_font_from_file for system fonts). In addition to that, fz_load_system_font_func is renamed to fz_load_system_font_funcs since it now accepts two functions, and the PDF_ROS_* constants are renamed to FZ_ADOBE_* (collection names aren't passed as const char* so that implementers know which collections to expect). For convenience, fz_load_*_font also never throws since currently all callers have further fallbacks available.
2014-01-13More fixes for PDF clean.Robin Watts
Avoid negative indirections. Don't make indirections to objects that aren't going to be used. Also improve pdf-write.c so that it doesn't call renumberobj on objs that are going to be dropped.
2014-01-10Bug 694889: Fix valgrind issues due to empty indexed spaces.Robin Watts
If indexed spaces are empty (or truncated) we use garbage values when they are read. Spot this and pad with 0s to at least be consistent. Fixes: 013b2dcbd0207501e922910ac335eb59_asan_heap-oob_a59696_5952_500.pdf 5440f8bc8af12e5f7050e59b7ee008cd_asan_heap-oob_a59dd9_5952_500.pdf fa8c712b03a7b02d6a12856ce042a44e_signal_sigsegv_a59b06_5847_493.pdf Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security Team for providing the fuzzing files.
2014-01-10Solve SEGV in mutool clean with fuzzed file.Robin Watts
While attempting to debug a valgrind issue with: 013b2dcbd0207501e922910ac335eb59_asan_heap-oob_a59696_5952_500.pdf I found that mutool -difggg on it failed with a SEGV. This is due to us parsing an array with a large invalid indirection in it (e.g. [123456789 0 R]) and then the renumbering code assuming this is valid and accessing off the end of an array.
2014-01-10Bug 694885: Avoid stack overflow in ps_run.Robin Watts
The ifelse and if operators require special parsing where we convert ps function streams to bytecode. If a malformed stream presents if or ifelse without being preceded by the appropriate { ...} blocks then throw an error. This avoids us potentially calling ps_run recursively in an infinite loop as happens with the test file in this bug. 5f091df77f6600d0927dc36777db2b93_signal_sigabrt_7ffff6d59425_6762_5545.pdf Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security Team for providing the fuzzing files.
2014-01-10Fix build_filter_chain not to leak if pdf_array_get fails.Robin Watts
In the existing code, if build_filter fails, chain will be freed. If pdf_array_get fails however, it will leak. Rectify this. No specific bug or example file, just observation arising from discussions about previous commit.
2014-01-09prevent two further heap access violationsSimon Bünzli
pdf_open_raw_renumbered_stream and pdf_open_image_stream both have the same issue that 98a111c8e49916f8f5ac21d11f4627540f9ddd49 fixes.
2014-01-09Bug 694878: Fix SEGV due to double freeRobin Watts
When constructing a filter chain, we pass ownership of 'chain' inwards. This means we need to be careful not to double close chain. This fixes: 5df97f8539d31745f1c45cc9e1468825_asan_heap-oob_a59afe_1862_225.pdf a736faf6f4a34b7ad8eff207ba52aa57_asan_heap-oob_a59dd9_5744_4860.pdf Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security Team for providing the fuzzing files.
2014-01-08fuzzing fix for null colorspace derefence.Robin Watts
Bad annotation appearance streams can cause font_recs to have invalid values in. Avoid this partly by hardening the code against duff values, and partly by setting sane defaults before the parsing. This can be seen in: 33bfbe117bfef7fafc3f927acf50a2e7_signal_sigsegv_81dd96_6257_5205.pdf Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security Team for providing the example files.
2014-01-08prevent heap access violation in pdf_cache_objectSimon Bünzli
pdf_load_obj_stm may resize the xref if it finds further objects in the stream, that might however invalidate any pdf_xref_entry hold such as the one in pdf_cache_object. This can be seen e.g. with 7ac3ad9ddad98d10b947a43cf640062f_asan_heap-uaf_930b78_1007_1675.pdf Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security Team for providing the example files.
2014-01-08sanitize crypt revision in pdf_new_cryptSimon Bünzli
(Second part of Simons patch - apologies for missing this the first time). This correctly enables the sanitization of the key length needed for 90db34f64037e2a8a5c3b6a518ba4153_asan_heap-oob_9b117e_1197_1802.pdf Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security Team for providing the example files.
2014-01-08sanitize crypt revision in pdf_new_cryptSimon Bünzli
This correctly enables the sanitization of the key length needed for 90db34f64037e2a8a5c3b6a518ba4153_asan_heap-oob_9b117e_1197_1802.pdf
2014-01-07Introduce 'document handlers'.Robin Watts
We define a document handler for each file type (2 in the case of PDF, one to handle files with the ability to 'run' them, and one without). We then register these handlers with the context at startup, and then call fz_open_document... as usual. This enables people to select the document types they want at will (and even to extend the library with more document types should they wish).
2014-01-06reuse JBIG2GlobalsSimon Bünzli
Certain optimized documents use a rather large common symbol dictionary for all JBIG2 images. Caching these JBIG2Globals speeds up loading and rendering of such documents.
2014-01-06tolerate slightly broken page treesSimon Bünzli
At https://code.google.com/p/sumatrapdf/issues/detail?id=2460 , there's a file with missing /Type keys in the page tree nodes. In that case, leaf nodes and intermediary nodes have to be distinguished in a different way.
2014-01-06fix MSVC warnings C4054 and C4152Simon Bünzli
These warnings are caused by casting function pointers to void* instead of proper function types.
2014-01-06fix various MSVC warningsSimon Bünzli
Some warnings we'd like to enable for MuPDF and still be able to compile it with warnings as errors using MSVC (2008 to 2013): * C4115: 'timeval' : named type definition in parentheses * C4204: nonstandard extension used : non-constant aggregate initializer * C4295: 'hex' : array is too small to include a terminating null character * C4389: '==' : signed/unsigned mismatch * C4702: unreachable code * C4706: assignment within conditional expression Also, globally disable C4701 which is frequently caused by MSVC not being able to correctly figure out fz_try/fz_catch code flow. And don't define isnan for VS2013 and later where that's no longer needed.
2014-01-02Add rebinding for fz_devices and fz_documentsRobin Watts
The SVG device needs rebinding as it holds a file. The PDF device needs to rebind the underlying pdf document. All documents need to rebind their underlying streams.
2014-01-02Improve PDF repair logic.Robin Watts
When we meet a broken PDF file, we attempt to repair it. We do this by reading tokens from the file and attempting to interpret them as a normal PDF stream. Unfortunately, if the file is corrupt enough so that we start to read from the middle of a stream, and we happen to hit an '(' character, we can go into string reading mode. We can then end up skipping over vast swathes of file that we could otherwise repair. We fix this here by using a new version of the pdf_lex function that refuses to ever return a string. This means we may take more time over skipping things than we did before, but are less likely to skip stuff. We also tweak other parts of the pdf repair logic here. If we hit a badly formed piece of data, clear the num/gen we have stored so that the next plausible piece we get does not get assigned to a random object number.
2014-01-02Cull code unused as a result of the "tolerate inline images..." fix.Robin Watts
Remove code that's not used any more as a result of the previous fix, plus some code that was unused anyway.
2014-01-02fix memory leak in pdf_repair_xrefSimon Bünzli
The 0 null object is leaked if a document refers to 0 0 obj before requiring a delayed reparation (seen e.g. with 3324.pdf.asan.3.2585).
2014-01-02Fix memory leak in pdf_xref_size_from_old_trailer.Robin Watts
Thanks to Simon for spotting the original problem. This is a slight tweak on the patch he supplied.
2014-01-02Bug 694569: suspicious for loop in pdf-device.cPaul Gardiner
Replace an explicit i = i by a comment in a for loop where i is already at the correct starting value.
2014-01-02Bug 694729: Rounding the InkPaul Gardiner
Use round caps and joins so as to better match the result of drawing, and also so that single dots display. Thanks to Michael Cadilhac for the suggestion.
2014-01-01don't fail on invalid object streamsSimon Bünzli
At https://code.google.com/p/sumatrapdf/issues/detail?id=2436 , there's a document with an empty xref section which since recently causes a repair to be triggered. Repairs then stop when pdf_repair_obj_stms fails on an object which isn't even required for the document to render. Such broken object streams should rather be ignored same as broken objects are ignored in pdf_init_document.
2013-12-30Bug 694564: Move pdf_lookup_page_obj to be iterativeRobin Watts
Avoid recursion to avoid stack overflows.
2013-12-24Bug 694810: Implement late file repair for PDFs.Robin Watts
Currently, if we spot a bad xref as we are reading a PDF in, we can repair that PDF by doing a long exhaustive read of the file. This reconstructs the information that was in the xref, and the file can be opened (and later saved) as normal. If we hit an object that is not in the expected place however, we cannot trigger a repair at that point - so xrefs with duff offsets in (within the bounds of the file) will never be repaired. This commit solves that by triggering a repair (just once) whenever we fail to parse an object in the expected place.
2013-12-23Bug 694712: Do not create empty Contents stream for new pages.Robin Watts
Empty Contents streams are not valid - they need a length at least. The alternative approach would be to put /Length 0 and update it later.
2013-12-23Bug 694715: Fix typo in error messageRobin Watts
Thanks to Michael Cadilhac for spotting this.
2013-12-23Bug 694794: Fix blendmode use in pdf device.Robin Watts
Previously we were setting blendmode in the created form XObjects transparency group definition. This didn't work as PDF readers don't look for it there. Now we set it in the calling stream's resources, and set it before calling the group.