mupdf - MuPDF PDF reader and library

Age	Commit message	Author
2014-02-10	Tweak handling of PDF arrays during text object operator stream parsing.	Robin Watts
	Acrobat honours Tc and Tw operators found during parsing TJ arrays. We update the code here to cope. Possibly to completely match we should honour other operators too, but this will do for now. This maintains the behaviour of tests_private/pdf/sumatra/916_-_invalid_argument_to_TJ.pdf 916.pdf and improves the behaviour in general.
2014-02-10	Add pdf_is_number.	Robin Watts
	Useful utility missing from our arsenal.
2014-02-10	Add pdf_output_obj function.	Robin Watts
	Reuses the same internals as pdf_fprintf_obj etc.
2014-02-04	Don't use deprecated and/or non-standard Javascript functions.	Tor Andersson
	String.prototype.substr() is deprecated. RegExp.prototype.compile() has never been part of the ECMA standard, and is deprecated in Mozilla's Javascript since 1.5 (at least).
2014-02-04	Improve glyph bounding, outlining and SVG output text.	Robin Watts
	Luiz Henrique de Figueiredo reports that glyphs output from the SVG device contain 'lumpy' outlines. Investigation reveals that this is because the current code extracts the outlines from freetype at unit scale, and then relies on SVG to scale them up. Unfortunately, freetype insists on working in integer maths, so any sort of scaling runs the risk of distorting the outlines. The fix is to change the way we call freetype; we now request an 'UNSCALED' char, and set the required size to be the design size. We then transform the results in the floating point domain ourselves. This cures the lumpy outlines, but reveals a second problem, namely that the bbox given for characters is inaccurate (and sometimes too small). Investigation shows that this is again caused by freetype's scaling, so we apply the same trick; ask for the glyph without scaling (as far as possible), and then scale the results down. We also take care to spot the 'ft_hint' flag in the font. If set this indicates that hinting must be performed to ensure that the returned outlines are sane. We therefore take note of this when calculating both bbox and outlines. This means that 'tricky' fonts such as dynalab ones now render correctly. This produces many changes in the bitmaps, the vast majority of which are neutral. The ones that aren't are all progressions.
2014-01-22	Handle cmap table overflow gracefully in range-to-table mappings.	Tor Andersson

2014-01-22	Make fz_tree_lookup iterative rather than recursive.	Tor Andersson
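	A minimal sketch of a recursive-to-iterative lookup, assuming a plain binary search tree keyed by strings. The `node` type and `tree_lookup` name are hypothetical and do not reflect MuPDF's actual fz_tree layout.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical tree node; MuPDF's real fz_tree structure may differ. */
typedef struct node {
    const char *key;
    void *value;
    struct node *left, *right;
} node;

/* Walk down the tree in a loop instead of recursing, so the stack
 * depth stays constant however deep the tree is. */
void *tree_lookup(const node *n, const char *key)
{
    while (n) {
        int c = strcmp(key, n->key);
        if (c == 0)
            return n->value;
        n = (c < 0) ? n->left : n->right;
    }
    return NULL; /* key not present */
}
```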

2014-01-22	Use object literals rather than "new Array" objects.	Tor Andersson
	Arrays are intended for numeric arrays, since they have the magic updating of their "length" property which regular objects lack.
2014-01-21	Bug 694900: Avoid valgrind problems when cmap tables fill up.	Robin Watts
	The test file on this bug: de53b4bd41191f02d01a3c39b4880fa8_asan_heap-oob_caba3c_9561_7427.pdf includes a corrupt CMAP. When this is read into memory it produces a CMAP where the table gets too large. This produces lots of warnings from 'add_table', but the calls to add_table all assume that the process completed fine, resulting in range entries being added that point to nonexistent values. The fix is to make add_table return a bool to indicate success or failure, and to only add range entries if the add_table succeeds. Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security Team for providing the example files.
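	The pattern described above can be sketched as follows. All types, sizes and names other than `add_table` are hypothetical, and the rollback of partially added values is an assumption of this sketch rather than something the commit message states.

```c
/* Tiny fixed capacities so the failure path is easy to exercise. */
enum { TABLE_MAX = 4, RANGE_MAX = 4 };

typedef struct { int low, high, offset; } range_entry;

typedef struct {
    int table[TABLE_MAX];
    int table_len;
    range_entry ranges[RANGE_MAX];
    int range_len;
} cmap_sketch;

/* Returns 1 on success, 0 if the table is full. */
int add_table(cmap_sketch *m, int value)
{
    if (m->table_len >= TABLE_MAX)
        return 0;
    m->table[m->table_len++] = value;
    return 1;
}

/* Only records the range entry if every table value was stored, so a
 * range can never point at table slots that do not exist. */
int map_range_to_table(cmap_sketch *m, int low, const int *values, int n)
{
    int offset = m->table_len;
    int i;

    if (n <= 0 || m->range_len >= RANGE_MAX)
        return 0;
    for (i = 0; i < n; i++) {
        if (!add_table(m, values[i])) {
            m->table_len = offset; /* assumption: undo the partial add */
            return 0;
        }
    }
    m->ranges[m->range_len].low = low;
    m->ranges[m->range_len].high = low + n - 1;
    m->ranges[m->range_len].offset = offset;
    m->range_len++;
    return 1;
}
```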
2014-01-17	Bug 694899: Avoid using invalid gstate pointer.	Robin Watts
	When we call pdf_begin_group, this can go away and do lots of drawing. This can result in the gstate stack growing, which can involve a realloc. Any gstate pointer we are holding must therefore be recalculated after such a call. The neatest way to do this is to get pdf_begin_group to return the gstate pointer, thus making it hard to forget to do. This solves: e2a1dda5393f4cb8a446fd8edd9d94f9_asan_heap-uaf_b938cf_2075_2393.pdf Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security Team for providing the example files.
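	A minimal sketch of the "return the fresh pointer" idea, assuming a growable array of graphics states. The `interp` and `gstate` types and the `begin_group` function are hypothetical stand-ins, not the real pdf_begin_group.

```c
#include <stdlib.h>

typedef struct { float alpha; } gstate;

typedef struct {
    gstate *stack;
    int top; /* index of the current gstate */
    int cap; /* allocated entries, at least 1 */
} interp;

/* Stand-in for work that may grow the stack. Because realloc can move
 * the block, any gstate pointer taken before this call is stale after
 * it. Returning the current gstate makes it natural for callers to
 * write "gs = begin_group(in);" and refresh their pointer.
 * Returns NULL if the stack could not be grown. */
gstate *begin_group(interp *in)
{
    int new_cap = in->cap * 2;
    gstate *grown = realloc(in->stack, (size_t)new_cap * sizeof *grown);

    if (!grown)
        return NULL;
    in->stack = grown;
    in->cap = new_cap;
    return &in->stack[in->top];
}
```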
2014-01-17	Avoid overflows in floating point causing illegal accesses	Robin Watts
	If the scale is too large, the calculation to determine the required size of a pixmap can overflow. This can lead to negative width/heights being passed in, which confuses the subsampling code, leading to SEGVs.
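	One way to guard such a calculation is to validate the floating point result before converting it to an int. This is a sketch only; `scaled_extent` is a hypothetical helper, not the check MuPDF actually performs.

```c
#include <limits.h>

/* Computes extent * scale rounded up to a whole number of pixels.
 * Returns 1 and stores the result in *out, or returns 0 if the value
 * is not positive, is NaN, or does not fit in an int. */
int scaled_extent(float extent, float scale, int *out)
{
    float v = extent * scale;
    int i;

    /* "!(v > 0)" is also true for NaN, so NaN is rejected here. */
    if (!(v > 0.0f) || v >= (float)INT_MAX)
        return 0;
    i = (int)v;          /* safe: v is known to be within int range */
    if ((float)i < v)
        i++;             /* round up any fractional part */
    *out = i;
    return 1;
}
```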
2014-01-17	Fix more Memento/Valgrind interactions.	Robin Watts
	Seen when valgrinding a memento build of mudraw on: e0e44ed8692671b820de72c6c0a32608_asan_heap-uaf_8c2b76_1530_2026.pdf
2014-01-17	Bug 694896: Ensure that repairs don't lose trailer dict.	Robin Watts
	When we find certain classes of flaw in the file while attempting to read an object, we trigger an automatic repair of the file. This leaves almost all objects unchanged; the sole exception is that of the trailer object (and its sub objects) which can get dropped and recreated. To avoid leaving people holding handles to objects within the trailer dict high and dry, we introduce a 'pre_repair_trailer' object to each xref entry. On a repair, we copy the existing trailer object to this. As we only ever repair once, this is safe. The only known place where this is a problem is when setting up the pdf_crypt for a document; we adapt the code here to allow for potential problems. The example file that shows this up is: 048d14d2f5f0ae31e9a2cde0be66f16a_asan_heap-uaf_86d4ed_3961_3661.pdf Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security Team for providing the fuzzing files.
2014-01-17	Bug 694897: Fix valgrind issues with versions	Robin Watts
	If the /Version is a single character string (say "s") then the current code for converting this in pdf_init_document reads off the end of the string. Simple fix is to use fz_atof instead. Same fix for reading the PDF version normally. This solves: 53b830f849d028fb2d528520716e157a_asan_heap-oob_478692_5259_4534.pdf Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security Team for providing the example files.
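	A sketch of the safer conversion, using the standard `strtod` as a stand-in for fz_atof. The `parse_pdf_version` name and the "major * 10 + minor" integer encoding are assumptions made for illustration.

```c
#include <stdlib.h>

/* Converts a version string such as "1.7" to 17. strtod stops at the
 * first character it cannot parse, so a short or malformed string
 * like "s" yields 0 instead of reading past the terminator.
 * The 0.05 nudge guards against values such as 1.7 being stored
 * slightly below their decimal value. */
int parse_pdf_version(const char *s)
{
    double v = strtod(s, NULL);
    return (int)(10 * (v + 0.05));
}
```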
2014-01-16	Bug 694957: fix stack buffer overflow in xps_parse_color	Simon Bünzli
	xps_parse_color happily reads more than FZ_MAX_COLORS values out of a ContextColor array which overflows the passed in samples array. Limiting the number of allowed samples to FZ_MAX_COLORS and making sure that all callers use that constant fixes the problem. Thanks to Jean-Jamil Khalifé for reporting and investigating the issue and providing a sample exploit file.
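	The bounded parse can be sketched like this. `parse_samples` is a hypothetical helper that reads a comma-separated list; it is not the real xps_parse_color, and the caller passes the array capacity explicitly.

```c
#include <stdlib.h>

/* Parses up to max comma-separated numbers from s into samples.
 * Returns the number of values stored; extra input is ignored rather
 * than written past the end of the array. */
int parse_samples(const char *s, float *samples, int max)
{
    int n = 0;

    while (*s && n < max) {
        char *end;
        float v = (float)strtod(s, &end);
        if (end == s)
            break; /* no number here: stop parsing */
        samples[n++] = v;
        s = end;
        while (*s == ',' || *s == ' ')
            s++;
    }
    return n;
}
```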
2014-01-16	fix memory leaks in pdf_load_jpx and fz_new_image_from_pixmap	Simon Bünzli
	fz_new_image_from_pixmap expects that the pixmap's colorspace has two references which is contrary to expectations. If it instead addrefs the pixmap's colorspace, the only caller pdf_load_jpx can consistently drop the colorspace after passing it to fz_load_jpx. Also, if the contract is that whatever is passed into fz_new_image_from_pixmap belongs to the new image, then the pixmap also has to be dropped on error so that it isn't leaked.
2014-01-16	disable JPEG decoding speed-ups to prevent artifacts	Simon Bünzli
	Using JDCT_FASTEST as rendering method can produce visible artifacts (e.g. in 1960_-_DCT_image_wrongly_decoded_regression_from_1.2_.pdf).
2014-01-16	Bug 694894: Avoid throwing away an object while in use.	Robin Watts
	When we call to execute a pattern, we clear out the pdf_csi (the interpreter state). This involves clearing the stack and throwing away the record of the object we have just parsed. Unfortunately, when filling glyphs with a pattern, that object is still in use. We therefore amend the pdf_run_contents_stream to safely stash the object away and restore it afterwards. This solves this problem, and protects us against any other similar problems that might also arise. This solves: b8e2b57991896bf8120215cfbf7b54bb_asan_heap-uaf_86064f_2362_2587.pdf Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security Team for providing the example files.
2014-01-13	Avoid rendering errors caused by linejoins on tiny distances.	Robin Watts
	If we perform a linejoin that ends up being over an impossibly small distance, we can get a rendering error. This is caused by trying to calculate scale = linewidth/sqrtf(len), where len < FLT_EPSILON. Avoid this by rearranging the code slightly - no extra calculations required. Also given that sn == bn at all times within the stroking code, just remove bn. Credit for spotting this problem goes to Simon for tracking the problem with rounding_artifact_due_to_closepath.pdf. My fix just fixes the problem at a lower level than his does.
2014-01-13	tolerate overlong colorspace lookup strings	Simon Bünzli
	At http://code.google.com/p/sumatrapdf/issues/detail?id=2477 , there's a document which has an indexed colorspace whose lookup string contains a trailing character. That character can be safely ignored without rejecting everything depending on such a colorspace.
2014-01-13	Bug 694890: Solve valgrind issues/SEGV due to use of invalid pixmap	Robin Watts
	fz_draw_clip_text changes the value of 'state' during a loop. The 'if (glyph)' part of the loop assumes that it points to gstate[top-1] where the 'path' part of the loop changes it to point to gstate[top]. If we render a "non glyph" glyph, then a "glyph" glyph, we will access an invalid state. This can cause a draw_glyph call on an invalid destination bitmap. The fix is simply not to reset state. Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security Team for providing the fuzzing files.
2014-01-13	Bug 694851: enhance fz_load_system_font	Simon Bünzli
	For SumatraPDF, the following changes are required: * fz_load_system_font is called from pdf_load_builtin_font as well so that Arial, Courier New, etc. can be loaded from the system instead of their Nimbus replacements. In order to distinguish between calls from pdf_load_builtin_font and pdf_load_substitute_font, an is_substitute argument is added. * fz_load_system_cjk_font is added and called from pdf_load_substitute_cjk_font so that a better replacement font can be loaded instead of DroidSansFallback. * Both fz_load_system_font and fz_load_system_cjk_font return fz_font* instead of fz_buffer* so that implementers aren't required to load fonts into memory (SumatraPDF uses fz_new_font_from_file for system fonts). In addition to that, fz_load_system_font_func is renamed to fz_load_system_font_funcs since it now accepts two functions, and the PDF_ROS_* constants are renamed to FZ_ADOBE_* (collection names aren't passed as const char* so that implementers know which collections to expect). For convenience, fz_load_*_font also never throws since currently all callers have further fallbacks available.
2014-01-13	verify that openjpeg actually allocates data	Simon Bünzli
	This can be seen e.g. in: 5db811ac25ef543fd0cfa0873e155329_signal_sigsegv_c9b60f_9636_76.pdf Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security Team for providing the example files.
2014-01-13	More fixes for PDF clean.	Robin Watts
	Avoid negative indirections. Don't make indirections to objects that aren't going to be used. Also improve pdf-write.c so that it doesn't call renumberobj on objs that are going to be dropped.
2014-01-13	Memento fixes for working with valgrind.	Robin Watts
	Remember to make blocks defined before writing/reading them.
2014-01-10	Bug 694889: Fix valgrind issues due to empty indexed spaces.	Robin Watts
	If indexed spaces are empty (or truncated) we use garbage values when they are read. Spot this and pad with 0s to at least be consistent. Fixes: 013b2dcbd0207501e922910ac335eb59_asan_heap-oob_a59696_5952_500.pdf 5440f8bc8af12e5f7050e59b7ee008cd_asan_heap-oob_a59dd9_5952_500.pdf fa8c712b03a7b02d6a12856ce042a44e_signal_sigsegv_a59b06_5847_493.pdf Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security Team for providing the fuzzing files.
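	A minimal sketch of the padding approach: copy what is available and zero the rest, so later reads are deterministic. `fill_lookup` and its parameters are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Fills dst with exactly want bytes: as many as src provides, then
 * zeros. Surplus source bytes beyond want are ignored. */
void fill_lookup(unsigned char *dst, size_t want,
                 const unsigned char *src, size_t have)
{
    size_t n = have < want ? have : want;

    if (n)
        memcpy(dst, src, n);
    memset(dst + n, 0, want - n);
}
```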
2014-01-10	Solve SEGV in mutool clean with fuzzed file.	Robin Watts
	While attempting to debug a valgrind issue with: 013b2dcbd0207501e922910ac335eb59_asan_heap-oob_a59696_5952_500.pdf I found that mutool -difggg on it failed with a SEGV. This is due to us parsing an array with a large invalid indirection in it (e.g. [123456789 0 R]) and then the renumbering code assuming this is valid and accessing off the end of an array.
2014-01-10	Bug 694885: Avoid stack overflow in ps_run.	Robin Watts
	The ifelse and if operators require special parsing where we convert ps function streams to bytecode. If a malformed stream presents if or ifelse without being preceded by the appropriate { ...} blocks then throw an error. This avoids us potentially calling ps_run recursively in an infinite loop as happens with the test file in this bug. 5f091df77f6600d0927dc36777db2b93_signal_sigabrt_7ffff6d59425_6762_5545.pdf Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security Team for providing the fuzzing files.
2014-01-10	Bug 694879: Fix SEGV in draw-simple-scale.	Robin Watts
	Problems caused by the fact that, in 32-bit two's complement arithmetic, -0x80000000 = 0x80000000. Sidestep the problem for all coords where floats cannot accurately represent them.
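	The underlying hazard is that the most negative int has no positive counterpart, so negating it cannot give a correct result. A sketch of an explicit guard follows; `safe_negate` is a hypothetical helper, and the commit itself takes a different route by rejecting coordinates that floats cannot represent accurately.

```c
#include <limits.h>

/* Stores -v in *out and returns 1, or returns 0 when v is INT_MIN,
 * whose negation is not representable as an int. */
int safe_negate(int v, int *out)
{
    if (v == INT_MIN)
        return 0;
    *out = -v;
    return 1;
}
```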
2014-01-10	Fix build_filter_chain not to leak if pdf_array_get fails.	Robin Watts
	In the existing code, if build_filter fails, chain will be freed. If pdf_array_get fails however, it will leak. Rectify this. No specific bug or example file, just observation arising from discussions about previous commit.
2014-01-09	prevent two further heap access violations	Simon Bünzli
	pdf_open_raw_renumbered_stream and pdf_open_image_stream both have the same issue that 98a111c8e49916f8f5ac21d11f4627540f9ddd49 fixes.
2014-01-09	Bug 694878: Fix SEGV due to double free	Robin Watts
	When constructing a filter chain, we pass ownership of 'chain' inwards. This means we need to be careful not to double close chain. This fixes: 5df97f8539d31745f1c45cc9e1468825_asan_heap-oob_a59afe_1862_225.pdf a736faf6f4a34b7ad8eff207ba52aa57_asan_heap-oob_a59dd9_5744_4860.pdf Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security Team for providing the fuzzing files.
2014-01-09	Add -o option for mutool show.	Tor Andersson
	Windows doesn't like redirecting binary output, so add an explicit filename argument.
2014-01-08	fuzzing fix for null colorspace dereference.	Robin Watts
	Bad annotation appearance streams can cause font_recs to contain invalid values. Avoid this partly by hardening the code against duff values, and partly by setting sane defaults before the parsing. This can be seen in: 33bfbe117bfef7fafc3f927acf50a2e7_signal_sigsegv_81dd96_6257_5205.pdf Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security Team for providing the example files.
2014-01-08	Fix fuzzing bug due to float representation limitations.	Robin Watts
	The gel bbox was being stored internally as floats (despite only holding ints). This means that as numbers get large the bbox can become approximate, rather than exact. If the bbox becomes smaller than it should, this causes crashes in the scanline filling code. This is seen with: tests_private/fuzzing/mupdf2/17f8aee51ac776994af0b36195cdadd7_signal_sigsegv_5607be_7308_5912.pdf The solution is simply to use ints rather than floats. Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security Team for providing the example files.
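	The precision loss is easy to demonstrate: a 32-bit float has a 24-bit significand, so not every int above 2^24 survives a round trip. The helper name below is hypothetical.

```c
/* Returns 1 if v is exactly representable as a float, 0 otherwise.
 * The volatile store forces the value to be rounded to float
 * precision before it is compared. */
int survives_float(int v)
{
    volatile float f = (float)v;
    return (double)f == (double)v;
}
```

	With this helper, 16777216 (2^24) survives but 16777217 does not, which is why a bbox held in floats can silently shrink or grow for large coordinates.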
2014-01-08	Fuzzing fix: Overrun in fz_predict_png	Robin Watts
	If a file specifies a silly number of bpp in the PNG predictor it can overrun a buffer. This was shown by: tests_private/fuzzing/mupdf2/013b2dcbd0207501e922910ac335eb59_*.pdf but no longer shows up due to Simon's earlier fix. Following discussion we still think it's worth having this fix in, as truncated data streams can cause len < bpp. Possibly we should throw an error here, but I think that's not necessary as we will return the short length, and the image reading code will notice that the image is truncated already. Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security Team for providing the fuzzing files.
2014-01-08	prevent heap access violation in pdf_cache_object	Simon Bünzli
	pdf_load_obj_stm may resize the xref if it finds further objects in the stream; that might, however, invalidate any pdf_xref_entry pointer being held, such as the one in pdf_cache_object. This can be seen e.g. with 7ac3ad9ddad98d10b947a43cf640062f_asan_heap-uaf_930b78_1007_1675.pdf Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security Team for providing the example files.
2014-01-08	sanitize crypt revision in pdf_new_crypt	Simon Bünzli
	(Second part of Simon's patch - apologies for missing this the first time). This correctly enables the sanitization of the key length needed for 90db34f64037e2a8a5c3b6a518ba4153_asan_heap-oob_9b117e_1197_1802.pdf Thanks to Mateusz Jurczyk and Gynvael Coldwind of the Google Security Team for providing the example files.
2014-01-08	sanitize number of columns in fz_open_faxd	Simon Bünzli
	If columns is quite close to INT_MAX, the column index may overflow in find_changing which causes an access violation in the next getbits. This happens e.g. with 0c76a20163f30ea8ec860c4e588ce337_signal_sigsegv_5e7b28_9115_7127.pdf
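	A sketch of the same class of check, applied to a bytes-per-row calculation that adds a small constant to the column count. `row_stride` is a hypothetical helper and the safety margin of 7 is an assumption of this sketch, not necessarily the bound used in fz_open_faxd.

```c
#include <limits.h>

/* Computes the number of bytes needed for a 1 bit-per-pixel row.
 * Returns 1 and stores the stride, or returns 0 if columns is
 * negative or so large that columns + 7 would overflow an int. */
int row_stride(int columns, int *stride)
{
    if (columns < 0 || columns >= INT_MAX - 7)
        return 0;
    *stride = (columns + 7) >> 3;
    return 1;
}
```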
2014-01-08	sanitize crypt revision in pdf_new_crypt	Simon Bünzli
	This correctly enables the sanitization of the key length needed for 90db34f64037e2a8a5c3b6a518ba4153_asan_heap-oob_9b117e_1197_1802.pdf
2014-01-08	sanitize values in fz_open_predict	Simon Bünzli
	This fixes a NULL pointer dereference in 2192b04848b2d8210d1a33e3ddeb2742_asan_heap-oob_a5a57d_2745_2844.pdf Also, replace MAXC with FZ_MAX_COLORS.
2014-01-07	Introduce 'document handlers'.	Robin Watts
	We define a document handler for each file type (2 in the case of PDF, one to handle files with the ability to 'run' them, and one without). We then register these handlers with the context at startup, and then call fz_open_document... as usual. This enables people to select the document types they want at will (and even to extend the library with more document types should they wish).
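	The registration idea can be sketched as a small table of handlers that is consulted when a document is opened. Every type and function name below is hypothetical and chosen for illustration; none of it is MuPDF's actual handler API, and matching on the file suffix is an assumption of this sketch.

```c
#include <stddef.h>
#include <string.h>

enum { MAX_HANDLERS = 8 };

typedef struct {
    const char *name;
    int (*recognize)(const char *filename); /* nonzero if handled */
} doc_handler;

typedef struct {
    const doc_handler *handlers[MAX_HANDLERS];
    int count;
} handler_registry;

/* Adds a handler; returns 0 if the registry is full. */
int register_handler(handler_registry *r, const doc_handler *h)
{
    if (r->count >= MAX_HANDLERS)
        return 0;
    r->handlers[r->count++] = h;
    return 1;
}

/* Returns the first registered handler that claims the file. */
const doc_handler *find_handler(const handler_registry *r, const char *filename)
{
    int i;
    for (i = 0; i < r->count; i++)
        if (r->handlers[i]->recognize(filename))
            return r->handlers[i];
    return NULL;
}

int has_suffix(const char *s, const char *suffix)
{
    size_t ls = strlen(s), lx = strlen(suffix);
    return ls >= lx && strcmp(s + ls - lx, suffix) == 0;
}

int recognize_pdf(const char *f) { return has_suffix(f, ".pdf"); }
int recognize_xps(const char *f) { return has_suffix(f, ".xps"); }

const doc_handler pdf_handler_sketch = { "pdf", recognize_pdf };
const doc_handler xps_handler_sketch = { "xps", recognize_xps };
```

	An application that only wants PDF support would register just the one handler, and files of other types would then find no handler at open time.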
2014-01-06	Bug 694869: Fix indeterminisms with broken PNG files.	Robin Watts
	This bug shows 2 problems with our data handling. Firstly, if a zip file entry has less data in the stream than it is declared to have, we would leave the end of the data uninitialised. We now put out a warning, and blank it with zeros. Secondly, if the PNG decompression fails to decode enough data, we don't notice. Now we give a warning and blank the remaining pixels.
2014-01-06	reuse JBIG2Globals	Simon Bünzli
	Certain optimized documents use a rather large common symbol dictionary for all JBIG2 images. Caching these JBIG2Globals speeds up loading and rendering of such documents.
2014-01-06	show jbig2dec warnings/errors in stderr	Simon Bünzli
	This helps debugging issues with JBIG2 images. Conflicts: source/fitz/filter-jbig2.c
2014-01-06	add stub files for JPEG-XR support	Simon Bünzli
	See SumatraPDF's repo for a Windows-only implementation using WIC.
2014-01-06	tolerate slightly broken page trees	Simon Bünzli
	At https://code.google.com/p/sumatrapdf/issues/detail?id=2460 , there's a file with missing /Type keys in the page tree nodes. In that case, leaf nodes and intermediary nodes have to be distinguished in a different way.
2014-01-06	fix MSVC warnings C4054 and C4152	Simon Bünzli
	These warnings are caused by casting function pointers to void* instead of proper function types.
2014-01-06	fix various MSVC warnings	Simon Bünzli
	Some warnings we'd like to enable for MuPDF and still be able to compile it with warnings as errors using MSVC (2008 to 2013): * C4115: 'timeval' : named type definition in parentheses * C4204: nonstandard extension used : non-constant aggregate initializer * C4295: 'hex' : array is too small to include a terminating null character * C4389: '==' : signed/unsigned mismatch * C4702: unreachable code * C4706: assignment within conditional expression Also, globally disable C4701 which is frequently caused by MSVC not being able to correctly figure out fz_try/fz_catch code flow. And don't define isnan for VS2013 and later where that's no longer needed.
2014-01-02	Add rebinding for fz_devices and fz_documents	Robin Watts
	The SVG device needs rebinding as it holds a file. The PDF device needs to rebind the underlying pdf document. All documents need to rebind their underlying streams.