summaryrefslogtreecommitdiff
path: root/source/pdf/pdf-parse.c
AgeCommit message (Collapse)Author
2018-01-31Use convenience pdf dictionary/array creation functions.Tor Andersson
2018-01-31Bug 698916: Indirect object numbers must be in range.Sebastian Rasmussen
2018-01-22Bug 698889: Handle unterminated PDF arrays gracefully.Sebastian Rasmussen
Thanks to oss-fuzz for reporting this.
2017-12-13Fix 698785: Catch malformed numbers in PDF lexical scanner.Tor Andersson
Return error tokens when parsing numbers with trailing garbage rather than ignoring the extra characters. Also handle error tokens more gracefully in array and dictionary parsing. Treat error tokens as the 'null' keyword and continue parsing.
2017-11-22Add pdf_new_text_string utility function.Tor Andersson
Create a PDF 'text string' type string from a UTF-8 input string. If the input is plain ASCII, keep it as is, otherwise re-encode it as UTF-16BE.
2017-11-01Use int64_t for public file API offsets.Tor Andersson
Don't mess with conditional compilation with LARGEFILE -- always expose 64-bit file offsets in our public API.
2017-10-12Some more consts.Tor Andersson
2017-09-07Use dict_put_drop/array_push_drop wherever possible.Sebastian Rasmussen
2017-09-07Initialize variables to appease clang scan-build.Sebastian Rasmussen
2017-08-17Add FZ_REPLACEMENT_CHARACTER define for U+FFFD character.Tor Andersson
2017-07-06pdf: Avoid leaking indirect object upon error.Sebastian Rasmussen
2017-04-27Include required system headers.Tor Andersson
2017-03-28pdf: Use FZ_ERROR_SYNTAX code for syntax errors.Tor Andersson
2017-01-17Fix typos.Sebastian Rasmussen
2016-10-07pdf: Support UTF-8 encoded text strings.Tor Andersson
New in PDF 2.0.
2016-10-07pdf: Separate functions to read text strings and text streams as UTF-8.Tor Andersson
The stream loading is used only by the JS code loading.
2016-10-07pdf: Remove unneccessary document argument to pdf_to_utf8 etc.Tor Andersson
2016-09-01pdf: Load/open streams by indirect reference object when possible.Tor Andersson
2016-07-08Safe defaults for pdf_to_rect and pdf_to_matrix.Tor Andersson
Return the empty rectangle and identity matrix when the pdf object is missing or not an array.
2016-07-06pdf: Drop generation number from public interfaces.Tor Andersson
The generation number is only needed for decryption, and is assumed to be zero or irrelevant for all other uses. Store the original object number and generation in the xref slot, so that we can decrypt them even when the objects have been renumbered, without needing to pass the original object number around through the stream loading APIs.
2016-06-17Use 'size_t' instead of int as appropriate.Robin Watts
This silences the many warnings we get when building for x64 in windows. This does not address any of the warnings we get in thirdparty libraries - in particular harfbuzz. These look (at a quick glance) harmless though.
2016-04-27Fix 696649: remove fz_rethrow_message calls.Tor Andersson
2016-03-14Take pdf_obj argument to pdf_is_stream.Tor Andersson
2016-01-05Stylistic naming cleanups.Tor Andersson
2015-10-14pdf: Handle surrogate pairs in pdf_to_utf8.Tor Andersson
2015-05-15Support pdf files larger than 2Gig.Robin Watts
If FZ_LARGEFILE is defined when building, MuPDF uses 64bit offsets for files; this allows us to open streams larger than 2Gig. The downsides to this are that: * The xref entries are larger. * All PDF ints are held as 64bit things rather than 32bit things (to cope with /Prev entries, hint stream offsets etc). * All file positions are stored as 64bits rather than 32. The implementation works by detecting FZ_LARGEFILE. Some #ifdeffery in fitz/system.h sets fz_off_t to either int or int64_t as appropriate, and sets defines for fz_fopen, fz_fseek, fz_ftell etc as required. These call the fseeko64 etc functions on linux (and so define _LARGEFILE64_SOURCE) and the explicit 64bit functions on windows.
2015-02-17Add ctx parameter and remove embedded contexts for API regularity.Tor Andersson
Purge several embedded contexts: Remove embedded context in fz_output. Remove embedded context in fz_stream. Remove embedded context in fz_device. Remove fz_rebind_stream (since it is no longer necessary). Remove embedded context in svg_device. Remove embedded context in XML parser. Add ctx argument to fz_document functions. Remove embedded context in fz_document. Remove embedded context in pdf_document. Remove embedded context in pdf_obj. Make fz_page independent of fz_document in the interface. We shouldn't need to pass the document to all functions handling a page. If a page is tied to the source document, it's redundant; otherwise it's just pointless. Fix reference counting oddity in fz_new_image_from_pixmap.
2014-01-06fix various MSVC warningsSimon Bünzli
Some warnings we'd like to enable for MuPDF and still be able to compile it with warnings as errors using MSVC (2008 to 2013): * C4115: 'timeval' : named type definition in parentheses * C4204: nonstandard extension used : non-constant aggregate initializer * C4295: 'hex' : array is too small to include a terminating null character * C4389: '==' : signed/unsigned mismatch * C4702: unreachable code * C4706: assignment within conditional expression Also, globally disable C4701 which is frequently caused by MSVC not being able to correctly figure out fz_try/fz_catch code flow. And don't define isnan for VS2013 and later where that's no longer needed.
2013-12-24Bug 694810: Implement late file repair for PDFs.Robin Watts
Currently, if we spot a bad xref as we are reading a PDF in, we can repair that PDF by doing a long exhaustive read of the file. This reconstructs the information that was in the xref, and the file can be opened (and later saved) as normal. If we hit an object that is not in the expected place however, we cannot trigger a repair at that point - so xrefs with duff offsets in (within the bounds of the file) will never be repaired. This commit solves that by triggering a repair (just once) whenever we fail to parse an object in the expected place.
2013-09-13Fix various compile warnings spotted by the cluster.Robin Watts
2013-06-25Rid the world of "pdf_document *xref".Robin Watts
For historical reasons lots of the code uses "xref" when talking about a pdf document. Now pdf_xref is a separate type this has become confusing, so replace 'xref' with 'doc' for clarity.
2013-06-25Update pdf_obj's to have a pdf_document field.Robin Watts
Remove the fz_context field to avoid the structure growing.
2013-06-20Rearrange source files.Tor Andersson