path: root/source/pdf/pdf-lex.c
Age | Commit message | Author
2016-02-03 | Move pdf's lex_number routine over to use fast atof. | Robin Watts
Spot (broken) values that will require special 'acrobat compatible' handling and use the old code for that.
2016-01-15 | pdf: Consume entire token before lexing numbers. | Tor Andersson
"0.00-70" should be parsed as one token, not two tokens as we did.
2016-01-08 | Tweak lex_number to avoid (or minimise) underflow | Robin Watts
Keeps operations in the int domain as long as possible, and only resorts to floats if required.
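As a rough illustration of the idea (not the actual lex_number code), a number can be accumulated as integers and converted to float only once at the end. The function name `parse_decimal` is hypothetical, and the sketch ignores signs and overflow of the whole part.

```c
/* Hypothetical sketch: parse an unsigned decimal such as "12.375".
 * Signs and overflow of the whole part are deliberately not handled. */
static float parse_decimal(const char *s)
{
    int whole = 0;  /* digits before the '.' */
    int frac = 0;   /* digits after the '.', kept as an integer */
    int scale = 1;  /* 10^(number of fractional digits kept) */

    while (*s >= '0' && *s <= '9')
        whole = whole * 10 + (*s++ - '0');
    if (*s == '.')
    {
        s++;
        /* Keep at most nine fractional digits so frac and scale fit in a 32-bit int. */
        while (*s >= '0' && *s <= '9' && scale < 100000000)
        {
            frac = frac * 10 + (*s++ - '0');
            scale *= 10;
        }
    }
    if (frac == 0)
        return (float)whole;            /* never left the int domain */
    return whole + (float)frac / scale; /* one float division at the end */
}
```

With this shape, an input like "42" is never touched by floating point arithmetic, and "12.375" costs a single division.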
2015-12-15 | Rename fz_buffer_cat to fz_append_buffer. | Tor Andersson
2015-10-02 | Bug 696131: Detect some overflow conditions | Robin Watts
When lexing a number, do NOT check for overflow. This causes loss of data in some files. The current implementation matches Acrobat. When lexing a startxref offset, check for overflow. If found, throw an error.
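The startxref case could look something like the following sketch, which checks before each multiply so that signed overflow never happens. The name `parse_offset_checked` is hypothetical, and returning -1 stands in for the error the commit says is thrown.

```c
#include <limits.h>

/* Hypothetical sketch: parse a non-negative decimal offset.
 * Returns -1 if the value would not fit in an int. */
static int parse_offset_checked(const char *s)
{
    int value = 0;
    while (*s >= '0' && *s <= '9')
    {
        int digit = *s++ - '0';
        /* value * 10 + digit <= INT_MAX  <=>  value <= (INT_MAX - digit) / 10 */
        if (value > (INT_MAX - digit) / 10)
            return -1;
        value = value * 10 + digit;
    }
    return value;
}
```

Ordinary numbers would deliberately skip this check, per the commit message, so that their behaviour keeps matching Acrobat.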
2015-05-15 | Support pdf files larger than 2Gig. | Robin Watts
If FZ_LARGEFILE is defined when building, MuPDF uses 64bit offsets for files; this allows us to open streams larger than 2Gig.

The downsides to this are that:
* The xref entries are larger.
* All PDF ints are held as 64bit things rather than 32bit things (to cope with /Prev entries, hint stream offsets etc).
* All file positions are stored as 64bits rather than 32.

The implementation works by detecting FZ_LARGEFILE. Some #ifdeffery in fitz/system.h sets fz_off_t to either int or int64_t as appropriate, and sets defines for fz_fopen, fz_fseek, fz_ftell etc as required. These call the fseeko64 etc functions on linux (and so define _LARGEFILE64_SOURCE) and the explicit 64bit functions on windows.
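A minimal sketch of the typedef switch described above, assuming nothing about the real contents of fitz/system.h beyond what the commit message states. The helper `offset_fits` is hypothetical and only illustrates what the narrower build cannot represent.

```c
#include <limits.h>
#include <stdint.h>

/* Sketch of the switch the commit describes: one typedef, two widths. */
#ifdef FZ_LARGEFILE
typedef int64_t fz_off_t;
#else
typedef int fz_off_t;
#endif

/* Hypothetical helper: can this build represent the given file position? */
static int offset_fits(int64_t pos)
{
    if (sizeof(fz_off_t) >= sizeof(int64_t))
        return 1;
    return pos >= INT_MIN && pos <= INT_MAX;
}
```

On a platform with 32-bit int, a position of 3 GiB would be expected to fit only when the code is built with FZ_LARGEFILE defined.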
2015-02-17 | Add ctx parameter and remove embedded contexts for API regularity. | Tor Andersson
Purge several embedded contexts:
Remove embedded context in fz_output.
Remove embedded context in fz_stream.
Remove embedded context in fz_device.
Remove fz_rebind_stream (since it is no longer necessary).
Remove embedded context in svg_device.
Remove embedded context in XML parser.
Add ctx argument to fz_document functions.
Remove embedded context in fz_document.
Remove embedded context in pdf_document.
Remove embedded context in pdf_obj.

Make fz_page independent of fz_document in the interface. We shouldn't need to pass the document to all functions handling a page. If a page is tied to the source document, it's redundant; otherwise it's just pointless.

Fix reference counting oddity in fz_new_image_from_pixmap.
2015-01-20 | don't decode '8' and '9' as octal digits | Simon Bünzli
At https://github.com/sumatrapdfreader/sumatrapdf/issues/66 there's a document which contains a string (\358) which is parsed as (\360) with the 8 overflowing instead of as (\0358) with the 8 being the first character after the octal escape. This patch restricts octal digits to '0' to '7' to fix that issue.
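The restriction can be sketched as follows. This is an illustration under assumptions, not the patch itself: `decode_octal_escape` is a hypothetical name, and it takes the characters that follow the backslash.

```c
#include <stddef.h>

/* Hypothetical sketch: decode up to three octal digits ('0' to '7' only)
 * following a backslash. Stores the byte in *out and returns how many
 * characters were consumed (0 if the first character is not octal). */
static size_t decode_octal_escape(const char *s, unsigned char *out)
{
    size_t n = 0;
    int value = 0;
    while (n < 3 && s[n] >= '0' && s[n] <= '7')
    {
        value = value * 8 + (s[n] - '0');
        n++;
    }
    *out = (unsigned char)value; /* high-order overflow (e.g. \777) is truncated */
    return n;
}
```

For the string from the bug report, the escape stops after "35", so the "8" is left for the caller to read as an ordinary character.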
2014-09-02 | Add fz_snprintf and use it for formatting floating point numbers. | Tor Andersson
2014-01-02 | Improve PDF repair logic. | Robin Watts
When we meet a broken PDF file, we attempt to repair it. We do this by reading tokens from the file and attempting to interpret them as a normal PDF stream. Unfortunately, if the file is corrupt enough so that we start to read from the middle of a stream, and we happen to hit an '(' character, we can go into string reading mode. We can then end up skipping over vast swathes of file that we could otherwise repair.

We fix this here by using a new version of the pdf_lex function that refuses to ever return a string. This means we may take more time over skipping things than we did before, but are less likely to skip stuff.

We also tweak other parts of the pdf repair logic here. If we hit a badly formed piece of data, clear the num/gen we have stored so that the next plausible piece we get does not get assigned to a random object number.
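The difference between the two lexing modes can be shown with a toy tokenizer. Everything here is hypothetical (the names, the token set, the `allow_strings` flag), and the string handling ignores nesting and escapes; it only illustrates why refusing strings limits how much data a stray '(' can swallow.

```c
#include <stddef.h>

enum token { TOK_EOF, TOK_STRING, TOK_ERROR, TOK_OTHER };

/* Hypothetical sketch: read one token from buf starting at *pos. */
static enum token next_token(const char *buf, size_t len, size_t *pos, int allow_strings)
{
    if (*pos >= len)
        return TOK_EOF;
    if (buf[*pos] != '(')
    {
        (*pos)++;           /* stand-in for lexing any other token */
        return TOK_OTHER;
    }
    (*pos)++;               /* consume the '(' */
    if (!allow_strings)
        return TOK_ERROR;   /* repair mode: report it and advance one byte */
    /* Normal mode: swallow everything up to the closing ')' or end of data. */
    while (*pos < len && buf[*pos] != ')')
        (*pos)++;
    if (*pos < len)
        (*pos)++;           /* consume the ')' */
    return TOK_STRING;
}
```

On input such as "(junk 1 0 obj", the normal mode consumes the whole buffer as one unterminated string, while the repair mode advances a single byte and can still find the object header that follows.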
2013-09-24 | Bug 694557: Fix infinite loop in pdf_lex. | Robin Watts
When we read a '>' during lexing, we try to read another char to see if it's another '>'. If not, we warn that it's unexpected, put the char back and retry. Putting the char back fails if the '>' was the last char in the stream as we will then have read EOF. We then loop and reread the '>' resulting in an infinite loop. Simple fix is to check for EOF.
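The loop and its fix can be reproduced with a small in-memory stream. This is a sketch under assumptions: the `stream` struct, `read_byte`, `unread_byte` and the token names are all hypothetical stand-ins, not MuPDF's API.

```c
#include <stddef.h>
#include <stdio.h> /* for EOF */

struct stream { const char *data; size_t len, pos; };

static int read_byte(struct stream *s)
{
    return s->pos < s->len ? (unsigned char)s->data[s->pos++] : EOF;
}

static void unread_byte(struct stream *s)
{
    if (s->pos > 0)
        s->pos--;
}

enum token { TOK_EOF, TOK_CLOSE_DICT, TOK_OTHER };

/* Hypothetical sketch of the '>' handling described above. */
static enum token lex(struct stream *s)
{
    for (;;)
    {
        int c = read_byte(s);
        if (c == EOF)
            return TOK_EOF;
        if (c != '>')
            return TOK_OTHER;
        c = read_byte(s);
        if (c == '>')
            return TOK_CLOSE_DICT;
        /* The fix: without this check, unread_byte() after EOF steps back
         * onto the '>' itself and the loop rereads it forever. */
        if (c == EOF)
            return TOK_EOF;
        unread_byte(s); /* put the lookahead back, skip the stray '>' and retry */
    }
}
```

In this model, reading EOF does not advance the position, so the unread that follows would undo the read of the '>' rather than the read of the lookahead, which matches the failure the commit message describes.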
2013-06-20 | Rearrange source files. | Tor Andersson