mupdf - MuPDF PDF reader and library

Age	Commit message	Author
2018-01-31	Return error token if strings are unterminated.	Tor Andersson

2018-01-31	Return PDF_TOK_ERROR when encountering isolated '>' and ')' characters.	Tor Andersson
	Also return PDF_TOK_ERROR instead of swallowing string opening quotes in pdf_lex_no_string. Also fix the repair code to not skip an extra byte whenever it scans an error token.
2017-12-13	Fix 698785: Catch malformed numbers in PDF lexical scanner.	Tor Andersson
	Return error tokens when parsing numbers with trailing garbage rather than ignoring the extra characters. Also handle error tokens more gracefully in array and dictionary parsing. Treat error tokens as the 'null' keyword and continue parsing.
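The idea of rejecting numbers with trailing garbage can be sketched as follows. This is a minimal illustration and not MuPDF's lexer: the enum and function names are hypothetical, and the check is deliberately stricter than the Acrobat-compatible handling mentioned elsewhere in this log.

```c
/* Hypothetical token kinds for this sketch; not MuPDF's pdf_token enum. */
enum sketch_num_tok { SKETCH_NUM_ERROR, SKETCH_NUM_INT, SKETCH_NUM_REAL };

/* Classify one complete, NUL-terminated token as a number.
 * Any character that cannot belong to a plain number makes the whole
 * token an error instead of being silently ignored. */
static enum sketch_num_tok sketch_classify_number(const char *s)
{
    int digits = 0;
    int is_real = 0;

    if (*s == '+' || *s == '-')
        s++;
    for (; *s; s++)
    {
        if (*s >= '0' && *s <= '9')
            digits++;
        else if (*s == '.' && !is_real)
            is_real = 1;
        else
            return SKETCH_NUM_ERROR; /* trailing garbage such as "12abc" */
    }
    if (digits == 0)
        return SKETCH_NUM_ERROR; /* a lone sign or dot is not a number */
    return is_real ? SKETCH_NUM_REAL : SKETCH_NUM_INT;
}
```

A caller parsing an array or dictionary could then map the error kind to a null object and keep going, which is the recovery strategy the commit message describes.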
2017-11-01	Use int64_t for public file API offsets.	Tor Andersson
	Don't mess with conditional compilation with LARGEFILE -- always expose 64-bit file offsets in our public API.
2017-10-05	Remove shadowed variables.	Sebastian Rasmussen

2017-09-22	Skip to next whitespace character instead of aborting when repairing PDF.	Tor Andersson

2017-06-28	Throw on overly long PDF names.	Sebastian Rasmussen
	The architectural limit is 127 bytes according to the PDF specification.
2017-05-27	Bug 697947: Handle Illegal hex codes in PDF names.	Sebastian Rasmussen
	Versions before PDF 1.2 treat # in PDF names as a regular character. PDF 1.2 and later treat # as the escape character for two-digit character hex codes. Previously, illegal hex codes, e.g. #BX, were partially parsed as escaped hex codes and the illegal remainder was parsed as regular characters. Now illegal hex codes are handled as consisting entirely of regular characters. Note that character code 0 is also considered to be an illegal hex code.
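The rule described above can be sketched as a small decoder. The helper names are hypothetical and this is not the code in MuPDF; it only illustrates the stated behaviour: a # followed by two hex digits with a non-zero value becomes one byte, and anything else leaves the # and the following characters untouched.

```c
#include <stddef.h>

/* Hypothetical helper: value of a hex digit, or -1 if c is not one. */
static int sketch_hexval(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode the body of a name (without the leading '/') into out.
 * Returns the decoded length; out is always NUL-terminated when
 * outsize is at least 1. */
static size_t sketch_decode_name(const char *in, char *out, size_t outsize)
{
    size_t n = 0;

    if (outsize == 0)
        return 0;
    while (*in && n + 1 < outsize)
    {
        if (in[0] == '#')
        {
            int hi = sketch_hexval((unsigned char)in[1]);
            /* Only look at in[2] when in[1] was a hex digit, so we
             * never read past the terminating NUL. */
            int lo = hi < 0 ? -1 : sketch_hexval((unsigned char)in[2]);
            if (lo >= 0 && hi * 16 + lo != 0)
            {
                out[n++] = (char)(hi * 16 + lo);
                in += 3;
                continue;
            }
            /* Illegal code (e.g. #BX or #00): fall through and copy
             * the '#' as a regular character. */
        }
        out[n++] = *in++;
    }
    out[n] = '\0';
    return n;
}
```

With this sketch, `A#42` decodes to `AB`, while `#BX` and `a#00b` come out unchanged.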
2017-05-27	Handle extremely long PDF names.	Sebastian Rasmussen
	Previously the parser would cut these names short and then parse the remainder as a separate name.
2017-04-27	Include required system headers.	Tor Andersson

2017-03-22	Rename fz_putc/puts/printf to fz_write_*.	Tor Andersson
	Rename fz_write to fz_write_data. Rename fz_write_buffer_* and fz_buffer_printf to fz_append_*. Be consistent in naming: fz_write_* calls write to fz_output. fz_append_* calls append to fz_buffer. Update documentation.
2017-03-01	Bug 697620: Avoid clash with "isprint".	Robin Watts

2017-01-17	pdf: Convert non-printable keywords into PDF_TOK_ERROR.	Tor Andersson
	All known keywords are printable. Converting non-printable keywords into error tokens means we don't try to print garbage when showing error messages about unknown tokens.
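A check of this kind might look like the sketch below. The function name is hypothetical, and the range 0x21 to 0x7E for "printable" is an assumption of this example, not something taken from the MuPDF source.

```c
/* Hypothetical helper, not MuPDF's API: returns 1 if every byte of the
 * keyword is printable ASCII (assumed here to mean 0x21..0x7E),
 * 0 otherwise. A lexer could turn a 0 result into an error token. */
static int sketch_keyword_is_printable(const char *s)
{
    for (; *s; s++)
    {
        unsigned char c = (unsigned char)*s;
        if (c < 0x21 || c > 0x7e)
            return 0;
    }
    return 1;
}
```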
2016-02-03	Move pdf's lex_number routine over to use fast atof.	Robin Watts
	Spot (broken) values that will require special 'acrobat compatible' handling and use the old code for that.
2016-01-15	pdf: Consume entire token before lexing numbers.	Tor Andersson
	"0.00-70" should be parsed as one token, not two tokens as we did.
2016-01-08	Tweak lex_number to avoid (or minimise) underflow	Robin Watts
	Keeps operations in the int domain as long as possible, and only resorts to floats if required.
2015-12-15	Rename fz_buffer_cat to fz_append_buffer.	Tor Andersson

2015-10-02	Bug 696131: Detect some overflow conditions	Robin Watts
	When lexing a number, do NOT check for overflow. This causes loss of data in some files. The current implementation matches Acrobat. When lexing a startxref offset, check for overflow. If found, throw an error.
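For the startxref case, an overflow-checked digit loop could be sketched like this. The function name is hypothetical and the use of int64_t is an assumption of the example (the offset type in MuPDF at the time of this commit may have been narrower); a real caller would throw where this sketch returns -1.

```c
#include <stdint.h>

/* Hypothetical helper: parse a run of decimal digits into *out.
 * Returns 0 on success, -1 if there are no digits or the value would
 * not fit in int64_t. */
static int sketch_parse_offset(const char *s, int64_t *out)
{
    int64_t v = 0;
    int digits = 0;

    for (; *s >= '0' && *s <= '9'; s++, digits++)
    {
        int d = *s - '0';
        /* v * 10 + d would exceed INT64_MAX exactly when this holds,
         * so test before multiplying to avoid signed overflow. */
        if (v > (INT64_MAX - d) / 10)
            return -1;
        v = v * 10 + d;
    }
    if (digits == 0)
        return -1;
    *out = v;
    return 0;
}
```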
2015-05-15	Support pdf files larger than 2Gig.	Robin Watts
	If FZ_LARGEFILE is defined when building, MuPDF uses 64bit offsets for files; this allows us to open streams larger than 2Gig.
	The downsides to this are that:
	* The xref entries are larger.
	* All PDF ints are held as 64bit things rather than 32bit things (to cope with /Prev entries, hint stream offsets etc).
	* All file positions are stored as 64bits rather than 32.
	The implementation works by detecting FZ_LARGEFILE. Some #ifdeffery in fitz/system.h sets fz_off_t to either int or int64_t as appropriate, and sets defines for fz_fopen, fz_fseek, fz_ftell etc as required. These call the fseeko64 etc functions on linux (and so define _LARGEFILE64_SOURCE) and the explicit 64bit functions on windows.
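The type selection described here amounts to something like the following. This is a reconstruction from the commit message only, not a copy of fitz/system.h, and it leaves out the fz_fopen, fz_fseek and fz_ftell defines.

```c
#include <stdint.h>

/* Sketch based on the description above: pick the offset type at
 * build time depending on whether FZ_LARGEFILE is defined. */
#ifdef FZ_LARGEFILE
typedef int64_t fz_off_t;
#else
typedef int fz_off_t;
#endif
```

The 2017-11-01 entry in this log later drops this conditional in favour of always exposing int64_t offsets in the public API.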
2015-02-17	Add ctx parameter and remove embedded contexts for API regularity.	Tor Andersson
	Purge several embedded contexts:
	Remove embedded context in fz_output.
	Remove embedded context in fz_stream.
	Remove embedded context in fz_device.
	Remove fz_rebind_stream (since it is no longer necessary).
	Remove embedded context in svg_device.
	Remove embedded context in XML parser.
	Add ctx argument to fz_document functions.
	Remove embedded context in fz_document.
	Remove embedded context in pdf_document.
	Remove embedded context in pdf_obj.
	Make fz_page independent of fz_document in the interface. We shouldn't need to pass the document to all functions handling a page. If a page is tied to the source document, it's redundant; otherwise it's just pointless.
	Fix reference counting oddity in fz_new_image_from_pixmap.
2015-01-20	don't decode '8' and '9' as octal digits	Simon Bünzli
	At https://github.com/sumatrapdfreader/sumatrapdf/issues/66 there's a document which contains a string (\358) which is parsed as (\360) with the 8 overflowing instead of as (\0358) with the 8 being the first character after the octal escape. This patch restricts octal digits to '0' to '7' to fix that issue.
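The restriction can be sketched as below. The function name is hypothetical and the code is an illustration of the rule, not the patch itself: at most three digits are consumed, and only '0' to '7' count as digits.

```c
/* Hypothetical helper: s points just past the backslash of a string
 * escape. Consumes up to three octal digits ('0'..'7' only), stores
 * the resulting byte in *out and returns how many digits were used.
 * Values above 255 are truncated to one byte. */
static int sketch_octal_escape(const char *s, unsigned char *out)
{
    unsigned int v = 0;
    int n = 0;

    while (n < 3 && s[n] >= '0' && s[n] <= '7')
    {
        v = v * 8 + (unsigned int)(s[n] - '0');
        n++;
    }
    *out = (unsigned char)v;
    return n;
}
```

For the input `358` this consumes two digits and yields octal 035, leaving the `8` to be read as an ordinary character, which matches the (\0358) interpretation given above.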
2014-09-02	Add fz_snprintf and use it for formatting floating point numbers.	Tor Andersson

2014-01-02	Improve PDF repair logic.	Robin Watts
	When we meet a broken PDF file, we attempt to repair it. We do this by reading tokens from the file and attempting to interpret them as a normal PDF stream. Unfortunately, if the file is corrupt enough so that we start to read from the middle of a stream, and we happen to hit an '(' character, we can go into string reading mode. We can then end up skipping over vast swathes of file that we could otherwise repair.
	We fix this here by using a new version of the pdf_lex function that refuses to ever return a string. This means we may take more time over skipping things than we did before, but are less likely to skip stuff.
	We also tweak other parts of the pdf repair logic here. If we hit a badly formed piece of data, clear the num/gen we have stored so that the next plausible piece we get does not get assigned to a random object number.
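The difference between the two lexing modes can be illustrated with a toy function that reports how many bytes a token starting at '(' would swallow. The name and the allow_strings flag are hypothetical; MuPDF implements this as a separate lexer entry point rather than a flag.

```c
#include <stddef.h>

/* Hypothetical sketch: s[0] is assumed to be '('. Returns the number
 * of bytes the lexer would consume for this token. */
static size_t sketch_lex_paren(const char *s, size_t len, int allow_strings)
{
    size_t i;
    int depth = 0;

    /* Repair mode: never enter string mode, so '(' costs one byte. */
    if (!allow_strings)
        return len > 0 ? 1 : 0;

    for (i = 0; i < len; i++)
    {
        if (s[i] == '\\' && i + 1 < len)
            i++; /* skip the escaped character */
        else if (s[i] == '(')
            depth++;
        else if (s[i] == ')' && --depth == 0)
            return i + 1; /* balanced string ends here */
    }
    /* Unterminated string: everything up to the end was swallowed. */
    return len;
}
```

On binary stream data containing a stray '(', the string-aware mode can consume the rest of the buffer, including any object headers in it, while the repair mode advances by a single byte and keeps scanning.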
2013-09-24	Bug 694557: Fix infinite loop in pdf_lex.	Robin Watts
	When we read a '>' during lexing, we try to read another char to see if it's another '>'. If not, we warn that it's unexpected, put the char back and retry. Putting the char back fails if the '>' was the last char in the stream as we will then have read EOF. We then loop and reread the '>' resulting in an infinite loop. Simple fix is to check for EOF.
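The failure mode and the fix can be reproduced with a toy stream. All names here are hypothetical, and unlike the code described above this sketch returns an error token for an isolated '>' rather than warning and retrying; the point is only the EOF check before putting a byte back.

```c
#include <stddef.h>

enum { SKETCH_EOF = -1 };
enum sketch_gt_tok {
    SKETCH_GT_EOF, SKETCH_GT_CLOSE_DICT, SKETCH_GT_ERROR, SKETCH_GT_OTHER
};

/* Toy in-memory stream standing in for the real input stream. */
struct sketch_stream { const char *p; const char *end; };

static int sketch_read_byte(struct sketch_stream *s)
{
    return s->p < s->end ? (unsigned char)*s->p++ : SKETCH_EOF;
}

/* Rewinds one byte unconditionally. After an EOF read nothing was
 * consumed, so calling this would step back onto the previous byte. */
static void sketch_unread_byte(struct sketch_stream *s)
{
    s->p--;
}

static enum sketch_gt_tok sketch_next(struct sketch_stream *s)
{
    int c = sketch_read_byte(s);

    if (c == SKETCH_EOF)
        return SKETCH_GT_EOF;
    if (c == '>')
    {
        c = sketch_read_byte(s);
        if (c == '>')
            return SKETCH_GT_CLOSE_DICT;
        /* The fix: only put the byte back if one was actually read.
         * Without this check a trailing '>' is re-read forever. */
        if (c != SKETCH_EOF)
            sketch_unread_byte(s);
        return SKETCH_GT_ERROR;
    }
    return SKETCH_GT_OTHER;
}
```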
2013-06-20	Rearrange source files.	Tor Andersson