Age | Commit message (Collapse) | Author |
|
|
If you define DUMP_LEXER_STREAM, then the lexer dumps the input
that it reads from the stream.
|
|
|
|
Also return PDF_TOK_ERROR instead of swallowing string opening quotes in
pdf_lex_no_string.
Also fix the repair code to not skip an extra byte whenever it scans an error
token.
|
|
Return error tokens when parsing numbers with trailing garbage rather than
ignoring the extra characters.
Also handle error tokens more gracefully in array and dictionary parsing.
Treat error tokens as the 'null' keyword and continue parsing.
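The trailing-garbage rule can be illustrated with a minimal sketch. It is not the actual lexer code: classify_number, ends_token and the TOK_* names are hypothetical, and it assumes the token is held in a NUL-terminated buffer.

```c
#include <stddef.h>
#include <string.h>

enum { TOK_INT, TOK_REAL, TOK_ERROR };

/* PDF whitespace and delimiter characters end a token; so does end of input. */
static int ends_token(int c)
{
    return c == '\0' || strchr(" \t\r\n\f()<>[]{}/%", c) != NULL;
}

/* Classify the number starting at s and store the token length in *len.
 * A number followed by anything other than whitespace or a delimiter is
 * reported as one error token covering the garbage as well. */
static int classify_number(const char *s, size_t *len)
{
    const char *p = s;
    int tok = TOK_INT;
    int digits = 0;

    if (*p == '+' || *p == '-')
        p++;
    while (*p >= '0' && *p <= '9') { p++; digits++; }
    if (*p == '.')
    {
        tok = TOK_REAL;
        p++;
        while (*p >= '0' && *p <= '9') { p++; digits++; }
    }
    if (digits == 0 || !ends_token((unsigned char)*p))
    {
        /* Consume the rest of the token so the caller can carry on after it. */
        while (!ends_token((unsigned char)*p))
            p++;
        tok = TOK_ERROR;
    }
    *len = (size_t)(p - s);
    return tok;
}
```

A caller parsing an array or dictionary could then substitute a null object whenever TOK_ERROR comes back and keep going from s + len.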
|
|
Don't mess with conditional compilation with LARGEFILE -- always expose
64-bit file offsets in our public API.
|
|
|
|
|
|
The architectural limit is 127 bytes according to the
PDF specification.
|
|
PDF versions prior to 1.2 treat # in PDF names as a regular character.
PDF 1.2 and later treat # as the escape character for character hex
codes. Previously illegal hex codes, e.g. #BX, were partially
parsed as escaped hex codes and the illegal remainder parsed as
regular characters. Now illegal hex codes are handled as
consisting entirely of regular characters. Note that character
code 0 is also considered to be an illegal hex code.
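As a rough sketch of the rule described above (not the code from this commit), a name decoder might look like the following. decode_name and hex_value are hypothetical helper names, and the input is assumed to be a NUL-terminated name without the leading slash.

```c
#include <stddef.h>
#include <string.h>

/* Return the value of a hex digit, or -1 if c is not one. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode '#xx' escapes from src into dst, which must have room for
 * strlen(src) + 1 bytes. An escape that is not two hex digits, or that
 * encodes character code 0, is copied through unchanged as regular
 * characters. Returns the decoded length. */
static size_t decode_name(const char *src, char *dst)
{
    size_t n = 0;

    while (*src)
    {
        if (src[0] == '#')
        {
            int hi = hex_value((unsigned char)src[1]);
            int lo = hi >= 0 ? hex_value((unsigned char)src[2]) : -1;
            if (hi >= 0 && lo >= 0 && ((hi << 4) | lo) != 0)
            {
                dst[n++] = (char)((hi << 4) | lo);
                src += 3;
                continue;
            }
        }
        dst[n++] = *src++;
    }
    dst[n] = '\0';
    return n;
}
```

With this sketch, "A#42C" decodes to "ABC", while "#BX" and "a#00b" come out unchanged.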
|
|
Previously the parser would cut these names short
and then parse the remainder as a separate name.
|
|
|
|
Rename fz_write to fz_write_data.
Rename fz_write_buffer_* and fz_buffer_printf to fz_append_*.
Be consistent in naming:
fz_write_* calls write to fz_output.
fz_append_* calls append to fz_buffer.
Update documentation.
|
|
|
|
All known keywords are printable. Converting non-printable keywords into
error tokens means we don't try to print garbage when showing error
messages about unknown tokens.
|
|
Spot (broken) values that will require special 'acrobat
compatible' handling and use the old code for that.
|
|
"0.00-70" should be parsed as one token, not two tokens as we did.
|
|
Keeps operations in the int domain as long as possible,
and only resorts to floats if required.
|
|
|
|
When lexing a number, do NOT check for overflow. This causes
loss of data in some files. The current implementation matches
Acrobat.
When lexing a startxref offset, check for overflow. If found, throw
an error.
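For illustration only, an overflow-checked offset parse could be sketched as below. parse_offset is a hypothetical name, the 64-bit result type is an assumption, and the real code throws an error where this sketch returns -1.

```c
#include <stdint.h>

/* Parse a non-negative decimal offset from s into *out.
 * Returns 0 on success, or -1 if there are no digits or the value
 * does not fit in an int64_t. */
static int parse_offset(const char *s, int64_t *out)
{
    int64_t v = 0;

    if (*s < '0' || *s > '9')
        return -1;
    while (*s >= '0' && *s <= '9')
    {
        int d = *s++ - '0';
        /* v * 10 + d would exceed INT64_MAX: report instead of wrapping. */
        if (v > (INT64_MAX - d) / 10)
            return -1;
        v = v * 10 + d;
    }
    *out = v;
    return 0;
}
```

The check is done before the multiplication, so the sketch never relies on signed overflow.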
|
|
If FZ_LARGEFILE is defined when building, MuPDF uses 64bit offsets
for files; this allows us to open streams larger than 2Gig.
The downsides to this are that:
* The xref entries are larger.
* All PDF ints are held as 64bit things rather than 32bit things
(to cope with /Prev entries, hint stream offsets etc).
* All file positions are stored as 64bits rather than 32.
The implementation works by detecting FZ_LARGEFILE. Some #ifdeffery
in fitz/system.h sets fz_off_t to either int or int64_t as appropriate,
and sets defines for fz_fopen, fz_fseek, fz_ftell etc as required.
These call the fseeko64 etc functions on linux (and so define
_LARGEFILE64_SOURCE) and the explicit 64bit functions on windows.
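The offset type selection described above might look roughly like this. It is a sketch, not the contents of fitz/system.h: FZ_LARGEFILE is defined here only to exercise the 64-bit branch, fz_off_max is a hypothetical helper, and the fz_fopen/fz_fseek/fz_ftell mappings are left out because they are platform specific.

```c
#include <limits.h>
#include <stdint.h>

/* Defined here for demonstration; normally set by the build. */
#define FZ_LARGEFILE 1

#ifdef FZ_LARGEFILE
typedef int64_t fz_off_t;   /* offsets beyond 2 GiB are representable */
#else
typedef int fz_off_t;       /* smaller xref entries, smaller files only */
#endif

/* Largest file offset the chosen type can hold. */
static int64_t fz_off_max(void)
{
    return sizeof(fz_off_t) == sizeof(int64_t) ? INT64_MAX : (int64_t)INT_MAX;
}
```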
|
|
Purge several embedded contexts:
Remove embedded context in fz_output.
Remove embedded context in fz_stream.
Remove embedded context in fz_device.
Remove fz_rebind_stream (since it is no longer necessary).
Remove embedded context in svg_device.
Remove embedded context in XML parser.
Add ctx argument to fz_document functions.
Remove embedded context in fz_document.
Remove embedded context in pdf_document.
Remove embedded context in pdf_obj.
Make fz_page independent of fz_document in the interface.
We shouldn't need to pass the document to all functions handling a page.
If a page is tied to the source document, it's redundant; otherwise it's
just pointless.
Fix reference counting oddity in fz_new_image_from_pixmap.
|
|
At https://github.com/sumatrapdfreader/sumatrapdf/issues/66 there's a
document which contains a string (\358) which is parsed as (\360) with
the 8 overflowing instead of as (\0358) with the 8 being the first
character after the octal escape. This patch restricts octal digits to
'0' to '7' to fix that issue.
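A minimal sketch of the corrected behaviour, assuming the text after the backslash is available as a NUL-terminated string; parse_octal_escape is a hypothetical name, not the function in the patch.

```c
#include <stddef.h>

/* Parse up to three octal digits ('0' to '7') starting at s.
 * Stores the character code in *code and returns the number of digits
 * consumed; '8' and '9' end the escape and are left for the caller. */
static size_t parse_octal_escape(const char *s, int *code)
{
    size_t n = 0;
    int v = 0;

    while (n < 3 && s[n] >= '0' && s[n] <= '7')
    {
        v = v * 8 + (s[n] - '0');
        n++;
    }
    *code = v & 0xFF; /* keep the low byte, as for any three-digit escape */
    return n;
}
```

For the input "358" this consumes two digits and yields code 035, leaving the '8' as the next regular character.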
|
|
|
|
When we meet a broken PDF file, we attempt to repair it. We do this by
reading tokens from the file and attempting to interpret them as a
normal PDF stream.
Unfortunately, if the file is corrupt enough so that we start to read
from the middle of a stream, and we happen to hit an '(' character,
we can go into string reading mode. We can then end up skipping over
vast swathes of file that we could otherwise repair.
We fix this here by using a new version of the pdf_lex function that
refuses to ever return a string. This means we may take more time
over skipping things than we did before, but are less likely to
skip stuff.
We also tweak other parts of the pdf repair logic here. If we hit a
badly formed piece of data, clear the num/gen we have stored so that
the next plausible piece we get does not get assigned to a random
object number.
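The effect of refusing to enter string mode can be shown with a toy scanner. This is not the repair code: count_obj_keywords is a hypothetical function that simply counts "obj" substrings, with and without string skipping.

```c
#include <string.h>

/* Count occurrences of "obj" in buf. When allow_strings is non-zero,
 * a '(' switches to string mode and everything up to the closing ')'
 * (or the end of the input) is skipped, as a normal lexer would do. */
static int count_obj_keywords(const char *buf, int allow_strings)
{
    int count = 0;
    const char *p = buf;

    while (*p)
    {
        if (*p == '(' && allow_strings)
        {
            while (*p && *p != ')')
                p++;
            if (*p)
                p++;
        }
        else if (strncmp(p, "obj", 3) == 0)
        {
            count++;
            p += 3;
        }
        else
        {
            p++;
        }
    }
    return count;
}
```

On a buffer such as "junk ( 1 0 obj 2 0 obj" the string-aware scan finds no objects because the stray '(' swallows the rest of the input, while the scan that ignores strings finds both.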
|
|
When we read a '>' during lexing, we try to read another char to see
if it's another '>'. If not, we warn that it's unexpected, put the char
back and retry.
Putting the char back fails if the '>' was the last char in the stream
as we will then have read EOF. We then loop and reread the '>' resulting
in an infinite loop. Simple fix is to check for EOF.
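The loop and its fix can be reproduced with a small in-memory stream. Everything here (toy_stream, read_byte, unread_byte, lex, first_token, the TOK_* values) is a hypothetical stand-in for the real stream and lexer; the warning is omitted.

```c
#include <stdio.h>
#include <string.h>

enum { TOK_EOF, TOK_CLOSE_DICT, TOK_OTHER };

/* Toy in-memory stream. */
typedef struct { const char *p, *end; } toy_stream;

static int read_byte(toy_stream *s)
{
    return s->p < s->end ? (unsigned char)*s->p++ : EOF;
}

/* Steps back one byte without knowing whether the last read hit EOF. */
static void unread_byte(toy_stream *s)
{
    s->p--;
}

static int lex(toy_stream *s)
{
    for (;;)
    {
        int c = read_byte(s);
        if (c == EOF)
            return TOK_EOF;
        if (c == ' ')
            continue;
        if (c == '>')
        {
            c = read_byte(s);
            if (c == '>')
                return TOK_CLOSE_DICT;
            /* Reading EOF consumed nothing, so unreading here would step
             * back onto the '>' and the loop would never terminate. */
            if (c == EOF)
                return TOK_EOF;
            unread_byte(s); /* put the char back and retry past the stray '>' */
            continue;
        }
        return TOK_OTHER;
    }
}

/* Convenience wrapper: lex the first token of a NUL-terminated string. */
static int first_token(const char *text)
{
    toy_stream s = { text, text + strlen(text) };
    return lex(&s);
}
```

Removing the EOF check makes first_token(">") spin forever, which is the failure described above.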
|
|
|