summaryrefslogtreecommitdiff
path: root/source/pdf/pdf-repair.c
AgeCommit message (Collapse)Author
2014-01-02Improve PDF repair logic.Robin Watts
When we meet a broken PDF file, we attempt to repair it. We do this by reading tokens from the file and attempting to interpret them as a normal PDF stream. Unfortunately, if the file is corrupt enough so that we start to read from the middle of a stream, and we happen to hit an '(' character, we can go into string reading mode. We can then end up skipping over vast swathes of file that we could otherwise repair. We fix this here by using a new version of the pdf_lex function that refuses to ever return a string. This means we may take more time over skipping things than we did before, but are less likely to skip stuff. We also tweak other parts of the pdf repair logic here. If we hit a badly formed piece of data, clear the num/gen we have stored so that the next plausible piece we get does not get assigned to a random object number.
2014-01-02fix memory leak in pdf_repair_xrefSimon Bünzli
The 0 null object is leaked if a document refers to 0 0 obj before requiring a delayed reparation (seen e.g. with 3324.pdf.asan.3.2585).
2014-01-01don't fail on invalid object streamsSimon Bünzli
At https://code.google.com/p/sumatrapdf/issues/detail?id=2436 , there's a document with an empty xref section which since recently causes a repair to be triggered. Repairs then stop when pdf_repair_obj_stms fails on an object which isn't even required for the document to render. Such broken object streams should rather be ignored same as broken objects are ignored in pdf_init_document.
2013-12-24Bug 694810: Implement late file repair for PDFs.Robin Watts
Currently, if we spot a bad xref as we are reading a PDF in, we can repair that PDF by doing a long exhaustive read of the file. This reconstructs the information that was in the xref, and the file can be opened (and later saved) as normal. If we hit an object that is not in the expected place however, we cannot trigger a repair at that point - so xrefs with duff offsets in (within the bounds of the file) will never be repaired. This commit solves that by triggering a repair (just once) whenever we fail to parse an object in the expected place.
2013-09-27stop checking if the result of fz_read is negativeSimon Bünzli
fz_read used to return a negative value on errors. With the introduction of fz_try/fz_catch, it throws an error instead and always returns non-negative values. This removes the pointless checks.
2013-09-13Fix various compile warnings spotted by the cluster.Robin Watts
2013-07-19Initial work on progressive loadingRobin Watts
We are testing this using a new -p flag to mupdf that sets a bitrate at which data will appear to arrive progressively as time goes on. For example: mupdf -p 102400 pdf_reference17.pdf Details of the scheme used here are presented in docs/progressive.txt
2013-06-28Ensure altered objects are moved to the incremental xref sectionPaul Gardiner
2013-06-25Rid the world of "pdf_document *xref".Robin Watts
For historical reasons lots of the code uses "xref" when talking about a pdf document. Now pdf_xref is a separate type this has become confusing, so replace 'xref' with 'doc' for clarity.
2013-06-25Update pdf_obj's to have a pdf_document field.Robin Watts
Remove the fz_context field to avoid the structure growing.
2013-06-24fix recent regressionszeniko
* at one place, code returns from inside an fz_try which borks up the error stack * pdf_load_xref wrongly assumes that at least one non-empty xref has been read if there were no errors thrown during parsing * pdf_repair_xref skips integers when object numbers are out of range
2013-06-20Rearrange source files.Tor Andersson