|
Purge several embedded contexts:
Remove embedded context in fz_output.
Remove embedded context in fz_stream.
Remove embedded context in fz_device.
Remove fz_rebind_stream (since it is no longer necessary).
Remove embedded context in svg_device.
Remove embedded context in XML parser.
Add ctx argument to fz_document functions.
Remove embedded context in fz_document.
Remove embedded context in pdf_document.
Remove embedded context in pdf_obj.
Make fz_page independent of fz_document in the interface.
We shouldn't need to pass the document to all functions handling a page.
If a page is tied to the source document, it's redundant; otherwise it's
just pointless.
Fix reference counting oddity in fz_new_image_from_pixmap.
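Taken together, the changes above mean user code passes the fz_context
explicitly to document and page calls. A minimal sketch of how the
resulting API reads, built only from public MuPDF entry points (error
handling trimmed, file name hard-coded for brevity):

#include <stdio.h>
#include "mupdf/fitz.h"

int main(void)
{
	fz_context *ctx = fz_new_context(NULL, NULL, FZ_STORE_UNLIMITED);

	if (!ctx)
		return 1;
	fz_try(ctx)
	{
		fz_document *doc;
		fz_page *page;

		fz_register_document_handlers(ctx);
		doc = fz_open_document(ctx, "example.pdf");
		page = fz_load_page(ctx, doc, 0);  /* ctx is passed in, not found inside doc */
		printf("%d pages\n", fz_count_pages(ctx, doc));
		fz_drop_page(ctx, page);           /* dropping a page no longer needs the document */
		fz_drop_document(ctx, doc);
	}
	fz_catch(ctx)
	{
		fprintf(stderr, "error: %s\n", fz_caught_message(ctx));
	}
	fz_drop_context(ctx);
	return 0;
}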
|
|
Rename fz_close to fz_drop_stream.
Rename fz_close_archive to fz_drop_archive.
Rename fz_close_output to fz_drop_output.
Rename fz_free_* to fz_drop_*.
Rename pdf_free_* to pdf_drop_*.
Rename xps_free_* to xps_drop_*.
|
|
Add a new index that quickly maps object number to the first
xref in which an object appears. This seems to recover the speed
we lost when moving to sparse xrefs.
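A hedged sketch of the idea; the struct, field and helper names below
are illustrative, not MuPDF's actual internals:

#include "mupdf/pdf.h"

/* A per-document array remembers, for each object number, the first xref
 * section worth consulting, so a lookup no longer walks every section of
 * a heavily-updated file from scratch. */
typedef struct
{
	int num_sections;   /* xref sections, newest first */
	int num_objects;    /* size of the index */
	int *first_section; /* first_section[num] = first section known to hold num */
} xref_index;

/* Stand-in for the existing per-section lookup. */
extern pdf_xref_entry *entry_in_section(pdf_document *doc, int section, int num);

static pdf_xref_entry *
lookup_entry(pdf_document *doc, xref_index *idx, int num)
{
	int i = (num >= 0 && num < idx->num_objects) ? idx->first_section[num] : 0;

	for (; i < idx->num_sections; i++)
	{
		pdf_xref_entry *entry = entry_in_section(doc, i, num);
		if (entry)
		{
			idx->first_section[num] = i;  /* cache the hit for next time */
			return entry;
		}
	}
	return NULL;
}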
|
|
Currently each xref in the file results in an array from 0 to
num_objects. If we have a file that has been updated many times,
this causes a huge waste of memory.
Instead we now hold each xref as a list of non-overlapping subsections
(exactly as the file holds them).
Lookup is therefore potentially slower, but only on files where the
xrefs are highly fragmented (i.e. where we would be saving in memory
terms).
Some parts of our code (notably the file writing code that does
garbage collection etc.) assume that looking up an object entry pointer
will not invalidate object entry pointers that were looked up earlier.
To cope with this, and with the case where we are
updating/creating new objects, we introduce the idea of a 'solid'
xref.
A solid xref is one that has a single subsection record spanning the
entire range of valid object numbers for the file. Once we have
ensured that an xref is 'solid', we can safely work on the pointers
within it without fear of them moving.
We ensure that any 'incremental' xref is solid.
We also ensure that any non-incremental write makes the xref solid.
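The layout can be pictured roughly like this; the structures and helper
below are a hypothetical sketch, not MuPDF's actual types:

#include <stdlib.h>
#include <string.h>

/* Each xref section keeps only the subsections the file contains, rather
 * than one array spanning 0..num_objects. */
typedef struct entry { int type; long offset; } entry;

typedef struct subsec
{
	int start, len;   /* covers object numbers start .. start+len-1 */
	entry *table;
	struct subsec *next;
} subsec;

/* Lookup walks the list: slower than direct indexing, but only costly when
 * the xref is highly fragmented. */
static entry *lookup(subsec *s, int num)
{
	for (; s; s = s->next)
		if (num >= s->start && num < s->start + s->len)
			return &s->table[num - s->start];
	return NULL;
}

/* "Solidify": build a single subsection spanning 0..num-1, after which
 * entry pointers handed out from it stay put while further objects are
 * looked up or created. The caller would free the old list. */
static subsec *solidify(subsec *list, int num)
{
	subsec *solid = calloc(1, sizeof *solid);
	subsec *s;

	solid->len = num;
	solid->table = calloc(num, sizeof *solid->table);
	for (s = list; s; s = s->next)
		memcpy(solid->table + s->start, s->table, s->len * sizeof *s->table);
	return solid;
}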
|
|
If garbage is appended to an encrypted document, there could be another
trailer with /ID but without /Encrypt. The repair code currently always
uses the last values encountered, but replacing the /ID value alone can
cause decryption to break. One possible solution is to use the /ID value
only when there is none yet, when there is no /Encrypt, or when the same
trailer carries a matching /Encrypt.
See https://code.google.com/p/sumatrapdf/issues/detail?id=2697 for a
document which Adobe Reader can read but MuPDF cannot (it could before
pdf_lex_no_string was introduced, but only by accident).
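Expressed as a condition, the proposed rule would look roughly like the
hypothetical helper below (not the actual repair code; a real fix would
also verify that the trailer's /Encrypt matches the one already in use):

#include "mupdf/pdf.h"

static int
should_adopt_id(pdf_obj *id_so_far, pdf_obj *encrypt_so_far, pdf_obj *trailer_encrypt)
{
	if (!id_so_far)            /* no /ID adopted yet */
		return 1;
	if (!encrypt_so_far)       /* document is not encrypted */
		return 1;
	if (trailer_encrypt)       /* the same trailer also carries /Encrypt */
		return 1;
	return 0;
}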
|
|
When we meet a broken PDF file, we attempt to repair it. We do this by
reading tokens from the file and attempting to interpret them as a
normal PDF stream.
Unfortunately, if the file is corrupt enough that we start reading
from the middle of a stream and happen to hit an '(' character,
we can go into string reading mode and end up skipping over vast
swathes of the file that we could otherwise repair.
We fix this here by using a new version of the pdf_lex function that
refuses to ever return a string. This means we may spend more time
skipping things than we did before, but we are less likely to skip
data that we could have repaired.
We also tweak other parts of the PDF repair logic here. If we hit a
badly formed piece of data, we clear the stored num/gen so that the
next plausible piece we find does not get assigned to a stale object
number.
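Putting the two tweaks together, the repair scan looks roughly like the
sketch below; all names are hypothetical stand-ins, and only the idea
that the lexer never produces a string token reflects the real
pdf_lex_no_string change:

enum { TOK_EOF, TOK_INT, TOK_OBJ, TOK_OTHER };

struct lexbuf { int i; };

extern int lex_no_string(struct lexbuf *buf);  /* stand-in for pdf_lex_no_string */
extern void record_object(int num, int gen);   /* stand-in for the repair bookkeeping */

static void scan(void)
{
	struct lexbuf buf;
	int num = 0, gen = 0, tok;

	while ((tok = lex_no_string(&buf)) != TOK_EOF)
	{
		if (tok == TOK_INT)
		{
			num = gen;               /* remember the last two integers seen */
			gen = buf.i;
		}
		else if (tok == TOK_OBJ)
			record_object(num, gen); /* plausible "num gen obj" header */
		else
			num = gen = 0;           /* badly formed data: forget stale num/gen */
	}
}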
|
|
The 0 null object is leaked if a document refers to 0 0 obj before
requiring a delayed repair (seen e.g. with 3324.pdf.asan.3.2585).
|
|
At https://code.google.com/p/sumatrapdf/issues/detail?id=2436, there's
a document with an empty xref section which has recently started to
trigger a repair. The repair then stops when pdf_repair_obj_stms fails
on an object which isn't even required for the document to render. Such
broken object streams should instead be ignored, the same way broken
objects are ignored in pdf_init_document.
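In MuPDF's fz_try/fz_catch idiom that amounts to something like the
sketch below, where repair_one_obj_stm stands in for the per-stream work
pdf_repair_obj_stms performs:

#include "mupdf/pdf.h"

extern void repair_one_obj_stm(fz_context *ctx, pdf_document *doc, int num);

static void
repair_obj_stms_tolerantly(fz_context *ctx, pdf_document *doc, int *nums, int count)
{
	int i;

	for (i = 0; i < count; i++)
	{
		fz_try(ctx)
		{
			repair_one_obj_stm(ctx, doc, nums[i]);
		}
		fz_catch(ctx)
		{
			/* A broken object stream no longer aborts the whole repair. */
			fz_warn(ctx, "ignoring broken object stream (%d 0 R)", nums[i]);
		}
	}
}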
|
|
Currently, if we spot a bad xref as we are reading a PDF in, we can
repair that PDF by doing a long exhaustive read of the file. This
reconstructs the information that was in the xref, and the file can
be opened (and later saved) as normal.
If we hit an object that is not in the expected place, however, we
cannot trigger a repair at that point, so xrefs containing bad offsets
(that still lie within the bounds of the file) will never be repaired.
This commit solves that by triggering a repair (just once) whenever
we fail to parse an object in the expected place.
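A sketch of the shape of that change; parse_object_at,
rebuild_xref_by_scanning, object_offset and the repair_attempted flag
are illustrative names, not necessarily the real ones:

#include "mupdf/pdf.h"

extern pdf_obj *parse_object_at(fz_context *ctx, pdf_document *doc, int num, long ofs);
extern void rebuild_xref_by_scanning(fz_context *ctx, pdf_document *doc);
extern long object_offset(pdf_document *doc, int num);

static pdf_obj *
load_object_with_repair(fz_context *ctx, pdf_document *doc, int num, long expected_ofs)
{
	pdf_obj *obj = NULL;

	fz_var(obj);
	fz_try(ctx)
	{
		obj = parse_object_at(ctx, doc, num, expected_ofs);
	}
	fz_catch(ctx)
	{
		if (doc->repair_attempted)          /* only ever repair once */
			fz_rethrow(ctx);
		doc->repair_attempted = 1;
		rebuild_xref_by_scanning(ctx, doc); /* the same exhaustive scan a bad xref triggers */
		obj = parse_object_at(ctx, doc, num, object_offset(doc, num));
	}
	return obj;
}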
|
|
fz_read used to return a negative value on errors. With the
introduction of fz_try/fz_catch, it throws an error instead and
always returns a non-negative byte count. This removes the now-pointless
checks.
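A before/after sketch of a typical call site (the fz_read signature is
shown as it was at the time, before the context purge above; the
surrounding names are illustrative):

/* Before: callers defensively checked for a negative return. */
n = fz_read(stm, buf, sizeof buf);
if (n < 0)
	return 0;   /* error path, now unreachable */

/* After: fz_read reports failure by throwing, so the return value is
 * always a byte count >= 0 and the check can simply be deleted. */
n = fz_read(stm, buf, sizeof buf);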
|
|
|
|
We are testing this using a new -p flag to mupdf that sets a bitrate at
which data will appear to arrive progressively as time goes on. For
example:
mupdf -p 102400 pdf_reference17.pdf
Details of the scheme used here are presented in docs/progressive.txt
|
|
|
|
For historical reasons, lots of the code uses "xref" when talking about
a pdf document. Now that pdf_xref is a separate type, this has become
confusing, so replace 'xref' with 'doc' for clarity.
|
|
Remove the fz_context field to avoid the structure growing.
|
|
* at one place, code returns from inside an fz_try block, which corrupts
the error handling stack (see the sketch after this list)
* pdf_load_xref wrongly assumes that at least one non-empty xref has
been read if no errors were thrown during parsing
* pdf_repair_xref skips integers when object numbers are out of range
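A sketch of the first fix: fz_try pushes a frame on the context's
exception stack that fz_catch pops, so returning from inside fz_try
leaves that stack unbalanced. Hoisting the return out fixes it
(load_thing is a hypothetical stand-in):

#include "mupdf/fitz.h"

extern void *load_thing(fz_context *ctx);

static void *
load_safely(fz_context *ctx)
{
	void *thing = NULL;

	fz_var(thing);
	fz_try(ctx)
	{
		thing = load_thing(ctx);
		/* return thing;  <- wrong: skips the fz_catch bookkeeping */
	}
	fz_catch(ctx)
	{
		thing = NULL;
	}
	return thing;   /* right: leave the try/catch normally, then return */
}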
|