Age | Commit message (Collapse) | Author |
|
Currently each xref in the file results in an array from 0 to
num_objects. If we have a file that has been updated many times
this causes a huge waste of memory.
Instead we now hold each xref as a list of non-overlapping subsections
(exactly as the file holds them).
Lookup is therefore potentially slower, but only on files where the
xrefs are highly fragmented (i.e. where we would be saving in memory
terms).
Some parts of our code (notably the file writing code that does
garbage collection etc) assumes that lookups of object entry pointers
will not change previous object entry pointers that have been
looked up. To cope with this, and to cope with the case where we are
updating/creating new objects, we introduce the idea of a 'solid'
xref.
A solid xref is one where it has a single subsection record that spans
the entire range of valid object numbers for a file. Once we have
ensured that an xref is 'solid', we can safely work on the pointers
within it without fear of them moving.
We ensure that any 'incremental' xref is solid.
We also ensure that any non-incremental write makes the xref solid.
|
|
PDF documents aren't required to end in a linebreak. Objects however
must start on their own line (in particular for broken documents
relying on reparation). For this reason, a linebreak must be inserted
before starting an incremental update.
|
|
Return the null object rather than throwing an exception when parsing
indirect object references with negative object numbers.
Do range check for object numbers (1 .. length) when object numbers
are used instead.
Object number 0 is not a valid object number. It must always be 'free'.
|
|
pdf_write_document still writes the entire xref with references to all
freed objects even if the xref has been compacted which makes the
result of mutool clean -ggg larger than necessary.
|
|
New routine to filter the content streams for pages, xobjects,
type3 charprocs, patterns etc. The filtered streams are guaranteed
to be properly matched with q/Q's, and to not have changed the top
level ctm. Additionally we remove (some) repeated settings of
colors etc. This filtering can be extended to be smarter later.
The idea of this is to both repair after editing, and to leave the
streams in a form that can be easily appended to.
This is preparatory to work on Bates numbering and Watermarking.
Currently the streams produced are uncompressed.
|
|
Avoid negative indirections. Don't make indirections to objects that
aren't going to be used.
Also improve pdf-write.c so that it doesn't call renumberobj on objs
that are going to be dropped.
|
|
While attempting to debug a valgrind issue with:
013b2dcbd0207501e922910ac335eb59_asan_heap-oob_a59696_5952_500.pdf
I found that mutool -difggg on it failed with a SEGV. This is due to
us parsing an array with a large invalid indirection in it (e.g.
[123456789 0 R]) and then the renumbering code assuming this is valid
and accessing off the end of an array.
|
|
Some warnings we'd like to enable for MuPDF and still be able to
compile it with warnings as errors using MSVC (2008 to 2013):
* C4115: 'timeval' : named type definition in parentheses
* C4204: nonstandard extension used : non-constant aggregate initializer
* C4295: 'hex' : array is too small to include a terminating null character
* C4389: '==' : signed/unsigned mismatch
* C4702: unreachable code
* C4706: assignment within conditional expression
Also, globally disable C4701 which is frequently caused by MSVC not
being able to correctly figure out fz_try/fz_catch code flow.
And don't define isnan for VS2013 and later where that's no longer needed.
|
|
In order to prevent this from breaking again, a fz_write_options struct
with default values is allocated locally and used whenever fz_opts is
NULL.
|
|
If /Encrypt is not present on an updated xref of an encrypted document,
that document can no longer be opened. This is required for incremental
saving.
Note: Streams aren't encrypted by pdf_write_document and can't thus
currently be appended to an encrypted document. In that case, saving
non-incrementally will produce a working (non-encrypted) document.
|
|
|
|
|
|
Use of the feature is currently enabled only in the case that a file
that already contains xref streams is being updated incrementally. To
do so in that case is necessary because an old-style xref is then not
permitted.
This fixes bug #694527
|
|
|
|
We are testing this using a new -p flag to mupdf that sets a bitrate at
which data will appear to arrive progressively as time goes on. For
example:
mupdf -p 102400 pdf_reference17.pdf
Details of the scheme used here are presented in docs/progressive.txt
|
|
No more caching a flattened page tree in doc->page_objs/refs.
No more flattening of page resources, rotation and boxes.
Smart page number lookup by following Parent links.
Naive implementation of insert and delet page that doesn't rebalance the trees.
Requires existing page tree to hook into, cannot be used to create a page tree
from scratch.
|
|
|
|
|
|
When moving an object from one xref to a new xref, ensure firstly
that we only drop each object once (by setting it to NULL) and
secondly that it has the correct parent pointer.
|
|
Also add index argument to array_insert.
|
|
|
|
Correct the naming scheme for pdf_obj_xxx functions.
|
|
|
|
For historical reasons lots of the code uses "xref" when talking about
a pdf document. Now pdf_xref is a separate type this has become
confusing, so replace 'xref' with 'doc' for clarity.
|
|
Remove the fz_context field to avoid the structure growing.
|
|
|
|
|