Age | Commit message (Collapse) | Author |
|
|
|
|
|
|
|
|
|
|
|
When printing a PDF object to a file, if it was a name, then we'd
output without a required \n. For example:
10 0 obj
/SomeNameOrOtherendobj
This would trip gs up.
|
|
|
|
|
|
In preparation of adding pdf_write_document that writes a document
to a fz_output stream.
|
|
Separate naming of functions that save complete files to disk
from functions that write data to streams.
|
|
Use fz_output in debug printing functions.
Use fz_output in pdfshow.
Use fz_output in fz_trace_device instead of stdout.
Use fz_output in pdf-write.c.
Rename fz_new_output_to_filename to fz_new_output_with_path.
Add seek and tell to fz_output.
Remove unused functions like fz_fprintf.
Fix typo in pdf_print_obj.
|
|
When writing incremental xref streams, the opts->use_list entry of the last
object, i.e. of the xref stream object itself, was left uninitialized. This
resulted in a random value, 0 or 1, being written into the xref stream.
Also, always write a newline before the endstream keyword, as that shall be
done for xref streams and should be done for all other streams.
|
|
This fixes bug #696123 by allowing multiple signatures each to be written
to the document in a separate incemental update.
Add count num_incremental_sections to keep track of the number of
incremental sections.
Add xref_base, which can be set between 0 and num_incremental_sections
inclusive to access different versions of the document.
Add disallow_new_increments flag that stops new incremental sections
being provoked by the creation of an xref stream.
Move the unsaved_sigs list from the document structure to the xref
structure. With this commit in place, the lists will never grow beyond
length one, but we've maintained the list structure in case other cases
need supporting in the future.
Add an end offset field to the xref structure, so that during completion
of signatures the document length of the various incremental versions of
the document are available.
Factor out functions for storing unsaved signatures and for checking if
an object is an unsaved signature.
Do deep copy of objects that require the holding of several versions.
|
|
This is work towards bug #696123
|
|
In the incremental case, we should update ofs_list only when actually
writing an object to file.
This is work towards bug #696123.
|
|
We were allocating the ofs array as ints and then filling it
with fz_off_t's.
|
|
If FZ_LARGEFILE is defined when building, MuPDF uses 64bit offsets
for files; this allows us to open streams larger than 2Gig.
The downsides to this are that:
* The xref entries are larger.
* All PDF ints are held as 64bit things rather than 32bit things
(to cope with /Prev entries, hint stream offsets etc).
* All file positions are stored as 64bits rather than 32.
The implementation works by detecting FZ_LARGEFILE. Some #ifdeffery
in fitz/system.h sets fz_off_t to either int or int64_t as appropriate,
and sets defines for fz_fopen, fz_fseek, fz_ftell etc as required.
These call the fseeko64 etc functions on linux (and so define
_LARGEFILE64_SOURCE) and the explicit 64bit functions on windows.
|
|
|
|
|
|
The actual fix implemented here is to bale out of pdf_write_document
if we are updating incrementally and the file has not changed.
|
|
Move pdf-write.c over to calling fz_fprintf for all places in we need
printf, and fputs elsewhere.
|
|
Ensure that %010d works.
Ensure that we can output 64 bit values (%ll{d,u,x}).
Ensure that we can output size_t and fz_off_t (%z{d,u,x} and %Z{d,u,x}).
fz_off_t isn't defined yet (it will be introduced by a commit that
depends on this one), so for now, we put a stub definition in printf.c
that we will remove later.
|
|
Currently, every PDF name is allocated in a pdf_obj structure, and
comparisons are done using strcmp. Given that we can predict most
of the PDF names we'll use in a given file, this seems wasteful.
The pdf_obj type is opaque outside the pdf-object.c file, so we can
abuse it slightly without anyone outside knowing.
We collect a sorted list of names used in PDF (resources/pdf/names.txt),
and we add a utility (namedump) that preprocesses this into 2 header
files.
The first (include/mupdf/pdf/pdf-names-table.h, included as part of
include/mupdf/pdf/object.h), defines a set of "PDF_NAME_xxxx"
entries. These are pdf_obj *'s that callers can use to mean "A PDF
object that means literal name 'xxxx'"
The second (source/pdf/pdf-name-impl.h) is a C array of names.
We therefore update the code so that rather than passing "xxxx" to
functions (such as pdf_dict_gets(...)) we now pass PDF_NAME_xxxx (to
pdf_dict_get(...)). This is a fairly natural (if widespread) change.
The pdf_dict_getp (and sibling) functions that take a path (e.g.
"foo/bar/baz") are therefore supplemented with equivalents that
take a list (pdf_dict_getl(... , PDF_NAME_foo, PDF_NAME_bar,
PDF_NAME_baz, NULL)).
The actual implementation of this relies on the fact that small
pointer values are never valid values. For a given pdf_obj *p,
if NULL < (intptr_t)p < PDF_NAME__LIMIT then p is a literal
entry in the name table.
This enables us to do fast pointer compares and to skip expensive
strcmps.
Also, bring "null", "true" and "false" into the same style as PDF names.
Rather than using full pdf_obj structures for null/true/false, use
special pointer values just above the PDF_NAME_ table. This saves
memory and makes comparisons easier.
|
|
Update buffer and filter processors.
Filter both colors and stroke states.
Move OCG hiding logic into interpreter.
|
|
|
|
We were failing to allow for the change in length of the hint
stream caused by the ascii encoding when calculating offsets.
|
|
MuPDF (and other PDF readers) treat invalid references as 'null'
objects. For instance, in the supplied file, object 239 is supposedly
free, but a reference is made to it.
When cleaning (or linearising) a file, we renumber objects; such
illegal refs then end up pointing somewhere else.
The workaround here is simply to spot the invalid refs during the
mark phase, and to set the referencing to null.
|
|
In order to be able to watermark etc, we want the ability to
add more operators/resources after page cleaning.
Add a post processing hook to enable this to be done more
easily.
|
|
Purge several embedded contexts:
Remove embedded context in fz_output.
Remove embedded context in fz_stream.
Remove embedded context in fz_device.
Remove fz_rebind_stream (since it is no longer necessary).
Remove embedded context in svg_device.
Remove embedded context in XML parser.
Add ctx argument to fz_document functions.
Remove embedded context in fz_document.
Remove embedded context in pdf_document.
Remove embedded context in pdf_obj.
Make fz_page independent of fz_document in the interface.
We shouldn't need to pass the document to all functions handling a page.
If a page is tied to the source document, it's redundant; otherwise it's
just pointless.
Fix reference counting oddity in fz_new_image_from_pixmap.
|
|
Rename fz_close to fz_drop_stream.
Rename fz_close_archive to fz_drop_archive.
Rename fz_close_output to fz_drop_output.
Rename fz_free_* to fz_drop_*.
Rename pdf_free_* to pdf_drop_*.
Rename xps_free_* to xps_drop_*.
|
|
Currently each xref in the file results in an array from 0 to
num_objects. If we have a file that has been updated many times
this causes a huge waste of memory.
Instead we now hold each xref as a list of non-overlapping subsections
(exactly as the file holds them).
Lookup is therefore potentially slower, but only on files where the
xrefs are highly fragmented (i.e. where we would be saving in memory
terms).
Some parts of our code (notably the file writing code that does
garbage collection etc) assumes that lookups of object entry pointers
will not change previous object entry pointers that have been
looked up. To cope with this, and to cope with the case where we are
updating/creating new objects, we introduce the idea of a 'solid'
xref.
A solid xref is one where it has a single subsection record that spans
the entire range of valid object numbers for a file. Once we have
ensured that an xref is 'solid', we can safely work on the pointers
within it without fear of them moving.
We ensure that any 'incremental' xref is solid.
We also ensure that any non-incremental write makes the xref solid.
|
|
PDF documents aren't required to end in a linebreak. Objects however
must start on their own line (in particular for broken documents
relying on reparation). For this reason, a linebreak must be inserted
before starting an incremental update.
|
|
Return the null object rather than throwing an exception when parsing
indirect object references with negative object numbers.
Do range check for object numbers (1 .. length) when object numbers
are used instead.
Object number 0 is not a valid object number. It must always be 'free'.
|
|
pdf_write_document still writes the entire xref with references to all
freed objects even if the xref has been compacted which makes the
result of mutool clean -ggg larger than necessary.
|
|
New routine to filter the content streams for pages, xobjects,
type3 charprocs, patterns etc. The filtered streams are guaranteed
to be properly matched with q/Q's, and to not have changed the top
level ctm. Additionally we remove (some) repeated settings of
colors etc. This filtering can be extended to be smarter later.
The idea of this is to both repair after editing, and to leave the
streams in a form that can be easily appended to.
This is preparatory to work on Bates numbering and Watermarking.
Currently the streams produced are uncompressed.
|
|
Avoid negative indirections. Don't make indirections to objects that
aren't going to be used.
Also improve pdf-write.c so that it doesn't call renumberobj on objs
that are going to be dropped.
|
|
While attempting to debug a valgrind issue with:
013b2dcbd0207501e922910ac335eb59_asan_heap-oob_a59696_5952_500.pdf
I found that mutool -difggg on it failed with a SEGV. This is due to
us parsing an array with a large invalid indirection in it (e.g.
[123456789 0 R]) and then the renumbering code assuming this is valid
and accessing off the end of an array.
|
|
Some warnings we'd like to enable for MuPDF and still be able to
compile it with warnings as errors using MSVC (2008 to 2013):
* C4115: 'timeval' : named type definition in parentheses
* C4204: nonstandard extension used : non-constant aggregate initializer
* C4295: 'hex' : array is too small to include a terminating null character
* C4389: '==' : signed/unsigned mismatch
* C4702: unreachable code
* C4706: assignment within conditional expression
Also, globally disable C4701 which is frequently caused by MSVC not
being able to correctly figure out fz_try/fz_catch code flow.
And don't define isnan for VS2013 and later where that's no longer needed.
|
|
In order to prevent this from breaking again, a fz_write_options struct
with default values is allocated locally and used whenever fz_opts is
NULL.
|
|
If /Encrypt is not present on an updated xref of an encrypted document,
that document can no longer be opened. This is required for incremental
saving.
Note: Streams aren't encrypted by pdf_write_document and can't thus
currently be appended to an encrypted document. In that case, saving
non-incrementally will produce a working (non-encrypted) document.
|
|
|
|
|
|
Use of the feature is currently enabled only in the case that a file
that already contains xref streams is being updated incrementally. To
do so in that case is necessary because an old-style xref is then not
permitted.
This fixes bug #694527
|
|
|
|
We are testing this using a new -p flag to mupdf that sets a bitrate at
which data will appear to arrive progressively as time goes on. For
example:
mupdf -p 102400 pdf_reference17.pdf
Details of the scheme used here are presented in docs/progressive.txt
|
|
No more caching a flattened page tree in doc->page_objs/refs.
No more flattening of page resources, rotation and boxes.
Smart page number lookup by following Parent links.
Naive implementation of insert and delet page that doesn't rebalance the trees.
Requires existing page tree to hook into, cannot be used to create a page tree
from scratch.
|
|
|
|
|
|
When moving an object from one xref to a new xref, ensure firstly
that we only drop each object once (by setting it to NULL) and
secondly that it has the correct parent pointer.
|
|
Also add index argument to array_insert.
|