Age | Commit message (Collapse) | Author |
|
Firstly, we avoid compressing streams if they get bigger.
Secondly, we ensure that we always update the Length field.
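The two rules above can be sketched as a small selection function. This is only an illustration, not the code from the commit: `stream_choice` and `choose_stream` are hypothetical names, and the compressed bytes are assumed to have been produced elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a stream that is about to be written. */
typedef struct {
    const unsigned char *data; /* bytes that will be written */
    size_t length;             /* value to store in the Length field */
    int deflated;              /* nonzero if the compressed form was chosen */
} stream_choice;

/* Pick between the raw and the compressed form of a stream. The
 * compressed form is used only when it is strictly smaller, and the
 * length is set on both paths so that it can never go stale. */
static stream_choice choose_stream(const unsigned char *raw, size_t raw_len,
                                   const unsigned char *packed, size_t packed_len)
{
    stream_choice out;
    assert(raw != NULL);
    if (packed != NULL && packed_len < raw_len) {
        out.data = packed;
        out.length = packed_len;
        out.deflated = 1;
    } else {
        out.data = raw;
        out.length = raw_len;
        out.deflated = 0;
    }
    return out;
}
```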
Seen as part of the investigation into bug 697092, though not the
actual cause. Thanks to Tor for the latter part of the fix.
|
|
Closing a device or writer may throw exceptions, but many of the
foreign language bindings (JNI and JS) depend on drop never throwing
an exception (exceptions in finalizers are bad).
|
|
The mark & sweep pass of garbage collection, and the resolution of indirect objects when grafting objects,
were following the full chain of indirect references. In the unusual case where a numbered object
is itself only an indirect reference to another object, this intermediate numbered object would
be missed both when marking for garbage collection, and when copying objects for grafting.
Add a function to resolve only one step for these two uses.
The following is an example of a file that would break during garbage collection if we
follow full indirect reference chains:
%PDF-1.3
1 0 obj
<</Type/Catalog /Foo[2 0 R 3 0 R]>>
endobj
2 0 obj
4 0 R
endobj
3 0 obj
5 0 R
endobj
4 0 obj
<</Length 1>>
stream
A
endstream
endobj
5 0 obj
<</Length 1>>
stream
B
endstream
endobj
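The difference between the two strategies can be shown on a toy model of the file above. This is a sketch under assumptions, not MuPDF code: the `target` table and all function names are hypothetical, and reference chains are assumed to be free of cycles.

```c
#include <assert.h>

#define NUM_OBJS 6   /* object numbers 0..5; 0 is unused */

/* Toy object table mirroring the example file. target[n] != 0 means
 * "object n is itself just the indirect reference `target[n] 0 R`";
 * target[n] == 0 means object n holds a direct value (dict or stream). */
static const int target[NUM_OBJS] = { 0, 0, 4, 5, 0, 0 };

/* Follow the whole chain: 2 -> 4 and 3 -> 5. */
static int resolve_full(int num)
{
    assert(num > 0 && num < NUM_OBJS);
    while (target[num] != 0)
        num = target[num];
    return num;
}

/* Mark only the end of the chain, as full resolution would. */
static void mark_full(int num, unsigned char used[NUM_OBJS])
{
    used[resolve_full(num)] = 1;
}

/* Mark every object on the way by resolving one step at a time. */
static void mark_one_step(int num, unsigned char used[NUM_OBJS])
{
    assert(num > 0 && num < NUM_OBJS);
    while (!used[num]) {
        used[num] = 1;
        if (target[num] == 0)
            break;
        num = target[num];
    }
}
```

Marking the catalog's two references with `mark_full` leaves objects 2 and 3 unmarked, so they would be swept even though object 1 still refers to them; `mark_one_step` marks 2, 3, 4 and 5.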
|
|
The generation number is only needed for decryption, and is assumed
to be zero or irrelevant for all other uses.
Store the original object number and generation in the xref slot, so
that we can decrypt them even when the objects have been renumbered,
without needing to pass the original object number around through
the stream loading APIs.
|
|
Allows us to remove the out parameter 'transform' from fz_begin_page.
|
|
This silences the many warnings we get when building for x64
in windows.
This does not address any of the warnings we get in thirdparty
libraries - in particular harfbuzz. These look (at a quick
glance) harmless though.
|
|
Makes it easier to chain function calls.
|
|
Broken due to refactoring. Thanks to Michael for spotting this.
|
|
Allow us to write a document to an fz_output as opposed to just a
filename.
We cannot write digital signatures in this method though. Will
ponder that in a later commit.
|
|
If a file cannot be saved incrementally, then don't accept that
as an option. In practice this means that if someone asks to save
a file incrementally, and it was repaired or it uses encryption,
then we throw an error.
Add a new function to ask if it's safe to save a file incrementally,
and use that in the appropriate places.
|
|
Use comma-separated list of flags and key/value pairs, for
example: "linearize,resolution=72,colorspace=gray"
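One way to read such an option string is a small lookup helper. This is a sketch, not the parser used by the library: `find_option` is a hypothetical name, and whitespace and quoting are assumed not to occur.

```c
#include <assert.h>
#include <string.h>

/* Look up `key` in a comma-separated option string such as
 * "linearize,resolution=72,colorspace=gray".
 * Returns 1 if the key is present; *val then points at the text after
 * '=' and *len holds its length (0 for a bare flag).
 * Returns 0 if the key is absent. */
static int find_option(const char *opts, const char *key,
                       const char **val, size_t *len)
{
    size_t klen;
    assert(opts != NULL && key != NULL && val != NULL && len != NULL);
    klen = strlen(key);
    while (*opts) {
        const char *end = strchr(opts, ',');
        size_t n = end ? (size_t)(end - opts) : strlen(opts);
        /* Match only a whole key: "color" must not match "colorspace". */
        if (n >= klen && strncmp(opts, key, klen) == 0 &&
            (n == klen || opts[klen] == '=')) {
            *val = (n == klen) ? opts + n : opts + klen + 1;
            *len = (n == klen) ? 0 : n - klen - 1;
            return 1;
        }
        opts += n;
        if (*opts == ',')
            opts++;
    }
    return 0;
}
```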
|
|
The handling of not-decompressing images/fonts was geared towards
pdfclean usage; but now that we can create new PDF files, it makes
more sense to ask for images and fonts to be compressed, rather than
asking for them not to be decompressed with quirky interaction with
the 'expand' and 'deflate' flags.
If -f or -i are set, we will never decompress images, and we will
compress them if they are uncompressed.
If -d is set, we will first decompress all streams (modulo -f or -i).
If -z is set, we will then compress all uncompressed streams.
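The interaction of the flags can be summarised per stream. The function below is a hypothetical model of the rules as stated, not the actual implementation; `keep` stands for "this stream is a font or image covered by -f or -i".

```c
#include <assert.h>

/* Decide whether one stream ends up compressed.
 * compressed: 1 if the stream is compressed in the input, else 0
 * keep:       nonzero when -f / -i applies to this stream
 * expand:     nonzero when -d is set
 * deflate:    nonzero when -z is set */
static int ends_up_compressed(int compressed, int keep, int expand, int deflate)
{
    assert(compressed == 0 || compressed == 1);
    if (keep)
        return 1;        /* never decompressed; compressed if it was not */
    if (expand)
        compressed = 0;  /* first step: -d decompresses */
    if (deflate)
        compressed = 1;  /* second step: -z compresses what is uncompressed */
    return compressed;
}
```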
|
|
If we rewrite a page content stream, and then drop that entire page
we shouldn't leak the buffer.
Or to put it another way, when we change the obj for an xref entry,
ditch the cached stm_buf.
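The rule in the second sentence can be sketched as follows. `xref_entry` and `update_entry_obj` are hypothetical, simplified stand-ins (the real entry holds more fields and uses the library's own allocator); the sketch assumes the entry owns its cached buffer but not the object.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified, hypothetical xref entry. */
typedef struct {
    void *obj;      /* parsed object (not owned in this sketch) */
    void *stm_buf;  /* cached stream contents, owned by the entry */
} xref_entry;

/* Replace the object of an entry. Any cached stream buffer belongs to
 * the old object, so it is released here instead of being leaked. */
static void update_entry_obj(xref_entry *e, void *new_obj)
{
    assert(e != NULL);
    free(e->stm_buf);   /* free(NULL) is a no-op */
    e->stm_buf = NULL;
    e->obj = new_obj;
}
```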
|
|
When printing a PDF object to a file, if it was a name, then we'd
output without a required \n. For example:
10 0 obj
/SomeNameOrOtherendobj
This would trip gs up.
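A sketch of the corrected output shape, assuming the object body has already been serialised to a string. `format_obj` is a hypothetical helper, not the function that was fixed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format "N G obj\n<body>\nendobj\n" into buf. A newline is always
 * placed between the body and the endobj keyword, unless the body
 * already ends with one, so a trailing name cannot run into "endobj".
 * Returns the number of characters snprintf reports. */
static int format_obj(char *buf, size_t size, int num, int gen, const char *body)
{
    size_t n;
    const char *sep;
    assert(buf != NULL && body != NULL);
    n = strlen(body);
    sep = (n > 0 && body[n - 1] == '\n') ? "" : "\n";
    return snprintf(buf, size, "%d %d obj\n%s%sendobj\n", num, gen, body, sep);
}
```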
|
|
In preparation for adding pdf_write_document, which writes a document
to a fz_output stream.
|
|
Separate naming of functions that save complete files to disk
from functions that write data to streams.
|
|
Use fz_output in debug printing functions.
Use fz_output in pdfshow.
Use fz_output in fz_trace_device instead of stdout.
Use fz_output in pdf-write.c.
Rename fz_new_output_to_filename to fz_new_output_with_path.
Add seek and tell to fz_output.
Remove unused functions like fz_fprintf.
Fix typo in pdf_print_obj.
|
|
When writing incremental xref streams, the opts->use_list entry of the last
object, i.e. of the xref stream object itself, was left uninitialized. This
resulted in a random value, 0 or 1, being written into the xref stream.
Also, always write a newline before the endstream keyword, as that shall be
done for xref streams and should be done for all other streams.
|
|
This fixes bug #696123 by allowing multiple signatures each to be written
to the document in a separate incremental update.
Add count num_incremental_sections to keep track of the number of
incremental sections.
Add xref_base, which can be set between 0 and num_incremental_sections
inclusive to access different versions of the document.
Add disallow_new_increments flag that stops new incremental sections
being provoked by the creation of an xref stream.
Move the unsaved_sigs list from the document structure to the xref
structure. With this commit in place, the lists will never grow beyond
length one, but we've maintained the list structure in case other cases
need supporting in the future.
Add an end offset field to the xref structure, so that during completion
of signatures the document lengths of the various incremental versions of
the document are available.
Factor out functions for storing unsaved signatures and for checking if
an object is an unsaved signature.
Do deep copy of objects that require the holding of several versions.
|
|
This is work towards bug #696123
|
|
In the incremental case, we should update ofs_list only when actually
writing an object to file.
This is work towards bug #696123.
|
|
We were allocating the ofs array as ints and then filling it
with fz_off_t's.
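A common way to avoid this class of bug is to size the allocation from the pointee rather than from a named type. The sketch below assumes a 64-bit offset type; `my_off_t` and `alloc_offsets` are hypothetical names and plain `calloc` stands in for the library allocator.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef int64_t my_off_t;  /* hypothetical stand-in for fz_off_t */

/* Allocate a zeroed offset array. Using sizeof *ofs keeps the element
 * size in step with the pointer type, so the array cannot end up being
 * sized for int while holding wider offsets. */
static my_off_t *alloc_offsets(size_t count)
{
    my_off_t *ofs;
    assert(count > 0);
    ofs = calloc(count, sizeof *ofs);
    return ofs;
}
```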
|
|
If FZ_LARGEFILE is defined when building, MuPDF uses 64bit offsets
for files; this allows us to open streams larger than 2Gig.
The downsides to this are that:
* The xref entries are larger.
* All PDF ints are held as 64bit things rather than 32bit things
(to cope with /Prev entries, hint stream offsets etc).
* All file positions are stored as 64bits rather than 32.
The implementation works by detecting FZ_LARGEFILE. Some #ifdeffery
in fitz/system.h sets fz_off_t to either int or int64_t as appropriate,
and sets defines for fz_fopen, fz_fseek, fz_ftell etc as required.
These call the fseeko64 etc functions on linux (and so define
_LARGEFILE64_SOURCE) and the explicit 64bit functions on windows.
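The type selection can be sketched as below. This is an illustration only: `demo_off_t`, `DEMO_OFF_MAX`, `offset_fits` and `to_offset` are hypothetical names, and the platform-specific seek and tell wrappers are left out.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Pick the offset type at build time, in the spirit of the
 * #ifdef described above. */
#ifdef FZ_LARGEFILE
typedef int64_t demo_off_t;
#define DEMO_OFF_MAX INT64_MAX
#else
typedef int demo_off_t;
#define DEMO_OFF_MAX INT_MAX
#endif

/* Can this file position be represented in the chosen offset type? */
static int offset_fits(int64_t pos)
{
    return pos >= 0 && pos <= DEMO_OFF_MAX;
}

/* Narrow a 64-bit position to the build's offset type. */
static demo_off_t to_offset(int64_t pos)
{
    assert(offset_fits(pos));
    return (demo_off_t)pos;
}
```

With a 32-bit `int` and no `FZ_LARGEFILE`, a position beyond 2GB fails `offset_fits`, which is the limitation the large-file build removes.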
|
|
The actual fix implemented here is to bail out of pdf_write_document
if we are updating incrementally and the file has not changed.
|
|
Move pdf-write.c over to calling fz_fprintf in all places where we need
printf, and fputs elsewhere.
|
|
Ensure that %010d works.
Ensure that we can output 64 bit values (%ll{d,u,x}).
Ensure that we can output size_t and fz_off_t (%z{d,u,x} and %Z{d,u,x}).
fz_off_t isn't defined yet (it will be introduced by a commit that
depends on this one), so for now, we put a stub definition in printf.c
that we will remove later.
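To show why zero-padded and 64-bit formats matter when writing PDF, here is a sketch that formats a classic cross-reference table entry with the standard C `snprintf`. It does not use the library's own printf, the custom `%Z` format is not shown, and `format_xref_entry` is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>

/* Format one classic xref table entry: a 10-digit zero-padded offset,
 * a 5-digit zero-padded generation, the entry type, and a two-byte
 * end of line, 20 bytes in total. The offset is passed as long long
 * so that positions beyond 2GB are printed correctly. */
static int format_xref_entry(char *buf, size_t size,
                             long long ofs, int gen, char type)
{
    assert(buf != NULL && ofs >= 0 && gen >= 0 && gen <= 65535);
    return snprintf(buf, size, "%010lld %05d %c \n", ofs, gen, type);
}
```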
|
|
Currently, every PDF name is allocated in a pdf_obj structure, and
comparisons are done using strcmp. Given that we can predict most
of the PDF names we'll use in a given file, this seems wasteful.
The pdf_obj type is opaque outside the pdf-object.c file, so we can
abuse it slightly without anyone outside knowing.
We collect a sorted list of names used in PDF (resources/pdf/names.txt),
and we add a utility (namedump) that preprocesses this into 2 header
files.
The first (include/mupdf/pdf/pdf-names-table.h, included as part of
include/mupdf/pdf/object.h), defines a set of "PDF_NAME_xxxx"
entries. These are pdf_obj *'s that callers can use to mean "A PDF
object that means literal name 'xxxx'"
The second (source/pdf/pdf-name-impl.h) is a C array of names.
We therefore update the code so that rather than passing "xxxx" to
functions (such as pdf_dict_gets(...)) we now pass PDF_NAME_xxxx (to
pdf_dict_get(...)). This is a fairly natural (if widespread) change.
The pdf_dict_getp (and sibling) functions that take a path (e.g.
"foo/bar/baz") are therefore supplemented with equivalents that
take a list (pdf_dict_getl(... , PDF_NAME_foo, PDF_NAME_bar,
PDF_NAME_baz, NULL)).
The actual implementation of this relies on the fact that small
pointer values are never valid values. For a given pdf_obj *p,
if NULL < (intptr_t)p < PDF_NAME__LIMIT then p is a literal
entry in the name table.
This enables us to do fast pointer compares and to skip expensive
strcmps.
Also, bring "null", "true" and "false" into the same style as PDF names.
Rather than using full pdf_obj structures for null/true/false, use
special pointer values just above the PDF_NAME_ table. This saves
memory and makes comparisons easier.
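The small-pointer trick can be illustrated with a miniature table. Everything here is hypothetical (`demo_obj`, the `DEMO_NAME_` macros, the two-entry table); it relies on the same assumption as the text, namely that small integer values converted to pointers never collide with real object addresses.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct demo_obj demo_obj;   /* opaque to callers */

/* Miniature name table; index 0 is reserved so that no name shares
 * its value with NULL. */
enum { NAME_NONE, NAME_Length, NAME_Type, NAME_LIMIT };
static const char *name_table[NAME_LIMIT] = { "", "Length", "Type" };

#define DEMO_NAME_Length ((demo_obj *)(intptr_t)NAME_Length)
#define DEMO_NAME_Type   ((demo_obj *)(intptr_t)NAME_Type)

/* True if p is a small tagged value rather than a real allocation. */
static int is_table_name(const demo_obj *p)
{
    intptr_t v = (intptr_t)p;
    return v > 0 && v < NAME_LIMIT;
}

/* Recover the text of a table name. */
static const char *name_string(const demo_obj *p)
{
    assert(is_table_name(p));
    return name_table[(intptr_t)p];
}

/* Map a string to its table pointer; the only place strcmp is needed. */
static demo_obj *intern_name(const char *s)
{
    int i;
    for (i = 1; i < NAME_LIMIT; i++)
        if (strcmp(name_table[i], s) == 0)
            return (demo_obj *)(intptr_t)i;
    return NULL;   /* a fuller version would allocate a real object here */
}
```

Once a name has been interned, testing whether a key is `/Length` becomes the single pointer comparison `key == DEMO_NAME_Length`.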
|
|
Update buffer and filter processors.
Filter both colors and stroke states.
Move OCG hiding logic into interpreter.
|
|
We were failing to allow for the change in length of the hint
stream caused by the ascii encoding when calculating offsets.
|
|
MuPDF (and other PDF readers) treat invalid references as 'null'
objects. For instance, in the supplied file, object 239 is supposedly
free, but a reference is made to it.
When cleaning (or linearising) a file, we renumber objects; such
illegal refs then end up pointing somewhere else.
The workaround here is simply to spot the invalid refs during the
mark phase, and to set them to null.
|
|
In order to be able to watermark etc, we want the ability to
add more operators/resources after page cleaning.
Add a post processing hook to enable this to be done more
easily.
|