Age | Commit message (Collapse) | Author |
|
|
|
That's where it's actually being used.
|
|
Don't depend on stdio.h for our own I/O functions.
|
|
|
|
A document handler normally only exposes a list of extensions and
mimetypes. Only formats that use some kind of extra detection mechnism
need to supply a recognize() callback, such as xps that can handle
.xps-files unpacked into a directory.
|
|
To make it possible to avoid casting in most cases.
|
|
Instead of having fz_new_XXXX(ctx, type, ...) macros that call
fz_new_XXXX_of_size etc, use fz_new_derived_...
Clearer naming, and doesn't clash with fz_new_document_writer.
|
|
Loading outlines wants to look up all link destinations, and doing the
normal link destination lookups triggers loading all page objects used.
This means we need to parse a lot of objects, which can be quite slow.
We can load the page tree faster by only looking at intermediate page
tree nodes. If we load the page tree and create a reverse lookup
table for use when loading the outline, we can speed up the time to
run 'mutool show pdfref17.pdf outline' from 900ms to 100ms.
|
|
|
|
|
|
Treat such calls as deleting the object, as per pdf_delete_object.
|
|
|
|
|
|
|
|
New PDF Portfolio manipulation API.
Simple mutool 'portfolio' tool for listing/extracting/embedding
files.
|
|
Also expose the argument to JS and JNI.
|
|
Move the definition of the structure contents into new fitz-imp.h
file. Make all code outside of fitz access the buffer through the
defined API.
Add a convenience API for people that want to get buffers as
null terminated C strings.
|
|
|
|
Add API to:
* allow enumeration of layer configs (OCCDs) within PDF files.
* allow selection of layer configs.
* allow enumeration of the "UI" (or "Human readable") form of layer
configs.
* allow selection/toggling of entries in the UI.
|
|
All seen in MSVC, mostly in 64bit builds.
|
|
|
|
All link destinations should be URIs, and a document specific function
can be called to resolve them to actual page numbers.
Outlines have cached page numbers as well as string URIs.
|
|
|
|
As fz_drop_*()/fz_free() all must handle NULL.
|
|
|
|
|
|
|
|
The spec says entries should be 20 bytes long. In practise we
see 19 byte long ones more often than we like. This is due to
the use of a single EOL char rather than 2.
The PCLm files I've seen use 19 byte ones, so update the code
to cope with these.
|
|
|
|
A PDF repair can be triggered 'just in time', when we encounter
a problem in the file. The idea is that this can happen without
the enclosing code being aware of it.
Thus the enclosing code may be holding 'borrowed' references
(such as those returned by pdf_dict_get()) at the time when the
repair is triggered. We are therefore at pains to ensure that
the repair does not replace any objects that exist already, so
that the calling code will not have these references unexpectedly
invalidated.
The sole exception to this is when we replace the 'Length' fields
in stream dictionaries with the actual lengths. Bug 697015 shows
exactly this situation causing a reference to become invalid.
The solution implemented here is to add an 'orphan list' to the
document, where we put these (hopefully few, small) objects. These
orphans are kept around until the document is closed.
|
|
A few commits back, we introduced the fz_key_storable concept
to allow us to cope with objects that were used both as values
within the store and as parts of keys within the store.
This commit worked, but showed up performance problems; when the
store has several million PDF objects in it, bulk changes (such
as dropping a display list or document) could trigger many passes
across the store.
We therefore introduce a mechanism to ameliorate this. These
passes, now known as "reap passes", can be batched together using
fz_defer_reap_start and fz_defer_reap_end.
We trigger this start/end around display list dropping, and around
PDF content stream processing. This should be fine, as deferral
will be interrupted if we ever run our of memory during mallocing.
|
|
|
|
|
|
When we destroy a PDF document, currently we bin everything
from the store. Instead, drop just the objects that are
specifically tied to that document.
Any object tied to the document has a pdf_obj with the
required document pointer in it as the key.
|
|
|
|
|
|
The file is HORRIBLY corrupt, and triggers Sophos to think it's
PDF malware (which it isn't). It does however trigger a use
after free, worked around here.
|
|
The code would SEGV if we were trying to synthesise an appearance
stream for an annotation, and the docs pdf resources table
had not been initialised.
We now intialise the pdf resource tables when we initialise a
pdf device. This is the earliest point we know we are going to
need them, and covers all cases.
|
|
This helps with Memento debugging, and looks neater.
|
|
Closing a device or writer may throw exceptions, but much of the
foreign language bindings (JNI and JS) depend on drop to never throw
an exception (exceptions in finalizers are bad).
|
|
We want to turn pdf_page into a thin wrapper around a pdf_obj, so that
any updates to the underlying PDF objects will be reflected without
having to reload the pdf_page.
|
|
|
|
The mark & sweep pass of garbage collection, and resolving indirect objects when grafting objects
was following the full chain of indirect references. In the unusual case where a numbered object
is itself only an indirect reference to another object, this intermediate numbered object would
be missed both when marking for garbage collection, and when copying objects for grafting.
Add a function to resolve only one step for these two uses.
The following is an example of a file that would break during garbage collection if we
follow full indirect reference chains:
%PDF-1.3
1 0 obj
<</Type/Catalog /Foo[2 0 R 3 0 R]>>
endobj
2 0 obj
4 0 R
endobj
3 0 obj
5 0 R
endobj
4 0 obj
<</Length 1>>
stream
A
endstream
endobj
5 0 obj
<</Length 1>>
stream
B
endstream
endobj
|
|
The generation number is only needed for decryption, and is assumed
to be zero or irrelevant for all other uses.
Store the original object number and generation in the xref slot, so
that we can decrypt them even when the objects have been renumbered,
without needing to pass the original object number around through
the stream loading APIs.
|
|
|
|
|
|
This silences the many warnings we get when building for x64
in windows.
This does not address any of the warnings we get in thirdparty
libraries - in particular harfbuzz. These look (at a quick
glance) harmless though.
|
|
|
|
|
|
|