Age | Commit message (Collapse) | Author |
|
|
|
|
|
|
|
|
|
|
|
Rename fz_write to fz_write_data.
Rename fz_write_buffer_* and fz_buffer_printf to fz_append_*.
Be consistent in naming:
fz_write_* calls write to fz_output.
fz_append_* calls append to fz_buffer.
Update documentation.
|
|
|
|
|
|
Commit a92f0db5987b408bef0d9b07277c8ff2329e9ce5
introduced a typo causing pdf_sort_dict() to try to
sort non-dict objects. Attempting to do this for
non-dict objects causes a segmentation fault. For
dictionary objects this causes a performance
degradation that has not been noticed.
pdf_sort_dict() is called in two places:
pdf_dict_get_put() and showgrep().
The resson that calling pdf_sort_dict() from
pdf_dict_get_put() does not cause a segmentation fault
is that pdf_dict_get_put() makes sure that the object
is a dictionary before calling pdf_sort_dict(), which
will then decide NOT to sort the dict keys.
showgrep() on the other hand does not make sure that it
is only processing dict objects before calling
pdf_sort_dict() which caused a segmentation fault.
|
|
pdf_array_delete and pdf_dict_put_val_null weren't calling this function.
|
|
It's only used to 'fix' duff indirect references when cleaning PDF
files. Writing general values into dictionaries should be done by key,
not by internal index.
|
|
Previously, attempting to put an object beyond the end of
an array would throw an error. Here we update the code to
allow objects to be placed *exactly* at the end (i.e. to
extend the length by 1).
Update js use of pdf_array_put.
|
|
|
|
A PDF repair can be triggered 'just in time', when we encounter
a problem in the file. The idea is that this can happen without
the enclosing code being aware of it.
Thus the enclosing code may be holding 'borrowed' references
(such as those returned by pdf_dict_get()) at the time when the
repair is triggered. We are therefore at pains to ensure that
the repair does not replace any objects that exist already, so
that the calling code will not have these references unexpectedly
invalidated.
The sole exception to this is when we replace the 'Length' fields
in stream dictionaries with the actual lengths. Bug 697015 shows
exactly this situation causing a reference to become invalid.
The solution implemented here is to add an 'orphan list' to the
document, where we put these (hopefully few, small) objects. These
orphans are kept around until the document is closed.
|
|
We call Memento_addRef etc in fz_keep_impXX functions, so
don't call them in the callers too.
|
|
|
|
|
|
|
|
Only direct PDF name objects should be used as arguments,
indirect PDF name objects cannot be used.
|
|
|
|
This avoids resolving object references which is
important for dictionary keys.
|
|
|
|
|
|
The mark & sweep pass of garbage collection, and resolving indirect objects when grafting objects
was following the full chain of indirect references. In the unusual case where a numbered object
is itself only an indirect reference to another object, this intermediate numbered object would
be missed both when marking for garbage collection, and when copying objects for grafting.
Add a function to resolve only one step for these two uses.
The following is an example of a file that would break during garbage collection if we
follow full indirect reference chains:
%PDF-1.3
1 0 obj
<</Type/Catalog /Foo[2 0 R 3 0 R]>>
endobj
2 0 obj
4 0 R
endobj
3 0 obj
5 0 R
endobj
4 0 obj
<</Length 1>>
stream
A
endstream
endobj
5 0 obj
<</Length 1>>
stream
B
endstream
endobj
|
|
|
|
|
|
This stops Bug693111.pdf giving errors.
|
|
This silences the many warnings we get when building for x64
in windows.
This does not address any of the warnings we get in thirdparty
libraries - in particular harfbuzz. These look (at a quick
glance) harmless though.
|
|
|
|
|
|
|
|
|
|
|
|
Use fz_output in debug printing functions.
Use fz_output in pdfshow.
Use fz_output in fz_trace_device instead of stdout.
Use fz_output in pdf-write.c.
Rename fz_new_output_to_filename to fz_new_output_with_path.
Add seek and tell to fz_output.
Remove unused functions like fz_fprintf.
Fix typo in pdf_print_obj.
|
|
This is work towards supporting several levels of incremental xref,
which in turn is work towards bug #696123. When several levels are
present, the operation will make a copy of the object and that needs
to be done before any change to the object.
|
|
This is work towards supporting several levels of incremental xref,
which in turn, is work towards bug #696123. When several levels of
incremental xref are present there can be objects that appear at
multiple levels and differ between those levels. This deep-copy function
will be used to create new copies before the new version is altered.
|
|
These headers are already included by mupdf/fitz/system.h.
|
|
Commit f533104 introduced optimized handling of pdf names, null, true
and false. That commit handles most object types correctly in
pdf_objcmp() but it does not correctly handle comparisons such as
pdf_objcmp("/Crypt", "true") or pdf_objcmp("null", "/Crypt").
Fixes one issue from bug 696012.
|
|
We were allocating the ofs array as ints and then filling it
with fz_off_t's.
|
|
Fir typo in pdf_dict_del. Issue and fix both provided by Willus
(William Menninger).
|
|
When doing pdf_dict_put, we first call pdf_dict_find to hunt for an
existing entry we can just update.
Recently we introduced a 'location' return from pdf_dict_find that
would (in the non-found case) return the location of where such an
entry should be inserted.
It's just dawned on me that we don't need a separate variable for
this. We continue to return negative numbers for 'not found', but
these negative numbers can contain the insertion point.
|
|
Sebras and Tor spotted that we could get occasional 'warning: cannot seek
backwards' messages. An example command that shows this is:
mutool show pdf_reference17.pdf grep
They further tracked the problem down to the 'sorted' side of the
pdf_dict_find function.
In the binary search, I calculate c to be the comparison value between
pairs of keys. In the case where both keys (names) are in the special
case 'known' range below PDF_OBJ__LIMIT, I use pointer arithmetic for
this. Unfortunately, I was forgetting that the compiler thinks that
pdf_obj *'s are 4 (or 8) bytes in size, so was doing (a-b)/4.
To workaround this I cast both keys to char *'s. This solves the bug.
Thanks to Sebras and Tor for doing the hard work in tracking this down.
|
|
If FZ_LARGEFILE is defined when building, MuPDF uses 64bit offsets
for files; this allows us to open streams larger than 2Gig.
The downsides to this are that:
* The xref entries are larger.
* All PDF ints are held as 64bit things rather than 32bit things
(to cope with /Prev entries, hint stream offsets etc).
* All file positions are stored as 64bits rather than 32.
The implementation works by detecting FZ_LARGEFILE. Some #ifdeffery
in fitz/system.h sets fz_off_t to either int or int64_t as appropriate,
and sets defines for fz_fopen, fz_fseek, fz_ftell etc as required.
These call the fseeko64 etc functions on linux (and so define
_LARGEFILE64_SOURCE) and the explicit 64bit functions on windows.
|
|
|
|
Faster, shinier, better.
|
|
Historically pdf_obj was a structure with a header and a union in it.
As time has gone by more stuff has been put into the header, and the
different arms of the union have changed in size. We've even adopted
the idea of different 'kinds' of pdf_obj's being different sizes
(names and strings for examples).
Here we rework the system slightly; we minimise the header, and split
out everything into different structures. Every different 'kind' of
pdf_obj is now it's own structure, just as big as it needs to be.
Key changes:
* refs is now a short rather than an int. We are never going
to need more than 32767 refs (indeed, if we ever need more
than about 3 (10 at the outside), something has gone very
wrong!). This aids structure packing.
* Only arrays, dicts and refs actually need the pdf_document
pointer.
* Only arrays and dicts need the parent_num pointer.
|
|
Currently, every PDF name is allocated in a pdf_obj structure, and
comparisons are done using strcmp. Given that we can predict most
of the PDF names we'll use in a given file, this seems wasteful.
The pdf_obj type is opaque outside the pdf-object.c file, so we can
abuse it slightly without anyone outside knowing.
We collect a sorted list of names used in PDF (resources/pdf/names.txt),
and we add a utility (namedump) that preprocesses this into 2 header
files.
The first (include/mupdf/pdf/pdf-names-table.h, included as part of
include/mupdf/pdf/object.h), defines a set of "PDF_NAME_xxxx"
entries. These are pdf_obj *'s that callers can use to mean "A PDF
object that means literal name 'xxxx'"
The second (source/pdf/pdf-name-impl.h) is a C array of names.
We therefore update the code so that rather than passing "xxxx" to
functions (such as pdf_dict_gets(...)) we now pass PDF_NAME_xxxx (to
pdf_dict_get(...)). This is a fairly natural (if widespread) change.
The pdf_dict_getp (and sibling) functions that take a path (e.g.
"foo/bar/baz") are therefore supplemented with equivalents that
take a list (pdf_dict_getl(... , PDF_NAME_foo, PDF_NAME_bar,
PDF_NAME_baz, NULL)).
The actual implementation of this relies on the fact that small
pointer values are never valid values. For a given pdf_obj *p,
if NULL < (intptr_t)p < PDF_NAME__LIMIT then p is a literal
entry in the name table.
This enables us to do fast pointer compares and to skip expensive
strcmps.
Also, bring "null", "true" and "false" into the same style as PDF names.
Rather than using full pdf_obj structures for null/true/false, use
special pointer values just above the PDF_NAME_ table. This saves
memory and makes comparisons easier.
|
|
Update buffer and filter processors.
Filter both colors and stroke states.
Move OCG hiding logic into interpreter.
|
|
MuPDF (and other PDF readers) treat invalid references as 'null'
objects. For instance, in the supplied file, object 239 is supposedly
free, but a reference is made to it.
When cleaning (or linearising) a file, we renumber objects; such
illegal refs then end up pointing somewhere else.
The workaround here is simply to spot the invalid refs during the
mark phase, and to set the referencing to null.
|
|
Purge several embedded contexts:
Remove embedded context in fz_output.
Remove embedded context in fz_stream.
Remove embedded context in fz_device.
Remove fz_rebind_stream (since it is no longer necessary).
Remove embedded context in svg_device.
Remove embedded context in XML parser.
Add ctx argument to fz_document functions.
Remove embedded context in fz_document.
Remove embedded context in pdf_document.
Remove embedded context in pdf_obj.
Make fz_page independent of fz_document in the interface.
We shouldn't need to pass the document to all functions handling a page.
If a page is tied to the source document, it's redundant; otherwise it's
just pointless.
Fix reference counting oddity in fz_new_image_from_pixmap.
|