Age | Commit message (Collapse) | Author |
|
PDFDocEncoding for crypt revisions <= 4, UTF-8 for newer.
|
|
|
|
Added primarily for use by SumatraPDF.
Thanks to zeniko.
|
|
This is faster on ARM in particular. The primary changes involve
fz_matrix, fz_rect and fz_bbox.
Rather than passing 'fz_rect r' into a function, we now consistently
pass 'const fz_rect *r'. Where a rect is passed in and modified, we
miss the 'const' off. Where possible, we return the pointer to the
modified structure to allow 'chaining' of expressions.
The basic upshot of this work is that we do far fewer copies of
rectangle/matrix structures, and all the copies we do are explicit.
This has opened the way to other optimisations, also performed in
this commit.
Rather than using expressions like:
fz_concat(fz_scale(sx, sy), fz_translate(tx, ty))
we now have fz_pre_{scale,translate,rotate} functions. These
can be implemented much more efficiently than doing the fully
fledged matrix multiplication that fz_concat requires.
We add fz_rect_{min,max} functions to return pointers to the
min/max points of a rect. These can be used to in transformations
to directly manipulate values.
With a little casting in the path transformation code we can avoid
more needless copying.
We rename fz_widget_bbox to the more consistent fz_bound_widget.
|
|
|
|
|
|
|
|
If a colorspace refers to itself as a base, we can get an infinite
recursion and hence stack overflow. Thanks to zeniko for pointing out
that this occurs in embedded CMAPs and stitching functions. Also
solved here.
To avoid having to keep a long list of the objects we've traversed
through, extend the pdf_dict_mark functions to work on all pdf objects,
and hence rename them as pdf_obj_mark etc. Thanks to zeniko again for
feedback on this way of working.
Problem found in a test file, 3882.pdf.SIGSEGV.99.3204 supplied
by Mateusz "j00ru" Jurczyk and Gynvael Coldwind of the Google
Security Team using Address Sanitizer. Many thanks!
|
|
|
|
Only Fade, Wipe and Blinds supported so far.
Hit 'p' in the viewer to go into 'presentation' mode. Page swaps
then transition from page to page. Pages auto advance until key
or mouse is used.
|
|
On the whole we avoid using const within MuPDF, but bug 693350
highlights cases where this can cause a problem with C++.
In C, if you do: foo("bar"); then "bar" has type char *.
In C++, if you do foo("bar"); then "bar" has type const char *.
This means that any calls to the MuPDF library from C++ that take
strings give warnings.
The fix is simple, so it seems to be worthwhile adding a few consts.
None of our internal data structures are affected in any way by
this change.
Thanks to Franz Fellner for pointing out this issue.
|
|
Does the same as pdf_dict_puts, but guarantees to always drop the
value passed in (even if the function throws an error). This
allows calling code to have a much simpler life in some cases.
Update pdf_write to make use of this.
Also, fix pdf_dict_puts so it doesn't leak the key object that it
creates in the event of an error while growing the dictionary.
|
|
Conflicts:
pdf/pdf_xref_aux.c
|
|
Conflicts:
cbz/mucbz.c
pdf/pdf_parse.c
pdf/pdf_form.c
xps/xps_zip.c
|
|
Use a "magic" string for filetype detection: filename or mime-type.
|
|
Should have been pdf_new_name ever since the pre 1.0 rename, but
evidently we missed it.
|
|
Conflicts:
pdf/mupdf-internal.h
pdf/pdf_font.c
|
|
Also change first argument from fz_context to pdf_document in each
of pdf_to_utf8, pdf_to_utf8_name, pdf_to_ucs2 and pdf_to_ucs2_name
|
|
|
|
Rather than having a dedicated call to enumerate the rectangles for the
annotations on a page, add an interface for enumerating annotations
with accessor functions. Currently the only accessor function is
the one to get the annotation rectangle.
Use this new scheme in place of fz_bound_annots within mudraw.
Also use this scheme to set the caret cursor in the viewer when over
a data field.
|
|
We add a new fz_bound_annots function (and associated pdf_bound_annots
function) that calls a given callback with the page rectangle of the
annotations on a given page. This is marked as being a 'temporary'
function, so we can remove it/change it in future if required. It
seems likely that we'll want to have some sort of 'iterate over
annotations' function eventually, and this does the job for now.
Add a -j flag to mudraw that outputs a simple mujstest script.
For each page with annotations, the script jumps to that page, then
for each annotation on the page, it sets some text to be entered,
and clicks the annotation. In the case of text fields, this will cause
the text to be entered into that text field; in the case of buttons
it will execute the button.
At the end of each page with annotations, the script is told to
snapshot the page.
These test scripts are not designed to be full tests, but they do
at least provide an easy way for us to generate scripts where
every field in our test suite is interacted with.
|
|
|
|
|
|
Extend mupdfclean to have a new -l file that writes the file
linearized. This should still be considered experimental
When writing a pdf file, analyse object use, flatten resource use,
reorder the objects, generate a hintstream and output with linearisaton
parameters.
This is enough for Acrobat to accept the file as being optimised
for Fast Web View. We ought to add more tables to the hintstream
in some cases, but I doubt anyone actually uses it, the spec is so
badly written. Certainly acrobat accepts the file as being optimised
for 'Fast Web View'.
Update fz_dict_put to allow for us adding a reference to the dictionary
that is the sole owner of that reference already (i.e. don't drop then
keep something that has a reference count of just 1).
Update pdf_load_image_stream to use the stm_buf from the xref if there
is one.
Update pdf_close_document to discard any stm_bufs it may be holding.
Update fz_dict_put to be pdf_dict_put - this was missed in a renaming
ages ago and has been inconsistent since.
|
|
Needs more work to use the linked list of free xref slots.
|
|
Make a separate constructor function that does not link in the
interpreter, so we can save space in the mubusy binary by not
including the font and cmap resources.
|
|
Divides large format pdfs into a new pdf with multiple pages, that tile
the original PDF.
|
|
Expose pdf_write function through the document interface.
|
|
Debug printing functions: debug -> print.
Accessors: get noun attribute -> noun attribute.
Find -> lookup when the returned value is not reference counted.
pixmap_with_rect -> pixmap_with_bbox.
We are reserving the word "find" to mean lookups that give ownership
of objects to the caller. Lookup is used in other places where the
ownership is not transferred, or simple values are returned.
The rename is done by the sed script in scripts/rename3.sed
|
|
|
|
Attempt to separate public API from internal functions.
|
|
Currently, we are in the slightly strange position of having
the PDF specific object types as part of fitz. Here we pull
them out into the pdf layer instead. This has been made possible
by the recent changes to make the store no longer be tied to
having fz_obj's as keys.
Most of this work is a simple huge rename; to help customers who
may have code that use such functions we have provided a sed
script to do the renaming; scripts/rename2.sed.
Various other small tweaks are required; the store used to have
some debugging code that still required knowledge of fz_obj
types - we extract that into a nicer 'type' based function
pointer. Also, the type 3 font handling used to have an fz_obj
pointer for type 3 resources, and therefore needed to know how
to free this; this has become a void * with a function to free
it.
|
|
More changes still to come.
|
|
|
|
A huge amount (20%+ on some files) of our runtime is spent in
fz_atof. A survey of results on the net suggests we will get
much better speed by writing our own atof.
Part of the job of doing this involves parsing the string to
identify the component parts of the number - ludicrously, we
are already doing this as part of the lexing process, so it
would make sense to do the atoi/atof as part of this process.
In order to do this, we need somewhere to store the lexed
results; rather than add a float * and an int * to every single
pdf_lex call, we generalise the calls to pass a pdf_lexbuf *
pointer instead of separate buffer/max/string length pointers.
This should help us overall.
|
|
Introduce a new 'fz_image' type; this type contains rudimentary
information about images (such as native, size, colorspace etc)
and a function to call to get a pixmap of that image (with a
size hint).
Instead of passing pixmaps through the device interface (and
holding pixmaps in the display list) we now pass images instead.
The rendering routines therefore call fz_image_to_pixmap to get
pixmaps to render, and fz_pixmap_drop those afterwards.
The file format handling routines therefore need to produce
images rather than pixmaps; xps and cbz currently just wrap
pixmaps as images. PDF is more involved.
The stream handling routines in PDF have been altered so that
they can recognise when the last stream entry in a filter
dictionary is an image decoding filter. Rather than applying
this filter, they read and store the parameters into a
pdf_image_params structure, and stop decoding at that point.
This allows us to read the compressed data for an image into
memory as a block. We can then restart the image decode process
later.
pdf_images therefore consist of the compressed image data for
images. When a pixmap is requested for such an image, the code
checks to see if we have one (of an appropriate size), and if
not, decodes it.
The size hint is used to determine whether it is possible to
subsample the image; currently this is only supported for
JPEGs, but we could add generic subsampling code later.
In order to handle caching the produced images, various changes
have been made to the store and the underlying hash table.
Previously the store was indexed purely by fz_obj keys; we don't
have an fz_obj key any more, so have extended the store by adding
a concept of a key 'type'. A key type is a pointer to a set of
functions that keep/drop/compare and make a hashable key from
a key pointer.
We make a pdf_store.c file that contains functions to offer the
existing fz_obj based functions, and add a new 'type' for keys
(based on the fz_image handle, and the subsample factor) in the
pdf_image.c file.
While working on this, a problem became apparent in the existing
store codel; fz_obj objects had no protection on their reference
counts, hence an interpreter thread could try to alter a ref count
at the same time as a malloc caused an eviction from the store.
This has been solved by using the alloc lock as protection. This in
turn requires some tweaks to the code to make sure we don't try
and keep/drop fz_obj's from the store code while the alloc lock is
held.
A side effect of this work is that when a hash table is created, we
inform it what lock should be used to protect its innards (if any).
If the alloc lock is used, the insert method knows to drop/retake it
to allow it to safely expand the hash table. Callers to the hash
functions have the responsibility of taking/dropping the appropriate
lock, and ensuring that they cope with the possibility that insert
might drop the alloc lock, causing race conditions.
|
|
|
|
|
|
More aesthetically pleasing version.
|
|
Word space should only be applied when the codepoint is 32, and
is read from a single byte encoding region. Ghostscript gets
this wrong too.
|
|
|
|
|
|
Require that clients call pdf_needs_password/pdf_authenticate_password
instead. For dumb clients, we still allow for decrypting a file with
a blank password without calling those functions.
|
|
When we moved over to a context based system, we laid the foundation
for a thread-safe mupdf. This commit should complete that process.
Firstly, fz_clone_context is properly implemented so that it
makes a new context, but shares certain sections (currently
just the allocator, and the store).
Secondly, we add locking (to parts of the code that have
previously just had placeholder LOCK/UNLOCK comments). Functions
to lock and unlock a mutex are added to the allocator structure;
omit these (as is the case today) and no multithreading is
(safely) possible. The context will refuse to clone if these are
not provided.
Finally we flesh out the LOCK/UNLOCK comments to be real calls of
the functions - unfortunately this requires us to plumb fz_context
into the fz_keep_storable function (and all the fz_keep_xxx
functions that call it). This is the largest section of the patch.
No changes expected to any test files.
|
|
Some Type 3 fonts contain glyphs that rely on inheriting various
aspects of the graphics state from their calling code. (i.e. a
glyph might use d0, then fill an area without setting a color
first).
While the spec is vague on this point, we believe that technically
it is invalid. Previously mupdf defaulted all elements of the graphic
state back when beginning to draw the glyph. This does not match
what Acrobat does though, so we change the approach taken.
We now watch (by use of bits in the device flags word) for the use
of parts of the graphics state before it is set. If such use is
detected, then we note that the glyph is 'uncacheable' and render
it direct.
This seems to match Acrobats behaviour.
|
|
Every xobject keeps a reference to the object from whence
it came. This is marked/unmarked as it is executed.
Thanks to Zeniko for spotting the potential problem.
|
|
Move coordinate space tweaks into pdf_ and xps_run_page, and provide
neutral pdf_ and xps_bound_page functions to return the page size as
a zero-origined bounding box.
|
|
|
|
A new 'cookie' parameter is added to page rendering/interpretation
functions. Supply this as NULL to get existing behaviour.
If you supply a non-NULL cookie, then this is taken as a pointer to
a struct that can be used for simple, non-thread locked communication
between caller and library.
The entire struct should be memset to zero before entry, except for
specific flags (thus coping with future extensions to this struct).
The abort flag should be zero on entry. It will be checked periodically
by the library - if the caller sets it non-zero (via another thread)
then the current operation will be aborted. No guarantees are given as
to how often this will be checked, or how fast it will be responded to.
The progress_max field will be set to an integer (-1 for unknown)
representing the number of 'things' to do. The progress field will
count up from 0 to this number as time goes by. No guarantees are
made as to the accuracy of this information, but it should be
useful for offering some sort of progress bar etc. Note that
progress_max may increase during the job.
In general, callers should be careful to accept out of range or
invalid data in this structure as this is deliberately
accessed 'unlocked'.
|
|
Move 'kind' into the fz_link_dest structure (as this makes more sense).
Put an fz_link_dest rather than just a page number into the outlines
structure.
Correct parsing of actions and dests from pdf outlines.
|