|
This is an early import of a change from the forms branch to enable
me to tweak the cluster's mupdf build test line to avoid trying to
build the js components for mupdf builds.
|
|
For example "Symbol,Italic" can be handled as an artificially
obliqued "Symbol".
Fixes an issue in test file normal_161.pdf.
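As an illustration of the idea (helper and names hypothetical, not the
actual MuPDF code), the handling amounts to splitting the style suffix
off the requested name and faking the style with a shear:

    #include <string.h>

    /* Hypothetical sketch: split "Symbol,Italic" into base + style and
       fake the italic by obliquing the font matrix. */
    static void load_substitute_font(const char *reqname)
    {
        char base[64];
        const char *style = strchr(reqname, ',');
        size_t n = style ? (size_t)(style - reqname) : strlen(reqname);

        if (n >= sizeof base)
            n = sizeof base - 1;
        memcpy(base, reqname, n);
        base[n] = 0;

        /* load the builtin font for 'base' here... */

        if (style && !strcmp(style + 1, "Italic"))
        {
            /* Artificial oblique: shear by roughly 12 degrees. */
            float shear = 0.21256f; /* tan(12 degrees) */
            (void)shear; /* fold into the font matrix: [1 0 shear 1 0 0] */
        }
    }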
|
|
normal_178.pdf contains a monochrome black-and-white image, encoded
as 16bpc RGB.
|
|
When calculating the displaylist node rectangles, we were failing
to adjust for linewidth/mitre limit etc. This could result in glyphs
being clipped; see normal_130.pdf for example.
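Sketched with hypothetical names - a miter join can overshoot the path
by up to linewidth/2 times the miter limit, so the stored rect must
grow accordingly:

    typedef struct { float x0, y0, x1, y1; } rect;

    /* Sketch: grow a display-list node rect by the worst-case stroke
       expansion so stroked content is not cut off. */
    static rect adjust_rect_for_stroke(rect r, float linewidth, float miterlimit)
    {
        float expand = 0.5f * linewidth;
        if (miterlimit > 1)
            expand *= miterlimit; /* miter joins overshoot the path */
        r.x0 -= expand; r.y0 -= expand;
        r.x1 += expand; r.y1 += expand;
        return r;
    }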
|
|
In PDF the text rendering mode can have bit 2 set to mean "add to clipping
path". Experiments (and in particular normal_130.pdf) show that the text
should be stroked and/or filled BEFORE the path is added to the clipping
path.
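In terms of the Tr values 0-7 (4-7 have bit 2 set), the required
ordering looks like this sketch (helper names assumed):

    static void fill_text_path(void);         /* assumed helpers */
    static void stroke_text_path(void);
    static void add_text_path_to_clip(void);

    /* Sketch: decompose the text rendering mode, doing any stroking
       and filling before the glyphs join the clipping path. */
    static void flush_text(int Tr)
    {
        int dofill   = (Tr == 0 || Tr == 2 || Tr == 4 || Tr == 6);
        int dostroke = (Tr == 1 || Tr == 2 || Tr == 5 || Tr == 6);
        int doclip   = (Tr >= 4); /* bit 2 set: modes 4..7 */

        if (dofill)
            fill_text_path();
        if (dostroke)
            stroke_text_path();
        if (doclip)
            add_text_path_to_clip(); /* only after any fill/stroke */
    }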
|
|
This solves the normal_87.pdf rendering issues.
|
|
|
|
Remove unused variable, silencing compiler warning.
No need to initialize variables twice.
Remove initialization of unread variable.
Remove unnecessary check for NULL.
Close output file upon error in cmapdump.
|
|
This makes it easier to separate building of mupdf itself from
libraries, e.g. when running clang's scan-build.
|
|
Currently pdf_lexbufs use a static scratch buffer for parsing. In
the main case this is 64K in size, but in other cases it can be
just 256 bytes; this causes problems when parsing long strings.
Even the 64K limit is an implementation limit of Acrobat, not an
architectural limit of PDF.
Change here to allow dynamic buffers. This means a slightly more
complex setup and destruction for each buffer, but more importantly
requires correct cleanup on errors. To avoid having to insert
lots more try/catch clauses, this commit includes various changes to
the code so we reuse pdf_lexbufs where possible. This keeps the
speed up.
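A minimal sketch of the scheme (names hypothetical): start on a fixed
scratch area, spill to the heap on demand, and free the heap copy on
every path, including errors:

    #include <stdlib.h>
    #include <string.h>

    #define LEXBUF_SMALL 256

    typedef struct
    {
        char *scratch;            /* current buffer (fixed or heap) */
        size_t size;              /* capacity of scratch */
        size_t len;               /* bytes used */
        char fixed[LEXBUF_SMALL]; /* initial static storage */
    } lexbuf;

    static void lexbuf_init(lexbuf *lb)
    {
        lb->scratch = lb->fixed;
        lb->size = sizeof lb->fixed;
        lb->len = 0;
    }

    static int lexbuf_grow(lexbuf *lb)
    {
        size_t newsize = lb->size * 2;
        char *p;

        if (lb->scratch == lb->fixed)
        {
            p = malloc(newsize);
            if (p)
                memcpy(p, lb->fixed, lb->len);
        }
        else
            p = realloc(lb->scratch, newsize);
        if (!p)
            return -1;
        lb->scratch = p;
        lb->size = newsize;
        return 0;
    }

    /* Must run on error paths too - hence the extra cleanup care. */
    static void lexbuf_fin(lexbuf *lb)
    {
        if (lb->scratch != lb->fixed)
            free(lb->scratch);
    }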
|
|
|
|
|
|
|
|
|
|
These functions currently call pdf_array_put, but this fails to
extend the array. Change to use pdf_array_push instead.
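The distinction, in a generic sketch rather than the real pdf_obj
internals:

    #include <stdlib.h>

    typedef struct { void **items; int len, cap; } array;

    /* put: overwrite an existing slot; never grows the array. */
    static int array_put(array *a, int i, void *obj)
    {
        if (i < 0 || i >= a->len)
            return -1; /* out of range - this is why appends were lost */
        a->items[i] = obj;
        return 0;
    }

    /* push: append, growing as needed (allocation errors elided). */
    static void array_push(array *a, void *obj)
    {
        if (a->len == a->cap)
        {
            a->cap = a->cap ? a->cap * 2 : 8;
            a->items = realloc(a->items, a->cap * sizeof *a->items);
        }
        a->items[a->len++] = obj;
    }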
|
|
|
|
|
|
|
|
|
|
Harmless, since the context wasn't used, but confusing.
|
|
|
|
Make mudraw pass a cookie in to the rendering procedures. If any errors
are reported for any page, remember this, and set the return code to 1
on exit.
|
|
After commit 120dadb, it's far too easy to get into a seemingly infinite
loop while processing a corrupt file.
We fix this by changing the process to abort when we receive an invalid
keyword.
Also, we add another layer of nesting to pdf_run_stream to avoid
pushing/popping an fz_try level on every keyword.
|
|
Previously we had a special-case hack for MacOS. Now we call
sigsetjmp/siglongjmp on all platforms that define __unix
(i.e. pretty much all of them except Windows).
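Roughly, the switch amounts to (macro names illustrative):

    #include <setjmp.h>

    /* On unix, use the signal-aware variants, with savesigs=0 so the
       signal mask is not saved/restored on every setjmp. */
    #ifdef __unix
    #define fz_jmp_buf sigjmp_buf
    #define fz_setjmp(BUF) sigsetjmp(BUF, 0)
    #define fz_longjmp(BUF, VAL) siglongjmp(BUF, VAL)
    #else
    #define fz_jmp_buf jmp_buf
    #define fz_setjmp(BUF) setjmp(BUF)
    #define fz_longjmp(BUF, VAL) longjmp(BUF, VAL)
    #endif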
|
|
When we allocate a pixmap > 2G but < 4G, the index into that
pixmap, when calculated as an int, can be negative. Fix this with
various casts to unsigned int.
If we ever move to support >4G images we'll need to rejig the
casting to cast each part of the element to ptrdiff_t first.
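The failure mode and the fix, as a sketch:

    /* With w*h*n between 2G and 4G, (y * w + x) * n computed in int
       wraps negative. Casting each factor to unsigned int keeps the
       index correct up to 4G; beyond that, each part would need to
       become ptrdiff_t instead. */
    static unsigned char *pixel_ptr(unsigned char *samples,
        int x, int y, int w, int n)
    {
        return samples +
            ((unsigned int)y * (unsigned int)w + (unsigned int)x) *
            (unsigned int)n;
    }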
|
|
The file supplied with the bug contains corrupt jpeg data on page
61. This causes an error to be thrown which results in mudraw
exiting.
Previously, when image decode was done at loading time, the error
would have been thrown under the pdf interpreter rather than under
the display list renderer. This error would have been caught, a
warning given, and the program would have continued. This is not
ideal behaviour, as there is no way for a caller to know that there
was a problem, and that the image is potentially incomplete.
The solution adopted here solves both these problems. The fz_cookie
structure is expanded to include an 'errors' count. Whenever we meet
an error during rendering, we increment the 'errors' count and
continue.
This enables applications to spot the errors count being non-zero on
exit and to display a warning.
mupdf is updated here to pass a cookie in and to check the error count
at the end; if it is found to be non-zero, then a warning is given (just
once per visit to each page) to say that the page may have errors on it.
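The caller side then looks something like this (signatures approximate
for the API of this era):

    /* Render a page, counting soft errors instead of dying on them. */
    static int render_page_checked(fz_context *ctx, fz_document *doc,
        fz_page *page, fz_device *dev, int pagenum)
    {
        fz_cookie cookie = { 0 };

        fz_run_page(doc, page, dev, fz_identity, &cookie);

        if (cookie.errors > 0)
        {
            fz_warn(ctx, "page %d may contain errors: rendering may be incomplete",
                pagenum);
            return 1; /* caller sets the process exit code to 1 */
        }
        return 0;
    }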
|
|
When handling knockout groups, we have to copy the background from the
previous group in so we can 'knockout' properly. If the previous group
is a different colorspace, this gives us problems!
The fix, implemented here, is to update the copy_pixmap_rect function
to know how to copy between pixmaps of different depth.
Gray <-> RGB are the ones we really care about; the generic code will
probably do a horrible job, but shouldn't ever be called at present.
This suffices to stop the crashing - we will probably revisit this
when we revise the blending support.
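For the Gray -> RGB direction the per-row copy is just replication; a
sketch, not the actual copy_pixmap_rect code:

    /* Copy one row from a gray+alpha pixmap (2 bytes/pixel) into an
       rgb+alpha pixmap (4 bytes/pixel), replicating gray into r,g,b. */
    static void copy_row_g2rgb(unsigned char *dst, const unsigned char *src, int w)
    {
        while (w--)
        {
            dst[0] = dst[1] = dst[2] = src[0]; /* gray -> r,g,b */
            dst[3] = src[1];                   /* alpha unchanged */
            dst += 4;
            src += 2;
        }
    }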
|
|
Extend mupdfclean with a new -l flag that writes the file
linearized. This should still be considered experimental.
When writing a pdf file, analyse object use, flatten resource use,
reorder the objects, generate a hintstream and output with linearisation
parameters.
This is enough for Acrobat to accept the file as being optimised
for 'Fast Web View'. We ought to add more tables to the hintstream
in some cases, but I doubt anyone actually uses it, the spec is so
badly written.
Update fz_dict_put to allow for adding a reference to the dictionary
that is already the sole owner of that reference (i.e. don't drop then
keep something that has a reference count of just 1).
Update pdf_load_image_stream to use the stm_buf from the xref if there
is one.
Update pdf_close_document to discard any stm_bufs it may be holding.
Rename fz_dict_put to pdf_dict_put - this was missed in a renaming
ages ago and has been inconsistent since.
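The fz_dict_put point is the classic keep-before-drop ordering; a
generic sketch using the pdf_obj reference-counting calls:

    /* Sketch: when storing 'val' over '*slot', take the new reference
       before dropping the old one. If val == *slot with a refcount of
       1, dropping first would free it before we could keep it. */
    static void dict_put_slot(pdf_obj **slot, pdf_obj *val)
    {
        pdf_keep_obj(val);
        pdf_drop_obj(*slot);
        *slot = val;
    }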
|
|
|
|
|
|
|
|
Needs more work to use the linked list of free xref slots.
|
|
|
|
When including fitz.h from C++ files, we must not alter the definition
of inline, as it may upset code that follows it. We only alter the
definition to enable inline where it's available, and it's always
available in C++ - so simply not changing it in the C++ case gives us
what we want.
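The guard amounts to something like this sketch:

    /* Only redefine 'inline' for C compilers; C++ always has it, and
       redefining it there can break code included after fitz.h. */
    #ifndef __cplusplus
    #ifdef _MSC_VER
    #define inline __inline /* MSVC spells it differently in C */
    #elif !defined(__STDC_VERSION__) || __STDC_VERSION__ < 199901L
    #define inline /* pre-C99 compiler: no inline keyword at all */
    #endif
    #endif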
|
|
Make a separate constructor function that does not link in the
interpreter, so we can save space in the mubusy binary by not
including the font and cmap resources.
|
|
|
|
|
|
mupdfclean (or more correctly, the pdf_write function) currently has
a limitation, in that we cannot renumber objects when encryption is
being used. This is because the object/generation number is pickled
into the stream, and renumbering the object causes it to become
unreadable.
The solution used here is to provide extended functions that take both
the object/generation number and the original object/generation
number. The original object numbers are only used for setting up the
encryption.
pdf_write now keeps track of the original object/generation number
for each object.
This fix is important if we ever want to output linearized pdf, as
that requires us to be able to renumber objects into a very specific
order.
We also make a fix in removeduplicateobjects that should only
matter in the case where we fail to read an object correctly.
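A sketch of the extended interface (names hypothetical): the cipher
key comes from the original numbers, while the object is emitted under
its new number:

    typedef struct pdf_obj pdf_obj;

    void setup_crypt_for_object(int num, int gen);  /* assumed helpers */
    void emit_indirect_object(int num, int gen, pdf_obj *obj);

    /* Write a renumbered object in an encrypted file: the cipher key
       must come from the ORIGINAL num/gen the strings were encrypted
       with, not from the new position in the output file. */
    static void write_object_ex(pdf_obj *obj, int num, int gen,
        int orig_num, int orig_gen)
    {
        setup_crypt_for_object(orig_num, orig_gen);
        emit_indirect_object(num, gen, obj);
    }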
|
|
Also make page specification parsing in all tools look similar.
|
|
Keep texture position calculations in floats as long as possible, as
prematurely dropping back to ints can cause overflows in the
intermediate stages that don't nicely cancel out.
The fix for this makes 2000 or so bitmap differences, most trivial, but
with some progressions.
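A tiny illustration (not the actual draw code) of why the intermediate
int arithmetic hurts:

    /* BAD: converting intermediate terms to fixed-point int can
       overflow even when the final result is small. */
    /* int u = ((int)(u0 * 65536.0f) + (int)(du * 65536.0f) * x) >> 16; */

    /* BETTER: stay in float and truncate only at the end. */
    static int texel_u(float u0, float du, int x)
    {
        return (int)(u0 + du * (float)x);
    }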
|
|
Thanks to stu-mupdf@spacehopper.org
|
|
Currently, if a page stream cannot be read, mupdf gives an alert box
and then exits. This is annoying when reading a large pdf.
Here we change the code to only exit if a page is completely broken;
in the case of missing page contents, or missing links, we give a warning
and just render the best we can.
Also, update a couple of error messages to be less misleading.
|
|
Previously, before interpreting a page's content stream we would
load it entirely into a buffer, and then interpret that buffer.
This has a cost in memory use.
Here, we update the code to read from a stream on the fly.
This has required changes in various parts of the code.
Firstly, we have removed all use of the FILE lock - as stream
reads can now safely be interrupted by resource (or object) reads
from elsewhere in the file, the file lock becomes a very hard
thing to maintain, and doesn't actually benefit us at all. The
choices were to either use a recursive lock, or to remove it
entirely; I opted for the latter.
The file lock enum value remains as a placeholder for future use in
extendable data streams.
Secondly, we add a new 'concat' filter that concatenates a series of
streams together into one, optionally putting whitespace between each
stream (as the pdf parser requires this).
Finally, we change page/xobject/pattern content streams to work
on the fly, but we leave type3 glyphs using buffers (as presumably
these will be run repeatedly).
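The concat filter's read loop is conceptually simple; a sketch with a
hypothetical state struct:

    typedef struct
    {
        fz_stream **streams; /* the sub-streams, in order */
        int count, current;
    } concat_state;

    /* Drain each sub-stream in turn, emitting one whitespace byte
       between them so the pdf lexer never sees tokens fused across a
       stream boundary. */
    static int concat_read(concat_state *st, unsigned char *buf, int len)
    {
        int n = 0;
        while (n < len && st->current < st->count)
        {
            int c = fz_read_byte(st->streams[st->current]);
            if (c < 0) /* EOF: advance to the next sub-stream */
            {
                st->current++;
                if (st->current < st->count)
                    buf[n++] = ' ';
                continue;
            }
            buf[n++] = (unsigned char)c;
        }
        return n;
    }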
|
|
In order to (hopefully) allow page content streams to be interpreted
without having to preload them all into memory before we run them, we
need to make the stream reading code cope with other users moving
the stream pointer.
For example, consider the case where we are midway through
interpreting a contents stream and we hit an operator that
requires something to be read from Resources. This will move the
underlying stream's file pointer, causing the contents stream to
read incorrectly when control returns to the interpreter.
The solution to this seems to be fairly simple: whenever we create
a filter out of the file stream, the existing code puts in a 'null'
filter first to enforce a length limit on the stream. This null
filter already does most of the work we need: because it is there,
the buffering of data happens in the null filter rather than in the
underlying stream layer.
All we need to do is to keep track of where in the underlying stream
the null filter thinks it is, and ensure that it seeks there before
each read (in case anyone else has moved it).
We move the setting of the offset to be explicit in the pdf_open_filter
(and associated) call(s), rather than requiring fz_seeks elsewhere.
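A sketch of the resulting read path (struct and fields hypothetical):

    typedef struct
    {
        fz_stream *file; /* underlying file stream, shared with others */
        int offset;      /* where THIS filter believes the file is */
        int remain;      /* bytes left under the stream's length limit */
    } null_state;

    /* Re-seek before every read: another user (e.g. a resource load)
       may have moved the shared file pointer since our last read. */
    static int null_read(null_state *st, unsigned char *buf, int len)
    {
        int n;

        if (len > st->remain)
            len = st->remain;
        fz_seek(st->file, st->offset, 0 /* SEEK_SET */);
        n = fz_read(st->file, buf, len);
        if (n > 0)
        {
            st->offset += n;
            st->remain -= n;
        }
        return n;
    }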
|
|
The scale_row_from_temp code was broken. Firstly, the rounding was wrong
in the 'bulk' case (not a big deal), but more importantly, on
configurations where unaligned loads were not allowed (such as the nook),
we could still crash due to an incorrect test that was meant to avoid
that code.
Thanks to Kammerer for the report and for testing the fixed version.
|
|
|
|
This means pdfshow can show objects in encrypted PDFs again.
|
|
|
|
Zeniko points out that images that don't decode on demand (i.e. ones
that are held as pixmaps all the time) can never be evicted from the
cache, because they hold a pointer to the pixmap, which holds a pointer
back to the image.
His fix is to only cache images that decode on demand.
The actual patch applied here is slightly tweaked from his version:
firstly, the 'dontcache' logic is reversed to 'cache' to avoid
overloading my poor brain with another negation.
Secondly, one change to a condition is not adopted, as it is (I believe)
unnecessary.
|
|
|