Age | Commit message | Author |
|
Certain glyphs, such as those found in nested Type 3 fonts, can't be
cached. Currently, the text extraction device doesn't see these, as
they are sent only as drawing operations. Sending them also as
invisible text fixes the potentially missing letters.
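For illustration, the shape of the fix might look like this (a minimal
sketch; run_type3_glyph_contents is a hypothetical helper, and the
device calls assume the fz_device interface of this era):

    /* When a Type 3 glyph cannot be cached, render its content
     * stream as graphics as before, but also report the glyph as
     * invisible text so text extraction devices still see it. */
    static void
    show_uncached_type3_glyph(fz_device *dev, fz_text *text, const fz_matrix *ctm)
    {
        /* Draw the glyph's contents (hypothetical helper). */
        run_type3_glyph_contents(dev, text, ctm);

        /* Drawing devices ignore this call; the text extraction
         * device records the character from it. */
        fz_ignore_text(dev, text, ctm);
    }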
|
|
Android NDK: fix a security issue where a variable was used as a
format string with no parameters.
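For illustration, the general shape of this class of bug (a generic C
example, not the exact call site):

    /* Unsafe: if 'message' contains '%' sequences, printf will
     * interpret them as conversion specifiers, reading (or with
     * %n, writing) memory it should not touch. */
    printf(message);

    /* Safe: a constant format string; the variable is passed as
     * data, never interpreted. */
    printf("%s", message);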
|
|
When an error is thrown during fz_alpha_from_gray, the stack depth
can get confused. Fix this by moving some more code into the
appropriate fz_try().
In the course of fixing this bug, I added some new optional debug
code to display the stack level as it runs. This is committed here
disabled; just change the appropriate #define in draw-device.c to
enable it.
Also, add some code to run_xobject, to avoid throwing in an fz_always()
clause.
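For illustration, the idiom being enforced here (a minimal sketch of
MuPDF's exception macros, not the actual fz_alpha_from_gray code;
'size', 'w' and 'h' are placeholders):

    fz_pixmap *alpha = NULL;
    unsigned char *tmp = NULL;

    fz_var(alpha);
    fz_var(tmp);

    fz_try(ctx)
    {
        /* Everything that can throw lives inside fz_try, so a
         * throw unwinds to the matching fz_catch and the exception
         * stack depth stays balanced. */
        tmp = fz_malloc(ctx, size);
        alpha = fz_new_pixmap(ctx, NULL, w, h); /* NULL = alpha only */
        /* ... fill alpha from the gray pixmap ... */
    }
    fz_always(ctx)
    {
        /* Cleanup only: throwing from here is exactly what
         * confuses the stack depth. */
        fz_free(ctx, tmp);
    }
    fz_catch(ctx)
    {
        fz_drop_pixmap(ctx, alpha);
        fz_rethrow(ctx);
    }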
|
|
Various functions (such as fz_begin_group) handle errors internally
by use of the error_depth parameter. This means that if we call
them, we MUST ensure that we call the appropriate closing function.
Similarly, if we don't call them, we should NOT call the closing
function.
In order to ensure we do this correctly, we introduce a cleanup_state
variable that says which ones we tried to call.
This cures the original bug.
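For illustration, the pattern (a sketch; the flag values are
illustrative, not the real ones):

    int cleanup_state = 0; /* bit set => we attempted the begin call */

    fz_try(ctx)
    {
        cleanup_state |= 1; /* mark BEFORE calling, so a throw from
                             * inside still triggers the matching end */
        fz_begin_group(dev, &rect, isolated, knockout, blendmode, alpha);
        /* ... run the grouped content ... */
    }
    fz_always(ctx)
    {
        /* Call fz_end_group if and only if fz_begin_group was
         * attempted; it balances error_depth internally. */
        if (cleanup_state & 1)
            fz_end_group(dev);
    }
    fz_catch(ctx)
    {
        fz_rethrow(ctx);
    }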
|
|
Without this, comparefiles/Bug695086 renders the barcode test upside
down.
|
|
Don't print the code point number, so that the suppression of
multiple identical warnings can kick in.
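For illustration (the message text is made up; the suppression of
repeated identical warnings is in fz_warn):

    /* Every distinct code point makes a distinct message, so each
     * one is printed separately: */
    fz_warn(ctx, "cannot encode character %d", cp);

    /* With a fixed message, consecutive duplicates collapse into a
     * single "repeated N times" warning: */
    fz_warn(ctx, "cannot encode character");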
|
|
Previously, the pdf_process buffer implementation did not understand
inline images. In order to make this work without needlessly
duplicating complex code from within pdf-op-run, the parsing of
inline images has been moved to happen in pdf-interpret.c. When the
op_table entry for BI is called, it now expects the inline image to
be in csi->img and the dictionary object to be in csi->obj.
To make this work, we have had to improve the handling of inline images
in general. While non-inline images have been loaded and held in
memory in their compressed form and only decoded when required, until
now we have always loaded and decoded inline images immediately. This
has been due to the difficulty in knowing how many bytes of data to
read from the stream - we know the length of the stream once
uncompressed, but relating this to the compressed length is hard.
To cure this we introduce a new type of filter stream, a 'leecher'.
We insert a leecher stream before we build the filters required to
decode the image. We then read and discard the appropriate number
of uncompressed bytes from the filters. This pulls the compressed
data through the leecher stream, which stores it in an fz_buffer.
Thus images are now always held in their compressed forms in memory.
The pdf-op-run implementation is now trivial. The only real complexity
in the pdf-op-buffer implementation is the need to ensure that the
/Filter entry in the dictionary object matches the exact point at
which we backstopped the decompression.
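For illustration, the flow described above (a sketch: fz_open_leecher
is my name for the new filter's constructor, build_image_filters
stands in for the real filter-chain construction, and
uncompressed_len for the computed decoded size):

    fz_buffer *buf = fz_new_buffer(ctx, 1024);
    fz_stream *leech = fz_open_leecher(file, buf);
    fz_stream *filters = build_image_filters(ctx, leech, dict);
    int i;

    /* Pull the known number of decoded bytes through the filter
     * chain and discard them; as a side effect the leecher copies
     * every compressed byte it passes through into 'buf'. */
    for (i = 0; i < uncompressed_len; i++)
        if (fz_read_byte(filters) == EOF)
            break;

    fz_close(filters);
    /* 'buf' now holds the image's data in compressed form. */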
|
|
Currently fz_streams have a 4K buffer within their header. The call
to read from a stream fills this buffer, resulting in more data being
pulled from any underlying stream than we might like. This causes
problems with the forthcoming 'leech' filter.
Here we simplify the fields available in the public stream header.
No specific buffer is given; simply the read and write pointers.
The underlying 'read' function is replaced by a 'next' function
that makes the next block of data available and returns the first
character of it (or EOF).
A caller to the 'next' function should supply the maximum number of
bytes that it knows it will need (possibly not now, but eventually).
This enables the underlying stream to efficiently decode just enough.
The underlying stream is free to return fewer bytes, or more, if it
wants to.
The exact size of the 'block' of data returned will depend on the
filter in use and (possibly) the data therein.
Callers can get the currently available amount of data by calling
fz_available (but again should pass the maximum amount of data they know
they will need). The only time this will ever return 0 is if we have
hit EOF.
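For illustration, the simplified header and a read in terms of it (a
sketch; field layout and names are approximate):

    struct fz_stream_s
    {
        unsigned char *rp, *wp;               /* read and write pointers */
        int (*next)(fz_stream *stm, int max); /* make the next block
                                               * available; return its
                                               * first byte or EOF */
        /* ... state, seek, close, refcounting ... */
    };

    /* Reading a byte: consume the current window if non-empty,
     * otherwise ask the filter for its next block. 'max' is the
     * caller's hint of how much it will eventually need. */
    static inline int
    read_byte_sketch(fz_stream *stm, int max)
    {
        if (stm->rp != stm->wp)
            return *stm->rp++;
        return stm->next(stm, max);
    }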
|
|
Currently, when parsing, each time we encounter a name, we throw away
the last name we had. BDC operators are called with:
/Name <object> BDC
If the <object> is a name, we lose the original /Name.
To fix this, parsing a name when we already have one stored will
cause the new name to be stored as an object. This has various
knock-on effects throughout the code, which must now read from
csi->obj rather than csi->name.
Also, ensure that when cleaning, we collect a list of the object
names in our new resources dictionary.
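For illustration, the parsing rule (a sketch; the surrounding lexer
details and constructor arguments are approximate):

    case PDF_TOK_NAME:
        if (csi->name[0])
        {
            /* We already hold /Name, so this second name must be
             * the <object> operand: keep it in csi->obj instead of
             * overwriting csi->name. */
            pdf_drop_obj(csi->obj);
            csi->obj = pdf_new_name(doc, buf->scratch);
        }
        else
            fz_strlcpy(csi->name, buf->scratch, sizeof(csi->name));
        break;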
|
|
Currently the only processing we can do of PDF pages is to run
them through an fz_device. We introduce new "pdf_process"
functionality here to enable us to do more things.
We define a pdf_processor structure with a set of function
pointers in it, one per PDF operator, together with functions
for processing xobjects etc. The guts of pdf_run_page_contents
and pdf_run_annot are then extracted to give
pdf_process_page_contents and pdf_process_annot, and the
originals are implemented in terms of these.
This commit contains just one instance of a pdf_processor, namely
the "run" processor, which contains the original code refactored.
The graphical state (and device pointer) is now part of the private
data of the run operator set, rather than being in pdf_csi.
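For illustration, the rough shape of the new interface (member and
callback names are illustrative):

    typedef struct pdf_processor_s pdf_processor;

    struct pdf_processor_s
    {
        void *state; /* processor-private data; for the "run"
                      * processor this holds the graphical state
                      * and the fz_device pointer */

        /* one callback per PDF operator */
        void (*op_BI)(pdf_processor *proc, fz_image *img, pdf_obj *dict);
        void (*op_BDC)(pdf_processor *proc, const char *tag, pdf_obj *raw);
        void (*op_Do)(pdf_processor *proc, pdf_obj *xobj);
        /* ... q, Q, cm, gs, Tj, ... */
    };

    /* The interpreter parses a content stream once and dispatches
     * through whichever processor it is given: "run" feeds an
     * fz_device, while other processors can rewrite or analyse. */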
|