Currently each xref in the file results in an array from 0 to
num_objects. If we have a file that has been updated many times
this causes a huge waste of memory.
Instead we now hold each xref as a list of non-overlapping subsections
(exactly as the file holds them).
Lookup is therefore potentially slower, but only on files where the
xrefs are highly fragmented (i.e. exactly the files where we save
memory).
Some parts of our code (notably the file-writing code that does
garbage collection, etc.) assume that looking up an object entry
pointer will not move entry pointers that were looked up earlier.
To cope with this, and with the case where we are updating or
creating new objects, we introduce the idea of a 'solid' xref.
A solid xref is one that has a single subsection record spanning the
entire range of valid object numbers for a file. Once we have ensured
that an xref is 'solid', we can safely work on the pointers within it
without fear of them moving.
We ensure that any 'incremental' xref is solid.
We also ensure that any non-incremental write makes the xref solid.
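The subsection scheme and the 'solid' guarantee can be sketched in plain C. This is a simplified illustration, not MuPDF's code. The names xref, subsec, xref_entry, xref_lookup and xref_ensure_solid are hypothetical, and the sketch assumes that every subsection lies inside 0 .. num_objects-1. Lookup walks the list of runs. Making the xref solid copies every run into one table, which frees the old runs. Pointers obtained before that call are therefore invalid, while pointers obtained afterwards stay put.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified types for illustration; not MuPDF's own. */
typedef struct {
    char type;      /* 'n' = in use, 'f' = free, 0 = no entry yet */
    long offset;    /* byte offset of the object, if in use */
} xref_entry;

/* One contiguous run of entries, as a file's xref section lists it. */
typedef struct subsec {
    struct subsec *next;
    int start;               /* first object number in this run */
    int len;                 /* number of entries in this run */
    xref_entry *table;
} subsec;

typedef struct {
    int num_objects;         /* valid object numbers: 0 .. num_objects-1 */
    subsec *subsecs;         /* non-overlapping runs inside that range */
} xref;

/* Linear walk over the runs: cheap when there are few, slower when the
   xref is highly fragmented. Returns NULL if no run covers num. */
static xref_entry *xref_lookup(xref *x, int num)
{
    for (subsec *s = x->subsecs; s != NULL; s = s->next)
        if (num >= s->start && num < s->start + s->len)
            return &s->table[num - s->start];
    return NULL;
}

/* Make the xref 'solid': merge all runs into one covering
   0 .. num_objects-1. Pointers from earlier lookups are invalidated;
   pointers from later lookups stay valid until the xref is resized.
   Returns 0 on success, -1 if allocation fails. */
static int xref_ensure_solid(xref *x)
{
    subsec *s = x->subsecs;
    if (s && !s->next && s->start == 0 && s->len == x->num_objects)
        return 0;                                /* already solid */

    subsec *solid = malloc(sizeof *solid);
    xref_entry *table = calloc(x->num_objects > 0 ? (size_t)x->num_objects : 1,
                               sizeof *table);
    if (!solid || !table) {
        free(solid);
        free(table);
        return -1;
    }
    while (s) {                                  /* copy each run into place */
        subsec *next = s->next;
        memcpy(&table[s->start], s->table, (size_t)s->len * sizeof *table);
        free(s->table);
        free(s);
        s = next;
    }
    solid->next = NULL;
    solid->start = 0;
    solid->len = x->num_objects;
    solid->table = table;
    x->subsecs = solid;
    return 0;
}
```

Object numbers that no run covers have no entry until the xref is made solid. After that they exist as zero-filled slots that can be filled in when objects are created.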
|
|
pdf_lookup_page_loc_imp currently throws if any object in the page tree
is neither a /Pages node nor a /Page leaf. This unnecessarily rejects
slightly broken documents such as the ones from
https://code.google.com/p/sumatrapdf/issues/detail?id=2582 and
https://code.google.com/p/sumatrapdf/issues/detail?id=2608 .
pdf_count_pages_before_kid currently throws, wrongly, if a /Pages node
doesn't contain any kids and correctly states so (i.e. its /Count is
0), which even seems to be permitted by the PDF specification.
|
|
In load_sample_func, the stream is not closed and thus leaked if one of
the fz_read_byte or fz_read_bits calls throws (which might happen e.g.
on a Deflate data error).
In pdf_load_compressed_inline_image, the allocated buffer is not freed
if one of the stream initializers or the tile creation throws
(fz_open_leecher does not take ownership of the stream).
|
|
Use the actual ranges from the cpt-to-gid cmap to optimize the
remapping of ToUnicode cmaps from cpt-to-unicode into gid-to-unicode
format.
|
|
When inverting the CMap to create a ToUnicode, first check the actual
range of input characters rather than relying only on the codespace
range list.
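Both CMap commits avoid walking a whole codespace, which can be as large as 0..0xFFFF or more, when only a few ranges are populated. Below is a minimal sketch in the same spirit, not MuPDF's implementation. It inverts a code-to-CID mapping by visiting only the codes that actually occur in the range records. The names cmap_range and cmap_invert are hypothetical, unmapped outputs are left as 0, and it assumes no range ends at UINT_MAX.

```c
#include <stdlib.h>

/* Hypothetical range record: codes low..high map to out, out+1, ... */
typedef struct { unsigned low, high, out; } cmap_range;

/* Invert a code->cid mapping into a cid->code table. Only codes that
   appear in the ranges are visited; the codespace is never walked.
   Unmapped cids stay 0. Returns NULL for empty input or on OOM. */
static unsigned *cmap_invert(const cmap_range *r, int n, unsigned *table_len)
{
    unsigned max_out = 0;
    if (n <= 0)
        return NULL;
    for (int i = 0; i < n; i++) {          /* actual extent of the outputs */
        unsigned last = r[i].out + (r[i].high - r[i].low);
        if (last > max_out)
            max_out = last;
    }
    unsigned *table = calloc((size_t)max_out + 1, sizeof *table);
    if (!table)
        return NULL;
    for (int i = 0; i < n; i++)
        for (unsigned c = r[i].low; c <= r[i].high; c++)
            table[r[i].out + (c - r[i].low)] = c;
    *table_len = max_out + 1;
    return table;
}
```

For two small ranges such as 0x41..0x43 and 0x3000..0x3001, this touches five codes instead of 65536.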
|
|
Add a new class of errors and use them to abort interpretation when
the test device detects a color page.
|
|
Even though the encryption key length isn't supposed to be taken from
the encryption dictionary's /Length for crypt version 4, other readers
such as Adobe's still use that value if a crypt filter's /Length is
missing.
See https://code.google.com/p/sumatrapdf/issues/detail?id=2710 for a
document where this makes a difference (or simply remove /Length from
the crypt filter in any document encrypted with crypt version 4 and an
AESV2 crypt filter).
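The fallback order described here can be written as a small helper. This is a sketch: v4_key_length is a hypothetical name, and it assumes both inputs are already normalised to bits, with -1 meaning the entry is missing. The final default of 128 bits (the AESV2 key size) is also an assumption for documents that specify no length at all.

```c
/* Hypothetical helper for crypt version 4. Both arguments are key
   lengths already normalised to bits, or -1 when the entry is missing. */
static int v4_key_length(int cf_length, int dict_length)
{
    if (cf_length > 0)
        return cf_length;     /* the crypt filter's own /Length wins */
    if (dict_length > 0)
        return dict_length;   /* fallback used by other readers such as Adobe's */
    return 128;               /* assumed default, e.g. for AESV2 */
}
```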
|
|
If garbage is appended to an encrypted document, there could be another
trailer with /ID but without /Encrypt . The repairing code currently
always uses the last encountered values, but replacing the /ID value
alone can break decryption. One possible solution is to use the /ID
value only when there is none yet, when there is no /Encrypt, or when
the same trailer contains a matching /Encrypt.
See https://code.google.com/p/sumatrapdf/issues/detail?id=2697 for a
document which Adobe Reader is able to read but MuPDF isn't (MuPDF used
to read it before pdf_lex_no_string was introduced, but only by accident).
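The proposed rule reduces to one boolean condition. In this sketch, repair_state and accept_trailer_id are hypothetical names for state collected while scanning trailers during repair.

```c
#include <stdbool.h>

/* Hypothetical state gathered while scanning trailers during repair. */
typedef struct {
    bool have_id;       /* an /ID has already been taken */
    bool have_encrypt;  /* an /Encrypt has already been taken */
} repair_state;

/* May the /ID of the trailer just parsed replace the current one?
   Only if there is none yet, if there is no /Encrypt, or if this
   same trailer also carries the matching /Encrypt. */
static bool accept_trailer_id(const repair_state *st, bool trailer_has_matching_encrypt)
{
    return !st->have_id || !st->have_encrypt || trailer_has_matching_encrypt;
}
```

A trailer from appended garbage that has an /ID but no /Encrypt is then ignored once an encrypted document's /ID is known. Its /ID could not have been used to derive the key anyway.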
|
|
Certain glyphs, such as those found in nested Type 3 fonts, can't be cached.
Currently, the text extraction device doesn't see these as they're sent
only as drawing operations. Sending them also as invisible text fixes
potentially missing letters.
|
|
pdf_page::transparency is supposed to indicate whether a page uses PDF
transparency features. The checks are incomplete, though, which matters
for devices that require additional handling for transparency
(such as SumatraPDF's gdiplus_device).
See https://code.google.com/p/sumatrapdf/issues/detail?id=2107 and
https://code.google.com/p/sumatrapdf/issues/detail?id=2540 for
example documents.
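The commit does not list which checks were missing. For background, the PDF specification's transparency features include soft masks, constant alpha below 1, and blend modes other than /Normal and /Compatible, all of which can be set through an ExtGState. Below is a sketch of such a check on a hypothetical pre-parsed ExtGState (ext_gstate and gstate_uses_transparency are illustrative names). It does not cover transparency groups, annotations, or /BM given as an array.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, pre-parsed view of one ExtGState dictionary. */
typedef struct {
    const char *blend_mode;  /* /BM name, or NULL if absent (arrays not handled) */
    double stroke_alpha;     /* /CA, 1.0 if absent */
    double fill_alpha;       /* /ca, 1.0 if absent */
    bool soft_mask;          /* /SMask present and not /None */
} ext_gstate;

/* True if this graphics state asks for any transparency feature. */
static bool gstate_uses_transparency(const ext_gstate *gs)
{
    if (gs->soft_mask)
        return true;
    if (gs->stroke_alpha < 1.0 || gs->fill_alpha < 1.0)
        return true;
    if (gs->blend_mode != NULL &&
        strcmp(gs->blend_mode, "Normal") != 0 &&
        strcmp(gs->blend_mode, "Compatible") != 0)
        return true;
    return false;
}
```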
|
|
If a PDF document is encrypted but broken, repairing caches all
strings in encrypted form. Clearing the xref after repairing
ensures that strings are returned to API callers as expected.
Cf. https://code.google.com/p/sumatrapdf/issues/detail?id=2610
|
|
fmt_obj calculates whether a string is better hex-encoded or written
using escapes. Due to a bug, '\0' is counted as a two-character escape
like '\n', when it actually has to be written as '\000'. Since UTF-16
strings tend to contain many '\0' bytes, their octal-escaped form is
much longer than their hex-encoded form, yet the miscounted cost made
fmt_obj prefer it.
The cause is that strchr also matches the implicit trailing '\0' of
its first argument, so '\0' has to be special-cased first.
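In standard C, strchr treats the terminating null character as part of the string, so strchr(set, '\0') never returns NULL. The sketch below shows the special case and a simplified cost comparison. The names short_escapes, literal_cost and prefer_hex are hypothetical, and escaping every byte outside 32..126 as octal is an assumption of this sketch, not necessarily what fmt_obj does.

```c
#include <string.h>

/* Bytes that have a two-character escape in a PDF literal string. */
static const char short_escapes[] = "\n\r\t\b\f()\\";

/* Bytes needed to write c inside a literal (...) string. The '\0' test
   must come first: strchr(short_escapes, '\0') matches the terminator. */
static int literal_cost(unsigned char c)
{
    if (c == '\0')
        return 4;                       /* written as \000 */
    if (strchr(short_escapes, c))
        return 2;                       /* e.g. \n, \( */
    if (c < 32 || c >= 127)
        return 4;                       /* other octal escapes (assumed) */
    return 1;
}

/* Prefer hex <...> when it is shorter than the escaped literal form.
   Both totals include the two delimiters. */
static int prefer_hex(const unsigned char *s, size_t n)
{
    size_t literal = 2, hex = 2 + 2 * n;
    for (size_t i = 0; i < n; i++)
        literal += literal_cost(s[i]);
    return hex < literal;
}
```

For the UTF-16BE string FE FF 00 41 00 42, the literal form costs 20 bytes and the hex form 14, so hex wins.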
|
|
PDF documents aren't required to end in a linebreak. Objects, however,
must start on their own line (in particular for broken documents that
rely on repair). For this reason, a linebreak must be inserted before
starting an incremental update.
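One way to do this with stdio is to check the file's last byte before appending. This is a sketch: ensure_trailing_newline is a hypothetical helper. It assumes the file is open for update in binary mode (e.g. "r+b") and that seeking relative to SEEK_END works, as it does on POSIX systems.

```c
#include <stdio.h>

/* Hypothetical helper: make sure a file about to receive an incremental
   update ends in a linebreak, so the first appended object starts on its
   own line. Leaves the file positioned at its end.
   Returns 0 on success, -1 on error. */
static int ensure_trailing_newline(FILE *f)
{
    if (fseek(f, 0, SEEK_END) != 0)
        return -1;
    long size = ftell(f);
    if (size < 0)
        return -1;
    if (size > 0) {
        if (fseek(f, -1, SEEK_END) != 0)
            return -1;
        int last = fgetc(f);
        if (last == '\n' || last == '\r')
            return fseek(f, 0, SEEK_END) == 0 ? 0 : -1;
    }
    /* a positioning call is required between reading and writing */
    if (fseek(f, 0, SEEK_END) != 0)
        return -1;
    return fputc('\n', f) == EOF ? -1 : 0;
}
```

The helper is idempotent: a second call finds the newline and appends nothing.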
|
|
Android NDK
(security issue because a variable is used as a format string with no parameters).
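The usual fix for this class of warning is to pass the variable as an argument to a constant "%s" format. GCC and Clang report the unsafe form under -Wformat-security. If the variable contains a '%', using it as the format makes printf-style functions read arguments that were never passed, which is undefined behaviour. The helper name format_message is hypothetical.

```c
#include <stdio.h>

/* Safe: msg is data, never interpreted as a format. The unsafe form
   would be snprintf(buf, n, msg). Returns the untruncated length. */
static int format_message(char *buf, size_t n, const char *msg)
{
    return snprintf(buf, n, "%s", msg);
}
```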
|
|
When throwing an error during fz_alpha_from_gray, the stack depth
can get confused. Fix this by moving some more code into the
appropriate fz_try().
In the course of fixing this bug, I added some new optional debug
code to display the stack level as it runs. It is committed here in a
disabled state; just change the appropriate #define in draw-device.c
to enable it.
Also, add some code to run_xobject, to avoid throwing in an fz_always()
clause.
|
|
Return the null object rather than throwing an exception when parsing
indirect object references with negative object numbers.
Do a range check on object numbers (1 .. length) when object numbers
are used instead.
Object number 0 is not a valid object number. It must always be 'free'.
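The range check can be expressed as a one-line predicate. This is a sketch with a hypothetical name. It assumes xref_len counts entries including object 0, so valid numbers run from 1 to xref_len - 1. A caller would resolve a reference that fails the check to the null object instead of throwing.

```c
/* Hypothetical check, assuming xref_len counts entries including
   object 0. Object 0 is always free and negative numbers are never
   valid object numbers. */
static int is_valid_object_number(int num, int xref_len)
{
    return num > 0 && num < xref_len;
}
```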
|
|
Replace the DroidSansFallback TTF files with a TTC that has two fonts:
The original and a copy where the OpenType 'vert' substitution
lookup has been pre-applied by copying the uniXXXX.vert glyph data
to uniXXXX.
|
|
pdf_create_document leaks the trailer, and in pdf-device.c many objects
are inserted into dictionaries using pdf_dict_puts and then leaked,
instead of being inserted with pdf_dict_puts_drop.
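The difference between the two calls is an ownership convention. A plain "put" takes its own reference, so the caller must still drop theirs. A "put_drop" hands the caller's reference over. The sketch below models this with a hypothetical refcounted obj and a one-slot dict. It is an analogy for the convention, not MuPDF's pdf_obj implementation.

```c
#include <stdlib.h>

/* Hypothetical refcounted object; not MuPDF's pdf_obj. */
typedef struct { int refs; } obj;

static obj *obj_new(void)
{
    obj *o = malloc(sizeof *o);
    if (o)
        o->refs = 1;                  /* the creator owns one reference */
    return o;
}
static obj *obj_keep(obj *o) { if (o) o->refs++; return o; }
static void obj_drop(obj *o) { if (o && --o->refs == 0) free(o); }

/* A one-slot "dictionary" standing in for a real one. */
typedef struct { obj *value; } dict;

/* Like a plain put: the dictionary takes its own reference, so the
   caller still owns (and must eventually drop) theirs. */
static void dict_put(dict *d, obj *v)
{
    obj *old = d->value;
    d->value = obj_keep(v);           /* keep first, in case v == old */
    obj_drop(old);
}

/* Like a put_drop: the caller's reference is handed over. */
static void dict_put_drop(dict *d, obj *v)
{
    dict_put(d, v);
    obj_drop(v);
}
```

Calling dict_put on a freshly created object and never dropping it leaves the count one too high, so the object is never freed. That is the leak pattern described above.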
|
|
...like the one Microsoft Word generates.
|
|
Various functions (such as fz_begin_group) handle errors internally
by use of the error_depth parameter. This means that if we call
them, we MUST ensure that we call the appropriate closing function.
Similarly, if we don't call them, we should NOT call the closing
function.
In order to ensure we do this correctly, we introduce a cleanup_state
variable that says which ones we tried to call.
This cures the original bug.
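The cleanup_state idea amounts to a bit set recording which opening calls were actually made, so that exactly the matching closing calls run. In this sketch, begin_group/end_group/begin_mask/end_mask and run_item are hypothetical stand-ins, not MuPDF's device API. The counters exist only so the example can check that opens and closes balance.

```c
/* Bit flags recording which "begin" calls were actually made. */
enum { BEGAN_GROUP = 1, BEGAN_MASK = 2 };

static int begin_calls, end_calls;    /* instrumentation for the example */

static void begin_group(void) { begin_calls++; }
static void end_group(void)   { end_calls++; }
static void begin_mask(void)  { begin_calls++; }
static void end_mask(void)    { end_calls++; }

/* need_group/need_mask stand in for run-time decisions; stop_early
   simulates an error that cuts the opening sequence short. */
static void run_item(int need_group, int need_mask, int stop_early)
{
    int cleanup_state = 0;

    if (need_group) {
        begin_group();
        cleanup_state |= BEGAN_GROUP;
    }
    if (!stop_early && need_mask) {
        begin_mask();
        cleanup_state |= BEGAN_MASK;
    }
    /* ... drawing would happen here ... */

    /* close in reverse order, and only what was actually opened */
    if (cleanup_state & BEGAN_MASK)
        end_mask();
    if (cleanup_state & BEGAN_GROUP)
        end_group();
}
```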
|
|
Without this, comparefiles/Bug695086 renders the barcode test upside
down.
|
|
Fixes bug introduced in commit 1679c1e7a89ae62260fd84ce55c6bef376c6e6ba:
Optimize UniXXX CMap files.
|
|
See bug 693314 (file Z23-04.pdf) for an example file.
|
|
key length.
This reverts commit b1ed116091b790223a976eca2381da2875341e10.
The key length for V==2 must be 40 <= length <= 128.
The key length for V==4 is not taken from the /Length entry.
|
|
Split common parts into separate CMap files and include them with usecmap.
This reduces the size of the compiled-in CMap resources from 3 MB to 2 MB.
|
|
Extending the existing data structure to 32-bit values would bloat the data
tables too much.
Simplify the data structure and use three separate range tables for lookups --
one with small 16-bit to 16-bit range lookups, one with 32-bit range lookups,
and a final one for one-to-many lookups.
This loses the range-to-table optimization we had before, but even with the
extra ranges this necessitates, the total size of the compiled binary CMap data
is smaller than if we were to extend the previous scheme to 32 bits.
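A three-table lookup of this kind could look as follows. The layouts and names (range16, range32, mrange, cmap_tables, cmap_lookup) are hypothetical, not MuPDF's actual structures. The sketch assumes each range table is sorted by start code with no overlaps, and it tries the compact 16-bit table first, then the 32-bit table, then the one-to-many list.

```c
#include <stddef.h>

/* Hypothetical layouts for the three tables. */
typedef struct { unsigned short low, high, out; } range16;   /* 16 -> 16 bit */
typedef struct { unsigned low, high, out; } range32;         /* 32-bit ranges */
typedef struct { unsigned code; int len; const unsigned *out; } mrange; /* 1 -> many */

typedef struct {
    const range16 *r16; size_t n16;
    const range32 *r32; size_t n32;
    const mrange *mr; size_t nm;
} cmap_tables;

/* Binary search; each range table is sorted by low with no overlaps. */
static const range16 *find16(const range16 *t, size_t n, unsigned c)
{
    size_t l = 0, r = n;
    while (l < r) {
        size_t m = l + (r - l) / 2;
        if (c < t[m].low) r = m;
        else if (c > t[m].high) l = m + 1;
        else return &t[m];
    }
    return NULL;
}

static const range32 *find32(const range32 *t, size_t n, unsigned c)
{
    size_t l = 0, r = n;
    while (l < r) {
        size_t m = l + (r - l) / 2;
        if (c < t[m].low) r = m;
        else if (c > t[m].high) l = m + 1;
        else return &t[m];
    }
    return NULL;
}

/* Map code c. Stores up to max outputs in out and returns how many
   outputs the code maps to (0 if unmapped). */
static int cmap_lookup(const cmap_tables *t, unsigned c, unsigned *out, int max)
{
    const range16 *a = find16(t->r16, t->n16, c);
    if (a) {
        if (max > 0) out[0] = a->out + (c - a->low);
        return 1;
    }
    const range32 *b = find32(t->r32, t->n32, c);
    if (b) {
        if (max > 0) out[0] = b->out + (c - b->low);
        return 1;
    }
    for (size_t i = 0; i < t->nm; i++) {
        if (t->mr[i].code == c) {
            for (int k = 0; k < t->mr[i].len && k < max; k++)
                out[k] = t->mr[i].out[k];
            return t->mr[i].len;
        }
    }
    return 0;
}
```

Keeping most entries in the 6-byte range16 records, rather than widening every record to 32 bits, is what keeps the compiled data small.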
|
|
Remove obsolete Adobe-Japan-2 based CMaps.
|
|
pdf_write_document still writes the entire xref, with references to all
freed objects, even if the xref has been compacted. This makes the
output of mutool clean -ggg larger than necessary.
|
|
... instead convert a JPEG2000 used as a soft mask into grayscale.
This is more robust than trusting the PDF specified colorspace over
the internal JPX colorspace.
The spec implies that in a colorspace conflict, the internal JPX
colorspace should be used.
The PDF colorspace may be a DeviceN or Separation colorspace.
DeviceN and Separation colorspaces are not valid destination
colorspaces, so we may not always be able to convert the internal
JPX colorspace into the PDF specified colorspace.
Converting from the internal colorspace into grayscale is more robust,
and solves the issue that the original commit was intended to fix.
|
|
OpenType CFF fonts are detected as TYPE1 by ft_kind.
Relaxing the test for when to load a CIDToGIDMap lets us load
it even for OpenType fonts.
|
|
Don't print the code point number, to let the inhibition of multiple
identical warnings kick in.
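The inhibition only works when consecutive messages are byte-for-byte identical, so a varying detail such as a code point defeats it. The sketch below collapses consecutive duplicates. It is a hypothetical warning sink (warn_state, warn) written in the spirit of the inhibition mentioned here, not MuPDF's implementation.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical warning sink that collapses consecutive duplicates. */
typedef struct {
    char last[256];   /* last message printed */
    int repeats;      /* identical messages suppressed since then */
    int printed;      /* lines actually emitted (for the example) */
} warn_state;

static void warn(warn_state *w, const char *msg)
{
    if (w->printed > 0 && strcmp(w->last, msg) == 0) {
        w->repeats++;                 /* identical: suppress */
        return;
    }
    if (w->repeats > 0) {
        fprintf(stderr, "warning: ... repeated %d times ...\n", w->repeats);
        w->printed++;
    }
    fprintf(stderr, "warning: %s\n", msg);
    w->printed++;
    snprintf(w->last, sizeof w->last, "%s", msg);
    w->repeats = 0;
}
```

Formatting "no glyph for U+%04X" with a different code point each time produces a distinct message on every call, so nothing is ever suppressed.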
|
|
NoExport (and ReadOnly) fields shouldn't mark the document for saving.
|
|
After rushing to get the fix for a crash in, I realised the
routine could be simplified a bit.
|
|
Michael spotted that closing an fz_stream twice on an inline image
does bad things. The simple fix is not to close it twice.
|
|
Split functions out of pdf-form.c that shouldn't be there, and make
javascript initialization explicit.
|
|
Add a simpler choice of JavaScript library to the makefiles.
In order of preference: MuJS, JavaScriptCore, V8, or none, based
on HAVE_MUJS, HAVE_JSCORE and HAVE_V8.
For simplicity, we build mujstest even with no JavaScript implementation.
|