Age | Commit message | Author |
|
|
|
When adding code to spot identical streams, I got the logic in
a test reversed as a result of a last-minute change. Corrected here.
Thanks to zeniko for pointing this out.
|
|
When writing PDF files, we currently have the option to remove duplicate
copies of objects, but all streams are treated as being different.
Here we add the option to spot duplicate streams too.
Based on a patch submitted by Heng Liu. Many thanks!
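For illustration, a minimal sketch of the kind of comparison this implies,
written against the current MuPDF API shape; the helper name and exact
signatures here are assumptions, not the code from the patch:

#include <string.h>
#include "mupdf/pdf.h" /* header layout as in current MuPDF */

/* Treat two objects as duplicates only if their raw (still encoded)
 * stream bytes are identical. If either stream cannot be read, play
 * safe and report them as different. */
static int
streams_are_identical(fz_context *ctx, pdf_document *doc, int a, int b)
{
    fz_buffer *abuf = NULL, *bbuf = NULL;
    int same = 0;

    fz_var(abuf);
    fz_var(bbuf);

    fz_try(ctx)
    {
        unsigned char *adata, *bdata;
        size_t alen, blen;

        abuf = pdf_load_raw_stream_number(ctx, doc, a);
        bbuf = pdf_load_raw_stream_number(ctx, doc, b);
        alen = fz_buffer_storage(ctx, abuf, &adata);
        blen = fz_buffer_storage(ctx, bbuf, &bdata);
        same = (alen == blen && memcmp(adata, bdata, alen) == 0);
    }
    fz_always(ctx)
    {
        fz_drop_buffer(ctx, abuf);
        fz_drop_buffer(ctx, bbuf);
    }
    fz_catch(ctx)
        same = 0; /* unreadable stream: treat as different */

    return same;
}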
|
|
If a colorspace refers to itself as a base, we can get infinite
recursion and hence a stack overflow. Thanks to zeniko for pointing out
that this also occurs in embedded CMaps and stitching functions; those
cases are solved here too.
To avoid having to keep a long list of the objects we have traversed
through, extend the pdf_dict_mark functions to work on all pdf objects,
and hence rename them to pdf_obj_mark etc. Thanks to zeniko again for
feedback on this way of working.
Problem found in a test file, 3882.pdf.SIGSEGV.99.3204 supplied
by Mateusz "j00ru" Jurczyk and Gynvael Coldwind of the Google
Security Team using Address Sanitizer. Many thanks!
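As a sketch of the mark/unmark pattern this describes: the walk below is
illustrative only, and uses the current MuPDF names pdf_mark_obj and
pdf_unmark_obj (this commit introduced them as pdf_obj_mark etc.).

#include "mupdf/pdf.h"

/* Recursively walk a pdf object, using the object's own mark bit
 * instead of a list of visited objects to detect cycles such as a
 * colorspace whose /Base refers back to itself. pdf_mark_obj returns
 * non-zero if the object was already marked. */
static void
walk_obj(fz_context *ctx, pdf_obj *obj)
{
    int i, n;

    if (pdf_mark_obj(ctx, obj))
        return; /* already seen: we have looped */

    fz_try(ctx)
    {
        if (pdf_is_dict(ctx, obj))
        {
            n = pdf_dict_len(ctx, obj);
            for (i = 0; i < n; i++)
                walk_obj(ctx, pdf_dict_get_val(ctx, obj, i));
        }
        else if (pdf_is_array(ctx, obj))
        {
            n = pdf_array_len(ctx, obj);
            for (i = 0; i < n; i++)
                walk_obj(ctx, pdf_array_get(ctx, obj, i));
        }
    }
    fz_always(ctx)
        pdf_unmark_obj(ctx, obj); /* always clear the mark on the way out */
    fz_catch(ctx)
        fz_rethrow(ctx);
}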
|
|
|
|
When cleaning a file with a corrupt stream in it, mupdf has historically
given up when it encountered such a stream. This is often not
what is desired, as information can be lost.
The changes herein allow us to make a best effort when reading
a stream, so that broken streams are reproduced in the cleaned
output file.
Problem found in a test file, pdf_001/2599.pdf.asan.58.1778 supplied
by Mateusz "j00ru" Jurczyk and Gynvael Coldwind of the Google
Security Team using Address Sanitizer. Many thanks!
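A minimal sketch of the best-effort idea (hypothetical helper; current
MuPDF API shape assumed, not the actual change):

#include "mupdf/pdf.h"

/* Prefer the decoded stream, but if the filters are broken fall back to
 * the raw bytes so the object survives into the cleaned output instead
 * of being dropped. */
static fz_buffer *
load_stream_best_effort(fz_context *ctx, pdf_document *doc, int num)
{
    fz_buffer *buf = NULL;

    fz_var(buf);

    fz_try(ctx)
        buf = pdf_load_stream_number(ctx, doc, num);     /* decoded */
    fz_catch(ctx)
        buf = pdf_load_raw_stream_number(ctx, doc, num); /* raw copy */

    return buf;
}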
|
|
While investigating samples_mupdf_001/2599.pdf.asan.58.1778, I found a
leak when cleaning the file, caused by an object not being dropped in
an error case.
mutool clean -dif samples_mupdf_001/2599.pdf.asan.58.1778 leak.pdf
Simple fix. Also extend PDF writing so that it can skip over such
errors and we at least get something out at the end.
Problem found in a test file supplied by Mateusz "j00ru" Jurczyk and
Gynvael Coldwind of the Google Security Team using Address Sanitizer.
Many thanks!
|
|
Thanks to zeniko for these.
Use otf as the extension for OpenType fonts.
fz_clampi should take ints, not floats!
Fix typo in prototype.
Squash unwanted warning.
Remove magic number in favour of #define.
Reset generation numbers when renumbering.
|
|
Moritz Lipp points out that the check for opts being NULL in
pdf_write_document is unnecessary. Removing it brings the
function into line with the docs.
|
|
|
|
Does the same as pdf_dict_puts, but guarantees to always drop the
value passed in (even if the function throws an error). This
allows calling code to have a much simpler life in some cases.
Update pdf_write to make use of this.
Also, fix pdf_dict_puts so it doesn't leak the key object that it
creates in the event of an error while growing the dictionary.
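Roughly what the new function amounts to, as a sketch written with the
current API shape (explicit fz_context; the functions of this era took
no context argument):

#include "mupdf/pdf.h"

/* Put val under key, then drop our reference to val whether or not the
 * put threw; the dictionary keeps its own reference internally. */
void
pdf_dict_puts_drop(fz_context *ctx, pdf_obj *dict, const char *key, pdf_obj *val)
{
    fz_try(ctx)
        pdf_dict_puts(ctx, dict, key, val);
    fz_always(ctx)
        pdf_drop_obj(ctx, val);
    fz_catch(ctx)
        fz_rethrow(ctx);
}

The point is that a caller can create a value and hand it straight to the
dictionary, without needing its own fz_try just to avoid leaking the value
on the error path.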
|
|
And add a flag in the xref for every PDF document to say whether it
has been localised or not; this will be important for PDF editing, to
avoid having to localise on every edit.
|
|
Conflicts:
cbz/mucbz.c
pdf/pdf_parse.c
pdf/pdf_form.c
xps/xps_zip.c
|
|
Should have been pdf_new_name ever since the pre-1.0 rename, but
evidently we missed it.
|
|
Conflicts:
Makefile
apps/mudraw.c
pdf/pdf_write.c
win32/libmupdf-v8.vcproj
|
|
Unused objects could cause problems with the sort order and with picking
the object to start with. This is now handled.
If the hintstream object replaces another object that already had a
stream, pdf_open_raw_filter would get confused by the presence of a
stm_buf. Now fixed.
Fix a 64-bit problem in page_objects_list_ensure, as well as tweaking
the code for readability.
When outputting single-page files, we can end up with opts->start = 1,
and this upset the offset-calculating logic.
Insist on compacting the xref when linearising.
Thanks to Sebras and Zeniko for providing test cases. This commit
should (hopefully) stop the SEGVs, but there are still cases where
Acrobat doesn't think that the output files are "Optimised for Fast
Web View". I cannot see why.
|
|
|
|
|
|
Since I implemented linearisation, any invocation that hasn't used
garbage collection has produced broken files, due to every object
being marked as freed. This was because I had forgotten to set
the use_list markers to 1. Fixed here.
|
|
When writing PDFs, compactxref would fail to take into account
that duplicated objects would have been mapped down to a lower
number, and that the use_list value for the upper one would be set to
zero.
Thanks to Zeniko for pointing out this fix.
|
|
Instead of using macros for min/max/abs/clamp, we move to using
inline functions. These are more type-safe, and should produce
equivalent code on compilers that support inline (i.e. pretty much
everything we care about these days).
People can always do their own macro versions if they prefer.
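For illustration, the inline versions have roughly this shape (the real
definitions live in the fitz headers and may differ in detail). Unlike a
macro such as
    #define CLAMP(x,a,b) ((x) < (a) ? (a) : (x) > (b) ? (b) : (x))
they evaluate each argument exactly once and reject arguments of the
wrong type:

static inline int fz_clampi(int i, int min, int max)
{
    return (i > min ? (i < max ? i : max) : min);
}

static inline float fz_min(float a, float b)
{
    return (a < b ? a : b);
}

static inline float fz_max(float a, float b)
{
    return (a > b ? a : b);
}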
|
|
In pdfwrite we have a flag to say "don't expand images", but this
isn't always honoured because we don't know that a stream is an
image (for instance the corrupt fax stream used as a thumbnail
in normal_439.pdf).
Here we update the code to spot streams that are likely to be
images, based on the filters in use or on the presence of both
Width and Height tags.
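A sketch of the heuristic being described; the specific filter list and
the helpers below are assumptions, written against the current MuPDF
API shape rather than taken from the patch:

#include <string.h>
#include "mupdf/pdf.h"

/* Filters that are only ever used for image data. */
static int
is_image_filter(fz_context *ctx, pdf_obj *f)
{
    const char *name = pdf_to_name(ctx, f);
    return !strcmp(name, "DCTDecode") || !strcmp(name, "JPXDecode") ||
        !strcmp(name, "CCITTFaxDecode") || !strcmp(name, "JBIG2Decode");
}

/* A stream is "probably an image" if it uses an image-only filter, or
 * if its dictionary carries both Width and Height entries. */
static int
stream_looks_like_image(fz_context *ctx, pdf_obj *dict)
{
    pdf_obj *filter = pdf_dict_gets(ctx, dict, "Filter");
    int i, n;

    if (pdf_is_name(ctx, filter) && is_image_filter(ctx, filter))
        return 1;
    if (pdf_is_array(ctx, filter))
    {
        n = pdf_array_len(ctx, filter);
        for (i = 0; i < n; i++)
            if (is_image_filter(ctx, pdf_array_get(ctx, filter, i)))
                return 1;
    }

    return pdf_dict_gets(ctx, dict, "Width") != NULL &&
        pdf_dict_gets(ctx, dict, "Height") != NULL;
}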
|
|
Remove unused variable, silencing compiler warning.
No need to initialize variables twice.
Remove initialization of unread variable.
Remove unnecessary check for NULL.
Close output file upon error in cmapdump.
|
|
Extend mupdfclean to have a new -l flag that writes the file
linearised. This should still be considered experimental.
When writing a PDF file, analyse object use, flatten resource use,
reorder the objects, generate a hintstream, and output with linearisation
parameters.
This is enough for Acrobat to accept the file as being optimised
for Fast Web View. We ought to add more tables to the hintstream
in some cases, but I doubt anyone actually uses it; the spec is so
badly written.
Update fz_dict_put to cope with being given a reference of which the
dictionary is already the sole owner (i.e. don't drop and then
keep something that has a reference count of just 1).
Update pdf_load_image_stream to use the stm_buf from the xref if there
is one.
Update pdf_close_document to discard any stm_bufs it may be holding.
Update fz_dict_put to be pdf_dict_put - this was missed in a renaming
ages ago and has been inconsistent since.
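For reference, the intended invocation is along these lines (the output
name here is chosen purely for illustration):

mupdfclean -l input.pdf linearised-output.pdf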
|
|
Needs more work to use the linked list of free xref slots.
|
|
mupdfclean (or more correctly, the pdf_write function) currently has
a limitation, in that we cannot renumber objects when encryption is
being used. This is because the object/generation number is pickled
into the stream, and renumbering the object causes it to become
unreadable.
The solution used here is to provide extended functions that take both
the object/generation number and the original object/generation
number. The original object numbers are only used for setting up the
encryption.
pdf_write now keeps track of the original object/generation number
for each object.
This fix is important if we ever want to output linearised PDF, as
that requires us to be able to renumber objects into a very specific
order.
We also make a fix in removeduplicateobjects that should only
matter in the case where we fail to read an object correctly.
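As a sketch of the shape of that extension (the struct, the helper, and
the signatures below are illustrative assumptions approximating the
current MuPDF API, not the actual patch):

#include "mupdf/pdf.h"

/* Carry both the renumbered and the original object/generation numbers.
 * The renumbered pair is what gets written to the output xref; the
 * original pair is what the source bytes were encrypted against, so it
 * is the one the crypt filter must be keyed on. */
typedef struct
{
    int num, gen;           /* renumbered values, written to the output */
    int orig_num, orig_gen; /* original values, used to set up decryption */
} renumber_info;

static fz_stream *
open_encrypted_stream(fz_context *ctx, fz_stream *chain, pdf_crypt *crypt,
    const renumber_info *info)
{
    /* Key the RC4/AES filter on the original numbers, not the new ones. */
    return pdf_open_crypt(ctx, chain, crypt, info->orig_num, info->orig_gen);
}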
|
|
Expose the pdf_write function through the document interface.
|