This goes well with the 'mutool clean -d' decompression option to debug
content streams, without doing the sanitize optimization pass.
|
The handling of not decompressing images/fonts was geared towards
pdfclean usage, but now that we can create new PDF files, it makes
more sense to ask for images and fonts to be compressed, rather than
asking for them not to be decompressed, which interacted quirkily with
the 'expand' and 'deflate' flags.
If -i or -f is set, we will never decompress images or fonts
respectively, and we will compress them if they are uncompressed.
If -d is set, we will first decompress all streams (modulo -f or -i).
If -z is set, we will then compress all uncompressed streams.
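A minimal sketch of the resulting per-stream decision, assuming -i applies to images and -f to fonts. The names clean_flags, stream_kind and ends_compressed are hypothetical, not MuPDF's API, and a real stream carries a filter chain rather than a single boolean.

```c
#include <stdbool.h>

/* Hypothetical flag set mirroring the mutool clean options above. */
typedef struct {
    bool decompress;      /* -d: decompress all streams first */
    bool deflate;         /* -z: then compress anything left uncompressed */
    bool compress_images; /* -i (assumed to apply to images) */
    bool compress_fonts;  /* -f (assumed to apply to fonts) */
} clean_flags;

typedef enum { STREAM_OTHER, STREAM_IMAGE, STREAM_FONT } stream_kind;

/* Returns true if a stream of the given kind ends up compressed,
   given whether it started out compressed. */
static bool ends_compressed(const clean_flags *f, stream_kind kind, bool compressed)
{
    /* Images under -i and fonts under -f are never decompressed... */
    bool protect = (kind == STREAM_IMAGE && f->compress_images) ||
                   (kind == STREAM_FONT && f->compress_fonts);
    if (protect)
        return true; /* ...and are compressed if they were not already. */

    /* Step 1: -d decompresses every other stream. */
    if (f->decompress)
        compressed = false;

    /* Step 2: -z compresses whatever is still uncompressed. */
    if (f->deflate)
        compressed = true;

    return compressed;
}
```

Ordering the two steps this way means -d -z together yields freshly recompressed streams, while -i/-f exempt their streams from step 1 entirely.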
|
Also remove redundant assignments.
Fixes http://bugs.ghostscript.com/show_bug.cgi?id=695968
|
In preparation for adding pdf_write_document, which writes a document
to an fz_output stream.
|
Separate naming of functions that save complete files to disk
from functions that write data to streams.
|
Michael needs to be able to call pdfclean from gsview. At the moment
he has to do this by including the pdfclean.c file in the library
build, and then calling pdfclean_main with a faked-up command line.
This isn't nice.
pdfclean.c is implemented by pdfclean_main parsing the options/filenames
out of argv and then passing the filenames/options on to a
pdfclean_clean function.
This seems like a much nicer API to offer to the world.
We therefore pull the guts of pdfclean.c (pdfclean_clean and its
subsidiary structures/functions) into pdf-clean-file.c and include
this in the library build.
This leaves pdfclean.c containing just the command-line parsing.
This should not affect the size of any of the resulting binaries.
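The split can be sketched as a library entry point that takes an options structure, plus a thin front end that only parses argv. This is a minimal sketch; clean_opts, clean_file and clean_main are hypothetical names, not the actual signatures of pdfclean_clean or pdfclean_main.

```c
#include <string.h>

/* Hypothetical options struct; field names are illustrative only. */
typedef struct {
    int decompress; /* -d */
    int deflate;    /* -z */
} clean_opts;

/* Library entry point: an embedding application calls this directly,
   with no faked-up argv. Returns 0 on success. */
static int clean_file(const char *infile, const char *outfile, const clean_opts *opts)
{
    if (!infile || !outfile || !opts)
        return -1;
    /* ... open infile, rewrite it according to opts, save to outfile ... */
    return 0;
}

/* Command-line front end: parses argv only, then delegates. */
static int clean_main(int argc, char **argv)
{
    clean_opts opts = {0, 0};
    int i;
    for (i = 1; i < argc && argv[i][0] == '-'; i++) {
        if (strcmp(argv[i], "-d") == 0)
            opts.decompress = 1;
        else if (strcmp(argv[i], "-z") == 0)
            opts.deflate = 1;
        else
            return 1; /* unknown option */
    }
    if (argc - i != 2)
        return 1; /* expected exactly: infile outfile */
    return clean_file(argv[i], argv[i + 1], &opts);
}
```

An embedding application such as a viewer would fill in a clean_opts and call clean_file, never touching clean_main.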
|
Silly typo. Thanks to Daniel Bloemer for pointing this out.
|
Currently, every PDF name is allocated in a pdf_obj structure, and
comparisons are done using strcmp. Given that we can predict most
of the PDF names we'll use in a given file, this seems wasteful.
The pdf_obj type is opaque outside the pdf-object.c file, so we can
abuse it slightly without anyone outside knowing.
We collect a sorted list of names used in PDF (resources/pdf/names.txt),
and we add a utility (namedump) that preprocesses this into 2 header
files.
The first (include/mupdf/pdf/pdf-names-table.h, included as part of
include/mupdf/pdf/object.h) defines a set of "PDF_NAME_xxxx"
entries. These are pdf_obj *'s that callers can use to mean "a PDF
object that is the literal name 'xxxx'".
The second (source/pdf/pdf-name-impl.h) is a C array of names.
We therefore update the code so that rather than passing "xxxx" to
functions (such as pdf_dict_gets(...)) we now pass PDF_NAME_xxxx (to
pdf_dict_get(...)). This is a fairly natural (if widespread) change.
The pdf_dict_getp (and sibling) functions that take a path (e.g.
"foo/bar/baz") are therefore supplemented with equivalents that
take a list (pdf_dict_getl(... , PDF_NAME_foo, PDF_NAME_bar,
PDF_NAME_baz, NULL)).
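A list-taking lookup can be sketched as a variadic walk over a NULL-terminated list of keys, comparing keys by pointer identity. This is a minimal sketch with a hypothetical dict type and dict_getl function standing in for pdf_obj and pdf_dict_getl; it assumes every intermediate value is itself a dict, and callers should terminate the list with (void *)NULL, since a bare NULL may be a plain integer 0 in a variadic call.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical minimal dictionary: keys are compared by pointer identity. */
typedef struct {
    int len;
    const void *keys[8];
    void *vals[8];
} dict;

static void *dict_get(const dict *d, const void *key)
{
    for (int i = 0; i < d->len; i++)
        if (d->keys[i] == key) /* pointer compare, no strcmp */
            return d->vals[i];
    return NULL;
}

/* Follow a NULL-terminated list of keys: dict_get(dict_get(d, k1), k2)...
   Assumes each intermediate value is a dict. */
static void *dict_getl(const dict *d, ...)
{
    va_list ap;
    void *key;
    void *cur = (void *)d;
    va_start(ap, d);
    while (cur && (key = va_arg(ap, void *)) != NULL)
        cur = dict_get(cur, key);
    va_end(ap);
    return cur;
}
```

Compared with a path string such as "foo/bar/baz", the list form needs no parsing or splitting and each step is a pointer comparison.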
The actual implementation of this relies on the fact that small
pointer values are never valid addresses. For a given pdf_obj *p,
if 0 < (intptr_t)p && (intptr_t)p < PDF_NAME__LIMIT, then p is a
literal entry in the name table.
This enables us to do fast pointer compares and to skip expensive
strcmps.
Also, bring "null", "true" and "false" into the same style as PDF names.
Rather than using full pdf_obj structures for null/true/false, use
special pointer values just above the PDF_NAME_ table. This saves
memory and makes comparisons easier.
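The encoding can be sketched with a cut-down, hypothetical table: a static name is its table index disguised as a pointer, and null/true/false take the slots just above the table. The identifiers here (obj, NAME_AT, OBJ_TRUE, lookup_static_name) are illustrative, not the generated MuPDF headers, and the trick assumes small addresses are never real allocations, which holds on mainstream platforms but is not guaranteed by ISO C.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct obj obj; /* opaque, like pdf_obj outside pdf-object.c */

/* Hypothetical cut-down table; must stay sorted for bsearch.
   Slot 0 is unused so that no name encodes as NULL. */
static const char *const name_table[] = { NULL, "Contents", "Length", "Parent", "Type" };
enum { NAME__LIMIT = sizeof name_table / sizeof name_table[0] };

/* A static name is just its table index cast to a pointer. */
#define NAME_AT(i)    ((obj *)(intptr_t)(i))
#define NAME_Contents NAME_AT(1)
#define NAME_Length   NAME_AT(2)
#define NAME_Parent   NAME_AT(3)
#define NAME_Type     NAME_AT(4)

/* null/true/false occupy the slots just above the name table. */
#define OBJ_NULL  NAME_AT(NAME__LIMIT)
#define OBJ_TRUE  NAME_AT(NAME__LIMIT + 1)
#define OBJ_FALSE NAME_AT(NAME__LIMIT + 2)

static int is_static_name(const obj *p)
{
    intptr_t v = (intptr_t)p;
    return v > 0 && v < NAME__LIMIT;
}

static const char *static_name_str(const obj *p)
{
    return is_static_name(p) ? name_table[(intptr_t)p] : NULL;
}

static int cmp_key(const void *key, const void *elem)
{
    return strcmp((const char *)key, *(const char *const *)elem);
}

/* Map a string to its static name with one bsearch, or NULL if absent. */
static obj *lookup_static_name(const char *s)
{
    const char *const *hit = bsearch(s, name_table + 1, NAME__LIMIT - 1,
                                     sizeof name_table[0], cmp_key);
    return hit ? NAME_AT(hit - name_table) : NULL;
}
```

Once a name has been mapped, checking for it is a single pointer comparison such as `key == NAME_Type`; the strcmp cost is paid at most once, when a name is read from the file.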
|
Purge several embedded contexts:
Remove embedded context in fz_output.
Remove embedded context in fz_stream.
Remove embedded context in fz_device.
Remove fz_rebind_stream (since it is no longer necessary).
Remove embedded context in svg_device.
Remove embedded context in XML parser.
Add ctx argument to fz_document functions.
Remove embedded context in fz_document.
Remove embedded context in pdf_document.
Remove embedded context in pdf_obj.
Make fz_page independent of fz_document in the interface.
We shouldn't need to pass the document to all functions handling a page.
If a page is tied to the source document, it's redundant; otherwise it's
just pointless.
Fix reference counting oddity in fz_new_image_from_pixmap.
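The pattern behind removing embedded contexts can be sketched with hypothetical context and stream types standing in for fz_context and fz_stream: the object no longer stores a context pointer, and every call receives the caller's context explicitly. This is a minimal sketch; it shows a stream handed from one context to another sequentially, not used concurrently.

```c
#include <stddef.h>

/* Hypothetical per-thread context, standing in for fz_context. */
typedef struct {
    size_t bytes_read; /* per-context bookkeeping */
} context;

/* The stream no longer embeds a context pointer. */
typedef struct {
    const unsigned char *data;
    size_t len;
    size_t pos;
} stream;

/* Each call gets the caller's context explicitly, so a stream passed
   to another thread needs no "rebind" step to that thread's context. */
static int stream_read_byte(context *ctx, stream *stm)
{
    if (stm->pos >= stm->len)
        return -1; /* end of data */
    ctx->bytes_read++;
    return stm->data[stm->pos++];
}
```

With the context supplied per call, a rebind function such as fz_rebind_stream has nothing left to do, which is why it could be removed.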
|
Rename fz_close to fz_drop_stream.
Rename fz_close_archive to fz_drop_archive.
Rename fz_close_output to fz_drop_output.
Rename fz_free_* to fz_drop_*.
Rename pdf_free_* to pdf_drop_*.
Rename xps_free_* to xps_drop_*.
|
New routine to filter the content streams for pages, xobjects,
type3 charprocs, patterns, etc. The filtered streams are guaranteed
to have properly matched q/Q pairs and to leave the top-level CTM
unchanged. Additionally, we remove (some) repeated settings of
colors, etc. This filtering can be extended to be smarter later.
The idea of this is to both repair after editing, and to leave the
streams in a form that can be easily appended to.
This is preparatory to work on Bates numbering and Watermarking.
Currently the streams produced are uncompressed.
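One simple way to obtain both guarantees is sketched below over a list of operators with operands already stripped: drop any Q that has no matching q, close any q left open, and wrap the whole stream in an outer q/Q so that a cm inside cannot leak out. This is a hypothetical illustration of the invariant, not MuPDF's filter, and it ignores the repeated-color removal.

```c
#include <stddef.h>
#include <string.h>

/* ops: operator names only (operands stripped). out must hold at least
   2*n + 2 entries. Returns the number of operators written. */
static size_t balance_q(const char **ops, size_t n, const char **out)
{
    size_t depth = 0, k = 0;
    out[k++] = "q"; /* outer save: top-level CTM/state restored at the end */
    for (size_t i = 0; i < n; i++) {
        if (strcmp(ops[i], "q") == 0) {
            depth++;
        } else if (strcmp(ops[i], "Q") == 0) {
            if (depth == 0)
                continue; /* unmatched Q: drop it */
            depth--;
        }
        out[k++] = ops[i];
    }
    for (; depth > 0; depth--)
        out[k++] = "Q"; /* close any q left open */
    out[k++] = "Q";     /* matches the outer q */
    return k;
}
```

Because the output always starts with q and ends with the matching Q, more content can be appended after it (for example a watermark) starting from the original graphics state.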
|
When you use mutool clean to subset pages out of a PDF, we already
remove the Name tree entries for named locations that aren't in the
target file. Until now, however, we have failed to remove references
to these removed names. This can cause errors (really warnings) when
reading the file back.
|
Firstly, we remove the use of global variables; this is done by
introducing a 'globals' structure for each of these files and
passing it internally between functions.
Next, split the core of pdfclean_main into pdfclean_clean, and the
core of pdfinfo_main into pdfinfo_info.
The _main functions now do the argv processing. The new functions are
entirely thread-safe, so they can be called from library functions.
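Replacing file-scope globals with a per-call 'globals' structure can be sketched as follows. The names walk_globals, visit and walk are hypothetical; the point is that all mutable state lives in a structure owned by the caller (here on the stack), so separate calls, or calls on separate threads, cannot see each other's state.

```c
/* Hypothetical state that previously lived in file-scope globals. */
typedef struct {
    int count;   /* objects visited */
    int largest; /* highest object number seen */
} walk_globals;

/* Internal helpers take the globals pointer instead of touching statics. */
static void visit(walk_globals *g, int num)
{
    g->count++;
    if (num > g->largest)
        g->largest = num;
}

/* Re-entrant core: a fresh walk_globals per call. */
static int walk(const int *nums, int n, int *largest)
{
    walk_globals g = {0, 0};
    for (int i = 0; i < n; i++)
        visit(&g, nums[i]);
    *largest = g.largest;
    return g.count;
}
```

With true globals, a second call would inherit the first call's largest value; with the structure, each call starts clean.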
|
No more caching a flattened page tree in doc->page_objs/refs.
No more flattening of page resources, rotation and boxes.
Smart page number lookup by following Parent links.
Naive implementation of page insertion and deletion that doesn't rebalance the tree.
Requires an existing page tree to hook into; cannot be used to create a page tree
from scratch.
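Page number lookup by following Parent links can be sketched as: starting at a leaf, climb to each ancestor and add the /Count of every kid that precedes the child you came from. A naive insert or delete that does not rebalance then only needs to adjust /Count along the ancestor chain. The node type and function names are hypothetical stand-ins for the Pages/Page dictionaries.

```c
#include <stddef.h>

/* Hypothetical in-memory page tree node. */
typedef struct node node;
struct node {
    node *parent;
    node **kids;   /* NULL for a leaf page */
    int nkids;
    int count;     /* pages under this node (the /Count entry); 1 for a page */
};

/* Zero-based page number of a leaf page. */
static int page_number(const node *page)
{
    int n = 0;
    const node *child = page;
    for (const node *p = page->parent; p; child = p, p = p->parent)
        for (int i = 0; i < p->nkids && p->kids[i] != child; i++)
            n += p->kids[i]->count; /* pages in earlier siblings */
    return n;
}

/* After inserting (+1) or deleting (-1) a page under parent,
   fix up /Count on every ancestor; no rebalancing is done. */
static void adjust_counts(node *parent, int delta)
{
    for (; parent; parent = parent->parent)
        parent->count += delta;
}
```

Lookup cost is proportional to tree depth times fan-out, with no flattened page array to keep in sync after edits.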
|
Remove the fz_context field to avoid the structure growing.