Age | Commit message (Collapse) | Author |
|
|
|
|
|
|
|
When encoding truetype fonts via the mac roman cmap table, we should be
using the additional entries introduced in PDF 1.5, which are different
from the standard MacRomanEncoding table in the appendix.
|
|
Move the definition of the structure contents into new fitz-imp.h
file. Make all code outside of fitz access the buffer through the
defined API.
Add a convenience API for people that want to get buffers as
null terminated C strings.
|
|
|
|
As fz_drop_*()/fz_free() all must handle NULL.
|
|
Move the definition of fz_font to be in a private header file
rather than in the public API. Add accessors for specific
parts of the structure and use them as appropriate.
The font flags, and the harfbuzz records remain public.
This means that only 3 files now need access to the font
implementation (font.c, pdf-font.c and pdf-type3.c). This
may be able to be improved further in future.
|
|
|
|
The generation number is only needed for decryption, and is assumed
to be zero or irrelevant for all other uses.
Store the original object number and generation in the xref slot, so
that we can decrypt them even when the objects have been renumbered,
without needing to pass the original object number around through
the stream loading APIs.
|
|
|
|
|
|
Currently, every PDF name is allocated in a pdf_obj structure, and
comparisons are done using strcmp. Given that we can predict most
of the PDF names we'll use in a given file, this seems wasteful.
The pdf_obj type is opaque outside the pdf-object.c file, so we can
abuse it slightly without anyone outside knowing.
We collect a sorted list of names used in PDF (resources/pdf/names.txt),
and we add a utility (namedump) that preprocesses this into 2 header
files.
The first (include/mupdf/pdf/pdf-names-table.h, included as part of
include/mupdf/pdf/object.h), defines a set of "PDF_NAME_xxxx"
entries. These are pdf_obj *'s that callers can use to mean "A PDF
object that means literal name 'xxxx'"
The second (source/pdf/pdf-name-impl.h) is a C array of names.
We therefore update the code so that rather than passing "xxxx" to
functions (such as pdf_dict_gets(...)) we now pass PDF_NAME_xxxx (to
pdf_dict_get(...)). This is a fairly natural (if widespread) change.
The pdf_dict_getp (and sibling) functions that take a path (e.g.
"foo/bar/baz") are therefore supplemented with equivalents that
take a list (pdf_dict_getl(... , PDF_NAME_foo, PDF_NAME_bar,
PDF_NAME_baz, NULL)).
The actual implementation of this relies on the fact that small
pointer values are never valid values. For a given pdf_obj *p,
if NULL < (intptr_t)p < PDF_NAME__LIMIT then p is a literal
entry in the name table.
This enables us to do fast pointer compares and to skip expensive
strcmps.
Also, bring "null", "true" and "false" into the same style as PDF names.
Rather than using full pdf_obj structures for null/true/false, use
special pointer values just above the PDF_NAME_ table. This saves
memory and makes comparisons easier.
|
|
Purge several embedded contexts:
Remove embedded context in fz_output.
Remove embedded context in fz_stream.
Remove embedded context in fz_device.
Remove fz_rebind_stream (since it is no longer necessary).
Remove embedded context in svg_device.
Remove embedded context in XML parser.
Add ctx argument to fz_document functions.
Remove embedded context in fz_document.
Remove embedded context in pdf_document.
Remove embedded context in pdf_obj.
Make fz_page independent of fz_document in the interface.
We shouldn't need to pass the document to all functions handling a page.
If a page is tied to the source document, it's redundant; otherwise it's
just pointless.
Fix reference counting oddity in fz_new_image_from_pixmap.
|
|
|
|
We are testing this using a new -p flag to mupdf that sets a bitrate at
which data will appear to arrive progressively as time goes on. For
example:
mupdf -p 102400 pdf_reference17.pdf
Details of the scheme used here are presented in docs/progressive.txt
|
|
Thanks to zeniko for spotting the problem here.
Type 3 fonts contain a reference to the resources objects required
to render the glyphs. Traditionally these have been freed when the
font is freed. Unfortunately, after recent changes, freeing a PDF
object requires the pdf_document concerned to still exist.
While in most cases the type 3 resources are not used after we have
converted the type3 glyphs to display lists, this is not always the
case. For uncachable Type 3 glyphs (such as those that do not
completely define elements in the graphics state that they use, such
as color or line width), we end up running the glyphs at interpretation
time.
[ Interpretation time = when doing a direct render on the main thread,
or when creating a display list - so also on the main thread. No
multi-threading issues with file access here. ]
The fix implemented here is for each pdf document to keep a list of
the type3 fonts it has created, and to 'decouple' them from the
document when the document is destroyed. The sole effect of this
decoupling is to remove the resources (and the PDF operator buffers)
from the font. These are only ever used during interpretation, and
no further interpretations are possible without the document being
alive anyway, so this should have no net effect on operation, other
than allowing cleanup to proceed cleanly later on.
|
|
For historical reasons lots of the code uses "xref" when talking about
a pdf document. Now pdf_xref is a separate type this has become
confusing, so replace 'xref' with 'doc' for clarity.
|
|
|