Age | Commit message | Author |
|
Also change unsigned char into const char for embedded data.
|
|
|
|
|
|
Take on a (slightly tweaked) version of Simon Reinhardt's
patch.
The actual logic is left entirely unchanged; minor changes
have been made to the names of functions/types to avoid
clashing in the cmapdump.c repeated inclusion.
Currently this should really only affect xps files, as strtof
is only used as fz_atof, and that's (effectively) all xps for
now.
I will look at updating lex_number to call this in future.
|
|
Look up fallback fonts by unicode script, with a flag to select the serif or
sans-serif font style where such variants exist.
Move all builtin fonts into fitz namespace.
|
|
Newer versions of clang support .incbin, so enable it for clang.
Disable .incbin for release builds, since then the compiler can strip
out unused font data.
|
|
In general, we should use 'const fz_blah' in device calls whenever
the callee should not alter the fz_blah.
Push this through. This reveals various places where we fz_keep
and fz_drop these const things.
I've updated the fz_keep and fz_drops with appropriate casts
to remove the consts. We may need to do the union dance to avoid
the consts for some compilers, but will only do that if required.
I think this is nicer overall, even allowing for the const<->no const
problems.
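
As a rough illustration of the const handling described above, here is a minimal sketch. The demo_path type and the demo_* functions are hypothetical stand-ins, not MuPDF's real types or API; the only point is that a read-only callee takes a const pointer while keep/drop cast the const away to touch the reference count.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted object; not a real MuPDF type. */
typedef struct demo_path {
    int refs;
    int len;
} demo_path;

demo_path *demo_new_path(int len)
{
    demo_path *p = malloc(sizeof *p);
    if (p) {
        p->refs = 1;
        p->len = len;
    }
    return p;
}

/* Accepts a const pointer, but has to bump the count: cast the const away. */
demo_path *demo_keep_path(const demo_path *cp)
{
    demo_path *p = (demo_path *)cp;
    if (p)
        p->refs++;
    return p;
}

/* Drops one reference; returns how many remain and frees the object at 0. */
int demo_drop_path(const demo_path *cp)
{
    demo_path *p = (demo_path *)cp;
    int left;
    if (!p)
        return 0;
    left = --p->refs;
    if (left == 0)
        free(p);
    return left;
}

/* A device-style callee that only reads the path, so it takes it as const. */
int demo_device_fill(const demo_path *path)
{
    assert(path != NULL);
    return path->len;
}
```

The casts are only safe here because the objects come from malloc and are never defined as const themselves.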
|
|
|
|
|
|
If FZ_LARGEFILE is defined when building, MuPDF uses 64-bit offsets
for files; this allows us to open streams larger than 2 GiB.
The downsides to this are that:
* The xref entries are larger.
* All PDF ints are held as 64-bit values rather than 32-bit values
(to cope with /Prev entries, hint stream offsets etc).
* All file positions are stored as 64 bits rather than 32.
The implementation works by detecting FZ_LARGEFILE. Some #ifdeffery
in fitz/system.h sets fz_off_t to either int or int64_t as appropriate,
and sets defines for fz_fopen, fz_fseek, fz_ftell etc as required.
These call the fseeko64 etc functions on linux (and so define
_LARGEFILE64_SOURCE) and the explicit 64bit functions on windows.
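
A minimal sketch of the kind of #ifdeffery described above. DEMO_LARGEFILE, demo_off_t and demo_offset_fits are made-up names standing in for FZ_LARGEFILE and fz_off_t; this is not the contents of fitz/system.h, and the mapping of the seek/tell wrappers onto the platform's 64-bit functions is left out.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* DEMO_LARGEFILE stands in for FZ_LARGEFILE (hypothetical name).
 * Build with -DDEMO_LARGEFILE to get 64-bit offsets. */
#ifdef DEMO_LARGEFILE
typedef int64_t demo_off_t;
#define DEMO_OFF_MAX INT64_MAX
#else
typedef int demo_off_t;
#define DEMO_OFF_MAX INT_MAX
#endif

/* Returns nonzero if a file position of `bytes` is representable. */
int demo_offset_fits(int64_t bytes)
{
    assert(bytes >= 0);
    return bytes <= (int64_t)DEMO_OFF_MAX;
}
```

With a 32-bit int and no DEMO_LARGEFILE, a 3 GiB position is rejected, which is the limitation the large file build removes.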
|
|
Currently, every PDF name is allocated in a pdf_obj structure, and
comparisons are done using strcmp. Given that we can predict most
of the PDF names we'll use in a given file, this seems wasteful.
The pdf_obj type is opaque outside the pdf-object.c file, so we can
abuse it slightly without anyone outside knowing.
We collect a sorted list of names used in PDF (resources/pdf/names.txt),
and we add a utility (namedump) that preprocesses this into 2 header
files.
The first (include/mupdf/pdf/pdf-names-table.h, included as part of
include/mupdf/pdf/object.h), defines a set of "PDF_NAME_xxxx"
entries. These are pdf_obj pointers that callers can use to mean "a PDF
object representing the literal name 'xxxx'".
The second (source/pdf/pdf-name-impl.h) is a C array of names.
We therefore update the code so that rather than passing "xxxx" to
functions (such as pdf_dict_gets(...)) we now pass PDF_NAME_xxxx (to
pdf_dict_get(...)). This is a fairly natural (if widespread) change.
The pdf_dict_getp (and sibling) functions that take a path (e.g.
"foo/bar/baz") are therefore supplemented with equivalents that
take a list (pdf_dict_getl(... , PDF_NAME_foo, PDF_NAME_bar,
PDF_NAME_baz, NULL)).
The actual implementation of this relies on the fact that small
pointer values are never valid values. For a given pdf_obj *p,
if 0 < (intptr_t)p < PDF_NAME__LIMIT then p is a literal
entry in the name table.
This enables us to do fast pointer compares and to skip expensive
strcmps.
Also, bring "null", "true" and "false" into the same style as PDF names.
Rather than using full pdf_obj structures for null/true/false, use
special pointer values just above the PDF_NAME_ table. This saves
memory and makes comparisons easier.
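
The small-pointer trick described above can be sketched as follows. Everything prefixed demo_ or DEMO_ is hypothetical: the three-entry table, the enum and the helper functions are illustrations only, not the generated pdf-names-table.h or the real pdf-object.c code.

```c
#include <assert.h>
#include <stdint.h>

typedef struct demo_obj demo_obj;   /* opaque to callers, like pdf_obj */

/* Hypothetical sorted name table. Index 0 is unused so NULL is never a name. */
static const char *demo_names[] = { "", "Font", "Page", "Type" };

enum {
    DEMO_NAME_Font = 1,
    DEMO_NAME_Page,
    DEMO_NAME_Type,
    DEMO_NAME_LIMIT
};

/* Table indices disguised as object pointers. */
#define DEMO_NAME(n) ((demo_obj *)(intptr_t)(n))

/* null/true/false sit just above the name table. */
#define DEMO_NULL  ((demo_obj *)(intptr_t)(DEMO_NAME_LIMIT))
#define DEMO_TRUE  ((demo_obj *)(intptr_t)(DEMO_NAME_LIMIT + 1))
#define DEMO_FALSE ((demo_obj *)(intptr_t)(DEMO_NAME_LIMIT + 2))

/* A pointer strictly between 0 and the limit is an entry in the table. */
int demo_is_builtin_name(const demo_obj *p)
{
    intptr_t v = (intptr_t)p;
    return v > 0 && v < DEMO_NAME_LIMIT;
}

/* Recover the text of a built-in name. */
const char *demo_name_string(const demo_obj *p)
{
    assert(demo_is_builtin_name(p));
    return demo_names[(intptr_t)p];
}

/* Two built-in names are equal exactly when the pointers are equal. */
int demo_name_eq(const demo_obj *a, const demo_obj *b)
{
    return a == b;
}
```

Names that are not in the table would still need a real allocated object and a string compare; the sketch leaves that path out.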
|
|
|
|
Purge several embedded contexts:
Remove embedded context in fz_output.
Remove embedded context in fz_stream.
Remove embedded context in fz_device.
Remove fz_rebind_stream (since it is no longer necessary).
Remove embedded context in svg_device.
Remove embedded context in XML parser.
Add ctx argument to fz_document functions.
Remove embedded context in fz_document.
Remove embedded context in pdf_document.
Remove embedded context in pdf_obj.
Make fz_page independent of fz_document in the interface.
We shouldn't need to pass the document to all functions handling a page.
If a page is tied to the source document, it's redundant; otherwise it's
just pointless.
Fix reference counting oddity in fz_new_image_from_pixmap.
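
To illustrate the embedded-context removals listed above, here is a small sketch in which the stream holds no context pointer and every call receives the context explicitly. The demo_context and demo_stream types are hypothetical and far simpler than fz_context and fz_stream.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-caller context; here it only counts bytes read. */
typedef struct demo_context {
    size_t bytes_read;
} demo_context;

/* The stream stores no context pointer of its own. */
typedef struct demo_stream {
    const unsigned char *data;
    size_t len;
    size_t pos;
} demo_stream;

/* The context arrives as an argument on every call, so the same stream
 * can be used with different contexts without any rebinding step. */
int demo_read_byte(demo_context *ctx, demo_stream *stm)
{
    assert(ctx != NULL && stm != NULL);
    if (stm->pos >= stm->len)
        return -1;   /* end of data */
    ctx->bytes_read++;
    return stm->data[stm->pos++];
}
```

Because nothing in the stream points back at a context, a function like fz_rebind_stream has nothing left to do in this arrangement.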
|
|
Rename fz_close to fz_drop_stream.
Rename fz_close_archive to fz_drop_archive.
Rename fz_close_output to fz_drop_output.
Rename fz_free_* to fz_drop_*.
Rename pdf_free_* to pdf_drop_*.
Rename xps_free_* to xps_drop_*.
|
|
The dtoa function is for doubles (which is what MuJS uses) but for MuPDF
we only need and want float precision in our output formatting.
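
For illustration only, the sketch below finds a short decimal form that still parses back to the same float. It is a brute-force loop over snprintf precisions, not the dtoa-style algorithm this commit refers to, and demo_format_float is a made-up name.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Writes the shortest "%g" form of f (1 to 9 significant digits) that
 * strtof parses back to exactly the same float. */
void demo_format_float(char *buf, size_t size, float f)
{
    int prec;
    assert(buf != NULL && size > 0);
    for (prec = 1; prec <= 9; prec++) {
        snprintf(buf, size, "%.*g", prec, f);
        if (strtof(buf, NULL) == f)
            break;
    }
}
```

Nine significant digits are always enough to round-trip a float, so the loop stops there; a double would need up to seventeen, which is the extra output we do not want.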
|
|
|
|
Use intptr_t when casting between a jlong and a pointer to suppress errors
about different size words.
Add a 'u' suffix to unsigned values output by the cmap dump utility.
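
The intptr_t cast mentioned above can be sketched like this. demo_jlong is a stand-in typedef (JNI's jlong is a 64-bit signed integer), and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t demo_jlong;   /* stand-in for JNI's 64-bit jlong */

/* Going through intptr_t avoids a direct cast between a pointer and an
 * integer of a different width, which is what compilers complain about
 * on 32-bit targets. */
demo_jlong demo_ptr_to_jlong(void *p)
{
    assert(sizeof(void *) <= sizeof(demo_jlong));
    return (demo_jlong)(intptr_t)p;
}

void *demo_jlong_to_ptr(demo_jlong v)
{
    return (void *)(intptr_t)v;
}
```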
|
|
One to write a CMap out in expanded form ready for text processing tools.
Another to write a CMap out as compactly as possible.
The output is not in proper CMap format and can only be parsed by MuPDF.
|
|
Increasing the existing data structure to 32-bit values would bloat the data
tables too much.
Simplify the data structure and use three separate range tables for lookups --
one with small 16-bit to 16-bit range lookups, one with 32-bit range lookups,
and a final one for one-to-many lookups.
This loses the range-to-table optimization we had before, but even with the
extra ranges this necessitates, the total size of the compiled binary CMap data
is smaller than if we were to extend the previous scheme to 32 bits.
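
A sketch of what a lookup in the small 16-bit range table might look like. The demo_range layout and the binary search are assumptions made for illustration; the actual field names and table format in the compiled CMap data may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 16-bit entry: codes low..high map to out + (code - low). */
typedef struct demo_range {
    unsigned short low;
    unsigned short high;
    unsigned short out;
} demo_range;

/* Binary search over ranges sorted by `low`; returns -1 if unmapped. */
int demo_lookup_range(const demo_range *tab, size_t n, unsigned int code)
{
    size_t lo = 0, hi = n;
    assert(tab != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (code < tab[mid].low)
            hi = mid;
        else if (code > tab[mid].high)
            lo = mid + 1;
        else
            return (int)(tab[mid].out + (code - tab[mid].low));
    }
    return -1;
}
```

A full lookup would presumably try this table first, then the 32-bit range table, then the one-to-many table, each with the same kind of search.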
|
|
Stupid unportable code needs stupid unportable preprocessor macros.
This only works with GCC, but should be good enough since I expect
anyone using a big-endian machine to also use a GCC compatible compiler.
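
The message does not say which macro is used. As an assumption, the sketch below uses the __BYTE_ORDER__ and __ORDER_BIG_ENDIAN__ macros that GCC-compatible compilers predefine; DEMO_BIG_ENDIAN and the helper functions are made-up names.

```c
#include <assert.h>

/* GCC and compatible compilers predefine these; elsewhere we fall back
 * to assuming little-endian, which is the portability gap noted above. */
#if defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && \
    __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define DEMO_BIG_ENDIAN 1
#else
#define DEMO_BIG_ENDIAN 0
#endif

/* Runtime probe: on a big-endian machine the first byte of 1 is zero. */
int demo_runtime_big_endian(void)
{
    const unsigned int one = 1;
    return *(const unsigned char *)&one == 0;
}

/* Sanity check that the compile-time guess matches the machine. */
void demo_check_byte_order(void)
{
    assert(DEMO_BIG_ENDIAN == demo_runtime_big_endian());
}
```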
|
|
|
|
|
|
The primary motivator for this is so that we can print floating point
values and get the full accuracy out, without having to print 1.5 as
1.5000000, and without getting 23e24 etc.
We only support %c, %f, %d, %o, %x and %s currently.
We only support the zero padding qualifier, for integers.
We do support some extensions:
%C turns values >=128 into UTF-8.
%M prints a fz_matrix.
%R prints a fz_rect.
%P prints a fz_point.
We also implement a fprintf variant on top of this to allow for
consistent results when using fz_output.
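
A much-reduced sketch of a formatter with this shape, supporting only %d, %s, %f and a %R extension. The demo_rect type, the convention of passing the rect by pointer, and the use of "%g" for floats are all assumptions made for brevity; in particular "%g" does not give the full-accuracy, exponent-free output described above.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

typedef struct demo_rect { float x0, y0, x1, y1; } demo_rect;

/* Appends s at pos, never writing past size - 1; returns the new
 * (untruncated) position. */
static size_t demo_put(char *buf, size_t size, size_t pos, const char *s)
{
    for (; *s; s++, pos++)
        if (pos + 1 < size)
            buf[pos] = *s;
    return pos;
}

/* Returns the length the full output would have had. */
size_t demo_snprintf(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    size_t pos = 0;
    char tmp[96];

    assert(buf != NULL && size > 0 && fmt != NULL);
    va_start(ap, fmt);
    for (; *fmt; fmt++) {
        tmp[0] = *fmt;            /* default: copy the character as-is */
        tmp[1] = '\0';
        if (*fmt == '%' && fmt[1] != '\0') {
            fmt++;
            switch (*fmt) {
            case 'd':
                snprintf(tmp, sizeof tmp, "%d", va_arg(ap, int));
                break;
            case 'f':
                snprintf(tmp, sizeof tmp, "%g", va_arg(ap, double));
                break;
            case 's':
                pos = demo_put(buf, size, pos, va_arg(ap, const char *));
                tmp[0] = '\0';
                break;
            case 'R': {           /* extension: print a rectangle */
                const demo_rect *r = va_arg(ap, const demo_rect *);
                snprintf(tmp, sizeof tmp, "%g %g %g %g",
                         r->x0, r->y0, r->x1, r->y1);
                break;
            }
            default:              /* unknown: emit the character itself */
                tmp[0] = *fmt;
                break;
            }
        }
        pos = demo_put(buf, size, pos, tmp);
    }
    va_end(ap);
    buf[pos < size ? pos : size - 1] = '\0';
    return pos;
}
```

An fprintf-style variant can be layered on top by formatting into a buffer and handing the result to the output object.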
|
|
We define a document handler for each file type (2 in the case of PDF, one
to handle files with the ability to 'run' them, and one without).
We then register these handlers with the context at startup, and then
call fz_open_document... as usual. This enables people to select the
document types they want at will (and even to extend the library with more
document types should they wish).
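
The registration scheme described above might look roughly like the sketch below. The demo_handler structure, the scoring convention and the extension-based recognizers are all hypothetical; the real handler interface is not shown in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical handler: a name plus a function that scores a filename
 * (0 means "cannot handle"). */
typedef struct demo_handler {
    const char *name;
    int (*recognize)(const char *filename);
} demo_handler;

#define DEMO_MAX_HANDLERS 8

typedef struct demo_context {
    const demo_handler *handlers[DEMO_MAX_HANDLERS];
    int count;
} demo_context;

/* Called at startup for each document type the application wants. */
int demo_register_handler(demo_context *ctx, const demo_handler *h)
{
    assert(ctx != NULL && h != NULL);
    if (ctx->count >= DEMO_MAX_HANDLERS)
        return -1;
    ctx->handlers[ctx->count++] = h;
    return 0;
}

static int demo_has_ext(const char *filename, const char *ext)
{
    size_t n = strlen(filename), m = strlen(ext);
    return n >= m && strcmp(filename + n - m, ext) == 0;
}

static int demo_pdf_recognize(const char *f) { return demo_has_ext(f, ".pdf") ? 100 : 0; }
static int demo_xps_recognize(const char *f) { return demo_has_ext(f, ".xps") ? 100 : 0; }

const demo_handler demo_pdf_handler = { "pdf", demo_pdf_recognize };
const demo_handler demo_xps_handler = { "xps", demo_xps_recognize };

/* Picks the registered handler with the best score, or NULL if none. */
const demo_handler *demo_find_handler(const demo_context *ctx, const char *filename)
{
    const demo_handler *best = NULL;
    int best_score = 0, i;
    assert(ctx != NULL && filename != NULL);
    for (i = 0; i < ctx->count; i++) {
        int score = ctx->handlers[i]->recognize(filename);
        if (score > best_score) {
            best_score = score;
            best = ctx->handlers[i];
        }
    }
    return best;
}
```

An application that registers only the XPS handler never links a lookup path to PDF at all, which is the selectivity the message describes.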
|
|
Only -I the config header directory if building the thirdparty library,
not if using the system library.
Fix bug 694808.
|
|
|
|
If MuPDF is used in a project under Subversion or another VCS that adds
hidden subfolders to each folder, cmapdump breaks when trying to
load the subfolder as a cmap file. This fix is required starting with
643370f04348569b5e5e577660031d638537671c
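
The message does not spell out how the fix works. One plausible approach, shown purely as an assumption, is to skip any input whose final path component starts with a dot (such as ".svn"); the demo_* helpers are hypothetical and are not cmapdump's code.

```c
#include <assert.h>
#include <string.h>

/* Returns the final path component (after the last '/' or '\\'). */
const char *demo_basename(const char *path)
{
    const char *slash, *bslash;
    assert(path != NULL);
    slash = strrchr(path, '/');
    bslash = strrchr(path, '\\');
    if (bslash && (!slash || bslash > slash))
        slash = bslash;
    return slash ? slash + 1 : path;
}

/* Hidden entries such as ".svn" are not cmap files and get skipped. */
int demo_should_skip(const char *path)
{
    return demo_basename(path)[0] == '.';
}
```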
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Run "bash scripts/gitsetup.sh" to set up the hooks after cloning.
|
|
|
|
|
|
|
|
When running under Windows, replace fopen with our own fopen_utf8
that converts from UTF-8 to UTF-16 before calling the wide-character
version of fopen.
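
A sketch of such a wrapper, assuming the Win32 MultiByteToWideChar call and the CRT's _wfopen; demo_fopen_utf8 is a made-up name and this is not the code from the commit. On other platforms it simply falls through to fopen.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _WIN32
#include <stdlib.h>
#include <windows.h>
#endif

/* Opens a file whose name is UTF-8 encoded. */
FILE *demo_fopen_utf8(const char *name, const char *mode)
{
    assert(name != NULL && mode != NULL);
#ifdef _WIN32
    {
        wchar_t *wname;
        wchar_t wmode[16];
        FILE *f;
        /* First call measures the converted name, including the NUL. */
        int n = MultiByteToWideChar(CP_UTF8, 0, name, -1, NULL, 0);
        if (n <= 0)
            return NULL;
        wname = malloc((size_t)n * sizeof *wname);
        if (!wname)
            return NULL;
        MultiByteToWideChar(CP_UTF8, 0, name, -1, wname, n);
        if (MultiByteToWideChar(CP_UTF8, 0, mode, -1, wmode, 16) <= 0) {
            free(wname);
            return NULL;
        }
        f = _wfopen(wname, wmode);
        free(wname);
        return f;
    }
#else
    /* Elsewhere the C library already takes byte strings as given. */
    return fopen(name, mode);
#endif
}
```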
|
|
|
|
Simple typo; trying to write to input file. Thanks to Moritz
Lipp for pointing out the problem.
|
|
|
|
|
|
Remove unused variable, silencing compiler warning.
No need to initialize variables twice.
Remove initialization of unread variable.
Remove unnecessary check for NULL.
Close output file upon error in cmapdump.
|
|
Previously, before interpreting a page's content stream we would
load it entirely into a buffer. Then we would interpret that
buffer. This has a cost in memory use.
Here, we update the code to read from a stream on the fly.
This has required changes in various different parts of the code.
Firstly, we have removed all use of the FILE lock - as stream
reads can now safely be interrupted by resource (or object) reads
from elsewhere in the file, the file lock becomes a very hard
thing to maintain, and doesn't actually benefit us at all. The
choices were to either use a recursive lock, or to remove it
entirely; I opted for the latter.
The file lock enum value remains as a placeholder for future use in
extendable data streams.
Secondly, we add a new 'concat' filter that concatenates a series of
streams together into one, optionally putting whitespace between each
stream (as the pdf parser requires this).
Finally, we change page/xobject/pattern content streams to work
on the fly, but we leave type3 glyphs using buffers (as presumably
these will be run repeatedly).
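
The concat idea can be sketched as a reader that walks a list of parts and optionally yields one space between them. To keep it short, the parts here are in-memory strings rather than real streams read on demand, and demo_concat with its functions is hypothetical, not the filter added by this commit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical concat reader over NUL-terminated parts. */
typedef struct demo_concat {
    const char **parts;
    size_t count;
    size_t current;   /* index of the part being read */
    size_t pos;       /* offset within that part */
    int pad;          /* nonzero: emit a space between parts */
    int need_pad;     /* a separator is due before the next byte */
} demo_concat;

void demo_concat_init(demo_concat *c, const char **parts, size_t count, int pad)
{
    assert(c != NULL && (parts != NULL || count == 0));
    c->parts = parts;
    c->count = count;
    c->current = 0;
    c->pos = 0;
    c->pad = pad;
    c->need_pad = 0;
}

/* Returns the next byte of the concatenation, or -1 at the end. */
int demo_concat_getc(demo_concat *c)
{
    assert(c != NULL);
    for (;;) {
        char ch;
        if (c->current >= c->count)
            return -1;
        if (c->need_pad) {
            c->need_pad = 0;
            return ' ';
        }
        ch = c->parts[c->current][c->pos];
        if (ch != '\0') {
            c->pos++;
            return (unsigned char)ch;
        }
        /* Part exhausted: move on, and separate it from the next one. */
        c->current++;
        c->pos = 0;
        c->need_pad = c->pad && c->current < c->count;
    }
}
```

The separator matters because a token that ends one content stream must not run into the token that starts the next.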
|