path: root/scripts/cmapdump.c
Age | Commit message | Author
2016-09-01 | Simplify PDF resource caching table handling. | Tor Andersson
2016-06-08 | Move to using size_t for all mallocs. | Robin Watts
This has knock-on effects in the store.
2016-05-13 | Introduce a general output context. | Sebastian Rasmussen
This makes it possible to redirect standard out and standard error output streams to output streams of your liking. This means that now you can, in gdb, type:
    (gdb) call pdf_print_obj(ctx, fz_stdout(ctx), obj, 0)
    (gdb) call fflush(0)
or when dealing with an unresolved indirect reference:
    (gdb) call pdf_print_obj(ctx, fz_stdout(ctx), pdf_resolve_indirect(ctx, ref), 0)
    (gdb) call fflush(0)
2016-04-28 | Introduce tuning context. | Robin Watts
For now, just use it for controlling image decoding and image scaling.
2016-02-10 | Add build=sanitize option to makefile. | Tor Andersson
2016-02-03 | Bug 696546: Add fast strtof | Robin Watts
Take on a (slightly tweaked) version of Simon Reinhardt's patch. The actual logic is left entirely unchanged; minor changes have been made to the names of functions/types to avoid clashing in the cmapdump.c repeated inclusion. Currently this should really only affect xps files, as strtof is only used as fz_atof, and that's (effectively) all xps for now. I will look at updating lex_number to call this in future.
2016-01-13 | Add lots of consts. | Robin Watts
In general, we should use 'const fz_blah' in device calls whenever the callee should not alter the fz_blah. Push this through. This shows up various places where we fz_keep and fz_drop these const things. I've updated the fz_keep and fz_drops with appropriate casts to remove the consts. We may need to do the union dance to avoid the consts for some compilers, but will only do that if required. I think this is nicer overall, even allowing for the const<->no const problems.
2015-12-14 | Fix Windows build; cmapdump.c requires fz_fopen_utf8 | Robin Watts
2015-05-15 | Support pdf files larger than 2Gig. | Robin Watts
If FZ_LARGEFILE is defined when building, MuPDF uses 64bit offsets for files; this allows us to open streams larger than 2Gig. The downsides to this are that:
* The xref entries are larger.
* All PDF ints are held as 64bit things rather than 32bit things (to cope with /Prev entries, hint stream offsets etc).
* All file positions are stored as 64bits rather than 32.
The implementation works by detecting FZ_LARGEFILE. Some #ifdeffery in fitz/system.h sets fz_off_t to either int or int64_t as appropriate, and sets defines for fz_fopen, fz_fseek, fz_ftell etc as required. These call the fseeko64 etc functions on linux (and so define _LARGEFILE64_SOURCE) and the explicit 64bit functions on windows.
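The compile-time switch described in this commit message can be sketched roughly as follows. Only FZ_LARGEFILE, fz_off_t and the int/int64_t choice come from the message; the offset_fits helper is hypothetical and is not part of MuPDF.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Sketch only: pick the file offset type at compile time. */
#ifdef FZ_LARGEFILE
typedef int64_t fz_off_t;
#else
typedef int fz_off_t;
#endif

/* Hypothetical helper: can an absolute file position be stored in fz_off_t? */
static int offset_fits(int64_t offset)
{
    assert(offset >= 0); /* absolute positions only */
#ifdef FZ_LARGEFILE
    return 1;
#else
    return offset <= INT_MAX;
#endif
}
```

Built without FZ_LARGEFILE on a platform with 32-bit int, offset_fits rejects positions past 2 GiB; with the define set, every non-negative 64-bit position is accepted.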
2015-02-17 | Add ctx parameter and remove embedded contexts for API regularity. | Tor Andersson
Purge several embedded contexts:
Remove embedded context in fz_output.
Remove embedded context in fz_stream.
Remove embedded context in fz_device.
Remove fz_rebind_stream (since it is no longer necessary).
Remove embedded context in svg_device.
Remove embedded context in XML parser.
Add ctx argument to fz_document functions.
Remove embedded context in fz_document.
Remove embedded context in pdf_document.
Remove embedded context in pdf_obj.
Make fz_page independent of fz_document in the interface. We shouldn't need to pass the document to all functions handling a page. If a page is tied to the source document, it's redundant; otherwise it's just pointless.
Fix reference counting oddity in fz_new_image_from_pixmap.
2015-02-17 | Rename fz_close_* and fz_free_* to fz_drop_*. | Tor Andersson
Rename fz_close to fz_drop_stream.
Rename fz_close_archive to fz_drop_archive.
Rename fz_close_output to fz_drop_output.
Rename fz_free_* to fz_drop_*.
Rename pdf_free_* to pdf_drop_*.
Rename xps_free_* to xps_drop_*.
2014-09-22 | Fix 695467: Add and use fz_ftoa function (like dtoa but with floats). | Tor Andersson
The dtoa function is for doubles (which is what MuJS uses) but for MuPDF we only need and want float precision in our output formatting.
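To illustrate what float precision in output formatting means in practice, here is a minimal sketch that finds the shortest decimal string that reads back as the same float. This is a brute-force stand-in built on snprintf and strtof, not the algorithm used by fz_ftoa; the function name format_float_shortest is hypothetical, and the sketch assumes the "C" locale.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: write the shortest %g form of 'value' that converts
 * back to exactly the same float. Returns the string length, or -1 on
 * failure (buffer too small, or a value such as NaN that never compares
 * equal to itself). */
static int format_float_shortest(char *buf, size_t size, float value)
{
    int prec;
    assert(buf != NULL && size > 0);
    /* 9 significant digits always suffice to round-trip an IEEE float. */
    for (prec = 1; prec <= 9; prec++)
    {
        int n = snprintf(buf, size, "%.*g", prec, value);
        if (n < 0 || (size_t)n >= size)
            return -1;
        if (strtof(buf, NULL) == value)
            return n;
    }
    return -1;
}
```

A double-oriented formatter would need up to 17 digits to round-trip, which is more than a float can carry; stopping at float precision keeps the output short.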
2014-09-02 | Add locale-independent number formatting and parsing functions. | Tor Andersson
2014-06-26 | Fix some compiler warnings on Android. | Matt Holgate
Use intptr_t when casting between a jlong and a pointer to suppress errors about different size words. Add a 'u' suffix to unsigned values output by the cmap dump utility.
2014-05-10 | Fix 694698: Support 32-bit values in CMaps. | Tor Andersson
Increasing the existing data structure to 32-bit values would bloat the data tables too much. Simplify the data structure and use three separate range tables for lookups -- one with small 16-bit to 16-bit range lookups, one with 32-bit range lookups, and a final one for one-to-many lookups. This loses the range-to-table optimization we had before, but even with the extra ranges this necessitates, the total size of the compiled binary CMap data is smaller than if we were to extend the previous scheme to 32 bits.
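A rough sketch of the range-table idea follows: a compact table of 16-bit ranges is searched first, then a table of 32-bit ranges. The struct and function names here are hypothetical and the layouts are assumptions rather than the actual MuPDF definitions; the one-to-many table is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layouts: each range maps [low, high] onto out, out+1, ... */
struct range16 { unsigned short low, high, out; };
struct range32 { unsigned int low, high, out; };

struct cmap_sketch
{
    const struct range16 *ranges;  size_t rlen; /* sorted by low */
    const struct range32 *xranges; size_t xlen; /* sorted by low */
};

/* Binary search both tables; returns 1 and stores the mapped value in
 * *out on a hit, or returns 0 if the code point is unmapped. */
static int lookup_cid(const struct cmap_sketch *cmap, unsigned int cpt, unsigned int *out)
{
    size_t lo, hi;
    assert(cmap != NULL && out != NULL);

    lo = 0; hi = cmap->rlen;
    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;
        const struct range16 *r = &cmap->ranges[mid];
        if (cpt < r->low) hi = mid;
        else if (cpt > r->high) lo = mid + 1;
        else { *out = r->out + (cpt - r->low); return 1; }
    }

    lo = 0; hi = cmap->xlen;
    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;
        const struct range32 *r = &cmap->xranges[mid];
        if (cpt < r->low) hi = mid;
        else if (cpt > r->high) lo = mid + 1;
        else { *out = r->out + (cpt - r->low); return 1; }
    }
    return 0;
}
```

Keeping the common case in 6-byte entries is what holds the compiled tables small; only the few ranges that need 32-bit values pay for 12-byte entries.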
2014-03-19 | Implement our own vsnprintf variant. | Tor Andersson
The primary motivator for this is so that we can print floating point values and get the full accuracy out, without having to print 1.5 as 1.5000000, and without getting 23e24 etc.
We only support %c, %f, %d, %o, %x and %s currently. We only support the zero padding qualifier, for integers.
We do support some extensions:
%C turns values >=128 into UTF-8.
%M prints a fz_matrix.
%R prints a fz_rect.
%P prints a fz_point.
We also implement a fprintf variant on top of this to allow for consistent results when using fz_output.
2014-01-07 | Introduce 'document handlers'. | Robin Watts
We define a document handler for each file type (2 in the case of PDF, one to handle files with the ability to 'run' them, and one without). We then register these handlers with the context at startup, and then call fz_open_document... as usual. This enables people to select the document types they want at will (and even to extend the library with more document types should they wish).
2013-08-28 | make cmapdump ignore VCS folders | Simon Bünzli
If MuPDF is used in a project using Subversion or another VCS adding hidden subfolders to each folder, cmapdump breaks when trying to load the subfolder as cmap file. This fix is required starting with 643370f04348569b5e5e577660031d638537671c
2013-06-20 | Update source, makefiles and win32 projects. | Tor Andersson
2013-06-18 | Merge common and internal headers into one. | Tor Andersson
2013-06-18 | Move header files into separate include directory. | Tor Andersson
2013-05-27 | Strip trailing whitespace. | Tor Andersson
2013-05-17 | Add colorspace context dummy functions to cmapdump.c | Tor Andersson
2013-03-01 | Bug 693624: Ensure that windows copes with utf8 filenames | Robin Watts
When running under Windows, replace fopen with our own fopen_utf8 that converts from utf8 to unicode before calling the unicode version of fopen.
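The approach described here can be sketched as below. The function name open_utf8 is a hypothetical stand-in for the fopen_utf8 mentioned in the message, and the body is an assumption about how such a wrapper might look, not the MuPDF source. MultiByteToWideChar and _wfopen are the standard Win32 and Microsoft CRT calls; other platforms fall through to plain fopen.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical wrapper: open a file whose name is given in UTF-8. */
static FILE *open_utf8(const char *name, const char *mode)
{
#ifdef _WIN32
    FILE *file = NULL;
    wchar_t *wname, *wmode;
    int nlen, mlen;

    assert(name != NULL && mode != NULL);
    /* First pass: ask for the required lengths, terminator included. */
    nlen = MultiByteToWideChar(CP_UTF8, 0, name, -1, NULL, 0);
    mlen = MultiByteToWideChar(CP_UTF8, 0, mode, -1, NULL, 0);
    if (nlen <= 0 || mlen <= 0)
        return NULL;
    wname = malloc((size_t)nlen * sizeof *wname);
    wmode = malloc((size_t)mlen * sizeof *wmode);
    /* Second pass: convert, then call the wide-character fopen. */
    if (wname != NULL && wmode != NULL &&
        MultiByteToWideChar(CP_UTF8, 0, name, -1, wname, nlen) > 0 &&
        MultiByteToWideChar(CP_UTF8, 0, mode, -1, wmode, mlen) > 0)
        file = _wfopen(wname, wmode);
    free(wname);
    free(wmode);
    return file;
#else
    assert(name != NULL && mode != NULL);
    return fopen(name, mode);
#endif
}
```

On failure the wrapper returns NULL, just as fopen does, so callers need no platform-specific error handling.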
2012-06-25 | Warning fixes and various clean ups: | Sebastian Rasmussen
Remove unused variable, silencing compiler warning.
No need to initialize variables twice.
Remove initialization of unread variable.
Remove unnecessary check for NULL.
Close output file upon error in cmapdump.
2012-05-08 | Switch to reading content streams on the fly during interpretation. | Robin Watts
Previously, before interpreting a page's content stream we would load it entirely into a buffer. Then we would interpret that buffer. This has a cost in memory use. Here, we update the code to read from a stream on the fly. This has required changes in various different parts of the code.
Firstly, we have removed all use of the FILE lock - as stream reads can now safely be interrupted by resource (or object) reads from elsewhere in the file, the file lock becomes a very hard thing to maintain, and doesn't actually benefit us at all. The choices were to either use a recursive lock, or to remove it entirely; I opted for the latter. The file lock enum value remains as a placeholder for future use in extendable data streams.
Secondly, we add a new 'concat' filter that concatenates a series of streams together into one, optionally putting whitespace between each stream (as the pdf parser requires this).
Finally, we change page/xobject/pattern content streams to work on the fly, but we leave type3 glyphs using buffers (as presumably these will be run repeatedly).
2012-04-09 | Bug 692977: Stop harmless thread debugging messages during cmapdump | Robin Watts
If compiled with -DDEBUG, cmapdump throws a large number of warnings regarding thread locking. These are harmless and can be ignored, but are, nonetheless, not pretty. Fixed here. Thanks to Bas Weelinck for the report.
2012-03-12 | More API tidying. | Robin Watts
Make fz_clone_context copy existing AA settings.
Add accessor function for fz_bitmap.
Add more documentation for various functions/types.
2012-03-06 | Split fitz.h/mupdf.h into internal/external headers. | Robin Watts
Attempt to separate public API from internal functions.
2012-03-01 | Add some more docs to fitz.h | Robin Watts
Add docs for fz_store, fz_image, fz_halftones.
Move fz_item definition into res_store.c as it does not need to be external.
Rename fz_store_context to fz_keep_store_context to be consistent.
2012-02-13 | Add locking around freetype calls. | Robin Watts
We only open one instance of freetype per document. We therefore have to ensure that only 1 call to it takes place at a time. We introduce a lock for this purpose (FZ_LOCK_FREETYPE), and arrange to take/release it as required. We also update the font context so it is properly shared.
2012-02-08 | Lock reworking. | Robin Watts
This is a significant change to the use of locks in MuPDF. Previously, the user had the option of passing us lock/unlock functions for a single mutex as part of the allocation struct. Now we remove these entries from the allocation struct, and make a separate 'locks' struct. This enables people to use fz_alloc_default with locking.
If multithreaded operation is required, then the user is required to create FZ_LOCK_MAX mutexes, which will be locked or unlocked by MuPDF calling the lock/unlock functions within the new fz_locks_context structure passed in at context creation. These mutexes are not required to be recursive (they may be, but MuPDF should never call them in this way).
MuPDF avoids deadlocks by imposing a locking ordering on itself; a thread will never take lock n, if it already holds any lock i for which 0 <= i <= n.
Currently, there are 4 locks used within MuPDF.
Lock 0: The alloc lock; taken around all calls to user supplied (or default) allocation functions. Also taken around all accesses to the refs field of storable items.
Lock 1: The store lock; taken whenever the store data structures (specifically the linked list pointers) are accessed.
Lock 2: The file lock; taken whenever a thread is accessing the raw file. We use the debugging macros to insist that this is held whenever we do a file based seek or read. We also insist that this is never held when we resolve an indirect reference, as this can have the effect of moving the file pointer.
Lock 3: The glyphcache lock; taken whenever a thread calls freetype, or accesses the glyphcache data structures. This introduces some complexities w.r.t type3 fonts.
Locking can be hugely problematic, so to ease our minds as to the correctness of this code, we introduce some debugging macros. These compile away to nothing unless FITZ_DEBUG_LOCKING is defined.
fz_assert_lock_held(ctx, lock) checks that we hold lock.
fz_assert_lock_not_held(ctx, lock) checks that we do not hold lock.
In addition fz_lock_debug_lock and fz_lock_debug_unlock are used on every fz_lock/fz_unlock to check the validity of the operation we are performing - in particular it checks that we do/do not already hold the lock we are trying to take/drop, and that by taking this lock we are not violating our defined locking order.
The RESOLVE macro (used throughout the code to check whether we need to resolve an indirect reference) calls fz_assert_lock_not_held to ensure that we aren't about to resolve an indirect reference (and hence move the stream pointer) when the file is locked.
In order to implement the file locking properly, pdf_open_stream (and friends) now lock the file as a side effect (because they fz_seek to the start of the stream). The lock is automatically dropped on an fz_close of such streams.
Previously, the glyph cache was created in a context when it was first required; this presents problems as it can be shared between several contexts or not, depending on whether it is created before the contexts are cloned. We now always create it at startup, so it is always shared. This means that we need reference counting for the glyph caches. Added here.
In fz_render_glyph, we take the glyph cache lock, and check to see whether the glyph is in the cache. If it is, we bump the refcount, drop the lock and return the cached character. If it is not, we need to render the character. For freetype based fonts we keep the lock throughout the rendering process, thus ensuring that freetype is only called in a single threaded manner. For type3 fonts, however, we need to invoke the interpreter again to render the glyph streams. This can require reentrance to this routine. We therefore drop the glyph cache lock, call the interpreter to render us our pixmap, and take the lock again.
This dropping and retaking of the lock introduces a possible race condition; 2 threads may try to render the same character at the same time. We therefore modify our hash table insert routines to behave differently if it comes to insert an entry only to find that an entry with the same key is already there. We spot this case; if we have just rendered a type3 glyph and when we try to insert it into the cache discover that someone has beaten us to it, we just discard our entry and use the cached one.
Hopefully this will seldom be a problem in practice; to solve it properly would require greater complexity (probably involving spotting that another thread is already working on the desired rendering, and sleeping on a semaphore until it completes).
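The locking order rule from this commit message (never take lock n while holding any lock i with 0 <= i <= n) can be checked with simple bookkeeping. The sketch below is an illustration only: the names are hypothetical, no real mutex is taken, and a single thread is assumed, whereas a real debugging aid would keep one record per thread.

```c
#include <assert.h>

/* The four locks listed in the message, in their numeric order. */
enum { LOCK_ALLOC, LOCK_STORE, LOCK_FILE, LOCK_GLYPHCACHE, LOCK_MAX };

/* Which locks the (single, assumed) thread currently holds. */
static int held[LOCK_MAX];

/* Record taking 'lock'. Returns 1 if the ordering rule is respected,
 * or 0 if some lock i <= lock is already held (this also catches an
 * attempt to take the same lock twice). */
static int debug_lock(int lock)
{
    int i;
    assert(lock >= 0 && lock < LOCK_MAX);
    for (i = 0; i <= lock; i++)
        if (held[i])
            return 0;
    held[lock] = 1;
    return 1;
}

/* Record dropping 'lock'. Returns 0 if it was not held. */
static int debug_unlock(int lock)
{
    assert(lock >= 0 && lock < LOCK_MAX);
    if (!held[lock])
        return 0;
    held[lock] = 0;
    return 1;
}
```

Under this rule a thread holding the glyphcache lock (3) may still take the alloc lock (0), but a thread holding the alloc lock may take nothing else, which fits the alloc lock being the innermost one.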
2012-02-06 | Pass context to cmap and font descriptor functions. | Tor Andersson
2012-01-19 | Multi-threading support for MuPDF | Robin Watts
When we moved over to a context based system, we laid the foundation for a thread-safe mupdf. This commit should complete that process. Firstly, fz_clone_context is properly implemented so that it makes a new context, but shares certain sections (currently just the allocator, and the store). Secondly, we add locking (to parts of the code that have previously just had placeholder LOCK/UNLOCK comments). Functions to lock and unlock a mutex are added to the allocator structure; omit these (as is the case today) and no multithreading is (safely) possible. The context will refuse to clone if these are not provided. Finally we flesh out the LOCK/UNLOCK comments to be real calls of the functions - unfortunately this requires us to plumb fz_context into the fz_keep_storable function (and all the fz_keep_xxx functions that call it). This is the largest section of the patch. No changes expected to any test files.
2012-01-11 | Use enum for FZ_STORE_DEFAULT default size. | Tor Andersson
2012-01-11 | Hide glyph cache in context. | Tor Andersson
2011-12-15 | Add scavenging functionality. | Robin Watts
When fz_malloc (etc) are about to fail, we try to scavenge memory from the store and then retry. We repeatedly try to bin objects from the store until the malloc succeeds, or until we have nothing else to bin. This means we no longer need the 'aging' of the store, so this is removed.
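The retry loop described above might look roughly like this. Everything here is a hypothetical sketch: the callback types, malloc_with_scavenge, and the toy store used to exercise it are illustrations, not MuPDF functions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical callbacks: an allocator that may fail, and a scavenger
 * that evicts one cached item and returns 0 once nothing is left. */
typedef void *(*alloc_fn)(void *user, size_t size);
typedef int (*scavenge_fn)(void *user);

/* Try to allocate; on failure evict one item and retry, until the
 * allocation succeeds or there is nothing left to evict. */
static void *malloc_with_scavenge(alloc_fn alloc, scavenge_fn scavenge, void *user, size_t size)
{
    assert(alloc != NULL && scavenge != NULL);
    for (;;)
    {
        void *p = alloc(user, size);
        if (p != NULL)
            return p;
        if (!scavenge(user))
            return NULL;
    }
}

/* Toy store for demonstration: a byte budget plus evictable item sizes. */
struct toy_store
{
    size_t budget;
    size_t cached[4];
    int count;
};

static void *toy_alloc(void *user, size_t size)
{
    struct toy_store *s = user;
    if (size > s->budget)
        return NULL; /* simulate an out-of-memory failure */
    s->budget -= size;
    return malloc(size);
}

static int toy_scavenge(void *user)
{
    struct toy_store *s = user;
    if (s->count == 0)
        return 0;
    s->budget += s->cached[--s->count]; /* evicting frees that many bytes */
    return 1;
}
```

Because eviction happens only when an allocation is about to fail, cached items stay available for as long as memory allows, which is why a separate aging pass is no longer needed.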
2011-12-15 | Rework pdf_store to fz_store, a part of fz_context. | Robin Watts
Firstly, we rename pdf_store to fz_store, reflecting the fact that there are no pdf specific dependencies on it. Next, we rework it so that all the objects that can be stored in the store start with an fz_storable structure. This consists of a reference count, and a function used to free the object when the reference count reaches zero. All the keep/drop functions are then reimplemented by calling fz_keep_sharable/fz_drop_sharable. The 'drop' functions as supplied by the callers are thus now 'free' functions, only called if the reference count drops to 0. The store changes to keep all the items in the store in the linked list (which becomes a doubly linked one). We still make use of the hashtable to index into this list quickly, but we now have the objects in an LRU ordering within the list. Every object is put into the store, with a size record; this is an estimate of how much memory would be freed by freeing that object. The store is moved into the context and given a maximum size; when new things are inserted into the store, care is taken to ensure that we do not expand beyond this size. We evict any stored items (that are not in use) starting from the least recently used. Finding an object in the store now takes a reference to it already. LOCK and UNLOCK comments are used to indicate where locks need to be taken and released to ensure thread safety.
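The storable header described above (a reference count plus a function that frees the object when the count reaches zero) can be sketched as follows. All names are hypothetical, the code is not thread-safe, and the store, hashtable and LRU list are left out.

```c
#include <assert.h>
#include <stdlib.h>

struct storable;
typedef void (*storable_free_fn)(struct storable *);

/* Common header: every storable object starts with one of these. */
struct storable
{
    int refs;
    storable_free_fn free_fn;
};

static struct storable *keep_storable(struct storable *s)
{
    if (s != NULL)
        s->refs++;
    return s;
}

/* Drop one reference; free_fn runs only when the count reaches zero. */
static void drop_storable(struct storable *s)
{
    if (s == NULL)
        return;
    assert(s->refs > 0);
    if (--s->refs == 0)
        s->free_fn(s);
}

/* Toy object that embeds the header as its first member. */
struct toy_image
{
    struct storable storable;
    int width, height;
};

static int toy_images_freed; /* counter used only to observe frees */

static void free_toy_image(struct storable *s)
{
    toy_images_freed++;
    free(s); /* the header is the first member, so this frees the image */
}

static struct toy_image *new_toy_image(int width, int height)
{
    struct toy_image *img = malloc(sizeof *img);
    if (img == NULL)
        return NULL;
    img->storable.refs = 1;
    img->storable.free_fn = free_toy_image;
    img->width = width;
    img->height = height;
    return img;
}
```

Placing the header first lets generic keep/drop code work on any object type through a pointer to the header alone.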
2011-12-08 | Stylistic changes when testing pointer values for NULL. | Tor Andersson
Also: use 'cannot' instead of 'failed to' in error messages.
2011-12-08 | Remove remaining fz_error_note calls in the pdf code. | Tor Andersson
2011-12-06 | Move antialias levels into context. | Robin Watts
In builds that support configurable levels of antialiasing, move the variables that control this into the context. This makes it possible to safely use different levels of antialiasing in different threads.
2011-11-28 | Move Freetype globals into context. | Robin Watts
Freetype globals are not shared between threads currently - to do that we'll need to introduce a lock.
2011-11-15 | Merge branch 'master' into context | Robin Watts
Mostly redoing the xps_context to xps_document change and adding contexts to newly written code.
Conflicts:
apps/pdfapp.c
apps/pdfapp.h
apps/x11_main.c
apps/xpsdraw.c
draw/draw_device.c
draw/draw_scale.c
fitz/base_object.c
fitz/fitz.h
pdf/mupdf.h
pdf/pdf_interpret.c
pdf/pdf_outline.c
pdf/pdf_page.c
xps/muxps.h
xps/xps_doc.c
xps/xps_xml.c
2011-11-01 | Tweak build scripts for iOS viewer. | Tor Andersson
2011-10-04 | Reintroduce alloc context section. | Robin Watts
This was removed during a previous commit to make the editing easier. Now added back in.
2011-10-04 | Move to exception handling rather than error passing throughout. | Robin Watts
This frees us from passing errors back everywhere, and hence enables us to pass results back as return values. Rather than having to explicitly check for errors everywhere and bubble them, we now allow exception handling to do the work for us; the downside to this is that we no longer emit as much debugging information as we did before (though this could be put back in). For now, the debugging information we have lost has been retained in comments with 'RJW:' at the start. This code needs fuller testing, but is being committed as a work in progress.
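As a rough illustration of the difference, the sketch below uses setjmp/longjmp so that an inner function can return its result directly while errors unwind to the nearest handler. It is a generic C idiom under assumed names (toy_context, toy_throw and the digit functions are all hypothetical), not the MuPDF exception mechanism.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical context: holds the innermost active handler. */
struct toy_context
{
    jmp_buf *handler;
    const char *message;
};

/* Record the message and jump to the innermost handler. */
static void toy_throw(struct toy_context *ctx, const char *message)
{
    assert(ctx != NULL && ctx->handler != NULL);
    ctx->message = message;
    longjmp(*ctx->handler, 1);
}

/* The result comes back as the return value; errors are thrown. */
static int parse_digit(struct toy_context *ctx, char c)
{
    if (c < '0' || c > '9')
        toy_throw(ctx, "not a digit");
    return c - '0';
}

/* Sum the digits of 's', or return -1 if any character is not a digit. */
static int sum_digits(struct toy_context *ctx, const char *s)
{
    jmp_buf env;
    jmp_buf *outer = ctx->handler;
    int total = 0;

    ctx->handler = &env;
    if (setjmp(env) == 0)
    {
        for (; *s != '\0'; s++)
            total += parse_digit(ctx, *s); /* no per-call error check */
        ctx->handler = outer;
        return total;
    }
    /* Reached via longjmp: restore the outer handler and report failure. */
    ctx->handler = outer;
    return -1;
}
```

The loop body carries no error checks at all; that is the convenience the message describes, and the cost is that the failure path no longer passes through each caller where context could be logged.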
2011-09-21 | Rename malloc functions for arrays (fz_calloc and fz_realloc). | Tor Andersson
2011-09-21 | Don't thread ctx through safe fz_obj functions. | Tor Andersson
2011-09-20 | Reshuffle exception context code to fit Tor's aesthetic sense. | Tor Andersson
2011-09-15 | Add context to mupdf. | Robin Watts
Huge pervasive change to lots of files, adding a context for exception handling and allocation. In time we'll move more statics into there. Also fix some for(i = 0; i < function(...); i++) calls.