summaryrefslogtreecommitdiff
path: root/pdf/pdf_interpret.c
AgeCommit message (Collapse)Author
2012-07-05Cope with illegal alpha values (out of range)Robin Watts
normal_457.pdf specifies an alpha value of 1.005, which causes overflows when calculating values. Simply clamp values into the 0..1 range.
2012-06-27Fix text clipping operations in pdf.Robin Watts
In PDF the text rendering mode can have bit 2 set to mean "add to clipping path". Experiments (and in particular normal_130.pdf) show that the text should be stroked and/or filled BEFORE the path is added to the clipping path.
2012-06-25Warning fixes and various clean ups:Sebastian Rasmussen
Remove unused variable, silencing compiler warning. No need to initialize variables twice. Remove initialization of unread variable. Remove unnecessary check for NULL. Close output file upon error in cmapdump.
2012-06-22Rework pdf_lexbuf to allow for dynamic parsing buffers.Robin Watts
Currently pdf_lexbufs use a static scratch buffer for parsing. In the main case this is 64K in size, but in other cases it can be just 256 bytes; this causes problems when parsing long strings. Even the 64K limit is an implementation limit of Acrobat, not an architectural limit of PDF. Change here to allow dynamic buffers. This means a slightly more complex setup and destruction for each buffer, but more importantly requires correct cleanup on errors. To avoid having to insert lots more try/catch clauses this commit includes various changes to the code so we reuse pdf_lexbufs where possible. This keeps the speed up.
2012-06-12Followup to commit 120dadb; improved error handling during interpretation.Robin Watts
After commit 120dadb, it's far too easy to get into a seemingly infinite loop while processing a corrupt file. We fix this by changing the process to abort when we receive an invalid keyword. Also, we add another layer of nesting to pdf_run_stream to avoid us push/popping an fz_try level on every keyword.
2012-06-11Fix Bug 693099: Render failure due to corrupt jpeg data.Robin Watts
The file supplied with the bug contains corrupt jpeg data on page 61. This causes an error to be thrown which results in mudraw exiting. Previously, when image decode was done at loading time, the error would have been thrown under the pdf interpreter rather than under the display list renderer. This error would have been caught, a warning given, and the program would have continued. This is not ideal behaviour, as there is no way for a caller to know that there was a problem, and that the image is potentially incomplete. The solution adopted here, solves both these problems. The fz_cookie structure is expanded to include a 'errors' count. Whenever we meet an error during rendering, we increment the 'errors' count, and continue. This enables applications to spot the errors count being non-zero on exit and to display a warning. mupdf is updated here to pass a cookie in and to check the error count at the end; if it is found to be non zero, then a warning is given (just once per visit to each page) to say that the page may have errors on it.
2012-05-08Switch to reading content streams on the fly during interpretation.Robin Watts
Previously, before interpreting a pages content stream we would load it entirely into a buffer. Then we would interpret that buffer. This has a cost in memory use. Here, we update the code to read from a stream on the fly. This has required changes in various different parts of the code. Firstly, we have removed all use of the FILE lock - as stream reads can now safely be interrupted by resource (or object) reads from elsewhere in the file, the file lock becomes a very hard thing to maintain, and doesn't actually benefit us at all. The choices were to either use a recursive lock, or to remove it entirely; I opted for the latter. The file lock enum value remains as a placeholder for future use in extendable data streams. Secondly, we add a new 'concat' filter that concatenates a series of streams together into one, optionally putting whitespace between each stream (as the pdf parser requires this). Finally, we change page/xobject/pattern content streams to work on the fly, but we leave type3 glyphs using buffers (as presumably these will be run repeatedly).
2012-04-19Bug 692847: Tiling problem fix.Robin Watts
The test in mupdf for 'is more than one tile needed' is wrong, as it assumes that tile bboxes start at 0. Fix that, and everything else should work OK.
2012-03-14Bug 692917: Move to dynamic stroke_states.Robin Watts
Move fz_stroke_state from being a simple structure whose contents are copied repeatedly to being a dynamically allocated reference counted object so we can cope with large numbers of entries in the dash array.
2012-03-13Rename some functions and accessors to be more consistent.Tor Andersson
Debug printing functions: debug -> print. Accessors: get noun attribute -> noun attribute. Find -> lookup when the returned value is not reference counted. pixmap_with_rect -> pixmap_with_bbox. We are reserving the word "find" to mean lookups that give ownership of objects to the caller. Lookup is used in other places where the ownership is not transferred, or simple values are returned. The rename is done by the sed script in scripts/rename3.sed
2012-03-06Split fitz.h/mupdf.h into internal/external headers.Robin Watts
Attempt to separate public API from internal functions.
2012-02-29Fix trailing whitespace and mixed tabs/spaces in indentation.Tor Andersson
2012-02-26Move fz_obj to be pdf_obj.Robin Watts
Currently, we are in the slightly strange position of having the PDF specific object types as part of fitz. Here we pull them out into the pdf layer instead. This has been made possible by the recent changes to make the store no longer be tied to having fz_obj's as keys. Most of this work is a simple huge rename; to help customers who may have code that use such functions we have provided a sed script to do the renaming; scripts/rename2.sed. Various other small tweaks are required; the store used to have some debugging code that still required knowledge of fz_obj types - we extract that into a nicer 'type' based function pointer. Also, the type 3 font handling used to have an fz_obj pointer for type 3 resources, and therefore needed to know how to free this; this has become a void * with a function to free it.
2012-02-25Revamp pdf lexing codeRobin Watts
A huge amount (20%+ on some files) of our runtime is spent in fz_atof. A survey of results on the net suggests we will get much better speed by writing our own atof. Part of the job of doing this involves parsing the string to identify the component parts of the number - ludicrously, we are already doing this as part of the lexing process, so it would make sense to do the atoi/atof as part of this process. In order to do this, we need somewhere to store the lexed results; rather than add a float * and an int * to every single pdf_lex call, we generalise the calls to pass a pdf_lexbuf * pointer instead of separate buffer/max/string length pointers. This should help us overall.
2012-02-25Rework image handling for on demand decodeRobin Watts
Introduce a new 'fz_image' type; this type contains rudimentary information about images (such as native, size, colorspace etc) and a function to call to get a pixmap of that image (with a size hint). Instead of passing pixmaps through the device interface (and holding pixmaps in the display list) we now pass images instead. The rendering routines therefore call fz_image_to_pixmap to get pixmaps to render, and fz_pixmap_drop those afterwards. The file format handling routines therefore need to produce images rather than pixmaps; xps and cbz currently just wrap pixmaps as images. PDF is more involved. The stream handling routines in PDF have been altered so that they can recognise when the last stream entry in a filter dictionary is an image decoding filter. Rather than applying this filter, they read and store the parameters into a pdf_image_params structure, and stop decoding at that point. This allows us to read the compressed data for an image into memory as a block. We can then restart the image decode process later. pdf_images therefore consist of the compressed image data for images. When a pixmap is requested for such an image, the code checks to see if we have one (of an appropriate size), and if not, decodes it. The size hint is used to determine whether it is possible to subsample the image; currently this is only supported for JPEGs, but we could add generic subsampling code later. In order to handle caching the produced images, various changes have been made to the store and the underlying hash table. Previously the store was indexed purely by fz_obj keys; we don't have an fz_obj key any more, so have extended the store by adding a concept of a key 'type'. A key type is a pointer to a set of functions that keep/drop/compare and make a hashable key from a key pointer. We make a pdf_store.c file that contains functions to offer the existing fz_obj based functions, and add a new 'type' for keys (based on the fz_image handle, and the subsample factor) in the pdf_image.c file. While working on this, a problem became apparent in the existing store codel; fz_obj objects had no protection on their reference counts, hence an interpreter thread could try to alter a ref count at the same time as a malloc caused an eviction from the store. This has been solved by using the alloc lock as protection. This in turn requires some tweaks to the code to make sure we don't try and keep/drop fz_obj's from the store code while the alloc lock is held. A side effect of this work is that when a hash table is created, we inform it what lock should be used to protect its innards (if any). If the alloc lock is used, the insert method knows to drop/retake it to allow it to safely expand the hash table. Callers to the hash functions have the responsibility of taking/dropping the appropriate lock, and ensuring that they cope with the possibility that insert might drop the alloc lock, causing race conditions.
2012-02-08Lock reworking.Robin Watts
This is a significant change to the use of locks in MuPDF. Previously, the user had the option of passing us lock/unlock functions for a single mutex as part of the allocation struct. Now we remove these entries from the allocation struct, and make a separate 'locks' struct. This enables people to use fz_alloc_default with locking. If multithreaded operation is required, then the user is required to create FZ_LOCK_MAX mutexes, which will be locked or unlocked by MuPDF calling the lock/unlock functions within the new fz_locks_context structure passed in at context creation. These mutexes are not required to be recursive (they may be, but MuPDF should never call them in this way). MuPDF avoids deadlocks by imposing a locking ordering on itself; a thread will never take lock n, if it already holds any lock i for which 0 <= i <= n. Currently, there are 4 locks used within MuPDF. Lock 0: The alloc lock; taken around all calls to user supplied (or default) allocation functions. Also taken around all accesses to the refs field of storable items. Lock 1: The store lock; taken whenever the store data structures (specifically the linked list pointers) are accessed. Lock 2: The file lock; taken whenever a thread is accessing the raw file. We use the debugging macros to insist that this is held whenever we do a file based seek or read. We also insist that this is never held when we resolve an indirect reference, as this can have the effect of moving the file pointer. Lock 3: The glyphcache lock; taken whenever a thread calls freetype, or accesses the glyphcache data structures. This introduces some complexities w.r.t type3 fonts. Locking can be hugely problematic, so to ease our minds as to the correctness of this code, we introduce some debugging macros. These compile away to nothing unless FITZ_DEBUG_LOCKING is defined. fz_assert_lock_held(ctx, lock) checks that we hold lock. fz_assert_lock_not_held(ctx, lock) checks that we do not hold lock. In addition fz_lock_debug_lock and fz_lock_debug_unlock are used on every fz_lock/fz_unlock to check the validity of the operation we are performing - in particular it checks that we do/do not already hold the lock we are trying to take/drop, and that by taking this lock we are not violating our defined locking order. The RESOLVE macro (used throughout the code to check whether we need to resolve an indirect reference) calls fz_assert_lock_not_held to ensure that we aren't about to resolve an indirect reference (and hence move the stream pointer) when the file is locked. In order to implement the file locking properly, pdf_open_stream (and friends) now lock the file as a side effect (because they fz_seek to the start of the stream). The lock is automatically dropped on an fz_close of such streams. Previously, the glyph cache was created in a context when it was first required; this presents problems as it can be shared between several contexts or not, depending on whether it is created before the contexts are cloned. We now always create it at startup, so it is always shared. This means that we need reference counting for the glyph caches. Added here. In fz_render_glyph, we take the glyph cache lock, and check to see whether the glyph is in the cache. If it is, we bump the refcount, drop the lock and returned the cached character. If it is not, we need to render the character. For freetype based fonts we keep the lock throughout the rendering process, thus ensuring that freetype is only called in a single threaded manner. For type3 fonts, however, we need to invoke the interpreter again to render the glyph streams. This can require reentrance to this routine. We therefore drop the glyph cache lock, call the interpreter to render us our pixmap, and take the lock again. This dropping and retaking of the lock introduces a possible race condition; 2 threads may try to render the same character at the same time. We therefore modify our hash table insert routines to behave differently if it comes to insert an entry only to find that an entry with the same key is already there. We spot this case; if we have just rendered a type3 glyph and when we try to insert it into the cache discover that someone has beaten us to it, we just discard our entry and use the cached one. Hopefully this will seldom be a problem in practise; to solve it properly would require greater complexity (probably involving spotting that another thread is already working on the desired rendering, and sleeping on a semaphore until it completes).
2012-02-06Pass context to cmap and font descriptor functions.Tor Andersson
2012-02-03Be consistent about passing a fz_context in path/text/shade functions.Tor Andersson
2012-02-03Remove extraneous blank lines.Tor Andersson
2012-02-01Tweak to previous pdf_decode_cmap fix.Robin Watts
More aesthetically pleasing version.
2012-01-31Fix big 692824: incorrect application of word space.Robin Watts
Word space should only be applied when the codepoint is 32, and is read from a single byte encoding region. Ghostscript gets this wrong too.
2012-01-27Rename pdf_xref type to pdf_document.Tor Andersson
2012-01-20Flip images the right side up in the PDF interpreter.Tor Andersson
This way both pixmaps for rendering and image data are top-down.
2012-01-19Transform link rectangles by the hidden page CTM.Tor Andersson
2012-01-19Multi-threading support for MuPDFRobin Watts
When we moved over to a context based system, we laid the foundation for a thread-safe mupdf. This commit should complete that process. Firstly, fz_clone_context is properly implemented so that it makes a new context, but shares certain sections (currently just the allocator, and the store). Secondly, we add locking (to parts of the code that have previously just had placeholder LOCK/UNLOCK comments). Functions to lock and unlock a mutex are added to the allocator structure; omit these (as is the case today) and no multithreading is (safely) possible. The context will refuse to clone if these are not provided. Finally we flesh out the LOCK/UNLOCK comments to be real calls of the functions - unfortunately this requires us to plumb fz_context into the fz_keep_storable function (and all the fz_keep_xxx functions that call it). This is the largest section of the patch. No changes expected to any test files.
2012-01-18Better handling of 'uncacheable' Type3 glyphs. Bug 692745.Robin Watts
Some Type 3 fonts contain glyphs that rely on inheriting various aspects of the graphics state from their calling code. (i.e. a glyph might use d0, then fill an area without setting a color first). While the spec is vague on this point, we believe that technically it is invalid. Previously mupdf defaulted all elements of the graphic state back when beginning to draw the glyph. This does not match what Acrobat does though, so we change the approach taken. We now watch (by use of bits in the device flags word) for the use of parts of the graphics state before it is set. If such use is detected, then we note that the glyph is 'uncacheable' and render it direct. This seems to match Acrobats behaviour.
2012-01-13Avoid infinite loops with XObjects.Robin Watts
Every xobject keeps a reference to the object from whence it came. This is marked/unmarked as it is executed. Thanks to Zeniko for spotting the potential problem.
2012-01-13Allow the interpreter gstate stack to grow.Robin Watts
Copes with files with many many gstates in; such as 'tikz-gtree' documents, according to Sumatra. Thanks to Zeniko for this.
2012-01-12Support proper XPS mitering. (And stroke fixes).Robin Watts
XPS differs from PS/PDF/etc in the way it handles miters; rather than simply converting a miter that's overly long to a bevel, it truncates it at the miter limit. As such it needs to be handled correctly. For clarity, expose new enumerated types for linejoins and linecaps, and use these throughout code. When we upgrade our freetype, we can move to using proper xps mitering in that too. Add new fz_matrix_max_expansion function to return a safer expansion value that works in the case where we scale up in one direction and down in another. In the xps path drawing code, avoid generating unnecessary linetos. Thanks to Zeniko for spotting these and providing implementations.
2012-01-12Use the same coordinate system for pdf and xps pages in the interface.Tor Andersson
Move coordinate space tweaks into pdf_ and xps_run_page, and provide neutral pdf_ and xps_bound_page functions to return the page size as a zero-origined bounding box.
2012-01-11Calculate accurate per-glyph bounding boxes for fz_bound_text.Tor Andersson
2012-01-06PDF fixes/tweaks.Robin Watts
Fix 2 places where we were filling a stroked pattern rather than stroking it. Cope with being asked to run a NULL buffer. If running a stream fails, warn and return what we have, rather than giving up entirely. Should really set a return code for each render. Only look at the Print flag bit for Print renders. Only look at the View flag bit for view renders. If we find an unexpected ) or > during object parsing, warn and continue rather than giving up entirely. If optional content groups are broken, render the rest of the page anyway. Previously indirect objects that point to another indirection would cause a failure; now attempt to resolve these. We set an arbitrary limit of 10 such redirections to avoid infinite loops.
2012-01-04Bug 692739: Add ability to abort time consuming actionsRobin Watts
A new 'cookie' parameter is added to page rendering/interpretation functions. Supply this as NULL to get existing behaviour. If you supply a non-NULL cookie, then this is taken as a pointer to a struct that can be used for simple, non-thread locked communication between caller and library. The entire struct should be memset to zero before entry, except for specific flags (thus coping with future extensions to this struct). The abort flag should be zero on entry. It will be checked periodically by the library - if the caller sets it non-zero (via another thread) then the current operation will be aborted. No guarantees are given as to how often this will be checked, or how fast it will be responded to. The progress_max field will be set to an integer (-1 for unknown) representing the number of 'things' to do. The progress field will count up from 0 to this number as time goes by. No guarantees are made as to the accuracy of this information, but it should be useful for offering some sort of progress bar etc. Note that progress_max may increase during the job. In general, callers should be careful to accept out of range or invalid data in this structure as this is deliberately accessed 'unlocked'.
2011-12-19More Memory squeezing fixesRobin Watts
2011-12-17Change stream 'close' functions to facilitate error cleanup.Robin Watts
Rather than passing a stream to a close function, just pass context and state - that's all that is required. This enables us to call close to cleanup neatly if the stream fails to allocate.
2011-12-16More memsqueezing fixes.Robin Watts
2011-12-16More memsqueezing fixes.Robin Watts
2011-12-16Mote memsqueezing fixes.Robin Watts
2011-12-16Add fz_malloc_struct, and make code use it.Robin Watts
The new fz_malloc_struct(A,B) macro allocates sizeof(B) bytes using fz_malloc, and then passes the resultant pointer to Memento_label to label it with "B". This costs nothing in non-memento builds, but gives much nicer listings of leaked blocks when memento is enabled.
2011-12-08Stylistic changes when testing pointer values for NULL.Tor Andersson
Also: use 'cannot' instead of 'failed to' in error messages.
2011-12-08Move from volatile to fz_var.Robin Watts
When using exceptions (which are implemented using setjmp/longjmp), we need to be careful to ensure that variable values get written back before any exception happens. Previously we've done that using volatile, but that produces nasty warnings (and unduly limits the compilers freedom to optimise). Here we introduce a new macro fz_var that passes the address of the variable out of scope. This means that the compiler has to ensure that any changes to its value are written back to memory before calling any out of scope function.
2011-12-07Fix tile coverage calculations.Robin Watts
The code attempts to spot cases where a pattern tile is so large that only 1 repeat is visible. Due to rounding errors, this test could sometimes fail, and (on badly formed files) we'd attempt to allocate huge pixmaps. The fix is to allow for rounding errors.
2011-11-25Merge branch 'master' into contextRobin Watts
2011-11-25First cut at support for OCGs in mupdf (bug 692314)Robin Watts
When opening a file, create a pdf_ocg_descriptor that lists the OCGs in a file. Add a new function to allow us to set the configuration in use (currently just the default one). This sets the states of the OCGs as appropriate. When decoding the file respect the states of the OCGs. This results in Invite.pdf rendering correctly. There is more to be done in this area (with automatic setting of OCGs by language/zoom level etc), but this is a good start.
2011-11-24Fix *STUPID* error in recent clipping changes.Robin Watts
Once we've applied the clipping path, don't clip again on every subsequent path.
2011-11-15Fix clipping error.Robin Watts
When reverting the clip path handling, I made a mistake. We need to set up the clip before starting any local group to ensure correct nesting.
2011-11-15Fix clipping error.Robin Watts
When reverting the clip path handling, I made a mistake. We need to set up the clip before starting any local group to ensure correct nesting.
2011-11-15Merge branch 'master' into contextRobin Watts
Mostly redoing the xps_context to xps_document change and adding contexts to newly written code. Conflicts: apps/pdfapp.c apps/pdfapp.h apps/x11_main.c apps/xpsdraw.c draw/draw_device.c draw/draw_scale.c fitz/base_object.c fitz/fitz.h pdf/mupdf.h pdf/pdf_interpret.c pdf/pdf_outline.c pdf/pdf_page.c xps/muxps.h xps/xps_doc.c xps/xps_xml.c
2011-11-14mupdf clip path handling; revert commit 2f8acb0Robin Watts
In commit 2f8acb0, we tweaked mupdf's clip path handling so that clip paths were resolved as soon as the operator for them was called; this protected against subsequent changes to the path happening before something else was drawn ready for clipping. Unfortunately, various PDF files out there seem to rely on the fact that they can call the 'W' operator before fully defining the path, and that the region that will be clipped is given by the final path, not the one that was in place when the operator was called. We therefore revert back to the old behaviour.
2011-10-04Move to exception handling rather than error passing throughout.Robin Watts
This frees us from passing errors back everywhere, and hence enables us to pass results back as return values. Rather than having to explicitly check for errors everywhere and bubble them, we now allow exception handling to do the work for us; the downside to this is that we no longer emit as much debugging information as we did before (though this could be put back in). For now, the debugging information we have lost has been retained in comments with 'RJW:' at the start. This code needs fuller testing, but is being committed as a work in progress.