path: root/pdf/mupdf.h
Age  Commit message  (Author)
2013-06-18  Move header files into separate include directory.  (Tor Andersson)
2013-06-09  Remove fz_interactive API in favour of direct use of pdf API  (Paul Gardiner)
2013-04-11  Convert UTF-8 passwords to correct encoding.  (Tor Andersson)
PDFDocEncoding for crypt revisions <= 4, UTF-8 for newer.
2013-02-22  Add fz_get_annot_type  (Paul Gardiner)
2013-02-20  Bug 693639: some convenience functions.  (Tor Andersson)
Added primarily for use by SumatraPDF. Thanks to zeniko.
2013-02-06  Change to pass structures by reference rather than value.  (Robin Watts)
This is faster on ARM in particular. The primary changes involve fz_matrix, fz_rect and fz_bbox. Rather than passing 'fz_rect r' into a function, we now consistently pass 'const fz_rect *r'. Where a rect is passed in and modified, we miss the 'const' off. Where possible, we return the pointer to the modified structure to allow 'chaining' of expressions. The basic upshot of this work is that we do far fewer copies of rectangle/matrix structures, and all the copies we do are explicit. This has opened the way to other optimisations, also performed in this commit. Rather than using expressions like: fz_concat(fz_scale(sx, sy), fz_translate(tx, ty)) we now have fz_pre_{scale,translate,rotate} functions. These can be implemented much more efficiently than doing the fully fledged matrix multiplication that fz_concat requires. We add fz_rect_{min,max} functions to return pointers to the min/max points of a rect. These can be used in transformations to directly manipulate values. With a little casting in the path transformation code we can avoid more needless copying. We rename fz_widget_bbox to the more consistent fz_bound_widget.
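To illustrate the by-reference style and the pre-multiplication shortcut described above, here is a minimal self-contained sketch. The `matrix` type and the names `mat_concat`, `mat_scale` and `mat_pre_scale` are hypothetical stand-ins, not the MuPDF functions themselves; the sketch only assumes a six-element affine matrix multiplied in row-vector order.

```c
/* Stand-in for a 2D affine matrix (hypothetical; not the MuPDF type). */
typedef struct { float a, b, c, d, e, f; } matrix;

/* Full multiplication: *dst = one * two. Returns dst so calls can be chained. */
matrix *mat_concat(matrix *dst, const matrix *one, const matrix *two)
{
    matrix r; /* temporary, so dst may alias one or two */
    r.a = one->a * two->a + one->b * two->c;
    r.b = one->a * two->b + one->b * two->d;
    r.c = one->c * two->a + one->d * two->c;
    r.d = one->c * two->b + one->d * two->d;
    r.e = one->e * two->a + one->f * two->c + two->e;
    r.f = one->e * two->b + one->f * two->d + two->f;
    *dst = r;
    return dst;
}

/* Fill m with a pure scaling matrix and return it. */
matrix *mat_scale(matrix *m, float sx, float sy)
{
    m->a = sx; m->b = 0; m->c = 0; m->d = sy; m->e = 0; m->f = 0;
    return m;
}

/* Same result as concatenating scale(sx, sy) with m, but with only
 * four multiplies and no temporary matrix. */
matrix *mat_pre_scale(matrix *m, float sx, float sy)
{
    m->a *= sx; m->b *= sx;
    m->c *= sy; m->d *= sy;
    return m;
}
```

Because each function returns the pointer it modified, a call such as `mat_concat(&full, mat_scale(&s, 4, 5), &m)` can be written as one expression while every copy stays explicit.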
2013-01-31  Add support for annotation creation  (Paul Gardiner)
2013-01-30  Parts of Robin's PDF editing/page creation commit useful for annotations  (Paul Gardiner)
2013-01-30  Always pass value structs (rect, matrix, etc) as values not by pointer.  (Tor Andersson)
2013-01-04  Bug 693503: Fix stack overflows due to infinite recursion.  (Robin Watts)
If a colorspace refers to itself as a base, we can get an infinite recursion and hence stack overflow. Thanks to zeniko for pointing out that this occurs in embedded CMAPs and stitching functions. Also solved here. To avoid having to keep a long list of the objects we've traversed through, extend the pdf_dict_mark functions to work on all pdf objects, and hence rename them as pdf_obj_mark etc. Thanks to zeniko again for feedback on this way of working. Problem found in a test file, 3882.pdf.SIGSEGV.99.3204 supplied by Mateusz "j00ru" Jurczyk and Gynvael Coldwind of the Google Security Team using Address Sanitizer. Many thanks!
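The mark-while-traversing idea can be sketched as follows. This is an illustration only: the `object` struct and `chain_depth` are hypothetical and much simpler than the real pdf_obj_mark machinery, but they show how a per-object flag detects a self-referential chain without keeping a list of visited objects.

```c
#include <stddef.h>

/* Hypothetical object with a 'marked' flag, standing in for a PDF object. */
typedef struct object {
    struct object *base; /* e.g. the base colorspace; NULL if none */
    int marked;          /* set while the object is being traversed */
} object;

/* Returns the length of the base chain, or -1 if the chain loops back
 * onto an object that is already being traversed. */
int chain_depth(object *obj)
{
    int depth;
    if (obj == NULL)
        return 0;
    if (obj->marked)
        return -1;       /* already on the current path: a cycle */
    obj->marked = 1;
    depth = chain_depth(obj->base);
    obj->marked = 0;     /* always unmark on the way out */
    return depth < 0 ? -1 : depth + 1;
}
```

Unmarking on every exit path matters: a flag left set would make a later, legitimate traversal of the same object look like a cycle.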
2012-10-25  Support separate rendering of the main page contents and the annotations  (Paul Gardiner)
2012-10-17  First steps towards supporting transitions.  (Robin Watts)
Only Fade, Wipe and Blinds supported so far. Hit 'p' in the viewer to go into 'presentation' mode. Page swaps then transition from page to page. Pages auto advance until key or mouse is used.
2012-10-08  Bug 693350: Add some dreaded 'const's for string keys.  (Robin Watts)
On the whole we avoid using const within MuPDF, but bug 693350 highlights cases where this can cause a problem with C++. In C, if you do: foo("bar"); then "bar" has type char *. In C++, if you do foo("bar"); then "bar" has type const char *. This means that any calls to the MuPDF library from C++ that take strings give warnings. The fix is simple, so it seems to be worthwhile adding a few consts. None of our internal data structures are affected in any way by this change. Thanks to Franz Fellner for pointing out this issue.
2012-09-06  Add pdf_dict_puts_drop function.  (Robin Watts)
Does the same as pdf_dict_puts, but guarantees to always drop the value passed in (even if the function throws an error). This allows calling code to have a much simpler life in some cases. Update pdf_write to make use of this. Also, fix pdf_dict_puts so it doesn't leak the key object that it creates in the event of an error while growing the dictionary.
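A rough sketch of the "always drop" contract follows. Everything here is hypothetical: `value`, `dict`, `dict_put` and `dict_put_drop` are toy stand-ins, and a return code stands in for MuPDF's thrown errors. The point is only that the caller's reference is released on both the success and the failure path.

```c
#include <stdlib.h>

typedef struct { int refs; } value;             /* hypothetical ref-counted value */
typedef struct { value *slot; int full; } dict; /* one-slot stand-in for a dictionary */

value *value_new(void)
{
    value *v = malloc(sizeof *v);
    if (v)
        v->refs = 1;
    return v;
}

value *value_keep(value *v) { v->refs++; return v; }
void value_drop(value *v) { if (v && --v->refs == 0) free(v); }

/* Takes its own reference to val. Returns 0 on success, -1 on failure
 * (the failure stands in for an error while growing the dictionary). */
int dict_put(dict *d, value *val)
{
    if (d->full)
        return -1;
    d->slot = value_keep(val);
    d->full = 1;
    return 0;
}

/* Same as dict_put, but the caller's reference is dropped on every path. */
int dict_put_drop(dict *d, value *val)
{
    int err = dict_put(d, val);
    value_drop(val);
    return err;
}
```

With this shape, calling code can write `dict_put_drop(&d, value_new())` style sequences without a separate cleanup branch for the error case.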
2012-09-04  Merge branch 'master' into forms  (Paul Gardiner)
Conflicts: pdf/pdf_xref_aux.c
2012-08-29  Merge branch 'master' into forms  (Paul Gardiner)
Conflicts: cbz/mucbz.c pdf/pdf_parse.c pdf/pdf_form.c xps/xps_zip.c
2012-08-28  Add fz_open_document_with_stream function.  (Tor Andersson)
Use a "magic" string for filetype detection: filename or mime-type.
2012-08-23  Rename fz_new_name to pdf_new_name.  (Robin Watts)
Should have been pdf_new_name ever since the pre 1.0 rename, but evidently we missed it.
2012-08-01  Merge branch 'master' into forms  (Paul Gardiner)
Conflicts: pdf/mupdf-internal.h pdf/pdf_font.c
2012-07-18  Update pdf_to_utf8 to handle either a stream or a string  (Paul Gardiner)
Also change first argument from fz_context to pdf_document in each of pdf_to_utf8, pdf_to_utf8_name, pdf_to_ucs2 and pdf_to_ucs2_name
2012-07-06  Remove debugging functions for release builds.  (Sebastian Rasmussen)
2012-06-20  Add better mechanism for enumerating annotation rectangles.  (Robin Watts)
Rather than having a dedicated call to enumerate the rectangles for the annotations on a page, add an interface for enumerating annotations with accessor functions. Currently the only accessor function is the one to get the annotation rectangle. Use this new scheme in place of fz_bound_annots within mudraw. Also use this scheme to set the caret cursor in the viewer when over a data field.
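The enumerate-plus-accessor pattern might look like the sketch below. The types and the names `first_annot`, `next_annot`, `bound_annot` and `annot_at` are assumptions made for illustration, not the actual MuPDF interface; the hit test mirrors the viewer use case of checking whether the pointer is over a field.

```c
#include <stddef.h>

typedef struct { float x0, y0, x1, y1; } rect;

/* Hypothetical annotation list node. */
typedef struct annot {
    rect bounds;
    struct annot *next;
} annot;

typedef struct { annot *annots; } page;

/* Enumeration interface: callers never touch the list layout directly. */
annot *first_annot(page *p) { return p->annots; }
annot *next_annot(annot *a) { return a->next; }

/* Accessor: copies the annotation rectangle into *out and returns out. */
rect *bound_annot(annot *a, rect *out) { *out = a->bounds; return out; }

/* Returns the first annotation whose rectangle contains (x, y), or NULL. */
annot *annot_at(page *p, float x, float y)
{
    annot *a;
    rect r;
    for (a = first_annot(p); a != NULL; a = next_annot(a)) {
        bound_annot(a, &r);
        if (x >= r.x0 && x < r.x1 && y >= r.y0 && y < r.y1)
            return a;
    }
    return NULL;
}
```

Further accessors can be added later without changing the enumeration calls, which is the advantage over a single dedicated callback.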
2012-06-14  Add -j flag to mudraw; create simple mujstest scripts automatically.  (Robin Watts)
We add a new fz_bound_annots function (and associated pdf_bound_annots function) that calls a given callback with the page rectangle of the annotations on a given page. This is marked as being a 'temporary' function, so we can remove it/change it in future if required. It seems likely that we'll want to have some sort of 'iterate over annotations' function eventually, and this does the job for now. Add a -j flag to mudraw that outputs a simple mujstest script. For each page with annotations, the script jumps to that page, then for each annotation on the page, it sets some text to be entered, and clicks the annotation. In the case of text fields, this will cause the text to be entered into that text field; in the case of buttons it will execute the button. At the end of each page with annotations, the script is told to snapshot the page. These test scripts are not designed to be full tests, but they do at least provide an easy way for us to generate scripts where every field in our test suite is interacted with.
2012-06-13  Remove unnecessary function and improve naming  (Paul Gardiner)
2012-06-12  A few general utility functions added for the sake of the forms work  (Paul Gardiner)
2012-05-31  Add linearization to pdf_write function.  (Robin Watts)
Extend mupdfclean to have a new -l flag that writes the file linearized. This should still be considered experimental. When writing a pdf file, analyse object use, flatten resource use, reorder the objects, generate a hintstream and output with linearisation parameters. This is enough for Acrobat to accept the file as being optimised for Fast Web View. We ought to add more tables to the hintstream in some cases, but I doubt anyone actually uses it, the spec is so badly written. Certainly Acrobat accepts the file as being optimised for 'Fast Web View'. Update fz_dict_put to allow for us adding a reference to the dictionary that is the sole owner of that reference already (i.e. don't drop then keep something that has a reference count of just 1). Update pdf_load_image_stream to use the stm_buf from the xref if there is one. Update pdf_close_document to discard any stm_bufs it may be holding. Update fz_dict_put to be pdf_dict_put - this was missed in a renaming ages ago and has been inconsistent since.
2012-05-23  Bring xref object and stream mutation functions back from the dead.  (Tor Andersson)
Needs more work to use the linked list of free xref slots.
2012-05-11  Split part of fz_document interface for pdf_document into separate file.  (Tor Andersson)
Make a separate constructor function that does not link in the interpreter, so we can save space in the mubusy binary by not including the font and cmap resources.
2012-04-30  Simple mupdfposter app  (Robin Watts)
Divides large format pdfs into a new pdf with multiple pages, that tile the original PDF.
2012-04-28  Move guts of pdfclean into new pdf_write function.  (Robin Watts)
Expose pdf_write function through the document interface.
2012-03-13  Rename some functions and accessors to be more consistent.  (Tor Andersson)
Debug printing functions: debug -> print. Accessors: get noun attribute -> noun attribute. Find -> lookup when the returned value is not reference counted. pixmap_with_rect -> pixmap_with_bbox. We are reserving the word "find" to mean lookups that give ownership of objects to the caller. Lookup is used in other places where the ownership is not transferred, or simple values are returned. The rename is done by the sed script in scripts/rename3.sed
2012-03-07  Splitting tweaks.  (Tor Andersson)
2012-03-06  Split fitz.h/mupdf.h into internal/external headers.  (Robin Watts)
Attempt to separate public API from internal functions.
2012-02-26  Move fz_obj to be pdf_obj.  (Robin Watts)
Currently, we are in the slightly strange position of having the PDF specific object types as part of fitz. Here we pull them out into the pdf layer instead. This has been made possible by the recent changes to make the store no longer be tied to having fz_obj's as keys. Most of this work is a simple huge rename; to help customers who may have code that uses such functions we have provided a sed script to do the renaming; scripts/rename2.sed. Various other small tweaks are required; the store used to have some debugging code that still required knowledge of fz_obj types - we extract that into a nicer 'type' based function pointer. Also, the type 3 font handling used to have an fz_obj pointer for type 3 resources, and therefore needed to know how to free this; this has become a void * with a function to free it.
2012-02-26  Continued documentation improvements.  (Sebastian Rasmussen)
More changes still to come.
2012-02-26  Document the most commonly used interface functions.  (Sebastian Rasmussen)
2012-02-25  Revamp pdf lexing code  (Robin Watts)
A huge amount (20%+ on some files) of our runtime is spent in fz_atof. A survey of results on the net suggests we will get much better speed by writing our own atof. Part of the job of doing this involves parsing the string to identify the component parts of the number - ludicrously, we are already doing this as part of the lexing process, so it would make sense to do the atoi/atof as part of this process. In order to do this, we need somewhere to store the lexed results; rather than add a float * and an int * to every single pdf_lex call, we generalise the calls to pass a pdf_lexbuf * pointer instead of separate buffer/max/string length pointers. This should help us overall.
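As a sketch of converting a number while it is being lexed, the function below classifies the token and computes its value in a single pass, storing the result in a small buffer struct. The names `token`, `lexbuf` and `lex_number` are illustrative assumptions, not the real pdf_lexbuf layout, and the sketch ignores overflow and exponents.

```c
typedef enum { TOK_ERROR, TOK_INT, TOK_REAL } token;

/* Hypothetical lex buffer: holds the converted value of the last token. */
typedef struct {
    int i;   /* valid when the token is TOK_INT */
    float f; /* valid when the token is TOK_REAL */
} lexbuf;

/* Parses [+-]digits[.digits] from a NUL-terminated string in one pass.
 * No overflow handling; this is a sketch, not production code. */
token lex_number(const char *s, lexbuf *buf)
{
    int neg = 0, digits = 0, ipart = 0;
    double fnum = 0, fdiv = 1, val;

    if (*s == '+' || *s == '-')
        neg = (*s++ == '-');
    while (*s >= '0' && *s <= '9') {
        ipart = ipart * 10 + (*s++ - '0');
        digits++;
    }
    if (*s != '.') {
        if (digits == 0 || *s != '\0')
            return TOK_ERROR;
        buf->i = neg ? -ipart : ipart;
        return TOK_INT;
    }
    s++; /* skip the decimal point */
    while (*s >= '0' && *s <= '9') {
        fnum = fnum * 10 + (*s++ - '0');
        fdiv *= 10;
        digits++;
    }
    if (digits == 0 || *s != '\0')
        return TOK_ERROR;
    val = ipart + fnum / fdiv;
    buf->f = (float)(neg ? -val : val);
    return TOK_REAL;
}
```

Passing one buffer pointer keeps the call signature stable even if more per-token results are added later, which is the design choice the commit message describes.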
2012-02-25  Rework image handling for on demand decode  (Robin Watts)
Introduce a new 'fz_image' type; this type contains rudimentary information about images (such as native, size, colorspace etc) and a function to call to get a pixmap of that image (with a size hint). Instead of passing pixmaps through the device interface (and holding pixmaps in the display list) we now pass images instead. The rendering routines therefore call fz_image_to_pixmap to get pixmaps to render, and fz_pixmap_drop those afterwards. The file format handling routines therefore need to produce images rather than pixmaps; xps and cbz currently just wrap pixmaps as images. PDF is more involved. The stream handling routines in PDF have been altered so that they can recognise when the last stream entry in a filter dictionary is an image decoding filter. Rather than applying this filter, they read and store the parameters into a pdf_image_params structure, and stop decoding at that point. This allows us to read the compressed data for an image into memory as a block. We can then restart the image decode process later. pdf_images therefore consist of the compressed image data for images. When a pixmap is requested for such an image, the code checks to see if we have one (of an appropriate size), and if not, decodes it. The size hint is used to determine whether it is possible to subsample the image; currently this is only supported for JPEGs, but we could add generic subsampling code later. In order to handle caching the produced images, various changes have been made to the store and the underlying hash table. Previously the store was indexed purely by fz_obj keys; we don't have an fz_obj key any more, so have extended the store by adding a concept of a key 'type'. A key type is a pointer to a set of functions that keep/drop/compare and make a hashable key from a key pointer. 
We make a pdf_store.c file that contains functions to offer the existing fz_obj based functions, and add a new 'type' for keys (based on the fz_image handle, and the subsample factor) in the pdf_image.c file. While working on this, a problem became apparent in the existing store code; fz_obj objects had no protection on their reference counts, hence an interpreter thread could try to alter a ref count at the same time as a malloc caused an eviction from the store. This has been solved by using the alloc lock as protection. This in turn requires some tweaks to the code to make sure we don't try and keep/drop fz_obj's from the store code while the alloc lock is held. A side effect of this work is that when a hash table is created, we inform it what lock should be used to protect its innards (if any). If the alloc lock is used, the insert method knows to drop/retake it to allow it to safely expand the hash table. Callers to the hash functions have the responsibility of taking/dropping the appropriate lock, and ensuring that they cope with the possibility that insert might drop the alloc lock, causing race conditions.
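The key 'type' concept, a pointer to a set of keep/drop/compare/hash functions, could be sketched like this. All names (`key_type`, `image_key`, `store_lookup`) are hypothetical, and a linear scan stands in for the real hash table; locking is deliberately left out.

```c
#include <stddef.h>

/* A key 'type': the operations the store needs for one kind of key. */
typedef struct {
    void *(*keep)(void *key);
    void (*drop)(void *key);
    int (*cmp)(const void *a, const void *b); /* 0 when keys are equal */
    unsigned (*hash)(const void *key);
} key_type;

/* Hypothetical image key: which image, at which subsample factor. */
typedef struct {
    int refs;
    int image_id;
    int factor;
} image_key;

static void *image_key_keep(void *k) { ((image_key *)k)->refs++; return k; }
static void image_key_drop(void *k) { ((image_key *)k)->refs--; }

static int image_key_cmp(const void *a, const void *b)
{
    const image_key *x = a, *y = b;
    return !(x->image_id == y->image_id && x->factor == y->factor);
}

static unsigned image_key_hash(const void *k)
{
    const image_key *x = k;
    return (unsigned)x->image_id * 31u + (unsigned)x->factor;
}

static const key_type image_key_type = {
    image_key_keep, image_key_drop, image_key_cmp, image_key_hash
};

typedef struct {
    const key_type *type;
    void *key;
    void *val;
} store_item;

/* Linear scan standing in for the hash table lookup. */
void *store_lookup(store_item *items, size_t n, const key_type *type, const void *key)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (items[i].type == type && type->cmp(items[i].key, key) == 0)
            return items[i].val;
    return NULL;
}
```

Because the store only ever calls through the function table, a new kind of key can be added in its own source file without the store knowing its layout.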
2012-02-06  Pass context to cmap and font descriptor functions.  (Tor Andersson)
2012-02-03  Add document interface.  (Tor Andersson)
2012-02-01  Tweak to previous pdf_decode_cmap fix.  (Robin Watts)
More aesthetically pleasing version.
2012-01-31  Fix bug 692824: incorrect application of word space.  (Robin Watts)
Word space should only be applied when the codepoint is 32, and is read from a single byte encoding region. Ghostscript gets this wrong too.
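The rule can be expressed as a small predicate on the character code and the number of bytes it was read from. The function name and parameters below are illustrative assumptions, not the interpreter's actual code.

```c
/* Extra spacing to add after a glyph. Word spacing applies only to
 * code 32 when it was read as a single-byte code; character spacing
 * applies to every glyph. */
float extra_spacing(int code, int code_bytes, float char_space, float word_space)
{
    float extra = char_space;
    if (code == 32 && code_bytes == 1)
        extra += word_space;
    return extra;
}
```

For example, a two-byte code whose value happens to be 32 gets character spacing only.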
2012-01-27  Rename pdf_xref type to pdf_document.  (Tor Andersson)
2012-01-19  Transform link rectangles by the hidden page CTM.  (Tor Andersson)
2012-01-19  Remove confusing optional 'password' argument to pdf_open_xref.  (Tor Andersson)
Require that clients call pdf_needs_password/pdf_authenticate_password instead. For dumb clients, we still allow for decrypting a file with a blank password without calling those functions.
2012-01-19  Multi-threading support for MuPDF  (Robin Watts)
When we moved over to a context based system, we laid the foundation for a thread-safe mupdf. This commit should complete that process. Firstly, fz_clone_context is properly implemented so that it makes a new context, but shares certain sections (currently just the allocator, and the store). Secondly, we add locking (to parts of the code that have previously just had placeholder LOCK/UNLOCK comments). Functions to lock and unlock a mutex are added to the allocator structure; omit these (as is the case today) and no multithreading is (safely) possible. The context will refuse to clone if these are not provided. Finally we flesh out the LOCK/UNLOCK comments to be real calls of the functions - unfortunately this requires us to plumb fz_context into the fz_keep_storable function (and all the fz_keep_xxx functions that call it). This is the largest section of the patch. No changes expected to any test files.
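A minimal sketch of the cloning rule follows: the clone shares the store, and cloning is refused when no lock functions were supplied. The struct layouts and names (`lock_fns`, `context`, `clone_context`) are assumptions for illustration and do not reproduce the real fz_context; the counting "mutex" exists only so the calls can be observed without a threading library.

```c
#include <stdlib.h>

/* Hypothetical lock callbacks supplied by the application. */
typedef struct {
    void *user;
    void (*lock)(void *user);
    void (*unlock)(void *user);
} lock_fns;

typedef struct { int refs; } shared_store;

typedef struct {
    const lock_fns *locks; /* NULL when the caller supplied none */
    shared_store *store;   /* shared between a context and its clones */
} context;

/* Returns a new context sharing ctx's store, or NULL if cloning is unsafe
 * (no lock functions) or allocation fails. */
context *clone_context(const context *ctx)
{
    context *clone;
    if (!ctx->locks || !ctx->locks->lock || !ctx->locks->unlock)
        return NULL;
    clone = malloc(sizeof *clone);
    if (!clone)
        return NULL;
    clone->locks = ctx->locks;
    clone->store = ctx->store;
    /* The shared reference count is only touched while the lock is held. */
    ctx->locks->lock(ctx->locks->user);
    ctx->store->refs++;
    ctx->locks->unlock(ctx->locks->user);
    return clone;
}

/* Counting stand-ins for a real mutex, used only to observe the calls. */
typedef struct { int held; int taken; } demo_mutex;
void demo_lock(void *user) { demo_mutex *m = user; m->held++; m->taken++; }
void demo_unlock(void *user) { demo_mutex *m = user; m->held--; }
```

In a real program the two callbacks would wrap a platform mutex, with each thread working on its own cloned context.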
2012-01-18  Better handling of 'uncacheable' Type3 glyphs. Bug 692745.  (Robin Watts)
Some Type 3 fonts contain glyphs that rely on inheriting various aspects of the graphics state from their calling code. (i.e. a glyph might use d0, then fill an area without setting a color first). While the spec is vague on this point, we believe that technically it is invalid. Previously mupdf defaulted all elements of the graphics state back when beginning to draw the glyph. This does not match what Acrobat does though, so we change the approach taken. We now watch (by use of bits in the device flags word) for the use of parts of the graphics state before it is set. If such use is detected, then we note that the glyph is 'uncacheable' and render it direct. This seems to match Acrobat's behaviour.
2012-01-13  Avoid infinite loops with XObjects.  (Robin Watts)
Every xobject keeps a reference to the object from whence it came. This is marked/unmarked as it is executed. Thanks to Zeniko for spotting the potential problem.
2012-01-12  Use the same coordinate system for pdf and xps pages in the interface.  (Tor Andersson)
Move coordinate space tweaks into pdf_ and xps_run_page, and provide neutral pdf_ and xps_bound_page functions to return the page size as a zero-origined bounding box.
2012-01-10  Automatically load page tree when accessing a page/page count.  (Sebastian Rasmussen)