summaryrefslogtreecommitdiff
path: root/source/pdf/pdf-page.c
AgeCommit message (Collapse)Author
2017-06-08Plug leak of PDF page object.Sebastian Rasmussen
2017-04-27Typedef function pointers consistently.Tor Andersson
2017-04-27Include required system headers.Tor Andersson
2017-04-11Bug 697662: Support named actions to first/last/next/previous page.Sebastian Rasmussen
2017-03-23Introduce fz_new_derived_...Robin Watts
Instead of having fz_new_XXXX(ctx, type, ...) macros that call fz_new_XXXX_of_size etc, use fz_new_derived_... Clearer naming, and doesn't clash with fz_new_document_writer.
2017-03-22Update fz_new_page.Robin Watts
Move this into the same style as fz_new_document and fz_new_image.
2017-03-01Add page lookup cache for faster link destination lookups in outlines.Tor Andersson
Loading outlines wants to look up all link destinations, and doing the normal link destination lookups triggers loading all page objects used. This means we need to parse a lot of objects, which can be quite slow. We can load the page tree faster by only looking at intermediate page tree nodes. If we load the page tree and create a reverse lookup table for use when loading the outline, we can speed up the time to run 'mutool show pdfref17.pdf outline' from 900ms to 100ms.
2017-01-17Fix typos.Sebastian Rasmussen
2017-01-09Remove some dead code.Tor Andersson
2016-12-23Cope with NULL resource and concents arguments to pdf_add_page.Tor Andersson
2016-11-16pdf: Add x and y output parameters to pdf_lookup_anchor.Tor Andersson
2016-11-16pdf: Use '#page=N' for remote destination pages.Tor Andersson
As per Adobe's recommendation: https://helpx.adobe.com/acrobat/kb/link-html-pdf-page-acrobat.html
2016-11-16pdf: Add 'compressed/raw' flag to pdf_add_stream.Tor Andersson
Also expose the argument to JS and JNI.
2016-11-14Fix return in void function.Tor Andersson
2016-11-14Add/fix page coordinates to link targets.Tor Andersson
Correctly transformed target coordinates for PDF. Target coordinates for EPUB and HTML.
2016-11-14Add optional 'object' argument to pdf_add_stream.Tor Andersson
2016-11-02Fix 697284: The origin was incorrectly calculated for rotated pages.Tor Andersson
2016-10-28Clean up link destination handling.Tor Andersson
All link destinations should be URIs, and a document specific function can be called to resolve them to actual page numbers. Outlines have cached page numbers as well as string URIs.
2016-10-18Internal drop functions don't need to check for NULL.Sebastian Rasmussen
2016-10-18Avoid checking argument to fz_drop_*()/fz_free().Sebastian Rasmussen
As fz_drop_*()/fz_free() all must handle NULL.
2016-10-07Remove separate tmp/deleted/changed annotation lists.Tor Andersson
Use a flag in the pdf_annot struct instead. Don't pass pdf_document to annotation edit functions.
2016-07-08Separate close and drop functionality for devices and writers.Tor Andersson
Closing a device or writer may throw exceptions, but much of the foreign language bindings (JNI and JS) depend on drop to never throw an exception (exceptions in finalizers are bad).
2016-07-08Slim pdf_xobject struct: remove cached resources field.Tor Andersson
The "contents" field is the same as the "obj" field, so can also be removed.
2016-07-08Slim pdf_annot struct: remove cached page_ctm field.Tor Andersson
2016-07-06Start slimming pdf_page.Tor Andersson
We want to turn pdf_page into a thin wrapper around a pdf_obj, so that any updates to the underlying PDF objects will be reflected without having to reload the pdf_page.
2016-07-06Fix garbage collection and page grafting for indirect reference chains.Tor Andersson
The mark & sweep pass of garbage collection, and resolving indirect objects when grafting objects was following the full chain of indirect references. In the unusual case where a numbered object is itself only an indirect reference to another object, this intermediate numbered object would be missed both when marking for garbage collection, and when copying objects for grafting. Add a function to resolve only one step for these two uses. The following is an example of a file that would break during garbage collection if we follow full indirect reference chains: %PDF-1.3 1 0 obj <</Type/Catalog /Foo[2 0 R 3 0 R]>> endobj 2 0 obj 4 0 R endobj 3 0 obj 5 0 R endobj 4 0 obj <</Length 1>> stream A endstream endobj 5 0 obj <</Length 1>> stream B endstream endobj
2016-07-06pdf: Flatten inheritable page properties when copying pages.Tor Andersson
Affects pdfclean, pdfmerge, and pdfposter.
2016-05-06Fix pdf_delete_page_range.Robin Watts
And improve the header file commenting.
2016-04-27Fix 696649: remove fz_rethrow_message calls.Tor Andersson
2016-04-26Change order of arguments to pdf_add_page etc.Tor Andersson
Resources are defined before they are used; so it's only logical to have the resource dictionary before the content buffer in the argument list.
2016-04-12Fix some warnings.Robin Watts
Remove some bonkers conditions arising (presumably) as a result of search and replace.
2016-03-02js: Fix reference counting errors.Tor Andersson
2016-03-01Don't use pdf_page struct when creating pages.Tor Andersson
2016-03-01Rename pdf_new_ref to pdf_add_object.Tor Andersson
2016-02-29Pass mediabox to pdf_create_page by const pointer, and pass resources.Tor Andersson
2016-02-29Change order of arguments to pdf_create_page.Tor Andersson
2016-02-29Add mutool create tool, and PDF font and image resource creation.Michael Vrhel
Initial framework for creating pdfs This adds a create option to mutool for us to use in working on the API for creating content as well as adding content to existing documents. mutool create: Get page sizes and add them Start the parsing of the contents.txt file which may have multiple page information. Add the pages at the proper sizes. Further work on mutool create_pdf Remove the calls that were being made to the pdf-write device. Clean up several issues with the reading of the page contents. Get the content streams for each page associated with the page->contents Temp. created a pdf_create_page_contents procedure. I will merge this with pdf_create_page as there is significant overlap. Next is to add in the font and image resources and indirect references. Include pdfcreate in build Merge pdf_create_page_contents and pdf_create_page Add support for images in pdfcreate This adds images to the pdf document using a function stolen from pdf-device (send_image). This was renamed pdf_add_image_res and added to pdf-image. Down the road, send-image will be removed. Prior to that, I need to work on making sure that multiple copies of the same image do not end up in the document. Code was also added to create the page resources to point to the proper image in the document. Next fonts will be added in a similar manner, then I will work on computing the md5 sums of image and fonts to ensure only one copy ends up in the document. Then pdf-write will be reworked to use the same code as opposed to its current list of md5 sums that are stored in a device structure. mutool pdfcreate: support for WinAnsiEncoded fonts Added support for very simple fonts (WinAnsiEncoding). Methods added in pdf-font.c. Added first_width and last_width to fz_font_s and stem_v to pdf_font_desc_s. Ran code through memento with simple test of 4 page document creation including an image and a font. Fixed several leaks as well as buffer corruption issues (main changes in pdfcreate). Thanks to Robin for the help with Memento in finding leaks. Added StemV to pdf names as it was needed for the font descriptor creation. Fix for pdf_write_document rename to pdf_save_document Add resource_ids to pdf document structure The purpose of this structure will be to allow the search and reuse of resources when we attempt to add new ones to the document. Fix name changes from recent updates pdf_create branch updated to work with recent changes in master Initial use of hash table for resources To avoid adding in the same resource this adds a resource_tables member to pdf_document. The resource_tables structure consists of multiple fz_hash_table entries, one for each resource type. When an attempt is made to search for an existing resource, the table will be initialized in a brute force search for existing resources. Currently this is only set up for the image resources and accessed through pdf_add_image_res. If a match is found, the reference object is returned. If no match is found NULL is returned and the ref object created in pdf_add_image_res is added into the hash table. In this case, a command line such as create -o output.pdf -f F0:font.ttf -i Im0:image.jpg -i Im1:image1.jpg \\ -i Im2:image.jpg contents.txt will avoid the insertion of two copies of image.jpg into the output PDF document. CID Identity-H Font added for handing ttf This adds a method for adding a ttf to a PDF as a CID font with Identity-H mapping and a ToUnicode entry that is created using FT_Get_Char_Index This takes much care in the creation of the ToUnicode CMap to ensure that the minimum number of entries are created in that we try to use beginbfrange as much as possible before using beginbfchar. The code makes sure to limit the number of entries in a group to 100 and to not cross first-byte boundaries for the CID values as described in the Adobe Technical note 5411. Add missing file pdf-resources.c pdf-resources.c was missing and should have been committed earlier. Added to windows project file. Not sure where else it needs to be added for the other platforms. Clean up names and spacing Make sure that the visible functions have the proper namespace (e.g. pdf_xxxx) Also make sure we have a blank line prior to comment. Be consistent with static function naming in pdf_resources.c pdfwrite make use of image resource fz_hash_table The pdfwrite device now shares the structure that stores the resource images for pdfcreate. With this fix, pdfwrite now avoids duplicating the writing of the same images that are shared across multiple pages. Add missing file pdf-resources.c Initial work toward having pdfwrite use Identity-H Type0 encoding for fonts Finish of CID type0 Identity-H font for pdfwrite This adds in the proper widths which may have been stored in the source font in the width table (parsed from the W entry in the pdf file) or if the free type structure has its own cmap then we can get the width from free type. Widths are restructured into format described in 5.6.3 of PDF spec. Fix issue from conflict merging and multiple define of structure Clean up warnings and make mutool create use simple font
2016-01-08pdf: Add function to look up the page for a named destination.Tor Andersson
2016-01-05Separate pdf_drop_annots (that drops lists) and fz_drop_annot.Tor Andersson
2016-01-05Remove fz_page argument from fz_annot function calls.Tor Andersson
2015-04-03Pages created with pdf_create_page were duff.Robin Watts
They hadn't been updated with recent changes. Extract the pdf page creation code from pdf_load_page into a new static function, pdf_new_page, and use that from both places.
2015-04-03Bug 694713: Avoid assert when using pdf_page_writeRobin Watts
When writing a pdf page, we pass page->contents to pdf_new_pdf_device. This object is assumed to be a dictionary (stream) that can be updated with the Length and stream contents once the page writing process has completed. When we are rewriting a pdf page however, this can go wrong; page->contents can be an array of objects. Not only this, in general it would be possible for several pages to share the same page contents (or maybe some of the elements of a page contents array). Updating one page should not update the others. We therefore update pdf_page_write to always create a new page->contents object and use that. Thanks to Michael Cadilhac for spotting the basic problem here.
2015-03-24Rework handling of PDF names for speed and memory.Robin Watts
Currently, every PDF name is allocated in a pdf_obj structure, and comparisons are done using strcmp. Given that we can predict most of the PDF names we'll use in a given file, this seems wasteful. The pdf_obj type is opaque outside the pdf-object.c file, so we can abuse it slightly without anyone outside knowing. We collect a sorted list of names used in PDF (resources/pdf/names.txt), and we add a utility (namedump) that preprocesses this into 2 header files. The first (include/mupdf/pdf/pdf-names-table.h, included as part of include/mupdf/pdf/object.h), defines a set of "PDF_NAME_xxxx" entries. These are pdf_obj *'s that callers can use to mean "A PDF object that means literal name 'xxxx'" The second (source/pdf/pdf-name-impl.h) is a C array of names. We therefore update the code so that rather than passing "xxxx" to functions (such as pdf_dict_gets(...)) we now pass PDF_NAME_xxxx (to pdf_dict_get(...)). This is a fairly natural (if widespread) change. The pdf_dict_getp (and sibling) functions that take a path (e.g. "foo/bar/baz") are therefore supplemented with equivalents that take a list (pdf_dict_getl(... , PDF_NAME_foo, PDF_NAME_bar, PDF_NAME_baz, NULL)). The actual implementation of this relies on the fact that small pointer values are never valid values. For a given pdf_obj *p, if NULL < (intptr_t)p < PDF_NAME__LIMIT then p is a literal entry in the name table. This enables us to do fast pointer compares and to skip expensive strcmps. Also, bring "null", "true" and "false" into the same style as PDF names. Rather than using full pdf_obj structures for null/true/false, use special pointer values just above the PDF_NAME_ table. This saves memory and makes comparisons easier.
2015-02-17Fix memory leak in fz_page.Tor Andersson
2015-02-17Add ctx parameter and remove embedded contexts for API regularity.Tor Andersson
Purge several embedded contexts: Remove embedded context in fz_output. Remove embedded context in fz_stream. Remove embedded context in fz_device. Remove fz_rebind_stream (since it is no longer necessary). Remove embedded context in svg_device. Remove embedded context in XML parser. Add ctx argument to fz_document functions. Remove embedded context in fz_document. Remove embedded context in pdf_document. Remove embedded context in pdf_obj. Make fz_page independent of fz_document in the interface. We shouldn't need to pass the document to all functions handling a page. If a page is tied to the source document, it's redundant; otherwise it's just pointless. Fix reference counting oddity in fz_new_image_from_pixmap.
2015-02-17Rename fz_close_* and fz_free_* to fz_drop_*.Tor Andersson
Rename fz_close to fz_drop_stream. Rename fz_close_archive to fz_drop_archive. Rename fz_close_output to fz_drop_output. Rename fz_free_* to fz_drop_*. Rename pdf_free_* to pdf_drop_*. Rename xps_free_* to xps_drop_*.
2014-10-28more liberally accept page trees with unexpected contentSimon Bünzli
pdf_lookup_page_loc_imp currently throws if any object in the page tree is neither a /Pages node nor a /Page leaf. This unnecessarily rejects slightly broken documents such as the ones from https://code.google.com/p/sumatrapdf/issues/detail?id=2582 and https://code.google.com/p/sumatrapdf/issues/detail?id=2608 . pdf_count_pages_before_kid currently wrongly throws if a /Pages node doesn't contain any kids and correctly states so (which even seems to be permitted by the PDF specification).
2014-07-18report more cases of blending use for pdf_pageSimon Bünzli
pdf_page::transparency is supposed to indicate whether a page uses PDF transparency features. The checks aren't complete, though, which is relevant for devices which require additional handling for transparency (such as SumatraPDF's gdiplus_device). See https://code.google.com/p/sumatrapdf/issues/detail?id=2107 and https://code.google.com/p/sumatrapdf/issues/detail?id=2540 for example documents.
2014-04-11Invalidate cached page count value when inserting and deleting pages.Tor Andersson
2014-02-10Bug 695021: Fix pdf_insert_page operation with empty page tree.Robin Watts
Patch from Thomas Fach-Pedersen to fix the operation of pdf_insert_page when called with an empty page tree. Many thanks! As noted in the code with a FIXME this currently throws an error. Also, cope with being told to add a page "at" INT_MAX as meaning to add it at the end of the document. Possibly this code should cope with a Root without a Pages entry, or a Pages without a Kids too, but we can fix this in future if it ever becomes a problem.