mupdf - MuPDF PDF reader and library

Age	Commit message (Collapse)	Author
2016-05-24	fz_pixmap revamp: add stride and make alpha optional	Robin Watts
	fz_pixmaps now have an explicit stride value. By default no change from before, but code all copes with extra gaps at the end of the line. The alpha data in fz_pixmaps is no longer compulsory. mudraw: use rgb not rgba (ppmraw), cmyk not cmyka (pkmraw). Update halftone code to not expect alpha plane. Update PNG writing to cope with alpha less input. Also hide repeated params within the png output context. ARM code needs updating.
2016-04-28	Fix JPX breakage caused during refactor.	Robin Watts
	I was using fz_compressed_image when I should have been using fz_pixmap_image.
2016-04-28	Refactor fz_image code cases.	Robin Watts
	Split compressed images (images based on a compressed buffer) and pixmap images (images based on a pixmap) out into separate subclasses.
2016-04-28	Tweak fz_image in preparation for things to come.	Robin Watts
	Move from ints to bits where possible.
2016-04-28	Partial image decode.	Robin Watts
	Update the core fz_get_pixmap_from_image code to allow fetching a subarea of a pixmap. We pass in the required subarea, together with the transformation matrix for the whole image. On return, we have a pixmap at least as big as was requested, and the transformation matrix is updated to map the supplied area to the correct place on the screen. The draw device is updated to use this as required. Everywhere else passes NULLs in, and so gets unchanged behaviour. The standard 'get_pixmap' function has been updated to decode just the required areas of the bitmaps. This means that banded rendering of pages will decode just the image subareas that are required for each band, limiting the memory use. The downside to this is that each band will redecode the image again to extract just the section we want. The image subareas are put into the fz_store in the same way as full images. Currently image areas in the store are only matched when they match exactly; subareas are not identified as being able to use existing images.
2016-03-01	Rename pdf_new_ref to pdf_add_object.	Tor Andersson

2016-02-29	Rename pdf_add_simple_font_res and friends.	Tor Andersson

2016-02-29	Remove pdf_res struct. Use pdf_obj indirect references directly.	Tor Andersson
	Fix refcounting bugs.
2016-02-29	Rename some functions.	Tor Andersson
	Remove void* typecasts.
2016-02-29	Add mutool create tool, and PDF font and image resource creation.	Michael Vrhel
	Initial framework for creating pdfs This adds a create option to mutool for us to use in working on the API for creating content as well as adding content to existing documents. mutool create: Get page sizes and add them Start the parsing of the contents.txt file which may have multiple page information. Add the pages at the proper sizes. Further work on mutool create_pdf Remove the calls that were being made to the pdf-write device. Clean up several issues with the reading of the page contents. Get the content streams for each page associated with the page->contents Temp. created a pdf_create_page_contents procedure. I will merge this with pdf_create_page as there is significant overlap. Next is to add in the font and image resources and indirect references. Include pdfcreate in build Merge pdf_create_page_contents and pdf_create_page Add support for images in pdfcreate This adds images to the pdf document using a function stolen from pdf-device (send_image). This was renamed pdf_add_image_res and added to pdf-image. Down the road, send-image will be removed. Prior to that, I need to work on making sure that multiple copies of the same image do not end up in the document. Code was also added to create the page resources to point to the proper image in the document. Next fonts will be added in a similar manner, then I will work on computing the md5 sums of image and fonts to ensure only one copy ends up in the document. Then pdf-write will be reworked to use the same code as opposed to its current list of md5 sums that are stored in a device structure. mutool pdfcreate: support for WinAnsiEncoded fonts Added support for very simple fonts (WinAnsiEncoding). Methods added in pdf-font.c. Added first_width and last_width to fz_font_s and stem_v to pdf_font_desc_s. Ran code through memento with simple test of 4 page document creation including an image and a font. Fixed several leaks as well as buffer corruption issues (main changes in pdfcreate). Thanks to Robin for the help with Memento in finding leaks. Added StemV to pdf names as it was needed for the font descriptor creation. Fix for pdf_write_document rename to pdf_save_document Add resource_ids to pdf document structure The purpose of this structure will be to allow the search and reuse of resources when we attempt to add new ones to the document. Fix name changes from recent updates pdf_create branch updated to work with recent changes in master Initial use of hash table for resources To avoid adding in the same resource this adds a resource_tables member to pdf_document. The resource_tables structure consists of multiple fz_hash_table entries, one for each resource type. When an attempt is made to search for an existing resource, the table will be initialized in a brute force search for existing resources. Currently this is only set up for the image resources and accessed through pdf_add_image_res. If a match is found, the reference object is returned. If no match is found NULL is returned and the ref object created in pdf_add_image_res is added into the hash table. In this case, a command line such as create -o output.pdf -f F0:font.ttf -i Im0:image.jpg -i Im1:image1.jpg \\ -i Im2:image.jpg contents.txt will avoid the insertion of two copies of image.jpg into the output PDF document. CID Identity-H Font added for handing ttf This adds a method for adding a ttf to a PDF as a CID font with Identity-H mapping and a ToUnicode entry that is created using FT_Get_Char_Index This takes much care in the creation of the ToUnicode CMap to ensure that the minimum number of entries are created in that we try to use beginbfrange as much as possible before using beginbfchar. The code makes sure to limit the number of entries in a group to 100 and to not cross first-byte boundaries for the CID values as described in the Adobe Technical note 5411. Add missing file pdf-resources.c pdf-resources.c was missing and should have been committed earlier. Added to windows project file. Not sure where else it needs to be added for the other platforms. Clean up names and spacing Make sure that the visible functions have the proper namespace (e.g. pdf_xxxx) Also make sure we have a blank line prior to comment. Be consistent with static function naming in pdf_resources.c pdfwrite make use of image resource fz_hash_table The pdfwrite device now shares the structure that stores the resource images for pdfcreate. With this fix, pdfwrite now avoids duplicating the writing of the same images that are shared across multiple pages. Add missing file pdf-resources.c Initial work toward having pdfwrite use Identity-H Type0 encoding for fonts Finish of CID type0 Identity-H font for pdfwrite This adds in the proper widths which may have been stored in the source font in the width table (parsed from the W entry in the pdf file) or if the free type structure has its own cmap then we can get the width from free type. Widths are restructured into format described in 5.6.3 of PDF spec. Fix issue from conflict merging and multiple define of structure Clean up warnings and make mutool create use simple font
2016-02-22	Remove pointless casts from void*.	Tor Andersson
	Extraneous explicit type casts can mask errors, especially if a function prototype or return value changes in the future.
2015-09-01	Default to invert_cmyk_jpeg for all formats other than PDF.	Tor Andersson

2015-03-24	Rework handling of PDF names for speed and memory.	Robin Watts
	Currently, every PDF name is allocated in a pdf_obj structure, and comparisons are done using strcmp. Given that we can predict most of the PDF names we'll use in a given file, this seems wasteful. The pdf_obj type is opaque outside the pdf-object.c file, so we can abuse it slightly without anyone outside knowing. We collect a sorted list of names used in PDF (resources/pdf/names.txt), and we add a utility (namedump) that preprocesses this into 2 header files. The first (include/mupdf/pdf/pdf-names-table.h, included as part of include/mupdf/pdf/object.h), defines a set of "PDF_NAME_xxxx" entries. These are pdf_obj 's that callers can use to mean "A PDF object that means literal name 'xxxx'" The second (source/pdf/pdf-name-impl.h) is a C array of names. We therefore update the code so that rather than passing "xxxx" to functions (such as pdf_dict_gets(...)) we now pass PDF_NAME_xxxx (to pdf_dict_get(...)). This is a fairly natural (if widespread) change. The pdf_dict_getp (and sibling) functions that take a path (e.g. "foo/bar/baz") are therefore supplemented with equivalents that take a list (pdf_dict_getl(... , PDF_NAME_foo, PDF_NAME_bar, PDF_NAME_baz, NULL)). The actual implementation of this relies on the fact that small pointer values are never valid values. For a given pdf_obj p, if NULL < (intptr_t)p < PDF_NAME__LIMIT then p is a literal entry in the name table. This enables us to do fast pointer compares and to skip expensive strcmps. Also, bring "null", "true" and "false" into the same style as PDF names. Rather than using full pdf_obj structures for null/true/false, use special pointer values just above the PDF_NAME_ table. This saves memory and makes comparisons easier.
2015-02-17	Add ctx parameter and remove embedded contexts for API regularity.	Tor Andersson
	Purge several embedded contexts: Remove embedded context in fz_output. Remove embedded context in fz_stream. Remove embedded context in fz_device. Remove fz_rebind_stream (since it is no longer necessary). Remove embedded context in svg_device. Remove embedded context in XML parser. Add ctx argument to fz_document functions. Remove embedded context in fz_document. Remove embedded context in pdf_document. Remove embedded context in pdf_obj. Make fz_page independent of fz_document in the interface. We shouldn't need to pass the document to all functions handling a page. If a page is tied to the source document, it's redundant; otherwise it's just pointless. Fix reference counting oddity in fz_new_image_from_pixmap.
2015-02-17	Rename fz_close_* and fz_free_* to fz_drop_*.	Tor Andersson
	Rename fz_close to fz_drop_stream. Rename fz_close_archive to fz_drop_archive. Rename fz_close_output to fz_drop_output. Rename fz_free_* to fz_drop_. Rename pdf_free_ to pdf_drop_. Rename xps_free_ to xps_drop_*.
2014-05-27	Fix 693517: Support /SMask/Matte preblended images.	Tor Andersson

2014-05-06	Fix 694909: revert "Force colorspaces to match with JPX images." and ...	Tor Andersson
	... instead convert a JPEG2000 used as a soft mask into grayscale. This is more robust than trusting the PDF specified colorspace over the internal JPX colorspace. The spec implies that in a colorspace conflict, the internal JPX colorspace should be used. The PDF colorspace may be a DeviceN or Separation colorspace. DeviceN and Separation colorspaces are not valid destination colorspaces, so we may not always be able to convert the internal JPX colorspace into the PDF specified colorspace. Converting from the internal colorspace into grayscale is more robust, and solves the issue that the original commit was intended to fix.
2014-03-18	Fix operator buffering of inline images.	Robin Watts
	Previously pdf_process buffer did not understand inline images. In order to make this work without needlessly duplicating complex code from within pdf-op-run, the parsing of inline images has been moved to happen in pdf-interpret.c. When the op_table entry for BI is called it now expects the inline image to be in csi->img and the dictionary object to be in csi->obj. To make this work, we have had to improve the handling of inline images in general. While non-inline images have been loaded and held in memory in their compressed form and only decoded when required, until now we have always loaded and decoded inline images immediately. This has been due to the difficulty in knowing how many bytes of data to read from the stream - we know the length of the stream once uncompressed, but relating this to the compressed length is hard. To cure this we introduce a new type of filter stream, a 'leecher'. We insert a leecher stream before we build the filters required to decode the image. We then read and discard the appropriate number of uncompressed bytes from the filters. This pulls the compressed data through the leecher stream, which stores it in an fz_buffer. Thus images are now always held in their compressed forms in memory. The pdf-op-run implementation is now trivial. The only real complexity in the pdf-op-buffer implementation is the need to ensure that the /Filter entry in the dictionary object matches the exact point at which we backstopped the decompression.
2014-01-16	fix memory leaks in pdf_load_jpx and fz_new_image_from_pixmap	Simon Bünzli
	fz_new_image_from_pixmap expects that the pixmap's colorspace has two references which is contrary to expectations. If it instead addrefs the pixmap's colorspace, the only caller pdf_load_jpx can consistently drop the colorspace after passing it to fz_load_jpx. Also, if the contract is that whatever is passed into fz_new_image_from_pixmap belongs to the new image, then the pixmap also has to be dropped on error so that it isn't leaked.
2014-01-06	fix various MSVC warnings	Simon Bünzli
	Some warnings we'd like to enable for MuPDF and still be able to compile it with warnings as errors using MSVC (2008 to 2013): * C4115: 'timeval' : named type definition in parentheses * C4204: nonstandard extension used : non-constant aggregate initializer * C4295: 'hex' : array is too small to include a terminating null character * C4389: '==' : signed/unsigned mismatch * C4702: unreachable code * C4706: assignment within conditional expression Also, globally disable C4701 which is frequently caused by MSVC not being able to correctly figure out fz_try/fz_catch code flow. And don't define isnan for VS2013 and later where that's no longer needed.
2014-01-02	Cull code unused as a result of the "tolerate inline images..." fix.	Robin Watts
	Remove code that's not used any more as a result of the previous fix, plus some code that was unused anyway.
2013-09-13	Fix various compile warnings spotted by the cluster.	Robin Watts

2013-09-10	correctly set indexed colors in pdf_set_color	Simon Bünzli
	Required for 1879_-_Indexed_colors_wrongly_converted.pdf Also, removing broken code in the same place (where mat->v[] is overwritten right after being set in the Lab* case).
2013-08-28	fix memory leaks	Simon Bünzli
	* If fz_alpha_from_gray throws in fz_render_t3_glyph, then glyph is leaked. * If fz_new_image throws in pdf_load_image_imp, then colorspace and mask are leaked. * pdf_copy_pattern_gstate overwrites font and softmask without dropping them first.
2013-06-25	Rid the world of "pdf_document *xref".	Robin Watts
	For historical reasons lots of the code uses "xref" when talking about a pdf document. Now pdf_xref is a separate type this has become confusing, so replace 'xref' with 'doc' for clarity.
2013-06-20	Rearrange source files.	Tor Andersson