Add mutool create tool, and PDF font and image resource creation.

Initial framework for creating PDFs

This adds a create option to mutool that we can use while working on the API for creating content, as well as for adding content to existing documents.

mutool create: Get page sizes and add them

Start parsing the contents.txt file, which may contain information for multiple pages. Add the pages at the proper sizes.

Further work on mutool create_pdf

Remove the calls that were being made to the pdf-write device. Clean up several issues with reading the page contents. Get the content streams for each page and associate them with page->contents. Temporarily created a pdf_create_page_contents procedure; I will merge it with pdf_create_page, as there is significant overlap. Next is to add the font and image resources and indirect references.

Include pdfcreate in build

Merge pdf_create_page_contents and pdf_create_page

Add support for images in pdfcreate

This adds images to the PDF document using a function taken from pdf-device (send_image). It was renamed pdf_add_image_res and added to pdf-image. Down the road, send_image will be removed. Before that, I need to make sure that multiple copies of the same image do not end up in the document. Code was also added to create page resources that point to the proper image in the document. Fonts will be added next in a similar manner; then I will work on computing MD5 sums of images and fonts to ensure that only one copy ends up in the document. After that, pdf-write will be reworked to use the same code instead of its current list of MD5 sums stored in a device structure.

mutool pdfcreate: support for WinAnsiEncoded fonts

Added support for very simple fonts (WinAnsiEncoding). Methods added in pdf-font.c. Added first_width and last_width to fz_font_s and stem_v to pdf_font_desc_s. Ran the code through Memento with a simple test that creates a 4-page document including an image and a font. Fixed several leaks as well as buffer corruption issues (main changes in pdfcreate). Thanks to Robin for the help with Memento in finding leaks. Added StemV to the PDF names, as it was needed for creating the font descriptor.

Fix for pdf_write_document rename to pdf_save_document

Add resource_ids to pdf document structure

The purpose of this structure is to allow resources to be searched for and reused when we attempt to add new ones to the document.

Fix name changes from recent updates

pdf_create branch updated to work with recent changes in master.

Initial use of hash table for resources

To avoid adding the same resource twice, this adds a resource_tables member to pdf_document. The resource_tables structure consists of multiple fz_hash_table entries, one for each resource type. The first time a search for an existing resource is made, the table is initialized by a brute-force search of the document for existing resources. Currently this is only set up for image resources, accessed through pdf_add_image_res. If a match is found, the reference object is returned. If no match is found, NULL is returned and the ref object created in pdf_add_image_res is added to the hash table. In this case, a command line such as

    mutool create -o output.pdf -f F0:font.ttf -i Im0:image.jpg -i Im1:image1.jpg \
        -i Im2:image.jpg contents.txt

will avoid inserting two copies of image.jpg into the output PDF document.

CID Identity-H Font added for handling ttf

This adds a method for adding a TTF font to a PDF as a CID font with Identity-H mapping and a ToUnicode entry that is created using FT_Get_Char_Index. Care is taken to create the minimum number of entries in the ToUnicode CMap: beginbfrange is used as much as possible before falling back to beginbfchar. The code limits each group to 100 entries and does not let a group cross a first-byte boundary of the CID values, as described in Adobe Technical Note #5411.

Add missing file pdf-resources.c

pdf-resources.c was missing and should have been committed earlier. Added it to the Windows project file; not sure where else it needs to be added for the other platforms.

Clean up names and spacing

Make sure that the visible functions have the proper namespace (e.g. pdf_xxxx). Also make sure there is a blank line before each comment. Be consistent with static function naming in pdf-resources.c.

pdfwrite make use of image resource fz_hash_table

The pdfwrite device now shares the structure that stores the image resources for pdfcreate. With this fix, pdfwrite no longer writes duplicate copies of images that are shared across multiple pages.

Add missing file pdf-resources.c

Initial work toward having pdfwrite use Identity-H Type0 encoding for fonts

Finish CID Type0 Identity-H font for pdfwrite

This adds the proper widths, which may have been stored in the source font's width table (parsed from the W entry in the PDF file) or, if the FreeType face has its own cmap, can be obtained from FreeType. Widths are restructured into the format described in section 5.6.3 of the PDF spec.

Fix issue from conflict merging and multiple define of structure

Clean up warnings and make mutool create use simple font
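The ToUnicode compaction described in the "CID Identity-H Font added for handling ttf" entry can be sketched as below. This is a minimal standalone illustration, not the code from this commit: cmap_run, build_runs and write_cmap_body are hypothetical names, CIDs are assumed to be 2-byte values already sorted in ascending order, and Unicode values are limited to the BMP. Consecutive CIDs mapping to consecutive Unicode values are merged into one bfrange entry while the CID's first byte stays the same; single mappings go to bfchar, and each block holds at most 100 entries. As an added assumption beyond the commit message, a run is also kept from wrapping the last byte of the destination Unicode value past 0xFF, since the PDF spec does not allow that for bfrange.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical sketch: one ToUnicode entry covering CIDs lo..hi, mapped to
 * consecutive Unicode values starting at uni. lo == hi is a bfchar entry. */
typedef struct {
	unsigned short lo, hi, uni;
} cmap_run;

#define CMAP_MAX_GROUP 100 /* at most 100 entries per begin.../end... block */

/* Merge sorted (cid, uni) pairs into maximal runs. A run never crosses a
 * first-byte boundary of the 2-byte CID and never lets the last byte of the
 * Unicode destination wrap past 0xFF. Returns the number of runs written to
 * out, which must have room for n entries. */
static int
build_runs(const unsigned short *cid, const unsigned short *uni, int n, cmap_run *out)
{
	int count = 0, i = 0;

	while (i < n)
	{
		int j = i;
		while (j + 1 < n &&
			cid[j + 1] == cid[j] + 1 &&
			uni[j + 1] == uni[j] + 1 &&
			(cid[j + 1] >> 8) == (cid[i] >> 8) &&
			(uni[j + 1] >> 8) == (uni[i] >> 8))
			j++;
		out[count].lo = cid[i];
		out[count].hi = cid[j];
		out[count].uni = uni[i];
		count++;
		i = j + 1;
	}
	return count;
}

/* Write all bfrange entries, then all bfchar entries, splitting each kind
 * into blocks of at most CMAP_MAX_GROUP entries. Returns the block count. */
static int
write_cmap_body(FILE *fp, const cmap_run *runs, int nruns)
{
	int pass, i, blocks = 0;

	for (pass = 0; pass < 2; pass++)
	{
		int want_range = (pass == 0);
		int remaining = 0;

		/* Count the entries of the kind handled in this pass */
		for (i = 0; i < nruns; i++)
			if ((runs[i].lo != runs[i].hi) == want_range)
				remaining++;

		i = 0;
		while (remaining > 0)
		{
			int in_block = remaining < CMAP_MAX_GROUP ? remaining : CMAP_MAX_GROUP;

			fprintf(fp, "%d %s\n", in_block, want_range ? "beginbfrange" : "beginbfchar");
			while (in_block > 0)
			{
				const cmap_run *r;

				assert(i < nruns); /* remaining > 0, so another entry of this kind exists */
				r = &runs[i++];
				if ((r->lo != r->hi) != want_range)
					continue; /* entry of the other kind */
				if (want_range)
					fprintf(fp, "<%04X> <%04X> <%04X>\n",
						(unsigned)r->lo, (unsigned)r->hi, (unsigned)r->uni);
				else
					fprintf(fp, "<%04X> <%04X>\n", (unsigned)r->lo, (unsigned)r->uni);
				in_block--;
				remaining--;
			}
			fprintf(fp, "%s\n", want_range ? "endbfrange" : "endbfchar");
			blocks++;
		}
	}
	return blocks;
}
```

Only the body of the CMap is produced here; a complete ToUnicode stream also needs the surrounding CMap boilerplate (header, begincodespacerange section and trailer).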
author: Michael Vrhel <michael.vrhel@artifex.com> 2015-11-04 15:05:22 -0800
committer: Tor Andersson <tor.andersson@artifex.com> 2016-02-29 15:52:53 +0100
commit: 927d36d58bf0896c2ab8b470f926e1ed1a736561
tree: f7a0f4fbfeb9c7620cb46ea3c5bcf72ae785aab4 /source/pdf/pdf-resources.c
parent: bbe65a9910fd5fa6478985351a0fd2690d5a27f5
1 files changed, 295 insertions, 0 deletions
diff --git a/source/pdf/pdf-resources.c b/source/pdf/pdf-resources.c
new file mode 100644
index 00000000..31812b3c
--- /dev/null
+++ b/source/pdf/pdf-resources.c
@@ -0,0 +1,295 @@
+#include "mupdf/pdf.h"
+
+static void
+res_table_free(fz_context *ctx, pdf_res_table *table)
+{
+	int i, n;
+	pdf_res *res;
+
+	if (table == NULL)
+		return;
+	if (table->hash != NULL)
+	{
+		n = fz_hash_len(ctx, table->hash);
+		for (i = 0; i < n; i++)
+		{
+			void *v = fz_hash_get_val(ctx, table->hash, i);
+			if (v)
+			{
+				res = (pdf_res*)v;
+				pdf_drop_obj(ctx, res->obj);
+				fz_free(ctx, res);
+			}
+		}
+		fz_drop_hash(ctx, table->hash);
+	}
+	fz_free(ctx, table);
+}
+
+static void
+res_image_get_md5(fz_context *ctx, fz_image *image, unsigned char *digest)
+{
+	fz_pixmap *pixmap = NULL;
+	int n, size;
+	fz_buffer *buffer = NULL;
+	fz_md5 state;
+
+	fz_var(pixmap);
+	fz_var(buffer);
+
+	fz_try(ctx)
+	{
+		pixmap = fz_get_pixmap_from_image(ctx, image, 0, 0);
+		n = (pixmap->n == 1 ? 1 : pixmap->n - 1);
+		size = pixmap->w * pixmap->h * n;
+		buffer = fz_new_buffer(ctx, size);
+		buffer->len = size;
+		if (pixmap->n == 1)
+		{
+			memcpy(buffer->data, pixmap->samples, size);
+		}
+		else
+		{
+			/* Need to remove the alpha plane */
+			unsigned char *d = buffer->data;
+			unsigned char *s = pixmap->samples;
+			int mod = n;
+			while (size--)
+			{
+				*d++ = *s++;
+				mod--;
+				if (mod == 0)
+					s++, mod = n;
+			}
+		}
+		fz_md5_init(&state);
+		fz_md5_update(&state, buffer->data, buffer->len);
+		fz_md5_final(&state, digest);
+	}
+	fz_always(ctx)
+	{
+		fz_drop_pixmap(ctx, pixmap);
+		fz_drop_buffer(ctx, buffer);
+	}
+	fz_catch(ctx)
+	{
+		fz_rethrow_message(ctx, "image md5 calculation failed");
+	}
+}
+
+/* Image specific methods */
+static void
+res_image_init(fz_context *ctx, pdf_document *doc, pdf_res_table *table)
+{
+	int len, k;
+	pdf_obj *obj = NULL;
+	pdf_obj *type;
+	pdf_res *res = NULL;
+	fz_image *image = NULL;
+	unsigned char digest[16];
+	int num = 0;
+
+	fz_var(obj);
+	fz_var(image);
+	fz_var(res);
+	fz_var(num);
+
+	fz_try(ctx)
+	{
+		table->hash = fz_new_hash_table(ctx, 4096, 16, -1);
+		len = pdf_count_objects(ctx, doc);
+		for (k = 1; k < len; k++)
+		{
+			obj = pdf_load_object(ctx, doc, k, 0);
+			type = pdf_dict_get(ctx, obj, PDF_NAME_Subtype);
+			if (pdf_name_eq(ctx, type, PDF_NAME_Image))
+			{
+				image = pdf_load_image(ctx, doc, obj);
+				res_image_get_md5(ctx, image, digest);
+				fz_drop_image(ctx, image);
+				image = NULL;
+
+				/* Don't allow overwrites. Number the resources for pdfwrite */
+				if (fz_hash_find(ctx, table->hash, (void *)digest) == NULL)
+				{
+					res = fz_malloc(ctx, sizeof(pdf_res));
+					res->num = num;
+					res->obj = obj;
+					/* Store the pdf_res wrapper, not the bare object: the
+					 * search functions and res_table_free expect pdf_res */
+					fz_hash_insert(ctx, table->hash, (void *)digest, res);
+					/* The hash table now owns res and its reference to obj */
+					res = NULL;
+					obj = NULL;
+					num = num + 1;
+				}
+			}
+			/* Drop non-image objects and duplicate images */
+			pdf_drop_obj(ctx, obj);
+			obj = NULL;
+		}
+	}
+	fz_always(ctx)
+	{
+		table->count = num;
+		fz_drop_image(ctx, image);
+		pdf_drop_obj(ctx, obj);
+		fz_free(ctx, res);
+	}
+	fz_catch(ctx)
+	{
+		/* The table belongs to doc->resources and is released by
+		 * pdf_resource_table_free, so it must not be freed here. Entries
+		 * already inserted are owned by the hash table. */
+		fz_rethrow_message(ctx, "image resources table failed to initialize");
+	}
+}
+
+static void*
+res_image_search(fz_context *ctx, pdf_document *doc, pdf_res_table *table, void *item,
+	void *md5)
+{
+	unsigned char digest[16];
+
+	fz_image *image = (fz_image*)item;
+	fz_hash_table *hash = table->hash;
+	pdf_res *res;
+
+	if (hash == NULL)
+	{
+		res_image_init(ctx, doc, table);
+		hash = table->hash;
+	}
+
+	/* Create md5 and see if we have the item in our table */
+	res_image_get_md5(ctx, image, digest);
+	res = fz_hash_find(ctx, hash, (void*)digest);
+
+	/* Return the digest value so that we can avoid having to recompute it when
+	 * we come back to add the new resource reference */
+	if (res == NULL)
+		memcpy(md5, digest, 16);
+	else
+		pdf_keep_obj(ctx, res->obj);
+	return (void*) res;
+}
+
+/* Font specific methods */
+
+/* We do need to come up with an effective way to see what is already in the
+ * file to avoid adding to what is already there. This is not an issue for
+ * pdfwrite, as we check each font as it is added. For adding text to an
+ * existing file, though, it may be more problematic. */
+static void
+res_font_init(fz_context *ctx, pdf_document *doc, pdf_res_table *table)
+{
+	table->hash = fz_new_hash_table(ctx, 4096, 16, -1);
+}
+
+static void
+res_font_get_md5(fz_context *ctx, fz_buffer *buffer, unsigned char *digest)
+{
+	fz_md5 state;
+
+	fz_md5_init(&state);
+	fz_md5_update(&state, buffer->data, buffer->len);
+	fz_md5_final(&state, digest);
+}
+
+static void*
+res_font_search(fz_context *ctx, pdf_document *doc, pdf_res_table *table, void *item,
+	void *md5)
+{
+	unsigned char digest[16];
+	fz_buffer *buffer = (fz_buffer*)item;
+	fz_hash_table *hash = table->hash;
+	pdf_res *res;
+
+	if (hash == NULL)
+	{
+		res_font_init(ctx, doc, table);
+		hash = table->hash;
+	}
+
+	/* Create md5 and see if we have the item in our table */
+	res_font_get_md5(ctx, buffer, digest);
+	res = fz_hash_find(ctx, hash, (void*)digest);
+
+	/* Return the digest value so that we can avoid having to recompute it when
+	 * we come back to add the new resource reference */
+	if (res == NULL)
+		memcpy(md5, digest, 16);
+	else
+		pdf_keep_obj(ctx, res->obj);
+	return (void*)res;
+}
+
+/* Accessible methods */
+void*
+pdf_resource_table_search(fz_context *ctx, pdf_document *doc, pdf_res_table *table,
+	void *item, void *md5)
+{
+	return table->search(ctx, doc, table, item, md5);
+}
+
+void*
+pdf_resource_table_put(fz_context *ctx, pdf_res_table *table, void *key, pdf_obj *obj)
+{
+	void *result;
+	pdf_res *res = NULL;
+
+	fz_var(res);
+
+	fz_try(ctx)
+	{
+		res = fz_malloc(ctx, sizeof(pdf_res));
+		res->num = table->count + 1;
+		res->obj = obj;
+		result = fz_hash_insert(ctx, table->hash, key, (void*)res);
+		if (result != NULL)
+		{
+			fz_free(ctx, res);
+			fz_warn(ctx, "resource already present in hash table");
+		}
+		else
+		{
+			table->count = table->count + 1;
+			pdf_keep_obj(ctx, obj);
+			result = res;
+		}
+	}
+	fz_catch(ctx)
+	{
+		fz_free(ctx, res);
+		fz_rethrow(ctx);
+	}
+	return result;
+}
+
+void
+pdf_resource_table_free(fz_context *ctx, pdf_document *doc)
+{
+	if (doc->resources == NULL)
+		return;
+	res_table_free(ctx, doc->resources->color);
+	res_table_free(ctx, doc->resources->font);
+	res_table_free(ctx, doc->resources->image);
+	res_table_free(ctx, doc->resources->pattern);
+	res_table_free(ctx, doc->resources->shading);
+	fz_free(ctx, doc->resources);
+	doc->resources = NULL;
+}
+
+void
+pdf_resource_table_init(fz_context *ctx, pdf_document *doc)
+{
+	fz_var(doc);
+	fz_try(ctx)
+	{
+		doc->resources = fz_calloc(ctx, 1, sizeof(pdf_resource_tables));
+		doc->resources->image = fz_calloc(ctx, 1, sizeof(pdf_res_table));
+		doc->resources->image->search = res_image_search;
+		doc->resources->font = fz_calloc(ctx, 1, sizeof(pdf_res_table));
+		doc->resources->font->search = res_font_search;
+	}
+	fz_catch(ctx)
+	{
+		if (doc->resources != NULL)
+		{
+			fz_free(ctx, doc->resources->image);
+			fz_free(ctx, doc->resources->font);
+			fz_free(ctx, doc->resources);
+			doc->resources = NULL;
+		}
+		fz_rethrow_message(ctx, "resources failed to allocate");
+	}
+}
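The search-then-put protocol behind pdf_resource_table_search and pdf_resource_table_put can be illustrated with a small self-contained table. This is a sketch under stated assumptions, not MuPDF's fz_hash_table: toy_table, toy_search and toy_put are hypothetical names, the table has a fixed capacity of 64 slots with linear probing, and the caller supplies the 16-byte digest (in this commit, an MD5 of the image samples with the alpha plane removed, or of the font buffer). On a miss, the caller creates the new resource object and then inserts it under the same digest, which is why res_image_search hands the digest back rather than making the caller recompute it.

```c
#include <assert.h>
#include <string.h>

#define TOY_SLOTS 64 /* hypothetical fixed capacity for this sketch */

/* Hypothetical stand-in for a pdf_res stored in an fz_hash_table */
typedef struct {
	unsigned char key[16]; /* 16-byte digest (an MD5 in the commit) */
	int used;
	int num;               /* resource number, like pdf_res.num */
} toy_entry;

typedef struct {
	toy_entry slots[TOY_SLOTS];
	int count;
} toy_table;

/* Linear probing: return the slot holding key, the first empty slot where
 * it would be stored, or NULL if the table is full. */
static toy_entry *
toy_slot(toy_table *t, const unsigned char *key)
{
	unsigned h = 0;
	int i;

	assert(key != NULL);
	for (i = 0; i < 16; i++)
		h = h * 31 + key[i];
	for (i = 0; i < TOY_SLOTS; i++)
	{
		toy_entry *e = &t->slots[(h + i) % TOY_SLOTS];
		if (!e->used || memcmp(e->key, key, 16) == 0)
			return e;
	}
	return NULL;
}

/* Search step: return the existing resource number, or -1 on a miss. */
static int
toy_search(toy_table *t, const unsigned char *key)
{
	toy_entry *e = toy_slot(t, key);
	return (e != NULL && e->used) ? e->num : -1;
}

/* Put step: number a new entry count + 1, as pdf_resource_table_put does.
 * If the digest is already present, the first entry is kept.
 * Returns -1 if the table is full. */
static int
toy_put(toy_table *t, const unsigned char *key)
{
	toy_entry *e = toy_slot(t, key);

	if (e == NULL)
		return -1;
	if (e->used)
		return e->num;
	memcpy(e->key, key, 16);
	e->used = 1;
	e->num = ++t->count;
	return e->num;
}
```

With the mutool create command line from the commit message, Im0 and Im2 both come from image.jpg, so they produce the same digest: the second lookup hits on the search step and the existing entry is reused instead of a new copy being written.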