Another PDF Xref speedup from Malc.

Following the recent change to hold PDF xrefs in their native 'sparse' representation, searching the xref takes longer. Malc has investigated this slowdown and found that it can be largely avoided by cutting out redundant searches of the xref lists. A modified version of his first patch has already gone in (taking us from 10x slower to just 5x slower).

This commit is a modified version of a second patch from him. Again, it works by avoiding searching the xref list twice: pdf_load_obj_stm now returns the xref entry for the requested object, so the caller no longer needs to call pdf_get_xref_entry afterwards.

The original version of this patch 1) appears broken to me, as it could return the wrong xref entry when an object stream contains more than one object, and 2) supposedly gets the speed back to the original 'pre-sparse change' speed. I have updated the patch to fix 1), and I hope this does not affect 2). I am slightly suspicious that removing a single search can give us a 5x speed increase, but this is certainly an improvement.

There is scope for reducing the search times further by using a new table to map object number -> xref number, but unless we find a case where we are noticeably slower than before, I think we can ignore this.
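
For illustration only, the "object number -> xref number" table mentioned in the message could look roughly like the sketch below. This is not MuPDF code: `toy_subsec`, `toy_xref`, `find_section_slow` and `build_index` are hypothetical, simplified stand-ins, and the sketch assumes that section 0 is the newest xref section and that an offset of 0 means "no entry here".

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; these are NOT MuPDF's real structures. */
typedef struct {
	int start;  /* first object number covered by this subsection */
	int len;    /* number of entries in this subsection */
	int *ofs;   /* one offset per entry; 0 is assumed to mean "not set" */
} toy_subsec;

typedef struct {
	int num_subsecs;
	toy_subsec *subsecs;
} toy_xref;

/* Slow path: walk the xref sections newest-first (index 0 is assumed to be
 * the newest) and search every subsection for the object number. */
static int find_section_slow(const toy_xref *xrefs, int num_xrefs, int objnum)
{
	int i, j;

	assert(xrefs != NULL);
	for (i = 0; i < num_xrefs; i++) {
		for (j = 0; j < xrefs[i].num_subsecs; j++) {
			const toy_subsec *s = &xrefs[i].subsecs[j];
			if (objnum >= s->start && objnum < s->start + s->len &&
				s->ofs[objnum - s->start] != 0)
				return i;
		}
	}
	return -1; /* object not present in any section */
}

/* Build the table once: index[objnum] is the xref section holding the
 * object, or -1. The caller owns the returned array and must free() it. */
static int *build_index(const toy_xref *xrefs, int num_xrefs, int max_obj)
{
	int *index;
	int n;

	assert(max_obj > 0);
	index = malloc(sizeof(*index) * (size_t)max_obj);
	if (index == NULL)
		return NULL;
	for (n = 0; n < max_obj; n++)
		index[n] = find_section_slow(xrefs, num_xrefs, n);
	return index;
}
```

With such a table, a lookup becomes a single array read (`index[num]`) instead of a walk over sections and subsections. The trade-off is extra memory and the need to rebuild or patch the table whenever an xref section is added or changed, which is presumably why the message defers it until a real slowdown shows up.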
author: Robin Watts <robin.watts@artifex.com> 2015-01-05 11:36:35 +0000
committer: Robin Watts <robin.watts@artifex.com> 2015-01-05 13:52:08 +0000
commit: e460246943441078ff28dd413a19ee6f186c3764 (patch)
tree: 031c30e248e0c438af35be6bbfc74f4e2f5aaf80
parent: 15d7a79f92e3799d3db0b2b1c733635701da04d9 (diff)
1 files changed, 9 insertions, 4 deletions
diff --git a/source/pdf/pdf-xref.c b/source/pdf/pdf-xref.c
index 897d512f..0444085f 100644
--- a/source/pdf/pdf-xref.c
+++ b/source/pdf/pdf-xref.c
@@ -1494,8 +1494,8 @@ pdf_print_xref(pdf_document *doc)
  * compressed object streams
  */
 
-static void
-pdf_load_obj_stm(pdf_document *doc, int num, int gen, pdf_lexbuf *buf)
+static pdf_xref_entry *
+pdf_load_obj_stm(pdf_document *doc, int num, int gen, pdf_lexbuf *buf, int target)
 {
 	fz_stream *stm = NULL;
 	pdf_obj *objstm = NULL;
@@ -1508,6 +1508,7 @@ pdf_load_obj_stm(pdf_document *doc, int num, int gen, pdf_lexbuf *buf)
 	int i;
 	pdf_token tok;
 	fz_context *ctx = doc->ctx;
+	pdf_xref_entry *ret_entry = NULL;
 
 	fz_var(numbuf);
 	fz_var(ofsbuf);
@@ -1577,6 +1578,8 @@ pdf_load_obj_stm(pdf_document *doc, int num, int gen, pdf_lexbuf *buf)
 					pdf_drop_obj(obj);
 				} else
 					entry->obj = obj;
+				if (numbuf[i] == target)
+					ret_entry = entry;
 			}
 			else
 			{
@@ -1595,6 +1598,7 @@ pdf_load_obj_stm(pdf_document *doc, int num, int gen, pdf_lexbuf *buf)
 	{
 		fz_rethrow_message(ctx, "cannot open object stream (%d %d R)", num, gen);
 	}
+	return ret_entry;
 }
 
 /*
@@ -1905,13 +1909,14 @@ object_updated:
 		{
 			fz_try(ctx)
 			{
-				pdf_load_obj_stm(doc, x->ofs, 0, &doc->lexbuf.base);
+				x = pdf_load_obj_stm(doc, x->ofs, 0, &doc->lexbuf.base, num);
 			}
 			fz_catch(ctx)
 			{
 				fz_rethrow_message(ctx, "cannot load object stream containing object (%d %d R)", num, gen);
 			}
-			x = pdf_get_xref_entry(doc, num);
+			if (x == NULL)
+				fz_throw(ctx, FZ_ERROR_GENERIC, "cannot load object stream containing object (%d %d R)", num, gen);
 			if (!x->obj)
 				fz_throw(ctx, FZ_ERROR_GENERIC, "object (%d %d R) was not found in its object stream", num, gen);
 		}
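
The shape of this change can be shown with a small standalone sketch. It is a toy model, not MuPDF code: `toy_entry`, `toy_find_entry` and `toy_load_stream` are hypothetical names, and the linear search merely stands in for the comparatively slow xref lookup. The loader already visits the entry for every object in the stream, so it can remember the one whose object number matches `target` and return it, sparing the caller a second search. Matching on the object number matters when a stream holds several objects; returning, say, the last entry touched would hand back the wrong one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an xref entry; NOT MuPDF's pdf_xref_entry. */
typedef struct {
	int num;     /* object number */
	int loaded;  /* set once the object has been loaded from its stream */
} toy_entry;

/* Stand-in for the (comparatively slow) xref search. */
static toy_entry *toy_find_entry(toy_entry *table, int len, int num)
{
	int i;

	for (i = 0; i < len; i++)
		if (table[i].num == num)
			return &table[i];
	return NULL;
}

/* Load every object listed in numbuf, and return the entry for 'target'
 * (or NULL if the stream does not contain it). */
static toy_entry *toy_load_stream(toy_entry *table, int len,
	const int *numbuf, int count, int target)
{
	toy_entry *ret_entry = NULL;
	int i;

	assert(table != NULL && numbuf != NULL);
	for (i = 0; i < count; i++) {
		toy_entry *entry = toy_find_entry(table, len, numbuf[i]);
		if (entry == NULL)
			continue; /* object number not known to the table */
		entry->loaded = 1;
		/* Compare by object number, not by position in the stream. */
		if (numbuf[i] == target)
			ret_entry = entry;
	}
	return ret_entry;
}
```

A caller then checks the result for NULL before dereferencing it, mirroring the `if (x == NULL)` check added in the hunk above.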