From c88941abc6f0fe91a41dc35dcaa1874d4de2c429 Mon Sep 17 00:00:00 2001
From: Tor Andersson
Date: Fri, 7 Apr 2017 16:18:53 +0200
Subject: Organize docs into HTML files.

---
 docs/coding-progressive.html | 378 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 378 insertions(+)
 create mode 100644 docs/coding-progressive.html

diff --git a/docs/coding-progressive.html b/docs/coding-progressive.html
new file mode 100644
index 00000000..fb9ea78d
--- /dev/null
+++ b/docs/coding-progressive.html
@@ -0,0 +1,378 @@
+MuPDF Progressive Loading
+
+

MuPDF Progressive Loading

+
+ + + +
+ +

+How to do progressive loading with MuPDF. + +

What is progressive loading?

+ + +

+The idea of progressive loading is that as you download a PDF file +into a browser, you can display the pages as they appear. + +

+MuPDF can make use of two different mechanisms to achieve this. The +first relies on the file being "linearized"; the second relies on +the caller of MuPDF having fine control over the HTTP fetch and on +the server supporting byte-range requests. + +

+For optimum performance a file should be both linearized and +available over a connection that supports byte-range requests, but +benefits can still be had with either one of these alone. + +

Progressive download using "linearized" files

+ +

+Adobe defines "linearized" PDFs as being ones that have both a +specific layout of objects and a small amount of extra +information to help avoid seeking within a file. The stated aim +is to deliver the first page of a document in advance of the whole +document downloading, whereupon subsequent pages will become +available. Adobe also refers to these as "Optimized for fast web +view" or "Web Optimized". + +

+In fact, the standard outlines (poorly) a mechanism by which 'hints' +can be included that enable the subsequent pages to be found within +the file too. Unfortunately this is very poorly supported by +many tools, and so the hints have to be treated with suspicion. + +

+MuPDF will attempt to use hints if they are available, but will +fall back to a linear search of the file to discover pages if they +are not. This means that the first page will be displayed quickly, +and then subsequent ones will appear with 'incomplete' renderings +that improve over time as more and more resources are gradually +delivered. + +

+Essentially the file starts with a slightly modified header, and the +first object in the file is a special one (the linearization object) +that a) indicates that the file is linearized, and b) gives some +useful information (such as the number of pages in the file). + +

+This object is then followed by all the objects required for the +first page, then the "hint stream", then sets of objects for each +subsequent page in turn, then shared objects required for those +pages, then various other random things. + +

+[Yes, really. While page 1 is sent with all the objects that it +uses, shared or otherwise, subsequent pages do not get shared +resources until after all the unshared page objects have been +sent.] + +
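The role of the linearization object described above can be illustrated with a small standalone sketch. It is not MuPDF's parser: the function name `sniff_linearized` and its helpers are hypothetical, and the sketch assumes only what the PDF specification states, namely that the linearization dictionary carries a `/Linearized` key and an `/N` key (the page count) and sits within the first 1024 bytes of the file. A real implementation would parse the object properly rather than scan for key names.

```c
#include <stddef.h>
#include <string.h>

/* PDF whitespace characters (plus NUL) that may follow a name token. */
static int is_pdf_space(char c)
{
    return c == ' ' || c == '\t' || c == '\r' || c == '\n' || c == '\f' || c == '\0';
}

/* Hypothetical helper: find `key` followed by whitespace in buf[0..len).
 * Returns a pointer just past the key, or NULL if it is not present. */
static const char *find_key(const char *buf, size_t len, const char *key)
{
    size_t klen = strlen(key);
    size_t i;

    for (i = 0; i + klen < len; i++)
        if (memcmp(buf + i, key, klen) == 0 && is_pdf_space(buf[i + klen]))
            return buf + i + klen;
    return NULL;
}

/* Hypothetical sketch: returns 1 and stores the page count in *pages if the
 * start of the file looks linearized, otherwise returns 0. */
int sniff_linearized(const char *buf, size_t len, long *pages)
{
    const char *p, *end;
    long n = 0;
    int digits = 0;

    if (len > 1024)
        len = 1024; /* the dictionary must lie within the first 1024 bytes */
    end = buf + len;

    if (find_key(buf, len, "/Linearized") == NULL)
        return 0;
    p = find_key(buf, len, "/N");
    if (p == NULL)
        return 0;

    while (p < end && is_pdf_space(*p))
        p++;
    /* Cap the digit count so the accumulator cannot overflow. */
    while (p < end && *p >= '0' && *p <= '9' && digits < 9)
    {
        n = n * 10 + (*p - '0');
        digits++;
        p++;
    }
    if (digits == 0)
        return 0;

    *pages = n;
    return 1;
}
```

A caller would pass the first block of bytes received from the network. A return value of 0 only means the file should be treated as non-linearized; it does not mean the file is invalid.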

The Hint Stream

+ +

+Adobe intended the hint stream to facilitate the display +of subsequent pages, but has never used it itself. Consequently you +can't trust people to write it properly - indeed Adobe outputs +something that doesn't quite conform to the spec. + +

+As a result, very few people actually use it. MuPDF will use it +after sanity checking the values, and should cope with illegal or +incorrect values. + +
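The kind of sanity check mentioned above might look like the following sketch. It is not MuPDF's code: the function name `hint_range_is_plausible` and its parameters are hypothetical, and it assumes only that a hint boils down to a byte offset and a length that must fall inside the file.

```c
#include <stdint.h>

/* Hypothetical sketch: decide whether a hinted byte range can be trusted.
 * offset and length come from the hint stream; file_length is the total
 * size of the file. Returns 1 if the range lies entirely inside the file,
 * 0 otherwise. */
int hint_range_is_plausible(int64_t offset, int64_t length, int64_t file_length)
{
    if (file_length <= 0)
        return 0;
    if (offset < 0 || length <= 0)
        return 0;
    if (offset >= file_length)
        return 0;
    /* Written as a subtraction so that offset + length cannot overflow. */
    if (length > file_length - offset)
        return 0;
    return 1;
}
```

A hint that fails such a check would simply be ignored, leaving the linear search of the file as the fallback.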

So how does MuPDF handle progressive loading?

+ +

+MuPDF has made various extensions to its mechanisms for handling +progressive loading. + +

+ +

Progressive loading using byte range requests

+ +

+If the caller has control over the HTTP fetch, then it is possible +to use byte range requests to fetch the document 'out of order'. +This enables non-linearized files to be progressively displayed as +they download, and yields complete renderings of pages earlier than +would otherwise be the case. This process requires no changes within +MuPDF itself, but rather in the way the progressive stream learns +from the attempts MuPDF makes to fetch data. + +
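As a rough illustration of the caller's side of this, the sketch below formats the HTTP `Range` request header for a block of the file that MuPDF tried to read but that has not yet arrived. The function name `format_range_header` is hypothetical and is not part of MuPDF. The header syntax (`bytes=first-last`, with both ends inclusive) is the standard HTTP one.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sketch: write "Range: bytes=first-last" into out.
 * Both byte positions are inclusive, as HTTP range requests define them.
 * Returns the number of characters written, or -1 if the range is invalid
 * or the buffer is too small. */
int format_range_header(char *out, size_t out_size, int64_t first, int64_t last)
{
    int n;

    if (first < 0 || last < first)
        return -1;
    n = snprintf(out, out_size, "Range: bytes=%" PRId64 "-%" PRId64, first, last);
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    return n;
}
```

A server that supports byte ranges answers such a request with status 206 (Partial Content). A server that does not will typically return the whole file with status 200, in which case the caller can only fall back to a sequential download.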

+Consider, for example, an attempt to fetch a hypothetical file from +a server. + +

+ +