MuPDF from JavaScript
The 'mutool run' command executes a JavaScript program, which has access to most of the features of the MuPDF library.
The command supports ECMAScript 5 syntax in strict mode.
All of the MuPDF constructors and functions live in the global object, and the command line arguments are accessible
from the global 'argv' object.
mutool run script.js [ arguments ... ]
If invoked without any arguments, it will drop you into an interactive REPL (read-eval-print loop).
On the interactive prompt, if you prefix a line with an equals sign ('='), the result of the line
is printed automatically.
JavaScript Shell
Several global functions that are common for command line shells are available:
- gc(report)
- Run the garbage collector to free up memory. Optionally report statistics on the garbage collection.
- load(fileName)
- Load and execute script in 'fileName'.
- print(...)
- Print arguments to stdout, separated by spaces and followed by a newline.
- quit()
- Exit the shell.
- read(fileName)
- Read the contents of a file and return them as a UTF-8 decoded string.
- readline()
- Read one line of input from stdin and return it as a string.
- require(module)
- Load a JavaScript module.
- write(...)
- Print arguments to stdout, separated by spaces.
Buffer
The Buffer objects are used for working with binary data.
They can be used much like arrays, but are much more efficient since they
only store bytes.
- new Buffer()
- Create a new empty buffer.
- readFile(fileName)
- Create a new buffer with the contents of a file.
- Buffer#length
- The number of bytes in the buffer.
- Buffer#[n]
- Read/write the byte at index 'n'. Throws an exception on out-of-bounds access.
- Buffer#writeByte(b)
- Append a single byte to the end of the buffer.
- Buffer#writeRune(c)
- Encode a unicode character as UTF-8 and append to the end of the buffer.
- Buffer#writeLine(...)
- Append arguments to the end of the buffer, separated by spaces, ending with a newline.
- Buffer#write(...)
- Append arguments to the end of the buffer, separated by spaces.
- Buffer#writeBuffer(data)
- Append the contents of the 'data' buffer to the end of the buffer.
- Buffer#save(fileName)
- Write the contents of the buffer to a file.
Matrices and Rectangles
Matrices are simply 6-element arrays representing a 3-by-3 transformation matrix as
/ a b 0 \
| c d 0 |
\ e f 1 /
This matrix is represented in JavaScript as [a,b,c,d,e,f].
- Identity
- The identity matrix, shorthand for [1,0,0,1,0,0].
- Scale(sx, sy)
- Return a scaling matrix, shorthand for [sx,0,0,sy,0,0].
- Translate(tx, ty)
- Return a translation matrix, shorthand for [1,0,0,1,tx,ty].
- Concat(a, b)
- Concatenate matrices a and b. Bear in mind that matrix multiplication is not commutative.
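To make the order dependence concrete, here is a minimal sketch of the multiplication in plain JavaScript. The helper name concatSketch is hypothetical; it is only assumed to compute the same product as the built-in Concat for the 3-by-3 layout shown above. Under 'mutool run' you would call Concat directly.

```javascript
// Multiply two [a,b,c,d,e,f] matrices, treating each as the 3-by-3 matrix
// shown above (m times n). Hypothetical helper, for illustration only.
function concatSketch(m, n) {
    return [
        m[0] * n[0] + m[1] * n[2],
        m[0] * n[1] + m[1] * n[3],
        m[2] * n[0] + m[3] * n[2],
        m[2] * n[1] + m[3] * n[3],
        m[4] * n[0] + m[5] * n[2] + n[4],
        m[4] * n[1] + m[5] * n[3] + n[5]
    ];
}

var scale = [2, 0, 0, 2, 0, 0];        // same values as Scale(2, 2)
var translate = [1, 0, 0, 1, 10, 20];  // same values as Translate(10, 20)

var scaleThenTranslate = concatSketch(scale, translate); // [2,0,0,2,10,20]
var translateThenScale = concatSketch(translate, scale); // [2,0,0,2,20,40]
```

Swapping the arguments changes the translation part of the result, because in the second case the offset is scaled as well.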
Rectangles are 4-element arrays, specifying the minimum and maximum corners (typically
upper left and lower right, in a coordinate space with the origin at the top left with
descending y): [ulx,uly,lrx,lry].
If the minimum x coordinate is bigger than the maximum x coordinate, MuPDF treats the rectangle
as infinite in size.
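Since rectangles are plain arrays, small helpers can be written in ordinary JavaScript. The sketch below uses hypothetical helper names (none of them are MuPDF functions) and shows the infinite-rectangle test described above, plus a scaling matrix that fits a rectangle to a target width.

```javascript
// Rectangles are [ulx, uly, lrx, lry]. All helper names are hypothetical.
function isInfiniteRect(r) {
    // Minimum x bigger than maximum x marks an infinite rectangle.
    return r[0] > r[2];
}

function rectWidth(r) {
    return r[2] - r[0];
}

function rectHeight(r) {
    return r[3] - r[1];
}

// Build a uniform scaling matrix that makes the rectangle targetWidth wide.
function fitWidth(r, targetWidth) {
    var s = targetWidth / rectWidth(r);
    return [s, 0, 0, s, 0, 0];
}

var a4 = [0, 0, 595, 842];
var doubled = fitWidth(a4, 1190); // [2,0,0,2,0,0]
```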
Document and Page
MuPDF can open many document types (PDF, XPS, CBZ, EPUB, FB2 and a handful of image formats).
- new Document(fileName)
- Open the named document.
- Document#needsPassword()
- Returns true if a password is required to open this password protected PDF.
- Document#authenticatePassword(password)
- Returns true if the password matches.
- Document#getMetaData(key)
- Return document metadata for the given key. The common keys are: "format", "encryption", "info:Author", and "info:Title".
- Document#toPDF()
- Returns a PDFDocument (see below) or null if the document is not a PDF.
- Document#layout(pageWidth, pageHeight, fontSize)
- Layout a reflowable document (EPUB, FB2, or XHTML) to fit the specified page and font size.
- Document#countPages()
- Count the number of pages in the document. This may change if you call the layout function with different parameters.
- Document#loadPage(number)
- Returns a Page object for the given page number. Page number zero (0) is the first page in the document.
- Page#bound()
- Returns a rectangle containing the page dimensions.
- Page#run(device, transform)
- Calls device functions for all the contents on the page, using the specified transform matrix.
The device can be one of the built-in devices or a JavaScript object with methods for the device calls.
- Page#toPixmap(transform, colorspace)
- Render the page into a Pixmap, using the transform and colorspace.
- Page#toDisplayList()
- Record the contents on the page into a DisplayList.
- Page#search(needle)
- Search for 'needle' text on the page, and return an array with rectangles of all matches found.
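The functions above can be combined to render one page to a PNG file. This is a sketch that only uses the signatures listed in this section and in the Pixmap section; the function name renderPageToPNG and its parameters are hypothetical. The document and colorspace are passed in as arguments so that the function itself does not depend on any globals.

```javascript
// 'doc' is expected to be a Document, 'colorspace' one of the ColorSpace
// globals such as DeviceRGB. The function name is hypothetical.
function renderPageToPNG(doc, pageNumber, zoom, colorspace, fileName) {
    if (pageNumber < 0 || pageNumber >= doc.countPages())
        throw new Error("page number out of range: " + pageNumber);
    var page = doc.loadPage(pageNumber);       // page 0 is the first page
    var transform = [zoom, 0, 0, zoom, 0, 0];  // same values as Scale(zoom, zoom)
    var pixmap = page.toPixmap(transform, colorspace);
    pixmap.saveAsPNG(fileName, false);         // false: do not save alpha
    return pixmap;
}

// Usage under 'mutool run' (not executed here):
// renderPageToPNG(new Document("input.pdf"), 0, 2, DeviceRGB, "page1.png");
```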
ColorSpace
- DeviceGray
- The default grayscale colorspace.
- DeviceRGB
- The default RGB colorspace.
- DeviceBGR
- The default RGB colorspace, but with components in reverse order.
- DeviceCMYK
- The default CMYK colorspace.
- ColorSpace#getNumberOfComponents()
- A grayscale colorspace has one component, RGB has 3, CMYK has 4, and DeviceN may have any number of components.
Pixmap
A Pixmap object contains a color raster image (short for pixel map).
The components in a pixel in the pixmap are all byte values, with the transparency as the last component.
A pixmap also has a location (x, y) in addition to its size, so that it can easily be used to represent
a tile of a page.
- new Pixmap(colorspace, bounds, alpha)
- Create a new pixmap. The pixel data is not initialized, and will contain garbage.
- Pixmap#clear(value)
- Clear the pixels to the specified value. Pass 255 for white, or undefined for transparent.
- Pixmap#bound()
- Return the pixmap bounds.
- Pixmap#getWidth()
- Pixmap#getHeight()
- Pixmap#getNumberOfComponents()
- Number of colors; plus one if an alpha channel is present.
- Pixmap#getAlpha()
- True if alpha channel is present.
- Pixmap#getStride()
- Number of bytes per row.
- Pixmap#getColorSpace()
- Pixmap#getXResolution()
- Pixmap#getYResolution()
- Image resolution in dots per inch.
- Pixmap#getSample(x, y, k)
- Get the value of component k at position x, y (relative to the image origin: 0, 0 is the top left pixel).
- Pixmap#saveAsPNG(fileName, saveAlpha)
- Save the pixmap as a PNG. Only works for Gray and RGB images.
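As an illustration of the accessor methods, the following sketch averages one component over a whole pixmap using only getWidth, getHeight and getSample as listed above. The function name averageSample is hypothetical, and the loop is written for clarity rather than speed.

```javascript
// Average the value of component k over all pixels of a pixmap.
// Hypothetical helper; 'pixmap' is expected to be a Pixmap.
function averageSample(pixmap, k) {
    var w = pixmap.getWidth();
    var h = pixmap.getHeight();
    var sum = 0;
    var x, y;
    for (y = 0; y < h; ++y)
        for (x = 0; x < w; ++x)
            sum += pixmap.getSample(x, y, k);
    return w * h > 0 ? sum / (w * h) : 0;
}

// Usage under 'mutool run' (not executed here), for the first component
// of a rendered page:
// var average = averageSample(page.toPixmap(Identity, DeviceGray), 0);
```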
DrawDevice
The DrawDevice can be used to render to a Pixmap; either by running a Page with it or by calling its methods directly.
- new DrawDevice(pixmap)
- Create a device for drawing into a pixmap. The pixmap bounds used should match the transformed page bounds,
or you can adjust them to only draw a part of the page.
DisplayList and DisplayListDevice
A display list records all the device calls for playback later.
If you want to run a page through several devices, or run it multiple times for any other reason,
recording the page to a display list and replaying the display list may be a performance gain
since then you can avoid reinterpreting the page each time. Be aware though, that a display list
will keep all the graphics required in memory, so will increase the amount of memory required.
- new DisplayList()
- Create an empty display list.
- DisplayList#run(device, transform)
- Play back the recorded device calls onto the device.
- DisplayList#toPixmap(transform, colorspace, alpha)
- Render display list to a pixmap. If alpha is true, it will render to a transparent background, otherwise white.
- new DisplayListDevice(displayList)
- Create a device for recording onto a display list.
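The performance argument above can be sketched as follows: the page is interpreted once into a display list, which is then rendered at several zoom levels. The function name renderAtZooms is hypothetical; the calls are the ones listed in this section and in the Page section.

```javascript
// Render one page at several zoom levels while interpreting it only once.
// Hypothetical helper; 'page' is expected to be a Page object.
function renderAtZooms(page, zooms, colorspace) {
    var list = page.toDisplayList(); // record the page contents once
    var pixmaps = [];
    var i, z;
    for (i = 0; i < zooms.length; ++i) {
        z = zooms[i];
        // [z,0,0,z,0,0] holds the same values as Scale(z, z);
        // false asks for a white rather than transparent background.
        pixmaps.push(list.toPixmap([z, 0, 0, z, 0, 0], colorspace, false));
    }
    return pixmaps;
}

// Usage under 'mutool run' (not executed here):
// var thumbs = renderAtZooms(doc.loadPage(0), [0.25, 1, 2], DeviceRGB);
```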
Device
All built-in devices have the methods listed below. Any function that accepts a device will also
accept a JavaScript object with the same methods. Any missing methods are simply ignored, so you
only need to create methods for the device calls you care about.
Many of the methods take graphics objects as arguments: Path, Text, Image and Shade.
The stroking state is a dictionary with keys for:
- startCap, dashCap, endCap:
- "Butt", "Round", "Square", or "Triangle".
- lineCap:
- Set startCap, dashCap, and endCap all at once.
- lineJoin:
- "Miter", "Round", "Bevel", or "MiterXPS".
- lineWidth:
- Thickness of the line.
- miterLimit:
- Maximum ratio of the miter length to line width, before beveling the join instead.
- dashPhase:
- Starting offset for dash pattern.
- dashes:
- Array of on/off dash lengths.
Colors are specified as arrays with the appropriate number of components for the color space.
The methods that clip graphics must be balanced with a corresponding popClip.
- Device#fillPath(path, evenOdd, transform, colorspace, color, alpha)
- Device#strokePath(path, stroke, transform, colorspace, color, alpha)
- Device#clipPath(path, evenOdd, transform)
- Device#clipStrokePath(path, stroke, transform)
- Fill/stroke/clip a path.
- Device#fillText(text, transform, colorspace, color, alpha)
- Device#strokeText(text, stroke, transform, colorspace, color, alpha)
- Device#clipText(text, transform)
- Device#clipStrokeText(text, stroke, transform)
- Fill/stroke/clip a text object.
- Device#ignoreText(text, transform)
- Invisible text that can be searched but should not be visible, such as for overlaying a scanned OCR image.
- Device#fillShade(shade, transform, alpha)
- Fill a shade (a.k.a. gradient). TODO: the details of gradient fills are not exposed to JavaScript yet.
- Device#fillImage(image, transform, alpha)
- Draw an image. An image always fills a unit rectangle [0,0,1,1], so must be transformed to be placed and drawn at the appropriate size.
- Device#fillImageMask(image, transform, colorspace, color, alpha)
- An image mask is an image without color. Fill with the color where the image is opaque.
- Device#clipImageMask(image, transform)
- Clip graphics using the image to mask the areas to be drawn.
- Device#beginMask(area, luminosity, backdropColorspace, backdropColor)
- Device#endMask()
- Create a soft mask. Any drawing commands between beginMask and endMask are grouped and used as a clip mask.
If luminosity is true, the mask is derived from the luminosity (grayscale value) of the graphics drawn;
otherwise the color is ignored completely and the mask is derived from the alpha of the group.
- Device#popClip()
- Pop the clip mask installed by the last clipping operation.
- Device#beginGroup(area, isolated, knockout, blendmode, alpha)
- Device#endGroup()
- Push/pop a transparency blending group. Blendmode is one of the standard PDF blend modes: "Normal", "Multiply", "Screen", etc. See the PDF reference for details on isolated and knockout.
- Device#beginTile(areaRect, viewRect, xStep, yStep, transform)
- Device#endTile()
- Draw a tiling pattern. Any drawing commands between beginTile and endTile are grouped and then repeated across the whole page.
Apply a clip mask to restrict the pattern to the desired shape.
- Device#close()
- Tell the device that we are done, and flush any pending output.
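Because missing methods are ignored, a JavaScript device can be very small. The sketch below counts a few kinds of drawing calls; the factory name makeCountingDevice and the 'counts' property are hypothetical, and only method names from the list above are used.

```javascript
// Create a device object that counts path, text and image drawing calls.
// Hypothetical helper; methods that are not defined here are simply ignored.
function makeCountingDevice() {
    var counts = { paths: 0, texts: 0, images: 0 };
    return {
        counts: counts,
        fillPath: function () { counts.paths++; },
        strokePath: function () { counts.paths++; },
        fillText: function () { counts.texts++; },
        strokeText: function () { counts.texts++; },
        fillImage: function () { counts.images++; },
        fillImageMask: function () { counts.images++; }
    };
}

// Usage under 'mutool run' (not executed here):
// var device = makeCountingDevice();
// page.run(device, Identity);
// print(device.counts.paths, device.counts.texts, device.counts.images);
```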
Path
A Path object represents vector graphics as drawn by a pen. A path can be either stroked or filled, or used as a clip mask.
- new Path()
- Create a new empty path.
- Path#moveTo(x, y)
- Lift and move the pen to the coordinate.
- Path#lineTo(x, y)
- Draw a line to the coordinate.
- Path#curveTo(x1, y1, x2, y2, x3, y3)
- Draw a cubic bezier curve to (x3,y3) using (x1,y1) and (x2,y2) as control points.
- Path#closePath()
- Close the path by drawing a line to the last moveTo.
- Path#rect(x1, y1, x2, y2)
- Shorthand for moveTo, lineTo, lineTo, lineTo, closePath to draw a rectangle.
- Path#walk(pathWalker)
- Call moveTo, lineTo, curveTo and closePath methods on the pathWalker to replay the path.
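A path walker is an ordinary object with the four methods named above. As a sketch, the walker below accumulates a bounding box of all coordinates it is given; the factory name makeBoundsWalker and the getBounds method are hypothetical. Curve control points are included, so the box may be somewhat larger than the exact outline of a curve.

```javascript
// Create a path walker that tracks a bounding box as [x0, y0, x1, y1].
// Hypothetical helper, for illustration only.
function makeBoundsWalker() {
    var box = null;
    function add(x, y) {
        if (box === null) {
            box = [x, y, x, y];
        } else {
            box[0] = Math.min(box[0], x);
            box[1] = Math.min(box[1], y);
            box[2] = Math.max(box[2], x);
            box[3] = Math.max(box[3], y);
        }
    }
    return {
        moveTo: function (x, y) { add(x, y); },
        lineTo: function (x, y) { add(x, y); },
        curveTo: function (x1, y1, x2, y2, x3, y3) {
            add(x1, y1);
            add(x2, y2);
            add(x3, y3);
        },
        closePath: function () {},
        getBounds: function () { return box; }
    };
}

// Usage under 'mutool run' (not executed here):
// var walker = makeBoundsWalker();
// path.walk(walker);
// print(walker.getBounds());
```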
Text
A Text object contains text.
- new Text()
- Create a new empty text object.
- Text#showGlyph(font, transform, glyph, unicode, wmode)
- Add a glyph to the text object. Transform is the text matrix, specifying font size and glyph location. For example: [size,0,0,-size,x,y].
Glyph and unicode may be -1 for n-to-m cluster mappings.
For example, the "fi" ligature would be added in two steps: first the glyph for the 'fi' ligature and the unicode value for 'f';
then glyph -1 and the unicode value for 'i'.
WMode is 0 for horizontal writing, and 1 for vertical writing.
- Text#showString(font, transform, string)
- Add a simple string to the text object. Will do font substitution if the font does not have all the unicode characters required.
- Text#walk(textWalker)
- Call showGlyph on textWalker for each glyph in the text object.
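A text walker only needs a showGlyph method. The sketch below collects the unicode values into a JavaScript string and skips entries whose unicode value is -1; the factory name makeTextCollector and the getString method are hypothetical. The direct calls at the end simulate the "fi" ligature example above, using a made-up glyph index.

```javascript
// Create a text walker that gathers unicode values into a string.
// Hypothetical helper, for illustration only.
function makeTextCollector() {
    var chars = [];
    return {
        showGlyph: function (font, transform, glyph, unicode, wmode) {
            var v;
            if (unicode < 0)
                return; // -1: this glyph carries no character of its own
            if (unicode > 0xFFFF) {
                // ES5 strings are UTF-16, so encode a surrogate pair.
                v = unicode - 0x10000;
                chars.push(String.fromCharCode(0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)));
            } else {
                chars.push(String.fromCharCode(unicode));
            }
        },
        getString: function () { return chars.join(""); }
    };
}

var collector = makeTextCollector();
var matrix = [12, 0, 0, -12, 72, 720];
collector.showGlyph(null, matrix, 1, 0x66, 0);   // ligature glyph (index made up), 'f'
collector.showGlyph(null, matrix, -1, 0x69, 0);  // no glyph, 'i'
// collector.getString() is now "fi"
```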
Font
Font objects can be created from TrueType, OpenType, Type1 or CFF fonts.
In PDF there are also special Type3 fonts.
- new Font(fontName or fileName)
- Create a new font, either using a built-in font name or a filename.
The built-in fonts are: Times-Roman, Times-Italic, Times-Bold, Times-BoldItalic,
Helvetica, Helvetica-Oblique, Helvetica-Bold, Helvetica-BoldOblique,
Courier, Courier-Oblique, Courier-Bold, Courier-BoldOblique,
Symbol, and ZapfDingbats.
- Font#getName()
- Get the font name.
- Font#encodeCharacter(unicode)
- Get the glyph index for a unicode character. Glyph zero (.notdef) is returned if the font does not have a glyph for the character.
- Font#advanceGlyph(glyph, wmode)
- Return advance width for a glyph in either horizontal or vertical writing mode.
Image
Image objects are similar to Pixmaps, but can contain compressed data.
- new Image(pixmap or fileName)
- Create a new image from pixmap data, or load an image from a file.
- Image#getWidth()
- Image#getHeight()
- Image size in pixels.
- Image#getXResolution()
- Image#getYResolution()
- Image resolution in dots per inch.
- Image#getColorSpace()
- Image#toPixmap(scaledWidth, scaledHeight)
- Create a pixmap from the image. The scaledWidth and scaledHeight arguments are optional,
but may be used to decode a down-scaled pixmap.
Document Writer
Document writer objects are used to create new documents in several formats.
- new DocumentWriter(filename, format, options)
- Create a new document writer to create a document with the specified format and output options.
If format is null it is inferred from the filename suffix. The options argument is a comma separated list
of flags and key-value pairs. See below for more details.
- DocumentWriter#beginPage(mediabox, transform)
- Begin rendering a new page. Returns a Device that can be used to render the page graphics.
The transform argument must be an empty array; it is filled in with the transform matrix to be
used with the device functions.
- DocumentWriter#endPage(device)
- Finish the page rendering.
The argument must be the same device object that was returned by the beginPage method.
- DocumentWriter#close()
- Finish the document and flush any pending output.
The current output formats supported are CBZ and PDF.
The CBZ output options are:
- resolution=N
- Render each page to an image at N pixels per inch.
The PDF output options are:
- linearize
- Optimize for web browsers.
- garbage
- Remove unused objects.
- garbage=compact
- Remove unused objects, and compact cross reference table.
- garbage=deduplicate
- Remove unused objects, compact cross reference table, and remove duplicate objects.
- pretty
- Pretty-print objects with indentation.
- ascii
- ASCII hex encode binary streams.
- compress-fonts
- Compress embedded fonts.
- compress-images
- Compress images.
- compress
- Compress all streams.
- decompress
- Decompress all streams (except where compress-fonts or compress-images applies).
- sanitize
- Clean up graphics commands in content streams.
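Since the options are passed as one comma-separated string, it can be convenient to build that string from a JavaScript object. The helper name buildOptions is hypothetical and not part of MuPDF; it simply joins flags and key-value pairs in the form described above, with keys sorted so that the result does not depend on property order.

```javascript
// Turn { flag: true, key: "value" } into "flag,key=value".
// Hypothetical helper; false, null and undefined entries are left out.
function buildOptions(settings) {
    var keys = Object.keys(settings).sort();
    var parts = [];
    var i, value;
    for (i = 0; i < keys.length; ++i) {
        value = settings[keys[i]];
        if (value === true)
            parts.push(keys[i]);
        else if (value !== false && value !== null && value !== undefined)
            parts.push(keys[i] + "=" + value);
    }
    return parts.join(",");
}

var options = buildOptions({ garbage: "compact", compress: true, pretty: false });
// options is "compress,garbage=compact"
```

The resulting string can then be passed as the options argument of new DocumentWriter or PDFDocument#save.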
PDFDocument and PDFObject
With MuPDF it is also possible to create, edit and manipulate PDF documents
using low level access to the objects and streams contained in a PDF file.
- new PDFDocument()
- Create a new empty PDF document.
- new PDFDocument(fileName)
- Load a PDF document from file.
- Document#toPDF()
- Get access to the raw PDFDocument from a Document; returns null if the document is not a PDF.
- PDFDocument#toDocument()
- Cast the PDF document to a Document.
- PDFDocument#save(fileName, options)
- Write the PDF document to file.
The write options are a string of comma separated options (see the document writer options).
PDF Object Access
A PDF document contains objects, similar to those in JavaScript: arrays, dictionaries, strings, booleans, and numbers.
At the root of the PDF document is the trailer object, which contains pointers to the metadata dictionary and to the
catalog object, which contains the pages and other information.
Pointers in PDF are also called indirect references,
and are of the form "32 0 R" (where 32 is the object number, 0 is the generation, and R is magic syntax).
All functions in MuPDF dereference indirect references automatically.
PDF has two types of strings: /Names and (Strings). All dictionary keys are names.
Some dictionaries in PDF also have attached binary data. These are called streams, and may be compressed.
- PDFDocument#getTrailer()
- The trailer dictionary. This contains indirect references to the Root and Info dictionaries.
- PDFDocument#countObjects()
- Return the number of objects in the PDF. Object number 0 is reserved, and may not be used for anything.
- PDFDocument#createObject()
- Allocate a new numbered object in the PDF, and return an indirect reference to it.
The object itself is uninitialized.
- PDFDocument#deleteObject(obj)
- Delete the object referred to by the indirect reference.
PDFObjects are always bound to the document that created them.
Do NOT mix and match objects from one document with another document!
- PDFDocument#addObject(obj)
- Add 'obj' to the PDF as a numbered object, and return an indirect reference to it.
- PDFDocument#addStream(buffer)
- Create a stream object with the contents of 'buffer', add it to the PDF, and return an indirect reference to it.
- PDFDocument#newNull()
- PDFDocument#newBoolean(boolean)
- PDFDocument#newInteger(number)
- PDFDocument#newReal(number)
- PDFDocument#newString(string)
- PDFDocument#newName(string)
- PDFDocument#newIndirect(objectNumber, generation)
- PDFDocument#newArray()
- PDFDocument#newDictionary()
The following functions can be used to copy objects from one document to another:
- PDFDocument#graftObject(sourceDocument, object, sourceGraftMap)
- Deep copy an object into the destination document. The graft map may be null, but should be used if you
are copying several objects from the same source document using multiple calls to graftObject.
- PDFDocument#newGraftMap()
- Create a graft map for the source document, so that objects that have already been copied can be found again.
All functions that take PDF objects do automatic translation between JavaScript objects
and PDF objects using a few basic rules. Null, booleans, and numbers are translated directly.
JavaScript strings are translated to PDF names, unless they are surrounded by parentheses:
"Foo" becomes the PDF name /Foo and "(Foo)" becomes the PDF string (Foo).
Arrays and dictionaries are recursively translated to PDF arrays and dictionaries.
Be aware of cycles though! The translation does NOT cope with cyclic references!
The translation goes both ways: PDF dictionaries and arrays can be accessed similarly
to JavaScript objects and arrays by getting and setting their properties.
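The translation rules can be illustrated with a small stand-alone function that prints a JavaScript value in PDF syntax. This is only a sketch of the rules as described above, not MuPDF's implementation: the name toPdfSyntax is hypothetical, PDF string and name escaping is not handled, and the explicit cycle check is an addition for the sake of the example.

```javascript
// Print a JavaScript value using the translation rules described above.
// Hypothetical helper; 'seen' tracks the containers currently being visited.
function toPdfSyntax(value, seen) {
    var parts = [];
    var i, key;
    seen = seen || [];
    if (value === null)
        return "null";
    if (typeof value === "boolean" || typeof value === "number")
        return String(value);
    if (typeof value === "string") {
        // "(Foo)" stays a PDF string, anything else becomes a name.
        if (value.charAt(0) === "(" && value.charAt(value.length - 1) === ")")
            return value;
        return "/" + value;
    }
    if (typeof value !== "object")
        throw new TypeError("cannot translate " + typeof value);
    if (seen.indexOf(value) >= 0)
        throw new Error("cyclic reference");
    seen.push(value);
    if (Array.isArray(value)) {
        for (i = 0; i < value.length; ++i)
            parts.push(toPdfSyntax(value[i], seen));
        seen.pop();
        return "[ " + parts.join(" ") + " ]";
    }
    for (key in value)
        if (Object.prototype.hasOwnProperty.call(value, key))
            parts.push("/" + key + " " + toPdfSyntax(value[key], seen));
    seen.pop();
    return "<< " + parts.join(" ") + " >>";
}

var asName = toPdfSyntax("Foo");     // "/Foo"
var asString = toPdfSyntax("(Foo)"); // "(Foo)"
```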
- PDFObject#get(key or index)
- PDFObject#put(key or index, value)
- PDFObject#delete(key or index)
- Access dictionaries and arrays. Dictionaries and arrays can also be accessed using normal property syntax: obj.Foo = 42; delete obj.Foo; x = obj[5].
- PDFObject#resolve()
- If the object is an indirect reference, return the object it points to; otherwise return the object itself.
- PDFObject#isArray()
- PDFObject#isDictionary()
- PDFObject#forEach(function(key,value){...})
- Iterate over all the entries in a dictionary or array and call the function for each key-value pair.
The only way to access a stream is via an indirect object, since all streams
are numbered objects.
- PDFObject#isIndirect()
- True if the object is an indirect reference.
- PDFObject#toIndirect()
- Return the object number the indirect reference points to.
- PDFObject#isStream()
- True if the object is an indirect reference pointing to a stream.
- PDFObject#readStream()
- Read the contents of the stream object into a Buffer.
- PDFObject#readRawStream()
- Read the raw contents of the stream object into a Buffer, without decoding or decompressing them.
- PDFObject#writeObject(obj)
- Update the object the indirect reference points to.
- PDFObject#writeStream(buffer)
- Update the contents of the stream the indirect reference points to.
This will update the Length, Filter and DecodeParms automatically.
- PDFObject#writeRawStream(buffer)
- Update the raw contents of the stream the indirect reference points to, without encoding or compressing them.
This will update the Length automatically, but leave the Filter and DecodeParms untouched.
PDF Page Access
All page objects are structured into a page tree, which defines the order the pages appear in.
- PDFDocument#countPages()
- Number of pages in the document.
- PDFDocument#findPage(number)
- Return the page object for a page number. The first page is number zero.
- PDFDocument#deletePage(number)
- Delete the numbered page.
- PDFDocument#insertPage(at, page)
- Insert the page object in the page tree at the location. If 'at' is -1, the page is inserted at the end of the document.
Pages consist of a content stream, and a resource dictionary containing all of the fonts and images used.
- PDFDocument#addPage(mediabox, rotate, resources, contents)
- Create a new page object. Note: this function does NOT add it to the page tree.
- PDFDocument#addSimpleFont(font)
- Create a PDF object from the Font object as a WinAnsiEncoding encoded simple font.
- PDFDocument#addFont(font)
- Create a PDF object from the Font object as an Identity-H encoded CID font.
- PDFDocument#addImage(image)
- Create a PDF object from the Image object.
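Putting these functions together, the sketch below creates a page with one font resource and appends it to the page tree. The function name appendPage and the resource name F1 are hypothetical; the PDFDocument calls are the ones listed above, and the resource dictionary relies on the automatic translation of JavaScript objects into PDF objects.

```javascript
// Create a page that uses one simple font and append it to the document.
// Hypothetical helper; 'pdf' is expected to be a PDFDocument, 'font' a Font,
// and 'contents' a Buffer holding the page's content stream.
function appendPage(pdf, mediabox, font, contents) {
    var fontRef = pdf.addSimpleFont(font);
    var resources = pdf.addObject({ Font: { F1: fontRef } });
    var page = pdf.addPage(mediabox, 0, resources, contents);
    pdf.insertPage(-1, page); // addPage alone does not add it to the page tree
    return page;
}

// Usage under 'mutool run' (not executed here):
// var pdf = new PDFDocument();
// var contents = new Buffer();
// contents.writeLine("BT /F1 24 Tf 72 720 Td (Hello) Tj ET");
// appendPage(pdf, [0, 0, 595, 842], new Font("Helvetica"), contents);
// pdf.save("out.pdf", "compress");
```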
TODO
There are several areas in MuPDF that still need bindings to access from JavaScript:
- Shadings
- PDFDocument#graftObject()
- PDFWriteDevice
- DocumentWriter
Copyright © 2016 Artifex Software