FITZ DISPLAY TREE DESIGN DOCUMENT -------------------------------------------------------------------- The Fitz world is a set of resources. There are many kinds of resources. Resources can depend on other resources, but no circular dependencies. The resource types are: tree, font, image, shade, and colorspace. A document is a sequence of tree resources that define the contents of its pages. A front-end is a producer of Fitz worlds, that reads a file and creates a Fitz world from its contents. Abstracting this into a high-level interface is useful primarily for viewers and converters. A back-end is a consumer of Fitz worlds. The default rasterizer is one back-end. PDF writers and other output drivers too. I don't think there should be a special interface for these. They are just functions or programs that take the Fitz world and do something unspecified with it. The resource API is what I need help fleshing out. Keep in mind that a Fitz world should be able to be serialized to disk, so having callbacks and other hooks into the front-end is a no no. -------------------------------------------------------------------- TREE The Fitz tree resource is the primary data structure. A tree consists of nodes. There are leaf nodes and branch nodes. Leaf nodes produce images, branch nodes combine or change images. An image here is a two-dimensional region of color and shape (alpha). The display tree consists of three basic node types. Some nodes provide shape, other nodes provide color, and some nodes combine other nodes. LEAF NODES Leaf nodes provide only shape and/or color. SOLID. A constant color or shape that stretches out to infinity. IMAGE. A rectangular region of color and or shape derived a from a grid of samples. Outside this rectangle is transparent. References an image resource. SHADE. A mesh of patches that defines a region of interpolated colors. Outside the mesh is transparent. References a shade resource. PATH. A path defines only shape. Moveto, lineto, curveto and closepath. Stroke, fill, even-odd-fill. Stroke parameters (line width, line caps, line joins) Dash pattern. TEXT. A text node is an optimization, efficiently combining a transform matrix and references to glyph shapes. Text nodes define only shape. Text nodes have a b c d matrix coefficients and a reference to a font, then an array of tuples with a glyph index and e and f coefficients. Text nodes may also have a separate unicode array, associating the (glyph,e,f) tuples with unicode character codes. One-to-many and many-to-one mappings are allowed. This is for text extractions, search and copy. BRANCH NODES TRANSFORM. Transform nodes apply an affine transform matrix to its one and only child. OVER. An over node stacks its children on top of each other, combining their colors with a blending mode. MASK. A mask node has two children. The second child is masked with the first child, multiplying their shapes. This causes the effect of clipping. (mask (path ...) (solid 'devicegray 0)) BLEND. This does the magic of PDF 1.4 transparency. The isolated and non-isolated and knockout group stuff happens here. It also sets the blend mode for its children. LINK. This is a dummy node that grafts in another tree resource. The effect is as if the other tree was copied here instead. This is used for XObject forms and similar cases. References a tree resource. TILE. Similar to a link node, but this node will tile its child to cover the page. -------------------------------------------------------------------- COLORSPACES. A colorspace needs the capability to transform colors into and out of any other colorspace. All colorspaces resources referenced from the display tree are in the form of ICC input profiles. SHADES. This is fairly simple. A mesh of patches as in PDF. Three levels of detail: with full tensors, with only patches, with linear quads. Axial and radial shadings are trivially converted. Type 1 (functional) shadings need to be sampled. If the backends cannot cope, it can either convert to linear shaded triangles (clip an axial shading with a triangular path) or render to an image. FONTS. There need to be a few types of font resources. For now I am going to use FreeType, but that should not be a necessity. The resource format for fonts should be independent. * Substitute fonts. Refer to another fall-back font resource and override the metrics. * Type 3 fonts. Each glyph is represented as a Fitz tree resource. * CFF fonts Type 1 and Type 1 CID fonts are losslessly convertible to CFF. OpenType fonts can have CFF glyph data. * TrueType fonts An option would be to just use an FT_Face for normal font files and not worry about the exact format. IMAGES. Image data could be chopped into tiles to allow for independent and random access and more CPU-cache friendly image scaling and color transforms. * JPEG encoded images. Save byte+bit offsets to allow random access to groups of eight scanlines. * Monochrome images. Chop into tiles. flate encoding. * Contone images. Chop into tiles. flate encoding. -------------------------------------------------------------------- Future considerations Reduce the number of node types. Over, mask and blend can be combined into one node type. Link and tile nodes can be combined. Remove the transform node type and put a transform matrix in the leaf nodes. If we remove the transform node, the display tree is in essence a very flat display list which only branches for clipping and masking parts of the display list. Representation The display tree can be represented ... As an in-memory tree. As a serialized list of objects, with the structure implicit. As a serialized list of objects, with the structure explicit by push/pop commands. On disk, as a serialized stream of resources, which have to be defined before they are referenced.