path: root/source/pdf/pdf-op-run.c
Age | Commit message | Author
2016-03-11 | Implement fz_text_language support functions. | Robin Watts
Add code to convert between fz_text_language codes and ISO 639 language strings. No validation is carried out.
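The commit does not show how fz_text_language is encoded, so the following is only a sketch under an assumption: a tag of up to three letters packed into one integer in base 27, with 0 meaning "unspecified". The names `lang_code`, `lang_from_string` and `lang_to_string` are hypothetical, not MuPDF's API. As in the commit, nothing is checked against the ISO 639 registry; parsing simply stops at the first non-letter, so "en-US" keeps only its language subtag.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical compact language code: up to three letters, base 27, 0 = none. */
typedef int lang_code;

/* Pack an ISO 639 string such as "en" or "deu"; stops at the first non-letter. */
static lang_code lang_from_string(const char *s)
{
    lang_code code = 0, scale = 1;
    int i;

    if (s == NULL)
        return 0;
    for (i = 0; i < 3; i++) {
        int c = tolower((unsigned char)s[i]);
        if (c < 'a' || c > 'z')
            break;                      /* also catches the terminating NUL */
        code += (c - 'a' + 1) * scale;  /* letters map to 1..26 */
        scale *= 27;
    }
    return code;
}

/* Unpack a code into buf (at least 4 bytes) and return buf. */
static char *lang_to_string(char *buf, lang_code code)
{
    int i = 0;

    assert(code >= 0);
    while (code > 0 && i < 3) {
        buf[i++] = (char)('a' + code % 27 - 1);
        code /= 27;
    }
    buf[i] = '\0';
    return buf;
}
```

Because 0 never encodes a letter, a shorter tag such as "en" cannot collide with a three-letter one, and an unknown language costs nothing to store.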
2016-03-11 | Rejig Bidirectional and Text code. | Robin Watts
We move to using bidirectional "levels" throughout. This should give us better behaviour with nested l2r/r2l text. It also allows us to carry XPS bidi levels through with no loss of information, and avoids the need to special-case numbers. Accordingly, we carry more information into fz_text: as well as wmode, we hold additional details about each text span. These are the directionality of the bidi level (either as derived from the bidi code, or as given by the original document, e.g. XPS), the directionality of the text as specified in the original document (e.g. HTML), and the language of the text (if specified in the original document).
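Levels carry more than a single right-to-left flag. In the Unicode Bidirectional Algorithm (UAX #9), even levels run left-to-right and odd levels right-to-left, and each explicit embedding raises the level to the next value of the required parity, so nesting depth survives. The helpers below sketch those two rules; the `text_span` fields only loosely mirror the span details listed above and are hypothetical, not fz_text's real layout.

```c
#include <assert.h>

/* Hypothetical per-span details, loosely following the commit's description. */
typedef struct {
    unsigned char bidi_level; /* resolved UAX #9 embedding level */
    signed char markup_dir;   /* direction from the source document: -1 rtl, 0 unset, 1 ltr */
    int language;             /* language code, 0 if unspecified */
} text_span;

/* Even levels are left-to-right, odd levels right-to-left. */
static int level_is_rtl(unsigned level)
{
    return (int)(level & 1u);
}

/* Least odd level greater than the current one (what RLE/RLO/RLI push). */
static unsigned next_rtl_level(unsigned level)
{
    return (level + 1u) | 1u;
}

/* Least even level greater than the current one (what LRE/LRO/LRI push). */
static unsigned next_ltr_level(unsigned level)
{
    return (level + 2u) & ~1u;
}
```

Left-to-right text nested inside a right-to-left run inside a left-to-right paragraph ends up at level 2, not 0, which is what lets a consumer tell it apart from the surrounding paragraph text.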
2016-02-24 | Add fz_show_string function and move wmode argument to end. | Tor Andersson
2016-02-24 | Add optional scissor hint argument to text clipping functions. | Tor Andersson
2016-02-24 | Clarify scissor argument to clip device functions. | Tor Andersson
The scissor argument is an optional (potentially NULL) rectangle that can give hints to devices about the area that can be scissored. This is used by the draw device and display list device to minimize the size of temporary clip mask buffers. The scissor rectangle, if used, must have been transformed by the current transform matrix.
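Because the scissor rectangle must already be transformed by the CTM, a caller holding a rectangle in user space maps it first. Under rotation or skew the result is no longer axis-aligned, so a common approach is to take the bounding box of the four transformed corners. A minimal sketch with hypothetical `rect` and `matrix` types following the PDF [a b c d e f] convention (MuPDF's own rect helpers are not reproduced here):

```c
#include <assert.h>

typedef struct { float x0, y0, x1, y1; } rect;
/* PDF-style affine matrix: x' = a*x + c*y + e, y' = b*x + d*y + f. */
typedef struct { float a, b, c, d, e, f; } matrix;

static float min4(float p, float q, float r, float s)
{
    float m = p < q ? p : q;
    m = m < r ? m : r;
    return m < s ? m : s;
}

static float max4(float p, float q, float r, float s)
{
    float m = p > q ? p : q;
    m = m > r ? m : r;
    return m > s ? m : s;
}

/* Bounding box of the four corners of r after transformation by m. */
static rect transform_rect(rect r, matrix m)
{
    float xs[4], ys[4];
    float px[4] = { r.x0, r.x1, r.x0, r.x1 };
    float py[4] = { r.y0, r.y0, r.y1, r.y1 };
    rect out;
    int i;

    for (i = 0; i < 4; i++) {
        xs[i] = m.a * px[i] + m.c * py[i] + m.e;
        ys[i] = m.b * px[i] + m.d * py[i] + m.f;
    }
    out.x0 = min4(xs[0], xs[1], xs[2], xs[3]);
    out.y0 = min4(ys[0], ys[1], ys[2], ys[3]);
    out.x1 = max4(xs[0], xs[1], xs[2], xs[3]);
    out.y1 = max4(ys[0], ys[1], ys[2], ys[3]);
    return out;
}
```

Since the scissor is only a hint, a box that is slightly too large is safe; one that is too small could clip away content.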
2016-02-22 | Rename fz_add_text to fz_show_glyph. | Tor Andersson
Match the naming of fz_moveto/fz_lineto etc. for paths.
2015-12-11 | Remove text clip accumulation. | Tor Andersson
We can now group all clipped text into one fz_text object and simplify the device interface.
2015-12-11 | Keep spans of multiple fonts and sizes in one fz_text object. | Tor Andersson
2015-10-19 | Fix 695582: add knockout group for text that is both stroke and fill. | Tor Andersson
2015-10-15 | Fix 696241: fix bracketing of begin_group/end_group for text objects. | Tor Andersson
2015-10-14 | pdf: Flush text if content stream processing is cut short by errors. | Tor Andersson
2015-03-25 | Bug 695885: Avoid too many pop clips. | Robin Watts
When making a new pdf_run_processor to handle Type 3 glyph contents, we can inherit the current gstate. Do NOT inherit the current clip depth, as otherwise we pop too many clips on exit.
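A minimal sketch of the rule, with hypothetical types standing in for the real processor and device: the child copies the parent's state but resets its own clip counter, so tearing it down pops only the clips it pushed.

```c
#include <assert.h>

/* Hypothetical stand-ins: the device only counts pop_clip calls. */
typedef struct { int pops; } device;

typedef struct {
    device *dev;
    int clip_depth; /* clips pushed by *this* processor */
    /* ... colour, line width and the rest of the inherited graphics state ... */
} run_processor;

static void push_clip(run_processor *p)
{
    p->clip_depth++;
}

static void pop_clip(run_processor *p)
{
    assert(p->clip_depth > 0);
    p->dev->pops++;
    p->clip_depth--;
}

/* A child (e.g. for a Type 3 glyph) copies the state but owns no clips yet. */
static run_processor new_child_processor(const run_processor *parent)
{
    run_processor child = *parent;
    child.clip_depth = 0;
    return child;
}

/* On exit, pop only the clips this processor pushed. */
static void close_processor(run_processor *p)
{
    while (p->clip_depth > 0)
        pop_clip(p);
}
```

Had the child inherited a depth of 2 from the page, closing it would have popped three clips, two of which still belong to the page.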
2015-03-24 | Path rework for improved memory usage. | Robin Watts
Firstly, we make the definition of the path structures local to path.c. This is achieved by using an fz_path_processor function to step through paths, enumerating each section via callback functions. Next, we extend the internal path representation to include other section types, including quads, beziers with common control points, rectangles, and horizontal, vertical and degenerate lines. We also roll close-path sections up into the preceding section's command. The hairiest part of this is that fz_transform_path has to cope with changing the path commands depending on the matrix; this is a relatively rare operation, though.
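The callback approach can be sketched as below. The command letters, `path`, `path_walker` and `walk_path` are hypothetical and do not reproduce fz_path_processor; the point is that compact internal sections (a rectangle, a lineto with its close rolled in) are expanded into ordinary moveto/lineto/closepath calls, so callers never see the private encoding.

```c
#include <assert.h>

/* Hypothetical compact command codes, private to the path module. */
enum {
    CMD_MOVETO = 'M',
    CMD_LINETO = 'L',
    CMD_LINETO_CLOSE = 'l', /* a lineto with the following closepath rolled in */
    CMD_RECT = 'R',         /* x0 y0 x1 y1 stored as one section */
    CMD_CLOSE = 'Z'
};

typedef struct {
    const char *cmds;    /* NUL-terminated command string */
    const float *coords; /* operands, consumed in order */
} path;

/* Callbacks: the only view of a path that callers get. */
typedef struct {
    void (*moveto)(void *arg, float x, float y);
    void (*lineto)(void *arg, float x, float y);
    void (*closepath)(void *arg);
} path_walker;

static void walk_path(const path *p, const path_walker *w, void *arg)
{
    const float *c = p->coords;
    const char *k;

    for (k = p->cmds; *k; k++) {
        switch (*k) {
        case CMD_MOVETO:
            w->moveto(arg, c[0], c[1]);
            c += 2;
            break;
        case CMD_LINETO:
        case CMD_LINETO_CLOSE:
            w->lineto(arg, c[0], c[1]);
            c += 2;
            if (*k == CMD_LINETO_CLOSE)
                w->closepath(arg);
            break;
        case CMD_RECT: /* expanded back into ordinary segments */
            w->moveto(arg, c[0], c[1]);
            w->lineto(arg, c[2], c[1]);
            w->lineto(arg, c[2], c[3]);
            w->lineto(arg, c[0], c[3]);
            w->closepath(arg);
            c += 4;
            break;
        case CMD_CLOSE:
            w->closepath(arg);
            break;
        default:
            return; /* unknown command: stop walking */
        }
    }
}

/* Example walker that just counts the callbacks it receives. */
typedef struct { int moves, lines, closes; } counts;
static void count_move(void *arg, float x, float y) { (void)x; (void)y; ((counts *)arg)->moves++; }
static void count_line(void *arg, float x, float y) { (void)x; (void)y; ((counts *)arg)->lines++; }
static void count_close(void *arg) { ((counts *)arg)->closes++; }
static const path_walker counting_walker = { count_move, count_line, count_close };
```

A rectangle stored this way costs one command byte and four floats instead of five commands and eight floats, which is the kind of saving the rework aims at.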
2015-03-24 | Don't pass interpreter context to pdf_processor opcode callbacks. | Tor Andersson
Update buffer and filter processors. Filter both colors and stroke states. Move OCG hiding logic into interpreter.
2015-02-24 | Bug 695843: Tweak bboxes of type3 fonts; honour the d1 values. | Robin Watts
The example file for this bug has an invalid font bbox. The current code uses this bbox (or some multiple of it) to clip the glyph's size. In the new code, when we convert the glyphs to display lists, we watch for the bbox given in any d1 operator used. If we find one, we gather the rectangle specified and store it as the glyph rectangle in the fz_font. If we then attempt to bound a glyph that used d1, this happens instantly without needing to run the list. This seems to match Acrobat's behaviour. Tests indicate that Acrobat never clips d0 glyphs, so our behaviour is still different here, but I am not changing this at the moment. Also, I note that t3flags should be an unsigned short but is currently just a char. Fix that too. Also fix some missing code in fz_new_font that would cause leaks if mallocs failed.
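A sketch of the d1 shortcut, assuming a per-glyph flag word and stored box; `t3_glyph`, `GLYPH_HAS_D1`, `glyph_saw_d1` and `bound_glyph` are hypothetical names, not MuPDF's fz_font fields. The d1 operator's operands are `wx wy llx lly urx ury`, so the box is taken from operands 2 to 5.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x0, y0, x1, y1; } rect;

#define GLYPH_HAS_D1 0x1u /* hypothetical flag bit */

typedef struct {
    unsigned short flags; /* a short rather than a char, as the commit suggests */
    rect bbox;            /* valid only when GLYPH_HAS_D1 is set */
} t3_glyph;

/* Record the box from "wx wy llx lly urx ury d1" (operands in op[0..5]). */
static void glyph_saw_d1(t3_glyph *g, const float op[6])
{
    g->bbox.x0 = op[2] < op[4] ? op[2] : op[4];
    g->bbox.x1 = op[2] < op[4] ? op[4] : op[2];
    g->bbox.y0 = op[3] < op[5] ? op[3] : op[5];
    g->bbox.y1 = op[3] < op[5] ? op[5] : op[3];
    g->flags |= GLYPH_HAS_D1;
}

/* Bound a glyph: instant for d1 glyphs, otherwise run its display list. */
static rect bound_glyph(const t3_glyph *g, rect (*run_list)(const t3_glyph *))
{
    if (g->flags & GLYPH_HAS_D1)
        return g->bbox;
    assert(run_list != NULL);
    return run_list(g);
}
```

Only the d0 path would ever pay for running the list, which matches the commit's note that d0 glyphs are treated differently.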
2015-02-24 | Simplify/Correct logic in pdf_show_char | Robin Watts
Rework the logic in pdf_show_char to use the same idiom as is used elsewhere. Specifically, this ensures that empty rects are handled correctly.
2015-02-17 | Add ctx parameter and remove embedded contexts for API regularity. | Tor Andersson
Purge several embedded contexts: Remove embedded context in fz_output. Remove embedded context in fz_stream. Remove embedded context in fz_device. Remove fz_rebind_stream (since it is no longer necessary). Remove embedded context in svg_device. Remove embedded context in XML parser. Add ctx argument to fz_document functions. Remove embedded context in fz_document. Remove embedded context in pdf_document. Remove embedded context in pdf_obj. Make fz_page independent of fz_document in the interface. We shouldn't need to pass the document to all functions handling a page. If a page is tied to the source document, it's redundant; otherwise it's just pointless. Fix reference counting oddity in fz_new_image_from_pixmap.
2015-02-17 | Reference count fz_path and fz_text. | Tor Andersson
Disallow modification of shared fz_path and fz_text objects. They should follow a create once, consume often pattern, and as such should be immutable once created.
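The "create once, consume often" pattern can be sketched with a keep/drop pair and a check that refuses mutation once a second reference exists. `shared_text`, `keep_text`, `drop_text` and `can_modify` are hypothetical names; the live-object counter exists only to make the example observable.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical immutable, reference-counted object. */
typedef struct {
    int refs;
    int len; /* payload; never changed once the object is shared */
} shared_text;

static int live_objects = 0; /* for demonstration only */

static shared_text *new_shared_text(int len)
{
    shared_text *t = malloc(sizeof *t);
    if (t == NULL)
        return NULL;
    t->refs = 1;
    t->len = len;
    live_objects++;
    return t;
}

static shared_text *keep_text(shared_text *t)
{
    if (t)
        t->refs++;
    return t;
}

static void drop_text(shared_text *t)
{
    if (t && --t->refs == 0) {
        free(t);
        live_objects--;
    }
}

/* Mutation is only allowed while the creator holds the sole reference. */
static int can_modify(const shared_text *t)
{
    return t->refs == 1;
}
```

Because shared objects never change, a display list can keep a reference instead of taking a deep copy.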
2015-01-20 | don't omit patterns with huge xstep/ystep values | Simon Bünzli
If a pattern is expected to be rendered exactly once and its relevant part covers the target area, the xstep and ystep values may be far larger than the pattern's relevant content. Due to rounding applied in pdf_show_pattern, such patterns have so far been omitted. This issue is exposed, e.g., by the document linked from http://forums.fofou.org/sumatrapdf/topic?id=3184639.
2014-07-18 | allow to extract text for uncacheable glyphs | Simon Bünzli
Certain glyphs, such as those found in nested Type 3 fonts, can't be cached. Currently, the text extraction device doesn't see these, as they're sent only as drawing operations. Also sending them as invisible text fixes potentially missing letters.
2014-06-16 | Fix a fatal compiler warning when building with the latest version of the Android NDK. | Matt Holgate
This is a security issue because a variable is used as a format string with no parameters.
2014-06-09 | Bug 695300: Sanitize draw-device stack handling in error cases. | Robin Watts
When throwing an error during fz_alpha_from_gray, the stack depth can get confused. Fix this by moving some more code into the appropriate fz_try(). In the course of fixing this bug, I added some new optional debug code to display the stack level as it runs. This is committed here disabled; just change the appropriate #define in draw-device.c to enable it. Also, add some code to run_xobject, to avoid throwing in an fz_always() clause.
2014-05-27 | Bug 695260: Fix error handling in do_xobject | Robin Watts
Various functions (such as fz_begin_group) handle errors internally by use of the error_depth parameter. This means that if we call them, we MUST ensure that we call the appropriate closing function. Similarly, if we don't call them, we should NOT call the closing function. In order to ensure we do this correctly, we introduce a cleanup_state variable that says which ones we tried to call. This cures the original bug.
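The cleanup_state idea can be modelled in plain C without MuPDF's fz_try/fz_always/fz_catch macros, which this sketch does not reproduce. The hypothetical device below absorbs a failed begin_group by bumping error_depth, so every begin still needs exactly one end; the flag records that begin_group was *called*, whether or not it succeeded, and drives the cleanup.

```c
#include <assert.h>

/* Hypothetical device that absorbs a failed begin_group via error_depth,
 * so every begin still needs exactly one matching end. */
typedef struct {
    int group_depth;
    int error_depth;
} device;

static void begin_group(device *d, int fails)
{
    if (fails || d->error_depth > 0)
        d->error_depth++;
    else
        d->group_depth++;
}

static void end_group(device *d)
{
    if (d->error_depth > 0)
        d->error_depth--;
    else
        d->group_depth--;
}

/* Returns 0 on success, -1 if running the content failed. */
static int run_xobject(device *d, int needs_group, int group_fails, int content_fails)
{
    int cleanup_state = 0; /* 1 once begin_group has been called */
    int err = 0;

    if (needs_group) {
        cleanup_state = 1; /* record the attempt, whether or not it succeeds */
        begin_group(d, group_fails);
    }

    if (content_fails)
        err = -1; /* e.g. an error part-way through the content stream */

    /* Close exactly what was opened, as an fz_always() block would. */
    if (cleanup_state >= 1)
        end_group(d);
    return err;
}
```

The case without a group shows the other half of the rule: calling end_group there would drive group_depth negative.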
2014-05-22 | Flush pending text on a change of CTM. | Robin Watts
Without this, comparefiles/Bug695086 renders the barcode test upside down.
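A sketch of why the flush is needed, assuming glyphs are batched and sent with a single CTM per batch; `text_state`, `show_glyph` and `flush_text` are hypothetical names. A glyph arriving under a different CTM (a vertical flip in the usage below) forces the pending batch out before it is added, so earlier glyphs are not drawn with the new matrix.

```c
#include <assert.h>

typedef struct { float a, b, c, d, e, f; } matrix;

/* Hypothetical accumulator: glyphs are batched and sent to the device in one go. */
typedef struct {
    matrix ctm;  /* CTM the pending glyphs were collected under */
    int pending; /* glyphs waiting to be sent */
    int flushes; /* batches sent so far */
} text_state;

static int same_matrix(const matrix *x, const matrix *y)
{
    return x->a == y->a && x->b == y->b && x->c == y->c &&
           x->d == y->d && x->e == y->e && x->f == y->f;
}

static void flush_text(text_state *ts)
{
    if (ts->pending > 0) {
        /* a real implementation would hand the batch to the device here */
        ts->flushes++;
        ts->pending = 0;
    }
}

static void show_glyph(text_state *ts, const matrix *ctm)
{
    /* A batch carries one CTM, so a change forces the old batch out first. */
    if (ts->pending > 0 && !same_matrix(&ts->ctm, ctm))
        flush_text(ts);
    ts->ctm = *ctm;
    ts->pending++;
}
```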
2014-05-13 | Fix signedness in cmap interface. | Tor Andersson
2014-04-23 | Fix 693391: simplify warning message | Tor Andersson
Don't print the code point number, so that the suppression of multiple identical warnings can kick in.
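Why the code point matters can be shown with a warning sink that collapses consecutive identical messages. `emit_warning` and the message strings are made up for illustration; the actual warning text is not given in the log. Once a varying number is embedded, no two messages compare equal and nothing is ever suppressed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical warning sink that collapses consecutive identical messages. */
static char last_warning[256];
static int printed_warnings = 0;
static int suppressed_warnings = 0;

static void emit_warning(const char *msg)
{
    if (strcmp(msg, last_warning) == 0) {
        suppressed_warnings++;
        return;
    }
    /* "%s": never pass msg itself as the format string */
    snprintf(last_warning, sizeof last_warning, "%s", msg);
    printed_warnings++;
    fprintf(stderr, "warning: %s\n", msg);
}
```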
2014-03-25 | Bug 695089: inherit resources for softmasks (regression fix) | Simon Bünzli
2014-03-18 | Fix operator buffering of inline images. | Robin Watts
Previously, the pdf_process buffer did not understand inline images. In order to make this work without needlessly duplicating complex code from within pdf-op-run, the parsing of inline images has been moved to happen in pdf-interpret.c. When the op_table entry for BI is called, it now expects the inline image to be in csi->img and the dictionary object to be in csi->obj. To make this work, we have had to improve the handling of inline images in general. While non-inline images have been loaded and held in memory in their compressed form and only decoded when required, until now we have always loaded and decoded inline images immediately. This has been due to the difficulty in knowing how many bytes of data to read from the stream: we know the length of the stream once uncompressed, but relating this to the compressed length is hard. To cure this, we introduce a new type of filter stream, a 'leecher'. We insert a leecher stream before we build the filters required to decode the image. We then read and discard the appropriate number of uncompressed bytes from the filters. This pulls the compressed data through the leecher stream, which stores it in an fz_buffer. Thus images are now always held in their compressed forms in memory. The pdf-op-run implementation is now trivial. The only real complexity in the pdf-op-buffer implementation is the need to ensure that the /Filter entry in the dictionary object matches the exact point at which we backstopped the decompression.
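A byte-at-a-time sketch of the leecher idea, with a toy run-length code standing in for real PDF filters; all type and function names are hypothetical. The caller knows it needs 5 decoded bytes, and reading them pulls exactly the compressed bytes through the leecher, which keeps a copy and leaves the source positioned at the EI operator. Real filters read ahead in blocks, which is the over-reading problem that the "Rework fz_streams" change addresses.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical byte-at-a-time pull stream; next() returns a byte or EOF. */
typedef struct byte_stream byte_stream;
struct byte_stream {
    int (*next)(byte_stream *s);
};

/* Source: the content stream, positioned at the start of the image data. */
typedef struct {
    byte_stream base;
    const unsigned char *p, *end;
} mem_stream;

static int mem_next(byte_stream *s)
{
    mem_stream *m = (mem_stream *)s;
    return m->p < m->end ? *m->p++ : EOF;
}

/* Leecher: passes bytes through unchanged and keeps a copy of each one. */
typedef struct {
    byte_stream base;
    byte_stream *chain;
    unsigned char buf[256]; /* fixed size for the sketch; a real one would grow */
    size_t len;
} leech_stream;

static int leech_next(byte_stream *s)
{
    leech_stream *l = (leech_stream *)s;
    int c = l->chain->next(l->chain);
    if (c != EOF && l->len < sizeof l->buf)
        l->buf[l->len++] = (unsigned char)c;
    return c;
}

/* Toy decoder: each (count, value) byte pair expands to count copies of value. */
typedef struct {
    byte_stream base;
    byte_stream *chain;
    int left, value;
} rle_stream;

static int rle_next(byte_stream *s)
{
    rle_stream *r = (rle_stream *)s;
    while (r->left == 0) {
        int n = r->chain->next(r->chain);
        int v = r->chain->next(r->chain);
        if (n == EOF || v == EOF)
            return EOF;
        r->left = n;
        r->value = v;
    }
    r->left--;
    return r->value;
}
```

In the usage below, decoding 5 bytes from `{3,'a', 2,'b', 'E','I'}` leaves exactly the 4 compressed bytes in the leecher's buffer.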
2014-03-17 | Rework fz_streams. | Robin Watts
Currently fz_streams have a 4K buffer within their header. The call to read from a stream fills this buffer, resulting in more data being pulled from any underlying stream than we might like. This causes problems with the forthcoming 'leech' filter. Here we simplify the fields available in the public stream header. No specific buffer is given; simply the read and write pointers. The underlying 'read' function is replaced by a 'next' function that makes the next block of data available and returns the first character of it (or EOF). A caller to the 'next' function should supply the maximum number of bytes that it knows it will need (possibly not now, but eventually). This enables the underlying stream to efficiently decode just enough. The underlying stream is free to return fewer, or a greater number if it wants to. The exact size of the 'block' of data returned will depend on the filter in use and (possibly) the data therein. Callers can get the currently available amount of data by calling fz_available (but again should pass the maximum amount of data they know they will need). The only time this will ever return 0 is if we have hit EOF.
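A sketch of the described header: only read/write pointers are public, `next(s, max)` refills and returns the first new byte (already consumed), and `available` puts that byte back so the caller sees the whole block. The struct layout and the names `stream`, `mem_next`, `read_byte` and `available` are assumptions modelled on the description, not copied from fz_stream.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stream header in the style described: just read/write pointers. */
typedef struct stream stream;
struct stream {
    unsigned char *rp, *wp; /* unread bytes are [rp, wp) */
    int (*next)(stream *s, size_t max);
    /* private state for the in-memory source below */
    const unsigned char *src;
    size_t src_len, pos;
    unsigned char block[4]; /* deliberately tiny so refills are visible */
};

/* Make up to max bytes available (at most one block); return the first, consumed, or EOF. */
static int mem_next(stream *s, size_t max)
{
    size_t n = s->src_len - s->pos;

    if (n == 0)
        return EOF;
    if (max == 0)
        max = 1;
    if (max > sizeof s->block)
        max = sizeof s->block;
    if (n > max)
        n = max;
    memcpy(s->block, s->src + s->pos, n);
    s->pos += n;
    s->rp = s->block;
    s->wp = s->block + n;
    return *s->rp++;
}

static int read_byte(stream *s)
{
    if (s->rp != s->wp)
        return *s->rp++;
    return s->next(s, 1);
}

/* Bytes readable without another refill; 0 only at EOF. */
static size_t available(stream *s, size_t max)
{
    int c;

    if (s->rp != s->wp)
        return (size_t)(s->wp - s->rp);
    c = s->next(s, max);
    if (c == EOF)
        return 0;
    s->rp--; /* next() consumed the byte it returned; put it back */
    return (size_t)(s->wp - s->rp);
}
```

Passing `max` lets a source like the leecher above avoid pulling more input than the caller will actually use.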
2014-03-17 | Ensure that BDC operators get both params. | Robin Watts
Currently, when parsing, each time we encounter a name, we throw away the last name we had. BDC operators are called as "/Name <object> BDC", so if the <object> is a name, we lose the original /Name. To fix this, parsing a name when we already have one will cause the new name to be stored as an object. This has various knock-on effects, with code throughout changed to read from csi->obj rather than csi->name. Also, ensure that when cleaning, we collect a list of the object names in our new resources dictionary.
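A sketch of the operand rule with a hypothetical collector (`operands`, `push_name`). In `/Span /MC0 BDC` the second operand is itself a name, one that refers to a property list in the resources, so it has to land in the object slot instead of overwriting the tag.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical operand collector for one content-stream operator. */
typedef struct {
    char name[64]; /* first name operand, e.g. the tag of BDC */
    char obj[64];  /* a later operand; here a second name becomes the object */
    int has_name, has_obj;
} operands;

/* A second name no longer overwrites the first; it is kept as the object. */
static void push_name(operands *o, const char *n)
{
    if (o->has_name) {
        snprintf(o->obj, sizeof o->obj, "%s", n);
        o->has_obj = 1;
    } else {
        snprintf(o->name, sizeof o->name, "%s", n);
        o->has_name = 1;
    }
}
```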
2014-03-04 | Add pdf_process interface. | Robin Watts
Currently, the only processing we can do on PDF pages is to run them through an fz_device. We introduce new "pdf_process" functionality here so that we can do more. We define a pdf_processor structure containing a set of function pointers, one per PDF operator, together with functions for processing xobjects etc. The guts of the pdf_run_page_contents and pdf_run_annot operations are then extracted to give pdf_process_page_contents and pdf_process_annot, and the originals are reimplemented in terms of these. This commit contains just one instance of a pdf_processor, namely the "run" processor, which contains the original code, refactored. The graphics state (and device pointer) is now part of the run operator set's private data, rather than being in pdf_csi.
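The design can be sketched in plain C, though this is not the real pdf_processor layout: the struct, the three-operator subset and all names below are hypothetical. The interpreter dispatches already-parsed operators through a table of callbacks. A "run" processor would forward them to an fz_device; the one here only gathers statistics, which shows how a second processor can reuse the same interpreter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical processor: one callback per operator (only three shown). */
typedef struct processor processor;
struct processor {
    void (*op_q)(processor *p);
    void (*op_Q)(processor *p);
    void (*op_re)(processor *p, float x, float y, float w, float h);
};

/* The interpreter's view of already-parsed operators. */
typedef enum { OP_q, OP_Q, OP_re } opcode;
typedef struct { opcode op; float args[4]; } instr;

/* Interpreter: knows the operators, knows nothing about what the processor does. */
static void process_contents(processor *p, const instr *ops, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        const float *a = ops[i].args;
        switch (ops[i].op) {
        case OP_q: p->op_q(p); break;
        case OP_Q: p->op_Q(p); break;
        case OP_re: p->op_re(p, a[0], a[1], a[2], a[3]); break;
        }
    }
}

/* One concrete processor: gathers statistics instead of drawing. */
typedef struct {
    processor super; /* first member, so processor* can be cast back */
    int depth, max_depth, rects;
} stats_processor;

static void stats_q(processor *p)
{
    stats_processor *s = (stats_processor *)p;
    if (++s->depth > s->max_depth)
        s->max_depth = s->depth;
}

static void stats_Q(processor *p)
{
    ((stats_processor *)p)->depth--;
}

static void stats_re(processor *p, float x, float y, float w, float h)
{
    (void)x; (void)y; (void)w; (void)h;
    ((stats_processor *)p)->rects++;
}
```

Embedding the base struct as the first member keeps each processor's private state (for the run processor, the graphics state and device) out of the shared interpreter.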