path: root/source/fitz/stext-device.c
Age  Commit message  Author
2016-10-07  Add ctx to fz_font functions.  Robin Watts
2016-10-06  Hide internals of fz_colorspace  Robin Watts
The implementation does not need to be in the public API.
2016-10-05  Move fz_font definition to be private.  Robin Watts
Move the definition of fz_font to be in a private header file rather than in the public API. Add accessors for specific parts of the structure and use them as appropriate. The font flags, and the harfbuzz records remain public. This means that only 3 files now need access to the font implementation (font.c, pdf-font.c and pdf-type3.c). This may be able to be improved further in future.
2016-09-08  Add options to control heuristics in structured text.  Sebastian Rasmussen
2016-07-13  Bug 696699: Fix Text extraction mediabox information.  Robin Watts
Since the removal of the begin_page device function, structured text extraction has been unable to correctly establish the mediabox for extracted pages. Update the fz_new_stext_page call to take this mediabox information. This is an API change, but hopefully most people are calling fz_new_stext_page_from_page or fz_new_stext_page_from_display_list which are updated here to cope. Update all the apps/tools to behave properly.
2016-07-08  Separate close and drop functionality for devices and writers.  Tor Andersson
Closing a device or writer may throw exceptions, but many of the foreign-language bindings (JNI and JS) depend on drop never throwing an exception (exceptions in finalizers are bad).
2016-07-07  Ignore duplicate character in structured text extraction.  Sebastian Rasmussen
2016-06-14  stext: Non-initial glyphs in ligatures must set start/stop.  Sebastian Rasmussen
Normal glyphs and initial glyphs in ligatures have their start/stop (p and q) set before determining whether to append to an existing span or insert a space. For non-initial glyphs the start/stop were never set, which introduced uninitialized values into the span data structure. Now all glyphs have their start/stop set, and then, if the glyph is a non-initial glyph in a ligature, the append and space detection are skipped. This means that no values are uninitialized.
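The ordering described in this commit message can be sketched as follows. This is a minimal illustration, not the stext-device.c code: the `toy_point`, `toy_char`, and `toy_span` types, the `toy_add_glyph` function, and the fixed `space_gap` threshold are all hypothetical names made up for the example.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_SPAN_MAX 64

/* Hypothetical types for illustration; not the MuPDF structures. */
typedef struct { float x, y; } toy_point;

typedef struct {
    int ucs;          /* Unicode code point */
    toy_point p, q;   /* start and stop of the glyph */
} toy_char;

typedef struct {
    toy_char chars[TOY_SPAN_MAX];
    size_t len;
    size_t spaces_inserted;
} toy_span;

/* Append one glyph. p and q are stored for every glyph; only normal
 * glyphs and the initial glyph of a ligature take part in space
 * detection (assumed here to be a simple horizontal gap test). */
static void toy_add_glyph(toy_span *s, int ucs, toy_point p, toy_point q,
                          bool non_initial, float space_gap)
{
    toy_char c;

    if (s->len + 2 > TOY_SPAN_MAX)
        return; /* toy bound: room for a space plus the glyph */

    /* Set start/stop unconditionally, so nothing is left uninitialized. */
    c.ucs = ucs;
    c.p = p;
    c.q = q;

    if (!non_initial && s->len > 0) {
        toy_point prev_q = s->chars[s->len - 1].q;
        if (p.x - prev_q.x > space_gap) {
            toy_char sp;
            sp.ucs = ' ';
            sp.p = prev_q;
            sp.q = p;
            s->chars[s->len++] = sp;
            s->spaces_inserted++;
        }
    }
    s->chars[s->len++] = c;
}
```

A non-initial ligature glyph is still stored with valid p and q, so a later glyph can measure its gap against it safely.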
2016-04-27  Add fz_close_device function.  Tor Andersson
Garbage collected languages need a way to signal that they are done with a device other than freeing it. Call it implicitly on fz_drop_device; so take care not to call it again in case it has been explicitly called already.
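The "close implicitly on drop, but only once" behaviour can be sketched with a flag on the device. The `toy_device` type, its functions, and the `toy_close_runs` counter below are hypothetical stand-ins, not the fz_device API.

```c
#include <stdlib.h>

/* Hypothetical toy device; not the real fz_device. */
typedef struct {
    int closed; /* set once the close step has run */
} toy_device;

/* Counts how many times the close work actually ran (for illustration). */
static int toy_close_runs = 0;

static toy_device *toy_new_device(void)
{
    return calloc(1, sizeof(toy_device));
}

/* Explicit close: safe to call more than once. */
static void toy_close_device(toy_device *dev)
{
    if (dev == NULL || dev->closed)
        return;
    dev->closed = 1;
    toy_close_runs++; /* stand-in for flushing or finishing output */
}

/* Drop closes implicitly, then frees; the flag prevents a second close. */
static void toy_drop_device(toy_device *dev)
{
    if (dev == NULL)
        return;
    toy_close_device(dev);
    free(dev);
}
```

A garbage-collected binding could call the close function when the user is done and leave the drop to a finalizer, without the close work running twice.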
2016-04-06  Split encoded ligatures (as from PDF) properly in text extraction.  Tor Andersson
2016-04-05  Handle many-to-one and many-to-many clusters in structured text extraction.  Tor Andersson
2016-03-14  Remove begin_page and end_page device calls.  Tor Andersson
To be moved into a new document writer interface later.
2016-02-24  Add fz_show_string function and move wmode argument to end.  Tor Andersson
2016-02-24  Add optional scissor hint argument to text clipping functions.  Tor Andersson
2016-02-22  Drop const from fz_image.  Tor Andersson
Image objects are immutable and opaque once constructed. Therefore there is no need for the const keyword.
2016-01-21  Drop const from fz_colorspace.  Tor Andersson
It's an opaque immutable structure, that we don't expect to ever want to change after creation. Therefore the const keyword is not useful, and is only line noise.
2016-01-20  Tidy bidirectional source.  Robin Watts
Make the import follow mupdf style (better, if not perfect). Use ucdn where possible to avoid duplicating tables. Shrink the types, make them explicit (e.g. use fz_bidi_level rather than int) and make tables const. Use 32-bit integers for text.
2016-01-13  Add lots of consts.  Robin Watts
In general, we should use 'const fz_blah' in device calls whenever the callee should not alter the fz_blah. Push this through. This shows up various places where we fz_keep and fz_drop these const things. I've updated the fz_keep and fz_drops with appropriate casts to remove the consts. We may need to do the union dance to avoid the consts for some compilers, but will only do that if required. I think this is nicer overall, even allowing for the const<->no const problems.
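The cast that this message mentions can be sketched on a toy reference-counted object. The `toy_path` type and its functions are hypothetical, and the sketch assumes the objects are always heap-allocated and never defined as `const`, which is what makes writing through the cast well defined.

```c
#include <stdlib.h>

/* Hypothetical reference-counted object; not a MuPDF type. */
typedef struct { int refs; } toy_path;

/* Counts frees so the effect of the last drop is observable. */
static int toy_paths_freed = 0;

static toy_path *toy_new_path(void)
{
    toy_path *path = malloc(sizeof *path);
    if (path)
        path->refs = 1;
    return path;
}

/* Callees receive 'const toy_path *', yet keeping must bump the count,
 * so the const is cast away for the reference-count update only. */
static const toy_path *toy_keep_path(const toy_path *path)
{
    if (path)
        ((toy_path *)path)->refs++;
    return path;
}

static void toy_drop_path(const toy_path *path)
{
    toy_path *p = (toy_path *)path;
    if (p && --p->refs == 0) {
        free(p);
        toy_paths_freed++;
    }
}
```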
2015-12-11  Remove text clip accumulation.  Tor Andersson
We can now group all clipped text into one fz_text object and simplify the device interface.
2015-12-11  Keep spans of multiple fonts and sizes in one fz_text object.  Tor Andersson
2015-12-11  Rename structured text structs and functions to 'stext'.  Tor Andersson
Less risk of confusion with the text type used in the device interface.
2015-08-24  Move ucdn.h into public headers.  Tor Andersson
2015-07-20  Fix leak during text extraction.  Robin Watts
MuPDF (the win32/linux viewer) leaks a span_soup each time it is run, even if (seemingly to the user) no text extraction operations are done. This is because the viewer silently does a text extraction pass, during which 'begin_page' is called for both page contents and annotation contents. This causes a leak of a span_soup. Change the implementation to allocate the span_soup just in time instead.
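Just-in-time allocation of this kind can be sketched as a getter that allocates on first use. The `toy_soup` and `toy_text_device` types, the function names, and the `toy_soup_allocs` counter are hypothetical; the real span_soup handling may differ.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the span_soup and the text device. */
typedef struct { int nspans; } toy_soup;
typedef struct { toy_soup *soup; } toy_text_device;

/* Counts allocations so the lazy behaviour is observable. */
static int toy_soup_allocs = 0;

/* Allocate the soup only when something actually needs it. */
static toy_soup *toy_get_soup(toy_text_device *dev)
{
    if (dev->soup == NULL) {
        dev->soup = calloc(1, sizeof *dev->soup);
        if (dev->soup)
            toy_soup_allocs++;
    }
    return dev->soup;
}

/* Adding a span is the first point at which a soup is required. */
static void toy_add_span(toy_text_device *dev)
{
    toy_soup *soup = toy_get_soup(dev);
    if (soup)
        soup->nspans++;
}

/* Finishing a pass frees the soup if one was ever created. */
static void toy_finish(toy_text_device *dev)
{
    free(dev->soup);
    dev->soup = NULL;
}
```

With this shape, a pass that never adds a span never allocates, so starting a page twice cannot overwrite and leak an earlier allocation.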
2015-04-07  Fix some warnings.  Tor Andersson
2015-04-07  Fix structured text extraction in vertical mode.  Robin Watts
When advancing a glyph in vertical mode, it should advance down the page. The origin of the glyph as supplied is bottom left, not top right - allow for this in calculations. Previously glyphs were not being collated into spans because of this.
2015-04-07  Structured text extraction; improve glyph bounding box calculations.  Robin Watts
In vertical motion mode, when calculating bboxes we should use horizontal rather than vertical displacements from the 'axis of movement'. In horizontal mode, we displace by 'ascender' and 'descender'. Those concepts don't rotate with the motion mode, so repurpose those fields to hold bbox.x0 and bbox.x1 in vertical mode.
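The field reuse described here can be sketched as below. This is an illustration under stated assumptions, not the actual calculation: text is unrotated, y increases upward, p and q are the glyph's start and stop points, and the `toy_style`, `toy_point`, and `toy_rect` types are hypothetical.

```c
/* Hypothetical types; not the MuPDF structures. */
typedef struct { float x, y; } toy_point;
typedef struct { float x0, y0, x1, y1; } toy_rect;

typedef struct {
    int wmode;       /* 0 = horizontal motion, 1 = vertical motion */
    float ascender;  /* horizontal: ascender; vertical: reused as bbox x1 */
    float descender; /* horizontal: descender; vertical: reused as bbox x0 */
} toy_style;

static float toy_min(float a, float b) { return a < b ? a : b; }
static float toy_max(float a, float b) { return a > b ? a : b; }

/* The box spans p..q along the axis of movement and is displaced
 * perpendicular to that axis by the two style fields. */
static toy_rect toy_glyph_bbox(const toy_style *st, toy_point p, toy_point q)
{
    toy_rect r;
    if (st->wmode == 0) {
        /* Horizontal: extent in x from p to q, displaced in y. */
        r.x0 = toy_min(p.x, q.x);
        r.x1 = toy_max(p.x, q.x);
        r.y0 = p.y + st->descender;
        r.y1 = p.y + st->ascender;
    } else {
        /* Vertical: extent in y from p to q, displaced in x. */
        r.y0 = toy_min(p.y, q.y);
        r.y1 = toy_max(p.y, q.y);
        r.x0 = p.x + st->descender;
        r.x1 = p.x + st->ascender;
    }
    return r;
}
```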
2015-04-07  Use fz_advance_glyph rather than direct FT calls during PDF layout.  Robin Watts
2015-02-25  Text device; collect matrix and bbox for images too.  Robin Watts
We were not filling in the matrix and bbox fields for images collected as part of the text extraction device. Fixed here.
2015-02-20  Do not crash on text extraction on pages with no text.  Robin Watts
Thanks to malc for pointing out the problem.
2015-02-17  Use embedded superclass struct instead of user pointer in devices.  Tor Andersson
2015-02-17  Add ctx parameter and remove embedded contexts for API regularity.  Tor Andersson
Purge several embedded contexts:
Remove embedded context in fz_output.
Remove embedded context in fz_stream.
Remove embedded context in fz_device.
Remove fz_rebind_stream (since it is no longer necessary).
Remove embedded context in svg_device.
Remove embedded context in XML parser.
Add ctx argument to fz_document functions.
Remove embedded context in fz_document.
Remove embedded context in pdf_document.
Remove embedded context in pdf_obj.
Make fz_page independent of fz_document in the interface. We shouldn't need to pass the document to all functions handling a page. If a page is tied to the source document, it's redundant; otherwise it's just pointless.
Fix reference counting oddity in fz_new_image_from_pixmap.
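The difference between an embedded context and an explicit ctx parameter can be sketched on a toy object. The `toy_context` and `toy_stream` types and their functions are hypothetical and only show the calling convention, not the fz_stream API.

```c
#include <stdlib.h>

/* Hypothetical context: per-caller state, such as a call counter. */
typedef struct { int calls; } toy_context;

/* The object holds no 'toy_context *ctx' member of its own. */
typedef struct { int value; } toy_stream;

static toy_stream *toy_new_stream(toy_context *ctx, int value)
{
    toy_stream *stm = malloc(sizeof *stm);
    ctx->calls++;
    if (stm)
        stm->value = value;
    return stm;
}

/* Every operation receives the caller's context explicitly. */
static int toy_read(toy_context *ctx, const toy_stream *stm)
{
    ctx->calls++;
    return stm->value;
}

static void toy_drop_stream(toy_context *ctx, toy_stream *stm)
{
    ctx->calls++;
    free(stm);
}
```

Because the object stores no context, the same stream can be used with a different context on each call, which is consistent with a rebind function no longer being needed.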
2015-02-17  Rename fz_close_* and fz_free_* to fz_drop_*.  Tor Andersson
Rename fz_close to fz_drop_stream.
Rename fz_close_archive to fz_drop_archive.
Rename fz_close_output to fz_drop_output.
Rename fz_free_* to fz_drop_*.
Rename pdf_free_* to pdf_drop_*.
Rename xps_free_* to xps_drop_*.
2013-09-13  Fix various compile warnings spotted by the cluster.  Robin Watts
2013-06-20  Rearrange source files.  Tor Andersson