mupdf - MuPDF PDF reader and library

Age	Commit message	Author
2013-04-30	Split dev_text into three parts.	Tor Andersson
	One for the raw span extraction pass, one for paragraph sorting, and another for HTML output.
2013-04-30	Move device hint functions to a more appropriate source file.	Tor Andersson

2013-04-29	Bug 693939: Fix memory problems.	Robin Watts
	2 more memory problems pointed out by mhfan - many thanks. In the text device, run through the line height list to its length, not to its capacity. In the X11 image code, when copying data unchanged, copy whole ints, not just the first quarter of the bytes.
2013-04-29	Fix various leaks in the dev_text device.	Robin Watts
	Thanks to mhfan for the reports.
2013-04-26	Rename functions for consistency.	Robin Watts
	Rename fz_new_output_buffer to be fz_new_output_with_buffer. Rename fz_new_output_file to be fz_new_output_with_file. This is more consistent with other functions such as fz_new_pixmap_with_data.
2013-04-26	Add image output for HTML.	Robin Watts
	JPEGs and PNGs are left unchanged. Any other image gets stored as a PNG and sent as a data URL.
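The data URL step can be illustrated with a minimal sketch. This is not MuPDF's code: `make_data_url` and its parameters are hypothetical names, and the sketch only assumes the standard `data:<mime>;base64,<payload>` form with ordinary Base64 padding.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

static const char b64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical helper: write "data:<mime>;base64,<payload>" into out.
 * Returns the string length, or 0 if out is too small. */
static size_t make_data_url(const char *mime, const unsigned char *data,
                            size_t len, char *out, size_t out_size)
{
    size_t prefix = strlen("data:") + strlen(mime) + strlen(";base64,");
    size_t need = prefix + 4 * ((len + 2) / 3) + 1;
    size_t i, o;

    if (need > out_size)
        return 0;
    o = (size_t)sprintf(out, "data:%s;base64,", mime);

    /* Encode each full group of 3 bytes as 4 characters. */
    for (i = 0; i + 2 < len; i += 3) {
        unsigned v = (unsigned)data[i] << 16 | (unsigned)data[i + 1] << 8 | data[i + 2];
        out[o++] = b64[v >> 18 & 63];
        out[o++] = b64[v >> 12 & 63];
        out[o++] = b64[v >> 6 & 63];
        out[o++] = b64[v & 63];
    }
    /* Encode the 1 or 2 trailing bytes, padding with '='. */
    if (i < len) {
        unsigned v = (unsigned)data[i] << 16;
        if (i + 1 < len)
            v |= (unsigned)data[i + 1] << 8;
        out[o++] = b64[v >> 18 & 63];
        out[o++] = b64[v >> 12 & 63];
        out[o++] = (i + 1 < len) ? b64[v >> 6 & 63] : '=';
        out[o++] = '=';
    }
    out[o] = '\0';
    return o;
}
```

The resulting string could be placed directly in an `<img src="...">` attribute, which is presumably why no separate image files are needed for the HTML output.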
2013-04-26	Hint enabling/disabling for devices.	Robin Watts
	Add configuration functions to control the hints set on a given device. Use this to set whether image data is captured or not in the text extraction process. Also update the display list device to respect the device hints during playback.
2013-04-25	Tweak fz_text_page to include image records.	Robin Watts
	Extract such records as part of the text device.
2013-04-11	Move pdf_image to fz_image.	Robin Watts
	In order to be able to output images (either in the pdfwrite device or in the html conversion), we need to be able to get to the original compressed data stream (or else we're going to end up recompressing images). To do that, we need to expose all of the contents of pdf_image into fz_image, so it makes sense to just amalgamate the two. This has knock on effects for the creation of indexed colorspaces, requiring some of that logic to be moved. Also, we need to make xps use the same structures; this means pushing PNG and TIFF support into the decoding code. Also we need to be able to load just the headers from PNG/TIFF/JPEGs as xps doesn't include dimension/resolution information. Also, separate out all the fz_image stuff into fitz/res_image.c rather than having it in res_pixmap.
2013-03-26	Reflow: Move from html output using tables to html output using div/span	Robin Watts
	The div/spans still use table style rendering, but it's simpler code (and html) this way.
2013-03-26	Spot indents.	Robin Watts

2013-03-26	Add superscript and subscript handling.	Robin Watts

2013-03-26	Simple dehyphenation support.	Robin Watts

2013-03-26	Text region analysis.	Robin Watts
	Update fz_text_analysis function to look for 'regions'; use this to spot columns etc. Spot columns/width/alignment info. "Intelligently" merge lines based on this. Update html output to make use of this extra information.
2013-03-26	Add simple bullet point detection to paragraph analysis.	Robin Watts
	If a line starts with a recognised unicode bullet char, then split the paragraph there. Don't use this line's separation from the previous line to determine paragraph line step. Also attempt to spot numbered list items (digits or roman numerals). The digits/roman numerals code is disabled by default, as while it worked, later commits made it less useful - but it may be worth reinstating later.
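A bullet test of this kind might look like the sketch below. The commit does not say which characters it recognises, so the table and the name `is_bullet` are assumptions chosen for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical table: the commit does not list the recognised bullets. */
static const int bullet_chars[] = {
    0x2022, /* BULLET */
    0x2023, /* TRIANGULAR BULLET */
    0x2043, /* HYPHEN BULLET */
    0x25E6, /* WHITE BULLET */
    0x00B7, /* MIDDLE DOT */
};

/* Return true if the first character of a line is a known bullet,
 * in which case the paragraph would be split at that line. */
static bool is_bullet(int ucs)
{
    size_t i;
    for (i = 0; i < sizeof bullet_chars / sizeof bullet_chars[0]; i++)
        if (bullet_chars[i] == ucs)
            return true;
    return false;
}
```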
2013-03-26	Rework text extraction structures.	Robin Watts
	Rework the text extraction structures - the broad strokes are similar but we now hold more information at each stage to enable us to perform more detailed analysis on the structure of the page.
	We now hold:
	fz_text_char's (the position, ucs value, and style of each char).
	fz_text_span's (sets of chars that share the same baseline/transform, with no more than an expected amount of whitespace between each char).
	fz_text_line's (sets of spans that share the same baseline (more or less, allowing for super/subscript), but possibly with a larger than expected amount of whitespace).
	fz_text_block's (sets of lines that follow one another).
	After fz_text_analysis is called, we hope to have fz_text_blocks split such that each block is a paragraph.
	This new implementation has the same restrictions as the current implementation it replaces, namely that chars are only considered for addition onto the most recent span at the moment, but this revised form is designed to allow easier extension, and for this restriction to be lifted.
	Also add simple paragraph splitting based on finding the most common 'line distance' in blocks.
	When we add spans together to collate them into lines, we record the 'horizontal' and 'vertical' spacing between them. (Not actually horizontal or vertical, so much as 'in the direction of writing' and 'perpendicular to the direction of writing'). The 'horizontal' value enables us to more correctly output spaces when converting to (say) html later. The 'vertical' value enables us to spot subscripts and superscripts etc, as well as small changes in the baseline due to style changes. We are careful to base the baseline comparison on the baseline for the line, not the baseline for the previous span, as otherwise superscripts/subscripts on the end of the line affect what we match next.
	Also, we are less tolerant of vertical shifts after a large gap. This avoids false positives where different columns just happen to almost line up.
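The "most common line distance" idea can be sketched as follows. This is an illustration only, not the code from the commit: the function names, the flat array of baselines, and the tolerance parameter are all assumptions, and the quadratic search is chosen for brevity.

```c
#include <stddef.h>

static float abs_diff(float a, float b)
{
    return a > b ? a - b : b - a;
}

/* Return the gap between consecutive baselines that occurs most often,
 * treating gaps within tol of each other as equal. Returns 0 if n < 2. */
static float most_common_line_step(const float *baseline, size_t n, float tol)
{
    float best = 0;
    size_t best_count = 0, i, j;

    for (i = 1; i < n; i++) {
        float gap = baseline[i] - baseline[i - 1];
        size_t count = 0;
        for (j = 1; j < n; j++)
            if (abs_diff(baseline[j] - baseline[j - 1], gap) <= tol)
                count++;
        if (count > best_count) {
            best_count = count;
            best = gap;
        }
    }
    return best;
}

/* Set starts[i] to 1 where line i begins a new paragraph, i.e. where the
 * gap above it is larger than the usual step. Returns the paragraph count. */
static size_t split_paragraphs(const float *baseline, size_t n, float tol, int *starts)
{
    float step = most_common_line_step(baseline, n, tol);
    size_t i, paragraphs = 0;

    for (i = 0; i < n; i++) {
        starts[i] = (i == 0) || (baseline[i] - baseline[i - 1] > step + tol);
        paragraphs += (size_t)starts[i];
    }
    return paragraphs;
}
```

For baselines at 0, 12, 24, 48 and 60 the usual step is 12, so the line at 48 would start a second paragraph.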
2013-02-11	Fix problem with text selection caused by 0399332d54	Paul Gardiner

2013-02-06	Change to pass structures by reference rather than value.	Robin Watts
	This is faster on ARM in particular. The primary changes involve fz_matrix, fz_rect and fz_bbox. Rather than passing 'fz_rect r' into a function, we now consistently pass 'const fz_rect *r'. Where a rect is passed in and modified, we miss the 'const' off. Where possible, we return the pointer to the modified structure to allow 'chaining' of expressions. The basic upshot of this work is that we do far fewer copies of rectangle/matrix structures, and all the copies we do are explicit. This has opened the way to other optimisations, also performed in this commit. Rather than using expressions like: fz_concat(fz_scale(sx, sy), fz_translate(tx, ty)) we now have fz_pre_{scale,translate,rotate} functions. These can be implemented much more efficiently than doing the fully fledged matrix multiplication that fz_concat requires. We add fz_rect_{min,max} functions to return pointers to the min/max points of a rect. These can be used in transformations to directly manipulate values. With a little casting in the path transformation code we can avoid more needless copying. We rename fz_widget_bbox to the more consistent fz_bound_widget.
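A sketch of why a "pre-translate" can be cheaper than a general concatenation. The `mat` type and the function names here are stand-ins, not the library's API; the sketch assumes a six-float affine matrix and that the first argument of the concatenation is applied first.

```c
/* Stand-in affine matrix: [a b 0; c d 0; e f 1]. */
typedef struct { float a, b, c, d, e, f; } mat;

/* General concatenation: twelve multiplies, result written through dst.
 * Taking pointers avoids copying the structures at the call site. */
static mat *mat_concat(mat *dst, const mat *one, const mat *two)
{
    mat r;
    r.a = one->a * two->a + one->b * two->c;
    r.b = one->a * two->b + one->b * two->d;
    r.c = one->c * two->a + one->d * two->c;
    r.d = one->c * two->b + one->d * two->d;
    r.e = one->e * two->a + one->f * two->c + two->e;
    r.f = one->e * two->b + one->f * two->d + two->f;
    *dst = r;
    return dst;
}

/* Pre-translate: same result as concatenating a translation before m,
 * but only e and f change, so four multiplies suffice. Returning the
 * pointer allows calls to be chained. */
static mat *mat_pre_translate(mat *m, float tx, float ty)
{
    m->e += tx * m->a + ty * m->c;
    m->f += tx * m->b + ty * m->d;
    return m;
}
```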
2013-02-06	Tweak text extraction block creation.	Robin Watts
	Better tolerate long horizontal spaces without breaking lines.
2013-02-05	Tweak HTML output.	Robin Watts
	Send blocks as paragraphs, rather than lines. Send lines as spans.
2013-02-04	Add fz_output, and make output functions use it.	Robin Watts
	Various functions in the code output to FILE *, when there are times we'd like them to output to other things, such as fz_buffers. Add an fz_output type, together with fz_printf to allow things to output to this.
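The general shape of such an output abstraction might be sketched like this. The struct layout and the names `output`, `fixed_buffer` and `output_printf` are hypothetical and do not reproduce the real fz_output; the sketch also uses a fixed-size buffer and a bounded temporary for simplicity.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical output sink: a write callback plus its private state. */
typedef struct output output;
struct output {
    void *opaque;
    int (*write)(output *out, const char *data, size_t len);
};

typedef struct { char data[256]; size_t len; } fixed_buffer;

/* Sink that writes to a FILE *. */
static int write_to_file(output *out, const char *data, size_t len)
{
    return fwrite(data, 1, len, (FILE *)out->opaque) == len ? 0 : -1;
}

/* Sink that appends to an in-memory buffer. */
static int write_to_buffer(output *out, const char *data, size_t len)
{
    fixed_buffer *buf = out->opaque;
    if (len > sizeof buf->data - buf->len)
        return -1;
    memcpy(buf->data + buf->len, data, len);
    buf->len += len;
    return 0;
}

/* printf-style front end: format once, then hand the bytes to the sink. */
static int output_printf(output *out, const char *fmt, ...)
{
    char tmp[256];
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(tmp, sizeof tmp, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;
    return out->write(out, tmp, (size_t)n);
}
```

Callers format through `output_printf` without knowing whether the bytes end up in a file or in memory.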
2012-11-29	Bug 693463: Various small fixes.	Robin Watts
	Thanks to zeniko for these. Use otf as extension for opentype fonts. fz_clampi should take ints, not floats! Fix typo in prototype. Squash unwanted warning. Remove magic number in favour of #define. Reset generation numbers when renumbering.
2012-07-05	Move to static inline functions from macros.	Robin Watts
	Instead of using macros for min/max/abs/clamp, we move to using inline functions. These are more typesafe, and should produce equivalent code on compilers that support inline (i.e. pretty much everything we care about these days). People can always do their own macro versions if they prefer.
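A small illustration of the type-safety point, using made-up names rather than the library's own helpers: a macro evaluates its arguments textually, so an argument with a side effect can run twice, whereas an inline function evaluates each argument exactly once.

```c
/* Macro version: (a) appears twice, so side effects can run twice. */
#define MAX_MACRO(a, b) ((a) > (b) ? (a) : (b))

/* Inline versions: arguments are evaluated once and type-checked. */
static inline int min_int(int a, int b) { return a < b ? a : b; }
static inline int max_int(int a, int b) { return a > b ? a : b; }
static inline int clamp_int(int x, int lo, int hi)
{
    return x < lo ? lo : (x > hi ? hi : x);
}
```

With `int i = 5;`, `MAX_MACRO(i++, 0)` yields 6 and leaves `i` at 7, while `max_int(i++, 0)` yields 5 and leaves `i` at 6.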
2012-04-05	Fix potential problems on malloc failure.	Robin Watts
	Don't reset the size of arrays until we have successfully resized them.
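The pattern described here, sketched with a hypothetical growable array (the type and function names are not from the source): the recorded capacity is only updated once the reallocation has succeeded, so a failure leaves the array consistent.

```c
#include <stdlib.h>

typedef struct { int *items; size_t len; size_t cap; } int_array;

/* Append value; returns 0 on success, -1 on failure.
 * On failure the array is left exactly as it was. */
static int int_array_push(int_array *arr, int value)
{
    if (arr->len == arr->cap) {
        size_t new_cap = arr->cap ? arr->cap * 2 : 8;
        int *new_items;

        /* Refuse sizes whose byte count would overflow size_t. */
        if (arr->cap > (size_t)-1 / (2 * sizeof *new_items))
            return -1;
        new_items = realloc(arr->items, new_cap * sizeof *new_items);
        if (!new_items)
            return -1;          /* items and cap are still valid */
        arr->items = new_items; /* commit only after success */
        arr->cap = new_cap;
    }
    arr->items[arr->len++] = value;
    return 0;
}
```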
2012-03-19	Fix typo in text device where lines would group into blocks too eagerly.	Tor Andersson
	The default page userspace transform changed to a top-down coordinate space, and I forgot this detail when updating the text device branch. Also remove the final block sorting pass to give preference to the original PDF text order.
2012-03-19	Don't create empty spans and lines in the text device.	Tor Andersson

2012-03-14	Some fixes to the new text device, courtesy of Zeniko.	Tor Andersson

2012-03-14	Put 'lastchar' into the text device struct to remember what the last character was across style changes.	Tor Andersson

2012-03-14	Fix memory leaks in style sheet handling of the new text device.	Tor Andersson

2012-03-13	Make fz_print functions all take a FILE *.	Robin Watts
	Also tidy up the taking of fz_context *'s, and hide an unwanted indent param.
2012-03-13	Fix building on windows.	Robin Watts
	Fix a couple of silly problems (one gccism, and one windows specific bug).
2012-03-13	Rename some functions and accessors to be more consistent.	Tor Andersson
	Debug printing functions: debug -> print. Accessors: get noun attribute -> noun attribute. Find -> lookup when the returned value is not reference counted. pixmap_with_rect -> pixmap_with_bbox. We are reserving the word "find" to mean lookups that give ownership of objects to the caller. Lookup is used in other places where the ownership is not transferred, or simple values are returned. The rename is done by the sed script in scripts/rename3.sed
2012-03-12	Create style sheet and group extracted text into blocks, lines and spans.	Tor Andersson

2012-03-07	More release tidyups.	Robin Watts
	Add some function documentation to fitz.h. Add fz_ prefix to runetochar, chartorune, runelen etc. Change fz_runetochar to avoid passing unnecessary pointer.
2012-03-06	Split fitz.h/mupdf.h into internal/external headers.	Robin Watts
	Attempt to separate public API from internal functions.
2012-02-13	Add locking around freetype calls.	Robin Watts
	We only open one instance of freetype per document. We therefore have to ensure that only 1 call to it takes place at a time. We introduce a lock for this purpose (FZ_LOCK_FREETYPE), and arrange to take/release it as required. We also update the font context so it is properly shared.
2012-02-03	Be consistent about passing a fz_context in path/text/shade functions.	Tor Andersson

2011-12-16	Add fz_malloc_struct, and make code use it.	Robin Watts
	The new fz_malloc_struct(A,B) macro allocates sizeof(B) bytes using fz_malloc, and then passes the resultant pointer to Memento_label to label it with "B". This costs nothing in non-memento builds, but gives much nicer listings of leaked blocks when memento is enabled.
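A sketch of how such a labelling macro can work. Everything here is a stand-in: `label_block` plays the role of the Memento hook, `DEBUG_LABELS` is an invented build switch, and the context argument of the real macro is omitted. The `#type` stringification is what turns the type name into the label.

```c
#include <stdlib.h>

#ifndef DEBUG_LABELS
#define DEBUG_LABELS 1 /* invented switch; set to 0 for a "release" build */
#endif

#if DEBUG_LABELS
static const char *last_label; /* stand-in for the debugger's bookkeeping */
static void *label_block(void *ptr, const char *label)
{
    last_label = label;
    return ptr;
}
#else
/* In non-debug builds the label vanishes at compile time. */
#define label_block(ptr, label) (ptr)
#endif

/* Allocate sizeof(type) bytes and label the block with the type's name. */
#define malloc_struct(type) ((type *)label_block(malloc(sizeof(type)), #type))

/* Example type for illustration. */
typedef struct { int len; } text_span;
```

A call such as `malloc_struct(text_span)` then allocates one `text_span` and records the label "text_span" for it.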
2011-11-25	Merge branch 'master' into context	Robin Watts

2011-11-17	Fix bug 692627: stack overflows in text handling.	Robin Watts
	The existing code uses recursion for text span handling. With sufficiently many chained spans we get stack overflow. Simple fixes to use a loop.
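The kind of change described can be sketched on a plain linked list. The `span` type and the functions are hypothetical; the actual span handling in the commit is not shown in the log.

```c
#include <stdlib.h>

typedef struct span span;
struct span { span *next; int len; };

/* Recursive form: one stack frame per span, so a sufficiently long
 * chain can exhaust the stack. */
static void free_spans_recursive(span *s)
{
    if (!s)
        return;
    free_spans_recursive(s->next);
    free(s);
}

/* Loop form: constant stack use regardless of chain length.
 * Returns the number of spans freed. */
static size_t free_spans(span *s)
{
    size_t n = 0;
    while (s) {
        span *next = s->next;
        free(s);
        s = next;
        n++;
    }
    return n;
}

/* Build a chain of n spans for demonstration; returns NULL on failure. */
static span *make_chain(size_t n)
{
    span *head = NULL;
    while (n--) {
        span *s = malloc(sizeof *s);
        if (!s) {
            free_spans(head);
            return NULL;
        }
        s->next = head;
        s->len = 0;
        head = s;
    }
    return head;
}
```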
2011-09-21	Add warning context.	Tor Andersson

2011-09-21	Rename malloc functions for arrays (fz_calloc and fz_realloc).	Tor Andersson

2011-09-15	Add context to mupdf.	Robin Watts
	Huge pervasive change to lots of files, adding a context for exception handling and allocation. In time we'll move more statics into there. Also fix some for(i = 0; i < function(...); i++) calls.
2011-04-04	Le Roi est mort, vive le Roi!	Tor Andersson
	The run-together words are dead! Long live the underscores! The postscript inspired naming convention of using all run-together words has served us well, but it is now time for more readable code. In this commit I have also added the sed script, rename.sed, that I used to convert the source. Use it on your patches and application code.
2011-02-23	Remove fthint workaround for DynaLab fonts, since that is now a part of freetype.	Tor Andersson

2011-02-18	Make pdfdraw -tt output valid XML.	Tor Andersson

2011-02-08	Use horizontal metrics to create text boxes instead of guessing at bad vertical values.	Tor Andersson

2011-02-03	Various patches from SumatraPDF.	Tor Andersson

2011-01-27	Add fz_calloc function to check for integer overflow when allocating arrays, and change the signature of fz_realloc to match.	Tor Andersson

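An overflow-checked array allocator in this spirit, as a minimal sketch. The names `alloc_array` and `resize_array` and the count/size signatures are assumptions, not the library's own declarations.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate count * size bytes, or return NULL if the product would
 * overflow size_t (or if either argument is zero). */
static void *alloc_array(size_t count, size_t size)
{
    if (count == 0 || size == 0)
        return NULL;
    if (count > SIZE_MAX / size)
        return NULL;
    return malloc(count * size);
}

/* Resize with the same (count, size) signature and the same check.
 * On NULL return the original block is untouched and still owned
 * by the caller. */
static void *resize_array(void *ptr, size_t count, size_t size)
{
    if (count == 0 || size == 0)
        return NULL;
    if (count > SIZE_MAX / size)
        return NULL;
    return realloc(ptr, count * size);
}
```

Without the division check, a request such as `SIZE_MAX / 2 + 1` elements of 4 bytes would wrap around to a small allocation and later writes would run past it.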
2010-07-26	Fix bug where storage capacity of 0 or 1 was not taken care of.	Sebastian Rasmussen