mupdf - MuPDF PDF reader and library

Age	Commit message	Author
2013-05-03	Simple Image file format recogniser	Robin Watts
	Now can open jpeg/png/tiff files within mupdf.
2013-04-30	Move fz_normalize_vector into base_geometry.c	Tor Andersson

2013-04-30	Split dev_text into three parts.	Tor Andersson
	One for the raw span extraction pass, one for paragraph sorting, and another for HTML output.
2013-04-30	Move device hint functions to a more appropriate source file.	Tor Andersson

2013-04-29	Fix copyright statements	Sebastian Rasmussen

2013-04-29	Bug 693939: Fix memory problems.	Robin Watts
	Two more memory problems pointed out by mhfan - many thanks. In the text device, run through the line height list to its length, not to its capacity. In the X11 image code, when copying data unchanged, copy whole ints, not just the first quarter of the bytes.
2013-04-29	Add Memento build option to Android build. Fix memory leaks.	Robin Watts
	Following up on a report from a potential customer, fix various places in mupdf.c where we were leaking memory (devices not freed, context not properly freed etc). In order to track this down, I added a Memento build - just do: ndk-build MEMENTO=1 when building. This only checks for leaks, not for memory overwrites by default as it uses MEMENTO_LEAKONLY to avoid any possibility of the android kernel killing stuff for being too slow or using too much memory.
2013-04-29	Fix various leaks in the dev_text device.	Robin Watts
	Thanks to mhfan for the reports.
2013-04-26	Rename functions for consistency.	Robin Watts
	Rename fz_new_output_buffer to be fz_new_output_with_buffer. Rename fz_new_output_file to be fz_new_output_with_file. This is more consistent with other functions such as fz_new_pixmap_with_data.
2013-04-26	Fix SNAFU in the store handling.	Robin Watts
	When the store fills up, the existing code throws away items to make room. Due to a silly oversight (not updating the 'size' after each round of evictions) it keeps throwing away repeatedly until it fails. Fix that here. Should make the store more efficient.
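A minimal sketch of the kind of eviction loop described above, assuming a toy size-budgeted store. The names (toy_store, toy_evict_one, toy_ensure_space) are hypothetical and are not MuPDF's store API; the point is only that the running size is updated after every eviction so the loop stops as soon as the new item fits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a size-budgeted store; not MuPDF's fz_store. */
typedef struct {
    size_t max;      /* budget in bytes */
    size_t size;     /* bytes currently held */
    size_t items[8]; /* sizes of evictable items, oldest first */
    int count;       /* number of valid entries in items[] */
} toy_store;

/* Evict the oldest item; return the bytes freed (0 if nothing is left). */
static size_t toy_evict_one(toy_store *s)
{
    size_t freed;
    int i;

    if (s->count == 0)
        return 0;
    freed = s->items[0];
    for (i = 1; i < s->count; i++)
        s->items[i - 1] = s->items[i];
    s->count--;
    return freed;
}

/* Make room for 'need' bytes. Returns 1 on success, 0 if it cannot fit. */
static int toy_ensure_space(toy_store *s, size_t need)
{
    assert(s != NULL);
    while (s->size + need > s->max) {
        size_t freed = toy_evict_one(s);
        if (freed == 0)
            return 0;        /* nothing left to evict */
        s->size -= freed;    /* the update whose absence caused over-eviction */
    }
    return 1;
}
```

Without the `s->size -= freed` line the loop condition never changes, so the loop keeps evicting until the store is empty and then reports failure, which matches the behaviour the commit message describes.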
2013-04-26	Multi-threaded store SEGV fixes and debug improvements.	Robin Watts
	Fix race condition in the store. When storing an item, we immediately put it into the hash (thus getting our existence check). We then check and try to free enough space for it in the budget. If we cannot free enough, we remove the item from the hash. The race condition comes if someone else finds it in the hash in the meantime. To fix this, we update all 'finds' of things in the hash to move it to the head of the LRU chain (regardless of whether it was in the chain before or not). We only remove it from the hash in the 'failed-to-fit-in-the-budget' case if it's not in the chain already. Also, we fix a bug in the "failed to fit" removal case where we were failing to realise that the pos pointer was not valid any more. In the course of tracking this bug down various debug functions were improved. These are committed here too.
2013-04-26	Squash 2 const warnings.	Robin Watts
	Add some more consts and use void *'s where appropriate.
2013-04-26	Add image output for HTML.	Robin Watts
	JPEGs and PNGs are left unchanged. Any other image gets stored as a PNG and sent as a data URL.
2013-04-26	Hint enabling/disabling for devices.	Robin Watts
	Add configuration functions to control the hints set on a given device. Use this to set whether image data is captured or not in the text extraction process. Also update the display list device to respect the device hints during playback.
2013-04-25	Generalise fz_write_png to fz_output_pixmap_to_png	Robin Watts
	Extract the core of fz_write_png so that it can work to an fz_output * rather than a FILE *. fz_write_png continues to work as before, but now we can output to a buffer too.
2013-04-25	Add fz_write method for output streams.	Robin Watts

2013-04-25	Tweak fz_text_page to include image records.	Robin Watts
	Extract such records as part of the text device.
2013-04-22	Fix various multi-threading problems with the store.	Robin Watts
	When resizing the hash table, we have a special case to cope with someone else resizing the table before we get a chance to. In this rare situation we were unlocking (regardless of whether we should have been), and failing to relock. Fixed here. When storing an item, I recently changed the code to put the new item into the hash before ensuring that we had enough space. This change was motivated by us wanting not to evict to make room only to find that we didn't need the room as there was a duplicate entry anyway. In so doing, this opened up a potential race condition where another thread could 'find' the item from the hash before it had been filled out. To solve this, we move the "filling out" of the item entries earlier in the function. Another problem is found due to the same block of code; as soon as a new item is put into the hash, it can be found elsewhere. Any attempt to manipulate its linked list will fail. We therefore set all new items with their prev/next pointers pointing to themselves, enabling us to spot this special case and avoid corrupting the linked list.
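The self-pointing prev/next trick can be sketched as follows. This is an illustration only, assuming a NULL-terminated doubly linked LRU list; the type and function names (toy_item, toy_lru, toy_lru_touch and so on) are hypothetical and do not come from the MuPDF source. Locking is omitted.

```c
#include <assert.h>
#include <stddef.h>

typedef struct toy_item toy_item;
struct toy_item {
    toy_item *prev;
    toy_item *next;
};

typedef struct {
    toy_item *head; /* most recently used */
    toy_item *tail; /* least recently used */
} toy_lru;

/* Mark an item as "hashed but not yet in the list" by pointing it at itself. */
static void toy_item_init(toy_item *it)
{
    it->prev = it;
    it->next = it;
}

/* A linked item's next is either NULL or another item, never itself. */
static int toy_item_is_linked(const toy_item *it)
{
    return it->next != it;
}

/* Unlink an item; a no-op for items that were never linked. */
static void toy_lru_unlink(toy_lru *l, toy_item *it)
{
    if (!toy_item_is_linked(it))
        return;
    if (it->prev) it->prev->next = it->next; else l->head = it->next;
    if (it->next) it->next->prev = it->prev; else l->tail = it->prev;
    toy_item_init(it);
}

/* Move an item to the head of the list, whether or not it was linked before. */
static void toy_lru_touch(toy_lru *l, toy_item *it)
{
    assert(l != NULL && it != NULL);
    toy_lru_unlink(l, it);
    it->prev = NULL;
    it->next = l->head;
    if (l->head) l->head->prev = it; else l->tail = it;
    l->head = it;
}
```

Because an unlinked item is distinguishable from a linked one, a thread that finds a freshly hashed item can safely call toy_lru_touch on it without following stale pointers.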
2013-04-19	When triangulating quads, send edges in consistent order.	Robin Watts
	Due to the underlying implementation, this probably doesn't make a difference. But it's more aesthetically pleasing. Most importantly, add a comment so we know what the tradeoffs are here.
2013-04-19	Rename internal mesh processing functions.	Robin Watts
	Be more consistent. No user visible changes.
2013-04-19	Add new function to return the accurate bbox of a path.	Robin Watts
	As requested by customer 530.
2013-04-19	Optimised mesh bounding functions	Robin Watts
	Don't decompose meshes just to find their bbox.
2013-04-16	Avoid expanding path for stroke twice.	Robin Watts
	fz_bound_path already takes care of stroke expansion - don't apply it twice.
2013-04-16	Mesh painting optimisation; tensor patch color splitting.	Robin Watts
	Only split as many components of colors in the tensor patch as we actually use.
2013-04-16	Avoid more needless copying in mesh processing	Robin Watts
	Apply the same optimisations to mesh type 6 as were just applied to mesh type 7.
2013-04-16	Try to copy less needless information in mesh processing	Robin Watts

2013-04-11	Remove fz_load_jpeg as now unused.	Robin Watts

2013-04-11	Move pdf_image to fz_image.	Robin Watts
	In order to be able to output images (either in the pdfwrite device or in the html conversion), we need to be able to get to the original compressed data stream (or else we're going to end up recompressing images). To do that, we need to expose all of the contents of pdf_image into fz_image, so it makes sense to just amalgamate the two. This has knock on effects for the creation of indexed colorspaces, requiring some of that logic to be moved. Also, we need to make xps use the same structures; this means pushing PNG and TIFF support into the decoding code. Also we need to be able to load just the headers from PNG/TIFF/JPEGs as xps doesn't include dimension/resolution information. Also, separate out all the fz_image stuff into fitz/res_image.c rather than having it in res_pixmap.
2013-03-29	Avoid uncompressing indexed images at load time.	Robin Watts
	This actually turned out to be far easier than I'd feared; remove the explicit check that stopped this working, and ensure that we pass the correct value in for the 'indexed' param. Add a function to check for colorspaces being indexed. Bit nasty that this requires a strcmp...
2013-03-29	Move bpc into fz_image	Robin Watts

2013-03-26	Make pdf_functions public as fz_functions.	Robin Watts
	Implementations remain unexposed, but this means we can safely pass functions in shades without having to 'sample' them (though we may still choose to do this for speed).
2013-03-26	Reflow: Move from html output using tables to html output using div/span	Robin Watts
	The div/spans still use table style rendering, but it's simpler code (and html) this way.
2013-03-26	Spot indents.	Robin Watts

2013-03-26	Add superscript and subscript handling.	Robin Watts

2013-03-26	Simple dehyphenation support.	Robin Watts

2013-03-26	Text region analysis.	Robin Watts
	Update fz_text_analysis function to look for 'regions'; use this to spot columns etc. Spot columns/width/alignment info. "Intelligently" merge lines based on this. Update html output to make use of this extra information.
2013-03-26	Add simple bullet point detection to paragraph analysis.	Robin Watts
	If a line starts with a recognised unicode bullet char, then split the paragraph there. Don't use this line's separation from the previous line to determine paragraph line step. Also attempt to spot numbered list items (digits or roman numerals). The digits/roman numerals code is disabled by default, as while it worked, later commits made it less useful - but it may be worth reinstating later.
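A rough sketch of the bullet check described above. The set of code points treated as bullets here is an assumption chosen for illustration (the commit message does not list them), and toy_is_bullet / toy_line_starts_item are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed bullet set: BULLET, TRIANGULAR BULLET, WHITE BULLET,
 * HYPHEN BULLET, BULLET OPERATOR. The real list may differ. */
static int toy_is_bullet(unsigned int ucs)
{
    static const unsigned int bullets[] = {
        0x2022, 0x2023, 0x25E6, 0x2043, 0x2219
    };
    size_t i;

    for (i = 0; i < sizeof bullets / sizeof bullets[0]; i++)
        if (bullets[i] == ucs)
            return 1;
    return 0;
}

/* Return 1 if the line (an array of n code points) starts a list item,
 * i.e. a paragraph split should happen before it. */
static int toy_line_starts_item(const unsigned int *chars, int n)
{
    assert(chars != NULL || n == 0);
    return n > 0 && toy_is_bullet(chars[0]);
}
```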
2013-03-26	Rework text extraction structures.	Robin Watts
	Rework the text extraction structures - the broad strokes are similar but we now hold more information at each stage to enable us to perform more detailed analysis on the structure of the page. We now hold: fz_text_char's (the position, ucs value, and style of each char). fz_text_span's (sets of chars that share the same baseline/transform, with no more than an expected amount of whitespace between each char). fz_text_line's (sets of spans that share the same baseline (more or less, allowing for super/subscript), but possibly with a larger than expected amount of whitespace). fz_text_block's (sets of lines that follow one another). After fz_text_analysis is called, we hope to have fz_text_blocks split such that each block is a paragraph. This new implementation has the same restrictions as the current implementation it replaces, namely that chars are only considered for addition onto the most recent span at the moment, but this revised form is designed to allow easier extension, and for this restriction to be lifted. Also add simple paragraph splitting based on finding the most common 'line distance' in blocks. When we add spans together to collate them into lines, we record the 'horizontal' and 'vertical' spacing between them. (Not actually horizontal or vertical, so much as 'in the direction of writing' and 'perpendicular to the direction of writing'). The 'horizontal' value enables us to more correctly output spaces when converting to (say) html later. The 'vertical' value enables us to spot subscripts and superscripts etc, as well as small changes in the baseline due to style changes. We are careful to base the baseline comparison on the baseline for the line, not the baseline for the previous span, as otherwise superscripts/subscripts on the end of the line affect what we match next. Also, we are less tolerant of vertical shifts after a large gap. This avoids false positives where different columns just happen to almost line up.
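The paragraph splitting based on the most common 'line distance' can be illustrated roughly as below. This is a sketch under stated assumptions, not the code from the commit: baselines are plain integers, and the rule that a gap more than 1.5 times the common step starts a new paragraph is invented for the example. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Return the most frequent value in gaps[0..n-1]; ties go to the earliest. */
static int toy_most_common_gap(const int *gaps, int n)
{
    int best, best_count = 0, i, j;

    assert(gaps != NULL && n > 0);
    best = gaps[0];
    for (i = 0; i < n; i++) {
        int count = 0;
        for (j = 0; j < n; j++)
            if (gaps[j] == gaps[i])
                count++;
        if (count > best_count) {
            best_count = count;
            best = gaps[i];
        }
    }
    return best;
}

/* y[] holds line baselines, increasing down the page.
 * Sets starts[i] to 1 where line i begins a paragraph.
 * Returns the number of paragraphs found. */
static int toy_split_paragraphs(const int *y, int nlines, int *starts)
{
    int gaps[64], i, step, paragraphs = 1;

    assert(nlines >= 2 && nlines <= 64);
    for (i = 1; i < nlines; i++)
        gaps[i - 1] = y[i] - y[i - 1];
    step = toy_most_common_gap(gaps, nlines - 1);

    starts[0] = 1;
    for (i = 1; i < nlines; i++) {
        /* Assumed rule: gap > 1.5 * step means a paragraph break. */
        starts[i] = (2 * gaps[i - 1] > 3 * step);
        paragraphs += starts[i];
    }
    return paragraphs;
}
```

For baselines 0, 12, 24, 48, 60, 72 the common step is 12, so the single gap of 24 is treated as a paragraph break.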
2013-03-25	Fix error in store exception handling code.	Robin Watts
	Don't subtract the itemsize on error when we haven't added it yet.
2013-03-25	Fix device node bboxes for stroked paths.	Robin Watts
	When we calculate the bbox to store in display list nodes, we had been forgetting to allow for the stroke state.
2013-03-25	Avoid store collisions causing unnecessary evictions.	Robin Watts
	When storing tiling bitmaps from the draw_device to the store, we frequently hit the case where we insert tile records that are already there. (This also happens in other cases, such as an image being decoded simultaneously on 2 different threads, but more rarely). In such cases, the existing code attempts to evict store contents to bring the size down enough to fit the new object in, only to find that it needn't have. This patch attempts to fix that behaviour. The only way we know if an equivalent entry is in place already is to try to place the new one; we therefore do this earlier in the store function. If this encaching succeeds (no equivalent entry already exists) we are safe to evict as required. Should the eviction be incapable of removing enough from the store to make it fit, we now need to remove the entry we just added to the hash table. To avoid doing a full (and potentially expensive) linear probe, we amend the hash table functions slightly. Firstly, we add a new function fz_hash_insert_with_pos that does the insert, but returns the position within the hashtable at which the entry was inserted. Secondly, we then add a new fz_hash_remove_fast function that takes this position as an argument. The 'fast' removal function checks to see whether the entry is still correct (it always should be unless we have been very unlucky with a table rebuild, or another hashtable operation happening at the same time) and can quickly remove the entry. If lightning has struck, it works the old (slower) way.
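The insert-with-position and fast-removal idea can be sketched with a toy open-addressing table. The signatures below are hypothetical and are not those of fz_hash_insert_with_pos or fz_hash_remove_fast; the fixed table size, integer keys and tombstone-based deletion are assumptions made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SLOTS 8
enum { TOY_EMPTY, TOY_USED, TOY_DELETED };

typedef struct {
    int state[TOY_SLOTS];
    int key[TOY_SLOTS];
} toy_hash;

/* Insert key. Returns 1 if inserted, 0 if already present, -1 if full.
 * On 1 or 0, *pos receives the slot holding the key. */
static int toy_insert_with_pos(toy_hash *h, int key, unsigned *pos)
{
    unsigned start = (unsigned)key % TOY_SLOTS, i, p;
    int free_slot = -1;

    assert(h != NULL && pos != NULL);
    for (i = 0; i < TOY_SLOTS; i++) {
        p = (start + i) % TOY_SLOTS;
        if (h->state[p] == TOY_USED) {
            if (h->key[p] == key) { *pos = p; return 0; }
        } else {
            if (free_slot < 0)
                free_slot = (int)p;   /* remember first reusable slot */
            if (h->state[p] == TOY_EMPTY)
                break;                /* key cannot be further along */
        }
    }
    if (free_slot < 0)
        return -1;
    h->state[free_slot] = TOY_USED;
    h->key[free_slot] = key;
    *pos = (unsigned)free_slot;
    return 1;
}

/* Slow path: linear probe for the key. Returns 1 if removed. */
static int toy_remove(toy_hash *h, int key)
{
    unsigned start = (unsigned)key % TOY_SLOTS, i, p;

    for (i = 0; i < TOY_SLOTS; i++) {
        p = (start + i) % TOY_SLOTS;
        if (h->state[p] == TOY_EMPTY)
            return 0;
        if (h->state[p] == TOY_USED && h->key[p] == key) {
            h->state[p] = TOY_DELETED;
            return 1;
        }
    }
    return 0;
}

/* Fast path: trust pos if it still holds the key, else fall back. */
static int toy_remove_fast(toy_hash *h, int key, unsigned pos)
{
    if (pos < TOY_SLOTS && h->state[pos] == TOY_USED && h->key[pos] == key) {
        h->state[pos] = TOY_DELETED;
        return 1;
    }
    return toy_remove(h, key);
}
```

A caller would insert first, attempt eviction, and on failure call toy_remove_fast with the remembered position to back the insertion out without probing.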
2013-03-25	Ensure that store bookkeeping doesn't go wrong on reinsertion.	Robin Watts
	If we find that the store already contains a copy of an object, then we don't reinsert it. We should therefore undo the addition of the object size that we just did.
2013-03-25	Support creation of Ink annotations in MuPDF library	Paul Gardiner

2013-03-22	Bug 693708: Fix unclosed XML entry in fz_trace output.	Robin Watts
	Thanks to Brian Nixon for pointing this out.
2013-03-22	Squash some warnings.	Robin Watts
	Some -Wshadow ones, plus some 'set but not used' ones.
2013-03-22	Fix store debugging fns so that all output goes to the same file.	Robin Watts

2013-03-21	Add 'void' to a function declaration.	Robin Watts

2013-03-21	Bug 693708: Avoid dereferencing null pointer.	Robin Watts
	Ensure pointer is non NULL before dereferencing.
2013-03-21	Simple debug code to quantify locking times	Robin Watts

2013-03-20	Add caching of rendered tiles.	Robin Watts
	This requires a slight change to the device interface. Callers that use fz_begin_tile will see no change (and no caching will be done). We add a new fz_begin_tile_id function that takes an extra 'id' parameter, and returns 0 or 1. If the id is 0 then the function behaves exactly as fz_begin_tile does, and always returns 0. The PDF and XPS code continues to call the old (uncached) version. The display list code however generates a unique id for every BEGIN_TILE node, and passes this in. If the id is non zero, then it is taken to be a unique identifier for this tile; the implementer of the fz_begin_tile_id entry point can choose to use this to implement caching. If it chooses to ignore the id (and do no caching), it returns 0. If the device implements caching, then it can check on entry for a previously rendered tile with the appropriate matrix and a matching id. If it finds one, then it returns 1. It is the caller's responsibility to then skip over all the device calls that would usually happen to render the tiles (i.e. to skip forward to the matching 'END_TILE' operation).
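The calling convention described above might look roughly like this from the display list side. It is a simplified sketch: toy_device, toy_node and both functions are hypothetical, the real fz_begin_tile_id takes further parameters that are not reproduced here, the matrix comparison is omitted, and the id is recorded at begin time rather than once the tile has actually been rendered.

```c
#include <assert.h>
#include <stddef.h>

enum { OP_BEGIN_TILE, OP_DRAW, OP_END_TILE };

typedef struct { int op; int id; } toy_node;

typedef struct {
    int cached_ids[16];
    int ncached;
    int draws;   /* number of draw ops actually executed */
} toy_device;

/* Returns 1 if a tile with this id is already cached, in which case the
 * caller must skip to the matching END_TILE. An id of 0 disables caching. */
static int toy_begin_tile_id(toy_device *dev, int id)
{
    int i;

    if (id == 0)
        return 0;
    for (i = 0; i < dev->ncached; i++)
        if (dev->cached_ids[i] == id)
            return 1;
    if (dev->ncached < 16)
        dev->cached_ids[dev->ncached++] = id;
    return 0;
}

/* Play back a display list, honouring the skip-on-cache-hit contract. */
static void toy_run_list(toy_device *dev, const toy_node *list, int n)
{
    int i;

    assert(dev != NULL && list != NULL);
    for (i = 0; i < n; i++) {
        switch (list[i].op) {
        case OP_BEGIN_TILE:
            if (toy_begin_tile_id(dev, list[i].id)) {
                /* Cache hit: skip forward to the matching END_TILE,
                 * allowing for nested tiles. */
                int depth = 1;
                while (depth > 0 && ++i < n) {
                    if (list[i].op == OP_BEGIN_TILE)
                        depth++;
                    else if (list[i].op == OP_END_TILE)
                        depth--;
                }
            }
            break;
        case OP_DRAW:
            dev->draws++;
            break;
        case OP_END_TILE:
            break;
        }
    }
}
```

Running the same list twice executes the tile contents only on the first pass; with id 0 the contents are executed every time, matching the uncached behaviour.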