Age | Commit message | Author |
|
The div/spans still use table style rendering, but it's simpler
code (and html) this way.
|
|
Rework the text extraction structures - the broad strokes are similar
but we now hold more information at each stage to enable us to perform
more detailed analysis on the structure of the page.
We now hold:
fz_text_char's (the position, ucs value, and style of each char).
fz_text_span's (sets of chars that share the same baseline/transform,
with no more than an expected amount of whitespace between each char).
fz_text_line's (sets of spans that share the same baseline, more or
less, allowing for super/subscript, but possibly with a larger than
expected amount of whitespace).
fz_text_block's (sets of lines that follow one another)
After fz_text_analysis is called, we hope to have fz_text_blocks split
such that each block is a paragraph.
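The nesting described above can be pictured with a small C sketch. The `sk_` names, the field names and the `style_id` index are assumptions made for illustration only; they are not the actual fz_text_* definitions.

```c
#include <stddef.h>

/* Hypothetical layout for illustration; not the real MuPDF structures. */
typedef struct { float x, y; } sk_point;

typedef struct {
    sk_point pos;   /* position of the char on the page */
    int ucs;        /* Unicode value of the char */
    int style_id;   /* index into a style table (assumed representation) */
} sk_text_char;

typedef struct {
    sk_text_char *chars;  /* chars sharing one baseline/transform */
    size_t len;
    float transform[6];   /* assumed 2x3 matrix form */
} sk_text_span;

typedef struct {
    sk_text_span *spans;  /* spans on (roughly) the same baseline */
    size_t len;
} sk_text_line;

typedef struct {
    sk_text_line *lines;  /* lines that follow one another */
    size_t len;
} sk_text_block;

/* Count the chars held by a block by walking its lines, then its spans. */
size_t sk_block_char_count(const sk_text_block *block)
{
    size_t total = 0;
    for (size_t i = 0; i < block->len; i++)
        for (size_t j = 0; j < block->lines[i].len; j++)
            total += block->lines[i].spans[j].len;
    return total;
}
```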
This new implementation has the same restriction as the implementation
it replaces, namely that chars are only considered for addition onto
the most recent span. The revised form is designed to be easier to
extend, so that this restriction can be lifted later.
Also add simple paragraph splitting based on finding the most common
'line distance' in blocks.
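A minimal sketch of splitting on the most common line distance follows. It assumes baselines are given as increasing positions perpendicular to the writing direction; the function names, the bucket size and the `slack` factor are illustrative assumptions, not values taken from the commit.

```c
#include <stddef.h>

/* Round a non-negative gap to the nearest multiple of `bucket`. */
static long sk_bucket_index(float gap, float bucket)
{
    return (long)(gap / bucket + 0.5f);
}

/* Return the most common gap between consecutive baselines, quantised
 * to `bucket` units. Returns 0 when there are fewer than two lines.
 * Quadratic in n, which is acceptable for a sketch. */
float sk_common_line_distance(const float *baseline, size_t n, float bucket)
{
    float best = 0;
    size_t best_count = 0;
    for (size_t i = 1; i < n; i++) {
        long bi = sk_bucket_index(baseline[i] - baseline[i - 1], bucket);
        size_t count = 0;
        for (size_t j = 1; j < n; j++)
            if (sk_bucket_index(baseline[j] - baseline[j - 1], bucket) == bi)
                count++;
        if (count > best_count) {
            best_count = count;
            best = (float)bi * bucket;
        }
    }
    return best;
}

/* Set starts[i] to 1 when line i begins a new paragraph, i.e. when the
 * gap before it exceeds the common distance by the factor `slack`.
 * Returns the number of paragraphs found. */
size_t sk_split_paragraphs(const float *baseline, size_t n, float bucket,
                           float slack, int *starts)
{
    float common = sk_common_line_distance(baseline, n, bucket);
    size_t paragraphs = 0;
    for (size_t i = 0; i < n; i++) {
        starts[i] = (i == 0) || (baseline[i] - baseline[i - 1] > common * slack);
        paragraphs += (size_t)starts[i];
    }
    return paragraphs;
}
```

With baselines at 0, 12, 24, 36, 60, 72 and 84, the common distance is 12, so the 24 unit gap before the fifth line marks a paragraph break.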
When we add spans together to collate them into lines, we record the
'horizontal' and 'vertical' spacing between them. (Not actually
horizontal or vertical, so much as 'in the direction of writing' and
'perpendicular to the direction of writing').
The 'horizontal' value enables us to more correctly output spaces when
converting to (say) html later.
The 'vertical' value enables us to spot subscripts and superscripts etc,
as well as small changes in the baseline due to style changes. We are
careful to base the baseline comparison on the baseline for the line,
not the baseline for the previous span, as otherwise superscripts/
subscripts on the end of the line affect what we match next.
Also, we are less tolerant of vertical shifts after a large gap. This
avoids false positives where different columns just happen to almost
line up.
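The baseline test described above might look roughly like the following sketch. The struct, the function name and all three threshold factors are assumptions chosen for illustration; the commit does not state the actual values.

```c
#include <stdbool.h>

/* Hypothetical per-line state; not the real MuPDF structure. */
typedef struct {
    float baseline;   /* baseline of the line, perpendicular to writing */
    float font_size;  /* used to scale the tolerances */
} sk_line_state;

/* Decide whether a span belongs on the line.
 * h_gap is the spacing in the direction of writing; span_baseline is
 * compared against the line's baseline, not the previous span's, so a
 * trailing superscript does not drag the reference point with it. */
bool sk_span_joins_line(const sk_line_state *line, float h_gap,
                        float span_baseline)
{
    /* Assumed thresholds: generous enough for super/subscripts... */
    float tolerance = 0.5f * line->font_size;
    /* ...but much stricter after a large gap, so that neighbouring
     * columns which almost line up are not merged. */
    if (h_gap > 2.0f * line->font_size)
        tolerance = 0.1f * line->font_size;

    float shift = span_baseline - line->baseline;
    if (shift < 0)
        shift = -shift;
    return shift <= tolerance;
}
```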
|
|
|
|
|
|
The AsyncTask class we took from the Android source makes use of
ArrayDeque, which in turn makes use of Deque, neither of which is
available below API 9. The fix is to take these two classes from the
Android source as well.
|
|
|
|
|
|
|
|
Return 0. Check for this case when opening a PDF and give a nice dialogue.
Fix the nice dialogue code so that it doesn't crash afterwards due to
a null mSearchTask.
|
|
This fixes bug #693664, and also simplifies app code.
The example file attached to the bug produces strange results, but that
is because the QuadPoint information is incorrect.
|
|
|
|
Use of the bbox device to derive the area of the display list can lead
to bad results because of heuristics used to handle corners of stroked
paths.
|
|
Also, in the app, protect against exceptions thrown due to unknown
annotation types.
|
|
We could do with non-ARM builds of v8 to go with these.
|
|
Highlight annotations currently come out opaque so aren't a lot of use.
|
|
Also change the way we pass the text rectangles so that
non-axis-aligned ones can be permitted, and relocate the code that
calculates the strike-out lines from the bounding boxes.
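One way to derive a strike-out line from a rectangle that need not be axis-aligned is to pass the four corners and join the midpoints of the two end edges. The sketch below assumes that approach and a corner order of upper-left, upper-right, lower-left, lower-right; the types and names are hypothetical and the commit does not say how the calculation is actually done.

```c
/* Hypothetical types for illustration only. */
typedef struct { float x, y; } so_point;
typedef struct { so_point ul, ur, ll, lr; } so_quad;  /* assumed corner order */
typedef struct { so_point a, b; } so_segment;

/* Strike-out line through the middle of the quad: from the midpoint of
 * the leading edge (ul-ll) to the midpoint of the trailing edge (ur-lr).
 * Works for rotated text because no axis alignment is assumed. */
so_segment so_strikeout_line(so_quad q)
{
    so_segment s;
    s.a.x = (q.ul.x + q.ll.x) / 2;
    s.a.y = (q.ul.y + q.ll.y) / 2;
    s.b.x = (q.ur.x + q.lr.x) / 2;
    s.b.y = (q.ur.y + q.lr.y) / 2;
    return s;
}
```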
|
|
|
|
|
|
|
|
|
|
This change showed up a bug where highlights may fail to show if passed
in while the page was set as blank. That bug is also fixed in this commit.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Move the one-time setup of the HTMLOUT JavaScript interface etc.
into the constructor. This seems to avoid the occasional SEGV
caused while flipping pages on the HTC Desire in reflow mode.
|
|
|
|
|
|
|
|
|
|
This gets us styles.
|
|
Various functions in the code output to FILE *, when there are times
we'd like them to output to other things, such as fz_buffers.
Add an fz_output type, together with fz_printf to allow things to
output to this.
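The idea of one output type that can target either a FILE * or an in-memory buffer can be sketched as below. All `sk_` names and the struct layout are assumptions for illustration; this is not the fz_output or fz_printf API itself.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical output abstraction: a write callback plus its target. */
typedef struct sk_output sk_output;
struct sk_output {
    void *opaque;
    int (*write)(sk_output *out, const char *data, size_t len);
};

/* Hypothetical growable buffer, kept NUL-terminated for convenience. */
typedef struct { char *data; size_t len, cap; } sk_buffer;

static int sk_file_write(sk_output *out, const char *data, size_t len)
{
    return fwrite(data, 1, len, (FILE *)out->opaque) == len ? 0 : -1;
}

static int sk_buffer_write(sk_output *out, const char *data, size_t len)
{
    sk_buffer *buf = out->opaque;
    if (buf->len + len + 1 > buf->cap) {
        size_t cap = (buf->len + len + 1) * 2;
        char *grown = realloc(buf->data, cap);
        if (!grown)
            return -1;
        buf->data = grown;
        buf->cap = cap;
    }
    memcpy(buf->data + buf->len, data, len);
    buf->len += len;
    buf->data[buf->len] = '\0';
    return 0;
}

sk_output sk_output_to_file(FILE *f)
{
    sk_output o = { f, sk_file_write };
    return o;
}

sk_output sk_output_to_buffer(sk_buffer *b)
{
    sk_output o = { b, sk_buffer_write };
    return o;
}

/* printf-style formatting routed through whichever writer is installed.
 * Returns 0 on success, -1 on failure. */
int sk_printf(sk_output *out, const char *fmt, ...)
{
    char stack[256];
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(stack, sizeof stack, fmt, ap);
    va_end(ap);
    if (n < 0)
        return -1;
    if ((size_t)n < sizeof stack)
        return out->write(out, stack, (size_t)n);

    /* Output did not fit on the stack: format again into heap memory. */
    char *heap = malloc((size_t)n + 1);
    if (!heap)
        return -1;
    va_start(ap, fmt);
    vsnprintf(heap, (size_t)n + 1, fmt, ap);
    va_end(ap);
    int rc = out->write(out, heap, (size_t)n);
    free(heap);
    return rc;
}
```

Callers that previously took a FILE * would take an `sk_output *` instead, so the same formatting code can feed a file or a buffer without change.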
|
|
We should probably record the last scale for each mode and
reinstate it when returning to that mode, but there are
a few difficulties with that which need to be addressed.
|