Age | Commit message (Collapse) | Author |
|
Add a test PDF with multiple pages, each with a different media box and
crop box. Demonstrate how FPDFText_GetText() gets all the text on the
page, and how FPDFText_GetBoundedText() with the right bounding boxes
gets only the visible text on the page.
Also fix a small nit in CPDF_TextPage::GetTextByRect() found while
writing this CL.
BUG=pdfium:387
Change-Id: I9ce4bb181e2ba5b454ea1341bbccef9ba94c9cd8
Reviewed-on: https://pdfium-review.googlesource.com/34550
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
|
|
Bug: pdfium:1085
Change-Id: I62c526ae865f0cadfddd2e75a616bce73de0f88d
Reviewed-on: https://pdfium-review.googlesource.com/32632
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Tom Sepez <tsepez@chromium.org>
|
|
Change-Id: Ibc446d9f4606d29f997ee3521f31f0fbcf1ad84b
Reviewed-on: https://pdfium-review.googlesource.com/32176
Commit-Queue: Lei Zhang <thestig@chromium.org>
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
|
|
Break out parts of ProcessTextObject() into helper functions.
Change-Id: I76ad141728ce77e8d24d7dc9635722d670663f39
Reviewed-on: https://pdfium-review.googlesource.com/32175
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
- Move code into a GenerateSpace() function.
- Break apart some font size conversions.
Change-Id: I4d5ea112fc004a31ac38b7c19ff77fcbfe764d38
Reviewed-on: https://pdfium-review.googlesource.com/32157
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
|
|
Change-Id: I959d687d7d46fa61e1fe097b0b876ad02d2b123c
Reviewed-on: https://pdfium-review.googlesource.com/32153
Commit-Queue: Lei Zhang <thestig@chromium.org>
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
|
|
Change-Id: I667a3cd696d44692fa3d73bdee7c2f48d3039255
Reviewed-on: https://pdfium-review.googlesource.com/32152
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
|
|
- Refer to the string in CFX_BidiString by const-ref.
- Remove useless CharAt() method.
- Turn a member variable into a local variable.
Change-Id: I30f221b7350150c839a793129789d8ea7cc1f331
Reviewed-on: https://pdfium-review.googlesource.com/31670
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
Change-Id: I183a53d08f5da73d788c92b53382e3fac3b823e2
Reviewed-on: https://pdfium-review.googlesource.com/31671
Commit-Queue: Lei Zhang <thestig@chromium.org>
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
|
|
Change-Id: Ie5bea82757682390b274ad2da77d1686cc597046
Reviewed-on: https://pdfium-review.googlesource.com/31657
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
Change-Id: I9a5acb59790fd8527ced745370bdfe35e4d21c36
Reviewed-on: https://pdfium-review.googlesource.com/31656
Commit-Queue: Lei Zhang <thestig@chromium.org>
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
|
|
Change-Id: Ib0b1d014af31493c73a74d81c1f3454a203da949
Reviewed-on: https://pdfium-review.googlesource.com/31655
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
|
|
Change-Id: I079bc3bf1242fd28fdd51930d9deb6efa34d7509
Reviewed-on: https://pdfium-review.googlesource.com/30055
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
Change-Id: I92c7ba605bf95a9023ad046b8dddebe0a0592802
Reviewed-on: https://pdfium-review.googlesource.com/29992
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Tom Sepez <tsepez@chromium.org>
|
|
Transitively mark the same pointers as const in callers.
Change-Id: I1f9669b35c6d7f4b1a11c25163480bc687fbc7f8
Reviewed-on: https://pdfium-review.googlesource.com/28870
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
- Augment/reuse existing NormalizeThreshold() function.
- Use std::min() / std::max().
Change-Id: I709e246c58bf8a69638e1d77dcc2a79ae8a27e77
Reviewed-on: https://pdfium-review.googlesource.com/28736
Commit-Queue: Lei Zhang <thestig@chromium.org>
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
|
|
Change-Id: Ia6cb19269ef93440669b68d76c3c378a5d4da7a5
Reviewed-on: https://pdfium-review.googlesource.com/28735
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
CountRects() calls GetRectArray(), which performs the same calculation.
Change-Id: I79dcd8e82f6d0fe7ed992da06237f31d0761a902
Reviewed-on: https://pdfium-review.googlesource.com/28734
Commit-Queue: Lei Zhang <thestig@chromium.org>
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
|
|
BUG=chromium:821305
Change-Id: I371572f60ea3984ce044e25125d882b3c2d03115
Reviewed-on: https://pdfium-review.googlesource.com/28733
Commit-Queue: Lei Zhang <thestig@chromium.org>
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
|
|
Treat values less than -1 as -1.
BUG=chromium:821305
Change-Id: Ieaced045473fa51097400e5af1286f0d3f4d0143
Reviewed-on: https://pdfium-review.googlesource.com/28732
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
Instances are either replaced with FXSYS_iswalpha, which calls out to
the ICU library to do the proper Unicode operations, or have been
converted to a isascii && isalpha pair, if ASCII alpha is actually
what was wanted.
BUG=pdfium:1035
Change-Id: I971ff639ee1ff818ad08793a1900a8bcbb0a3e04
Reviewed-on: https://pdfium-review.googlesource.com/28450
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
|
|
Instances are either replaced with FXSYS_iswalnum, which calls out to
the ICU library to do the proper Unicode operations, or have been
converted to a isascii && isalnum pair, if ASCII alnum is actually
what was wanted.
BUG=pdfium:1035
Change-Id: I959ec8739a4d020e61562180393ab8113a81577c
Reviewed-on: https://pdfium-review.googlesource.com/28430
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
|
|
A number of our character helper methods take in wide character types,
but only do tests/operations on the ASCII range of characters. As a
very quick first pass I am renaming all of the foot-gun methods to
explictly call out this behaviour, while I do a bigger
cleanup/refactor.
BUG=pdfium:1035
Change-Id: Ia035dfa1cb6812fa6d45155c4565475032c4c165
Reviewed-on: https://pdfium-review.googlesource.com/28330
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Commit-Queue: dsinclair <dsinclair@chromium.org>
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
Reviewed-by: dsinclair <dsinclair@chromium.org>
|
|
These are generally cheap enough to compute as needed, rather than
keeping around in memory all the time (plus the memory for the static
flag the compiler generates to check if initialized).
Change-Id: If3a5365521f6a7781e66fb11f04883a5c673ee11
Reviewed-on: https://pdfium-review.googlesource.com/27150
Commit-Queue: Tom Sepez <tsepez@chromium.org>
Reviewed-by: dsinclair <dsinclair@chromium.org>
|
|
When parsing text streams there is an internal character list that is
generated of all the characters in the stream. Additionally a text
string is generated that is exposed via the public API. This string
will have all of the printing, i.e. non-control characters, in it. For
characters that are not in the font of the stream the unicode, but
printable, the character 0xFFFE is used in the text to indicate a
missing character. This a non-printing character to indicate
non-unicode.
The internal character list gets a Unicode value 0x0 when there isn't
a glyph in the font for it and the original character code is
preserved. This means that when generating the mapping between text
string and character list, the code is mistakenly thinking that the
unprintable character was not present in the text string. I have
changed the check in the mapping generation code to correctly account
for this. Additional investigation is needed to determine if inserting
0xFFFE in the text is the correct behaviour.
This patch resolves an issue where the find highlights in Chrome for a
PDF would be offset when there are unprintable characters in a stream.
BUG=pdfium:1010
Change-Id: I7547c46c5645e039a4b5138f2ce1137fa31990a5
Reviewed-on: https://pdfium-review.googlesource.com/27051
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
|
|
Bug: 806612
Change-Id: I22bd9046dd37a1b596762c46a6b29a323d6e9fa1
Reviewed-on: https://pdfium-review.googlesource.com/24410
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Nicolás Peña Moreno <npm@chromium.org>
|
|
Introduced here
https://pdfium-review.googlesource.com/#/c/17950/5/core/fpdftext/cpdf_textpage.cpp@1237
BUG=chromium:805881
Change-Id: I0c9109f3eebec968360734ff4d9d0542881d6823
Reviewed-on: https://pdfium-review.googlesource.com/24210
Commit-Queue: dsinclair <dsinclair@chromium.org>
Reviewed-by: Lei Zhang <thestig@chromium.org>
Reviewed-by: dsinclair <dsinclair@chromium.org>
|
|
BUG=pdfium:858
Change-Id: Idc9900fe6f85b1fef06c97f5023653f77156d410
Reviewed-on: https://pdfium-review.googlesource.com/22730
Commit-Queue: dsinclair <dsinclair@chromium.org>
Reviewed-by: dsinclair <dsinclair@chromium.org>
|
|
Change-Id: I48f27026292917e6f6e6b636afd499336e41afea
Reviewed-on: https://pdfium-review.googlesource.com/22310
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
|
|
Change-Id: I29769f78eaad10c6a8b79e27524336c4f330377e
Reviewed-on: https://pdfium-review.googlesource.com/22258
Reviewed-by: Tom Sepez <tsepez@chromium.org>
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
|
|
Change-Id: I3efc57cd7325d16e3ca8ebdeeaec06012b2c56e3
Reviewed-on: https://pdfium-review.googlesource.com/20110
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
The current implementation of text extraction was difficult to
understand, duplicated logic that existed in other methods, and wasn't
clear about the units the inputs were in. It also didn't handle
control characters correctly.
The new implementation leans on the methods for converting indices
between the text buffer index and character list index spaces to avoid
duplication of code. It also makes it clear to the reader that inputs
are in the character list index space. Finally, it fixes issues being
seen in Chrome with respect of ranges being slightly off.
This CL also adds a test for extracting text that has control
characters.
BUG=pdfium:942,chromium:654578
Change-Id: Id9d1f360c2d7492c7b5a48d6c9ae29f530892742
Reviewed-on: https://pdfium-review.googlesource.com/20014
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: dsinclair <dsinclair@chromium.org>
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
|
|
This CL updates various methods in CPDF_TextObject to return or received
size_t values. Callers have been updated as needed.
Bug: pdfium:774
Change-Id: Id72511bc74637c6261add39f5414c9a4b8390b82
Reviewed-on: https://pdfium-review.googlesource.com/19430
Commit-Queue: dsinclair <dsinclair@chromium.org>
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
|
|
Change-Id: I45468fa7944290fbbe3d2e67f884164ae8d84160
Reviewed-on: https://pdfium-review.googlesource.com/19171
Reviewed-by: Tom Sepez <tsepez@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
BUG=chromium:782596,chromium:781804
Change-Id: I020be3cf813221bb8314f045d83014a25cb9a950
Reviewed-on: https://pdfium-review.googlesource.com/18070
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: dsinclair <dsinclair@chromium.org>
|
|
The original version of this code was landed in
https://pdfium-review.googlesource.com/c/pdfium/+/13690/. A corner
case has been found that breaks.
In this CL I have reverted the changes in IsHyphen and implemented a
less aggressive cleanup that I have tested works as expected.
BUG=chromium:781804
Change-Id: I3b36f420834081fdd9e1ae17efc234b561b4df41
Reviewed-on: https://pdfium-review.googlesource.com/17950
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: dsinclair <dsinclair@chromium.org>
|
|
Also remove a conditional in the CPDF_Form ctor that cannot be true.
Change-Id: Icd00233969cea33e9c63d0d6a9d07226c2b173f2
Reviewed-on: https://pdfium-review.googlesource.com/17070
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
Bug: pdfium:921
Change-Id: I8864fd2ebdccc5f94aaf70cd8295068bf4db8b68
Reviewed-on: https://pdfium-review.googlesource.com/16492
Reviewed-by: Lei Zhang <thestig@chromium.org>
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
|
|
Remove unused defines.
Change-Id: Ibf10d8470f19cbf4528fe1342398a39ef15c1d12
Reviewed-on: https://pdfium-review.googlesource.com/15110
Commit-Queue: dsinclair <dsinclair@chromium.org>
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
|
|
With the conversion of internal string sizes to size_t, these wrappers
are no longer needed. Replacing them with strlen and wcslen
respectively.
BUG=pdfium:828
Change-Id: Ia087ca2ddaf688a57ec9bd9ddfb8533cbe41510d
Reviewed-on: https://pdfium-review.googlesource.com/14890
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: dsinclair <dsinclair@chromium.org>
|
|
BUG=pdfium:828
Change-Id: I5c40237433ebabaeabdb43aec9cdf783e41dfe16
Reviewed-on: https://pdfium-review.googlesource.com/13230
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
|
|
This CL moves CFX_UnownedPtr to UnownedPtr and places in the fxcrt
namespace.
Bug: pdfium:898
Change-Id: I6d1fa463f365e5cb3aafa8c8a7a5f7eff62ed8e0
Reviewed-on: https://pdfium-review.googlesource.com/14620
Reviewed-by: Tom Sepez <tsepez@chromium.org>
Commit-Queue: dsinclair <dsinclair@chromium.org>
|
|
Automated using git grep & sed.
Replace StringC classes with StringView classes.
Remove the CFX_ prefix and put string classes in fxcrt namespace.
Change AsStringC() to AsStringView().
Rename tests from TEST(fxcrt, *String*Foo) to TEST(*String*,
Foo).
Couple of tests needed to have their names regularlized.
BUG=pdfium:894
Change-Id: I7ca038685c8d803795f3ed02545124f7a224c83d
Reviewed-on: https://pdfium-review.googlesource.com/14151
Reviewed-by: Tom Sepez <tsepez@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
|
|
The existing code did end of range checks by making sure that the
value was never less then 0. This isn't correct when using an unsigned
type, since 0 - 1 will wrap around to the max possible value, and
thus still be less then 0. Additionally the existing code was hard to
follow due to the complexity of some of the low level operations being
performed.
It has been rewritten using higher level string operations to make it
clearer and correct.
BUG=chromium:763256
Change-Id: Ib8bf5ca0e29e73724c4a1c4781362e8a8fc30149
Reviewed-on: https://pdfium-review.googlesource.com/13690
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: Tom Sepez <tsepez@chromium.org>
|
|
File naming now matches.
Fix one usage not going through the accessor function.
Change-Id: I5cc4986238764964f2a71807a94bd2facf517263
Reviewed-on: https://pdfium-review.googlesource.com/12930
Commit-Queue: dsinclair <dsinclair@chromium.org>
Reviewed-by: dsinclair <dsinclair@chromium.org>
|
|
Adjust loop conditions and behaviours in preperation for convering the
underlying type of FX_STRSIZE to size_t. These changes are not
dependent on the type switch occuring, so can be landed before hand.
BUG=pdfium:828
Change-Id: I5f950c99c10e5ef0836959e3b1dd2e09f8f5afc0
Reviewed-on: https://pdfium-review.googlesource.com/12750
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
|
|
This CL removes the fx_basic.h header and fixes up includes as needed.
Change-Id: I49af32a8327bdbcda40c50a61ffbd75d06609040
Reviewed-on: https://pdfium-review.googlesource.com/12670
Commit-Queue: dsinclair <dsinclair@chromium.org>
Reviewed-by: Tom Sepez <tsepez@chromium.org>
|
|
This CL moves CFX_WideTextBuf to its own files and updates includes as
needed.
Change-Id: Ibe66ecf3e66f8f01dd8e9eaf6b467588be86ad4f
Reviewed-on: https://pdfium-review.googlesource.com/12413
Reviewed-by: Tom Sepez <tsepez@chromium.org>
Commit-Queue: dsinclair <dsinclair@chromium.org>
|
|
Through out the code base there are numerous places where variables
are declared using a signed integer type when interacting with the
string classes, since they assume that FX_STRSIZE is 'int'. As part of
changing the underling type of FX_STRSIZE to be unsigned, these
locations are being changed to use FX_STRSIZE. This is necessary as
part of converting the type, but has been broken off into a separate CL,
since it should be low risk.
Some related cleanups that are low risk are included as part of
this CL.
BUG=pdfium:828
Change-Id: Ifaae54ad195ccde0fe8672f71271d29a6ebd65fd
Reviewed-on: https://pdfium-review.googlesource.com/12210
Reviewed-by: Tom Sepez <tsepez@chromium.org>
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
|
|
Currently these use -1 as a special value to indicate not set. This
creates the same issues that FX_STRNPOS created for converting
FX_STRSIZE to size_t, so this code has been rewritten.
BUG=pdfium:828
Change-Id: Iaaa96af0dcb2eb8b600f3ea39060a398ac9a3800
Reviewed-on: https://pdfium-review.googlesource.com/12130
Reviewed-by: Tom Sepez <tsepez@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
|