Age | Commit message (Collapse) | Author |
|
Add a test PDF with multiple pages, each with a different media box and
crop box. Demonstrate how FPDFText_GetText() gets all the text on the
page, and how FPDFText_GetBoundedText() with the right bounding boxes
gets only the visible text on the page.
Also fix a small nit in CPDF_TextPage::GetTextByRect() found while
writing this CL.
BUG=pdfium:387
Change-Id: I9ce4bb181e2ba5b454ea1341bbccef9ba94c9cd8
Reviewed-on: https://pdfium-review.googlesource.com/34550
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
|
|
Change-Id: Ibc446d9f4606d29f997ee3521f31f0fbcf1ad84b
Reviewed-on: https://pdfium-review.googlesource.com/32176
Commit-Queue: Lei Zhang <thestig@chromium.org>
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
|
|
Break out parts of ProcessTextObject() into helper functions.
Change-Id: I76ad141728ce77e8d24d7dc9635722d670663f39
Reviewed-on: https://pdfium-review.googlesource.com/32175
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
- Move code into a GenerateSpace() function.
- Break apart some font size conversions.
Change-Id: I4d5ea112fc004a31ac38b7c19ff77fcbfe764d38
Reviewed-on: https://pdfium-review.googlesource.com/32157
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
|
|
Change-Id: I959d687d7d46fa61e1fe097b0b876ad02d2b123c
Reviewed-on: https://pdfium-review.googlesource.com/32153
Commit-Queue: Lei Zhang <thestig@chromium.org>
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
|
|
Change-Id: I667a3cd696d44692fa3d73bdee7c2f48d3039255
Reviewed-on: https://pdfium-review.googlesource.com/32152
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
|
|
- Refer to the string in CFX_BidiString by const-ref.
- Remove useless CharAt() method.
- Turn a member variable into a local variable.
Change-Id: I30f221b7350150c839a793129789d8ea7cc1f331
Reviewed-on: https://pdfium-review.googlesource.com/31670
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
Change-Id: I183a53d08f5da73d788c92b53382e3fac3b823e2
Reviewed-on: https://pdfium-review.googlesource.com/31671
Commit-Queue: Lei Zhang <thestig@chromium.org>
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
|
|
Change-Id: Ie5bea82757682390b274ad2da77d1686cc597046
Reviewed-on: https://pdfium-review.googlesource.com/31657
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
Change-Id: I9a5acb59790fd8527ced745370bdfe35e4d21c36
Reviewed-on: https://pdfium-review.googlesource.com/31656
Commit-Queue: Lei Zhang <thestig@chromium.org>
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
|
|
Change-Id: Ib0b1d014af31493c73a74d81c1f3454a203da949
Reviewed-on: https://pdfium-review.googlesource.com/31655
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
|
|
Change-Id: I079bc3bf1242fd28fdd51930d9deb6efa34d7509
Reviewed-on: https://pdfium-review.googlesource.com/30055
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
Change-Id: I92c7ba605bf95a9023ad046b8dddebe0a0592802
Reviewed-on: https://pdfium-review.googlesource.com/29992
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Tom Sepez <tsepez@chromium.org>
|
|
Transitively mark the same pointers as const in callers.
Change-Id: I1f9669b35c6d7f4b1a11c25163480bc687fbc7f8
Reviewed-on: https://pdfium-review.googlesource.com/28870
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
- Augment/reuse existing NormalizeThreshold() function.
- Use std::min() / std::max().
Change-Id: I709e246c58bf8a69638e1d77dcc2a79ae8a27e77
Reviewed-on: https://pdfium-review.googlesource.com/28736
Commit-Queue: Lei Zhang <thestig@chromium.org>
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
|
|
Change-Id: Ia6cb19269ef93440669b68d76c3c378a5d4da7a5
Reviewed-on: https://pdfium-review.googlesource.com/28735
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
CountRects() calls GetRectArray(), which performs the same calculation.
Change-Id: I79dcd8e82f6d0fe7ed992da06237f31d0761a902
Reviewed-on: https://pdfium-review.googlesource.com/28734
Commit-Queue: Lei Zhang <thestig@chromium.org>
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
|
|
BUG=chromium:821305
Change-Id: I371572f60ea3984ce044e25125d882b3c2d03115
Reviewed-on: https://pdfium-review.googlesource.com/28733
Commit-Queue: Lei Zhang <thestig@chromium.org>
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
|
|
Treat values less than -1 as -1.
BUG=chromium:821305
Change-Id: Ieaced045473fa51097400e5af1286f0d3f4d0143
Reviewed-on: https://pdfium-review.googlesource.com/28732
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
Instances are either replaced with FXSYS_iswalpha, which calls out to
the ICU library to do the proper Unicode operations, or have been
converted to a isascii && isalpha pair, if ASCII alpha is actually
what was wanted.
BUG=pdfium:1035
Change-Id: I971ff639ee1ff818ad08793a1900a8bcbb0a3e04
Reviewed-on: https://pdfium-review.googlesource.com/28450
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
|
|
Instances are either replaced with FXSYS_iswalnum, which calls out to
the ICU library to do the proper Unicode operations, or have been
converted to a isascii && isalnum pair, if ASCII alnum is actually
what was wanted.
BUG=pdfium:1035
Change-Id: I959ec8739a4d020e61562180393ab8113a81577c
Reviewed-on: https://pdfium-review.googlesource.com/28430
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
|
|
A number of our character helper methods take in wide character types,
but only do tests/operations on the ASCII range of characters. As a
very quick first pass I am renaming all of the foot-gun methods to
explictly call out this behaviour, while I do a bigger
cleanup/refactor.
BUG=pdfium:1035
Change-Id: Ia035dfa1cb6812fa6d45155c4565475032c4c165
Reviewed-on: https://pdfium-review.googlesource.com/28330
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Commit-Queue: dsinclair <dsinclair@chromium.org>
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
Reviewed-by: dsinclair <dsinclair@chromium.org>
|
|
When parsing text streams there is an internal character list that is
generated of all the characters in the stream. Additionally a text
string is generated that is exposed via the public API. This string
will have all of the printing, i.e. non-control characters, in it. For
characters that are not in the font of the stream the unicode, but
printable, the character 0xFFFE is used in the text to indicate a
missing character. This a non-printing character to indicate
non-unicode.
The internal character list gets a Unicode value 0x0 when there isn't
a glyph in the font for it and the original character code is
preserved. This means that when generating the mapping between text
string and character list, the code is mistakenly thinking that the
unprintable character was not present in the text string. I have
changed the check in the mapping generation code to correctly account
for this. Additional investigation is needed to determine if inserting
0xFFFE in the text is the correct behaviour.
This patch resolves an issue where the find highlights in Chrome for a
PDF would be offset when there are unprintable characters in a stream.
BUG=pdfium:1010
Change-Id: I7547c46c5645e039a4b5138f2ce1137fa31990a5
Reviewed-on: https://pdfium-review.googlesource.com/27051
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
|
|
Bug: 806612
Change-Id: I22bd9046dd37a1b596762c46a6b29a323d6e9fa1
Reviewed-on: https://pdfium-review.googlesource.com/24410
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Nicolás Peña Moreno <npm@chromium.org>
|
|
Introduced here
https://pdfium-review.googlesource.com/#/c/17950/5/core/fpdftext/cpdf_textpage.cpp@1237
BUG=chromium:805881
Change-Id: I0c9109f3eebec968360734ff4d9d0542881d6823
Reviewed-on: https://pdfium-review.googlesource.com/24210
Commit-Queue: dsinclair <dsinclair@chromium.org>
Reviewed-by: Lei Zhang <thestig@chromium.org>
Reviewed-by: dsinclair <dsinclair@chromium.org>
|
|
BUG=pdfium:858
Change-Id: Idc9900fe6f85b1fef06c97f5023653f77156d410
Reviewed-on: https://pdfium-review.googlesource.com/22730
Commit-Queue: dsinclair <dsinclair@chromium.org>
Reviewed-by: dsinclair <dsinclair@chromium.org>
|
|
Change-Id: I3efc57cd7325d16e3ca8ebdeeaec06012b2c56e3
Reviewed-on: https://pdfium-review.googlesource.com/20110
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
The current implementation of text extraction was difficult to
understand, duplicated logic that existed in other methods, and wasn't
clear about the units the inputs were in. It also didn't handle
control characters correctly.
The new implementation leans on the methods for converting indices
between the text buffer index and character list index spaces to avoid
duplication of code. It also makes it clear to the reader that inputs
are in the character list index space. Finally, it fixes issues being
seen in Chrome with respect of ranges being slightly off.
This CL also adds a test for extracting text that has control
characters.
BUG=pdfium:942,chromium:654578
Change-Id: Id9d1f360c2d7492c7b5a48d6c9ae29f530892742
Reviewed-on: https://pdfium-review.googlesource.com/20014
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: dsinclair <dsinclair@chromium.org>
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
|
|
This CL updates various methods in CPDF_TextObject to return or received
size_t values. Callers have been updated as needed.
Bug: pdfium:774
Change-Id: Id72511bc74637c6261add39f5414c9a4b8390b82
Reviewed-on: https://pdfium-review.googlesource.com/19430
Commit-Queue: dsinclair <dsinclair@chromium.org>
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
|
|
Change-Id: I45468fa7944290fbbe3d2e67f884164ae8d84160
Reviewed-on: https://pdfium-review.googlesource.com/19171
Reviewed-by: Tom Sepez <tsepez@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
BUG=chromium:782596,chromium:781804
Change-Id: I020be3cf813221bb8314f045d83014a25cb9a950
Reviewed-on: https://pdfium-review.googlesource.com/18070
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: dsinclair <dsinclair@chromium.org>
|
|
The original version of this code was landed in
https://pdfium-review.googlesource.com/c/pdfium/+/13690/. A corner
case has been found that breaks.
In this CL I have reverted the changes in IsHyphen and implemented a
less aggressive cleanup that I have tested works as expected.
BUG=chromium:781804
Change-Id: I3b36f420834081fdd9e1ae17efc234b561b4df41
Reviewed-on: https://pdfium-review.googlesource.com/17950
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: dsinclair <dsinclair@chromium.org>
|
|
Also remove a conditional in the CPDF_Form ctor that cannot be true.
Change-Id: Icd00233969cea33e9c63d0d6a9d07226c2b173f2
Reviewed-on: https://pdfium-review.googlesource.com/17070
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
Bug: pdfium:921
Change-Id: I8864fd2ebdccc5f94aaf70cd8295068bf4db8b68
Reviewed-on: https://pdfium-review.googlesource.com/16492
Reviewed-by: Lei Zhang <thestig@chromium.org>
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
|
|
BUG=pdfium:828
Change-Id: I5c40237433ebabaeabdb43aec9cdf783e41dfe16
Reviewed-on: https://pdfium-review.googlesource.com/13230
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
|
|
Automated using git grep & sed.
Replace StringC classes with StringView classes.
Remove the CFX_ prefix and put string classes in fxcrt namespace.
Change AsStringC() to AsStringView().
Rename tests from TEST(fxcrt, *String*Foo) to TEST(*String*,
Foo).
Couple of tests needed to have their names regularlized.
BUG=pdfium:894
Change-Id: I7ca038685c8d803795f3ed02545124f7a224c83d
Reviewed-on: https://pdfium-review.googlesource.com/14151
Reviewed-by: Tom Sepez <tsepez@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
|
|
The existing code did end of range checks by making sure that the
value was never less then 0. This isn't correct when using an unsigned
type, since 0 - 1 will wrap around to the max possible value, and
thus still be less then 0. Additionally the existing code was hard to
follow due to the complexity of some of the low level operations being
performed.
It has been rewritten using higher level string operations to make it
clearer and correct.
BUG=chromium:763256
Change-Id: Ib8bf5ca0e29e73724c4a1c4781362e8a8fc30149
Reviewed-on: https://pdfium-review.googlesource.com/13690
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: Tom Sepez <tsepez@chromium.org>
|
|
File naming now matches.
Fix one usage not going through the accessor function.
Change-Id: I5cc4986238764964f2a71807a94bd2facf517263
Reviewed-on: https://pdfium-review.googlesource.com/12930
Commit-Queue: dsinclair <dsinclair@chromium.org>
Reviewed-by: dsinclair <dsinclair@chromium.org>
|
|
Through out the code base there are numerous places where variables
are declared using a signed integer type when interacting with the
string classes, since they assume that FX_STRSIZE is 'int'. As part of
changing the underling type of FX_STRSIZE to be unsigned, these
locations are being changed to use FX_STRSIZE. This is necessary as
part of converting the type, but has been broken off into a separate CL,
since it should be low risk.
Some related cleanups that are low risk are included as part of
this CL.
BUG=pdfium:828
Change-Id: Ifaae54ad195ccde0fe8672f71271d29a6ebd65fd
Reviewed-on: https://pdfium-review.googlesource.com/12210
Reviewed-by: Tom Sepez <tsepez@chromium.org>
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
|
|
Currently, all three of CFX_Matrix::TransformRect() take in rect values
and modify them in place.
This CL converts them to take in constant values and return the
transformed values instead, and fixes all the call sites.
Bug=pdfium:874
Change-Id: I9c274df3b14e9d88c100ba0530068e06e8fec32b
Reviewed-on: https://pdfium-review.googlesource.com/11550
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Jane Liu <janeliulwq@google.com>
|
|
This method duplicates the behaviour of the const [] operator and
doesn't offer any additional safety. Folding them into one
implementation.
SetAt is retained, since implementing the non-const [] operator to
replace SetAt has potential performance concerns. Specifically many
non-obvious cases of reading an element using [] will cause a realloc
& copy.
BUG=pdfium:860
Change-Id: I3ef5e5e5a15376f040256b646eb0d90636e24b67
Reviewed-on: https://pdfium-review.googlesource.com/10870
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: Tom Sepez <tsepez@chromium.org>
|
|
Specifically the index parameter passed in to GetAt(), SetAt() and
operator[] are now being tested to be in bounds.
BUG=chromium:752480, pdfium:828
Change-Id: I9e94d58c98a8eaaaae53cd0e3ffe2123ea17d8c4
Reviewed-on: https://pdfium-review.googlesource.com/10651
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: Tom Sepez <tsepez@chromium.org>
|
|
The various string/byte classes support Mid(), Left(), and Right() for
extracting substrings. Mid() can handle all possible cases, but Left()
and Right() are useful for common cases and more explicit about what
is going on.
Calls like Mid(offset, length - offset) can be converted to
Right(length - offset). Calls like Mid(0, length) can be converted to
Left(length).
If the substring being extracted does not extend all the way to one of
the edges of the string, then Mid() still needs to be used.
BUG=pdfium:828
Change-Id: I2ec46ad3d71aac0f7b513e103c69cbe8c854cf62
Reviewed-on: https://pdfium-review.googlesource.com/9510
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: Tom Sepez <tsepez@chromium.org>
|
|
Change-Id: I43ec0d4a3b60d51c59ba5a540dfe24803e725089
Reviewed-on: https://pdfium-review.googlesource.com/9170
Reviewed-by: Nicolás Peña <npm@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
CFX_Matrix::GetInverse is much clearer.
Change-Id: Id10ab1723735332e1a78de853f28415ec3a4d834
Reviewed-on: https://pdfium-review.googlesource.com/7090
Reviewed-by: Lei Zhang <thestig@chromium.org>
Commit-Queue: Nicolás Peña <npm@chromium.org>
|
|
Change-Id: I96c3429dbe2c572ed409706adfe3707b8b9bf51b
Reviewed-on: https://pdfium-review.googlesource.com/6176
Reviewed-by: Tom Sepez <tsepez@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
Change-Id: I8365ba80e3395d59a3cf35dbd9d9162e86e712e3
Reviewed-on: https://pdfium-review.googlesource.com/5970
Commit-Queue: Tom Sepez <tsepez@chromium.org>
Reviewed-by: Lei Zhang <thestig@chromium.org>
|
|
Change-Id: Ibdb20fca7e4daae9d61286df4801ac02faf3b281
Reviewed-on: https://pdfium-review.googlesource.com/5831
Commit-Queue: Lei Zhang <thestig@chromium.org>
Reviewed-by: Lei Zhang <thestig@chromium.org>
|
|
member.
Change-Id: I51e30d298e87b9ae0d5aca83b2f1d6787efce70a
Reviewed-on: https://pdfium-review.googlesource.com/5290
Commit-Queue: Lei Zhang <thestig@chromium.org>
Reviewed-by: Tom Sepez <tsepez@chromium.org>
Reviewed-by: Nicolás Peña <npm@chromium.org>
|
|
Change-Id: Iba1aa793567e69acc3cc1acbd5b9a9f531c80b7a
Reviewed-on: https://pdfium-review.googlesource.com/4453
Reviewed-by: Tom Sepez <tsepez@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|