Age | Commit message (Collapse) | Author |
|
Currently if a text extraction region begins on a non-printing
character then "" will be returned. This is the incorrect behaviour,
instead the call should scan ahead until a printing character is
found and start extracting from there. Also proactively adds a
similar check and scan for the end of the extraction region.
BUG=pdfium:1139
Change-Id: Ia2001ac89740f3d31d2bb69e8000773f8b01091b
Reviewed-on: https://pdfium-review.googlesource.com/41532
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
|
|
Change-Id: I6630ccdcce795c2f4767f5403303b5cd670c93ea
Reviewed-on: https://pdfium-review.googlesource.com/41351
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
Commit-Queue: Nicolás Peña Moreno <npm@chromium.org>
|
|
Change-Id: Iaa3e8d88083d7ac15762ff2d5d74a378f626483c
Reviewed-on: https://pdfium-review.googlesource.com/41350
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
Commit-Queue: Nicolás Peña Moreno <npm@chromium.org>
|
|
This CL fixes instances of variable shadowing that are
discovered by turning on -Wshadow.
BUG=pdfium:1137
Change-Id: I418d50de89ecbeb12e85b23a358bc61e8f16e888
Reviewed-on: https://pdfium-review.googlesource.com/41150
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: Tom Sepez <tsepez@chromium.org>
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
|
|
Change-Id: I12b418b06b097af87f1143dbda3b6e304ba437c6
Reviewed-on: https://pdfium-review.googlesource.com/41210
Reviewed-by: Lei Zhang <thestig@chromium.org>
Commit-Queue: Nicolás Peña Moreno <npm@chromium.org>
|
|
Disentangle setting an allocation step from estimating size, these
separate concepts can be handled separately.
Change-Id: I27bf3e193018a4377ccf266207b889fdb672826c
Reviewed-on: https://pdfium-review.googlesource.com/40210
Reviewed-by: Lei Zhang <thestig@chromium.org>
Commit-Queue: Tom Sepez <tsepez@chromium.org>
|
|
Generalize CPDF_TextPage::GetTextByRect(), so that it's possible to get
the text from a text page using a predicate, that way we can easily
get the text that belongs to single text object as well.
Change-Id: Ia457af0f41184694dc1481709be72b35685bce7f
Reviewed-on: https://pdfium-review.googlesource.com/39530
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
Reviewed-by: Lei Zhang <thestig@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
This CL creates the following new functions in the public API:
- FPDFPageObj_AddMark
- FPDFPageObjMark_SetIntParam
- FPDFPageObjMark_SetStringParam
Bug: pdfium:1037
Change-Id: Icabf3fdd8e8153b9156bab807a3708d38a9365d8
Reviewed-on: https://pdfium-review.googlesource.com/37330
Commit-Queue: Henrique Nakashima <hnakashima@chromium.org>
Reviewed-by: Lei Zhang <thestig@chromium.org>
|
|
This is called by many client to make sure CountItems() does not
crash. Moving the check to CountItems() makes HasRef() unnecessary.
Bug: pdfium:1037
Change-Id: I4f21f33a88c9aad54f0dae18a38b370c6ceaec80
Reviewed-on: https://pdfium-review.googlesource.com/37133
Commit-Queue: Henrique Nakashima <hnakashima@chromium.org>
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
|
|
Add a test PDF with multiple pages, each with a different media box and
crop box. Demonstrate how FPDFText_GetText() gets all the text on the
page, and how FPDFText_GetBoundedText() with the right bounding boxes
gets only the visible text on the page.
Also fix a small nit in CPDF_TextPage::GetTextByRect() found while
writing this CL.
BUG=pdfium:387
Change-Id: I9ce4bb181e2ba5b454ea1341bbccef9ba94c9cd8
Reviewed-on: https://pdfium-review.googlesource.com/34550
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
|
|
Change-Id: Ibc446d9f4606d29f997ee3521f31f0fbcf1ad84b
Reviewed-on: https://pdfium-review.googlesource.com/32176
Commit-Queue: Lei Zhang <thestig@chromium.org>
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
|
|
Break out parts of ProcessTextObject() into helper functions.
Change-Id: I76ad141728ce77e8d24d7dc9635722d670663f39
Reviewed-on: https://pdfium-review.googlesource.com/32175
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
- Move code into a GenerateSpace() function.
- Break apart some font size conversions.
Change-Id: I4d5ea112fc004a31ac38b7c19ff77fcbfe764d38
Reviewed-on: https://pdfium-review.googlesource.com/32157
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
|
|
Change-Id: I959d687d7d46fa61e1fe097b0b876ad02d2b123c
Reviewed-on: https://pdfium-review.googlesource.com/32153
Commit-Queue: Lei Zhang <thestig@chromium.org>
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
|
|
Change-Id: I667a3cd696d44692fa3d73bdee7c2f48d3039255
Reviewed-on: https://pdfium-review.googlesource.com/32152
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
|
|
- Refer to the string in CFX_BidiString by const-ref.
- Remove useless CharAt() method.
- Turn a member variable into a local variable.
Change-Id: I30f221b7350150c839a793129789d8ea7cc1f331
Reviewed-on: https://pdfium-review.googlesource.com/31670
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
Change-Id: I183a53d08f5da73d788c92b53382e3fac3b823e2
Reviewed-on: https://pdfium-review.googlesource.com/31671
Commit-Queue: Lei Zhang <thestig@chromium.org>
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
|
|
Change-Id: Ie5bea82757682390b274ad2da77d1686cc597046
Reviewed-on: https://pdfium-review.googlesource.com/31657
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
Change-Id: I9a5acb59790fd8527ced745370bdfe35e4d21c36
Reviewed-on: https://pdfium-review.googlesource.com/31656
Commit-Queue: Lei Zhang <thestig@chromium.org>
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
|
|
Change-Id: Ib0b1d014af31493c73a74d81c1f3454a203da949
Reviewed-on: https://pdfium-review.googlesource.com/31655
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
|
|
Change-Id: I079bc3bf1242fd28fdd51930d9deb6efa34d7509
Reviewed-on: https://pdfium-review.googlesource.com/30055
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
Change-Id: I92c7ba605bf95a9023ad046b8dddebe0a0592802
Reviewed-on: https://pdfium-review.googlesource.com/29992
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Tom Sepez <tsepez@chromium.org>
|
|
Transitively mark the same pointers as const in callers.
Change-Id: I1f9669b35c6d7f4b1a11c25163480bc687fbc7f8
Reviewed-on: https://pdfium-review.googlesource.com/28870
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
- Augment/reuse existing NormalizeThreshold() function.
- Use std::min() / std::max().
Change-Id: I709e246c58bf8a69638e1d77dcc2a79ae8a27e77
Reviewed-on: https://pdfium-review.googlesource.com/28736
Commit-Queue: Lei Zhang <thestig@chromium.org>
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
|
|
Change-Id: Ia6cb19269ef93440669b68d76c3c378a5d4da7a5
Reviewed-on: https://pdfium-review.googlesource.com/28735
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
CountRects() calls GetRectArray(), which performs the same calculation.
Change-Id: I79dcd8e82f6d0fe7ed992da06237f31d0761a902
Reviewed-on: https://pdfium-review.googlesource.com/28734
Commit-Queue: Lei Zhang <thestig@chromium.org>
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
|
|
BUG=chromium:821305
Change-Id: I371572f60ea3984ce044e25125d882b3c2d03115
Reviewed-on: https://pdfium-review.googlesource.com/28733
Commit-Queue: Lei Zhang <thestig@chromium.org>
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
|
|
Treat values less than -1 as -1.
BUG=chromium:821305
Change-Id: Ieaced045473fa51097400e5af1286f0d3f4d0143
Reviewed-on: https://pdfium-review.googlesource.com/28732
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
Instances are either replaced with FXSYS_iswalpha, which calls out to
the ICU library to do the proper Unicode operations, or have been
converted to a isascii && isalpha pair, if ASCII alpha is actually
what was wanted.
BUG=pdfium:1035
Change-Id: I971ff639ee1ff818ad08793a1900a8bcbb0a3e04
Reviewed-on: https://pdfium-review.googlesource.com/28450
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
|
|
Instances are either replaced with FXSYS_iswalnum, which calls out to
the ICU library to do the proper Unicode operations, or have been
converted to a isascii && isalnum pair, if ASCII alnum is actually
what was wanted.
BUG=pdfium:1035
Change-Id: I959ec8739a4d020e61562180393ab8113a81577c
Reviewed-on: https://pdfium-review.googlesource.com/28430
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
|
|
A number of our character helper methods take in wide character types,
but only do tests/operations on the ASCII range of characters. As a
very quick first pass I am renaming all of the foot-gun methods to
explictly call out this behaviour, while I do a bigger
cleanup/refactor.
BUG=pdfium:1035
Change-Id: Ia035dfa1cb6812fa6d45155c4565475032c4c165
Reviewed-on: https://pdfium-review.googlesource.com/28330
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Commit-Queue: dsinclair <dsinclair@chromium.org>
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
Reviewed-by: dsinclair <dsinclair@chromium.org>
|
|
When parsing text streams there is an internal character list that is
generated of all the characters in the stream. Additionally a text
string is generated that is exposed via the public API. This string
will have all of the printing, i.e. non-control characters, in it. For
characters that are not in the font of the stream the unicode, but
printable, the character 0xFFFE is used in the text to indicate a
missing character. This a non-printing character to indicate
non-unicode.
The internal character list gets a Unicode value 0x0 when there isn't
a glyph in the font for it and the original character code is
preserved. This means that when generating the mapping between text
string and character list, the code is mistakenly thinking that the
unprintable character was not present in the text string. I have
changed the check in the mapping generation code to correctly account
for this. Additional investigation is needed to determine if inserting
0xFFFE in the text is the correct behaviour.
This patch resolves an issue where the find highlights in Chrome for a
PDF would be offset when there are unprintable characters in a stream.
BUG=pdfium:1010
Change-Id: I7547c46c5645e039a4b5138f2ce1137fa31990a5
Reviewed-on: https://pdfium-review.googlesource.com/27051
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
|
|
Bug: 806612
Change-Id: I22bd9046dd37a1b596762c46a6b29a323d6e9fa1
Reviewed-on: https://pdfium-review.googlesource.com/24410
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Nicolás Peña Moreno <npm@chromium.org>
|
|
Introduced here
https://pdfium-review.googlesource.com/#/c/17950/5/core/fpdftext/cpdf_textpage.cpp@1237
BUG=chromium:805881
Change-Id: I0c9109f3eebec968360734ff4d9d0542881d6823
Reviewed-on: https://pdfium-review.googlesource.com/24210
Commit-Queue: dsinclair <dsinclair@chromium.org>
Reviewed-by: Lei Zhang <thestig@chromium.org>
Reviewed-by: dsinclair <dsinclair@chromium.org>
|
|
BUG=pdfium:858
Change-Id: Idc9900fe6f85b1fef06c97f5023653f77156d410
Reviewed-on: https://pdfium-review.googlesource.com/22730
Commit-Queue: dsinclair <dsinclair@chromium.org>
Reviewed-by: dsinclair <dsinclair@chromium.org>
|
|
Change-Id: I3efc57cd7325d16e3ca8ebdeeaec06012b2c56e3
Reviewed-on: https://pdfium-review.googlesource.com/20110
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
The current implementation of text extraction was difficult to
understand, duplicated logic that existed in other methods, and wasn't
clear about the units the inputs were in. It also didn't handle
control characters correctly.
The new implementation leans on the methods for converting indices
between the text buffer index and character list index spaces to avoid
duplication of code. It also makes it clear to the reader that inputs
are in the character list index space. Finally, it fixes issues being
seen in Chrome with respect of ranges being slightly off.
This CL also adds a test for extracting text that has control
characters.
BUG=pdfium:942,chromium:654578
Change-Id: Id9d1f360c2d7492c7b5a48d6c9ae29f530892742
Reviewed-on: https://pdfium-review.googlesource.com/20014
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: dsinclair <dsinclair@chromium.org>
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
|
|
This CL updates various methods in CPDF_TextObject to return or received
size_t values. Callers have been updated as needed.
Bug: pdfium:774
Change-Id: Id72511bc74637c6261add39f5414c9a4b8390b82
Reviewed-on: https://pdfium-review.googlesource.com/19430
Commit-Queue: dsinclair <dsinclair@chromium.org>
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
|
|
Change-Id: I45468fa7944290fbbe3d2e67f884164ae8d84160
Reviewed-on: https://pdfium-review.googlesource.com/19171
Reviewed-by: Tom Sepez <tsepez@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
BUG=chromium:782596,chromium:781804
Change-Id: I020be3cf813221bb8314f045d83014a25cb9a950
Reviewed-on: https://pdfium-review.googlesource.com/18070
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: dsinclair <dsinclair@chromium.org>
|
|
The original version of this code was landed in
https://pdfium-review.googlesource.com/c/pdfium/+/13690/. A corner
case has been found that breaks.
In this CL I have reverted the changes in IsHyphen and implemented a
less aggressive cleanup that I have tested works as expected.
BUG=chromium:781804
Change-Id: I3b36f420834081fdd9e1ae17efc234b561b4df41
Reviewed-on: https://pdfium-review.googlesource.com/17950
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: dsinclair <dsinclair@chromium.org>
|
|
Also remove a conditional in the CPDF_Form ctor that cannot be true.
Change-Id: Icd00233969cea33e9c63d0d6a9d07226c2b173f2
Reviewed-on: https://pdfium-review.googlesource.com/17070
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
Bug: pdfium:921
Change-Id: I8864fd2ebdccc5f94aaf70cd8295068bf4db8b68
Reviewed-on: https://pdfium-review.googlesource.com/16492
Reviewed-by: Lei Zhang <thestig@chromium.org>
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
|
|
BUG=pdfium:828
Change-Id: I5c40237433ebabaeabdb43aec9cdf783e41dfe16
Reviewed-on: https://pdfium-review.googlesource.com/13230
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
|
|
Automated using git grep & sed.
Replace StringC classes with StringView classes.
Remove the CFX_ prefix and put string classes in fxcrt namespace.
Change AsStringC() to AsStringView().
Rename tests from TEST(fxcrt, *String*Foo) to TEST(*String*,
Foo).
Couple of tests needed to have their names regularlized.
BUG=pdfium:894
Change-Id: I7ca038685c8d803795f3ed02545124f7a224c83d
Reviewed-on: https://pdfium-review.googlesource.com/14151
Reviewed-by: Tom Sepez <tsepez@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
|
|
The existing code did end of range checks by making sure that the
value was never less then 0. This isn't correct when using an unsigned
type, since 0 - 1 will wrap around to the max possible value, and
thus still be less then 0. Additionally the existing code was hard to
follow due to the complexity of some of the low level operations being
performed.
It has been rewritten using higher level string operations to make it
clearer and correct.
BUG=chromium:763256
Change-Id: Ib8bf5ca0e29e73724c4a1c4781362e8a8fc30149
Reviewed-on: https://pdfium-review.googlesource.com/13690
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: Tom Sepez <tsepez@chromium.org>
|
|
File naming now matches.
Fix one usage not going through the accessor function.
Change-Id: I5cc4986238764964f2a71807a94bd2facf517263
Reviewed-on: https://pdfium-review.googlesource.com/12930
Commit-Queue: dsinclair <dsinclair@chromium.org>
Reviewed-by: dsinclair <dsinclair@chromium.org>
|
|
Through out the code base there are numerous places where variables
are declared using a signed integer type when interacting with the
string classes, since they assume that FX_STRSIZE is 'int'. As part of
changing the underling type of FX_STRSIZE to be unsigned, these
locations are being changed to use FX_STRSIZE. This is necessary as
part of converting the type, but has been broken off into a separate CL,
since it should be low risk.
Some related cleanups that are low risk are included as part of
this CL.
BUG=pdfium:828
Change-Id: Ifaae54ad195ccde0fe8672f71271d29a6ebd65fd
Reviewed-on: https://pdfium-review.googlesource.com/12210
Reviewed-by: Tom Sepez <tsepez@chromium.org>
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
|
|
Currently, all three of CFX_Matrix::TransformRect() take in rect values
and modify them in place.
This CL converts them to take in constant values and return the
transformed values instead, and fixes all the call sites.
Bug=pdfium:874
Change-Id: I9c274df3b14e9d88c100ba0530068e06e8fec32b
Reviewed-on: https://pdfium-review.googlesource.com/11550
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Jane Liu <janeliulwq@google.com>
|
|
This method duplicates the behaviour of the const [] operator and
doesn't offer any additional safety. Folding them into one
implementation.
SetAt is retained, since implementing the non-const [] operator to
replace SetAt has potential performance concerns. Specifically many
non-obvious cases of reading an element using [] will cause a realloc
& copy.
BUG=pdfium:860
Change-Id: I3ef5e5e5a15376f040256b646eb0d90636e24b67
Reviewed-on: https://pdfium-review.googlesource.com/10870
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: Tom Sepez <tsepez@chromium.org>
|