Age | Commit message | Author |
|
Currently, if a text extraction region begins on a non-printing
character, then "" is returned. This behaviour is incorrect;
instead, the call should scan ahead until a printing character is
found and start extracting from there. Also proactively adds a
similar check and scan for the end of the extraction region.
BUG=pdfium:1139
Change-Id: Ia2001ac89740f3d31d2bb69e8000773f8b01091b
Reviewed-on: https://pdfium-review.googlesource.com/41532
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
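The scan described in the commit message above can be sketched as follows. This is a minimal illustration, not PDFium's code: `CharEntry`, its `printing` flag, and `ExtractText()` are hypothetical names, and scanning backwards from the end of the region is an assumption, since the message only says a "similar check and scan" was added.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one entry of a page's character list.
struct CharEntry {
  wchar_t unicode;
  bool printing;  // false for control or otherwise non-printing entries
};

// Returns the printing text in [start, start + count). Instead of returning
// "" when the region starts on a non-printing entry, the start is moved
// forward to the first printing entry. The end is trimmed the same way
// (assumed here to scan backwards).
std::wstring ExtractText(const std::vector<CharEntry>& chars,
                         size_t start,
                         size_t count) {
  if (start >= chars.size() || count == 0)
    return std::wstring();

  // One past the last requested entry, clamped to the list size.
  size_t end = count > chars.size() - start ? chars.size() : start + count;

  while (start < end && !chars[start].printing)
    ++start;
  while (end > start && !chars[end - 1].printing)
    --end;

  std::wstring text;
  for (size_t i = start; i < end; ++i) {
    if (chars[i].printing)
      text.push_back(chars[i].unicode);
  }
  return text;
}
```

With a list holding a newline, 'H', 'i', and a carriage return, a request for all four entries yields "Hi" rather than an empty string.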
|
|
Then revert the ones that break compilation.
Fix one IWYU issue noticed during presubmit.
Change-Id: I881a8a72818e55dbc4816247e35ff5e3015194e7
Reviewed-on: https://pdfium-review.googlesource.com/41470
Reviewed-by: Lei Zhang <thestig@chromium.org>
Commit-Queue: Tom Sepez <tsepez@chromium.org>
|
|
Change-Id: I6630ccdcce795c2f4767f5403303b5cd670c93ea
Reviewed-on: https://pdfium-review.googlesource.com/41351
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
Commit-Queue: Nicolás Peña Moreno <npm@chromium.org>
|
|
Change-Id: Iaa3e8d88083d7ac15762ff2d5d74a378f626483c
Reviewed-on: https://pdfium-review.googlesource.com/41350
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
Commit-Queue: Nicolás Peña Moreno <npm@chromium.org>
|
|
This CL fixes instances of variable shadowing that are
discovered by turning on -Wshadow.
BUG=pdfium:1137
Change-Id: I418d50de89ecbeb12e85b23a358bc61e8f16e888
Reviewed-on: https://pdfium-review.googlesource.com/41150
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: Tom Sepez <tsepez@chromium.org>
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
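For readers unfamiliar with -Wshadow, the kind of fix the commit above refers to typically looks like the sketch below. The function and variable names are invented for illustration and are not taken from the CL.

```cpp
#include <string>
#include <vector>

// Sums the lengths of all words.
int SumOfLengths(const std::vector<std::string>& words) {
  int total = 0;
  for (const std::string& word : words) {
    // A shadowing version would declare `int total = ...;` here, hiding the
    // outer `total`. -Wshadow reports that declaration. Giving the inner
    // variable its own name removes the warning and the ambiguity.
    int length = static_cast<int>(word.size());
    total += length;
  }
  return total;
}
```

Renaming the inner variable is usually the least invasive fix; the alternative is to remove the inner declaration when the outer variable was the one intended.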
|
|
Change-Id: I12b418b06b097af87f1143dbda3b6e304ba437c6
Reviewed-on: https://pdfium-review.googlesource.com/41210
Reviewed-by: Lei Zhang <thestig@chromium.org>
Commit-Queue: Nicolás Peña Moreno <npm@chromium.org>
|
|
Disentangle setting an allocation step from estimating size; these
are separate concepts and can be handled separately.
Change-Id: I27bf3e193018a4377ccf266207b889fdb672826c
Reviewed-on: https://pdfium-review.googlesource.com/40210
Reviewed-by: Lei Zhang <thestig@chromium.org>
Commit-Queue: Tom Sepez <tsepez@chromium.org>
|
|
Generalize CPDF_TextPage::GetTextByRect() so that it's possible to get
the text from a text page using a predicate. That way we can easily
get the text that belongs to a single text object as well.
Change-Id: Ia457af0f41184694dc1481709be72b35685bce7f
Reviewed-on: https://pdfium-review.googlesource.com/39530
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
Reviewed-by: Lei Zhang <thestig@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
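The generalization described above can be illustrated with a simplified model. Everything here is an assumption made for the sketch: `CharInfo`, `GetTextByPredicate()`, and `GetTextByObject()` are hypothetical, and the free function `GetTextByRect()` only borrows the name of the real method. Any spacing or line-break handling the real code may perform is left out.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical per-character record: position plus owning text object.
struct CharInfo {
  wchar_t unicode;
  float x;
  float y;
  int text_object_id;
};

// Collects every character accepted by `predicate`, in list order.
std::wstring GetTextByPredicate(
    const std::vector<CharInfo>& chars,
    const std::function<bool(const CharInfo&)>& predicate) {
  std::wstring text;
  for (const CharInfo& info : chars) {
    if (predicate(info))
      text.push_back(info.unicode);
  }
  return text;
}

// Rectangle lookup expressed as one particular predicate.
std::wstring GetTextByRect(const std::vector<CharInfo>& chars,
                           float left, float bottom, float right, float top) {
  return GetTextByPredicate(chars, [=](const CharInfo& info) {
    return info.x >= left && info.x <= right &&
           info.y >= bottom && info.y <= top;
  });
}

// Text-object lookup reuses the same loop with a different predicate.
std::wstring GetTextByObject(const std::vector<CharInfo>& chars, int id) {
  return GetTextByPredicate(
      chars, [id](const CharInfo& info) { return info.text_object_id == id; });
}
```

The point of the design is that the traversal is written once, and each new selection rule costs only a small lambda.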
|
|
This CL creates the following new functions in the public API:
- FPDFPageObj_AddMark
- FPDFPageObjMark_SetIntParam
- FPDFPageObjMark_SetStringParam
Bug: pdfium:1037
Change-Id: Icabf3fdd8e8153b9156bab807a3708d38a9365d8
Reviewed-on: https://pdfium-review.googlesource.com/37330
Commit-Queue: Henrique Nakashima <hnakashima@chromium.org>
Reviewed-by: Lei Zhang <thestig@chromium.org>
|
|
This is called by many clients to make sure CountItems() does not
crash. Moving the check to CountItems() makes HasRef() unnecessary.
Bug: pdfium:1037
Change-Id: I4f21f33a88c9aad54f0dae18a38b370c6ceaec80
Reviewed-on: https://pdfium-review.googlesource.com/37133
Commit-Queue: Henrique Nakashima <hnakashima@chromium.org>
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
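The idea of folding the guard into CountItems() can be sketched like this. The class below is a hypothetical handle type written for illustration; only the method name CountItems() comes from the commit message, and the real class is not shown in this log.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical handle that may or may not refer to shared item data.
class ItemListHandle {
 public:
  ItemListHandle() = default;
  explicit ItemListHandle(std::vector<int> items)
      : ref_(std::make_shared<std::vector<int>>(std::move(items))) {}

  // The null check lives here, so callers no longer need a separate
  // "has reference" query before asking for the count.
  size_t CountItems() const { return ref_ ? ref_->size() : 0; }

 private:
  std::shared_ptr<std::vector<int>> ref_;
};
```

Callers can then write `handle.CountItems()` unconditionally; an empty handle simply reports zero items.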
|
|
Change-Id: I5dfadcb68e640235be6e3eb7c8d57ae3b8013d26
Reviewed-on: https://pdfium-review.googlesource.com/35691
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Tom Sepez <tsepez@chromium.org>
|
|
Because it's a code smell of a sort.
Change-Id: Id1c1b124f539e31a929701fb9486da9d396d3563
Reviewed-on: https://pdfium-review.googlesource.com/34695
Commit-Queue: Tom Sepez <tsepez@chromium.org>
Reviewed-by: dsinclair <dsinclair@chromium.org>
|
|
Add a test PDF with multiple pages, each with a different media box and
crop box. Demonstrate how FPDFText_GetText() gets all the text on the
page, and how FPDFText_GetBoundedText() with the right bounding boxes
gets only the visible text on the page.
Also fix a small nit in CPDF_TextPage::GetTextByRect() found while
writing this CL.
BUG=pdfium:387
Change-Id: I9ce4bb181e2ba5b454ea1341bbccef9ba94c9cd8
Reviewed-on: https://pdfium-review.googlesource.com/34550
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
|
|
Bug: pdfium:1085
Change-Id: I62c526ae865f0cadfddd2e75a616bce73de0f88d
Reviewed-on: https://pdfium-review.googlesource.com/32632
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Tom Sepez <tsepez@chromium.org>
|
|
Change-Id: Ibc446d9f4606d29f997ee3521f31f0fbcf1ad84b
Reviewed-on: https://pdfium-review.googlesource.com/32176
Commit-Queue: Lei Zhang <thestig@chromium.org>
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
|
|
Break out parts of ProcessTextObject() into helper functions.
Change-Id: I76ad141728ce77e8d24d7dc9635722d670663f39
Reviewed-on: https://pdfium-review.googlesource.com/32175
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
- Move code into a GenerateSpace() function.
- Break apart some font size conversions.
Change-Id: I4d5ea112fc004a31ac38b7c19ff77fcbfe764d38
Reviewed-on: https://pdfium-review.googlesource.com/32157
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
|
|
Change-Id: I959d687d7d46fa61e1fe097b0b876ad02d2b123c
Reviewed-on: https://pdfium-review.googlesource.com/32153
Commit-Queue: Lei Zhang <thestig@chromium.org>
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
|
|
Change-Id: I667a3cd696d44692fa3d73bdee7c2f48d3039255
Reviewed-on: https://pdfium-review.googlesource.com/32152
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
|
|
- Refer to the string in CFX_BidiString by const-ref.
- Remove useless CharAt() method.
- Turn a member variable into a local variable.
Change-Id: I30f221b7350150c839a793129789d8ea7cc1f331
Reviewed-on: https://pdfium-review.googlesource.com/31670
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
Change-Id: I183a53d08f5da73d788c92b53382e3fac3b823e2
Reviewed-on: https://pdfium-review.googlesource.com/31671
Commit-Queue: Lei Zhang <thestig@chromium.org>
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
|
|
Change-Id: Ie5bea82757682390b274ad2da77d1686cc597046
Reviewed-on: https://pdfium-review.googlesource.com/31657
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
Change-Id: I9a5acb59790fd8527ced745370bdfe35e4d21c36
Reviewed-on: https://pdfium-review.googlesource.com/31656
Commit-Queue: Lei Zhang <thestig@chromium.org>
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
|
|
Change-Id: Ib0b1d014af31493c73a74d81c1f3454a203da949
Reviewed-on: https://pdfium-review.googlesource.com/31655
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
|
|
Change-Id: I079bc3bf1242fd28fdd51930d9deb6efa34d7509
Reviewed-on: https://pdfium-review.googlesource.com/30055
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
Change-Id: I92c7ba605bf95a9023ad046b8dddebe0a0592802
Reviewed-on: https://pdfium-review.googlesource.com/29992
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Tom Sepez <tsepez@chromium.org>
|
|
Transitively mark the same pointers as const in callers.
Change-Id: I1f9669b35c6d7f4b1a11c25163480bc687fbc7f8
Reviewed-on: https://pdfium-review.googlesource.com/28870
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
- Augment/reuse existing NormalizeThreshold() function.
- Use std::min() / std::max().
Change-Id: I709e246c58bf8a69638e1d77dcc2a79ae8a27e77
Reviewed-on: https://pdfium-review.googlesource.com/28736
Commit-Queue: Lei Zhang <thestig@chromium.org>
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
|
|
Change-Id: Ia6cb19269ef93440669b68d76c3c378a5d4da7a5
Reviewed-on: https://pdfium-review.googlesource.com/28735
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
CountRects() calls GetRectArray(), which performs the same calculation.
Change-Id: I79dcd8e82f6d0fe7ed992da06237f31d0761a902
Reviewed-on: https://pdfium-review.googlesource.com/28734
Commit-Queue: Lei Zhang <thestig@chromium.org>
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
|
|
BUG=chromium:821305
Change-Id: I371572f60ea3984ce044e25125d882b3c2d03115
Reviewed-on: https://pdfium-review.googlesource.com/28733
Commit-Queue: Lei Zhang <thestig@chromium.org>
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
|
|
Treat values less than -1 as -1.
BUG=chromium:821305
Change-Id: Ieaced045473fa51097400e5af1286f0d3f4d0143
Reviewed-on: https://pdfium-review.googlesource.com/28732
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
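The normalization in the commit above amounts to a one-line clamp. The log does not say which parameter is affected, so `NormalizeCount()` and its argument name are placeholders for this sketch.

```cpp
#include <algorithm>

// Maps every value below -1 to -1 and leaves all other values unchanged.
int NormalizeCount(int count) {
  return std::max(count, -1);
}
```

Normalizing once at the entry point means the rest of the code only has to handle -1 and non-negative values.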
|
|
Instances are either replaced with FXSYS_iswalpha, which calls out to
the ICU library to do the proper Unicode operations, or have been
converted to an isascii && isalpha pair, if ASCII alpha is actually
what was wanted.
BUG=pdfium:1035
Change-Id: I971ff639ee1ff818ad08793a1900a8bcbb0a3e04
Reviewed-on: https://pdfium-review.googlesource.com/28450
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
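The "isascii && isalpha pair" mentioned above can be written portably as in the sketch below. `IsAscii()` and `IsAsciiAlpha()` are hypothetical helper names, and the range test replaces the POSIX `isascii()` macro, which is not part of standard C++. The Unicode-aware path through FXSYS_iswalpha is not shown, since its implementation is not part of this log.

```cpp
#include <cctype>

// True only for code points in the 7-bit ASCII range.
bool IsAscii(wchar_t c) {
  return static_cast<unsigned long>(c) < 0x80;
}

// ASCII-only alphabetic test: the range check runs first, so the narrow
// std::isalpha() is never handed a value outside unsigned char. This
// assumes the default "C" locale, where std::isalpha() accepts only
// A-Z and a-z.
bool IsAsciiAlpha(wchar_t c) {
  return IsAscii(c) && std::isalpha(static_cast<unsigned char>(c)) != 0;
}
```

A letter such as U+00E9 (e with acute accent) is rejected by this helper, which is the intended behaviour when only ASCII letters are wanted.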
|
|
Instances are either replaced with FXSYS_iswalnum, which calls out to
the ICU library to do the proper Unicode operations, or have been
converted to an isascii && isalnum pair, if ASCII alnum is actually
what was wanted.
BUG=pdfium:1035
Change-Id: I959ec8739a4d020e61562180393ab8113a81577c
Reviewed-on: https://pdfium-review.googlesource.com/28430
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
|
|
A number of our character helper methods take in wide character types,
but only do tests/operations on the ASCII range of characters. As a
very quick first pass I am renaming all of the foot-gun methods to
explicitly call out this behaviour, while I do a bigger
cleanup/refactor.
BUG=pdfium:1035
Change-Id: Ia035dfa1cb6812fa6d45155c4565475032c4c165
Reviewed-on: https://pdfium-review.googlesource.com/28330
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Commit-Queue: dsinclair <dsinclair@chromium.org>
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
Reviewed-by: dsinclair <dsinclair@chromium.org>
|
|
These are generally cheap enough to compute as needed, rather than
keeping around in memory all the time (plus the memory for the static
flag the compiler generates to check if initialized).
Change-Id: If3a5365521f6a7781e66fb11f04883a5c673ee11
Reviewed-on: https://pdfium-review.googlesource.com/27150
Commit-Queue: Tom Sepez <tsepez@chromium.org>
Reviewed-by: dsinclair <dsinclair@chromium.org>
|
|
When parsing text streams there is an internal character list that is
generated of all the characters in the stream. Additionally a text
string is generated that is exposed via the public API. This string
will have all of the printing, i.e. non-control, characters in it. For
characters that are printable but are not in the font of the stream,
the character 0xFFFE is used in the text to indicate a missing
character. This is a non-printing character that indicates
non-Unicode.
The internal character list gets a Unicode value 0x0 when there isn't
a glyph in the font for it and the original character code is
preserved. This means that when generating the mapping between text
string and character list, the code is mistakenly thinking that the
unprintable character was not present in the text string. I have
changed the check in the mapping generation code to correctly account
for this. Additional investigation is needed to determine if inserting
0xFFFE in the text is the correct behaviour.
This patch resolves an issue where the find highlights in Chrome for a
PDF would be offset when there are unprintable characters in a stream.
BUG=pdfium:1010
Change-Id: I7547c46c5645e039a4b5138f2ce1137fa31990a5
Reviewed-on: https://pdfium-review.googlesource.com/27051
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
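The mapping problem described in this commit can be modelled with a small sketch. It is a simplified illustration under stated assumptions, not the PDFium implementation: `CharEntry`, its `is_control` flag, `TextMapping`, and `BuildMapping()` are hypothetical, and only the values 0x0 and 0xFFFE come from the message.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical entry of the internal character list.
struct CharEntry {
  wchar_t unicode;  // 0x0 when the font has no glyph for the character
  bool is_control;  // control characters are left out of the text string
};

constexpr wchar_t kMissingChar = 0xFFFE;

struct TextMapping {
  std::wstring text;                 // string exposed to API users
  std::vector<size_t> text_to_char;  // text index -> character list index
};

// Builds the text string together with its index mapping.
TextMapping BuildMapping(const std::vector<CharEntry>& chars) {
  TextMapping mapping;
  for (size_t i = 0; i < chars.size(); ++i) {
    if (chars[i].is_control)
      continue;
    // An entry whose Unicode value is 0x0 still occupies one slot in the
    // text string (as 0xFFFE), so it must be counted in the mapping.
    // Skipping it here would shift every later index by one.
    mapping.text.push_back(chars[i].unicode ? chars[i].unicode : kMissingChar);
    mapping.text_to_char.push_back(i);
  }
  return mapping;
}
```

An off-by-one in this table is the kind of error that would make highlights land one character away from the match, which is consistent with the symptom the commit reports.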
|
|
Bug: 806612
Change-Id: I22bd9046dd37a1b596762c46a6b29a323d6e9fa1
Reviewed-on: https://pdfium-review.googlesource.com/24410
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Nicolás Peña Moreno <npm@chromium.org>
|
|
Introduced here
https://pdfium-review.googlesource.com/#/c/17950/5/core/fpdftext/cpdf_textpage.cpp@1237
BUG=chromium:805881
Change-Id: I0c9109f3eebec968360734ff4d9d0542881d6823
Reviewed-on: https://pdfium-review.googlesource.com/24210
Commit-Queue: dsinclair <dsinclair@chromium.org>
Reviewed-by: Lei Zhang <thestig@chromium.org>
Reviewed-by: dsinclair <dsinclair@chromium.org>
|
|
BUG=pdfium:858
Change-Id: Idc9900fe6f85b1fef06c97f5023653f77156d410
Reviewed-on: https://pdfium-review.googlesource.com/22730
Commit-Queue: dsinclair <dsinclair@chromium.org>
Reviewed-by: dsinclair <dsinclair@chromium.org>
|
|
Change-Id: I48f27026292917e6f6e6b636afd499336e41afea
Reviewed-on: https://pdfium-review.googlesource.com/22310
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
|
|
Change-Id: I29769f78eaad10c6a8b79e27524336c4f330377e
Reviewed-on: https://pdfium-review.googlesource.com/22258
Reviewed-by: Tom Sepez <tsepez@chromium.org>
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
|
|
Change-Id: I3efc57cd7325d16e3ca8ebdeeaec06012b2c56e3
Reviewed-on: https://pdfium-review.googlesource.com/20110
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
The current implementation of text extraction was difficult to
understand, duplicated logic that existed in other methods, and wasn't
clear about the units the inputs were in. It also didn't handle
control characters correctly.
The new implementation leans on the methods for converting indices
between the text buffer index and character list index spaces to avoid
duplication of code. It also makes it clear to the reader that inputs
are in the character list index space. Finally, it fixes issues being
seen in Chrome with respect to ranges being slightly off.
This CL also adds a test for extracting text that has control
characters.
BUG=pdfium:942,chromium:654578
Change-Id: Id9d1f360c2d7492c7b5a48d6c9ae29f530892742
Reviewed-on: https://pdfium-review.googlesource.com/20014
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: dsinclair <dsinclair@chromium.org>
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
|
|
This CL updates various methods in CPDF_TextObject to return or receive
size_t values. Callers have been updated as needed.
Bug: pdfium:774
Change-Id: Id72511bc74637c6261add39f5414c9a4b8390b82
Reviewed-on: https://pdfium-review.googlesource.com/19430
Commit-Queue: dsinclair <dsinclair@chromium.org>
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
|
|
Change-Id: I45468fa7944290fbbe3d2e67f884164ae8d84160
Reviewed-on: https://pdfium-review.googlesource.com/19171
Reviewed-by: Tom Sepez <tsepez@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
BUG=chromium:782596,chromium:781804
Change-Id: I020be3cf813221bb8314f045d83014a25cb9a950
Reviewed-on: https://pdfium-review.googlesource.com/18070
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: dsinclair <dsinclair@chromium.org>
|
|
The original version of this code was landed in
https://pdfium-review.googlesource.com/c/pdfium/+/13690/. A corner
case has been found that breaks it.
In this CL I have reverted the changes in IsHyphen and implemented a
less aggressive cleanup that I have tested and that works as expected.
BUG=chromium:781804
Change-Id: I3b36f420834081fdd9e1ae17efc234b561b4df41
Reviewed-on: https://pdfium-review.googlesource.com/17950
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: dsinclair <dsinclair@chromium.org>
|
|
Also remove a conditional in the CPDF_Form ctor that cannot be true.
Change-Id: Icd00233969cea33e9c63d0d6a9d07226c2b173f2
Reviewed-on: https://pdfium-review.googlesource.com/17070
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
Bug: pdfium:921
Change-Id: I8864fd2ebdccc5f94aaf70cd8295068bf4db8b68
Reviewed-on: https://pdfium-review.googlesource.com/16492
Reviewed-by: Lei Zhang <thestig@chromium.org>
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
|