summaryrefslogtreecommitdiff
path: root/core/fpdftext/cpdf_textpage.h
AgeCommit message (Collapse)Author
2018-10-04Remove more uncalled methods in core/Tom Sepez
Change-Id: I10679c9d28eb495c6bc21fd1355cb3ef330a1209 Reviewed-on: https://pdfium-review.googlesource.com/c/43471 Reviewed-by: Lei Zhang <thestig@chromium.org> Commit-Queue: Tom Sepez <tsepez@chromium.org>
2018-08-07Add FPDFTextObj_GetText() APIMiklos Vajna
Generalize CPDF_TextPage::GetTextByRect(), so that it's possible to get the text from a text page using a predicate, that way we can easily get the text that belongs to single text object as well. Change-Id: Ia457af0f41184694dc1481709be72b35685bce7f Reviewed-on: https://pdfium-review.googlesource.com/39530 Reviewed-by: Henrique Nakashima <hnakashima@chromium.org> Reviewed-by: Lei Zhang <thestig@chromium.org> Commit-Queue: Lei Zhang <thestig@chromium.org>
2018-05-16Remove some more unused #definesTom Sepez
Bug: pdfium:1085 Change-Id: I62c526ae865f0cadfddd2e75a616bce73de0f88d Reviewed-on: https://pdfium-review.googlesource.com/32632 Reviewed-by: dsinclair <dsinclair@chromium.org> Commit-Queue: Tom Sepez <tsepez@chromium.org>
2018-05-07Move some CPDF_TextPage methods into an anonymous namespace.Lei Zhang
Change-Id: I959d687d7d46fa61e1fe097b0b876ad02d2b123c Reviewed-on: https://pdfium-review.googlesource.com/32153 Commit-Queue: Lei Zhang <thestig@chromium.org> Reviewed-by: Ryan Harrison <rharrison@chromium.org>
2018-05-07Initialize CPDF_TextPage members in the header.Lei Zhang
Change-Id: I667a3cd696d44692fa3d73bdee7c2f48d3039255 Reviewed-on: https://pdfium-review.googlesource.com/32152 Commit-Queue: Ryan Harrison <rharrison@chromium.org> Reviewed-by: Ryan Harrison <rharrison@chromium.org>
2018-04-30Add CPDF_TextPage::GetPrevCharInfo() helper method.Lei Zhang
Change-Id: Ie5bea82757682390b274ad2da77d1686cc597046 Reviewed-on: https://pdfium-review.googlesource.com/31657 Reviewed-by: Ryan Harrison <rharrison@chromium.org> Commit-Queue: Lei Zhang <thestig@chromium.org>
2018-04-30Fix some nits in CPDF_TextPage.Lei Zhang
Change-Id: Ib0b1d014af31493c73a74d81c1f3454a203da949 Reviewed-on: https://pdfium-review.googlesource.com/31655 Commit-Queue: Ryan Harrison <rharrison@chromium.org> Reviewed-by: Ryan Harrison <rharrison@chromium.org>
2018-03-19Remove non-const ref parameters in CPDF_TextPage.Lei Zhang
Change-Id: Ia6cb19269ef93440669b68d76c3c378a5d4da7a5 Reviewed-on: https://pdfium-review.googlesource.com/28735 Reviewed-by: Henrique Nakashima <hnakashima@chromium.org> Commit-Queue: Lei Zhang <thestig@chromium.org>
2018-01-30Use unsigned for char widthchromium/3335Nicolas Pena
Bug: 806612 Change-Id: I22bd9046dd37a1b596762c46a6b29a323d6e9fa1 Reviewed-on: https://pdfium-review.googlesource.com/24410 Reviewed-by: dsinclair <dsinclair@chromium.org> Commit-Queue: Nicolás Peña Moreno <npm@chromium.org>
2018-01-11Change FPDFText_GetRect() to return a boolean.Lei Zhang
BUG=pdfium:858 Change-Id: Idc9900fe6f85b1fef06c97f5023653f77156d410 Reviewed-on: https://pdfium-review.googlesource.com/22730 Commit-Queue: dsinclair <dsinclair@chromium.org> Reviewed-by: dsinclair <dsinclair@chromium.org>
2017-11-30Rewrite lower level details of extracting text from pageRyan Harrison
The current implementation of text extraction was difficult to understand, duplicated logic that existed in other methods, and wasn't clear about the units the inputs were in. It also didn't handle control characters correctly. The new implementation leans on the methods for converting indices between the text buffer index and character list index spaces to avoid duplication of code. It also makes it clear to the reader that inputs are in the character list index space. Finally, it fixes issues being seen in Chrome with respect of ranges being slightly off. This CL also adds a test for extracting text that has control characters. BUG=pdfium:942,chromium:654578 Change-Id: Id9d1f360c2d7492c7b5a48d6c9ae29f530892742 Reviewed-on: https://pdfium-review.googlesource.com/20014 Commit-Queue: Ryan Harrison <rharrison@chromium.org> Reviewed-by: dsinclair <dsinclair@chromium.org> Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
2017-11-27Convert CPDF_TextObject to not use CollectionSizeDan Sinclair
This CL updates various methods in CPDF_TextObject to return or received size_t values. Callers have been updated as needed. Bug: pdfium:774 Change-Id: Id72511bc74637c6261add39f5414c9a4b8390b82 Reviewed-on: https://pdfium-review.googlesource.com/19430 Commit-Queue: dsinclair <dsinclair@chromium.org> Reviewed-by: Ryan Harrison <rharrison@chromium.org>
2017-10-02Removing unused definesDan Sinclair
Remove unused defines. Change-Id: Ibf10d8470f19cbf4528fe1342398a39ef15c1d12 Reviewed-on: https://pdfium-review.googlesource.com/15110 Commit-Queue: dsinclair <dsinclair@chromium.org> Reviewed-by: Ryan Harrison <rharrison@chromium.org>
2017-09-21Move CFX_UnownedPtr to UnownedPtrDan Sinclair
This CL moves CFX_UnownedPtr to UnownedPtr and places in the fxcrt namespace. Bug: pdfium:898 Change-Id: I6d1fa463f365e5cb3aafa8c8a7a5f7eff62ed8e0 Reviewed-on: https://pdfium-review.googlesource.com/14620 Reviewed-by: Tom Sepez <tsepez@chromium.org> Commit-Queue: dsinclair <dsinclair@chromium.org>
2017-09-18Convert string class namesRyan Harrison
Automated using git grep & sed. Replace StringC classes with StringView classes. Remove the CFX_ prefix and put string classes in fxcrt namespace. Change AsStringC() to AsStringView(). Rename tests from TEST(fxcrt, *String*Foo) to TEST(*String*, Foo). Couple of tests needed to have their names regularlized. BUG=pdfium:894 Change-Id: I7ca038685c8d803795f3ed02545124f7a224c83d Reviewed-on: https://pdfium-review.googlesource.com/14151 Reviewed-by: Tom Sepez <tsepez@chromium.org> Commit-Queue: Ryan Harrison <rharrison@chromium.org>
2017-09-13Rewrite IsHyphen using string operationsRyan Harrison
The existing code did end of range checks by making sure that the value was never less then 0. This isn't correct when using an unsigned type, since 0 - 1 will wrap around to the max possible value, and thus still be less then 0. Additionally the existing code was hard to follow due to the complexity of some of the low level operations being performed. It has been rewritten using higher level string operations to make it clearer and correct. BUG=chromium:763256 Change-Id: Ib8bf5ca0e29e73724c4a1c4781362e8a8fc30149 Reviewed-on: https://pdfium-review.googlesource.com/13690 Commit-Queue: Ryan Harrison <rharrison@chromium.org> Reviewed-by: Tom Sepez <tsepez@chromium.org>
2017-08-30Move CFX_WideTextBuf out of fx_basicDan Sinclair
This CL moves CFX_WideTextBuf to its own files and updates includes as needed. Change-Id: Ibe66ecf3e66f8f01dd8e9eaf6b467588be86ad4f Reviewed-on: https://pdfium-review.googlesource.com/12413 Reviewed-by: Tom Sepez <tsepez@chromium.org> Commit-Queue: dsinclair <dsinclair@chromium.org>
2017-05-25Mass conversion of remaining class members (non-xfa)Tom Sepez
Change-Id: I8365ba80e3395d59a3cf35dbd9d9162e86e712e3 Reviewed-on: https://pdfium-review.googlesource.com/5970 Commit-Queue: Tom Sepez <tsepez@chromium.org> Reviewed-by: Lei Zhang <thestig@chromium.org>
2017-05-24Convert to CFX_UnownedPtr, part 5Tom Sepez
Change-Id: Ibdb20fca7e4daae9d61286df4801ac02faf3b281 Reviewed-on: https://pdfium-review.googlesource.com/5831 Commit-Queue: Lei Zhang <thestig@chromium.org> Reviewed-by: Lei Zhang <thestig@chromium.org>
2017-03-17Handle web links across lineschromium/3045Wei Li
When a web link has a hyphen at the end of line, we consider it to be continued to the next line. For example, "http://www.abc.com/my-\r\ntest" should be extracted as "http://www.abc.com/my-test". BUG=pdfium:650 Change-Id: I64a93d9c66faf2be0abdaf8cfe8ee496c435d0ca Reviewed-on: https://pdfium-review.googlesource.com/3092 Commit-Queue: Wei Li <weili@chromium.org> Reviewed-by: Lei Zhang <thestig@chromium.org>
2017-03-14Replace FX_FLOAT with underlying float type.Dan Sinclair
Change-Id: I158b7d80b0ec28b742a9f2d5a96f3dde7fb3ab56 Reviewed-on: https://pdfium-review.googlesource.com/3031 Commit-Queue: dsinclair <dsinclair@chromium.org> Reviewed-by: Tom Sepez <tsepez@chromium.org> Reviewed-by: Nicolás Peña <npm@chromium.org>
2017-03-14Replace FX_CHAR and FX_WCHAR with underlying types.Dan Sinclair
Change-Id: I96e0a20d66b9184d22f64d8e4ce0dadd5a78c1e8 Reviewed-on: https://pdfium-review.googlesource.com/2967 Commit-Queue: dsinclair <dsinclair@chromium.org> Reviewed-by: Tom Sepez <tsepez@chromium.org>
2017-02-21Remove non CFX_PointF GetIndexAtPosDan Sinclair
This Cl removes the overload that takes a x,y position and uses the CFX_PointF version in all cases. Change-Id: Iebd048d91a1e1ed1c90ec0019e19750afe06e745 Reviewed-on: https://pdfium-review.googlesource.com/2812 Commit-Queue: dsinclair <dsinclair@chromium.org> Reviewed-by: Tom Sepez <tsepez@chromium.org> Reviewed-by: Nicolás Peña <npm@chromium.org>
2017-02-16Convert CPDF_TextPage classes to CFX_PointFchromium/3015Dan Sinclair
This CL converts the FX_FLOAT Origin values to CFX_PointF. Change-Id: Ic47d2bae1dd185fbc4ae88ac15fdd0fdf293c7f9 Reviewed-on: https://pdfium-review.googlesource.com/2716 Commit-Queue: dsinclair <dsinclair@chromium.org> Reviewed-by: Nicolás Peña <npm@chromium.org>
2017-02-14Reland "Convert CFX_FloatPoint to CFX_PointF"Dan Sinclair
This CL updates the CFX_FloatPoint Cl to accommodate for the Origin CL being reverted. Change-Id: I345fe1117938a49ad9ee5f310fe7b5e21d9f1948 Reviewed-on: https://pdfium-review.googlesource.com/2697 Reviewed-by: Nicolás Peña <npm@chromium.org> Commit-Queue: dsinclair <dsinclair@chromium.org>
2017-02-14Revert "Convert Origins to points"Dan Sinclair
This reverts commit da83d3a5cc09c4056310b3cf299dbbccd5c70d11. Reason for revert: Reverting chain to see if fixes Chrome roll. Original change's description: > Convert Origins to points > > This CL converts various OriginX, OriginY pairs into CFX_PointF objects. > > Change-Id: I9141f7fc713c710b2014d4fdcdec7dc93501f844 > Reviewed-on: https://pdfium-review.googlesource.com/2575 > Commit-Queue: dsinclair <dsinclair@chromium.org> > Reviewed-by: Nicolás Peña <npm@chromium.org> > TBR=tsepez@chromium.org,dsinclair@chromium.org,npm@chromium.org,pdfium-reviews@googlegroups.com NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG= Change-Id: I949fb4ec712e2587e7d0ef0191c34db198b61dcc Reviewed-on: https://pdfium-review.googlesource.com/2696 Reviewed-by: dsinclair <dsinclair@chromium.org> Commit-Queue: dsinclair <dsinclair@chromium.org>
2017-02-14Revert "Convert CFX_FloatPoint to CFX_PointF"dsinclair
This reverts commit 4797c4240cb9e2d8cd36c583d46cd52ff94af95d. Reason for revert: Reverting chain to see if fixes Chrome roll. Original change's description: > Convert CFX_FloatPoint to CFX_PointF > > The two classes store the same information, remove the CFX_FloatPoint variant. > > Change-Id: Ie598c2ba5af04fb2bb3347dd48c30fd5e4845e62 > Reviewed-on: https://pdfium-review.googlesource.com/2612 > Commit-Queue: dsinclair <dsinclair@chromium.org> > Reviewed-by: Tom Sepez <tsepez@chromium.org> > TBR=tsepez@chromium.org,dsinclair@chromium.org,pdfium-reviews@googlegroups.com NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true Change-Id: Ia42074e706983c62d2e57497c3079b3c338343a3 Reviewed-on: https://pdfium-review.googlesource.com/2694 Commit-Queue: dsinclair <dsinclair@chromium.org> Reviewed-by: dsinclair <dsinclair@chromium.org>
2017-02-13Convert CFX_FloatPoint to CFX_PointFDan Sinclair
The two classes store the same information, remove the CFX_FloatPoint variant. Change-Id: Ie598c2ba5af04fb2bb3347dd48c30fd5e4845e62 Reviewed-on: https://pdfium-review.googlesource.com/2612 Commit-Queue: dsinclair <dsinclair@chromium.org> Reviewed-by: Tom Sepez <tsepez@chromium.org>
2017-02-13Convert Origins to pointsDan Sinclair
This CL converts various OriginX, OriginY pairs into CFX_PointF objects. Change-Id: I9141f7fc713c710b2014d4fdcdec7dc93501f844 Reviewed-on: https://pdfium-review.googlesource.com/2575 Commit-Queue: dsinclair <dsinclair@chromium.org> Reviewed-by: Nicolás Peña <npm@chromium.org>
2017-01-09Remove CFX_ArrayTemplate from fpdftext and fxcodec.tsepez
Remove unused m_Segments. Review-Url: https://codereview.chromium.org/2618863004
2016-11-02Remove FX_BOOL from coretsepez
Review-Url: https://codereview.chromium.org/2477443002
2016-10-04Move core/fpdfapi/fpdf_page to core/fpdfapi/pagedsinclair
BUG=pdfium:603 Review-Url: https://codereview.chromium.org/2386423004
2016-09-29Move core/fxcrt/include to core/fxcrtdsinclair
BUG=pdfium:611 Review-Url: https://codereview.chromium.org/2382723003
2016-09-29Move core/fpdftext/include to core/fpdftextdsinclair
BUG=pdfium:611 Review-Url: https://codereview.chromium.org/2383563002