summaryrefslogtreecommitdiff
path: root/core/fpdftext/cpdf_textpagefind.cpp
AgeCommit message (Collapse)Author
2017-12-01Get rid of else after break/continue/return.chromium/3284chromium/3283Lei Zhang
Change-Id: I3efc57cd7325d16e3ca8ebdeeaec06012b2c56e3 Reviewed-on: https://pdfium-review.googlesource.com/20110 Reviewed-by: Henrique Nakashima <hnakashima@chromium.org> Commit-Queue: Lei Zhang <thestig@chromium.org>
2017-11-30Rewrite lower level details of extracting text from pageRyan Harrison
The current implementation of text extraction was difficult to understand, duplicated logic that existed in other methods, and wasn't clear about the units the inputs were in. It also didn't handle control characters correctly. The new implementation leans on the methods for converting indices between the text buffer index and character list index spaces to avoid duplication of code. It also makes it clear to the reader that inputs are in the character list index space. Finally, it fixes issues being seen in Chrome with respect of ranges being slightly off. This CL also adds a test for extracting text that has control characters. BUG=pdfium:942,chromium:654578 Change-Id: Id9d1f360c2d7492c7b5a48d6c9ae29f530892742 Reviewed-on: https://pdfium-review.googlesource.com/20014 Commit-Queue: Ryan Harrison <rharrison@chromium.org> Reviewed-by: dsinclair <dsinclair@chromium.org> Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
2017-09-27Remove FXSYS_strlen and FXSYS_wcslenchromium/3226Ryan Harrison
With the conversion of internal string sizes to size_t, these wrappers are no longer needed. Replacing them with strlen and wcslen respectively. BUG=pdfium:828 Change-Id: Ia087ca2ddaf688a57ec9bd9ddfb8533cbe41510d Reviewed-on: https://pdfium-review.googlesource.com/14890 Commit-Queue: Ryan Harrison <rharrison@chromium.org> Reviewed-by: dsinclair <dsinclair@chromium.org>
2017-09-27Remove FX_STRSIZE and replace with size_tRyan Harrison
BUG=pdfium:828 Change-Id: I5c40237433ebabaeabdb43aec9cdf783e41dfe16 Reviewed-on: https://pdfium-review.googlesource.com/13230 Reviewed-by: dsinclair <dsinclair@chromium.org> Commit-Queue: Ryan Harrison <rharrison@chromium.org>
2017-09-18Convert string class namesRyan Harrison
Automated using git grep & sed. Replace StringC classes with StringView classes. Remove the CFX_ prefix and put string classes in fxcrt namespace. Change AsStringC() to AsStringView(). Rename tests from TEST(fxcrt, *String*Foo) to TEST(*String*, Foo). Couple of tests needed to have their names regularlized. BUG=pdfium:894 Change-Id: I7ca038685c8d803795f3ed02545124f7a224c83d Reviewed-on: https://pdfium-review.googlesource.com/14151 Reviewed-by: Tom Sepez <tsepez@chromium.org> Commit-Queue: Ryan Harrison <rharrison@chromium.org>
2017-08-28Convert find markers to Optionals in CPDF_TextPageFindRyan Harrison
Currently these use -1 as a special value to indicate not set. This creates the same issues that FX_STRNPOS created for converting FX_STRSIZE to size_t, so this code has been rewritten. BUG=pdfium:828 Change-Id: Iaaa96af0dcb2eb8b600f3ea39060a398ac9a3800 Reviewed-on: https://pdfium-review.googlesource.com/12130 Reviewed-by: Tom Sepez <tsepez@chromium.org> Commit-Queue: Ryan Harrison <rharrison@chromium.org>
2017-08-23Convert string Find methods to return an OptionalRyan Harrison
The Find and ReverseFind methods for WideString, WideStringC, ByteString, and ByteStringC have been converted from returning a raw FX_STRSIZE, to returning Optional<FX_STRSIZE>, so that success/failure can be indicated without using FX_STRNPOS. This allows for removing FX_STRNPOS and by association makes the conversion of FX_STRSIZE to size_t easier, since it forces checking the return value of Find to be explictly done as well as taking the error value out of the range of FX_STRSIZE. New Contains methods have been added for cases where the success or failure is all the call site to Find cared about, and the actual position was ignored. BUG=pdfium:828 Change-Id: Id827e508c8660affa68cc08a13d96121369364b7 Reviewed-on: https://pdfium-review.googlesource.com/11350 Commit-Queue: Ryan Harrison <rharrison@chromium.org> Reviewed-by: dsinclair <dsinclair@chromium.org>
2017-08-15Remove GetAt from string classesRyan Harrison
This method duplicates the behaviour of the const [] operator and doesn't offer any additional safety. Folding them into one implementation. SetAt is retained, since implementing the non-const [] operator to replace SetAt has potential performance concerns. Specifically many non-obvious cases of reading an element using [] will cause a realloc & copy. BUG=pdfium:860 Change-Id: I3ef5e5e5a15376f040256b646eb0d90636e24b67 Reviewed-on: https://pdfium-review.googlesource.com/10870 Commit-Queue: Ryan Harrison <rharrison@chromium.org> Reviewed-by: Tom Sepez <tsepez@chromium.org>
2017-08-01Remove support for negative params to ReleaseBuffer()Ryan Harrison
This CL removes the default param value for this method, which was negative. It also adds in a method to get buffer lengths, so that the callsites can explictly passing in the length of the buffer if they were using the default value previously. BUG=pdfium:828 Change-Id: I0170771ee81970b8b601631015ab3e6e39fea8ea Reviewed-on: https://pdfium-review.googlesource.com/9790 Reviewed-by: Tom Sepez <tsepez@chromium.org> Commit-Queue: Ryan Harrison <rharrison@chromium.org>
2017-08-01Replace raw value for constant error value in string operationsRyan Harrison
Currently Find() and other methods that return a FX_STRSIZE return -1 to indicate error/failure. This means that there is a lot of magic numbers and magic checks floating around. The standard library for similar operations uses a npos constant. This CL implements FX_STRNPOS, and replaces usages of magic number checking. It also does some type cleanup along the way where it was obvious that FX_STRSIZE should be being used. Removing the magic numbers should make eventually changing FX_STRSIZE to be unsigned easier in the future. BUG=pdfium:828 Change-Id: I67e481e44cf2f75a1698afa8fbee4f375a74c490 Reviewed-on: https://pdfium-review.googlesource.com/9651 Commit-Queue: Ryan Harrison <rharrison@chromium.org> Reviewed-by: Tom Sepez <tsepez@chromium.org>
2017-07-28Convert calls to Mid() to Left() or Right() if possibleRyan Harrison
The various string/byte classes support Mid(), Left(), and Right() for extracting substrings. Mid() can handle all possible cases, but Left() and Right() are useful for common cases and more explicit about what is going on. Calls like Mid(offset, length - offset) can be converted to Right(length - offset). Calls like Mid(0, length) can be converted to Left(length). If the substring being extracted does not extend all the way to one of the edges of the string, then Mid() still needs to be used. BUG=pdfium:828 Change-Id: I2ec46ad3d71aac0f7b513e103c69cbe8c854cf62 Reviewed-on: https://pdfium-review.googlesource.com/9510 Commit-Queue: Ryan Harrison <rharrison@chromium.org> Reviewed-by: Tom Sepez <tsepez@chromium.org>
2017-05-24Convert to CFX_UnownedPtr, part 5Tom Sepez
Change-Id: Ibdb20fca7e4daae9d61286df4801ac02faf3b281 Reviewed-on: https://pdfium-review.googlesource.com/5831 Commit-Queue: Lei Zhang <thestig@chromium.org> Reviewed-by: Lei Zhang <thestig@chromium.org>
2017-04-21Replace FXSYS_iswdigit with std::iswdigit.Lei Zhang
Replace other one-off implementations as well. Change-Id: I2878f3fae479c12b7de5234ee3a26477d602d14d Reviewed-on: https://pdfium-review.googlesource.com/4398 Commit-Queue: Lei Zhang <thestig@chromium.org> Reviewed-by: Tom Sepez <tsepez@chromium.org>
2017-04-03Drop FXSYS_ from mem methodsDan Sinclair
This Cl drops the FXSYS_ from mem methods which are the same on all platforms. Bug: pdfium:694 Change-Id: I9d5ae905997dbaaec5aa0b2ae4c07358ed9c6236 Reviewed-on: https://pdfium-review.googlesource.com/3613 Reviewed-by: Tom Sepez <tsepez@chromium.org> Commit-Queue: dsinclair <dsinclair@chromium.org>
2017-03-14Replace FX_CHAR and FX_WCHAR with underlying types.Dan Sinclair
Change-Id: I96e0a20d66b9184d22f64d8e4ce0dadd5a78c1e8 Reviewed-on: https://pdfium-review.googlesource.com/2967 Commit-Queue: dsinclair <dsinclair@chromium.org> Reviewed-by: Tom Sepez <tsepez@chromium.org>
2016-11-02Remove FX_BOOL from coretsepez
Review-Url: https://codereview.chromium.org/2477443002
2016-10-26Fix some bool/int mismatches.tsepez
Found by winxfa bot when fx_bool defined to bool. Review-Url: https://codereview.chromium.org/2449293002
2016-09-29Move core/fxcrt/include to core/fxcrtdsinclair
BUG=pdfium:611 Review-Url: https://codereview.chromium.org/2382723003
2016-09-29Move core/fpdftext/include to core/fpdftextdsinclair
BUG=pdfium:611 Review-Url: https://codereview.chromium.org/2383563002
2016-08-26Move the classes in fpdf_text_int.cpp into their own filesnpm
- CPDF_TextPageFind and CPDF_LinkExtract moved to new file - fpdf_text_int.cpp renamed to be CPDF_TextPage definition Review-Url: https://codereview.chromium.org/2286723003