summaryrefslogtreecommitdiff
path: root/core/fpdftext
AgeCommit message (Collapse)Author
2018-01-30Use unsigned for char widthchromium/3335Nicolas Pena
Bug: 806612 Change-Id: I22bd9046dd37a1b596762c46a6b29a323d6e9fa1 Reviewed-on: https://pdfium-review.googlesource.com/24410 Reviewed-by: dsinclair <dsinclair@chromium.org> Commit-Queue: Nicolás Peña Moreno <npm@chromium.org>
2018-01-29Fix typo introduced in cleanup of IsHyphenRyan Harrison
Introduced here https://pdfium-review.googlesource.com/#/c/17950/5/core/fpdftext/cpdf_textpage.cpp@1237 BUG=chromium:805881 Change-Id: I0c9109f3eebec968360734ff4d9d0542881d6823 Reviewed-on: https://pdfium-review.googlesource.com/24210 Commit-Queue: dsinclair <dsinclair@chromium.org> Reviewed-by: Lei Zhang <thestig@chromium.org> Reviewed-by: dsinclair <dsinclair@chromium.org>
2018-01-11Change FPDFText_GetRect() to return a boolean.Lei Zhang
BUG=pdfium:858 Change-Id: Idc9900fe6f85b1fef06c97f5023653f77156d410 Reviewed-on: https://pdfium-review.googlesource.com/22730 Commit-Queue: dsinclair <dsinclair@chromium.org> Reviewed-by: dsinclair <dsinclair@chromium.org>
2018-01-05Convert ExtractSubString to return Optional<WideString>Ryan Harrison
Change-Id: I48f27026292917e6f6e6b636afd499336e41afea Reviewed-on: https://pdfium-review.googlesource.com/22310 Reviewed-by: Henrique Nakashima <hnakashima@chromium.org> Commit-Queue: Ryan Harrison <rharrison@chromium.org>
2018-01-04Convert usages of pdfium::Optional to OptionalRyan Harrison
Change-Id: I29769f78eaad10c6a8b79e27524336c4f330377e Reviewed-on: https://pdfium-review.googlesource.com/22258 Reviewed-by: Tom Sepez <tsepez@chromium.org> Reviewed-by: dsinclair <dsinclair@chromium.org> Commit-Queue: Ryan Harrison <rharrison@chromium.org>
2017-12-01Get rid of else after break/continue/return.chromium/3284chromium/3283Lei Zhang
Change-Id: I3efc57cd7325d16e3ca8ebdeeaec06012b2c56e3 Reviewed-on: https://pdfium-review.googlesource.com/20110 Reviewed-by: Henrique Nakashima <hnakashima@chromium.org> Commit-Queue: Lei Zhang <thestig@chromium.org>
2017-11-30Rewrite lower level details of extracting text from pageRyan Harrison
The current implementation of text extraction was difficult to understand, duplicated logic that existed in other methods, and wasn't clear about the units the inputs were in. It also didn't handle control characters correctly. The new implementation leans on the methods for converting indices between the text buffer index and character list index spaces to avoid duplication of code. It also makes it clear to the reader that inputs are in the character list index space. Finally, it fixes issues being seen in Chrome with respect of ranges being slightly off. This CL also adds a test for extracting text that has control characters. BUG=pdfium:942,chromium:654578 Change-Id: Id9d1f360c2d7492c7b5a48d6c9ae29f530892742 Reviewed-on: https://pdfium-review.googlesource.com/20014 Commit-Queue: Ryan Harrison <rharrison@chromium.org> Reviewed-by: dsinclair <dsinclair@chromium.org> Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
2017-11-27Convert CPDF_TextObject to not use CollectionSizeDan Sinclair
This CL updates various methods in CPDF_TextObject to return or received size_t values. Callers have been updated as needed. Bug: pdfium:774 Change-Id: Id72511bc74637c6261add39f5414c9a4b8390b82 Reviewed-on: https://pdfium-review.googlesource.com/19430 Commit-Queue: dsinclair <dsinclair@chromium.org> Reviewed-by: Ryan Harrison <rharrison@chromium.org>
2017-11-22Change CPDF_ContentMark to return size_t for counts.Lei Zhang
Change-Id: I45468fa7944290fbbe3d2e67f884164ae8d84160 Reviewed-on: https://pdfium-review.googlesource.com/19171 Reviewed-by: Tom Sepez <tsepez@chromium.org> Commit-Queue: Lei Zhang <thestig@chromium.org>
2017-11-08Make sure that iterator stays in bounds during IsHypenRyan Harrison
BUG=chromium:782596,chromium:781804 Change-Id: I020be3cf813221bb8314f045d83014a25cb9a950 Reviewed-on: https://pdfium-review.googlesource.com/18070 Reviewed-by: dsinclair <dsinclair@chromium.org> Commit-Queue: dsinclair <dsinclair@chromium.org>
2017-11-07Revert cleanup of IsHyphen and reimplementRyan Harrison
The original version of this code was landed in https://pdfium-review.googlesource.com/c/pdfium/+/13690/. A corner case has been found that breaks. In this CL I have reverted the changes in IsHyphen and implemented a less aggressive cleanup that I have tested works as expected. BUG=chromium:781804 Change-Id: I3b36f420834081fdd9e1ae17efc234b561b4df41 Reviewed-on: https://pdfium-review.googlesource.com/17950 Commit-Queue: Ryan Harrison <rharrison@chromium.org> Reviewed-by: dsinclair <dsinclair@chromium.org>
2017-11-02Encapsulate CPDF_FormObject members.Lei Zhang
Also remove a conditional in the CPDF_Form ctor that cannot be true. Change-Id: Icd00233969cea33e9c63d0d6a9d07226c2b173f2 Reviewed-on: https://pdfium-review.googlesource.com/17070 Reviewed-by: dsinclair <dsinclair@chromium.org> Commit-Queue: Lei Zhang <thestig@chromium.org>
2017-10-23Fix cpdf_textpage so it doesn't omit spaces.Andrew Weintraub
Bug: pdfium:921 Change-Id: I8864fd2ebdccc5f94aaf70cd8295068bf4db8b68 Reviewed-on: https://pdfium-review.googlesource.com/16492 Reviewed-by: Lei Zhang <thestig@chromium.org> Reviewed-by: Ryan Harrison <rharrison@chromium.org> Commit-Queue: Ryan Harrison <rharrison@chromium.org>
2017-10-02Removing unused definesDan Sinclair
Remove unused defines. Change-Id: Ibf10d8470f19cbf4528fe1342398a39ef15c1d12 Reviewed-on: https://pdfium-review.googlesource.com/15110 Commit-Queue: dsinclair <dsinclair@chromium.org> Reviewed-by: Ryan Harrison <rharrison@chromium.org>
2017-09-27Remove FXSYS_strlen and FXSYS_wcslenchromium/3226Ryan Harrison
With the conversion of internal string sizes to size_t, these wrappers are no longer needed. Replacing them with strlen and wcslen respectively. BUG=pdfium:828 Change-Id: Ia087ca2ddaf688a57ec9bd9ddfb8533cbe41510d Reviewed-on: https://pdfium-review.googlesource.com/14890 Commit-Queue: Ryan Harrison <rharrison@chromium.org> Reviewed-by: dsinclair <dsinclair@chromium.org>
2017-09-27Remove FX_STRSIZE and replace with size_tRyan Harrison
BUG=pdfium:828 Change-Id: I5c40237433ebabaeabdb43aec9cdf783e41dfe16 Reviewed-on: https://pdfium-review.googlesource.com/13230 Reviewed-by: dsinclair <dsinclair@chromium.org> Commit-Queue: Ryan Harrison <rharrison@chromium.org>
2017-09-21Move CFX_UnownedPtr to UnownedPtrDan Sinclair
This CL moves CFX_UnownedPtr to UnownedPtr and places in the fxcrt namespace. Bug: pdfium:898 Change-Id: I6d1fa463f365e5cb3aafa8c8a7a5f7eff62ed8e0 Reviewed-on: https://pdfium-review.googlesource.com/14620 Reviewed-by: Tom Sepez <tsepez@chromium.org> Commit-Queue: dsinclair <dsinclair@chromium.org>
2017-09-18Convert string class namesRyan Harrison
Automated using git grep & sed. Replace StringC classes with StringView classes. Remove the CFX_ prefix and put string classes in fxcrt namespace. Change AsStringC() to AsStringView(). Rename tests from TEST(fxcrt, *String*Foo) to TEST(*String*, Foo). Couple of tests needed to have their names regularlized. BUG=pdfium:894 Change-Id: I7ca038685c8d803795f3ed02545124f7a224c83d Reviewed-on: https://pdfium-review.googlesource.com/14151 Reviewed-by: Tom Sepez <tsepez@chromium.org> Commit-Queue: Ryan Harrison <rharrison@chromium.org>
2017-09-13Rewrite IsHyphen using string operationsRyan Harrison
The existing code did end of range checks by making sure that the value was never less then 0. This isn't correct when using an unsigned type, since 0 - 1 will wrap around to the max possible value, and thus still be less then 0. Additionally the existing code was hard to follow due to the complexity of some of the low level operations being performed. It has been rewritten using higher level string operations to make it clearer and correct. BUG=chromium:763256 Change-Id: Ib8bf5ca0e29e73724c4a1c4781362e8a8fc30149 Reviewed-on: https://pdfium-review.googlesource.com/13690 Commit-Queue: Ryan Harrison <rharrison@chromium.org> Reviewed-by: Tom Sepez <tsepez@chromium.org>
2017-09-05Split fx_ucd.h into fx_unicode.h and fx_ucddata.h.Tom Sepez
File naming now matches. Fix one usage not going through the accessor function. Change-Id: I5cc4986238764964f2a71807a94bd2facf517263 Reviewed-on: https://pdfium-review.googlesource.com/12930 Commit-Queue: dsinclair <dsinclair@chromium.org> Reviewed-by: dsinclair <dsinclair@chromium.org>
2017-09-01Adjust loops in preperation for FX_STRSIZE int->size_tRyan Harrison
Adjust loop conditions and behaviours in preperation for convering the underlying type of FX_STRSIZE to size_t. These changes are not dependent on the type switch occuring, so can be landed before hand. BUG=pdfium:828 Change-Id: I5f950c99c10e5ef0836959e3b1dd2e09f8f5afc0 Reviewed-on: https://pdfium-review.googlesource.com/12750 Commit-Queue: Ryan Harrison <rharrison@chromium.org> Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
2017-08-31Remove fx_basic.hDan Sinclair
This CL removes the fx_basic.h header and fixes up includes as needed. Change-Id: I49af32a8327bdbcda40c50a61ffbd75d06609040 Reviewed-on: https://pdfium-review.googlesource.com/12670 Commit-Queue: dsinclair <dsinclair@chromium.org> Reviewed-by: Tom Sepez <tsepez@chromium.org>
2017-08-30Move CFX_WideTextBuf out of fx_basicDan Sinclair
This CL moves CFX_WideTextBuf to its own files and updates includes as needed. Change-Id: Ibe66ecf3e66f8f01dd8e9eaf6b467588be86ad4f Reviewed-on: https://pdfium-review.googlesource.com/12413 Reviewed-by: Tom Sepez <tsepez@chromium.org> Commit-Queue: dsinclair <dsinclair@chromium.org>
2017-08-30Convert int* references to FX_STRSIZERyan Harrison
Through out the code base there are numerous places where variables are declared using a signed integer type when interacting with the string classes, since they assume that FX_STRSIZE is 'int'. As part of changing the underling type of FX_STRSIZE to be unsigned, these locations are being changed to use FX_STRSIZE. This is necessary as part of converting the type, but has been broken off into a separate CL, since it should be low risk. Some related cleanups that are low risk are included as part of this CL. BUG=pdfium:828 Change-Id: Ifaae54ad195ccde0fe8672f71271d29a6ebd65fd Reviewed-on: https://pdfium-review.googlesource.com/12210 Reviewed-by: Tom Sepez <tsepez@chromium.org> Reviewed-by: Henrique Nakashima <hnakashima@chromium.org> Reviewed-by: dsinclair <dsinclair@chromium.org> Commit-Queue: Ryan Harrison <rharrison@chromium.org>
2017-08-28Convert find markers to Optionals in CPDF_TextPageFindRyan Harrison
Currently these use -1 as a special value to indicate not set. This creates the same issues that FX_STRNPOS created for converting FX_STRSIZE to size_t, so this code has been rewritten. BUG=pdfium:828 Change-Id: Iaaa96af0dcb2eb8b600f3ea39060a398ac9a3800 Reviewed-on: https://pdfium-review.googlesource.com/12130 Reviewed-by: Tom Sepez <tsepez@chromium.org> Commit-Queue: Ryan Harrison <rharrison@chromium.org>
2017-08-23Convert string Find methods to return an OptionalRyan Harrison
The Find and ReverseFind methods for WideString, WideStringC, ByteString, and ByteStringC have been converted from returning a raw FX_STRSIZE, to returning Optional<FX_STRSIZE>, so that success/failure can be indicated without using FX_STRNPOS. This allows for removing FX_STRNPOS and by association makes the conversion of FX_STRSIZE to size_t easier, since it forces checking the return value of Find to be explictly done as well as taking the error value out of the range of FX_STRSIZE. New Contains methods have been added for cases where the success or failure is all the call site to Find cared about, and the actual position was ignored. BUG=pdfium:828 Change-Id: Id827e508c8660affa68cc08a13d96121369364b7 Reviewed-on: https://pdfium-review.googlesource.com/11350 Commit-Queue: Ryan Harrison <rharrison@chromium.org> Reviewed-by: dsinclair <dsinclair@chromium.org>
2017-08-22Converted CFX_Matrix::TransformRect() to take in constsJane Liu
Currently, all three of CFX_Matrix::TransformRect() take in rect values and modify them in place. This CL converts them to take in constant values and return the transformed values instead, and fixes all the call sites. Bug=pdfium:874 Change-Id: I9c274df3b14e9d88c100ba0530068e06e8fec32b Reviewed-on: https://pdfium-review.googlesource.com/11550 Reviewed-by: dsinclair <dsinclair@chromium.org> Commit-Queue: Jane Liu <janeliulwq@google.com>
2017-08-15Remove GetAt from string classesRyan Harrison
This method duplicates the behaviour of the const [] operator and doesn't offer any additional safety. Folding them into one implementation. SetAt is retained, since implementing the non-const [] operator to replace SetAt has potential performance concerns. Specifically many non-obvious cases of reading an element using [] will cause a realloc & copy. BUG=pdfium:860 Change-Id: I3ef5e5e5a15376f040256b646eb0d90636e24b67 Reviewed-on: https://pdfium-review.googlesource.com/10870 Commit-Queue: Ryan Harrison <rharrison@chromium.org> Reviewed-by: Tom Sepez <tsepez@chromium.org>
2017-08-11Add checks of index operations on string classesRyan Harrison
Specifically the index parameter passed in to GetAt(), SetAt() and operator[] are now being tested to be in bounds. BUG=chromium:752480, pdfium:828 Change-Id: I9e94d58c98a8eaaaae53cd0e3ffe2123ea17d8c4 Reviewed-on: https://pdfium-review.googlesource.com/10651 Commit-Queue: Ryan Harrison <rharrison@chromium.org> Reviewed-by: Tom Sepez <tsepez@chromium.org>
2017-08-01Remove support for negative params to ReleaseBuffer()Ryan Harrison
This CL removes the default param value for this method, which was negative. It also adds in a method to get buffer lengths, so that the callsites can explictly passing in the length of the buffer if they were using the default value previously. BUG=pdfium:828 Change-Id: I0170771ee81970b8b601631015ab3e6e39fea8ea Reviewed-on: https://pdfium-review.googlesource.com/9790 Reviewed-by: Tom Sepez <tsepez@chromium.org> Commit-Queue: Ryan Harrison <rharrison@chromium.org>
2017-08-01Replace raw value for constant error value in string operationsRyan Harrison
Currently Find() and other methods that return a FX_STRSIZE return -1 to indicate error/failure. This means that there is a lot of magic numbers and magic checks floating around. The standard library for similar operations uses a npos constant. This CL implements FX_STRNPOS, and replaces usages of magic number checking. It also does some type cleanup along the way where it was obvious that FX_STRSIZE should be being used. Removing the magic numbers should make eventually changing FX_STRSIZE to be unsigned easier in the future. BUG=pdfium:828 Change-Id: I67e481e44cf2f75a1698afa8fbee4f375a74c490 Reviewed-on: https://pdfium-review.googlesource.com/9651 Commit-Queue: Ryan Harrison <rharrison@chromium.org> Reviewed-by: Tom Sepez <tsepez@chromium.org>
2017-07-28Convert calls to Mid() to Left() or Right() if possibleRyan Harrison
The various string/byte classes support Mid(), Left(), and Right() for extracting substrings. Mid() can handle all possible cases, but Left() and Right() are useful for common cases and more explicit about what is going on. Calls like Mid(offset, length - offset) can be converted to Right(length - offset). Calls like Mid(0, length) can be converted to Left(length). If the substring being extracted does not extend all the way to one of the edges of the string, then Mid() still needs to be used. BUG=pdfium:828 Change-Id: I2ec46ad3d71aac0f7b513e103c69cbe8c854cf62 Reviewed-on: https://pdfium-review.googlesource.com/9510 Commit-Queue: Ryan Harrison <rharrison@chromium.org> Reviewed-by: Tom Sepez <tsepez@chromium.org>
2017-07-27Simplify FX_GetMirrorChar() code.Lei Zhang
Change-Id: I43ec0d4a3b60d51c59ba5a540dfe24803e725089 Reviewed-on: https://pdfium-review.googlesource.com/9170 Reviewed-by: Nicolás Peña <npm@chromium.org> Commit-Queue: Lei Zhang <thestig@chromium.org>
2017-06-29Change SetReverse to GetInverse in CFX_MatrixNicolas Pena
CFX_Matrix::GetInverse is much clearer. Change-Id: Id10ab1723735332e1a78de853f28415ec3a4d834 Reviewed-on: https://pdfium-review.googlesource.com/7090 Reviewed-by: Lei Zhang <thestig@chromium.org> Commit-Queue: Nicolás Peña <npm@chromium.org>
2017-06-12Trimming brackets and quotes outside URLs.chromium/3129Henrique Nakashima
Bug: pdfium:655 Change-Id: Ibf4217b35b613d21d3e8e060608b502ef79acd9e Reviewed-on: https://pdfium-review.googlesource.com/6392 Commit-Queue: Henrique Nakashima <hnakashima@chromium.org> Reviewed-by: Lei Zhang <thestig@chromium.org>
2017-06-08Fixing click area for links with a URL with prepended text.chromium/3125Henrique Nakashima
Bug: pdfium:655 Change-Id: Idd90be487d390f066a76140800096feead6b9e55 Reviewed-on: https://pdfium-review.googlesource.com/6310 Reviewed-by: dsinclair <dsinclair@chromium.org> Commit-Queue: Henrique Nakashima <hnakashima@chromium.org>
2017-06-02Remove explicit CFX_Matrix identity matrix instantiations.Lei Zhang
Change-Id: I96c3429dbe2c572ed409706adfe3707b8b9bf51b Reviewed-on: https://pdfium-review.googlesource.com/6176 Reviewed-by: Tom Sepez <tsepez@chromium.org> Commit-Queue: Lei Zhang <thestig@chromium.org>
2017-05-25Mass conversion of remaining class members (non-xfa)Tom Sepez
Change-Id: I8365ba80e3395d59a3cf35dbd9d9162e86e712e3 Reviewed-on: https://pdfium-review.googlesource.com/5970 Commit-Queue: Tom Sepez <tsepez@chromium.org> Reviewed-by: Lei Zhang <thestig@chromium.org>
2017-05-25Mass conversion of all const-lifetime class membersTom Sepez
Sed + minimal conversions to compile, including moving some constructors into the .cpp file. Any that caused ASAN issues during the tests were omitted rather than trying to resolve the underlying issue. Change-Id: I00a421f33b253eb4071ffd9af3f2922c7443b335 Reviewed-on: https://pdfium-review.googlesource.com/5891 Commit-Queue: Tom Sepez <tsepez@chromium.org> Reviewed-by: Lei Zhang <thestig@chromium.org>
2017-05-25Rename CPDF_LinkExtract test file to match classDan Sinclair
Change-Id: I6200968b0c72d2de32d51a741ac821084ad84f8a Reviewed-on: https://pdfium-review.googlesource.com/5952 Reviewed-by: Nicolás Peña <npm@chromium.org> Commit-Queue: dsinclair <dsinclair@chromium.org>
2017-05-24Convert to CFX_UnownedPtr, part 5Tom Sepez
Change-Id: Ibdb20fca7e4daae9d61286df4801ac02faf3b281 Reviewed-on: https://pdfium-review.googlesource.com/5831 Commit-Queue: Lei Zhang <thestig@chromium.org> Reviewed-by: Lei Zhang <thestig@chromium.org>
2017-05-20Better identify web links by trimming irrelevant charschromium/3107Wei Li
Sometimes, web links are written with other text such as punctuations which makes the extracted web links invalid. We improve this by trimming invalid chars at the end of host name only URLs. For example, host names never ends with ';' or ','. BUG=chromium:720578 Change-Id: Id619025b2153531376d268a69a3a89c3d49fce08 Reviewed-on: https://pdfium-review.googlesource.com/5692 Commit-Queue: Wei Li <weili@chromium.org> Reviewed-by: Lei Zhang <thestig@chromium.org>
2017-05-10Replace operator bool with HasRef() in classes with a CFX_SharedCopyOnWrite ↵Lei Zhang
member. Change-Id: I51e30d298e87b9ae0d5aca83b2f1d6787efce70a Reviewed-on: https://pdfium-review.googlesource.com/5290 Commit-Queue: Lei Zhang <thestig@chromium.org> Reviewed-by: Tom Sepez <tsepez@chromium.org> Reviewed-by: Nicolás Peña <npm@chromium.org>
2017-04-25Use fx_extension.h utilities in more places.Lei Zhang
Change-Id: Iba1aa793567e69acc3cc1acbd5b9a9f531c80b7a Reviewed-on: https://pdfium-review.googlesource.com/4453 Reviewed-by: Tom Sepez <tsepez@chromium.org> Commit-Queue: Lei Zhang <thestig@chromium.org>
2017-04-21Replace FXSYS_iswdigit with std::iswdigit.Lei Zhang
Replace other one-off implementations as well. Change-Id: I2878f3fae479c12b7de5234ee3a26477d602d14d Reviewed-on: https://pdfium-review.googlesource.com/4398 Commit-Queue: Lei Zhang <thestig@chromium.org> Reviewed-by: Tom Sepez <tsepez@chromium.org>
2017-04-20Cleanup the fx_extension code.Dan Sinclair
This CL cleans up the fx_extension file. The stream code was moved to fx_stream. IFX_FileAccess was removed and CFX_CRTFileAccess split to its own file. Code shuffled from header to cpp file. Change-Id: I700fdfcc9797cf4e8050cd9ba010ad8854feefbf Reviewed-on: https://pdfium-review.googlesource.com/4371 Reviewed-by: Nicolás Peña <npm@chromium.org> Commit-Queue: dsinclair <dsinclair@chromium.org>
2017-04-07Tweak CFDF_Font::AppendChar()Tom Sepez
Pass in/out argument as a pointer. Avoid pointless malloc just to copy in multibyte case. Then we can avoid special-casing the single-byte case. Change-Id: I3dd2d57e08ef6ad7b78ea38398b228fa41a9b3e6 Reviewed-on: https://pdfium-review.googlesource.com/3950 Reviewed-by: Nicolás Peña <npm@chromium.org> Commit-Queue: Tom Sepez <tsepez@chromium.org>
2017-04-03Drop FXSYS_ from mem methodsDan Sinclair
This Cl drops the FXSYS_ from mem methods which are the same on all platforms. Bug: pdfium:694 Change-Id: I9d5ae905997dbaaec5aa0b2ae4c07358ed9c6236 Reviewed-on: https://pdfium-review.googlesource.com/3613 Reviewed-by: Tom Sepez <tsepez@chromium.org> Commit-Queue: dsinclair <dsinclair@chromium.org>
2017-04-03Drop FXSYS_ from math methodsDan Sinclair
This Cl drops the FXSYS_ from math methods which are the same on all platforms. Bug: pdfium:694 Change-Id: I85c9ff841fd9095b1434f67319847ba0cd9df7ac Reviewed-on: https://pdfium-review.googlesource.com/3598 Commit-Queue: dsinclair <dsinclair@chromium.org> Reviewed-by: Tom Sepez <tsepez@chromium.org>
2017-03-17Handle web links across lineschromium/3045Wei Li
When a web link has a hyphen at the end of line, we consider it to be continued to the next line. For example, "http://www.abc.com/my-\r\ntest" should be extracted as "http://www.abc.com/my-test". BUG=pdfium:650 Change-Id: I64a93d9c66faf2be0abdaf8cfe8ee496c435d0ca Reviewed-on: https://pdfium-review.googlesource.com/3092 Commit-Queue: Wei Li <weili@chromium.org> Reviewed-by: Lei Zhang <thestig@chromium.org>