pdfium.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message (Collapse)	Author
2017-11-30	Rewrite lower level details of extracting text from page	Ryan Harrison
	The current implementation of text extraction was difficult to understand, duplicated logic that existed in other methods, and wasn't clear about the units the inputs were in. It also didn't handle control characters correctly. The new implementation leans on the methods for converting indices between the text buffer index and character list index spaces to avoid duplication of code. It also makes it clear to the reader that inputs are in the character list index space. Finally, it fixes issues being seen in Chrome with respect of ranges being slightly off. This CL also adds a test for extracting text that has control characters. BUG=pdfium:942,chromium:654578 Change-Id: Id9d1f360c2d7492c7b5a48d6c9ae29f530892742 Reviewed-on: https://pdfium-review.googlesource.com/20014 Commit-Queue: Ryan Harrison <rharrison@chromium.org> Reviewed-by: dsinclair <dsinclair@chromium.org> Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
2017-09-27	Remove FX_STRSIZE and replace with size_t	Ryan Harrison
	BUG=pdfium:828 Change-Id: I5c40237433ebabaeabdb43aec9cdf783e41dfe16 Reviewed-on: https://pdfium-review.googlesource.com/13230 Reviewed-by: dsinclair <dsinclair@chromium.org> Commit-Queue: Ryan Harrison <rharrison@chromium.org>
2017-09-18	Convert string class names	Ryan Harrison
	Automated using git grep & sed. Replace StringC classes with StringView classes. Remove the CFX_ prefix and put string classes in fxcrt namespace. Change AsStringC() to AsStringView(). Rename tests from TEST(fxcrt, StringFoo) to TEST(String, Foo). Couple of tests needed to have their names regularlized. BUG=pdfium:894 Change-Id: I7ca038685c8d803795f3ed02545124f7a224c83d Reviewed-on: https://pdfium-review.googlesource.com/14151 Reviewed-by: Tom Sepez <tsepez@chromium.org> Commit-Queue: Ryan Harrison <rharrison@chromium.org>
2017-09-01	Adjust loops in preperation for FX_STRSIZE int->size_t	Ryan Harrison
	Adjust loop conditions and behaviours in preperation for convering the underlying type of FX_STRSIZE to size_t. These changes are not dependent on the type switch occuring, so can be landed before hand. BUG=pdfium:828 Change-Id: I5f950c99c10e5ef0836959e3b1dd2e09f8f5afc0 Reviewed-on: https://pdfium-review.googlesource.com/12750 Commit-Queue: Ryan Harrison <rharrison@chromium.org> Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
2017-08-30	Convert int* references to FX_STRSIZE	Ryan Harrison
	Through out the code base there are numerous places where variables are declared using a signed integer type when interacting with the string classes, since they assume that FX_STRSIZE is 'int'. As part of changing the underling type of FX_STRSIZE to be unsigned, these locations are being changed to use FX_STRSIZE. This is necessary as part of converting the type, but has been broken off into a separate CL, since it should be low risk. Some related cleanups that are low risk are included as part of this CL. BUG=pdfium:828 Change-Id: Ifaae54ad195ccde0fe8672f71271d29a6ebd65fd Reviewed-on: https://pdfium-review.googlesource.com/12210 Reviewed-by: Tom Sepez <tsepez@chromium.org> Reviewed-by: Henrique Nakashima <hnakashima@chromium.org> Reviewed-by: dsinclair <dsinclair@chromium.org> Commit-Queue: Ryan Harrison <rharrison@chromium.org>
2017-08-23	Convert string Find methods to return an Optional	Ryan Harrison
	The Find and ReverseFind methods for WideString, WideStringC, ByteString, and ByteStringC have been converted from returning a raw FX_STRSIZE, to returning Optional<FX_STRSIZE>, so that success/failure can be indicated without using FX_STRNPOS. This allows for removing FX_STRNPOS and by association makes the conversion of FX_STRSIZE to size_t easier, since it forces checking the return value of Find to be explictly done as well as taking the error value out of the range of FX_STRSIZE. New Contains methods have been added for cases where the success or failure is all the call site to Find cared about, and the actual position was ignored. BUG=pdfium:828 Change-Id: Id827e508c8660affa68cc08a13d96121369364b7 Reviewed-on: https://pdfium-review.googlesource.com/11350 Commit-Queue: Ryan Harrison <rharrison@chromium.org> Reviewed-by: dsinclair <dsinclair@chromium.org>
2017-08-15	Remove GetAt from string classes	Ryan Harrison
	This method duplicates the behaviour of the const [] operator and doesn't offer any additional safety. Folding them into one implementation. SetAt is retained, since implementing the non-const [] operator to replace SetAt has potential performance concerns. Specifically many non-obvious cases of reading an element using [] will cause a realloc & copy. BUG=pdfium:860 Change-Id: I3ef5e5e5a15376f040256b646eb0d90636e24b67 Reviewed-on: https://pdfium-review.googlesource.com/10870 Commit-Queue: Ryan Harrison <rharrison@chromium.org> Reviewed-by: Tom Sepez <tsepez@chromium.org>
2017-08-01	Replace raw value for constant error value in string operations	Ryan Harrison
	Currently Find() and other methods that return a FX_STRSIZE return -1 to indicate error/failure. This means that there is a lot of magic numbers and magic checks floating around. The standard library for similar operations uses a npos constant. This CL implements FX_STRNPOS, and replaces usages of magic number checking. It also does some type cleanup along the way where it was obvious that FX_STRSIZE should be being used. Removing the magic numbers should make eventually changing FX_STRSIZE to be unsigned easier in the future. BUG=pdfium:828 Change-Id: I67e481e44cf2f75a1698afa8fbee4f375a74c490 Reviewed-on: https://pdfium-review.googlesource.com/9651 Commit-Queue: Ryan Harrison <rharrison@chromium.org> Reviewed-by: Tom Sepez <tsepez@chromium.org>
2017-07-28	Convert calls to Mid() to Left() or Right() if possible	Ryan Harrison
	The various string/byte classes support Mid(), Left(), and Right() for extracting substrings. Mid() can handle all possible cases, but Left() and Right() are useful for common cases and more explicit about what is going on. Calls like Mid(offset, length - offset) can be converted to Right(length - offset). Calls like Mid(0, length) can be converted to Left(length). If the substring being extracted does not extend all the way to one of the edges of the string, then Mid() still needs to be used. BUG=pdfium:828 Change-Id: I2ec46ad3d71aac0f7b513e103c69cbe8c854cf62 Reviewed-on: https://pdfium-review.googlesource.com/9510 Commit-Queue: Ryan Harrison <rharrison@chromium.org> Reviewed-by: Tom Sepez <tsepez@chromium.org>
2017-06-12	Trimming brackets and quotes outside URLs.chromium/3129	Henrique Nakashima
	Bug: pdfium:655 Change-Id: Ibf4217b35b613d21d3e8e060608b502ef79acd9e Reviewed-on: https://pdfium-review.googlesource.com/6392 Commit-Queue: Henrique Nakashima <hnakashima@chromium.org> Reviewed-by: Lei Zhang <thestig@chromium.org>
2017-06-08	Fixing click area for links with a URL with prepended text.chromium/3125	Henrique Nakashima
	Bug: pdfium:655 Change-Id: Idd90be487d390f066a76140800096feead6b9e55 Reviewed-on: https://pdfium-review.googlesource.com/6310 Reviewed-by: dsinclair <dsinclair@chromium.org> Commit-Queue: Henrique Nakashima <hnakashima@chromium.org>
2017-05-20	Better identify web links by trimming irrelevant charschromium/3107	Wei Li
	Sometimes, web links are written with other text such as punctuations which makes the extracted web links invalid. We improve this by trimming invalid chars at the end of host name only URLs. For example, host names never ends with ';' or ','. BUG=chromium:720578 Change-Id: Id619025b2153531376d268a69a3a89c3d49fce08 Reviewed-on: https://pdfium-review.googlesource.com/5692 Commit-Queue: Wei Li <weili@chromium.org> Reviewed-by: Lei Zhang <thestig@chromium.org>
2017-04-20	Cleanup the fx_extension code.	Dan Sinclair
	This CL cleans up the fx_extension file. The stream code was moved to fx_stream. IFX_FileAccess was removed and CFX_CRTFileAccess split to its own file. Code shuffled from header to cpp file. Change-Id: I700fdfcc9797cf4e8050cd9ba010ad8854feefbf Reviewed-on: https://pdfium-review.googlesource.com/4371 Reviewed-by: Nicolás Peña <npm@chromium.org> Commit-Queue: dsinclair <dsinclair@chromium.org>
2017-03-17	Handle web links across lineschromium/3045	Wei Li
	When a web link has a hyphen at the end of line, we consider it to be continued to the next line. For example, "http://www.abc.com/my-\r\ntest" should be extracted as "http://www.abc.com/my-test". BUG=pdfium:650 Change-Id: I64a93d9c66faf2be0abdaf8cfe8ee496c435d0ca Reviewed-on: https://pdfium-review.googlesource.com/3092 Commit-Queue: Wei Li <weili@chromium.org> Reviewed-by: Lei Zhang <thestig@chromium.org>
2017-03-14	Replace FX_CHAR and FX_WCHAR with underlying types.	Dan Sinclair
	Change-Id: I96e0a20d66b9184d22f64d8e4ce0dadd5a78c1e8 Reviewed-on: https://pdfium-review.googlesource.com/2967 Commit-Queue: dsinclair <dsinclair@chromium.org> Reviewed-by: Tom Sepez <tsepez@chromium.org>
2016-11-02	Remove FX_BOOL from core	tsepez
	Review-Url: https://codereview.chromium.org/2477443002
2016-09-29	Move core/fxcrt/include to core/fxcrt	dsinclair
	BUG=pdfium:611 Review-Url: https://codereview.chromium.org/2382723003
2016-09-29	Move core/fpdftext/include to core/fpdftext	dsinclair
	BUG=pdfium:611 Review-Url: https://codereview.chromium.org/2383563002
2016-08-26	Move the classes in fpdf_text_int.cpp into their own files	npm
	- CPDF_TextPageFind and CPDF_LinkExtract moved to new file - fpdf_text_int.cpp renamed to be CPDF_TextPage definition Review-Url: https://codereview.chromium.org/2286723003