Age | Commit message (Collapse) | Author |
|
When parsing text streams there is an internal character list that is
generated of all the characters in the stream. Additionally a text
string is generated that is exposed via the public API. This string
will have all of the printing, i.e. non-control characters, in it. For
characters that are not in the font of the stream the unicode, but
printable, the character 0xFFFE is used in the text to indicate a
missing character. This a non-printing character to indicate
non-unicode.
The internal character list gets a Unicode value 0x0 when there isn't
a glyph in the font for it and the original character code is
preserved. This means that when generating the mapping between text
string and character list, the code is mistakenly thinking that the
unprintable character was not present in the text string. I have
changed the check in the mapping generation code to correctly account
for this. Additional investigation is needed to determine if inserting
0xFFFE in the text is the correct behaviour.
This patch resolves an issue where the find highlights in Chrome for a
PDF would be offset when there are unprintable characters in a stream.
BUG=pdfium:1010
Change-Id: I7547c46c5645e039a4b5138f2ce1137fa31990a5
Reviewed-on: https://pdfium-review.googlesource.com/27051
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
|
|
Bug: 806612
Change-Id: I22bd9046dd37a1b596762c46a6b29a323d6e9fa1
Reviewed-on: https://pdfium-review.googlesource.com/24410
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Nicolás Peña Moreno <npm@chromium.org>
|
|
Introduced here
https://pdfium-review.googlesource.com/#/c/17950/5/core/fpdftext/cpdf_textpage.cpp@1237
BUG=chromium:805881
Change-Id: I0c9109f3eebec968360734ff4d9d0542881d6823
Reviewed-on: https://pdfium-review.googlesource.com/24210
Commit-Queue: dsinclair <dsinclair@chromium.org>
Reviewed-by: Lei Zhang <thestig@chromium.org>
Reviewed-by: dsinclair <dsinclair@chromium.org>
|
|
BUG=pdfium:858
Change-Id: Idc9900fe6f85b1fef06c97f5023653f77156d410
Reviewed-on: https://pdfium-review.googlesource.com/22730
Commit-Queue: dsinclair <dsinclair@chromium.org>
Reviewed-by: dsinclair <dsinclair@chromium.org>
|
|
Change-Id: I48f27026292917e6f6e6b636afd499336e41afea
Reviewed-on: https://pdfium-review.googlesource.com/22310
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
|
|
Change-Id: I29769f78eaad10c6a8b79e27524336c4f330377e
Reviewed-on: https://pdfium-review.googlesource.com/22258
Reviewed-by: Tom Sepez <tsepez@chromium.org>
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
|
|
Change-Id: I3efc57cd7325d16e3ca8ebdeeaec06012b2c56e3
Reviewed-on: https://pdfium-review.googlesource.com/20110
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
The current implementation of text extraction was difficult to
understand, duplicated logic that existed in other methods, and wasn't
clear about the units the inputs were in. It also didn't handle
control characters correctly.
The new implementation leans on the methods for converting indices
between the text buffer index and character list index spaces to avoid
duplication of code. It also makes it clear to the reader that inputs
are in the character list index space. Finally, it fixes issues being
seen in Chrome with respect of ranges being slightly off.
This CL also adds a test for extracting text that has control
characters.
BUG=pdfium:942,chromium:654578
Change-Id: Id9d1f360c2d7492c7b5a48d6c9ae29f530892742
Reviewed-on: https://pdfium-review.googlesource.com/20014
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: dsinclair <dsinclair@chromium.org>
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
|
|
This CL updates various methods in CPDF_TextObject to return or received
size_t values. Callers have been updated as needed.
Bug: pdfium:774
Change-Id: Id72511bc74637c6261add39f5414c9a4b8390b82
Reviewed-on: https://pdfium-review.googlesource.com/19430
Commit-Queue: dsinclair <dsinclair@chromium.org>
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
|
|
Change-Id: I45468fa7944290fbbe3d2e67f884164ae8d84160
Reviewed-on: https://pdfium-review.googlesource.com/19171
Reviewed-by: Tom Sepez <tsepez@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
BUG=chromium:782596,chromium:781804
Change-Id: I020be3cf813221bb8314f045d83014a25cb9a950
Reviewed-on: https://pdfium-review.googlesource.com/18070
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: dsinclair <dsinclair@chromium.org>
|
|
The original version of this code was landed in
https://pdfium-review.googlesource.com/c/pdfium/+/13690/. A corner
case has been found that breaks.
In this CL I have reverted the changes in IsHyphen and implemented a
less aggressive cleanup that I have tested works as expected.
BUG=chromium:781804
Change-Id: I3b36f420834081fdd9e1ae17efc234b561b4df41
Reviewed-on: https://pdfium-review.googlesource.com/17950
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: dsinclair <dsinclair@chromium.org>
|
|
Also remove a conditional in the CPDF_Form ctor that cannot be true.
Change-Id: Icd00233969cea33e9c63d0d6a9d07226c2b173f2
Reviewed-on: https://pdfium-review.googlesource.com/17070
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
Bug: pdfium:921
Change-Id: I8864fd2ebdccc5f94aaf70cd8295068bf4db8b68
Reviewed-on: https://pdfium-review.googlesource.com/16492
Reviewed-by: Lei Zhang <thestig@chromium.org>
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
|
|
Remove unused defines.
Change-Id: Ibf10d8470f19cbf4528fe1342398a39ef15c1d12
Reviewed-on: https://pdfium-review.googlesource.com/15110
Commit-Queue: dsinclair <dsinclair@chromium.org>
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
|
|
With the conversion of internal string sizes to size_t, these wrappers
are no longer needed. Replacing them with strlen and wcslen
respectively.
BUG=pdfium:828
Change-Id: Ia087ca2ddaf688a57ec9bd9ddfb8533cbe41510d
Reviewed-on: https://pdfium-review.googlesource.com/14890
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: dsinclair <dsinclair@chromium.org>
|
|
BUG=pdfium:828
Change-Id: I5c40237433ebabaeabdb43aec9cdf783e41dfe16
Reviewed-on: https://pdfium-review.googlesource.com/13230
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
|
|
This CL moves CFX_UnownedPtr to UnownedPtr and places in the fxcrt
namespace.
Bug: pdfium:898
Change-Id: I6d1fa463f365e5cb3aafa8c8a7a5f7eff62ed8e0
Reviewed-on: https://pdfium-review.googlesource.com/14620
Reviewed-by: Tom Sepez <tsepez@chromium.org>
Commit-Queue: dsinclair <dsinclair@chromium.org>
|
|
Automated using git grep & sed.
Replace StringC classes with StringView classes.
Remove the CFX_ prefix and put string classes in fxcrt namespace.
Change AsStringC() to AsStringView().
Rename tests from TEST(fxcrt, *String*Foo) to TEST(*String*,
Foo).
Couple of tests needed to have their names regularlized.
BUG=pdfium:894
Change-Id: I7ca038685c8d803795f3ed02545124f7a224c83d
Reviewed-on: https://pdfium-review.googlesource.com/14151
Reviewed-by: Tom Sepez <tsepez@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
|
|
The existing code did end of range checks by making sure that the
value was never less then 0. This isn't correct when using an unsigned
type, since 0 - 1 will wrap around to the max possible value, and
thus still be less then 0. Additionally the existing code was hard to
follow due to the complexity of some of the low level operations being
performed.
It has been rewritten using higher level string operations to make it
clearer and correct.
BUG=chromium:763256
Change-Id: Ib8bf5ca0e29e73724c4a1c4781362e8a8fc30149
Reviewed-on: https://pdfium-review.googlesource.com/13690
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: Tom Sepez <tsepez@chromium.org>
|
|
File naming now matches.
Fix one usage not going through the accessor function.
Change-Id: I5cc4986238764964f2a71807a94bd2facf517263
Reviewed-on: https://pdfium-review.googlesource.com/12930
Commit-Queue: dsinclair <dsinclair@chromium.org>
Reviewed-by: dsinclair <dsinclair@chromium.org>
|
|
Adjust loop conditions and behaviours in preperation for convering the
underlying type of FX_STRSIZE to size_t. These changes are not
dependent on the type switch occuring, so can be landed before hand.
BUG=pdfium:828
Change-Id: I5f950c99c10e5ef0836959e3b1dd2e09f8f5afc0
Reviewed-on: https://pdfium-review.googlesource.com/12750
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
|
|
This CL removes the fx_basic.h header and fixes up includes as needed.
Change-Id: I49af32a8327bdbcda40c50a61ffbd75d06609040
Reviewed-on: https://pdfium-review.googlesource.com/12670
Commit-Queue: dsinclair <dsinclair@chromium.org>
Reviewed-by: Tom Sepez <tsepez@chromium.org>
|
|
This CL moves CFX_WideTextBuf to its own files and updates includes as
needed.
Change-Id: Ibe66ecf3e66f8f01dd8e9eaf6b467588be86ad4f
Reviewed-on: https://pdfium-review.googlesource.com/12413
Reviewed-by: Tom Sepez <tsepez@chromium.org>
Commit-Queue: dsinclair <dsinclair@chromium.org>
|
|
Through out the code base there are numerous places where variables
are declared using a signed integer type when interacting with the
string classes, since they assume that FX_STRSIZE is 'int'. As part of
changing the underling type of FX_STRSIZE to be unsigned, these
locations are being changed to use FX_STRSIZE. This is necessary as
part of converting the type, but has been broken off into a separate CL,
since it should be low risk.
Some related cleanups that are low risk are included as part of
this CL.
BUG=pdfium:828
Change-Id: Ifaae54ad195ccde0fe8672f71271d29a6ebd65fd
Reviewed-on: https://pdfium-review.googlesource.com/12210
Reviewed-by: Tom Sepez <tsepez@chromium.org>
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
|
|
Currently these use -1 as a special value to indicate not set. This
creates the same issues that FX_STRNPOS created for converting
FX_STRSIZE to size_t, so this code has been rewritten.
BUG=pdfium:828
Change-Id: Iaaa96af0dcb2eb8b600f3ea39060a398ac9a3800
Reviewed-on: https://pdfium-review.googlesource.com/12130
Reviewed-by: Tom Sepez <tsepez@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
|
|
The Find and ReverseFind methods for WideString, WideStringC,
ByteString, and ByteStringC have been converted from returning a raw
FX_STRSIZE, to returning Optional<FX_STRSIZE>, so that success/failure
can be indicated without using FX_STRNPOS.
This allows for removing FX_STRNPOS and by association makes the
conversion of FX_STRSIZE to size_t easier, since it forces checking
the return value of Find to be explictly done as well as taking the
error value out of the range of FX_STRSIZE.
New Contains methods have been added for cases where the success or
failure is all the call site to Find cared about, and the actual
position was ignored.
BUG=pdfium:828
Change-Id: Id827e508c8660affa68cc08a13d96121369364b7
Reviewed-on: https://pdfium-review.googlesource.com/11350
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: dsinclair <dsinclair@chromium.org>
|
|
Currently, all three of CFX_Matrix::TransformRect() take in rect values
and modify them in place.
This CL converts them to take in constant values and return the
transformed values instead, and fixes all the call sites.
Bug=pdfium:874
Change-Id: I9c274df3b14e9d88c100ba0530068e06e8fec32b
Reviewed-on: https://pdfium-review.googlesource.com/11550
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Jane Liu <janeliulwq@google.com>
|
|
This method duplicates the behaviour of the const [] operator and
doesn't offer any additional safety. Folding them into one
implementation.
SetAt is retained, since implementing the non-const [] operator to
replace SetAt has potential performance concerns. Specifically many
non-obvious cases of reading an element using [] will cause a realloc
& copy.
BUG=pdfium:860
Change-Id: I3ef5e5e5a15376f040256b646eb0d90636e24b67
Reviewed-on: https://pdfium-review.googlesource.com/10870
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: Tom Sepez <tsepez@chromium.org>
|
|
Specifically the index parameter passed in to GetAt(), SetAt() and
operator[] are now being tested to be in bounds.
BUG=chromium:752480, pdfium:828
Change-Id: I9e94d58c98a8eaaaae53cd0e3ffe2123ea17d8c4
Reviewed-on: https://pdfium-review.googlesource.com/10651
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: Tom Sepez <tsepez@chromium.org>
|
|
This CL removes the default param value for this method, which was
negative. It also adds in a method to get buffer lengths, so that the
callsites can explictly passing in the length of the buffer if they
were using the default value previously.
BUG=pdfium:828
Change-Id: I0170771ee81970b8b601631015ab3e6e39fea8ea
Reviewed-on: https://pdfium-review.googlesource.com/9790
Reviewed-by: Tom Sepez <tsepez@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
|
|
Currently Find() and other methods that return a FX_STRSIZE return -1
to indicate error/failure. This means that there is a lot of magic
numbers and magic checks floating around. The standard library for
similar operations uses a npos constant. This CL implements
FX_STRNPOS, and replaces usages of magic number checking. It also does
some type cleanup along the way where it was obvious that FX_STRSIZE
should be being used.
Removing the magic numbers should make eventually changing FX_STRSIZE
to be unsigned easier in the future.
BUG=pdfium:828
Change-Id: I67e481e44cf2f75a1698afa8fbee4f375a74c490
Reviewed-on: https://pdfium-review.googlesource.com/9651
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: Tom Sepez <tsepez@chromium.org>
|
|
The various string/byte classes support Mid(), Left(), and Right() for
extracting substrings. Mid() can handle all possible cases, but Left()
and Right() are useful for common cases and more explicit about what
is going on.
Calls like Mid(offset, length - offset) can be converted to
Right(length - offset). Calls like Mid(0, length) can be converted to
Left(length).
If the substring being extracted does not extend all the way to one of
the edges of the string, then Mid() still needs to be used.
BUG=pdfium:828
Change-Id: I2ec46ad3d71aac0f7b513e103c69cbe8c854cf62
Reviewed-on: https://pdfium-review.googlesource.com/9510
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: Tom Sepez <tsepez@chromium.org>
|
|
Change-Id: I43ec0d4a3b60d51c59ba5a540dfe24803e725089
Reviewed-on: https://pdfium-review.googlesource.com/9170
Reviewed-by: Nicolás Peña <npm@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
CFX_Matrix::GetInverse is much clearer.
Change-Id: Id10ab1723735332e1a78de853f28415ec3a4d834
Reviewed-on: https://pdfium-review.googlesource.com/7090
Reviewed-by: Lei Zhang <thestig@chromium.org>
Commit-Queue: Nicolás Peña <npm@chromium.org>
|
|
Bug: pdfium:655
Change-Id: Ibf4217b35b613d21d3e8e060608b502ef79acd9e
Reviewed-on: https://pdfium-review.googlesource.com/6392
Commit-Queue: Henrique Nakashima <hnakashima@chromium.org>
Reviewed-by: Lei Zhang <thestig@chromium.org>
|
|
Bug: pdfium:655
Change-Id: Idd90be487d390f066a76140800096feead6b9e55
Reviewed-on: https://pdfium-review.googlesource.com/6310
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Henrique Nakashima <hnakashima@chromium.org>
|
|
Change-Id: I96c3429dbe2c572ed409706adfe3707b8b9bf51b
Reviewed-on: https://pdfium-review.googlesource.com/6176
Reviewed-by: Tom Sepez <tsepez@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
Change-Id: I8365ba80e3395d59a3cf35dbd9d9162e86e712e3
Reviewed-on: https://pdfium-review.googlesource.com/5970
Commit-Queue: Tom Sepez <tsepez@chromium.org>
Reviewed-by: Lei Zhang <thestig@chromium.org>
|
|
Sed + minimal conversions to compile, including moving some
constructors into the .cpp file. Any that caused ASAN issues
during the tests were omitted rather than trying to resolve
the underlying issue.
Change-Id: I00a421f33b253eb4071ffd9af3f2922c7443b335
Reviewed-on: https://pdfium-review.googlesource.com/5891
Commit-Queue: Tom Sepez <tsepez@chromium.org>
Reviewed-by: Lei Zhang <thestig@chromium.org>
|
|
Change-Id: I6200968b0c72d2de32d51a741ac821084ad84f8a
Reviewed-on: https://pdfium-review.googlesource.com/5952
Reviewed-by: Nicolás Peña <npm@chromium.org>
Commit-Queue: dsinclair <dsinclair@chromium.org>
|
|
Change-Id: Ibdb20fca7e4daae9d61286df4801ac02faf3b281
Reviewed-on: https://pdfium-review.googlesource.com/5831
Commit-Queue: Lei Zhang <thestig@chromium.org>
Reviewed-by: Lei Zhang <thestig@chromium.org>
|
|
Sometimes, web links are written with other text such as punctuations
which makes the extracted web links invalid. We improve this by trimming
invalid chars at the end of host name only URLs. For example, host names
never ends with ';' or ','.
BUG=chromium:720578
Change-Id: Id619025b2153531376d268a69a3a89c3d49fce08
Reviewed-on: https://pdfium-review.googlesource.com/5692
Commit-Queue: Wei Li <weili@chromium.org>
Reviewed-by: Lei Zhang <thestig@chromium.org>
|
|
member.
Change-Id: I51e30d298e87b9ae0d5aca83b2f1d6787efce70a
Reviewed-on: https://pdfium-review.googlesource.com/5290
Commit-Queue: Lei Zhang <thestig@chromium.org>
Reviewed-by: Tom Sepez <tsepez@chromium.org>
Reviewed-by: Nicolás Peña <npm@chromium.org>
|
|
Change-Id: Iba1aa793567e69acc3cc1acbd5b9a9f531c80b7a
Reviewed-on: https://pdfium-review.googlesource.com/4453
Reviewed-by: Tom Sepez <tsepez@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
Replace other one-off implementations as well.
Change-Id: I2878f3fae479c12b7de5234ee3a26477d602d14d
Reviewed-on: https://pdfium-review.googlesource.com/4398
Commit-Queue: Lei Zhang <thestig@chromium.org>
Reviewed-by: Tom Sepez <tsepez@chromium.org>
|
|
This CL cleans up the fx_extension file. The stream code was moved to
fx_stream. IFX_FileAccess was removed and CFX_CRTFileAccess split to its
own file. Code shuffled from header to cpp file.
Change-Id: I700fdfcc9797cf4e8050cd9ba010ad8854feefbf
Reviewed-on: https://pdfium-review.googlesource.com/4371
Reviewed-by: Nicolás Peña <npm@chromium.org>
Commit-Queue: dsinclair <dsinclair@chromium.org>
|
|
Pass in/out argument as a pointer.
Avoid pointless malloc just to copy in multibyte case. Then we can
avoid special-casing the single-byte case.
Change-Id: I3dd2d57e08ef6ad7b78ea38398b228fa41a9b3e6
Reviewed-on: https://pdfium-review.googlesource.com/3950
Reviewed-by: Nicolás Peña <npm@chromium.org>
Commit-Queue: Tom Sepez <tsepez@chromium.org>
|
|
This Cl drops the FXSYS_ from mem methods which are the same on all
platforms.
Bug: pdfium:694
Change-Id: I9d5ae905997dbaaec5aa0b2ae4c07358ed9c6236
Reviewed-on: https://pdfium-review.googlesource.com/3613
Reviewed-by: Tom Sepez <tsepez@chromium.org>
Commit-Queue: dsinclair <dsinclair@chromium.org>
|
|
This Cl drops the FXSYS_ from math methods which are the same on all
platforms.
Bug: pdfium:694
Change-Id: I85c9ff841fd9095b1434f67319847ba0cd9df7ac
Reviewed-on: https://pdfium-review.googlesource.com/3598
Commit-Queue: dsinclair <dsinclair@chromium.org>
Reviewed-by: Tom Sepez <tsepez@chromium.org>
|