Age | Commit message (Collapse) | Author |
|
The current implementation of text extraction was difficult to
understand, duplicated logic that existed in other methods, and wasn't
clear about the units the inputs were in. It also didn't handle
control characters correctly.
The new implementation leans on the methods for converting indices
between the text buffer index and character list index spaces to avoid
duplication of code. It also makes it clear to the reader that inputs
are in the character list index space. Finally, it fixes issues being
seen in Chrome with respect of ranges being slightly off.
This CL also adds a test for extracting text that has control
characters.
BUG=pdfium:942,chromium:654578
Change-Id: Id9d1f360c2d7492c7b5a48d6c9ae29f530892742
Reviewed-on: https://pdfium-review.googlesource.com/20014
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: dsinclair <dsinclair@chromium.org>
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
|
|
Use more variables to avoid redundant calculations. Add one more edge
test case.
Change-Id: I6c8a0aca9de3bdd1a394c39304fd9a75009f9489
Reviewed-on: https://pdfium-review.googlesource.com/19690
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
|
|
BUG=chromium:788103
Change-Id: I8ebdbc78eb14c358d7ac019b96de4828e6071b79
Reviewed-on: https://pdfium-review.googlesource.com/19350
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: dsinclair <dsinclair@chromium.org>
|
|
There was a regression due to a refactor, where the public API was no
longer removing soft hyphens for line broken words. This was causing
issues with find and copy/paste operations that depend on selecting a
region of text. This change is covered by
FPDFTextEmbeddertest.GetTextWithHyphen.
FPDFTextEmbeddertest.bug_782596 is a regression test for a bug that was
introduced by the original fix. It only fails when running the test under
ASAN.
BUG=pdfium:935
Change-Id: I26096583c35f9246a3662e702f89b742f1146780
Reviewed-on: https://pdfium-review.googlesource.com/18610
Reviewed-by: Lei Zhang <thestig@chromium.org>
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
|
|
BUG=pdfium:921
Change-Id: I6973359e6ac112c56843f66eb0b70462f42f9cae
Reviewed-on: https://pdfium-review.googlesource.com/16630
Reviewed-by: Ryan Harrison <rharrison@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
The conversion from WideString to ByeString adds in null characters at
the end, so we need to account for these when selecting the range of
text to initially extract.
BUG=chromium:761770,chromium:761626
Change-Id: Ib8f863e997ebccaaf882e0beb29733f27a18826d
Reviewed-on: https://pdfium-review.googlesource.com/13110
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: dsinclair <dsinclair@chromium.org>
|
|
The current implementation of this function is problematic. It will
attempt to memcpy to NULL. It will accept obviously wrong inputs like
a negative start index. It will also accept -1 for the count, which in
theory is the amount of space the buffer has allocated to it, so
doesn't make sense, but instead an internal call will calculate the
number of characters to get if the count is -1. This will them lead to
the function attempting to call Left(-1) on a string, which is
invalid.
Ths documentation for this function mentions none of this behaviour,
so I am removing it, since it is inconsistent/bad. The implementation
should now more strictly meet defined API.
BUG=pdfium:828
Change-Id: I18afdb33e12d77c10d856b4bacd615481979c484
Reviewed-on: https://pdfium-review.googlesource.com/12733
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: Tom Sepez <tsepez@chromium.org>
|
|
This CL removes the fx_basic.h header and fixes up includes as needed.
Change-Id: I49af32a8327bdbcda40c50a61ffbd75d06609040
Reviewed-on: https://pdfium-review.googlesource.com/12670
Commit-Queue: dsinclair <dsinclair@chromium.org>
Reviewed-by: Tom Sepez <tsepez@chromium.org>
|
|
Bug:
Change-Id: I376f4af26791cd4ed04049ab179c2b39dd262725
Reviewed-on: https://pdfium-review.googlesource.com/10690
Reviewed-by: Lei Zhang <thestig@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>
|
|
Sometimes, web links are written with other text such as punctuations
which makes the extracted web links invalid. We improve this by trimming
invalid chars at the end of host name only URLs. For example, host names
never ends with ';' or ','.
BUG=chromium:720578
Change-Id: Id619025b2153531376d268a69a3a89c3d49fce08
Reviewed-on: https://pdfium-review.googlesource.com/5692
Commit-Queue: Wei Li <weili@chromium.org>
Reviewed-by: Lei Zhang <thestig@chromium.org>
|
|
When a web link has a hyphen at the end of line, we consider it to
be continued to the next line. For example, "http://www.abc.com/my-\r\ntest"
should be extracted as "http://www.abc.com/my-test".
BUG=pdfium:650
Change-Id: I64a93d9c66faf2be0abdaf8cfe8ee496c435d0ca
Reviewed-on: https://pdfium-review.googlesource.com/3092
Commit-Queue: Wei Li <weili@chromium.org>
Reviewed-by: Lei Zhang <thestig@chromium.org>
|
|
The -build/include setting was masking out build/include_what_you_use. This CL
restores them, fixes any build errors, and adds NOLINT as needed. As well,
the runtime/explicit and runtime/printf flags are aslo enabled and NOLINT'd.
lint cleanups
Change-Id: Ib013b3eb29c8d0e48cad74c5df9028684130719f
Reviewed-on: https://pdfium-review.googlesource.com/2030
Reviewed-by: Tom Sepez <tsepez@chromium.org>
|
|
BUG=pdfium:611
Review-Url: https://codereview.chromium.org/2382723003
|
|
CPDF_Font::UnicodeFromCharcode returns 0 only if ToUnicode map maps the
charcode to 0. CPDF_SimpleFont::UnicodeFromCharcode and CPDF_CID_Font::
UnicodeFromCharCode return 0 only if the call to CPDF_Font returns 0.
In other cases, these methods return an empty string. So when
processing text, a 0 return from the method should not be replaced
with the charcode.
BUG=pdfium:583
Review-Url: https://codereview.chromium.org/2342073002
|
|
Review-Url: https://codereview.chromium.org/2032613003
|
|
This makes pdfium code on Linux and Mac sign-compare warning free.
The warning flag will be re-enabled after checking on windows clang build.
BUG=pdfium:29
R=tsepez@chromium.org
Review URL: https://codereview.chromium.org/1841643002 .
|
|
This CL moves the fxcrt code into the core/fxcrt directory. The only exception
was fx_bidi.h which was moved into core/fxcrt as it is not used outside of
core/.
R=tsepez@chromium.org
Review URL: https://codereview.chromium.org/1825953002 .
|
|
This CL moves the files in fpdfsdk/src/ up one level to fpdfsdk/ and fixes
up the include paths, include guards and build files.
R=tsepez@chromium.org
Review URL: https://codereview.chromium.org/1799773002 .
|