author | Ryan Harrison <rharrison@chromium.org> | 2018-02-16 20:02:50 +0000 |
---|---|---|
committer | Chromium commit bot <commit-bot@chromium.org> | 2018-02-16 20:02:50 +0000 |
commit | 886f932aeeb4c0ed3bb6ccb6ba4da45f9fd29a6f (patch) | |
tree | f8d2cf65a725c76babab8ec0031a208dd1cde9d3 | |
parent | 081da6b880f2a47bd7f3cf578315aaf598b34978 (diff) | |
download | pdfium-886f932aeeb4c0ed3bb6ccb6ba4da45f9fd29a6f.tar.xz |
Correct mapping of text to characters for characters missing from font
When parsing text streams, an internal list of all the characters in the
stream is generated. Additionally, a text string is generated that is
exposed via the public API. This string contains all of the printing
(i.e. non-control) characters. For printable characters that have no
Unicode mapping in the stream's font, the character 0xFFFE is inserted
into the text to mark a missing character. 0xFFFE is a non-printing
character, used here to indicate a non-Unicode value.
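The relationship between the character list and the text string described above can be modeled roughly as follows. This is a hypothetical sketch, not PDFium code: `CharInfo`, `IsControl`, `BuildText` and `kMissingChar` are illustrative names, the `generated` field stands in for the `FPDFTEXT_CHAR_GENERATED` flag, and the control-character test is simplified to C0 controls and DEL.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified model of one entry in the internal character
// list; the fields loosely mirror PAGECHAR_INFO but this is not PDFium's type.
struct CharInfo {
  bool generated;           // inserted by the parser rather than read from the stream
  char32_t unicode;         // 0 when the font has no Unicode mapping for the code
  std::uint32_t char_code;  // original character code, kept even when unicode == 0
};

// Placeholder written to the text for a character missing from the font.
constexpr char32_t kMissingChar = 0xFFFE;

// Simplified control-character test (assumption: C0 controls and DEL only).
bool IsControl(char32_t c) {
  return c < 0x20 || c == 0x7F;
}

// Builds the public text string from the character list:
// generated and printable characters contribute their Unicode value,
// characters with no Unicode value but a real character code contribute
// kMissingChar, and everything else (controls, empty entries) is dropped.
std::u32string BuildText(const std::vector<CharInfo>& chars) {
  std::u32string text;
  for (const CharInfo& c : chars) {
    if (c.generated || (c.unicode != 0 && !IsControl(c.unicode)))
      text.push_back(c.unicode);
    else if (c.unicode == 0 && c.char_code != 0)
      text.push_back(kMissingChar);
  }
  return text;
}
```

In this model a list holding `A`, a character missing from the font, a generated space, a control character and `B` produces a four-character text: `A`, 0xFFFE, space, `B`.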
The internal character list stores a Unicode value of 0x0 when the font
has no glyph for a character, while the original character code is
preserved. As a result, when generating the mapping between the text
string and the character list, the code mistakenly concludes that the
unprintable character is not present in the text string. I have changed
the check in the mapping generation code to account for this. Further
investigation is needed to determine whether inserting 0xFFFE in the
text is the correct behaviour.
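A minimal sketch of the check before and after the fix, using a hypothetical model of a character-list entry (`CharInfo`, `IsControl`, `InTextOld`, `InTextNew` and `CountInText` are illustrative names, not PDFium's API). With the old check, a missing character is not counted even though the text string holds a 0xFFFE for it, so the character list and the text disagree by one position for every such character.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a character-list entry (not PDFium's PAGECHAR_INFO).
struct CharInfo {
  bool generated;           // inserted by the parser, e.g. a synthesized space
  char32_t unicode;         // 0 when the font has no glyph/Unicode mapping
  std::uint32_t char_code;  // original code, preserved even when unicode == 0
};

// Simplified control-character test (assumption: C0 controls and DEL only).
bool IsControl(char32_t c) { return c < 0x20 || c == 0x7F; }

// Check before the fix: an entry with unicode == 0 is treated as absent
// from the text string, even though 0xFFFE was written for it.
bool InTextOld(const CharInfo& c) {
  return c.generated || (c.unicode != 0 && !IsControl(c.unicode));
}

// Check after the fix: a zero Unicode value with a non-zero character code
// also counts, matching the 0xFFFE placeholder in the text string.
bool InTextNew(const CharInfo& c) {
  return InTextOld(c) || (c.unicode == 0 && c.char_code != 0);
}

// Counts how many entries the given check believes appear in the text.
std::size_t CountInText(const std::vector<CharInfo>& chars,
                        bool (*in_text)(const CharInfo&)) {
  std::size_t n = 0;
  for (const CharInfo& c : chars)
    if (in_text(c)) ++n;
  return n;
}
```

For a list holding `A`, a missing character and `B`, the text string has three entries; the old check counts only two of them, while the new check counts all three.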
This patch resolves an issue where the find highlights in Chrome for a
PDF would be offset when there are unprintable characters in a stream.
BUG=pdfium:1010
Change-Id: I7547c46c5645e039a4b5138f2ce1137fa31990a5
Reviewed-on: https://pdfium-review.googlesource.com/27051
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
-rw-r--r-- | core/fpdftext/cpdf_textpage.cpp | 3 |
1 files changed, 2 insertions, 1 deletions
diff --git a/core/fpdftext/cpdf_textpage.cpp b/core/fpdftext/cpdf_textpage.cpp
index 16214269ae..e712549ceb 100644
--- a/core/fpdftext/cpdf_textpage.cpp
+++ b/core/fpdftext/cpdf_textpage.cpp
@@ -181,7 +181,8 @@ void CPDF_TextPage::ParseTextPage() {
     int indexSize = pdfium::CollectionSize<int>(m_CharIndex);
     const PAGECHAR_INFO& charinfo = m_CharList[i];
     if (charinfo.m_Flag == FPDFTEXT_CHAR_GENERATED ||
-        (charinfo.m_Unicode != 0 && !IsControlChar(charinfo))) {
+        (charinfo.m_Unicode != 0 && !IsControlChar(charinfo)) ||
+        (charinfo.m_Unicode == 0 && charinfo.m_CharCode != 0)) {
       if (indexSize % 2) {
         m_CharIndex.push_back(1);
       } else {
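The `m_CharIndex` bookkeeping in the hunk can be read as run-length pairs of (start position in the character list, number of consecutive characters that appear in the text). The following is a hypothetical reconstruction under that reading: `BuildCharIndex` and `CharIndexFromTextIndex` are illustrative names, and the branch for characters that do not appear in the text is not shown in the hunk, so its handling here is an assumption. It shows how dropping a missing character from a run shifts every later text position, which matches the offset find highlights described in the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: builds pairs [start, length, start, length, ...] over
// the character list, where each run is a stretch of characters that appear
// in the text string. in_text[i] is the result of the mapping check for
// character i.
std::vector<int> BuildCharIndex(const std::vector<bool>& in_text) {
  std::vector<int> index;
  if (!in_text.empty())
    index.push_back(0);  // first run tentatively starts at character 0
  for (std::size_t i = 0; i < in_text.size(); ++i) {
    const bool odd = index.size() % 2 == 1;  // a start is waiting for a length
    if (in_text[i]) {
      if (odd)
        index.push_back(1);  // open the run with length 1
      else
        index.back() += 1;   // extend the current run
    } else {
      // Assumed handling: the next run can start no earlier than i + 1.
      if (odd)
        index.back() = static_cast<int>(i) + 1;
      else
        index.push_back(static_cast<int>(i) + 1);
    }
  }
  if (index.size() % 2 == 1)
    index.pop_back();  // drop a start that never received a length
  return index;
}

// Maps a position in the text string to a position in the character list,
// or returns -1 if the text position lies past the last run.
int CharIndexFromTextIndex(const std::vector<int>& index, int text_index) {
  assert(index.size() % 2 == 0);  // entries must come in (start, length) pairs
  int count = 0;
  for (std::size_t i = 0; i < index.size(); i += 2) {
    count += index[i + 1];
    if (count > text_index)
      return index[i] + (text_index - (count - index[i + 1]));
  }
  return -1;
}
```

With the old check, a list of `A`, a missing character and `B` yields the pairs `{0, 1, 2, 1}`, so text position 1 (the 0xFFFE) resolves to character 2 and text position 2 resolves to nothing. With the fixed check, the single run `{0, 3}` maps each text position to the character at the same index.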