author Ryan Harrison <rharrison@chromium.org> 2018-02-16 20:02:50 +0000
committer Chromium commit bot <commit-bot@chromium.org> 2018-02-16 20:02:50 +0000
commit 886f932aeeb4c0ed3bb6ccb6ba4da45f9fd29a6f
tree f8d2cf65a725c76babab8ec0031a208dd1cde9d3
parent 081da6b880f2a47bd7f3cf578315aaf598b34978
Correct mapping text to characters for characters missing from font

When parsing text streams, an internal character list is generated that contains all of the characters in the stream. Additionally, a text string is generated and exposed via the public API. This string contains all of the printing (i.e. non-control) characters.

For characters that are printable but missing from the font of the stream, the character 0xFFFE is inserted in the text to indicate a missing character. This is a non-printing character used to indicate non-Unicode. In the internal character list, such a character gets a Unicode value of 0x0 because there isn't a glyph in the font for it, while the original character code is preserved. This means that when generating the mapping between the text string and the character list, the code mistakenly assumed that the unprintable character was not present in the text string.

I have changed the check in the mapping generation code to account for this correctly. Additional investigation is needed to determine whether inserting 0xFFFE in the text is the correct behaviour.

This patch resolves an issue where the find highlights in Chrome for a PDF would be offset when there are unprintable characters in a stream.

BUG=pdfium:1010
Change-Id: I7547c46c5645e039a4b5138f2ce1137fa31990a5
Reviewed-on: https://pdfium-review.googlesource.com/27051
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>

-rw-r--r-- core/fpdftext/cpdf_textpage.cpp | 3
1 files changed, 2 insertions, 1 deletions
diff --git a/core/fpdftext/cpdf_textpage.cpp b/core/fpdftext/cpdf_textpage.cpp
index 16214269ae..e712549ceb 100644
--- a/core/fpdftext/cpdf_textpage.cpp
+++ b/core/fpdftext/cpdf_textpage.cpp
@@ -181,7 +181,8 @@ void CPDF_TextPage::ParseTextPage() {
     int indexSize = pdfium::CollectionSize<int>(m_CharIndex);
     const PAGECHAR_INFO& charinfo = m_CharList[i];
     if (charinfo.m_Flag == FPDFTEXT_CHAR_GENERATED ||
-        (charinfo.m_Unicode != 0 && !IsControlChar(charinfo))) {
+        (charinfo.m_Unicode != 0 && !IsControlChar(charinfo)) ||
+        (charinfo.m_Unicode == 0 && charinfo.m_CharCode != 0)) {
       if (indexSize % 2) {
         m_CharIndex.push_back(1);
       } else {
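
To illustrate why the extra clause matters, here is a minimal standalone sketch of the old and new checks. It is not PDFium code: `CharInfo`, `IsControl`, `InTextOld`, `InTextNew`, `TextToCharIndex`, and `DemoChars` are hypothetical names, `IsControl` is only an assumed approximation of `IsControlChar`, and the flat index vector stands in for the run-based `m_CharIndex` that the real code builds.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for PAGECHAR_INFO.
struct CharInfo {
  uint32_t char_code;  // original character code from the stream
  uint32_t unicode;    // 0 when the font has no mapping for char_code
  bool generated;      // stands in for m_Flag == FPDFTEXT_CHAR_GENERATED
};

// Assumed approximation of IsControlChar(): C0 controls and DEL.
bool IsControl(const CharInfo& c) {
  return c.unicode < 0x20 || c.unicode == 0x7F;
}

// Check before the patch: a char with unicode == 0 is never counted as text.
bool InTextOld(const CharInfo& c) {
  return c.generated || (c.unicode != 0 && !IsControl(c));
}

// Check after the patch: a char with no Unicode value but a non-zero
// character code is counted, because the text string holds 0xFFFE for it.
bool InTextNew(const CharInfo& c) {
  return c.generated || (c.unicode != 0 && !IsControl(c)) ||
         (c.unicode == 0 && c.char_code != 0);
}

// For each position in the text string, the index of the matching entry in
// the character list (a flat simplification of the real run encoding).
template <typename Pred>
std::vector<int> TextToCharIndex(const std::vector<CharInfo>& chars,
                                 Pred in_text) {
  std::vector<int> map;
  for (int i = 0; i < static_cast<int>(chars.size()); ++i) {
    if (in_text(chars[i]))
      map.push_back(i);
  }
  return map;
}

// 'a', a character missing from the font, then 'b'.
// The exposed text string for this input would be "a", 0xFFFE, "b".
std::vector<CharInfo> DemoChars() {
  return {{0x61, 0x61, false}, {0x05, 0x00, false}, {0x62, 0x62, false}};
}
```

With `DemoChars()`, the old check yields only two mapping entries for a three-character text string, so text position 1 (the 0xFFFE placeholder) points at `'b'` and everything after it is shifted by one. The new check yields one entry per text position, which is consistent with the find-highlight offset described in the commit message.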