Use CFX_WideString in CPDF_NameTree functions to strip BOM
chromium/3162

PDFium doesn't strip BOMs during parsing, but we should strip BOMs when retrieving parsed strings in CPDF_NameTree to ensure consistency and appropriate function behavior. See the bug for more info.

As outlined in Bug=pdfium:593, the solution is to call GetUnicodeText() instead of GetString(). I added a GetUnicodeTextAt() function in CPDF_Array, which is symmetrical to GetUnicodeTextFor() in CPDF_Dictionary. I then changed the input variable types to CPDF_NameTree functions to be CFX_WideString instead of CFX_ByteString, and modified all the calls to them.

I also added a unit test for nametree, which would fail prior to this change. Nametrees with non-unicode names are already tested by embedder tests.

Bug=pdfium:820
Change-Id: Id69d7343632f83d1f5180348c0eea290f478183f
Reviewed-on: https://pdfium-review.googlesource.com/8091
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Jane Liu <janeliulwq@google.com>
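
To illustrate why byte-string comparison is not enough for names, here is a minimal, self-contained sketch. It does not use PDFium: DecodeTextSketch is a hypothetical helper, and treating non-BOM bytes as Latin-1 is a simplifying assumption made only for this example. It shows that a name stored as UTF-16BE with a leading BOM (bytes FE FF) differs from the plain name at the byte level, but matches it once decoded to wide text.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not PDFium's decoder: converts the raw bytes of a PDF
// text string to UTF-16 code units and drops a leading UTF-16BE BOM (FE FF).
std::u16string DecodeTextSketch(const std::string& bytes) {
  std::u16string out;
  const bool has_bom = bytes.size() >= 2 &&
                       static_cast<unsigned char>(bytes[0]) == 0xFE &&
                       static_cast<unsigned char>(bytes[1]) == 0xFF;
  if (has_bom) {
    // UTF-16BE payload: combine byte pairs that follow the 2-byte BOM.
    for (std::size_t i = 2; i + 1 < bytes.size(); i += 2) {
      const unsigned hi = static_cast<unsigned char>(bytes[i]);
      const unsigned lo = static_cast<unsigned char>(bytes[i + 1]);
      out.push_back(static_cast<char16_t>((hi << 8) | lo));
    }
    return out;
  }
  // Assumption for this sketch only: bytes without a BOM are Latin-1.
  for (unsigned char c : bytes)
    out.push_back(static_cast<char16_t>(c));
  return out;
}
```

With this helper, the six bytes FE FF 00 48 00 69 and the two bytes "Hi" are unequal as byte strings, yet both decode to the same wide string, which is the property the change relies on when comparing names.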
author: Jane Liu <janeliulwq@google.com> 2017-07-19 13:10:50 -0400
committer: Chromium commit bot <commit-bot@chromium.org> 2017-07-19 19:09:39 +0000
commit: 67ccef73bf664b7cdb4c6eed7acbaa4163c22a80 (patch)
tree: 718061bc21fd52eab1bc70a8b9be97585f1d79f8 /core/fpdfapi
parent: eed247e9cb3b0e9ce5dcb8bf6ee7673c9dd3e544 (diff)
download: pdfium-67ccef73bf664b7cdb4c6eed7acbaa4163c22a80.tar.xz
2 files changed, 7 insertions, 0 deletions
diff --git a/core/fpdfapi/parser/cpdf_array.cpp b/core/fpdfapi/parser/cpdf_array.cpp
index ea4ca7eaeb..1f2740ea9c 100644
--- a/core/fpdfapi/parser/cpdf_array.cpp
+++ b/core/fpdfapi/parser/cpdf_array.cpp
@@ -105,6 +105,12 @@ CFX_ByteString CPDF_Array::GetStringAt(size_t i) const {
   return m_Objects[i]->GetString();
 }
 
+CFX_WideString CPDF_Array::GetUnicodeTextAt(size_t i) const {
+  if (i >= m_Objects.size())
+    return CFX_WideString();
+  return m_Objects[i]->GetUnicodeText();
+}
+
 int CPDF_Array::GetIntegerAt(size_t i) const {
   if (i >= m_Objects.size())
     return 0;
diff --git a/core/fpdfapi/parser/cpdf_array.h b/core/fpdfapi/parser/cpdf_array.h
index bb17c0a427..2590971d80 100644
--- a/core/fpdfapi/parser/cpdf_array.h
+++ b/core/fpdfapi/parser/cpdf_array.h
@@ -41,6 +41,7 @@ class CPDF_Array : public CPDF_Object {
   CPDF_Object* GetObjectAt(size_t index) const;
   CPDF_Object* GetDirectObjectAt(size_t index) const;
   CFX_ByteString GetStringAt(size_t index) const;
+  CFX_WideString GetUnicodeTextAt(size_t index) const;
   int GetIntegerAt(size_t index) const;
   float GetNumberAt(size_t index) const;
   CPDF_Dictionary* GetDictAt(size_t index) const;
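
As a rough sketch of how a bounds-checked text accessor like the one added above might be used, the following self-contained example walks a flat array of alternating keys and values, the layout that a name tree's Names array uses. ArraySketch, TextAt, and FindValueIndex are hypothetical names for this illustration and are not PDFium classes or functions; the array is assumed to hold text that has already been decoded.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an array of already-decoded text items.
class ArraySketch {
 public:
  explicit ArraySketch(std::vector<std::u16string> items)
      : items_(std::move(items)) {}

  std::size_t size() const { return items_.size(); }

  // Mirrors the pattern in the diff: an out-of-range index yields an empty
  // string instead of undefined behavior.
  std::u16string TextAt(std::size_t i) const {
    if (i >= items_.size())
      return std::u16string();
    return items_[i];
  }

 private:
  std::vector<std::u16string> items_;
};

// Returns the index of the value paired with |name| in a flat
// [key, value, key, value, ...] array, or -1 if the name is absent.
int FindValueIndex(const ArraySketch& names, const std::u16string& name) {
  for (std::size_t i = 0; i + 1 < names.size(); i += 2) {
    if (names.TextAt(i) == name)
      return static_cast<int>(i + 1);
  }
  return -1;
}
```

Returning an empty string for an out-of-range index keeps call sites free of explicit range checks, at the cost of making a missing element indistinguishable from an empty one.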