From 15f1a88dece664ae7300d9a60fe124cec1f2b9de Mon Sep 17 00:00:00 2001
From: Ryan Harrison <rharrison@chromium.org>
Date: Wed, 22 Aug 2018 20:50:14 +0000
Subject: Properly handle language markers in decoded text

In text strings such as the document title, 0x001B is used as a marker
for the beginning and end of a language metadata section. PDFium
currently does nothing with this data, but it needs to be stripped out
when returning the 'decoded' text.

The existing code assumed that the two bytes following a marker would
be the data to be removed and did not track whether it was inside or
outside one of these regions. As a result, it always stripped the two
bytes following the region as well, because it treated the end marker
as the beginning of a new region.

This CL corrects the detection and handling of these regions, and adds
a regression test for the reported bug.
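
As an illustration of the in/out tracking described above, here is a
minimal sketch. It is not the code in this CL: StripLanguageMarkers is
a hypothetical helper, it works on already-decoded UTF-16 code units,
and it assumes that an unterminated region runs to the end of the
input.

```cpp
#include <string>

// Hypothetical helper, not part of PDFium's API. Removes every code unit
// that lies between a pair of 0x001B markers, plus the markers themselves.
std::u16string StripLanguageMarkers(const std::u16string& input) {
  constexpr char16_t kMarker = 0x001B;

  std::u16string output;
  output.reserve(input.size());

  // Opening and closing markers are the same value, so the only way to
  // tell them apart is to remember whether a region is currently open.
  bool in_region = false;
  for (char16_t unit : input) {
    if (unit == kMarker) {
      in_region = !in_region;
      continue;
    }
    if (!in_region)
      output.push_back(unit);
  }
  // Assumption: a region with no closing marker swallows the rest of the
  // input, because in_region simply stays true.
  return output;
}
```

With this approach the closing marker ends the region instead of
starting a new one, so the text that follows it is kept intact.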

BUG=pdfium:182

Change-Id: I92ddba5666274a8986fed03f502a0331f150f7ac
Reviewed-on: https://pdfium-review.googlesource.com/41070
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
---
 testing/resources/bug_182.pdf | Bin 0 -> 17012 bytes
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 testing/resources/bug_182.pdf

(limited to 'testing/resources')

diff --git a/testing/resources/bug_182.pdf b/testing/resources/bug_182.pdf
new file mode 100644
index 0000000000..bf35cc918a
Binary files /dev/null and b/testing/resources/bug_182.pdf differ
-- 
cgit v1.2.3