mupdf - MuPDF PDF reader and library

author	Robin Watts <robin.watts@artifex.com>	2018-01-01 17:24:42 +0000
committer	Robin Watts <robin.watts@artifex.com>	2018-01-05 11:47:08 +0000
commit	25593f4f9df0c4a9b9adaa84aaa33fe2a89087f6 (patch)
tree	207c75e3a1bb4b05e83846762e3cf5fb030a0eed /CHANGES
parent	1202a24a5b2729093545a89d013eaef1557a5fe9 (diff)
download	mupdf-25593f4f9df0c4a9b9adaa84aaa33fe2a89087f6.tar.xz

Fix "being able to search for redacted text" bug.

A customer reports that even after text has been redacted, we can still search for the redacted text.

The example file supplied had many instances of the word 'words', and 4 instances of 'apple'. The 'apple' instances were redacted, and the document saved out. 2 such instances were on the first page; when we searched for 'apple', Acrobat would find the word after the first removed instance of 'apple', then the word 2 after the second removed instance of 'apple'.

After much head scratching and cutting down of the file, it appears that the information genuinely isn't in the file. Acrobat is somehow remembering it, apparently by using the 'ID' entries in the trailer dict. My suspicion is that Acrobat has cached the text extraction from the original document, and reuses it for any file whose IDs match. Change the IDs (or remove them) and the problem goes away.

The spec says that the ID should be an array of 2 bytestrings. The first is supposed to stay the same in all versions of a file (i.e. it identifies the *original* version of the file, and it is the one that is used by encryption). The second bytestring is supposed to change more often, so here we simply write a new random string each time the file is written.
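
The ID handling described above can be sketched roughly as follows. This is a minimal standalone illustration, not the code from this commit: the `trailer_id` struct, `refresh_id_on_write`, `format_id` and the 16-byte length are all hypothetical names and choices made for the example, and `rand()` stands in for whatever random source the real writer uses.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define ID_LEN 16                               /* assumed length of each ID byte string */
#define ID_TEXT_LEN (2 + 2 * (2 + 2 * ID_LEN))  /* "[<hex><hex>]" without the NUL */

/* Hypothetical stand-in for the trailer /ID array: two byte strings. */
typedef struct {
    unsigned char original[ID_LEN]; /* first entry: fixed for all versions of the file */
    unsigned char current[ID_LEN];  /* second entry: replaced on every write */
} trailer_id;

/* Fill buf with pseudo-random bytes. rand() keeps the sketch portable;
 * it is an assumption, not what MuPDF necessarily uses. */
static void fill_random(unsigned char *buf, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++)
        buf[i] = (unsigned char)(rand() & 0xFF);
}

/* Called when saving: leave the first ID alone, replace the second. */
static void refresh_id_on_write(trailer_id *id)
{
    unsigned char fresh[ID_LEN];

    /* Retry in the (very unlikely) case the new value equals the old one. */
    do {
        fill_random(fresh, ID_LEN);
    } while (memcmp(fresh, id->current, ID_LEN) == 0);

    memcpy(id->current, fresh, ID_LEN);
}

/* Render the pair as a PDF array of two hex strings: [<...><...>]. */
static void format_id(const trailer_id *id, char *out, size_t outlen)
{
    static const char hex[] = "0123456789abcdef";
    const unsigned char *parts[2];
    size_t i, j, n = 0;

    assert(outlen >= ID_TEXT_LEN + 1);
    parts[0] = id->original;
    parts[1] = id->current;

    out[n++] = '[';
    for (i = 0; i < 2; i++) {
        out[n++] = '<';
        for (j = 0; j < ID_LEN; j++) {
            out[n++] = hex[parts[i][j] >> 4];
            out[n++] = hex[parts[i][j] & 0x0F];
        }
        out[n++] = '>';
    }
    out[n++] = ']';
    out[n] = '\0';
}
```

A writer would call `refresh_id_on_write` once per save and then emit the string from `format_id` as the trailer's ID value. Leaving the first entry untouched matters because, as noted above, it is the one encryption depends on.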

Diffstat (limited to 'CHANGES')

0 files changed, 0 insertions, 0 deletions

