Age | Commit message (Collapse) | Author |
|
We need both RC4 and AES encryption. RC4 is a straight reversible
stream, and our AES library knows how to encrypt as well as decrypt,
so it's "just" a matter of calling them correctly.
We therefore expose a generic "encrypt this data" routine (and a
matching "how long will the data be once encrypted" routine) within
pdf-crypt.c.
We then extend our PDF object output routines to call these.
This is enough to get encrypted data preserved over calls to mutool
clean. Unfortunately the created files aren't readable, due to 2
further problems, also fixed here.
Firstly, mutool clean does not preserve the Encrypt entry in the
trailer. This is a simple fix.
Secondly, we are required NOT to encrypt the Encrypt entry. This
requires us to spot the crypt entry and to special case it.
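As a rough illustration of the "how long will the data be once
encrypted" routine, here is a minimal sketch. The enum and function
names are hypothetical and are not the ones used in pdf-crypt.c. It
assumes the layout PDF uses for AES streams and strings: a 16 byte
initialisation vector, followed by the data padded up to a 16 byte
boundary, with a full block of padding when the data is already
aligned. RC4 is a stream cipher, so the length does not change.

```c
#include <stddef.h>

/* Hypothetical identifiers, for illustration only. */
enum crypt_method { CRYPT_NONE, CRYPT_RC4, CRYPT_AES };

/* Return the number of bytes that 'len' bytes of plaintext occupy
 * once encrypted with the given method. */
size_t encrypted_length(enum crypt_method method, size_t len)
{
	switch (method)
	{
	case CRYPT_RC4:
		/* Stream cipher: one output byte per input byte. */
		return len;
	case CRYPT_AES:
		/* 16 byte IV, then the data plus 1 to 16 bytes of padding,
		 * so the padded data is always a multiple of 16. */
		return 16 + (len / 16 + 1) * 16;
	default:
		/* No encryption: data is written unchanged. */
		return len;
	}
}
```

An output routine could call this before writing a string or stream,
so that the Length it records matches the bytes actually emitted.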
|
|
A customer reports that even after text has been redacted, we can
still search for the redacted text. The example file supplied had
many instances of the word 'words', and 4 instances of 'apple'.
The 'apple' instances were redacted, and the document saved out.
2 such instances were on the first page; when we searched for
'apple' Acrobat would find the word after the first removed
instance of apple, then find the word 2 after the second removed
instance of apple.
After much head scratching and cutting down of the file, it
appears that the information genuinely isn't in the file. Acrobat
is somehow remembering it. It appears to be doing this using the
'ID' entries in the trailer dict.
My suspicion is that Acrobat has cached the text extraction from
the original document, and is using this on all files that match
the IDs. Change the IDs (or remove them) and the problem goes away.
The spec says that the ID should be 2 bytestrings in an array. The
first is supposed to stay the same in all versions of a file (i.e.
it shows the *original* version of the file, and it is the one that
is used by encrypt).
The second bytestring is supposed to change more often, so here we
simply return a new random string on each write.
|
|
Previously the copy had as many reference counts as the original
pixmap, which led to leaks of pixmaps.
|
|
Previously, if a variable text annotation with a default appearance
string had multiple 'Tf' operators, all but the last font name would
leak.
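The shape of such a fix can be sketched as below. This is not the
MuPDF code: the function name is hypothetical, and splitting the
appearance string on whitespace is a simplification of real PDF
lexing. The point is that the previously stored name is freed before
it is replaced.

```c
#include <stdlib.h>
#include <string.h>

/* Return a malloc'd copy of the font name given to the last 'Tf'
 * operator in 'da' (without the leading '/'), or NULL if there is
 * none. The caller must free the result. */
char *last_tf_font(const char *da)
{
	static const char ws[] = " \t\r\n";
	char *copy = malloc(strlen(da) + 1);
	char *font = NULL;
	char *prev2 = NULL; /* token two places back: the font name */
	char *prev1 = NULL; /* token one place back: the font size */
	char *tok;

	if (!copy)
		return NULL;
	strcpy(copy, da);

	for (tok = strtok(copy, ws); tok; tok = strtok(NULL, ws))
	{
		if (strcmp(tok, "Tf") == 0 && prev2 && prev2[0] == '/')
		{
			char *name = malloc(strlen(prev2 + 1) + 1);
			if (name)
				strcpy(name, prev2 + 1);
			/* Free the earlier name before replacing it;
			 * skipping this is what leaks. */
			free(font);
			font = name;
		}
		prev2 = prev1;
		prev1 = tok;
	}

	free(copy);
	return font;
}
```

For example, last_tf_font("/Helv 12 Tf 0 g /Cour 10 Tf") yields "Cour",
and the copy of "Helv" made for the first operator is released.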
|
|
Previously, the borrowed colorspace was dropped when updating annotation
appearances, leading to use-after-free warnings from valgrind/ASAN.
|
|
This is true because they are now limited below PDF_MAX_OBJECT_NUMBER.
|
|
This ensures that:
* xref tables with object pointers do not grow out of bounds.
* other readers, e.g. Adobe Acrobat, can parse PDFs written by mupdf.
|
|
This requires adding a fz_xml_doc type to hold the pool.
|
|
Return error tokens when parsing numbers with trailing garbage rather than
ignoring the extra characters.
Also handle error tokens more gracefully in array and dictionary parsing.
Treat error tokens as the 'null' keyword and continue parsing.
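A minimal sketch of the number check, for illustration only. The token
names and lex_number() are hypothetical and much simpler than the real
lexer. A number is only accepted if it is followed directly by
whitespace, a delimiter, or the end of input; otherwise the garbage is
skipped and an error token is returned, so the caller can substitute
'null' and carry on from the next token.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical token kinds. */
enum token { TOK_INT, TOK_REAL, TOK_ERROR };

/* True for end of input, whitespace, and PDF delimiter characters. */
static int is_delim(int c)
{
	return c == '\0' || isspace(c) || strchr("()<>[]{}/%", c) != NULL;
}

/* Lex a number starting at s. '*end' is set to the first character
 * after the token, including any trailing garbage that was skipped. */
enum token lex_number(const char *s, const char **end)
{
	const char *p = s;
	const char *q;
	int digits = 0, dots = 0;

	if (*p == '+' || *p == '-')
		p++;
	while (isdigit((unsigned char)*p) || *p == '.')
	{
		if (*p == '.')
			dots++;
		else
			digits++;
		p++;
	}

	/* Skip trailing garbage so parsing resumes at the next token. */
	q = p;
	while (!is_delim((unsigned char)*q))
		q++;
	*end = q;

	if (q != p || digits == 0 || dots > 1)
		return TOK_ERROR;
	return dots ? TOK_REAL : TOK_INT;
}
```

With this, "123abc" is reported as a single error token rather than
being read as the integer 123 followed by stray characters.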
|
|
This goes well with the 'mutool clean -d' decompression option to debug
content streams, without doing the sanitize optimization pass.
|
|
This allows us to clean up memory so we can check for memory leaks.
Also fix one memory leak.
|
|
Fixes issues with dead keys in Unicode input.
|
|
If the first TJ we meet in a file has an adjustment, but no chars,
then we end up calling 'adjustment' without ever having set
fontdesc. This causes a crash.
Fix it here.
|
|
Well, at least not to crash.
|
|
Don't attempt to rely on alpha, as it is incompatible with the way
we clip through the mask at the end.
|
|
Also do not do the extra group push if the destination pixmap
is in the proper color space and has all the required sep support.
|
|
Future-proof the API against the Year 2038 problem.
|
|
Also clarify that a copy of author/contents is returned, and that
the caller must free them.
|
|
This mirrors the existing PDFObject.asByteString().
|
|
Create a PDF 'text string' type string from a UTF-8 input string.
If the input is plain ASCII, keep it as is, otherwise re-encode it
as UTF-16BE.
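The conversion can be sketched roughly as below. The function names
are hypothetical and this is not the MuPDF implementation. It assumes
that the UTF-16BE form starts with the FE FF byte order mark, as PDF
text strings do, and it replaces malformed UTF-8 with U+FFFD rather
than reporting an error.

```c
#include <stdlib.h>
#include <string.h>

/* Decode one UTF-8 sequence at s into *cp and return the number of
 * bytes consumed. Malformed input yields U+FFFD. */
static size_t decode_utf8(const unsigned char *s, unsigned long *cp)
{
	size_t n, i;

	if (s[0] < 0x80) { *cp = s[0]; return 1; }
	else if ((s[0] & 0xE0) == 0xC0) { *cp = s[0] & 0x1F; n = 2; }
	else if ((s[0] & 0xF0) == 0xE0) { *cp = s[0] & 0x0F; n = 3; }
	else if ((s[0] & 0xF8) == 0xF0) { *cp = s[0] & 0x07; n = 4; }
	else { *cp = 0xFFFD; return 1; }

	for (i = 1; i < n; i++)
	{
		/* A missing continuation byte (including the terminating
		 * NUL) ends the sequence early. */
		if ((s[i] & 0xC0) != 0x80) { *cp = 0xFFFD; return i; }
		*cp = (*cp << 6) | (s[i] & 0x3F);
	}
	return n;
}

/* Build a PDF text string from NUL-terminated UTF-8. Returns a
 * malloc'd buffer and stores its length in *out_len; the caller
 * frees it. Plain ASCII is copied; anything else becomes UTF-16BE. */
unsigned char *pdf_text_string(const char *utf8, size_t *out_len)
{
	const unsigned char *s = (const unsigned char *)utf8;
	size_t len = strlen(utf8), i, n = 0;
	unsigned char *out;
	int ascii = 1;

	for (i = 0; i < len; i++)
		if (s[i] >= 0x80)
			ascii = 0;

	/* UTF-16BE needs at most 2 output bytes per input byte, plus BOM. */
	out = malloc(ascii ? len + 1 : 2 + 2 * len);
	if (!out)
		return NULL;

	if (ascii)
	{
		memcpy(out, s, len);
		out[len] = 0;
		*out_len = len;
		return out;
	}

	out[n++] = 0xFE; /* byte order mark */
	out[n++] = 0xFF;
	for (i = 0; i < len; )
	{
		unsigned long cp;
		i += decode_utf8(s + i, &cp);
		if (cp > 0x10FFFF)
			cp = 0xFFFD;
		if (cp >= 0x10000)
		{
			/* Outside the BMP: emit a surrogate pair. */
			unsigned long hi, lo;
			cp -= 0x10000;
			hi = 0xD800 + (cp >> 10);
			lo = 0xDC00 + (cp & 0x3FF);
			out[n++] = (unsigned char)(hi >> 8);
			out[n++] = (unsigned char)(hi & 0xFF);
			out[n++] = (unsigned char)(lo >> 8);
			out[n++] = (unsigned char)(lo & 0xFF);
		}
		else
		{
			out[n++] = (unsigned char)(cp >> 8);
			out[n++] = (unsigned char)(cp & 0xFF);
		}
	}
	*out_len = n;
	return out;
}
```

The length is returned separately because the UTF-16BE form contains
zero bytes and so cannot be treated as a C string.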
|
|
When iterating through blocks, make sure to include
text blocks. After building the char array for a
given line, be sure to add it to the line object.
|