Age | Commit message | Author |
|
|
|
The mupdf build included an implementation of the pkcs7 functions that
are needed for signing documents and verifying signatures, the
implementation being either an openssl-based one, or a stub that returned
errors. This commit removes the pkcs7 functions from the main mupdf
library.
For the sake of verification, there wasn't really a need for the pkcs7
functions to be part of mupdf. It was only the checking function that used
them. The checking function is now provided as a helper, outside of the
main build. The openssl-based pkcs7 functions are also supplied as a
helper. Users wishing to verify signatures can either use the checking
function directly, or use its source as a basis for their own.
Document signing requires more integration between mupdf and pkcs7
because part of the process is performed at time of signing and part when
saving the document. Mupdf already had a pdf_pkcs7_signer object that
kept information between the two phases. That object has now been extended
to include the pkcs7 functions involved in signing, and the signing
function now requires such an object, rather than a file path to a
certificate. The openssl-based pkcs7 helper provides a function that, given
the path to a certificate, will return a pdf_pkcs7_signer object.
The intention is that different implementations can be produced for
different platforms, based on cryptographic routines built into the
operating system. In each case, for the sake of document signing, the
routines would be wrapped up as a pdf_pkcs7_signer object.
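
As a rough illustration of the idea of wrapping platform routines in a
signer object, here is a minimal sketch. All names (example_signer,
example_new_toy_signer, the callback names) are hypothetical and are not
the pdf_pkcs7_signer API; the "digest" is a one-byte XOR checksum used
purely as a stand-in for real cryptography.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical signer interface; the names are illustrative, not mupdf's. */
typedef struct example_signer example_signer;

struct example_signer
{
    /* Called at signing time: how much space to reserve for the digest. */
    size_t (*max_digest_size)(example_signer *signer);
    /* Called at save time: write the digest of data into out.
     * Returns the number of bytes written, or 0 on failure. */
    size_t (*create_digest)(example_signer *signer,
        const unsigned char *data, size_t len,
        unsigned char *out, size_t out_cap);
    /* Release the signer and any backend state. */
    void (*drop)(example_signer *signer);
};

/* Toy backend: the "digest" is a one-byte XOR checksum, standing in
 * for a real pkcs7 implementation. */
static size_t toy_max_digest_size(example_signer *signer)
{
    (void)signer;
    return 1;
}

static size_t toy_create_digest(example_signer *signer,
    const unsigned char *data, size_t len,
    unsigned char *out, size_t out_cap)
{
    size_t i;
    (void)signer;
    assert(data != NULL && out != NULL);
    if (out_cap < 1)
        return 0;
    out[0] = 0;
    for (i = 0; i < len; i++)
        out[0] ^= data[i];
    return 1;
}

static void toy_drop(example_signer *signer)
{
    free(signer);
}

/* Factory, loosely analogous to a helper that builds a signer from a
 * certificate path. Returns NULL if allocation fails. */
example_signer *example_new_toy_signer(void)
{
    example_signer *signer = malloc(sizeof *signer);
    if (signer == NULL)
        return NULL;
    signer->max_digest_size = toy_max_digest_size;
    signer->create_digest = toy_create_digest;
    signer->drop = toy_drop;
    return signer;
}
```

The two callbacks mirror the two phases described above: one is consulted
when the signature is set up, the other when the document is saved. A
different platform would supply its own three functions behind the same
struct.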
|
|
Previously, pdf-pkcs7.c contained a mishmash of functions required
for creating and checking signatures, with no separation between
the parts relating to pdf and those relating to pkcs7. This
commit introduces pdf_signature.c which contains the pdf
specifics, leaving pdf-pkcs7.c to be purely pkcs7 functions.
This should more easily allow the use of pkcs7 solutions other
than openssl. The pkcs7 api is declared in pdf-pkcs7.h. It is
entirely free of mupdf specifics, other than using an fz_stream
to specify the bytes to be hashed.
|
|
Previously repair might end up increasing xref_len, but the lists
were not correspondingly expanded, leading to ASAN complaints.
|
|
|
|
This change achieves two goals. It allows signing to be performed even
when the document is obtained other than from a disk file. It also
restores signing of file-based documents to a working state, a feature
that was broken because complete_signatures was called after certain
tables, available via the output options object, had been destroyed.
|
|
We need both RC4 and AES encryption. RC4 is a straight reversible
stream, and our AES library knows how to encrypt as well as decrypt,
so it's "just" a matter of calling them correctly.
We therefore expose a generic "encrypt this data" routine (and a
matching "how long will the data be once encrypted" routine) within
pdf-crypt.c.
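
To illustrate the two points above, here is a minimal sketch (not the
pdf-crypt.c code; every example_ name is hypothetical). It shows RC4 as
a symmetric stream, where the same call encrypts and decrypts, and a
"how long once encrypted" helper. The AES length assumes the PDF
convention of a 16-byte IV followed by data padded up to the next
multiple of 16, with at least one padding byte.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical method tags for this sketch. */
enum example_crypt_method
{
    EXAMPLE_CRYPT_NONE,
    EXAMPLE_CRYPT_RC4,
    EXAMPLE_CRYPT_AES
};

/* RC4: applying it twice with the same key restores the input. */
void example_rc4(const unsigned char *key, size_t keylen,
    const unsigned char *in, unsigned char *out, size_t len)
{
    unsigned char s[256], t;
    unsigned int i, j = 0;
    size_t n;

    assert(key != NULL && keylen > 0);

    /* Key scheduling. */
    for (i = 0; i < 256; i++)
        s[i] = (unsigned char)i;
    for (i = 0; i < 256; i++)
    {
        j = (j + s[i] + key[i % keylen]) & 255;
        t = s[i]; s[i] = s[j]; s[j] = t;
    }

    /* Keystream generation, XORed with the data. */
    i = j = 0;
    for (n = 0; n < len; n++)
    {
        i = (i + 1) & 255;
        j = (j + s[i]) & 255;
        t = s[i]; s[i] = s[j]; s[j] = t;
        out[n] = in[n] ^ s[(s[i] + s[j]) & 255];
    }
}

/* Size of the data once encrypted. */
size_t example_encrypted_len(enum example_crypt_method method, size_t len)
{
    switch (method)
    {
    case EXAMPLE_CRYPT_AES:
        /* 16-byte IV, then the data plus 1..16 bytes of padding. */
        return 16 + (len / 16 + 1) * 16;
    case EXAMPLE_CRYPT_RC4:
    case EXAMPLE_CRYPT_NONE:
    default:
        /* Stream cipher or no encryption: length is unchanged. */
        return len;
    }
}
```

Knowing the encrypted length up front matters because the Length entry
of a stream has to describe the bytes actually written.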
We then extend our PDF object output routines to call these.
This is enough to get encrypted data preserved over calls to mutool
clean. Unfortunately the created files aren't readable, due to 2
further problems, also fixed here.
Firstly, mutool clean does not preserve the Encrypt entry in the
trailer. This is a simple fix.
Secondly, we are required NOT to encrypt the Encrypt entry. This
requires us to spot the crypt entry and to special case it.
|
|
A customer reports that even after text has been redacted, we can
still search for the redacted text. The example file supplied had
many instances of the word 'words', and 4 instances of 'apple'.
The 'apple' instances were redacted, and the document saved out.
2 such instances were on the first page; when we searched for
'apple' Acrobat would find the word after the first removed
instance of apple, then find the word 2 after the second removed
instance of apple.
After much head scratching and cutting down of the file, it
appears that the information genuinely isn't in the file. Acrobat
is somehow remembering it. It appears to be doing this using the
'ID' entries in the trailer dict.
My suspicion is that Acrobat has cached the text extraction from
the original document, and is using this on all files that match
the IDs. Change the IDs (or remove them) and the problem goes away.
The spec says that the ID should be 2 bytestrings in an array. The
first is supposed to stay the same in all versions of a file (i.e.
it shows the *original* version of the file, and it is the one that
is used by encrypt).
The second bytestring is supposed to change more often, so here we
simply return a new random string on each writing.
|
|
|
|
|
|
This goes well with the 'mutool clean -d' decompression option to debug
content streams, without doing the sanitize optimization pass.
|
|
|
|
|
|
|
|
When cleaning a pdf file, various lists (of pdf_xref_len length) are
defined early on.
If we trigger a repair during the clean, this can cause pdf_xref_len
to increase causing an overrun.
Fix this by watching for changes in the length, and checking accesses
to the list for validity.
This also appears to fix bugs 698700-698703.
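
A minimal sketch of the approach, assuming a single per-object list and
hypothetical names (example_write_state, example_expand_lists,
example_mark_used); the real writer keeps several such lists. The list
is grown whenever the xref length is seen to have increased, and every
access is range-checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical writer state: one list sized by the xref length. */
typedef struct
{
    int *use_list;
    int list_len;
} example_write_state;

/* Grow the list to cover at least new_len entries, zeroing the new
 * tail. Returns 1 on success, 0 if allocation fails. */
int example_expand_lists(example_write_state *st, int new_len)
{
    int *p;
    int i;

    assert(st != NULL);
    if (new_len <= st->list_len)
        return 1;
    p = realloc(st->use_list, (size_t)new_len * sizeof *p);
    if (p == NULL)
        return 0;
    for (i = st->list_len; i < new_len; i++)
        p[i] = 0;
    st->use_list = p;
    st->list_len = new_len;
    return 1;
}

/* Mark object num as used. xref_len is the current xref length, which
 * may have grown since the list was allocated (e.g. after a repair).
 * Returns 1 on success, 0 if num is out of range or growth failed. */
int example_mark_used(example_write_state *st, int num, int xref_len)
{
    assert(st != NULL);
    if (xref_len > st->list_len && !example_expand_lists(st, xref_len))
        return 0;
    if (num < 0 || num >= st->list_len)
        return 0;
    st->use_list[num] = 1;
    return 1;
}
```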
|
|
|
|
Closing flushes output and may throw exceptions.
Dropping frees the state and never throws exceptions.
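
A minimal sketch of the close/drop split, using hypothetical names and
plain return codes in place of exceptions. Close does the work that can
fail and reports it; drop only releases memory and cannot fail, so it is
safe to call from a finalizer.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical output object. The FILE is owned by the caller. */
typedef struct
{
    FILE *file;
    int closed;
} example_output;

example_output *example_new_output(FILE *file)
{
    example_output *out;
    assert(file != NULL);
    out = malloc(sizeof *out);
    if (out == NULL)
        return NULL;
    out->file = file;
    out->closed = 0;
    return out;
}

/* Close: flush pending data. May fail; returns 0 on success, -1 on
 * error. Closing twice is harmless. */
int example_close_output(example_output *out)
{
    assert(out != NULL);
    if (out->closed)
        return 0;
    out->closed = 1;
    return fflush(out->file) == 0 ? 0 : -1;
}

/* Drop: free the state. Never reports an error, accepts NULL. */
void example_drop_output(example_output *out)
{
    if (out == NULL)
        return;
    free(out);
}
```

A caller that cares about the data calls close and checks the result,
then drops; a caller that is abandoning the output just drops.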
|
|
Don't mess with conditional compilation with LARGEFILE -- always expose
64-bit file offsets in our public API.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Remove superfluous '%' character in the comment with binary bytes.
|
|
|
|
|
|
|
|
|
|
|
|
Instead of having fz_new_XXXX(ctx, type, ...) macros that call
fz_new_XXXX_of_size etc, use fz_new_derived_...
Clearer naming, and doesn't clash with fz_new_document_writer.
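
The pattern can be sketched as follows. The names here are hypothetical
(example_ prefixed) and the context argument is omitted; the point is
only that the "derived" macro takes the derived type, allocates
sizeof(TYPE) through the base allocator, and returns a correctly typed
pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical base structure shared by all derived types. */
typedef struct
{
    size_t size; /* allocated size, recorded by the base allocator */
} example_device;

/* Base allocator: zeroed block large enough for a derived struct. */
void *example_new_device_of_size(size_t size)
{
    example_device *dev;
    assert(size >= sizeof(example_device));
    dev = calloc(1, size);
    if (dev != NULL)
        dev->size = size;
    return dev;
}

/* Allocate a derived type whose first member is example_device. */
#define example_new_derived_device(TYPE) \
    ((TYPE *)example_new_device_of_size(sizeof(TYPE)))

/* A derived type: the base must be the first member. */
typedef struct
{
    example_device super;
    int page_count;
} example_trace_device;
```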
|
|
Moves document_writers into the same style as
fz_new_{image,document,page} etc.
|
|
Rename fz_write to fz_write_data.
Rename fz_write_buffer_* and fz_buffer_printf to fz_append_*.
Be consistent in naming:
fz_write_* calls write to fz_output.
fz_append_* calls append to fz_buffer.
Update documentation.
|
|
|
|
|
|
|
|
|
|
It's only used to 'fix' duff indirect references when cleaning PDF
files. Writing general values into dictionaries should be done by key,
not by internal index.
|
|
|
|
Move the definition of the structure contents into new fitz-imp.h
file. Make all code outside of fitz access the buffer through the
defined API.
Add a convenience API for people that want to get buffers as
null terminated C strings.
|
|
|
|
Firstly, we avoid compressing streams if they get bigger.
Secondly, we ensure that we always update the Length field.
Seen as part of the investigation into bug 697092, though not the
actual cause. Thanks to Tor for the latter part of the fix.
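
A minimal sketch of the "only compress if it helps" rule. The
run-length encoder is a hypothetical stand-in for the real compressor,
and all names are illustrative. Whichever form is chosen, the recorded
length always matches the bytes that will be written.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in compressor: emits (count, byte) pairs. Returns the output
 * size, or 0 if the output does not fit in cap bytes. */
size_t example_rle(const unsigned char *in, size_t len,
    unsigned char *out, size_t cap)
{
    size_t i = 0, o = 0;
    while (i < len)
    {
        size_t run = 1;
        while (i + run < len && in[i + run] == in[i] && run < 255)
            run++;
        if (o + 2 > cap)
            return 0;
        out[o++] = (unsigned char)run;
        out[o++] = in[i];
        i += run;
    }
    return o;
}

typedef struct
{
    const unsigned char *data; /* bytes to write */
    size_t length;             /* value for the Length entry */
    int compressed;            /* 1 if data is the compressed form */
} example_stream;

/* Use the compressed form only if it is strictly smaller. */
example_stream example_choose_stream(const unsigned char *raw,
    size_t raw_len, unsigned char *scratch, size_t cap)
{
    example_stream s;
    size_t n;

    assert(raw != NULL && scratch != NULL);
    n = example_rle(raw, raw_len, scratch, cap);
    if (n > 0 && n < raw_len)
    {
        s.data = scratch;
        s.length = n;
        s.compressed = 1;
    }
    else
    {
        s.data = raw;
        s.length = raw_len;
        s.compressed = 0;
    }
    return s;
}
```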
|
|
|
|
|
|
Closing a device or writer may throw exceptions, but many of the
foreign language bindings (JNI and JS) depend on drop never throwing
an exception (exceptions in finalizers are bad).
|
|
The mark & sweep pass of garbage collection, and resolving indirect objects when grafting objects
were following the full chain of indirect references. In the unusual case where a numbered object
is itself only an indirect reference to another object, this intermediate numbered object would
be missed both when marking for garbage collection, and when copying objects for grafting.
Add a function to resolve only one step for these two uses.
The following is an example of a file that would break during garbage collection if we
follow full indirect reference chains:
%PDF-1.3
1 0 obj
<</Type/Catalog /Foo[2 0 R 3 0 R]>>
endobj
2 0 obj
4 0 R
endobj
3 0 obj
5 0 R
endobj
4 0 obj
<</Length 1>>
stream
A
endstream
endobj
5 0 obj
<</Length 1>>
stream
B
endstream
endobj
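
The difference between the two resolution strategies can be sketched
with a toy object table modelled on the file above. The names and the
table representation are hypothetical, not the mupdf object model.
Resolving the full chain from object 2 jumps straight to object 4, so
object 2 is never seen; marking one step at a time visits both.

```c
#include <assert.h>
#include <stddef.h>

enum { EX_DIRECT, EX_INDIRECT };

/* Toy object: either direct content, or a bare reference to target. */
typedef struct
{
    int kind;
    int target; /* object number, used when kind == EX_INDIRECT */
} example_obj;

/* Follow the whole chain and return the final object number. The step
 * counter guards against reference cycles. */
int example_resolve_full(const example_obj *table, int count, int num)
{
    int steps = 0;
    assert(table != NULL);
    while (num > 0 && num < count &&
        table[num].kind == EX_INDIRECT && steps++ < count)
        num = table[num].target;
    return num;
}

/* Mark num and everything it leads to, resolving one step at a time
 * so that intermediate numbered objects are marked as well. Stops at
 * an already marked object, which also terminates cycles. */
void example_mark(const example_obj *table, int count, int num, int *used)
{
    assert(table != NULL && used != NULL);
    while (num > 0 && num < count && !used[num])
    {
        used[num] = 1;
        if (table[num].kind != EX_INDIRECT)
            break;
        num = table[num].target; /* one step only */
    }
}
```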
|
|
The generation number is only needed for decryption, and is assumed
to be zero or irrelevant for all other uses.
Store the original object number and generation in the xref slot, so
that we can decrypt them even when the objects have been renumbered,
without needing to pass the original object number around through
the stream loading APIs.
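
A minimal sketch of the idea, with hypothetical names. The per-object
key in the standard security handler is derived from the file key plus
the low three bytes of the object number and the low two bytes of the
generation number, so those values must be the original ones even after
the object has moved to a different slot.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical xref slot that remembers where the object came from. */
typedef struct
{
    long offset; /* where the object lives in the file */
    int num;     /* original object number, kept for decryption */
    int gen;     /* original generation number, kept for decryption */
} example_xref_entry;

/* Renumber: the entry moves to a new slot but keeps its original
 * number and generation. */
void example_renumber(example_xref_entry *table, int from, int to)
{
    assert(table != NULL);
    table[to] = table[from];
}

/* Bytes appended to the file key when deriving the object key: low
 * three bytes of the number, then low two bytes of the generation,
 * least significant byte first. */
void example_key_suffix(const example_xref_entry *entry,
    unsigned char suffix[5])
{
    assert(entry != NULL);
    suffix[0] = (unsigned char)(entry->num & 0xff);
    suffix[1] = (unsigned char)((entry->num >> 8) & 0xff);
    suffix[2] = (unsigned char)((entry->num >> 16) & 0xff);
    suffix[3] = (unsigned char)(entry->gen & 0xff);
    suffix[4] = (unsigned char)((entry->gen >> 8) & 0xff);
}
```

Because the suffix is read from the entry rather than from the slot
index, callers loading a stream only need the current slot.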
|
|
Allows us to remove the out parameter 'transform' from fz_begin_page.
|
|
This silences the many warnings we get when building for x64
on Windows.
This does not address any of the warnings we get in thirdparty
libraries - in particular harfbuzz. These look (at a quick
glance) harmless though.
|
|
|