summaryrefslogtreecommitdiff
path: root/pdf/pdf_parse.c
AgeCommit message (Collapse)Author
2012-07-18Update pdf_to_utf8 to handle either a stream or a stringPaul Gardiner
Also change first argument from fz_context to pdf_document in each of pdf_to_utf8, pdf_to_utf8_name, pdf_to_ucs2 and pdf_to_ucs2_name
2012-07-05Move to static inline functions from macros.Robin Watts
Instead of using macros for min/max/abs/clamp, we move to using inline functions. These are more typesafe, and should produce equivalent code on compilers that support inline (i.e. pretty much everything we care about these days). People can always do their own macro versions if they prefer.
2012-05-31Add linearization to pdf_write function.Robin Watts
Extend mupdfclean to have a new -l file that writes the file linearized. This should still be considered experimental When writing a pdf file, analyse object use, flatten resource use, reorder the objects, generate a hintstream and output with linearisaton parameters. This is enough for Acrobat to accept the file as being optimised for Fast Web View. We ought to add more tables to the hintstream in some cases, but I doubt anyone actually uses it, the spec is so badly written. Certainly acrobat accepts the file as being optimised for 'Fast Web View'. Update fz_dict_put to allow for us adding a reference to the dictionary that is the sole owner of that reference already (i.e. don't drop then keep something that has a reference count of just 1). Update pdf_load_image_stream to use the stm_buf from the xref if there is one. Update pdf_close_document to discard any stm_bufs it may be holding. Update fz_dict_put to be pdf_dict_put - this was missed in a renaming ages ago and has been inconsistent since.
2012-05-08Bug 693028 - improve mupdf viewer handling of broken page streamsRobin Watts
Currently, if a page stream cannot be read, mupdf gives an alert box and then exits. This is annoying when reading a large pdf. Here we change the code to only exit if a page is completely broken; in the case of missing page contents, or missing links, we give a warning and just render the best we can. Also, update a couple of error messages to be less misleading.
2012-03-07More release tidyups.Robin Watts
Add some function documentation to fitz.h. Add fz_ prefix to runetochar, chartorune, runelen etc. Change fz_runetochar to avoid passing unnecessary pointer.
2012-03-06Split fitz.h/mupdf.h into internal/external headers.Robin Watts
Attempt to separate public API from internal functions.
2012-03-01Setjmp/longjmp exception tweaks.Robin Watts
First, fix a couple of the 'alternative formulations' of the try/catch code in the comments. Secondly, work around a Mac OS X compiler bug.
2012-02-26Move fz_obj to be pdf_obj.Robin Watts
Currently, we are in the slightly strange position of having the PDF specific object types as part of fitz. Here we pull them out into the pdf layer instead. This has been made possible by the recent changes to make the store no longer be tied to having fz_obj's as keys. Most of this work is a simple huge rename; to help customers who may have code that use such functions we have provided a sed script to do the renaming; scripts/rename2.sed. Various other small tweaks are required; the store used to have some debugging code that still required knowledge of fz_obj types - we extract that into a nicer 'type' based function pointer. Also, the type 3 font handling used to have an fz_obj pointer for type 3 resources, and therefore needed to know how to free this; this has become a void * with a function to free it.
2012-02-25Revamp pdf lexing codeRobin Watts
A huge amount (20%+ on some files) of our runtime is spent in fz_atof. A survey of results on the net suggests we will get much better speed by writing our own atof. Part of the job of doing this involves parsing the string to identify the component parts of the number - ludicrously, we are already doing this as part of the lexing process, so it would make sense to do the atoi/atof as part of this process. In order to do this, we need somewhere to store the lexed results; rather than add a float * and an int * to every single pdf_lex call, we generalise the calls to pass a pdf_lexbuf * pointer instead of separate buffer/max/string length pointers. This should help us overall.
2012-01-27Rename pdf_xref type to pdf_document.Tor Andersson
2012-01-06Various memory leak fixes.Robin Watts
In error cases, ensure we free objects correctly. Thanks to Zeniko for finding the problems (and many of the solutions!)
2011-12-16More memsqueezing fixes.Robin Watts
2011-12-15Another memsqueezing bug.Robin Watts
2011-12-15Various Memsqueezing fixes.Robin Watts
Fixes for leaks (and SEGVs, division by zeros etc) seen when Memsqueezing.
2011-12-08Move from volatile to fz_var.Robin Watts
When using exceptions (which are implemented using setjmp/longjmp), we need to be careful to ensure that variable values get written back before any exception happens. Previously we've done that using volatile, but that produces nasty warnings (and unduly limits the compilers freedom to optimise). Here we introduce a new macro fz_var that passes the address of the variable out of scope. This means that the compiler has to ensure that any changes to its value are written back to memory before calling any out of scope function.
2011-12-08Fix SEGV seen when repairing SumatraPDF1.1DOS.pdfRobin Watts
An error while parsing pdf_parse_array could result in double dropping of an object. Simple fix.
2011-10-04Move to exception handling rather than error passing throughout.Robin Watts
This frees us from passing errors back everywhere, and hence enables us to pass results back as return values. Rather than having to explicitly check for errors everywhere and bubble them, we now allow exception handling to do the work for us; the downside to this is that we no longer emit as much debugging information as we did before (though this could be put back in). For now, the debugging information we have lost has been retained in comments with 'RJW:' at the start. This code needs fuller testing, but is being committed as a work in progress.
2011-09-21Add warning context.Tor Andersson
2011-09-21Rename malloc functions for arrays (fz_calloc and fz_realloc).Tor Andersson
2011-09-21Don't thread ctx through safe fz_obj functions.Tor Andersson
2011-09-15Add context to mupdf.Robin Watts
Huge pervasive change to lots of files, adding a context for exception handling and allocation. In time we'll move more statics into there. Also fix some for(i = 0; i < function(...); i++) calls.
2011-09-14Initial import of exception handling codeRobin Watts
Import exception handling code from WSS, modified to fit into the fitz world. With this code we have 'real' fz_try/fz_catch/fz_rethrow functions, handling a fz_except type. We therefore rename the existing fz_throw/ fz_catch/fz_rethrow to be fz_error_make/fz_error_handle/fz_error_note. We don't actually use fz_try/fz_catch/fz_rethrow yet...
2011-09-06Support empty and little-endian UTF-16 strings.Tor Andersson
Also don't read out of bounds if the string is cut short in the middle of the last two-byte character.
2011-05-31Fix assert in scale: see Bug 692245.Robin Watts
Bug 692245 gives a file that produces a runtime assert in mupdf due to an extremely large ctm offset (unrepresentable in a float). We fix our code here so that such floats are always read as 1.0. In this particular case, the exact value read doesn't seem to matter. We match acrobat. We pick 1.0 rather than 0.0 as this is less likely to provoke division by 0 errors later on.
2011-04-08pdf: add pdf_from_ucs2 to encode a unicode string in pdfdocencoding.Tor Andersson
For use by SumatraPDF to check passwords.
2011-04-04Le Roi est mort, vive le Roi!Tor Andersson
The run-together words are dead! Long live the underscores! The postscript inspired naming convention of using all run-together words has served us well, but it is now time for more readable code. In this commit I have also added the sed script, rename.sed, that I used to convert the source. Use it on your patches and application code.
2011-04-04pdf: Rename mupdf directory.Tor Andersson