A few commits back, we introduced the fz_key_storable concept
to allow us to cope with objects that were used both as values
within the store and as parts of keys within the store.
That commit worked, but exposed performance problems: when the
store has several million PDF objects in it, bulk changes (such
as dropping a display list or document) could trigger many passes
across the store.
We therefore introduce a mechanism to ameliorate this. These
passes, now known as "reap passes", can be batched together using
fz_defer_reap_start and fz_defer_reap_end.
We trigger this start/end around display list dropping, and around
PDF content stream processing. This should be safe, as deferral
will be interrupted if we ever run out of memory during a malloc.
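A minimal sketch of how such batching can work, assuming a nesting counter plus a "reap needed" flag. The names (toy_store, request_reap, defer_reap_start, defer_reap_end, on_malloc_failure) are hypothetical stand-ins; this is not the actual fz_defer_reap_start/fz_defer_reap_end implementation.

```c
/* Hypothetical miniature of the idea; these names are not MuPDF's API. */
typedef struct {
	int defer_count; /* nesting depth of defer_reap_start/end calls */
	int needs_reap;  /* set when a reap was requested while deferred */
	int passes;      /* how many full passes over the store actually ran */
} toy_store;

/* Stands in for one expensive walk over every item in the store. */
static void reap_pass(toy_store *s)
{
	s->passes++;
	s->needs_reap = 0;
}

/* Called wherever a drop would otherwise trigger an immediate pass. */
static void request_reap(toy_store *s)
{
	if (s->defer_count > 0)
		s->needs_reap = 1; /* remember it; run one pass later */
	else
		reap_pass(s);
}

static void defer_reap_start(toy_store *s)
{
	s->defer_count++;
}

static void defer_reap_end(toy_store *s)
{
	/* Only the outermost end call runs the batched pass. */
	if (--s->defer_count == 0 && s->needs_reap)
		reap_pass(s);
}

/* Out-of-memory path: reclaim now rather than waiting for the end call. */
static void on_malloc_failure(toy_store *s)
{
	if (s->needs_reap)
		reap_pass(s);
}
```

With this arrangement, a thousand drops inside one start/end pair cost a single pass rather than a thousand.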
|
The store is effectively a list of items, where each item is a
(key, value) pair. The design is such that we can easily get
into the state where the only reference to a value is that held
by the store. Subsequent references can then be generated by
things being 'found' from within the store.
While the only reference to an object is that held by it being
a value in the store, the store is free to evict it to save
memory.
Images present a complication to this design; images are stored
both as values within the store (by the pdf agent, so that we
do not regenerate images each time we meet them in the file),
and as parts of the keys within the store.
For example, once an image is decoded to give a pixmap, the
pixmap is cached in the store. The key to look that pixmap up
again includes a reference to the image from which the pixmap
was generated.
This means that, for document handlers such as gproof that do
not place images in the store, we can end up with images that
are kept around purely by dint of being used as references in
store keys. There is no chance of the value (the decoded pixmap)
ever being 'found' from the store as no one other than the
key is holding a reference to the image required. Thus the
images/pixmaps are never freed until the store is emptied.
This commit offers a fix for this situation.
Standard store items are based on an fz_storable type. Here we
introduce a new fz_key_storable type derived from that. As well
as keeping track of the number of references a given item has
to it, it keeps a separate count of the number of references a
given item has to it from keys in the store.
On dropping a reference, we check to see if the number of
references has become the same as the number of references from
keys in the store. If it has, then we know that these keys can
never be 'found' again. So we filter them out of the store,
which drops the items.
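The counting scheme can be sketched as follows. This is a hypothetical miniature (key_storable, toy_store, store_put, filter_store and drop_key_storable are invented names, not the real fz_key_storable API) that tracks only the reference counts, not the cached values themselves.

```c
/* Hypothetical miniature; the names are not MuPDF's API. */
typedef struct {
	int refs;     /* all references, including those held by store keys */
	int key_refs; /* how many of those refs are held by store keys */
} key_storable;

typedef struct {
	key_storable *key_obj; /* object referenced from this entry's key */
	int live;              /* 0 once the entry has been filtered out */
} store_entry;

#define MAX_ENTRIES 16

typedef struct {
	store_entry entries[MAX_ENTRIES];
	int count;
} toy_store;

/* Insert an entry whose key refers to obj. Assumes count < MAX_ENTRIES. */
static void store_put(toy_store *st, key_storable *obj)
{
	obj->refs++;
	obj->key_refs++;
	st->entries[st->count].key_obj = obj;
	st->entries[st->count].live = 1;
	st->count++;
}

/* Evict every entry whose key refers to obj, releasing the key's reference.
 * A real store would also drop the cached value held by each entry. */
static void filter_store(toy_store *st, key_storable *obj)
{
	for (int i = 0; i < st->count; i++) {
		store_entry *e = &st->entries[i];
		if (e->live && e->key_obj == obj) {
			e->live = 0;
			obj->key_refs--;
			obj->refs--;
		}
	}
}

static void drop_key_storable(toy_store *st, key_storable *obj)
{
	obj->refs--;
	/* Only keys still point at obj, so nobody can ever build a key
	 * that 'finds' those entries again: evict them. */
	if (obj->refs > 0 && obj->refs == obj->key_refs)
		filter_store(st, obj);
	/* Once refs reaches 0, obj itself would be freed here. */
}
```

An object with other live references is left alone; only when the last non-key reference goes do its store entries disappear.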
|
Closing a device or writer may throw exceptions, but many of the
foreign language bindings (JNI and JS) depend on drop never throwing
an exception (exceptions in finalizers are bad).
|
Allows us to remove the out parameter 'transform' from fz_begin_page.
|
This silences the many warnings we get when building for x64
on Windows.
This does not address any of the warnings we get in third-party
libraries, in particular harfbuzz. At a quick glance, these look
harmless though.
|
Calculations that involved non-power-of-2 bpps were going wrong.
|
Use do {} while(--w) rather than while(w--) {}, as this saves a
test each time around the loop.
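For illustration, the two loop shapes side by side (hypothetical copy helpers, not code from the commit). The do/while form is only correct when w is known to be positive; how many tests or instructions it actually saves depends on the compiler and target.

```c
/* Copy w bytes, testing w before every iteration, plus one final
 * test that fails. Safe for w == 0. */
static void copy_while(unsigned char *dst, const unsigned char *src, int w)
{
	while (w--)
		*dst++ = *src++;
}

/* Copy w bytes with the decrement and the test fused into --w, and no
 * test on entry. Only valid for w > 0: with w == 0, --w goes negative
 * and the loop would run far past the end of the buffers. */
static void copy_do_while(unsigned char *dst, const unsigned char *src, int w)
{
	do
		*dst++ = *src++;
	while (--w);
}
```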
|
fz_pixmaps now have an explicit stride value. By default nothing
changes from before, but all the code copes with extra gaps at the
end of each line.
The alpha data in fz_pixmaps is no longer compulsory.
mudraw: use rgb not rgba (ppmraw), cmyk not cmyka (pkmraw).
Update halftone code to not expect alpha plane.
Update PNG writing to cope with alpha-less input.
Also hide repeated params within the png output context.
ARM code needs updating.
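A sketch of what an explicit stride and an optional alpha plane mean for pixel addressing, assuming a simplified pixmap struct. toy_pixmap and its helpers are hypothetical and are not fz_pixmap or its API.

```c
#include <stdlib.h>

/* Hypothetical pixmap layout, loosely modelled on the description above. */
typedef struct {
	int w, h;
	int n;      /* components per pixel, including alpha if present */
	int alpha;  /* 1 if the last component is alpha, else 0 */
	int stride; /* bytes from the start of one row to the next; >= w * n */
	unsigned char *samples;
} toy_pixmap;

static toy_pixmap *new_toy_pixmap(int w, int h, int colorants, int alpha, int stride)
{
	toy_pixmap *pix = malloc(sizeof *pix);
	if (!pix)
		return NULL;
	pix->w = w;
	pix->h = h;
	pix->alpha = alpha;
	pix->n = colorants + alpha;
	/* A stride of 0 means "tightly packed", matching the old behaviour. */
	pix->stride = stride ? stride : w * pix->n;
	pix->samples = calloc((size_t)pix->stride * h, 1);
	if (!pix->samples) {
		free(pix);
		return NULL;
	}
	return pix;
}

/* Address of component c of pixel (x, y). Rows are stepped by stride,
 * never by w * n, so any padding at the end of a row is skipped. */
static unsigned char *toy_pixel(toy_pixmap *pix, int x, int y, int c)
{
	return pix->samples + (size_t)y * pix->stride + (size_t)x * pix->n + c;
}

static void drop_toy_pixmap(toy_pixmap *pix)
{
	if (pix) {
		free(pix->samples);
		free(pix);
	}
}
```

An RGB pixmap without alpha has n == 3, so code that assumed a trailing alpha byte per pixel would read the wrong samples.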
|
When decoding < 8 bpp images, we need to allow for the fact
that the data is byte aligned at the end of each row by
being careful in our calculation of r_skip.
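A minimal illustration of the pitfall, assuming the extracted area starts at column 0 and that each source row is padded to a whole number of bytes. row_bytes, right_skip and right_skip_naive are hypothetical helpers, not the actual decoder code. A 10-pixel row at 1 bpp occupies 2 bytes, so after taking the first 5 pixels (1 byte) one byte must be skipped; counting only whole bytes of unused pixels gives 0.

```c
/* Bytes occupied by one row of w samples at bpp bits each. */
static int row_bytes(int w, int bpp)
{
	return (w * bpp + 7) / 8;
}

/* Bytes to skip at the end of each row after extracting columns [0, x1).
 * Deriving it from the padded row length accounts for the padding bits. */
static int right_skip(int w, int x1, int bpp)
{
	return row_bytes(w, bpp) - row_bytes(x1, bpp);
}

/* Naive version: counts only whole bytes of unused pixels and ignores
 * the padding, so for sub-byte depths rows drift out of alignment. */
static int right_skip_naive(int w, int x1, int bpp)
{
	return (w - x1) * bpp / 8;
}
```

At 8 bpp and above the two agree, which is why the problem only shows up for < 8 bpp images.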
|
I was using fz_compressed_image when I should have been using
fz_pixmap_image.
|
Split compressed images (images based on a compressed buffer)
and pixmap images (images based on a pixmap) out into separate
subclasses.
|
Move from ints to bits where possible.
|
For now, just use it for controlling image decoding and image scaling.
|
Update the core fz_get_pixmap_from_image code to allow fetching
a subarea of a pixmap. We pass in the required subarea, together
with the transformation matrix for the whole image.
On return, we have a pixmap at least as big as was requested,
and the transformation matrix is updated to map the supplied
area to the correct place on the screen.
The draw device is updated to use this as required. Everywhere
else passes NULLs in, and so gets unchanged behaviour.
The standard 'get_pixmap' function has been updated to decode
just the required areas of the bitmaps.
This means that banded rendering of pages will decode just the
image subareas that are required for each band, limiting the
memory use. The downside to this is that each band will redecode
the image to extract just the section we want.
The image subareas are put into the fz_store in the same way
as full images. Currently image areas in the store are only
matched when they match exactly; subareas are not identified
as being able to use existing images.
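The matrix update can be sketched as below, assuming the image matrix maps the unit square onto the page (as PDF image matrices do) and using PDF's row-vector convention. toy_matrix, toy_concat and subarea_matrix are hypothetical names; the real code may also enlarge the subarea before decoding, since the returned pixmap can be bigger than requested.

```c
/* Hypothetical 2D affine matrix, row-vector convention:
 * x' = a*x + c*y + e, y' = b*x + d*y + f. */
typedef struct { float a, b, c, d, e, f; } toy_matrix;

/* Result applies 'first', then 'second'. */
static toy_matrix toy_concat(toy_matrix first, toy_matrix second)
{
	toy_matrix r;
	r.a = first.a * second.a + first.b * second.c;
	r.b = first.a * second.b + first.b * second.d;
	r.c = first.c * second.a + first.d * second.c;
	r.d = first.c * second.b + first.d * second.d;
	r.e = first.e * second.a + first.f * second.c + second.e;
	r.f = first.e * second.b + first.f * second.d + second.f;
	return r;
}

/* The smaller pixmap for pixels [x0,x1) x [y0,y1) of a w x h image again
 * covers a unit square, so prepend a matrix that takes that unit square
 * to the matching part of the full image's unit square. */
static toy_matrix subarea_matrix(toy_matrix image_ctm, int w, int h,
	int x0, int y0, int x1, int y1)
{
	toy_matrix sub = {
		(float)(x1 - x0) / w, 0,
		0, (float)(y1 - y0) / h,
		(float)x0 / w, (float)y0 / h
	};
	return toy_concat(sub, image_ctm);
}
```

For example, a 10x10 image drawn as a 100x50 rectangle at (10, 20) has its columns 5 to 10, rows 0 to 5 mapped to a 50x25 rectangle at (60, 20).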
|
An l2factor of 3 is equivalent to downscaling by a factor of 8.
We can get an l2factor of 3 downscale out of the jpeglib. We can
reasonably downscale by a further l2factor of 3 manually. Any more
than that and we start to completely drop pixels without them
having any effect.
Therefore it is pointless for us to keep any tiles around with
l2factors > 6.
Fix the bug (which was that we were using < instead of <=) and
update the value to a more reasonable one anyway.
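A hypothetical helper showing the clamp: keep halving while the halved image still covers the target size, but never beyond 6 halvings (a factor of 64). choose_l2factor and MAX_L2FACTOR are illustrative names, not MuPDF's, and the selection rule is an assumption.

```c
/* 3 halvings from libjpeg plus 3 done by hand, per the reasoning above. */
#define MAX_L2FACTOR 6

/* How many times an image of w x h can be halved while still being at
 * least dw x dh device pixels, capped at MAX_L2FACTOR. */
static int choose_l2factor(int w, int h, int dw, int dh)
{
	int l2 = 0;
	while (l2 < MAX_L2FACTOR && (w >> (l2 + 1)) >= dw && (h >> (l2 + 1)) >= dh)
		l2++;
	return l2;
}
```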
|
Image objects are immutable and opaque once constructed.
Therefore there is no need for the const keyword.
|
In general, we should use 'const fz_blah' in device calls whenever
the callee should not alter the fz_blah.
Push this through. This shows up various places where we fz_keep
and fz_drop these const things.
I've updated the fz_keep and fz_drops with appropriate casts
to remove the consts. We may need to do the union dance to avoid
the consts for some compilers, but will only do that if required.
I think this is nicer overall, even allowing for the const<->no const
problems.
|
Use fz_output in debug printing functions.
Use fz_output in pdfshow.
Use fz_output in fz_trace_device instead of stdout.
Use fz_output in pdf-write.c.
Rename fz_new_output_to_filename to fz_new_output_with_path.
Add seek and tell to fz_output.
Remove unused functions like fz_fprintf.
Fix typo in pdf_print_obj.
|
Important for gproof files.
|
Ensure that subsampling and caching happen in the generic image
code, not in the type-specific code.
Previously, the subsampling happened only for images that were
decoded from streams. Images that were loaded direct were never
subsampled and hence were always cached at full size. After this
change both classes of image are correctly subsampled, and
the subsampled version kept in the cache.
This produces various image diffs in the cluster, none of which
are noticeable to the naked eye.
|
Previously, we had people calling image->get_pixmap directly. Now we
have them all call fz_image_get_pixmap, which will look for a cached
version in the store, and only call get_pixmap if required.
Previously, fz_image_get_pixmap looked for the cached version in
the store and decoded the image if it was not there; hence the
decoding code is now extracted out into standard_image_get_pixmap.
This was the original intent of the code, it just somehow didn't end
up like that.
This nicely sets us up for being able to have fz_images that use
a different get_pixmap implementation, such as that which will be
required for the gprf code.
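A sketch of the resulting shape, assuming a function-pointer "base class" and a one-slot cache standing in for the store. toy_image, toy_cache, toy_image_get_pixmap and standard_get_pixmap are hypothetical; pixmaps are represented by plain ints to keep it short.

```c
#include <stddef.h>

/* Hypothetical image "base class": each image type supplies get_pixmap. */
typedef struct toy_image toy_image;
struct toy_image {
	int (*get_pixmap)(toy_image *img, int l2factor); /* returns a pixmap id */
	int decodes; /* how many times the expensive path ran */
};

/* Hypothetical one-slot cache standing in for the store. */
typedef struct {
	toy_image *img;
	int l2factor;
	int pixmap;
	int valid;
} toy_cache;

/* Every caller goes through this wrapper; only it calls img->get_pixmap. */
static int toy_image_get_pixmap(toy_cache *cache, toy_image *img, int l2factor)
{
	if (cache->valid && cache->img == img && cache->l2factor == l2factor)
		return cache->pixmap; /* hit: no decode */
	int pix = img->get_pixmap(img, l2factor); /* miss: type-specific path */
	cache->img = img;
	cache->l2factor = l2factor;
	cache->pixmap = pix;
	cache->valid = 1;
	return pix;
}

/* A "standard" implementation that would decode a compressed buffer. */
static int standard_get_pixmap(toy_image *img, int l2factor)
{
	img->decodes++;
	return 100 + l2factor; /* placeholder for a decoded pixmap */
}
```

A different image type only has to supply its own get_pixmap; the cache lookup stays in one place.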
|
Add locks around fz_path and fz_text reference counting.
|
Purge several embedded contexts:
Remove embedded context in fz_output.
Remove embedded context in fz_stream.
Remove embedded context in fz_device.
Remove fz_rebind_stream (since it is no longer necessary).
Remove embedded context in svg_device.
Remove embedded context in XML parser.
Add ctx argument to fz_document functions.
Remove embedded context in fz_document.
Remove embedded context in pdf_document.
Remove embedded context in pdf_obj.
Make fz_page independent of fz_document in the interface.
We shouldn't need to pass the document to all functions handling a page.
If a page is tied to the source document, it's redundant; otherwise it's
just pointless.
Fix reference counting oddity in fz_new_image_from_pixmap.
|
Rename fz_close to fz_drop_stream.
Rename fz_close_archive to fz_drop_archive.
Rename fz_close_output to fz_drop_output.
Rename fz_free_* to fz_drop_*.
Rename pdf_free_* to pdf_drop_*.
Rename xps_free_* to xps_drop_*.
|
We end up trying to scale the JPEG up 72 times and fail a malloc.
A better plan is to make the image handler disbelieve any xres or
yres values less than 72dpi. We take care to still preserve aspect
ratios etc.
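One way to disbelieve low resolutions while keeping the aspect ratio is to scale both axes by the same factor until the smaller one reaches 72 dpi. This is a hypothetical sketch (sanitize_resolution is not a MuPDF function); the handling of zero or negative values is an added assumption, there to avoid dividing by zero.

```c
#define SANE_DPI 72

static void sanitize_resolution(int *xres, int *yres)
{
	/* Assumption: replace unusable values from the other axis, or 72. */
	if (*xres <= 0 && *yres <= 0)
		*xres = *yres = SANE_DPI;
	else if (*xres <= 0)
		*xres = *yres;
	else if (*yres <= 0)
		*yres = *xres;

	/* Scale both axes by the same factor so the smaller becomes 72. */
	if (*xres < SANE_DPI || *yres < SANE_DPI) {
		if (*xres < *yres) {
			*yres = *yres * SANE_DPI / *xres;
			*xres = SANE_DPI;
		} else {
			*xres = *xres * SANE_DPI / *yres;
			*yres = SANE_DPI;
		}
	}
}
```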
|
fz_image::n is used inconsistently: Sometimes it includes the alpha
channel and sometimes it doesn't. At the point where
fz_unblend_masked_tile is called, it doesn't.
|
If the reported height is 0 or too large, use the image size reported
in the PDF itself instead (in the case of height 0, the JPEG library
is supposed to read the correct value from the DNL segment, but libjpeg
doesn't support that).
|
If a JPEG stream is missing valid values for width/height (usually -1),
Adobe Reader substitutes these using the values read from the PDF
object. This can be done by scanning and patching the data before
passing it to libjpeg.
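A sketch of such a scan-and-patch pass, assuming the marker layout from the JPEG standard (ITU-T T.81): a SOFn frame header is FF Cn, a 2-byte length, a 1-byte precision, then 2-byte height and 2-byte width. patch_jpeg_dimensions is a hypothetical name, treating both 0 and 0xFFFF (-1) as invalid is an assumption, and this is not the patch credited above.

```c
#include <stddef.h>

/* Walk marker segments up to the first SOFn and patch invalid height or
 * width fields with the PDF dictionary values. Returns 1 if a frame
 * header was found, 0 otherwise. */
static int patch_jpeg_dimensions(unsigned char *data, size_t len,
	unsigned pdf_w, unsigned pdf_h)
{
	size_t i = 2; /* skip SOI (FF D8) */
	if (len < 4 || data[0] != 0xFF || data[1] != 0xD8)
		return 0;
	while (i + 4 <= len) {
		if (data[i] != 0xFF)
			return 0; /* not at a marker: give up */
		unsigned char m = data[i + 1];
		if (m == 0xFF) { i++; continue; } /* fill byte */
		size_t seglen = ((size_t)data[i + 2] << 8) | data[i + 3];
		/* SOF0..SOF15, excluding DHT (C4), JPG (C8) and DAC (CC) */
		if (m >= 0xC0 && m <= 0xCF && m != 0xC4 && m != 0xC8 && m != 0xCC) {
			if (i + 9 > len)
				return 0;
			unsigned char *y = data + i + 5, *x = data + i + 7;
			unsigned h = (y[0] << 8) | y[1], w = (x[0] << 8) | x[1];
			if (h == 0 || h == 0xFFFF) {
				y[0] = (unsigned char)(pdf_h >> 8);
				y[1] = (unsigned char)(pdf_h & 0xFF);
			}
			if (w == 0 || w == 0xFFFF) {
				x[0] = (unsigned char)(pdf_w >> 8);
				x[1] = (unsigned char)(pdf_w & 0xFF);
			}
			return 1;
		}
		if (m == 0xDA || m == 0xD9)
			return 0; /* scan data or EOI before any frame header */
		i += 2 + seglen;
	}
	return 0;
}
```

Walking segment by segment, rather than searching for FF C0 anywhere, avoids matching byte pairs inside other segments' payloads.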
Thanks to zeniko for the patch.
|
Previously pdf_process buffer did not understand inline images.
In order to make this work without needlessly duplicating complex code
from within pdf-op-run, the parsing of inline images has been moved to
happen in pdf-interpret.c. When the op_table entry for BI is called
it now expects the inline image to be in csi->img and the dictionary
object to be in csi->obj.
To make this work, we have had to improve the handling of inline images
in general. While non-inline images have been loaded and held in
memory in their compressed form and only decoded when required, until
now we have always loaded and decoded inline images immediately. This
has been due to the difficulty in knowing how many bytes of data to
read from the stream - we know the length of the stream once
uncompressed, but relating this to the compressed length is hard.
To cure this we introduce a new type of filter stream, a 'leecher'.
We insert a leecher stream before we build the filters required to
decode the image. We then read and discard the appropriate number
of uncompressed bytes from the filters. This pulls the compressed
data through the leecher stream, which stores it in an fz_buffer.
Thus images are now always held in their compressed forms in memory.
The pdf-op-run implementation is now trivial. The only real complexity
in the pdf-op-buffer implementation is the need to ensure that the
/Filter entry in the dictionary object matches the exact point at
which we backstopped the decompression.
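The leecher idea can be sketched with a tiny pull-stream model: a raw memory source, a leecher that copies every byte passing through it, and a toy run-length decoder on top. All names (toy_stream, mem_stream, leech_stream, rle_stream) and the (count, byte) encoding are hypothetical stand-ins for fz_stream, fz_buffer and real PDF filters.

```c
#include <stddef.h>

/* Hypothetical pull stream: next() returns a byte, or -1 at EOF. */
typedef struct toy_stream toy_stream;
struct toy_stream {
	int (*next)(toy_stream *s);
};

/* Raw source over a memory buffer. */
typedef struct {
	toy_stream base;
	const unsigned char *p, *end;
} mem_stream;

static int mem_next(toy_stream *s)
{
	mem_stream *m = (mem_stream *)s;
	return m->p < m->end ? *m->p++ : -1;
}

/* The leecher: passes bytes through unchanged but keeps a copy of each. */
typedef struct {
	toy_stream base;
	toy_stream *chain;
	unsigned char buf[256]; /* stand-in for an fz_buffer */
	size_t len;
} leech_stream;

static int leech_next(toy_stream *s)
{
	leech_stream *l = (leech_stream *)s;
	int c = l->chain->next(l->chain);
	if (c >= 0 && l->len < sizeof l->buf)
		l->buf[l->len++] = (unsigned char)c;
	return c;
}

/* Toy run-length decoder: input is (count, byte) pairs. */
typedef struct {
	toy_stream base;
	toy_stream *chain;
	int run, value;
} rle_stream;

static int rle_next(toy_stream *s)
{
	rle_stream *r = (rle_stream *)s;
	while (r->run == 0) {
		int count = r->chain->next(r->chain);
		int value = r->chain->next(r->chain);
		if (count < 0 || value < 0)
			return -1;
		r->run = count;
		r->value = value;
	}
	r->run--;
	return r->value;
}
```

Stacking source, leecher and decoder, then reading exactly the 5 expected decoded bytes from {3, 'a', 2, 'b', 'E', 'I'}, pulls exactly the 4 compressed bytes into the leech buffer and leaves the source positioned at the following 'EI'. Real filters may buffer ahead of what they return, which this toy does not model.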
|
Gridfitting can increase the required width/height of images by up to
2 pixels. This makes images that are rendered very small very
sensitive to over-quantisation.
This can produce 'mushier' images than it should, for instance on
tests/Ghent_V3.0/090_Font-Support_x3.pdf (pgmraw, 72dpi)
|
If the scale is too large, the calculation to determine the
required size of a pixmap can overflow. This can lead to negative
width/heights being passed in, which confuses the subsampling
code, leading to SEGVs.
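A defensive version of such a size calculation might compare in floating point before converting to int, since converting an out-of-range double to int is undefined behaviour in C. scaled_extent is a hypothetical helper, not the actual fix.

```c
#include <limits.h>

/* Round |scale| up to a whole number of pixels. Returns 1 and stores the
 * result in *out, or returns 0 if it would not fit in an int. */
static int scaled_extent(double scale, int *out)
{
	if (scale < 0)
		scale = -scale;
	/* Check the range before converting; the negated test also rejects NaN. */
	if (!(scale <= (double)INT_MAX - 1))
		return 0;
	int v = (int)scale; /* truncates toward zero */
	if ((double)v < scale)
		v++;            /* round up to cover partial pixels */
	*out = v;
	return 1;
}
```

A caller that gets 0 back can refuse the render or clamp the scale, instead of handing a wrapped negative width or height to the subsampling code.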
|
fz_new_image_from_pixmap expects the pixmap's colorspace to have two
references, which is contrary to expectations. If it instead addrefs the
pixmap's colorspace, the only caller, pdf_load_jpx, can consistently
drop the colorspace after passing it to fz_load_jpx.
Also, if the contract is that whatever is passed into
fz_new_image_from_pixmap belongs to the new image, then the pixmap also
has to be dropped on error so that it isn't leaked.
|
See SumatraPDF's repo for a Windows-only implementation using WIC.