author | brucedawson <brucedawson@chromium.org> | 2016-06-27 06:58:37 -0700 |
---|---|---|
committer | Commit bot <commit-bot@chromium.org> | 2016-06-27 06:58:37 -0700 |
commit | 160bd0e26f5dfe5fa11322f61b3d156c2214cba8 (patch) | |
tree | 100685ec0eafbad5c7579eecde9ff1c2d2a96448 /core/fxcrt/fx_basic_buffer.cpp | |
parent | afe3e4671b3f70e0459487aca89133a746c61797 (diff) | |
Double AdobeCMYK_to_sRGB speed with faster rounding
FXSYS_round is painfully slow on Windows. It does range checking and
then calls an extremely expensive function. It ends up consuming half
the CPU time when decoding the images in PDFs such as this one:
https://www.ets.org/Media/Tests/GRE/pdf/gre_research_validity_data.pdf
SSE can be used to optimize this:
    __m128 cmyk = {c * 255, m * 255, y * 255, k * 255};
    uint32_t output[4];
    _mm_storeu_si128((__m128i*)output, _mm_cvtps_epi32(cmyk));
but is cryptic, only works for x86/x64, and gives basically identical
performance to this solution - int(c * 255 + 0.5f);
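For reference, the SSE fragment above can be fleshed out into a self-contained function. This is only a sketch under assumptions: the name CmykToBytes, the SKETCH_HAS_SSE2 macro, and the scalar fallback for non-x86 targets are hypothetical and not part of this change. It uses _mm_setr_ps rather than brace initialization so that it does not depend on compiler-specific __m128 initializer support.

```cpp
#include <cmath>
#include <cstdint>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define SKETCH_HAS_SSE2 1  // hypothetical macro, used only in this sketch
#endif

// Hypothetical helper: scales four components in [0, 1] to [0, 255] and
// rounds each to an integer using the current rounding mode
// (round-to-nearest-even by default).
inline void CmykToBytes(float c, float m, float y, float k, int32_t out[4]) {
#ifdef SKETCH_HAS_SSE2
  // _mm_setr_ps places c in the lowest lane, so out[0] corresponds to c.
  __m128 cmyk = _mm_mul_ps(_mm_setr_ps(c, m, y, k), _mm_set1_ps(255.0f));
  _mm_storeu_si128(reinterpret_cast<__m128i*>(out), _mm_cvtps_epi32(cmyk));
#else
  // Portable fallback (an assumption of this sketch): lrintf also honors
  // the current rounding mode.
  const float in[4] = {c, m, y, k};
  for (int i = 0; i < 4; ++i) {
    out[i] = static_cast<int32_t>(std::lrintf(in[i] * 255.0f));
  }
#endif
}
```

The need for a platform guard and a fallback path illustrates the portability cost mentioned above.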
The rounding behavior is not identical but in practice this rarely
matters, and in this specific case it does not matter because the edge
cases that vary are not hit.
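To make the rounding difference concrete, here is a small sketch comparing the add-0.5-and-truncate approach with two standard-library rounding calls. The helper names are hypothetical, and the sketch does not claim to reproduce what FXSYS_round does internally. The results differ for exact .5 ties (relative to round-to-nearest-even) and for negative inputs, neither of which arises when scaling non-tie values in [0, 1] by 255.

```cpp
#include <cmath>

// The fast form from this change. Truncation toward zero makes it correct
// only for non-negative inputs that fit in an int.
inline int FastRound(float x) {
  return static_cast<int>(x + 0.5f);
}

// Rounds using the current rounding mode (ties to even by default),
// which matches the behavior of _mm_cvtps_epi32.
inline int NearestEvenRound(float x) {
  return static_cast<int>(std::lrintf(x));
}

// Rounds half away from zero, regardless of the rounding mode.
inline int HalfAwayRound(float x) {
  return static_cast<int>(std::lroundf(x));
}
```

For example, FastRound(2.5f) yields 3 while NearestEvenRound(2.5f) yields 2 under the default rounding mode, and FastRound(-0.6f) yields 0 while HalfAwayRound(-0.6f) yields -1.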
The three divisions at the end were changed to multiplies because
profiling showed they were a significant cost.
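The general technique is to replace division by a constant with multiplication by its precomputed reciprocal. The sketch below assumes the divisor is the constant 255; the commit message does not say which divisions were changed, so the struct and function names here are hypothetical. The two forms can differ in the last bit of the result, which is why this is a manual change rather than something compilers do by default for floating point.

```cpp
// Hypothetical container for three scaled channels.
struct Rgb {
  float r, g, b;
};

// Baseline: three floating-point divisions.
inline Rgb ScaleByDivision(float r, float g, float b) {
  return {r / 255.0f, g / 255.0f, b / 255.0f};
}

// Faster variant: one reciprocal computed at compile time, then three
// multiplications. Results may differ from the division form by one ulp.
inline Rgb ScaleByMultiply(float r, float g, float b) {
  constexpr float kInv255 = 1.0f / 255.0f;
  return {r * kInv255, g * kInv255, b * kInv255};
}
```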
This change reduces the image-decode stalls in the PDF listed above by
about 40%, making for a noticeably better experience. Further
optimizations are possible but would require significantly more time and
testing.
BUG=617365
Review-Url: https://codereview.chromium.org/2096723003