summaryrefslogtreecommitdiff
path: root/core/fpdfapi/parser
AgeCommit message (Collapse)Author
2018-08-23Fold CPDF_Document::LoadDocInternal() into caller.Tom Sepez
It is only called in one place. It also has a superfluous test and return as it is currently written introduced in https://pdfium-review.googlesource.com/c/pdfium/+/35490 Change-Id: Iba1aaac6e93c261f71729f39e51741f19c5dbb57 Reviewed-on: https://pdfium-review.googlesource.com/41071 Reviewed-by: Henrique Nakashima <hnakashima@chromium.org> Commit-Queue: Tom Sepez <tsepez@chromium.org>
2018-08-22Properly handle language markers in decoded textRyan Harrison
In text like document title 0x001B is used as a marker for the beginning/end of a language metadata section. Currently PDFium does nothing with this data, but when returning the 'decoded' text it needs to be stripped out. The existing code assumed that the two bytes following a marker would be the data to be removed and did nothing to track if it was in/out of one of these regions. This led to a situation where it would always strip the two bytes following the region, since it assumed the end marker was the beginning of a new region. This CL corrects the detection and handling of these regions, and adds a regression test for the reported bug. BUG=pdfium:182 Change-Id: I92ddba5666274a8986fed03f502a0331f150f7ac Reviewed-on: https://pdfium-review.googlesource.com/41070 Reviewed-by: Henrique Nakashima <hnakashima@chromium.org> Commit-Queue: Ryan Harrison <rharrison@chromium.org>
2018-08-20Use UnownedPtr<> in CPDF_ObjectWalker.Tom Sepez
Change-Id: I99eed369f4d44f92607a0a58ba24e8b62ee348f7 Reviewed-on: https://pdfium-review.googlesource.com/40671 Reviewed-by: Lei Zhang <thestig@chromium.org> Commit-Queue: Tom Sepez <tsepez@chromium.org>
2018-08-16Remove more optional args in core/Tom Sepez
Change-Id: I6a2bd03e00ad4e3d57f6931c0c6cf4ae0c760afb Reviewed-on: https://pdfium-review.googlesource.com/40290 Reviewed-by: Lei Zhang <thestig@chromium.org> Commit-Queue: Tom Sepez <tsepez@chromium.org>
2018-08-14Remove unreachable vertical text code in CPDF_Document.Lei Zhang
Change-Id: I64b34da202a0f1c30a38cba2f24490aad4063828 Reviewed-on: https://pdfium-review.googlesource.com/40150 Reviewed-by: Tom Sepez <tsepez@chromium.org> Commit-Queue: Lei Zhang <thestig@chromium.org>
2018-08-14Remove unused form/args of AddWindowsFont()Tom Sepez
Change-Id: I38b508b5518568ff134b70e0e494e5267571c1ca Reviewed-on: https://pdfium-review.googlesource.com/40110 Reviewed-by: Lei Zhang <thestig@chromium.org> Commit-Queue: Tom Sepez <tsepez@chromium.org>
2018-08-14Remove CFX_BufferSeekableReadStream.Lei Zhang
Replace it with CFX_ReadOnlyMemoryStream, which does the same thing. Take some checks from CFX_BufferSeekableReadStream and add them CFX_ReadOnlyMemoryStream. Change-Id: I25554c3aec3ec96967f8df16ca68a64dba121b6f Reviewed-on: https://pdfium-review.googlesource.com/40070 Reviewed-by: Tom Sepez <tsepez@chromium.org> Commit-Queue: Lei Zhang <thestig@chromium.org>
2018-08-14Remove more default arg = nullptr cases.Tom Sepez
Bring in line with standards. Remove argument entirely for mac code that is always nullptr. Change-Id: I0710bdbd51fc0bc2e1d428ef44976be39a631147 Reviewed-on: https://pdfium-review.googlesource.com/40091 Reviewed-by: Lei Zhang <thestig@chromium.org> Commit-Queue: Tom Sepez <tsepez@chromium.org>
2018-08-14Remove |bTakeOver| parameter from CFX_MemoryStream ctor.Lei Zhang
It is always true now. BUG=pdfium:263 Change-Id: I74fd0091f5815701718e8cd5acc6e7a0de772a85 Reviewed-on: https://pdfium-review.googlesource.com/40031 Reviewed-by: Tom Sepez <tsepez@chromium.org> Commit-Queue: Lei Zhang <thestig@chromium.org>
2018-08-13Make CFX_ReadOnlyMemoryStream take a span.Lei Zhang
Change-Id: Id097320ab2d9b5d1579582e5797e29c701499501 Reviewed-on: https://pdfium-review.googlesource.com/39991 Commit-Queue: Lei Zhang <thestig@chromium.org> Reviewed-by: Tom Sepez <tsepez@chromium.org>
2018-08-13Change CFDF_Document::ParseMemory() to use pdfium::span.Lei Zhang
Change-Id: I1e9b02f0cb2628d41bc1c6bdcfcfa09c36faf97e Reviewed-on: https://pdfium-review.googlesource.com/39990 Reviewed-by: Tom Sepez <tsepez@chromium.org> Commit-Queue: Lei Zhang <thestig@chromium.org>
2018-08-13Use CFX_ReadOnlyMemoryStream in more places.Lei Zhang
More const pointers, less const_casts. BUG=pdfium:263 Change-Id: I47fc6d8f2f837390e40ad22d8b67946065294eaa Reviewed-on: https://pdfium-review.googlesource.com/39879 Reviewed-by: Tom Sepez <tsepez@chromium.org> Commit-Queue: Lei Zhang <thestig@chromium.org>
2018-08-10Kill some optional parameters that are always supplied.Tom Sepez
No need to even bring any .cpp files in line with these headers. Change-Id: I934169d77ae09adc11f02e5ea92b1f8b078c9477 Reviewed-on: https://pdfium-review.googlesource.com/39876 Commit-Queue: Tom Sepez <tsepez@chromium.org> Reviewed-by: Lei Zhang <thestig@chromium.org>
2018-08-10Remove unused form of PDF_EncodeText().Tom Sepez
Perform indexing via WideString::operator[] which will at least assert bounds, if not enforce them. Change-Id: I7b03cfd718c4acd642af910c2ca06f86d178e5fd Reviewed-on: https://pdfium-review.googlesource.com/39873 Reviewed-by: Lei Zhang <thestig@chromium.org> Commit-Queue: Tom Sepez <tsepez@chromium.org>
2018-08-10Remove direct calls to timechromium/3519Ryan Harrison
Replaces them with calles to the proxy function, FXSYS_time, so that tests may use a stable time value instead of the wall clock value. BUG=pdfium:1104 Change-Id: I4743c4634f56d4a6cba1f1130c4562a35cee1887 Reviewed-on: https://pdfium-review.googlesource.com/39853 Reviewed-by: Lei Zhang <thestig@chromium.org> Commit-Queue: Ryan Harrison <rharrison@chromium.org>
2018-08-09Revert "Rework of CPDF_Parser::GetLastObjNum."Lei Zhang
This reverts commit b07deb3fc1f54bd700a66df573bfcbc4bcc1d787. Reason for revert: Causing https://crbug.com/870467 Original change's description: > Rework of CPDF_Parser::GetLastObjNum. > > Change-Id: I0481774858a9d9823580e1207807e35be8a9eea9 > Reviewed-on: https://pdfium-review.googlesource.com/36270 > Reviewed-by: Lei Zhang <thestig@chromium.org> > Commit-Queue: Art Snake <art-snake@yandex-team.ru> TBR=thestig@chromium.org,art-snake@yandex-team.ru # Not skipping CQ checks because original CL landed > 1 day ago. Change-Id: I58e23c752a582c21be8ba45e3538e63c0fa64504 Reviewed-on: https://pdfium-review.googlesource.com/39810 Reviewed-by: Lei Zhang <thestig@chromium.org> Commit-Queue: Lei Zhang <thestig@chromium.org>
2018-08-08Move ByteString::FromUnicode() to WideString::ToDefANSI()Tom Sepez
Turns out that "FromUnicode" is misleading in that, on linux, it simply removes any characters beyond 0xFF and passes the rest unchanged, so no unicode decoding actually takes place. On Windows, it passes it into the system function specifying FX_CODEPAGE_DefANSI, converting it into the so-called "default ANSI code plane", passing some characters, converting others to '?' and still others to 'A'. Either way, nothing resembling UTF8 comes out of this, so pick a better name. These now immediately look suspicious, so a follow-up CL will see which ones should really be WideString::UTF8Encode() instead. Making this a normal method on a widestring rather than a static method on a bytestring feels more natural; this is parallel to the UTF8Encode and UTF16LE_Encode functions. Add a test that shows these conversions. Change-Id: Ia7551b47199eba61b5c328a97bfe9176ac8e583c Reviewed-on: https://pdfium-review.googlesource.com/39690 Reviewed-by: Lei Zhang <thestig@chromium.org> Commit-Queue: Tom Sepez <tsepez@chromium.org>
2018-08-06Avoid invalid object numbers in CPDF_Parser::LoadCrossRefV5().chromium/3515Lei Zhang
BUG=chromium:865272 Change-Id: I4606bdfd78ebd6553c36b985b4f49d07b579ac40 Reviewed-on: https://pdfium-review.googlesource.com/39438 Commit-Queue: Lei Zhang <thestig@chromium.org> Reviewed-by: Art Snake <art-snake@yandex-team.ru>
2018-08-06Check for null object type in CPDF_Parser::LoadCrossRefV5().Lei Zhang
BUG=chromium:871042 Change-Id: Id4566b29270ab738c69d46cb96fc134485d6ee2f Reviewed-on: https://pdfium-review.googlesource.com/39510 Reviewed-by: Art Snake <art-snake@yandex-team.ru> Commit-Queue: Lei Zhang <thestig@chromium.org>
2018-08-06Do more CPDF_Parser::LoadCrossRefV5() cleanup.Lei Zhang
- Use range for-loop to avoid needing "i" and "j". - Avoid repeatedly calculating "startnum + j". - Reduce levels of nested ifs. - Remove variable that is only used once. Change-Id: I9d08cef1082812fcfaa2699f65720165c52ebcff Reviewed-on: https://pdfium-review.googlesource.com/39437 Reviewed-by: Art Snake <art-snake@yandex-team.ru> Commit-Queue: Lei Zhang <thestig@chromium.org>
2018-08-04Clarify integer types in CPDF_Parser::LoadCrossRefV5().Lei Zhang
GetVarInt() returns uint32_t. So assign the results to variables of type uint32_t. Then make sure those results get passed on as uint32_t, or use pdfium::base::IsValueInRangeForNumericType<T>() to make sure they can be converted to type T safely. Change-Id: I4556f0b89b4e5cdb99ab530119c8051ec8a9411d Reviewed-on: https://pdfium-review.googlesource.com/39436 Reviewed-by: Art Snake <art-snake@yandex-team.ru> Commit-Queue: Lei Zhang <thestig@chromium.org>
2018-08-02Rework of CPDF_DataAvail::CheckHintTables.Artem Strygin
Move HintTables parsing logic into CPDF_HintTables. Change-Id: I9748179fe9fc3ac44f88c19c347e30c0e7e3ac67 Reviewed-on: https://pdfium-review.googlesource.com/38771 Reviewed-by: Lei Zhang <thestig@chromium.org> Commit-Queue: Art Snake <art-snake@yandex-team.ru>
2018-08-02Remove some checks in IsLinearizedHeaderValid().Lei Zhang
One check can never fail. The other check can be done earlier, before creating the CPDF_LinearizedHeader. Change-Id: I0bccb2a9e19e0d5517daf96684adba6bb3a203bf Reviewed-on: https://pdfium-review.googlesource.com/39412 Reviewed-by: Art Snake <art-snake@yandex-team.ru> Commit-Queue: Lei Zhang <thestig@chromium.org>
2018-08-02Rework of CPDF_Parser::GetLastObjNum.Artem Strygin
Change-Id: I0481774858a9d9823580e1207807e35be8a9eea9 Reviewed-on: https://pdfium-review.googlesource.com/36270 Reviewed-by: Lei Zhang <thestig@chromium.org> Commit-Queue: Art Snake <art-snake@yandex-team.ru>
2018-07-30Check maximum bit count of shared group object numbers.Artem Strygin
Bug: chromium:868477 Change-Id: I5957c5ef051bc4fa8eb51efa6a7fc142996742c5 Reviewed-on: https://pdfium-review.googlesource.com/39130 Commit-Queue: Art Snake <art-snake@yandex-team.ru> Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
2018-07-27Make pdfium_embeddertests pass on Windows 10.Lei Zhang
BUG=chromium:828177 NOTRY=true Change-Id: I30123087bbe11aaaa6175b5f729b7ab55107a975 Reviewed-on: https://pdfium-review.googlesource.com/38902 Commit-Queue: Lei Zhang <thestig@chromium.org> Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
2018-07-27Parse obj nums range within Hint tables for shared groups.Artem Strygin
Change-Id: Ib22db6c57d2066ef70c0ef12e44d1e5eee6611a5 Reviewed-on: https://pdfium-review.googlesource.com/36410 Commit-Queue: Art Snake <art-snake@yandex-team.ru> Reviewed-by: Lei Zhang <thestig@chromium.org>
2018-07-25Change GetHeaderOffset() to return Optional<FX_FILESIZE>.Lei Zhang
Remove |kInvalidHeaderOffset|. Change-Id: I5978e745e97aa4e13299dd21028721725ac0c996 Reviewed-on: https://pdfium-review.googlesource.com/38853 Commit-Queue: Lei Zhang <thestig@chromium.org> Reviewed-by: Art Snake <art-snake@yandex-team.ru>
2018-07-25Remove CFX_MemoryStream uses in tests.Lei Zhang
Replace with CFX_BufferSeekableReadStream, which allows for spans and const inputs. Change CXFA_DocumentParser to take IFX_SeekableReadStream instead of IFX_SeekableStream in the process. Change-Id: I0168451350c9fc250231f0414c38738a4d86ca42 Reviewed-on: https://pdfium-review.googlesource.com/38852 Commit-Queue: Lei Zhang <thestig@chromium.org> Reviewed-by: Ryan Harrison <rharrison@chromium.org>
2018-07-25Change CFX_BufferSeekableReadStream to take a span.Lei Zhang
Change-Id: Ib9e20fdfc637b2ba0358586e23ad72454b0b8ad1 Reviewed-on: https://pdfium-review.googlesource.com/38851 Commit-Queue: Lei Zhang <thestig@chromium.org> Reviewed-by: Tom Sepez <tsepez@chromium.org>
2018-07-25Move CPDF_SyntaxParser init methods into ctor.Lei Zhang
- CPDF_SyntaxParser can no longer be initialized multiple times. - Make the file length and header offset const. - Make the header offset type FX_FILESIZE consistently. - Simplify for the common case where the header offset is 0. Change-Id: I7138db1fbcec3b7578b0239b92fc1154fa4dc4ce Reviewed-on: https://pdfium-review.googlesource.com/38850 Reviewed-by: Art Snake <art-snake@yandex-team.ru> Commit-Queue: Lei Zhang <thestig@chromium.org>
2018-07-25Fix hint tables parsing.Artem Strygin
Sample PDF: https://yadi.sk/d/oWLtAEfy3YbEb3 For offsets, equal to the hint stream offset, added hint stream length to determine the actual offset, because linearization inserted the hint stream at the original location of the object. Also the number of bits needed to represent the numerator of the fractional position for each shared object reference may be zero, if each shared group contains only one object with obj num, incremented on 1. Change-Id: I4754d603f388354821e8d0cac97ad99a7578fe4b Reviewed-on: https://pdfium-review.googlesource.com/36610 Commit-Queue: Art Snake <art-snake@yandex-team.ru> Reviewed-by: Lei Zhang <thestig@chromium.org>
2018-07-25Use document size instead of file size while parsing.Artem Strygin
We should use document size instead of File size, because all offsets and sizes was read from document should take into account of header offset. Added some tests of parsing of documents with header offset. Also drop friendship of CPDF_SyntaxParser with CPDF_Parser. Change-Id: Iebec75ab2ee07fb644a6c653b4ef5c2e09af09fe Reviewed-on: https://pdfium-review.googlesource.com/35830 Commit-Queue: Art Snake <art-snake@yandex-team.ru> Reviewed-by: Lei Zhang <thestig@chromium.org>
2018-07-24Fix encryption dictionary owning.Artem Strygin
Return encryption dictionary as const reference from CPDF_Parser. Create a copy in CPDF_Creator if needed. Change-Id: I270f71d307d818fba7f65ebe379f5942ae816934 Reviewed-on: https://pdfium-review.googlesource.com/38390 Reviewed-by: Lei Zhang <thestig@chromium.org> Commit-Queue: Art Snake <art-snake@yandex-team.ru>
2018-07-23Rework of CPDF_Object writing.Artem Strygin
Move writing logic into implementation of related clases. Change-Id: If70dc418b352b562ee681ea34fa6595d6f52eee3 Reviewed-on: https://pdfium-review.googlesource.com/36350 Commit-Queue: Art Snake <art-snake@yandex-team.ru> Reviewed-by: Tom Sepez <tsepez@chromium.org>
2018-07-23Add support of rebuilding crossrefs with compressed objects.Artem Strygin
Change-Id: I0743c34f0206f85828570430edb9f62b6b0cdbb5 Reviewed-on: https://pdfium-review.googlesource.com/37315 Reviewed-by: Lei Zhang <thestig@chromium.org> Commit-Queue: Art Snake <art-snake@yandex-team.ru>
2018-07-20Rework of CPDF_Parser::RebuildCrossRef.chromium/3498Artem Strygin
Use CPDF_SyntaxParser logic to rebuild crossref. Change-Id: I394f64e76294b97c6a7c2b8984a880712fd193a7 Reviewed-on: https://pdfium-review.googlesource.com/37314 Reviewed-by: Lei Zhang <thestig@chromium.org> Commit-Queue: Art Snake <art-snake@yandex-team.ru>
2018-07-18Add pdfium::span::as_bytes() and as_writable_bytes().Tom Sepez
Picks up some enhancements from base/span.h. In turn, also adds the size_bytes() helper. Differs from base version in that it works around C++14 enable_if_t<>, and avoids the dynamic_extent template specialization tricks. Use it in a few places where appropriate. Change-Id: I86f72cf0023f2d4317a7afa351fddee601c8f86c Reviewed-on: https://pdfium-review.googlesource.com/38251 Reviewed-by: Daniel Cheng <dcheng@chromium.org> Commit-Queue: Tom Sepez <tsepez@chromium.org>
2018-07-18Do not add invalid objects to the cross reference table.chromium/3496Lei Zhang
BUG=chromium:851994 Change-Id: I2e14401271c70afa204221e0f3d469f0b82ce8cf Reviewed-on: https://pdfium-review.googlesource.com/37871 Commit-Queue: Lei Zhang <thestig@chromium.org> Reviewed-by: Art Snake <art-snake@yandex-team.ru>
2018-07-18Avoid writing const/non-const versions of the same function.Lei Zhang
Use const_cast for the non-const version to call the const version. Change-Id: Ibdf5fe53255ee6e983555080336f5d63e683afd1 Reviewed-on: https://pdfium-review.googlesource.com/37490 Reviewed-by: Henrique Nakashima <hnakashima@chromium.org> Commit-Queue: Lei Zhang <thestig@chromium.org>
2018-07-18Use CPDF_CrossRefTable within CPDF_ParserArtem Strygin
Change-Id: I354e8bed12606abdc67427bbc7928e3b1f11e243 Reviewed-on: https://pdfium-review.googlesource.com/35433 Commit-Queue: Art Snake <art-snake@yandex-team.ru> Reviewed-by: Lei Zhang <thestig@chromium.org>
2018-07-18Make CPDF_Parser::GetTrailer const method.Artem Strygin
Use own copy of encryption dictionary within CPDF_Parser, to prevent modification of original trailer. Change-Id: I6246b872d431b94411fcec694c5176f8d85dfe26 Reviewed-on: https://pdfium-review.googlesource.com/35450 Commit-Queue: Art Snake <art-snake@yandex-team.ru> Reviewed-by: Lei Zhang <thestig@chromium.org>
2018-07-16Fix some nits in CPDF_Document.Lei Zhang
Change-Id: I57f89b9f2a8ef3f351e7574a76d6064ffde150d3 Reviewed-on: https://pdfium-review.googlesource.com/37870 Reviewed-by: Henrique Nakashima <hnakashima@chromium.org> Commit-Queue: Lei Zhang <thestig@chromium.org>
2018-07-16Remove unused member from CPDF_DataAvail.Tom Sepez
Change-Id: I3686bd3d28a84aae39c750a371902e1e5d62b365 Reviewed-on: https://pdfium-review.googlesource.com/37050 Reviewed-by: dsinclair <dsinclair@chromium.org> Commit-Queue: Tom Sepez <tsepez@chromium.org>
2018-07-16Get rid of some loose allocs/free in CPDF_Document.chromium/3494Tom Sepez
Use std::vector<> as a manager for contiguous buffers. Change-Id: Icaacbd4b7010b928237aa71485411ade7539412a Reviewed-on: https://pdfium-review.googlesource.com/37012 Reviewed-by: dsinclair <dsinclair@chromium.org> Commit-Queue: Tom Sepez <tsepez@chromium.org>
2018-07-12Remove CPDF_HintTables::GetItemLength()Artem Strygin
Commit {Insert later} removed the last caller to this method. Change-Id: I1689b33486396cc3a41139f984f819b39ab02b2a Reviewed-on: https://pdfium-review.googlesource.com/35130 Commit-Queue: Art Snake <art-snake@yandex-team.ru> Reviewed-by: dsinclair <dsinclair@chromium.org>
2018-07-12Implement CPDF_HintsTable::SharedObjGroupInfo.Artem Strygin
Merge shared objects related data into CPDF_HintsTable::SharedObjGroupInfo. Change-Id: I53bb7fc42ea6bcd26b3ebf91b8c6aa402108d086 Reviewed-on: https://pdfium-review.googlesource.com/15830 Reviewed-by: Lei Zhang <thestig@chromium.org> Commit-Queue: Art Snake <art-snake@yandex-team.ru>
2018-07-12Reland "Avoid duplicate data buffering in CPDF_SyntaxParser::ReadStream()."Artem Strygin
This is a reland of 77f15f7883638a4ced131d74c053af10a5970ce9 Original change's description: > Avoid duplicate data buffering in CPDF_SyntaxParser::ReadStream(). > > Allow sub-streams created from an IFX_SeekableReadStream to provide > stream data without copying memory. > The data will only reside in the top-level stream. > > For example: > For file > http://www.major-landrover.ru/upload/attachments/f/9/f96aab07dab04ae89c8a509ec1ef2b31.pdf > (18 Mb) > > The memory usage is reduced by ~13 Mb. > > Change-Id: I2595c014d0fbe1fdd181cc04965cfd7d901c2d88 > Reviewed-on: https://pdfium-review.googlesource.com/35930 > Commit-Queue: Art Snake <art-snake@yandex-team.ru> > Reviewed-by: dsinclair <dsinclair@chromium.org> Change-Id: I4c4d5dcf42ff44784468ac7a7c302df509fc804d Reviewed-on: https://pdfium-review.googlesource.com/37313 Reviewed-by: Henrique Nakashima <hnakashima@chromium.org> Commit-Queue: Art Snake <art-snake@yandex-team.ru>
2018-07-11Fix crash and memory leak.Artem Strygin
Do not return size within CPDF_StreamAcc in case when read data failed. Also free buffers in this case. Bug: chromium:860210 Change-Id: Ifb2a061d7c8427409b68c33f213c5c55343fb946 Reviewed-on: https://pdfium-review.googlesource.com/37310 Reviewed-by: Henrique Nakashima <hnakashima@chromium.org> Commit-Queue: Art Snake <art-snake@yandex-team.ru>
2018-07-11Do not store cross ref v5 obj within document.Artem Strygin
Currently, not necessary to store the cross ref v5 obj within CPDF_IndirectObjectHolder(CPDF_Document), because all necessary data from the cross ref are parsed or cloned, and owned by CPDF_Parser seperately from CPDF_IndirectObjectHolder. Also this fix regression from commit 4ea4459e. BUG=chromium:810768 Change-Id: I0d1a11ff027210f4f15804a69d12416838ec9815 Reviewed-on: https://pdfium-review.googlesource.com/37110 Reviewed-by: Lei Zhang <thestig@chromium.org> Commit-Queue: Art Snake <art-snake@yandex-team.ru>