summaryrefslogtreecommitdiff
path: root/core/fpdfapi/parser/cpdf_document.cpp
AgeCommit message (Collapse)Author
2017-08-30Add truly const versions of CPDF_Document getters.Lei Zhang
Instead of only having CPDF_Dictionary* GetRoot() const, provide const CPDF_Dictionary* GetRoot() const and CPDF_Dictionary* GetRoot(). Do the same for GetInfo(). Change-Id: I6eae1208d38327fcdc7d0cd75069a01c95f4a92a Reviewed-on: https://pdfium-review.googlesource.com/11671 Commit-Queue: Lei Zhang <thestig@chromium.org> Reviewed-by: Tom Sepez <tsepez@chromium.org> Reviewed-by: dsinclair <dsinclair@chromium.org>
2017-08-29Optimize receiving page, if it have not obj num.Artem Strygin
Make all pages dictionary are not inlined. Original fix: https://codereview.chromium.org/2491583002 Change-Id: Ie3aa662182a70ef6ef1d6121c0576c171e0060dd Reviewed-on: https://pdfium-review.googlesource.com/11810 Reviewed-by: dsinclair <dsinclair@chromium.org> Commit-Queue: Art Snake <art-snake@yandex-team.ru>
2017-06-19Fixing metadata not read from linearized file.chromium/3136Henrique Nakashima
This still won't work if the info dict is not on the first page without first calling FPDFAvail_IsFormAvail or FPDFAvail_IsPageAvail, as these are the methods that trigger parsing the rest of the data. Bug: pdfium:664 Change-Id: I0b0193e415a1153dcfb8bfba0e0482da6b6ba53c Reviewed-on: https://pdfium-review.googlesource.com/6610 Commit-Queue: Henrique Nakashima <hnakashima@chromium.org> Reviewed-by: dsinclair <dsinclair@chromium.org> Reviewed-by: Nicolás Peña <npm@chromium.org>
2017-05-25Break apart the pageint.h file.Dan Sinclair
This CL separates pageint.h and the supporting cpp files into indivudal class files. Change-Id: Idcadce41976a8cd5f0d916e6a5ebbc283fd36527 Reviewed-on: https://pdfium-review.googlesource.com/5930 Commit-Queue: dsinclair <dsinclair@chromium.org> Reviewed-by: Tom Sepez <tsepez@chromium.org>
2017-04-26Remove a few more |new|s.Tom Sepez
Change-Id: I8a50ed680c1e101f855644fca8d282dd21470577 Reviewed-on: https://pdfium-review.googlesource.com/4533 Commit-Queue: Tom Sepez <tsepez@chromium.org> Reviewed-by: dsinclair <dsinclair@chromium.org>
2017-04-26Get rid of a few |new|s in CPDF_Document.Tom Sepez
The chain of destructors may attempt to use m_pDocPage after it has been set to null by the unique_ptr destructor. Verify it is still present before using it from any code that may be called from some other CPDF_ destructor. Change-Id: I007160231d73feed85d90efc687d6da993653f96 Reviewed-on: https://pdfium-review.googlesource.com/4499 Reviewed-by: Lei Zhang <thestig@chromium.org> Commit-Queue: Tom Sepez <tsepez@chromium.org>
2017-04-19Rename array names to match codepage namesDan Sinclair
Rename arrays to use code page names to make it clearer what they represent. Change-Id: Ia7d74353f6bae5fd7f030c05675664dafda03a7a Reviewed-on: https://pdfium-review.googlesource.com/4350 Reviewed-by: Nicolás Peña <npm@chromium.org> Commit-Queue: dsinclair <dsinclair@chromium.org>
2017-04-19Cleanup codepage and charset definitions.Dan Sinclair
This Cl cleans up the unused defines in fx_codepage.h. The FXFONT_CHARSET_ defines are replaced with fx_codepage defines, this moves fx_codepage into core instead of xfa only. Static asserts are added to verify the public/ charsets match the fx_codepage charsets. Change-Id: Ie2f749e093de60a9a6743128a1fb087912e4cc96 Reviewed-on: https://pdfium-review.googlesource.com/4316 Commit-Queue: dsinclair <dsinclair@chromium.org> Commit-Queue: Nicolás Peña <npm@chromium.org> Reviewed-by: Nicolás Peña <npm@chromium.org>
2017-04-04RefCount CPDF_StreamAcc all the time.Tom Sepez
Pass stream argument to constructor; it feels like a stream accessor should always be made from a stream rather than passing one in after the fact. Change-Id: Iaa46cb37677b81f0170f5d39bab76ad38ea4af44 Reviewed-on: https://pdfium-review.googlesource.com/3620 Commit-Queue: Tom Sepez <tsepez@chromium.org> Reviewed-by: Lei Zhang <thestig@chromium.org>
2017-04-04RefCount CPDF_IccProfile all the timeTom Sepez
Make the IccProfile track its stream so that it has a proper key with which to purge the docpagedata map. Change-Id: Ib05ebc1afb828f1f5e5df62a1a33a1bfdecf507d Reviewed-on: https://pdfium-review.googlesource.com/3619 Commit-Queue: Tom Sepez <tsepez@chromium.org> Reviewed-by: Lei Zhang <thestig@chromium.org>
2017-04-03Drop FXSYS_ from mem methodsDan Sinclair
This Cl drops the FXSYS_ from mem methods which are the same on all platforms. Bug: pdfium:694 Change-Id: I9d5ae905997dbaaec5aa0b2ae4c07358ed9c6236 Reviewed-on: https://pdfium-review.googlesource.com/3613 Reviewed-by: Tom Sepez <tsepez@chromium.org> Commit-Queue: dsinclair <dsinclair@chromium.org>
2017-04-03Drop FXSYS_ from string methodsDan Sinclair
This Cl drops the FXSYS_ from string methods which are the same on all platforms. Bug: pdfium:694 Change-Id: I1698aafd84f40474997549ae91ce35603377e303 Reviewed-on: https://pdfium-review.googlesource.com/3597 Commit-Queue: dsinclair <dsinclair@chromium.org> Reviewed-by: Tom Sepez <tsepez@chromium.org>
2017-04-01Refcount CPDF_Image all the time.chromium/3061chromium/3060Tom Sepez
Remove the old externally-counted CPDF_CountedImage type. Change-Id: Ia0b288586272da3f2daf7dfc153f08e62794321a Reviewed-on: https://pdfium-review.googlesource.com/3553 Reviewed-by: Lei Zhang <thestig@chromium.org> Commit-Queue: Lei Zhang <thestig@chromium.org>
2017-03-21Pop when Pages is malformed and has no kidsNicolas Pena
If the Kids array for the Pages dictionary does not exist, just treat this dictionary as the unique page in the document. BUG=chromium:702883 Change-Id: I9cb9645a53d60306ffe563f9b27cbbd37442f4ec Reviewed-on: https://pdfium-review.googlesource.com/3135 Commit-Queue: Nicolás Peña <npm@chromium.org> Commit-Queue: dsinclair <dsinclair@chromium.org> Reviewed-by: dsinclair <dsinclair@chromium.org>
2017-03-15Reset tree traversal when we think we're at the startchromium/3043Nicolas Pena
If the PDF declares it has a gazillion pages when it does not, we just start traversing again from the start. This CL fixes that. BUG=chromium:680222 Change-Id: Ie9b55abc0aaa372429b3d995a7e1e7ab58fb7965 Reviewed-on: https://pdfium-review.googlesource.com/3060 Commit-Queue: Nicolás Peña <npm@chromium.org> Reviewed-by: dsinclair <dsinclair@chromium.org>
2017-03-15Add IndexInBounds() convenience routine.Tom Sepez
Avoid writing |Type| in CollectionSize<Type>() so that index type can change without rewriting conditions. Change-Id: I40c94ca39148b379908760ba9b861114b88af7bb Reviewed-on: https://pdfium-review.googlesource.com/3056 Reviewed-by: Lei Zhang <thestig@chromium.org> Commit-Queue: Tom Sepez <tsepez@chromium.org>
2017-03-14Replace FX_CHAR and FX_WCHAR with underlying types.Dan Sinclair
Change-Id: I96e0a20d66b9184d22f64d8e4ce0dadd5a78c1e8 Reviewed-on: https://pdfium-review.googlesource.com/2967 Commit-Queue: dsinclair <dsinclair@chromium.org> Reviewed-by: Tom Sepez <tsepez@chromium.org>
2017-02-24Remove repeated flags from CPDF_Fontchromium/3023Nicolas Pena
Moved all the flags to CFX_Font. Explicitly stated which ones are valued according to the PDF spec to avoid their values being changed. Change-Id: Ib57593234a4b9b83ef1ad593d0396c64159f303f Reviewed-on: https://pdfium-review.googlesource.com/2837 Commit-Queue: Nicolás Peña <npm@chromium.org> Reviewed-by: Tom Sepez <tsepez@chromium.org>
2017-01-18Bad indexing in CPDF_Document::FindPageIndex when page tree corrupt.tsepez
Moving to std::vector from the more forgiving CFX_ArrayTemplate revealed the dubious page tree traversal, which depends on the correctness of the /Count entries to properly summarize the total descendants under a given node. The only "correct" thing to do is to throw away these counts as parsed, and re-compute them, perhaps in CountPages(). But I'm not willing to do that since it may break unknown documents in the wild. Pass out-params as pointers while we're at it. BUG=680376 Review-Url: https://codereview.chromium.org/2636403003
2017-01-09Remove CFX_ArrayTemplate from fpdfapitsepez
Review-Url: https://codereview.chromium.org/2611413002
2017-01-04Kill render_int.hchromium/2973chromium/2972Nicolas Pena
CPDF_DIBSource was already in its own file, but files needed renaming. Change-Id: Ib3ac787a0bb33d3f78ecdcdfcdbc938867857a14 Reviewed-on: https://pdfium-review.googlesource.com/2152 Commit-Queue: Nicolás Peña <npm@chromium.org> Reviewed-by: Tom Sepez <tsepez@chromium.org>
2017-01-03Force stop of page tree traversal when max level reachedNicolas Pena
The previous implementation, FindPDFPage, was already doing this since the recursive call was always with return. Currently, we were trying to keep going even after reaching max level. The problem is that if the page tree is not a tree, we might loop forever. This could also be solved by keeping track of the dictionaries that have been visited, but this solution takes much less space. BUG=672172 Change-Id: Ia37aea58e92b6068de69f26736c612aa6a0ff4b3 Reviewed-on: https://pdfium-review.googlesource.com/2138 Commit-Queue: Nicolás Peña <npm@chromium.org> Commit-Queue: dsinclair <dsinclair@chromium.org> Reviewed-by: Tom Sepez <tsepez@chromium.org> Reviewed-by: dsinclair <dsinclair@chromium.org>
2016-11-21Split fwl/core class pt II.dsinclair
Split classes in FWL to be single class per file. In the case of data providers which added no new methods, removed and used the IFWL_Widget::DataProvider directly. Review-Url: https://codereview.chromium.org/2520063002
2016-11-21Remove some WrapUnique() calls by returing unique_ptrstsepez
Return these from underlying methods as appropriate. Review-Url: https://codereview.chromium.org/2520133002
2016-11-18Make CPDF_Dictionary use unique pointers.chromium/2926tsepez
Some changes were required to match underlying ctors as invoked by the templated methods. Many release() calls go away, a few WrapUniques() are introduced to avoid going deeper into other code. Review-Url: https://codereview.chromium.org/2510223002
2016-11-17Move CPDF_DocRenderData from render_intnpm
First CL in an attempt to reduce the class cluttering in fpdf_render_text, fpdf_render_image, and fpdf_render. Review-Url: https://codereview.chromium.org/2507023006
2016-11-16Make CPDF_Object subclass constructors intern stringstsepez
Make CDPF_Arrays intern the object they create. Allow passing nullptr as a CFX_WeakPtr shortcut as well. Review-Url: https://codereview.chromium.org/2509123002
2016-11-16Move ByteStringPool from document to indirect object holder.tsepez
Since the indirect object holder is now in the object creation business, this will allow it to intern strings in a subsequent CL. Review-Url: https://codereview.chromium.org/2509773003
2016-11-16Make CPDF_Array take unique_ptrstsepez
BUG= Review-Url: https://codereview.chromium.org/2498223005
2016-11-15Make AddIndirectObject() take a unique_ptr.tsepez
Add convenience routines to create and add object in one step. Review-Url: https://codereview.chromium.org/2489283003
2016-11-14Make CPDF_PageContentGenerator methods take object numberstsepez
This patch fixes a possibility that an owned CPDF_Stream is handed to the indirect object holder inside RealizeResource(). Its arguments are changed to take an object number, as is done elsewhere in the code, to suggest that only indirect objects are acceptable. BUG=660756 Review-Url: https://codereview.chromium.org/2489423002
2016-11-09Fix receiving page, if it have not obj num.art-snake
In some PDF's the page may not have the obj num. For example: testing\corpus\fx\other\jetman_std.pdf in pdfium repository. And CPDF_Document::GetPage failed on second call for this page. Restart the traversing of pages, to fix this Also added test. Review-Url: https://codereview.chromium.org/2491583002
2016-11-07Use unique_ptr return from CPDF_Parser::ParseIndirectObject()tsepez
In turn, propgate to callers. This introduces a few release() calls that will go away as more code is converted. It also removes a couple of WrapUnique calls that are no longer needed as ownership of the object flows along. Review-Url: https://codereview.chromium.org/2479303002
2016-11-07Rename CPDF_Linearized to CPDF_LinearizedHeadertsepez
My OCD insists that classes be named after nouns, and "linearized" feels like an adjective. Remove a redundant "if" while at it. Review-Url: https://codereview.chromium.org/2482973002
2016-11-07Reland of Unify some codeart-snake
Unify some code Move parsing of linearized header into separate CPDF_Linearized class. Original review: https://codereview.chromium.org/2466023002/ Revert review: https://codereview.chromium.org/2474283005/ Revert reason was: Breaking the chrome roll. See https://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_rel_ng/builds/331856 ___ Added Fix for fuzzers. Review-Url: https://codereview.chromium.org/2477213003
2016-11-04Revert of Unify some code (patchset #14 id:260001 of ↵chromium/2912chromium/2911dsinclair
https://codereview.chromium.org/2466023002/ ) Reason for revert: Breaking the chrome roll. See https://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_rel_ng/builds/331856 Original issue's description: > Unify some code > > Move parsing of linearized header into separate CPDF_Linearized class. > > Committed: https://pdfium.googlesource.com/pdfium/+/71333dc57ac7e4cf7963c83333730b3882ab371f TBR=thestig@chromium.org,brucedawson@chromium.org,art-snake@yandex-team.ru # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true Review-Url: https://codereview.chromium.org/2474283005
2016-11-04Unify some codeart-snake
Move parsing of linearized header into separate CPDF_Linearized class. Review-Url: https://codereview.chromium.org/2466023002
2016-11-04Traverse PDF page tree only once in CPDF_Document Try 3npm
Now, we do not start traversal from where we were at, but from the top. This makes the code less prone to bugs, as now there is no need to call methods to recursively fix things. This will save a lot of time when the trees are rather flat, as in the PDF file in the bug. It can still be slow, for instance if we have a chain of page nodes, and the last in the chain contains all of the pages (this is artificial). Try 2 at https://codereview.chromium.org/2442403002/ Also added test where Try 2 would have failed. Tested the pdf from the bug on my Mac: With this CL: load in 21 seconds Without this CL: did not load in 4 minutes, got tired of waiting BUG=chromium:638513 Review-Url: https://codereview.chromium.org/2470803003
2016-11-04Reland "Remove CPDF_Object::Release() in favor of direct delete"tsepez
This reverts commit f0d5b6c35fa343108a3ab7a25bc2cc2b3cf105b3. Review-Url: https://codereview.chromium.org/2478303002
2016-11-04Revert of Remove CPDF_Object::Release() in favor of direct delete (patchset ↵dsinclair
#11 id:200001 of https://codereview.chromium.org/2384883003/ ) Reason for revert: Looks like it's blocking the roll. https://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_compile_dbg_ng/builds/186619 Original issue's description: > Remove CPDF_Object::Release() in favor of direct delete > > Follow-on once we prove Release always deletes in previous CL. > > Committed: https://pdfium.googlesource.com/pdfium/+/4de3d095c9d9e961f93750cf1ebd489fd515be12 TBR=thestig@chromium.org,tsepez@chromium.org # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true Review-Url: https://codereview.chromium.org/2478253002
2016-11-03Remove CPDF_Object::Release() in favor of direct deletetsepez
Follow-on once we prove Release always deletes in previous CL. Review-Url: https://codereview.chromium.org/2384883003
2016-11-03Move CPDF_Document insert methods from namespacenpm
Making the insert methods private allows us to use private members, as I will need on https://codereview.chromium.org/2470803003/ Review-Url: https://codereview.chromium.org/2472473005
2016-11-02Remove FX_BOOL from coretsepez
Review-Url: https://codereview.chromium.org/2477443002
2016-10-28Revert of Traverse PDF page tree only once in CPDF_Document Try 2 (patchset ↵npm
#3 id:40001 of https://codereview.chromium.org/2442403002/ ) Reason for revert: Not quite right yet. Original issue's description: > Traverse PDF page tree only once in CPDF_Document > > Try 2: main fix was recursively popping elements from the stack. Since > the Traverse method can be called on non-root nodes from GetPage(), we > have to make sure to properly update the parents. > > Try 1 at https://codereview.chromium.org/2414423002/ > > In our current implementation of CPDF_Document::GetPage, we traverse > the PDF page tree until we find the index we are looking for. This is > slow when we do calls GetPage(0), GetPage(1), ... since in this case > the page tree will be traversed n times if there are n pages. This CL > makes sure the page tree is only traversed once. > > Time to load the PDF from the bug below in chrome official build: > Before this CL: around 1 minute 25 seconds > After this CL: around 4 seconds > > BUG=chromium:638513 > > Committed: https://pdfium.googlesource.com/pdfium/+/d3a2009d75eac3cda442f545ef0865afae7b35cf TBR=tsepez@chromium.org,weili@chromium.org,thestig@chromium.org # Not skipping CQ checks because original CL landed more than 1 days ago. BUG=chromium:638513 Review-Url: https://codereview.chromium.org/2461063003
2016-10-26Traverse PDF page tree only once in CPDF_Documentnpm
Try 2: main fix was recursively popping elements from the stack. Since the Traverse method can be called on non-root nodes from GetPage(), we have to make sure to properly update the parents. Try 1 at https://codereview.chromium.org/2414423002/ In our current implementation of CPDF_Document::GetPage, we traverse the PDF page tree until we find the index we are looking for. This is slow when we do calls GetPage(0), GetPage(1), ... since in this case the page tree will be traversed n times if there are n pages. This CL makes sure the page tree is only traversed once. Time to load the PDF from the bug below in chrome official build: Before this CL: around 1 minute 25 seconds After this CL: around 4 seconds BUG=chromium:638513 Review-Url: https://codereview.chromium.org/2442403002
2016-10-21Revert of Fix loading page using hint tables. (patchset #5 id:80001 of ↵npm
https://codereview.chromium.org/2437773003/ ) Reason for revert: CPDF_DataAvail::IsPageAvail is causing crashes. BUG=chromium:658168, chromium:658170 Original issue's description: > Fix loading page using hint tables. > > When linearized document have hint table, > The FPDFAvail_IsPageAvail return true, but > FPDF_LoadPage return nullptr, for non first pages. > > This happens, bacause document not use hint tables, to load page. > > To fix this, I force save the page's ObjNum in document. > > R=npm, dsinclair > > Committed: https://pdfium.googlesource.com/pdfium/+/ef38283688c1ee7c08bcf4204cfb78e09c039782 TBR=dsinclair@chromium.org,tsepez@chromium.org,thestig@chromium.org,art-snake@yandex-team.ru # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true Review-Url: https://chromiumcodereview.appspot.com/2442663005
2016-10-20Fix loading page using hint tables.chromium/2897art-snake
When linearized document have hint table, The FPDFAvail_IsPageAvail return true, but FPDF_LoadPage return nullptr, for non first pages. This happens, bacause document not use hint tables, to load page. To fix this, I force save the page's ObjNum in document. R=npm, dsinclair Review-Url: https://chromiumcodereview.appspot.com/2437773003
2016-10-20Revert of Traverse PDF page tree only once in CPDF_Document (patchset #4 ↵dsinclair
id:60001 of https://codereview.chromium.org/2414423002/ ) Reason for revert: Possible cause of crbug.com/657897 reverting to find out. BUG=657897 Original issue's description: > Traverse PDF page tree only once in CPDF_Document > > In our current implementation of CPDF_Document::GetPage, we traverse > the PDF page tree until we find the index we are looking for. This is > slow when we do calls GetPage(0), GetPage(1), ... since in this case > the page tree will be traversed n times if there are n pages. This CL > makes sure the page tree is only traversed once. > > Time to load the PDF from the bug below in chrome official build: > Before this CL: 1 minute 40 seconds > After this CL: 5 seconds > > BUG=chromium:638513 > > Committed: https://pdfium.googlesource.com/pdfium/+/7c29e27dae139a205755c1a29b7f3ac8b36ec0da TBR=thestig@chromium.org,tsepez@chromium.org,npm@chromium.org # Not skipping CQ checks because original CL landed more than 1 days ago. BUG=chromium:638513 Review-Url: https://chromiumcodereview.appspot.com/2430313006
2016-10-18Traverse PDF page tree only once in CPDF_Documentchromium/2895npm
In our current implementation of CPDF_Document::GetPage, we traverse the PDF page tree until we find the index we are looking for. This is slow when we do calls GetPage(0), GetPage(1), ... since in this case the page tree will be traversed n times if there are n pages. This CL makes sure the page tree is only traversed once. Time to load the PDF from the bug below in chrome official build: Before this CL: 1 minute 40 seconds After this CL: 5 seconds BUG=chromium:638513 Review-Url: https://codereview.chromium.org/2414423002
2016-10-14Revert "Update CPDF_IndirectObjectHolder APIs for unique objects."Tom Sepez
This reverts commit 3ba098595ae56b64eacc0c25ab76b89a4d78d920. TBR=thestig@chromium.org,weili@chromium.org Review URL: https://codereview.chromium.org/2424533003 .