path: root/xfa/fxfa/fm2js/cxfa_fmlexer.cpp
Age | Commit message | Author
2018-08-10 | Remove const args and const_casts where not required. | Tom Sepez
Introduce const/non-const versions of method where required. Part of the war on const_cast<>. Tidy one expression to use [] instead of .data(). Change-Id: I41e45669c79eee242ff2244c7dc3afcf6386a433 Reviewed-on: https://pdfium-review.googlesource.com/39852 Reviewed-by: Lei Zhang <thestig@chromium.org> Commit-Queue: Tom Sepez <tsepez@chromium.org>
2018-06-14 | [formcalc] Calculate length of string when calling FXSYS_wcstof | Dan Sinclair
When calling the FXSYS_wcstof method we currently pass in -1 from AdvanceForNumber. This tells the method to calculate the string length. This can be slow for a formcalc string with a lot of numbers. This CL changes the call to pass in the length of remaining data in the original string. This takes the MSAN runtime of the case in the linked bug from ~21 seconds to ~500ms. The debug runtime goes from ~2s to ~500ms. Bug: chromium:846104 Change-Id: Idbd19a728160f35982e21c0d97567fbbeefe667a Reviewed-on: https://pdfium-review.googlesource.com/35210 Reviewed-by: Tom Sepez <tsepez@chromium.org> Commit-Queue: dsinclair <dsinclair@chromium.org>
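The cost described in this commit can be illustrated with a small sketch. It does not use PDFium's actual scanner: `ScanDigits` and `MeasureWork` are hypothetical stand-ins, and the only behaviour taken from the commit message is that a length of -1 makes the callee measure the string itself. The `measured` counter records how many characters are walked purely to find the end of the input.

```cpp
#include <cstddef>
#include <cwchar>

// Hypothetical stand-in for a length-taking number scanner (not PDFium code).
// A negative length means "measure the string first".
std::size_t ScanDigits(const wchar_t* str, long len, std::size_t* measured) {
  if (len < 0) {
    len = static_cast<long>(std::wcslen(str));
    *measured += static_cast<std::size_t>(len);  // cost of measuring
  }
  std::size_t used = 0;
  while (used < static_cast<std::size_t>(len) && str[used] >= L'0' &&
         str[used] <= L'9') {
    ++used;
  }
  return used;
}

// Walks the whole input and returns the total number of characters that
// were traversed only to compute lengths.
std::size_t MeasureWork(const wchar_t* input, bool pass_length) {
  const wchar_t* cursor = input;
  const wchar_t* end = input + std::wcslen(input);
  std::size_t measured = 0;
  while (cursor < end) {
    const long len = pass_length ? static_cast<long>(end - cursor) : -1;
    const std::size_t used = ScanDigits(cursor, len, &measured);
    cursor += used > 0 ? used : 1;  // skip one non-digit separator
  }
  return measured;
}
```

In this sketch, passing -1 re-measures the remaining input on every call, so the measuring work grows roughly quadratically with input size, while passing the remaining length avoids it entirely.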
2018-05-14 | Use internal wcstof instead of system wcstod in formcalc lexer | Dan Sinclair
This CL switches the usage of wcstod to use FXSYS_wcstof to determine if a given string is a valid floating point number. Using the internal method makes Linux slightly slower (tens of ms) but makes Mac a lot faster (900ms to 60ms for the test case in the bug). The FXSYS_wcstof method has been extended to handle the parsing of float exponents. Unittests were added for FXSYS_wcstof. Bug: chromium:813646 Change-Id: Ie68287a336e3b95a0c0b845d5bf39db6fc82b39c Reviewed-on: https://pdfium-review.googlesource.com/32510 Reviewed-by: Ryan Harrison <rharrison@chromium.org> Commit-Queue: dsinclair <dsinclair@chromium.org>
2018-05-08 | [fm2js] Fail transpiling if lexer has left over data | Dan Sinclair
If there is remaining data after the lexer has said it's complete then something has gone wrong while lexing the formcalc data. This CL changes the transpiler to return an error in the case of the lexer having extra data. Bug: chromium:834575 Change-Id: I8a1288a7f01cc69faf2033829d68246d815258de Reviewed-on: https://pdfium-review.googlesource.com/32130 Commit-Queue: dsinclair <dsinclair@chromium.org> Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
2018-04-11 | Make cxfa_fmlexer.cpp resilient to null strings | Tom Sepez
As currently written, the calculation of m_end will underflow when passed a {nullptr, 0} pair as input, and m_end becomes essentially unbounded. Change-Id: Id3249b201c446555d9aa4fa04e6a3c94a357cd99 Reviewed-on: https://pdfium-review.googlesource.com/30230 Reviewed-by: dsinclair <dsinclair@chromium.org> Commit-Queue: Tom Sepez <tsepez@chromium.org>
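The commit message does not show the arithmetic that underflowed. As a rough sketch, assuming the end pointer was derived from the data pointer and length with an offset (for instance an inclusive end such as `data + length - 1`, which wraps around for a null pointer and zero length), a guarded construction might look like the following. `InputRange`, `MakeInputRange`, and `IsExhausted` are hypothetical names, not PDFium's.

```cpp
#include <cstddef>

// Hypothetical half-open range [cursor, end) over the lexer input.
struct InputRange {
  const wchar_t* cursor;
  const wchar_t* end;
};

// Builds the range without doing pointer arithmetic on a null or empty input.
InputRange MakeInputRange(const wchar_t* data, std::size_t length) {
  if (data == nullptr || length == 0)
    return {nullptr, nullptr};  // empty range: cursor == end
  return {data, data + length};
}

// True when there is nothing left to lex.
bool IsExhausted(const InputRange& range) {
  return range.cursor == range.end;
}
```

With a half-open range the empty case needs no subtraction at all, so a `{nullptr, 0}` input simply yields a range that is exhausted from the start.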
2018-03-12 | Remove all usages of FXSYS_iswASCIIalpha | Ryan Harrison
Instances are either replaced with FXSYS_iswalpha, which calls out to the ICU library to do the proper Unicode operations, or have been converted to an isascii && isalpha pair, if ASCII alpha is actually what was wanted. BUG=pdfium:1035 Change-Id: I971ff639ee1ff818ad08793a1900a8bcbb0a3e04 Reviewed-on: https://pdfium-review.googlesource.com/28450 Reviewed-by: dsinclair <dsinclair@chromium.org> Commit-Queue: Ryan Harrison <rharrison@chromium.org>
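The "isascii && isalpha pair" can be sketched with the standard library alone. `IsAsciiAlpha` below is a hypothetical helper written for illustration and is not taken from PDFium; it shows why the range check has to come first when the input is a wide character.

```cpp
#include <cctype>

// Hypothetical helper: true only for A-Z and a-z, whatever the wide value.
bool IsAsciiAlpha(wchar_t c) {
  const unsigned long code = static_cast<unsigned long>(c);
  // Range check first, so std::isalpha only ever sees a 7-bit value.
  return code < 0x80 && std::isalpha(static_cast<int>(code)) != 0;
}
```

Calling `std::isalpha` directly on an arbitrary wide value would be undefined for values outside the `unsigned char` range, which is one reason to keep the two checks together.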
2018-03-09 | Explicitly mark helper methods that only operate on ASCII ranges | Ryan Harrison
A number of our character helper methods take in wide character types, but only do tests/operations on the ASCII range of characters. As a very quick first pass I am renaming all of the foot-gun methods to explicitly call out this behaviour, while I do a bigger cleanup/refactor. BUG=pdfium:1035 Change-Id: Ia035dfa1cb6812fa6d45155c4565475032c4c165 Reviewed-on: https://pdfium-review.googlesource.com/28330 Commit-Queue: Ryan Harrison <rharrison@chromium.org> Commit-Queue: dsinclair <dsinclair@chromium.org> Reviewed-by: Henrique Nakashima <hnakashima@chromium.org> Reviewed-by: dsinclair <dsinclair@chromium.org>
2018-02-26 | Add some more missing consts. [chromium/3356] | Tom Sepez
Get things out of the .data section. Change-Id: I375cf00186a3d5d8d10f5d147bd4b692f5db3683 Reviewed-on: https://pdfium-review.googlesource.com/27130 Commit-Queue: Tom Sepez <tsepez@chromium.org> Reviewed-by: dsinclair <dsinclair@chromium.org>
2018-02-20 | [formcalc] Remove unused line parameter | Dan Sinclair
The recorded line number from the formcalc parse is never used. This CL removes the parameter and removes the need to pass it through all of the constructors. Change-Id: Ice716cc4880dd17dc05bffcdce1dc1e4745108ea Reviewed-on: https://pdfium-review.googlesource.com/27412 Reviewed-by: Lei Zhang <thestig@chromium.org> Commit-Queue: dsinclair <dsinclair@chromium.org>
2018-02-19 | Simplify CXFA_FMToken creation | dan sinclair
This CL converts the CXFA_FMToken usages into an object instead of a pointer. A copy constructor has been added. The line number was removed from the token and is retrieved from the lexer where needed. Change-Id: I94c632653e9bf1439d2ddf374a816ae0d10b5b67 Reviewed-on: https://pdfium-review.googlesource.com/27192 Commit-Queue: Ryan Harrison <rharrison@chromium.org> Reviewed-by: Ryan Harrison <rharrison@chromium.org>
2018-02-19 | Simplify formcalc token list | dan sinclair
The keyword list in the formcalc lexer is only used to match identifiers. We don't need to store the non-identifier tokens in the list, so they're removed. The hash is removed and the list is compared by string instead. The token names have been moved to DEBUG so they won't be included in Release builds. Change-Id: Ieec00e9944960e559079083a605e3249c4128841 Reviewed-on: https://pdfium-review.googlesource.com/27190 Commit-Queue: Ryan Harrison <rharrison@chromium.org> Reviewed-by: Ryan Harrison <rharrison@chromium.org>
2017-09-27 | Remove FX_STRSIZE and replace with size_t | Ryan Harrison
BUG=pdfium:828 Change-Id: I5c40237433ebabaeabdb43aec9cdf783e41dfe16 Reviewed-on: https://pdfium-review.googlesource.com/13230 Reviewed-by: dsinclair <dsinclair@chromium.org> Commit-Queue: Ryan Harrison <rharrison@chromium.org>
2017-09-18 | Convert string class names | Ryan Harrison
Automated using git grep & sed. Replace StringC classes with StringView classes. Remove the CFX_ prefix and put string classes in fxcrt namespace. Change AsStringC() to AsStringView(). Rename tests from TEST(fxcrt, *String*Foo) to TEST(*String*, Foo). A couple of tests needed to have their names regularized. BUG=pdfium:894 Change-Id: I7ca038685c8d803795f3ed02545124f7a224c83d Reviewed-on: https://pdfium-review.googlesource.com/14151 Reviewed-by: Tom Sepez <tsepez@chromium.org> Commit-Queue: Ryan Harrison <rharrison@chromium.org>
2017-08-31 | Clean up of typing in lexer code | Ryan Harrison
BUG=pdfium:813 Change-Id: I4c638857bf114327dbc0344cc6d231b897f0d001 Reviewed-on: https://pdfium-review.googlesource.com/11971 Commit-Queue: Ryan Harrison <rharrison@chromium.org> Reviewed-by: dsinclair <dsinclair@chromium.org> Reviewed-by: Tom Sepez <tsepez@chromium.org>
2017-07-27 | Remove 0 from the valid FormCalc characters | Ryan Harrison
0 is not actually in the spec for characters and was only included due to quirks in how the lexing code was written. Due to changes in the lexer, it is no longer needed and is a potential source of errors. Bug: Change-Id: I6fbba027e40aaa286ed4c89a6c569d26e3d4cd8b Reviewed-on: https://pdfium-review.googlesource.com/9350 Reviewed-by: (OOO Jul 28 - Aug 8) dsinclair <dsinclair@chromium.org> Reviewed-by: Tom Sepez <tsepez@chromium.org> Commit-Queue: Ryan Harrison <rharrison@chromium.org>
2017-07-27 | Refactor whitespace lexing to be explicit | Ryan Harrison
Removed 'case 0xf00:' to either be the character being checked or moved to an IsWhitespaceCharacter() check. BUG=pdfium:815 Change-Id: I34727a00f6d54ecf8de2f9e4eb785b3c10b6c521 Reviewed-on: https://pdfium-review.googlesource.com/9310 Reviewed-by: (OOO Jul 28 - Aug 8) dsinclair <dsinclair@chromium.org> Reviewed-by: Tom Sepez <tsepez@chromium.org> Commit-Queue: Ryan Harrison <rharrison@chromium.org>
2017-07-27 | Rewrite FMLexer to use nullptr for errors | Ryan Harrison
This CL rewrites how FMLexer returns errors, instead of having a flag that gets flipped and needs to be checked, it now returns nullptr for NextToken() when an error occurs. The Lexer's behaviour has also been changed to only return nullptr once an error has occurred, instead of advancing the lexing on further calls. FMParse now checks the returned value from the lexer instead of testing the error flag on the parser object. For any operation that might cause the error state of the parser to change, i.e. consuming a token, an error check has been added. In the event this check fails the related function returns nullptr. This will cause the parse to short circuit and exit. BUG=pdfium:814 Change-Id: I669012c4732c18d13009be7cd7bf1ae682950904 Reviewed-on: https://pdfium-review.googlesource.com/8950 Commit-Queue: Ryan Harrison <rharrison@chromium.org> Reviewed-by: (OOO Jul 28 - Aug 8) dsinclair <dsinclair@chromium.org> Reviewed-by: Tom Sepez <tsepez@chromium.org>
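The error convention described in this commit (return nullptr on error, then keep returning nullptr without advancing) can be sketched with a toy lexer. Everything below is hypothetical and written for illustration: `SketchLexer`, `Token`, and `TokenKind` are not PDFium types, and the toy grammar accepts only digits and '+'.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

enum class TokenKind { kNumber, kPlus, kEof };

struct Token {
  TokenKind kind;
  std::wstring text;
};

// Hypothetical toy lexer: digits and '+' are valid, anything else is an error.
class SketchLexer {
 public:
  explicit SketchLexer(std::wstring input) : input_(std::move(input)) {}

  // Returns the next token, or nullptr once an error has been seen.
  std::unique_ptr<Token> NextToken() {
    if (failed_)
      return nullptr;  // sticky: do not advance after an error
    if (pos_ >= input_.size())
      return MakeToken(TokenKind::kEof, L"");
    const wchar_t c = input_[pos_];
    if (c == L'+') {
      ++pos_;
      return MakeToken(TokenKind::kPlus, L"+");
    }
    if (c >= L'0' && c <= L'9') {
      const std::size_t start = pos_;
      while (pos_ < input_.size() && input_[pos_] >= L'0' &&
             input_[pos_] <= L'9') {
        ++pos_;
      }
      return MakeToken(TokenKind::kNumber, input_.substr(start, pos_ - start));
    }
    failed_ = true;  // unknown character: enter the error state
    return nullptr;
  }

 private:
  static std::unique_ptr<Token> MakeToken(TokenKind kind, std::wstring text) {
    return std::unique_ptr<Token>(new Token{kind, std::move(text)});
  }

  std::wstring input_;
  std::size_t pos_ = 0;
  bool failed_ = false;
};
```

A caller in this style checks each returned pointer and bails out on the first nullptr, which matches the short-circuiting the commit describes for the parser.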
2017-07-27 | Add conversion from FMTokens to strings | Ryan Harrison
This is to aid with debugging, since getting things like the type is a pain due to it being an enum. Bug: Change-Id: I89bae7103b476d7fd09ba78699367a1a413ee700 Reviewed-on: https://pdfium-review.googlesource.com/9190 Reviewed-by: dsinclair <dsinclair@chromium.org> Commit-Queue: Ryan Harrison <rharrison@chromium.org>
2017-07-27 | Removed unused helper function | Ryan Harrison
XFA_FM_KeywordToString is no longer used, and there are no plans to use it in the future, so should be removed. Bug: Change-Id: I44652a40f6396b25262f840c42036fb00a14aca1 Reviewed-on: https://pdfium-review.googlesource.com/9210 Commit-Queue: Ryan Harrison <rharrison@chromium.org> Reviewed-by: dsinclair <dsinclair@chromium.org>
2017-07-25 | Clean up data passing in FormCalc Lexer [chromium/3167] | Ryan Harrison
This CL removes the pattern used in the lexer of passing the lexing member variables around as args to methods. Instead it uses the fact that they are member variables in the methods. This CL also includes renaming of variable and function names to remove unneeded details or make them more precise. BUG=pdfium:814 Change-Id: Id4c592338db9ff462835314252d39ab3b4b2b2ab Reviewed-on: https://pdfium-review.googlesource.com/8850 Commit-Queue: Ryan Harrison <rharrison@chromium.org> Reviewed-by: dsinclair <dsinclair@chromium.org>
2017-07-24 | Rename FMLexer methods to be more descriptive | Ryan Harrison
The existing methods had either very bare passive or in some cases misleading names, so this CL changes them to active names that describe what they do. This also extracts the IsKeyword method into a helper function since it is actually static. BUG=pdfium:816 Change-Id: I47a113bc9ea8d88a77822a4441266ec56e6b4cbc Reviewed-on: https://pdfium-review.googlesource.com/8730 Commit-Queue: Ryan Harrison <rharrison@chromium.org> Reviewed-by: Tom Sepez <tsepez@chromium.org> Reviewed-by: dsinclair <dsinclair@chromium.org>
2017-07-24 | Remove extracting characters from input | Ryan Harrison
Throughout the lexer there is use of a pattern where a character is extracted from the location currently being pointed at in the input and tests are done on this value. This is unneeded, since normally the pointer isn't being modified while the character is stored, so the pointer can be dereferenced directly. This CL also cleans up the implementation of string extraction from the input to be easier to read, since it did depend on the character being extracted to work. The IsFooCharacter methods are also changed to take in a character instead of a pointer, since they are only testing the character value not the pointer. BUG=pdfium:814 Change-Id: I8eda01e0af59ff1f6c6e97d6fb504856c7a08a24 Reviewed-on: https://pdfium-review.googlesource.com/8690 Commit-Queue: Ryan Harrison <rharrison@chromium.org> Reviewed-by: dsinclair <dsinclair@chromium.org> Reviewed-by: Tom Sepez <tsepez@chromium.org>
2017-07-19 | Rename StringCs c_str() to unterminated_c_str(). | Tom Sepez
Since there is no guarantee of termination if the StringC was extracted from a snippet of another string. Make it more obvious that things like strlen(str.unterminated_c_str()) might be a bad idea. Change-Id: I7832248ed89ebbddf5c0bcd402aac7d40ec2adc2 Reviewed-on: https://pdfium-review.googlesource.com/8170 Commit-Queue: Tom Sepez <tsepez@chromium.org> Reviewed-by: dsinclair <dsinclair@chromium.org> Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>
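The hazard behind this rename can be shown with a minimal pointer-plus-length view. `ByteView`, `ToOwnedString`, and `Prefix` are hypothetical names used only for this sketch; they are not the PDFium string classes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical minimal view: a pointer plus a length, with no terminator
// guaranteed at ptr[length].
struct ByteView {
  const char* ptr;
  std::size_t length;
};

// Safe: copies exactly `length` characters and never looks for a '\0'.
std::string ToOwnedString(const ByteView& view) {
  return std::string(view.ptr, view.length);
}

// A view over the first `count` characters of a longer buffer.
ByteView Prefix(const char* buffer, std::size_t count) {
  return ByteView{buffer, count};
}
```

For a view such as `Prefix("hello world", 5)`, running `strlen` on the raw pointer would count past the view's five characters up to the buffer's own terminator, and for a buffer without one it would read out of bounds.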
2017-07-18 | Correct lexer handling of FormCalc identifiers | Ryan Harrison
This makes the lexer stricter on valid characters for identifiers, and conform to the grammar in the FormCalc spec. This should remove a class of inputs that ClusterFuzz is attempting that are breaking later stages of the transpile. BUG: chromium:736234, pdfium:783, pdfium:784 Change-Id: I3987d6778a82b71d768fa751035993c0af2577ee Reviewed-on: https://pdfium-review.googlesource.com/8010 Commit-Queue: Ryan Harrison <rharrison@chromium.org> Reviewed-by: Tom Sepez <tsepez@chromium.org>
2017-07-04 | Removed hand rolled bsearch from IsKeyword method | Ryan Harrison
BUG=pdfium:786 Change-Id: I7a852566cdde301ee896c12d9d29043047c31ad4 Reviewed-on: https://pdfium-review.googlesource.com/7211 Commit-Queue: Ryan Harrison <rharrison@chromium.org> Reviewed-by: dsinclair <dsinclair@chromium.org>
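This commit gives no detail beyond its title. One common way to drop a hand-written binary search, sketched here under the assumption of a sorted keyword table, is `std::lower_bound`. The table below is an illustrative subset, and `kKeywords` and `IsKeywordSketch` are hypothetical names rather than the ones used in cxfa_fmlexer.cpp.

```cpp
#include <algorithm>
#include <cwchar>
#include <iterator>

// Illustrative subset only; must stay sorted for the binary search to work.
const wchar_t* const kKeywords[] = {L"and", L"do", L"else", L"end",
                                    L"for", L"if", L"then", L"while"};

// Returns true if `word` exactly matches an entry in kKeywords.
bool IsKeywordSketch(const wchar_t* word) {
  const auto first = std::begin(kKeywords);
  const auto last = std::end(kKeywords);
  // lower_bound finds the first entry that is not ordered before `word`.
  const auto it = std::lower_bound(
      first, last, word, [](const wchar_t* entry, const wchar_t* key) {
        return std::wcscmp(entry, key) < 0;
      });
  return it != last && std::wcscmp(*it, word) == 0;
}
```

The final equality check is needed because `std::lower_bound` only reports where the word would be inserted, not whether it is present.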
2017-05-18 | Use UnownedPtr to check CFX_*StringC lifetimes [chromium/3104] | Tom Sepez
Change interform to avoid temp StringC with dangling ptr. Change-Id: I8d8659973bcdf2cdbcaa6efa6012e4acce5f1604 Reviewed-on: https://pdfium-review.googlesource.com/5571 Commit-Queue: Tom Sepez <tsepez@chromium.org> Reviewed-by: Lei Zhang <thestig@chromium.org>
2017-05-18 | Remove CXFA_FMErrorInfo | dan sinclair
This CL removes the CXFA_FMErrorInfo class. The message was never output, just used as a flag to determine if there was an error. The class has been replaced with a boolean. Change-Id: I1cde99ce6957f5f8c6be0755a198d80ec8378b3a Reviewed-on: https://pdfium-review.googlesource.com/5653 Reviewed-by: Nicolás Peña <npm@chromium.org> Commit-Queue: dsinclair <dsinclair@chromium.org>
2017-05-17 | Rename formcalc files to better match contents | Dan Sinclair
Most files match the contents. The expression files are named to match their base type even though they contain all the expression subclasses. Change-Id: I3b7705c7b206a9fa1afae8b677f765e8b788e84d Reviewed-on: https://pdfium-review.googlesource.com/5492 Commit-Queue: dsinclair <dsinclair@chromium.org> Reviewed-by: Nicolás Peña <npm@chromium.org> Reviewed-by: Tom Sepez <tsepez@chromium.org>