Diffstat (limited to 'ext/ply/CHANGES')
-rw-r--r-- ext/ply/CHANGES 579
1 files changed, 579 insertions, 0 deletions
diff --git a/ext/ply/CHANGES b/ext/ply/CHANGES
index 9c7334066..d88f3e5d6 100644
--- a/ext/ply/CHANGES
+++ b/ext/ply/CHANGES
@@ -1,3 +1,582 @@
+Version 2.3
+-----------------------------
+02/20/07: beazley
+ Fixed a bug with character literals if the literal '.' appeared as the
+ last symbol of a grammar rule. Reported by Ales Smrcka.
+
+02/19/07: beazley
+ Warning messages are now redirected to stderr instead of being printed
+ to standard output.
+
+02/19/07: beazley
+ Added a warning message to lex.py if it detects a literal backslash
+ character inside the t_ignore declaration. This is to help catch
+ problems that might occur if someone accidentally defines t_ignore
+ as a Python raw string. For example:
+
+ t_ignore = r' \t'
+
+ The idea for this is from an email I received from David Cimimi who
+ reported bizarre behavior in lexing as a result of defining t_ignore
+ as a raw string by accident.
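To see why the raw string is a problem, the short Python 3 sketch below
compares the two spellings. The helper has_literal_backslash() is a
hypothetical name used only for illustration; it is not the check that
lex.py itself performs.

```python
normal = ' \t'    # a space followed by a real tab character
raw = r' \t'      # a space, a backslash, and the letter 't'

def has_literal_backslash(ignore_chars):
    """Return True if the string contains a backslash character."""
    return '\\' in ignore_chars

print(len(normal), has_literal_backslash(normal))
print(len(raw), has_literal_backslash(raw))
```

With the raw string, a lexer would skip backslashes and the letter 't'
rather than tabs, which is the kind of bizarre behavior described above.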
+
+02/18/07: beazley
+ Performance improvements. Made some changes to the internal
+ table organization and LR parser to improve parsing performance.
+
+02/18/07: beazley
+ Automatic tracking of line number and position information must now be
+ enabled by a special flag to parse(). For example:
+
+ yacc.parse(data,tracking=True)
+
+ In many applications, it's just not that important to have the
+ parser automatically track all line numbers. By making this an
+ optional feature, it allows the parser to run significantly faster
+ (more than a 20% speed increase in many cases). Note: positional
+ information is always available for raw tokens---this change only
+ applies to positional information associated with nonterminal
+ grammar symbols.
+ *** POTENTIAL INCOMPATIBILITY ***
+
+02/18/07: beazley
+ Yacc no longer supports extended slices of grammar productions.
+ However, it does support regular slices. For example:
+
+ def p_foo(p):
+     '''foo: a b c d e'''
+     p[0] = p[1:3]
+
+ This change is a performance improvement to the parser--it streamlines
+ normal access to the grammar values since slices are now handled in
+ a __getslice__() method as opposed to __getitem__().
+
+02/12/07: beazley
+ Fixed a bug in the handling of token names when combined with
+ start conditions. Bug reported by Todd O'Bryan.
+
+Version 2.2
+------------------------------
+11/01/06: beazley
+ Added lexpos() and lexspan() methods to grammar symbols. These
+ mirror the same functionality of lineno() and linespan(). For
+ example:
+
+ def p_expr(p):
+     'expr : expr PLUS expr'
+     p.lexpos(1)               # Lexing position of left-hand expression
+     p.lexpos(2)               # Lexing position of PLUS
+     start,end = p.lexspan(3)  # Lexing range of right-hand expression
+
+11/01/06: beazley
+ Minor change to error handling. The recommended way to skip characters
+ in the input is to use t.lexer.skip() as shown here:
+
+ def t_error(t):
+     print "Illegal character '%s'" % t.value[0]
+     t.lexer.skip(1)
+
+ The old approach of just using t.skip(1) will still work, but won't
+ be documented.
+
+10/31/06: beazley
+ Discarded tokens can now be specified as simple strings instead of
+ functions. To do this, simply include the text "ignore_" in the
+ token declaration. For example:
+
+ t_ignore_cppcomment = r'//.*'
+
+ Previously, this had to be done with a function. For example:
+
+ def t_ignore_cppcomment(t):
+     r'//.*'
+     pass
+
+ If start conditions/states are being used, state names should appear
+ before the "ignore_" text.
+
+10/19/06: beazley
+ The Lex module now provides support for flex-style start conditions
+ as described at http://www.gnu.org/software/flex/manual/html_chapter/flex_11.html.
+ Please refer to this document to understand this change note. Refer to
+ the PLY documentation for PLY-specific explanation of how this works.
+
+ To use start conditions, you first need to declare a set of states in
+ your lexer file:
+
+ states = (
+     ('foo','exclusive'),
+     ('bar','inclusive')
+ )
+
+ This serves the same role as the %s and %x specifiers in flex.
+
+ Once a state has been declared, tokens for that state can be
+ declared by defining rules of the form t_state_TOK. For example:
+
+ t_PLUS = '\+' # Rule defined in INITIAL state
+ t_foo_NUM = '\d+' # Rule defined in foo state
+ t_bar_NUM = '\d+' # Rule defined in bar state
+
+ t_foo_bar_NUM = '\d+' # Rule defined in both foo and bar
+ t_ANY_NUM = '\d+' # Rule defined in all states
+
+ In addition to defining tokens for each state, the t_ignore and t_error
+ specifications can be customized for specific states. For example:
+
+ t_foo_ignore = " " # Ignored characters for foo state
+ def t_bar_error(t):
+     # Handle errors in bar state
+     pass
+
+ Within token rules, the following methods can be used to change states:
+
+ def t_TOKNAME(t):
+     t.lexer.begin('foo')        # Begin state 'foo'
+     t.lexer.push_state('foo')   # Begin state 'foo', push old state
+                                 # onto a stack
+     t.lexer.pop_state()         # Restore previous state
+     t.lexer.current_state()     # Returns name of current state
+
+ These methods mirror the BEGIN(), yy_push_state(), yy_pop_state(), and
+ yy_top_state() functions in flex.
+
+ Start states can be used as one way to write sub-lexers.
+ For example, the lexer or parser might instruct the lexer to start
+ generating a different set of tokens depending on the context.
+
+ example/yply/ylex.py shows the use of start states to grab C/C++
+ code fragments out of traditional yacc specification files.
+
+ *** NEW FEATURE *** Suggested by Daniel Larraz with whom I also
+ discussed various aspects of the design.
+
+10/19/06: beazley
+ Minor change to the way in which yacc.py was reporting shift/reduce
+ conflicts. Although the underlying LALR(1) algorithm was correct,
+ PLY was under-reporting the number of conflicts compared to yacc/bison
+ when precedence rules were in effect. This change should make PLY
+ report the same number of conflicts as yacc.
+
+10/19/06: beazley
+ Modified yacc so that grammar rules could also include the '-'
+ character. For example:
+
+ def p_expr_list(p):
+     'expression-list : expression-list expression'
+
+ Suggested by Oldrich Jedlicka.
+
+10/18/06: beazley
+ Attribute lexer.lexmatch added so that token rules can access the re
+ match object that was generated. For example:
+
+ def t_FOO(t):
+     r'some regex'
+     m = t.lexer.lexmatch
+     # Do something with m
+
+
+ This may be useful if you want to access named groups specified within
+ the regex for a specific token. Suggested by Oldrich Jedlicka.
+
+10/16/06: beazley
+ Changed the error message that results if an illegal character
+ is encountered and no default error function is defined in lex.
+ The exception is now more informative about the actual cause of
+ the error.
+
+Version 2.1
+------------------------------
+10/02/06: beazley
+ The last Lexer object built by lex() can be found in lex.lexer.
+ The last Parser object built by yacc() can be found in yacc.parser.
+
+10/02/06: beazley
+ New example added: examples/yply
+
+ This example uses PLY to convert Unix-yacc specification files to
+ PLY programs with the same grammar. This may be useful if you
+ want to convert a grammar from bison/yacc to use with PLY.
+
+10/02/06: beazley
+ Added support for a start symbol to be specified in the yacc
+ input file itself. Just do this:
+
+ start = 'name'
+
+ where 'name' matches some grammar rule. For example:
+
+ def p_name(p):
+     'name : A B C'
+     ...
+
+ This mirrors the functionality of the yacc %start specifier.
+
+09/30/06: beazley
+ Some new examples added:
+
+ examples/GardenSnake : A simple indentation based language similar
+                        to Python. Shows how you might handle
+                        whitespace. Contributed by Andrew Dalke.
+
+ examples/BASIC       : An implementation of 1964 Dartmouth BASIC.
+                        Contributed by Dave against his better
+                        judgement.
+
+09/28/06: beazley
+ Minor patch to allow named groups to be used in lex regular
+ expression rules. For example:
+
+ t_QSTRING = r'''(?P<quote>['"]).*?(?P=quote)'''
+
+ Patch submitted by Adam Ring.
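As a rough illustration of what the named group in this rule does, the
Python 3 sketch below uses the same pattern directly with the re module,
outside of lex. The helper first_quoted() is a hypothetical name, not
part of PLY.

```python
import re

# Same pattern as the t_QSTRING rule above: the closing quote must be
# the same character that the named group 'quote' captured.
QSTRING = r'''(?P<quote>['"]).*?(?P=quote)'''
pattern = re.compile(QSTRING)

def first_quoted(text):
    """Return the first quoted string found in text, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None

print(first_quoted('say "hi" now'))
print(first_quoted("x 'a\"b' y"))
```

The backreference (?P=quote) is what lets a double quote appear inside
a single-quoted string without ending it.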
+
+09/28/06: beazley
+ LALR(1) is now the default parsing method. To use SLR, use
+ yacc.yacc(method="SLR"). Note: there is no performance impact
+ on parsing when using LALR(1) instead of SLR. However, constructing
+ the parsing tables will take a little longer.
+
+09/26/06: beazley
+ Change to line number tracking. To modify line numbers, modify
+ the line number of the lexer itself. For example:
+
+ def t_NEWLINE(t):
+     r'\n'
+     t.lexer.lineno += 1
+
+ This modification is both cleanup and a performance optimization.
+ In past versions, lex was monitoring every token for changes in
+ the line number. This extra processing is unnecessary for a vast
+ majority of tokens. Thus, this new approach cleans it up a bit.
+
+ *** POTENTIAL INCOMPATIBILITY ***
+ You will need to change code in your lexer that updates the line
+ number. For example, "t.lineno += 1" becomes "t.lexer.lineno += 1"
+
+09/26/06: beazley
+ Added the lexing position to tokens as an attribute lexpos. This
+ is the raw index into the input text at which a token appears.
+ This information can be used to compute column numbers and other
+ details (e.g., scan backwards from lexpos to the first newline
+ to get a column position).
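The backwards scan described above can be sketched in a few lines of
Python 3. The function name find_column() and the choice of 1-based
columns are assumptions made for this example; only the lexpos index
itself comes from lex.

```python
def find_column(text, lexpos):
    """Return the 1-based column of the character at index lexpos."""
    # rfind() returns -1 when there is no earlier newline, so adding 1
    # gives index 0 for the first line without a special case.
    line_start = text.rfind('\n', 0, lexpos) + 1
    return lexpos - line_start + 1

text = "a = 1\nbb = 22\n"
print(find_column(text, text.index("22")))
```

Inside a token rule, the text would be the original input string and
lexpos would be the token's lexpos attribute.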
+
+09/25/06: beazley
+ Changed the name of the __copy__() method on the Lexer class
+ to clone(). This is used to clone a Lexer object (e.g., if
+ you're running different lexers at the same time).
+
+09/21/06: beazley
+ Limitations related to the use of the re module have been eliminated.
+ Several users reported problems with regular expressions exceeding
+ 100 named groups. To solve this, lex.py is now capable
+ of automatically splitting its master regular expression into
+ smaller expressions as needed. This should, in theory, make it
+ possible to specify an arbitrarily large number of tokens.
+
+09/21/06: beazley
+ Improved error checking in lex.py. Rules that match the empty string
+ are now rejected (otherwise they cause the lexer to enter an infinite
+ loop). An extra check for rules containing '#' has also been added.
+ Since lex compiles regular expressions in verbose mode, '#' is interpreted
+ as the start of a regex comment, so it is critical to use '\#' instead.
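The effect of an unescaped '#' can be checked with the re module alone.
The Python 3 sketch below compiles two patterns with re.VERBOSE, the
mode mentioned above; the variable names are chosen just for this
example.

```python
import re

# In verbose mode everything from an unescaped '#' to the end of the
# line is a comment, so this pattern is effectively empty.
unescaped = re.compile(r'#\w+', re.VERBOSE)

# Escaping the '#' makes it match a literal '#' character.
escaped = re.compile(r'\#\w+', re.VERBOSE)

print(repr(unescaped.match('#tag').group(0)))
print(repr(escaped.match('#tag').group(0)))
```

The unescaped pattern matches the empty string, which is exactly the
kind of rule the new empty-match check is meant to reject.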
+
+09/18/06: beazley
+ Added a @TOKEN decorator function to lex.py that can be used to
+ define token rules where the documentation string might be computed
+ in some way.
+
+ digit = r'([0-9])'
+ nondigit = r'([_A-Za-z])'
+ identifier = r'(' + nondigit + r'(' + digit + r'|' + nondigit + r')*)'
+
+ from ply.lex import TOKEN
+
+ @TOKEN(identifier)
+ def t_ID(t):
+     # Do whatever
+     pass
+
+ The @TOKEN decorator merely sets the documentation string of the
+ associated token function as needed for lex to work.
+
+ Note: An alternative solution is the following:
+
+ def t_ID(t):
+     # Do whatever
+     pass
+
+ t_ID.__doc__ = identifier
+
+ Note: Decorators require the use of Python 2.4 or later. If compatibility
+ with old versions is needed, use the latter solution.
+
+ The need for this feature was suggested by Cem Karan.
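Since the decorator only sets the documentation string, its behavior can
be approximated in a few lines. The Python 3 sketch below defines a
stand-in TOKEN function for illustration; it is an assumption about the
general shape of such a decorator, not the source of ply.lex.TOKEN.

```python
import re

def TOKEN(pattern):
    """Return a decorator that stores pattern as the function's docstring."""
    def set_doc(func):
        func.__doc__ = pattern
        return func
    return set_doc

digit = r'([0-9])'
nondigit = r'([_A-Za-z])'
identifier = r'(' + nondigit + r'(' + digit + r'|' + nondigit + r')*)'

@TOKEN(identifier)
def t_ID(t):
    return t

print(t_ID.__doc__ == identifier)
print(bool(re.fullmatch(t_ID.__doc__, 'x_1')))
```

A lexer that reads rules from docstrings would then see the computed
identifier pattern on t_ID, just as if it had been written inline.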
+
+09/14/06: beazley
+ Support for single-character literal tokens has been added to yacc.
+ These literals must be enclosed in quotes. For example:
+
+ def p_expr_plus(p):
+     "expr : expr '+' expr"
+     ...
+
+ def p_expr_minus(p):
+     'expr : expr "-" expr'
+     ...
+
+ In addition to this, it is necessary to tell the lexer module about
+ literal characters. This is done by defining the variable 'literals'
+ as a list of characters. This should be defined in the module that
+ invokes the lex.lex() function. For example:
+
+ literals = ['+','-','*','/','(',')','=']
+
+ or simply
+
+ literals = '+-*/()='
+
+ It is important to note that literals can only be a single character.
+ When the lexer fails to match a token using its normal regular expression
+ rules, it will check the current character against the literal list.
+ If found, it will be returned with a token type set to match the literal
+ character. Otherwise, an illegal character will be signalled.
+
+
+09/14/06: beazley
+ Modified PLY to install itself as a proper Python package called 'ply'.
+ This will make it a little more friendly to other modules. This
+ changes the usage of PLY only slightly. Just do this to import the
+ modules:
+
+ import ply.lex as lex
+ import ply.yacc as yacc
+
+ Alternatively, you can do this:
+
+ from ply import *
+
+ This imports both the lex and yacc modules.
+ Change suggested by Lee June.
+
+09/13/06: beazley
+ Changed the handling of negative indices when used in production rules.
+ A negative production index now accesses already parsed symbols on the
+ parsing stack. For example,
+
+ def p_foo(p):
+     "foo: A B C D"
+     print p[1]       # Value of 'A' symbol
+     print p[2]       # Value of 'B' symbol
+     print p[-1]      # Value of whatever symbol appears before A
+                      # on the parsing stack.
+
+     p[0] = some_val  # Sets the value of the 'foo' grammar symbol
+
+ This behavior makes it easier to work with embedded actions within the
+ parsing rules. For example, in C-yacc, it is possible to write code like
+ this:
+
+ bar: A { printf("seen an A = %d\n", $1); } B { do_stuff; }
+
+ In this example, the printf() code executes immediately after A has been
+ parsed. Within the embedded action code, $1 refers to the A symbol on
+ the stack.
+
+ To perform this equivalent action in PLY, you need to write a pair
+ of rules like this:
+
+ def p_bar(p):
+     "bar : A seen_A B"
+     do_stuff
+
+ def p_seen_A(p):
+     "seen_A :"
+     print "seen an A =", p[-1]
+
+ The second rule "seen_A" is merely an empty production which should be
+ reduced as soon as A is parsed in the "bar" rule above. The negative
+ index p[-1] is used to access whatever symbol appeared
+ before the seen_A symbol.
+
+ This feature also makes it possible to support inherited attributes.
+ For example:
+
+ def p_decl(p):
+     "decl : scope name"
+
+ def p_scope(p):
+     """scope : GLOBAL
+              | LOCAL"""
+     p[0] = p[1]
+
+ def p_name(p):
+     "name : ID"
+     if p[-1] == "GLOBAL":
+         # ...
+         pass
+     elif p[-1] == "LOCAL":
+         #...
+         pass
+
+ In this case, the name rule is inheriting an attribute from the
+ scope declaration that precedes it.
+
+ *** POTENTIAL INCOMPATIBILITY ***
+ If you are currently using negative indices within existing grammar rules,
+ your code will break. This should be extremely rare, if not non-existent,
+ in most cases. The argument to grammar rules is not usually
+ processed in the same way as a list of items.
+
+Version 2.0
+------------------------------
+09/07/06: beazley
+ Major cleanup and refactoring of the LR table generation code. Both SLR
+ and LALR(1) table generation is now performed by the same code base with
+ only minor extensions for extra LALR(1) processing.
+
+09/07/06: beazley
+ Completely reimplemented the entire LALR(1) parsing engine to use the
+ DeRemer and Pennello algorithm for calculating lookahead sets. This
+ significantly improves the performance of generating LALR(1) tables
+ and has the added feature of actually working correctly! If you
+ experienced weird behavior with LALR(1) in prior releases, this should
+ hopefully resolve all of those problems. Many thanks to
+ Andrew Waters and Markus Schoepflin for submitting bug reports
+ and helping me test out the revised LALR(1) support.
+
+Version 1.8
+------------------------------
+08/02/06: beazley
+ Fixed a problem related to the handling of default actions in LALR(1)
+ parsing. If you experienced subtle and/or bizarre behavior when trying
+ to use the LALR(1) engine, this may correct those problems. Patch
+ contributed by Russ Cox. Note: This patch has been superseded by
+ revisions for LALR(1) parsing in Ply-2.0.
+
+08/02/06: beazley
+ Added support for slicing of productions in yacc.
+ Patch contributed by Patrick Mezard.
+
+Version 1.7
+------------------------------
+03/02/06: beazley
+ Fixed infinite recursion problem in the ReduceToTerminals() function that
+ would sometimes come up in LALR(1) table generation. Reported by
+ Markus Schoepflin.
+
+03/01/06: beazley
+ Added "reflags" argument to lex(). For example:
+
+ lex.lex(reflags=re.UNICODE)
+
+ This can be used to specify optional flags to the re.compile() function
+ used inside the lexer. This may be necessary for special situations such
+ as processing Unicode (e.g., if you want escapes like \w and \b to consult
+ the Unicode character property database). The need for this was suggested by
+ Andreas Jung.
+
+03/01/06: beazley
+ Fixed a bug with an uninitialized variable on repeated instantiations of parser
+ objects when the write_tables=0 argument was used. Reported by Michael Brown.
+
+03/01/06: beazley
+ Modified lex.py to accept Unicode strings both as the regular expressions for
+ tokens and as input. Hopefully this is the only change needed for Unicode support.
+ Patch contributed by Johan Dahl.
+
+03/01/06: beazley
+ Modified the class-based interface to work with new-style or old-style classes.
+ Patch contributed by Michael Brown (although I tweaked it slightly so it would work
+ with older versions of Python).
+
+Version 1.6
+------------------------------
+05/27/05: beazley
+ Incorporated patch contributed by Christopher Stawarz to fix an extremely
+ devious bug in LALR(1) parser generation. This patch should fix problems
+ numerous people reported with LALR parsing.
+
+05/27/05: beazley
+ Fixed problem with lex.py copy constructor. Reported by Dave Aitel, Aaron Lav,
+ and Thad Austin.
+
+05/27/05: beazley
+ Added outputdir option to yacc() to control output directory. Contributed
+ by Christopher Stawarz.
+
+05/27/05: beazley
+ Added rununit.py test script to run tests using the Python unittest module.
+ Contributed by Miki Tebeka.
+
+Version 1.5
+------------------------------
+05/26/04: beazley
+ Major enhancement. LALR(1) parsing support is now working.
+ This feature was implemented by Elias Ioup (ezioup@alumni.uchicago.edu)
+ and optimized by David Beazley. To use LALR(1) parsing do
+ the following:
+
+ yacc.yacc(method="LALR")
+
+ Computing LALR(1) parsing tables takes about twice as long as
+ the default SLR method. However, LALR(1) allows you to handle
+ more complex grammars. For example, the ANSI C grammar
+ (in example/ansic) has 13 shift-reduce conflicts with SLR, but
+ only has 1 shift-reduce conflict with LALR(1).
+
+05/20/04: beazley
+ Added a __len__ method to parser production lists. Can
+ be used in parser rules like this:
+
+ def p_somerule(p):
+     """a : B C D
+          | E F"""
+     if (len(p) == 4):
+         # Must have been first rule (len(p) counts p[0])
+         pass
+     elif (len(p) == 3):
+         # Must be second rule
+         pass
+
+ Suggested by Joshua Gerth and others.
+
+Version 1.4
+------------------------------
+04/23/04: beazley
+ Incorporated a variety of patches contributed by Eric Raymond.
+ These include:
+
+ 0. Cleans up some comments so they don't wrap on an 80-column display.
+ 1. Directs compiler errors to stderr where they belong.
+ 2. Implements and documents automatic line counting when \n is ignored.
+ 3. Changes the way progress messages are dumped when debugging is on.
+ The new format is both less verbose and conveys more information than
+ the old, including shift and reduce actions.
+
+04/23/04: beazley
+ Added a Python setup.py file to simplify installation. Contributed
+ by Adam Kerrison.
+
+04/23/04: beazley
+ Added patches contributed by Adam Kerrison.
+
+ - Some output is now only shown when debugging is enabled. This
+ means that PLY will be completely silent when not in debugging mode.
+
+ - An optional parameter "write_tables" can be passed to yacc() to
+ control whether or not parsing tables are written. By default,
+ it is true, but it can be turned off if you don't want the yacc
+ table file. Note: disabling this will cause yacc() to regenerate
+ the parsing table each time.
+
+04/23/04: beazley
+ Added patches contributed by David McNab. This patch adds two
+ features:
+
+ - The parser can be supplied as a class instead of a module.
+ For an example of this, see the example/classcalc directory.
+
+ - Debugging output can be directed to a filename of the user's
+ choice. Use
+
+ yacc(debugfile="somefile.out")
+
+
Version 1.3
------------------------------
12/10/02: jmdyck