AST
The whole idea of the nsre.ast module is to help you build the RegExp’s graph using regular Python syntax.
If you are wondering what all this talk about graphs is, don’t worry too much: let’s just say that you need the AST to build a regular expression. You can learn more in the inspirational article.
Usage
The idea was to create something convenient and familiar to use. As you might have read, there are matchers on one side, which validate tokens, and AST nodes on the other side, which help you build the regular expression itself.
Let’s consider this:
from nsre import *

# "h", then "i", then "!"
hi = Final(Eq("h")) + Final(Eq("i")) + Final(Eq("!"))
re = RegExp.from_ast(hi)
assert re.match("hi!")
What you see here:
- Eq(...) is a matcher that matches a token equal to the reference passed to its constructor.
- Final(...) is a final node, i.e. a node that holds a matcher and is used for the actual matching.
- X + Y: adding two nodes together gives a concatenation, so you expect X followed by Y.
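To make the + trick less magical, here is a minimal sketch of how operator overloading can turn a chain of additions into a tree. It is a toy written for this page: the class names mirror nsre.ast and nsre.matchers, but the bodies and the flatten() helper are assumptions, not nsre’s actual code.

```python
from dataclasses import dataclass


@dataclass
class Eq:
    """Toy matcher holding the expected token (hypothetical stand-in for nsre's Eq)."""
    ref: str


class Node:
    """Toy root class: it only exists to carry the operators."""

    def __add__(self, other: "Node") -> "Node":
        # `x + y` matches nothing by itself, it just records a concatenation
        return Concatenation(self, other)


@dataclass
class Final(Node):
    statement: Eq


@dataclass
class Concatenation(Node):
    left: Node
    right: Node


def flatten(node: Node) -> list:
    """Hypothetical helper: collect each Final's reference, left to right."""
    if isinstance(node, Concatenation):
        return flatten(node.left) + flatten(node.right)
    return [node.statement.ref]


hi = Final(Eq("h")) + Final(Eq("i")) + Final(Eq("!"))

# Python evaluates `+` left to right, so the tree is nested on the left:
# Concatenation(Concatenation(Final h, Final i), Final !)
```

Nothing is matched at this point; the expression only builds a data structure that something like RegExp.from_ast() can later compile.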
Example
You can look into the nsre.lib module to see many examples of regular expressions being built. Let’s have a look at the email parsing expression.
One of the biggest advantages of this approach is that you can reuse the same AST several times to build your regular expressions. Say you already have an expression that matches a domain name: you can reuse it in an email address expression.
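from nsre import *
from nsre.shortcuts import seq

# ascii_alnums and domain_name are assumed to be defined beforehand, as in nsre.lib
email_part = ascii_alnums
email_sep = Final(In(["+", "."]))
email_user = email_part + AnyNumber(email_sep + email_part)
email = email_user + seq("@") + domain_name
re = RegExp.from_ast(email)
assert re.match('remy@with-madrid.com')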
Note that nsre.shortcuts.seq() is a shortcut that automatically creates a concatenation of Final nodes, each holding an Eq matcher.
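Here is a rough sketch of what such a shortcut could look like. It assumes toy versions of Final, Eq and Concatenation defined inline (not imported from nsre); the real nsre.shortcuts.seq() may differ in signature and details.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass
class Eq:
    """Toy matcher holding the expected token (hypothetical stand-in for nsre's Eq)."""
    ref: str


class Node:
    def __add__(self, other: "Node") -> "Node":
        return Concatenation(self, other)


@dataclass
class Final(Node):
    statement: Eq


@dataclass
class Concatenation(Node):
    left: Node
    right: Node


def seq(tokens: str) -> Node:
    """Hypothetical seq(): one Final(Eq(t)) per token, chained with `+`."""
    finals = [Final(Eq(t)) for t in tokens]
    # reduce folds left to right: ((f0 + f1) + f2) + ...
    # (an empty string raises TypeError here, since there is nothing to fold)
    return reduce(lambda acc, node: acc + node, finals)


at = seq("@")      # a single token gives back a bare Final node
com = seq(".com")  # four Finals chained into nested Concatenations
```

Folding with + rather than building Concatenation objects by hand keeps the shortcut consistent with whatever the operator does.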
Operations
Let’s review all the operations that you can do with nodes. In these examples, let’s suppose that node_a matches the letter "a", node_b the letter "b", and so forth.
Concatenation
Use the + operator to expect two nodes to be consecutive.
exp = node_a + node_b + node_c
# Would match "abc"
Alternation
Use the | operator to expect either one node or the other.
exp = node_a + (node_b | node_c)
# Would match either "ab" or "ac"
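One way to get a feel for these operators is to translate a small toy AST into an equivalent pattern for Python’s standard re module. This is only an analogy: nsre compiles the AST into its own graph, not into re syntax. The classes and to_pattern() below are hypothetical, and Lit("x") stands in for Final(Eq("x")).

```python
import re
from dataclasses import dataclass


class Node:
    def __add__(self, other: "Node") -> "Node":
        return Concatenation(self, other)

    def __or__(self, other: "Node") -> "Node":
        # `x | y` records an alternation between the two sub-trees
        return Alternation(self, other)


@dataclass
class Lit(Node):
    """Toy stand-in for Final(Eq(text))."""
    text: str


@dataclass
class Concatenation(Node):
    left: Node
    right: Node


@dataclass
class Alternation(Node):
    left: Node
    right: Node


def to_pattern(node: Node) -> str:
    """Render the toy AST as a Python `re` pattern (analogy only)."""
    if isinstance(node, Lit):
        return re.escape(node.text)
    if isinstance(node, Concatenation):
        return to_pattern(node.left) + to_pattern(node.right)
    if isinstance(node, Alternation):
        # a non-capturing group keeps the alternation local to this node
        return f"(?:{to_pattern(node.left)}|{to_pattern(node.right)})"
    raise TypeError(f"unsupported node: {node!r}")


node_a, node_b, node_c = Lit("a"), Lit("b"), Lit("c")
exp = node_a + (node_b | node_c)
pattern = to_pattern(exp)  # "a(?:b|c)"
```

The parentheses around node_b | node_c matter: | binds more loosely than + in Python, just as alternation does in regex syntax.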
Multiplication
Multiply a node to indicate repetition. You can multiply by:
- an int, to get exactly that number of occurrences;
- slice(X, None), to get from X to +inf occurrences;
- slice(None, X), to get from 0 to X occurrences;
- slice(X, Y), to get from X to Y occurrences (both bounds included).
exp = node_a * slice(1, 3)
# Would match "a", "aa" or "aaa"
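All four multipliers boil down to a (min, max) pair of occurrence counts. The sketch below uses hypothetical helpers (not part of nsre) that normalize a multiplier, then checks the result against Python’s re quantifier {m,n}, which is also inclusive on both ends.

```python
import re
from typing import Optional, Tuple, Union


def bounds(spec: Union[int, slice]) -> Tuple[int, Optional[int]]:
    """Hypothetical helper: turn a multiplier into (min, max); max=None means +inf."""
    if isinstance(spec, int):
        return spec, spec  # exactly `spec` occurrences
    if isinstance(spec, slice):
        low = 0 if spec.start is None else spec.start  # slice(None, X) starts at 0
        return low, spec.stop  # stop=None leaves the upper end open
    raise TypeError(f"cannot multiply a node by {spec!r}")


def quantifier(spec: Union[int, slice]) -> str:
    """Render the same bounds with Python's re syntax, for comparison."""
    low, high = bounds(spec)
    return "{%d,}" % low if high is None else "{%d,%d}" % (low, high)


# node_a * slice(1, 3), expressed as a Python regex over the letter "a"
pattern = "a" + quantifier(slice(1, 3))
matches = [s for s in ["", "a", "aa", "aaa", "aaaa"] if re.fullmatch(pattern, s)]
```

Here matches ends up as ["a", "aa", "aaa"], which agrees with the comment in the example above.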
Capture
To report what was matched into a capture group, name the group by indexing the node with a string in square brackets.
exp = node_a + (node_b | node_c)['foo']
# For "ab" group "foo" would contain "b"
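Indexing a node with a string is plain Python __getitem__. The toy sketch below assumes that node[name] simply wraps the node in Capture(name, node), following the argument order of the Capture class in the reference below. The group content is checked by analogy with a named group in Python’s re module, not with nsre’s engine; all classes and to_pattern() are hypothetical, and Lit("x") stands in for Final(Eq("x")).

```python
import re
from dataclasses import dataclass


class Node:
    def __add__(self, other: "Node") -> "Node":
        return Concatenation(self, other)

    def __or__(self, other: "Node") -> "Node":
        return Alternation(self, other)

    def __getitem__(self, name: str) -> "Node":
        # node["foo"] wraps the node in a named capture group
        return Capture(name, self)


@dataclass
class Lit(Node):
    text: str


@dataclass
class Concatenation(Node):
    left: Node
    right: Node


@dataclass
class Alternation(Node):
    left: Node
    right: Node


@dataclass
class Capture(Node):
    name: str
    statement: Node


def to_pattern(node: Node) -> str:
    """Render the toy AST as a Python `re` pattern (analogy only)."""
    if isinstance(node, Lit):
        return re.escape(node.text)
    if isinstance(node, Concatenation):
        return to_pattern(node.left) + to_pattern(node.right)
    if isinstance(node, Alternation):
        return f"(?:{to_pattern(node.left)}|{to_pattern(node.right)})"
    if isinstance(node, Capture):
        return f"(?P<{node.name}>{to_pattern(node.statement)})"
    raise TypeError(f"unsupported node: {node!r}")


# subscription binds tighter than `+`, so only (b | c) is captured
exp = Lit("a") + (Lit("b") | Lit("c"))["foo"]
m = re.fullmatch(to_pattern(exp), "ab")
```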
Reference
On top of using the Python syntax as shortcuts, you can directly create instances of nodes. It’s sometimes more convenient to do so.
class nsre.ast.Node
Root class for a node. It has no real use by itself, but it is useful to define the operators.
Assembling the nodes together will build an AST which the RegExp class will then turn into a compiled regular expression.
Example:
>>> from nsre import *
>>> root = Final(Eq('a')) + Final(Eq('b')) * slice(1, 5)
>>> re = RegExp.from_ast(root)
>>> assert re.match('abb')
>>> assert not re.match('a')
copy()
Generates a copy of the node. This is needed because of the way graph generation works: every node is put into a single graph, so each of them needs a unique ID in case the same sub-tree is used several times.
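Why does reuse call for a copy? The toy sketch below (not nsre’s graph builder) creates one graph vertex per node object, keyed by id(). If the same sub-tree object appears twice, both positions collapse into a single vertex; copying gives each position its own identity. copy.deepcopy stands in for Node.copy() here, and the vertices() helper is hypothetical.

```python
import copy
from dataclasses import dataclass


@dataclass(eq=False)  # eq=False: nodes compare and hash by identity
class Final:
    ref: str


@dataclass(eq=False)
class Concatenation:
    left: object
    right: object


def vertices(node) -> set:
    """Toy graph builder: one vertex per distinct node object, keyed by id()."""
    if isinstance(node, Concatenation):
        return {id(node)} | vertices(node.left) | vertices(node.right)
    return {id(node)}


b = Final("b")

# The same object used twice: both positions share one id, so one vertex is lost
shared = Concatenation(b, b)

# A deep copy per use gives every position its own identity
copied = Concatenation(copy.deepcopy(b), copy.deepcopy(b))
```

shared yields 2 vertices for what should be a 3-node tree, while copied yields all 3.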
class nsre.ast.Final(statement: nsre.matchers.Matcher[Tok, Out])
In the end, all nodes in the graph should be Final() nodes. They allow the engine to call the matcher (stored here in statement).
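As an illustration of "the engine calls the matcher", here is a toy in which a matcher is any object with a matches(token) method and a Final node simply delegates to it. The method names matches() and accepts() and the class bodies are assumptions for this sketch; check nsre.matchers for the real Matcher interface.

```python
from dataclasses import dataclass
from typing import Collection


class Matcher:
    """Toy matcher interface (assumed, not nsre's actual signature)."""

    def matches(self, token: str) -> bool:
        raise NotImplementedError


@dataclass
class Eq(Matcher):
    ref: str

    def matches(self, token: str) -> bool:
        return token == self.ref


@dataclass
class In(Matcher):
    refs: Collection[str]

    def matches(self, token: str) -> bool:
        return token in self.refs


@dataclass
class Final:
    statement: Matcher

    def accepts(self, token: str) -> bool:
        # the node never inspects the token itself: it asks its matcher
        return self.statement.matches(token)


email_sep = Final(In(["+", "."]))
```

Keeping token validation inside matchers means the graph only needs one kind of leaf node, whatever the tokens are.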
class nsre.ast.Concatenation(left: nsre.ast.Node, right: nsre.ast.Node)
Represents a concatenation of the left and the right nodes.
class nsre.ast.Alternation(left: nsre.ast.Node, right: nsre.ast.Node)
Represents an alternation of the left and the right nodes.
class nsre.ast.Maybe(statement: nsre.ast.Node)
Represents 0 or 1 occurrence of the statement.
class nsre.ast.AnyNumber(statement: nsre.ast.Node)
Represents 0 to +inf occurrences of the statement.
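Going by their descriptions, Maybe and AnyNumber cover the same counts as multiplying by slice(0, 1) and slice(0, None). That equivalence is an inference from the wording, not something this reference states. The sketch checks the counts by analogy with Python’s re quantifiers ? and *, using a hypothetical accepted() helper.

```python
import re

# Python re equivalents of the occurrence counts described above (analogy only)
MAYBE = "?"       # Maybe(x): 0 or 1 occurrence, same as {0,1}
ANY_NUMBER = "*"  # AnyNumber(x): 0 to +inf occurrences, same as {0,}

samples = ["", "b", "bb", "bbb"]


def accepted(quantifier: str) -> list:
    """Return the samples fully matched by 'b' followed by the quantifier."""
    return [s for s in samples if re.fullmatch("b" + quantifier, s)]
```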
class nsre.ast.Capture(name: str, statement: nsre.ast.Node)
Represents a capture group around the statement.
copy()
Generates a copy of the node. This is needed because of the way graph generation works: every node is put into a single graph, so each of them needs a unique ID in case the same sub-tree is used several times.