RegExp

The nsre.regexp module contains the algorithms for executing the regular expressions themselves and then transforming them into matches. The centerpiece of this is RegExp.

Please note that the reference talks about a lot of private methods which are documented in the code but not displayed here. You should focus on RegExp.from_ast() and RegExp.match().

Reference

class nsre.regexp.RegExp(graph: networkx.classes.digraph.DiGraph)

Core of the RegExp system. Don’t instantiate this directly. So far the only easy way to create an instance is from_ast(), but in the future there will be more ways, like a parser for good old-school regular expressions or a new specific DSL.

Here’s a usage example.

>>> from nsre import *
>>> re = RegExp.from_ast(anything()['user'] + seq("@") + anything()['domain'])
>>> m = re.match('remy.sanchez@with-madrid.com', join_trails=True)
>>> assert m['user'].trail == 'remy.sanchez'
>>> assert m['domain'].trail == 'with-madrid.com'
classmethod from_ast(root: nsre.ast.Node[Tok, Out]) → nsre.regexp.RegExp[Tok, Out]

Use this to generate your regular expression. To generate the AST, have a look at the nsre.ast and nsre.shortcuts modules.

Parameters

root – Root node of your expression.

match(seq: Sequence[Tok], join_trails: bool = False) → nsre.regexp.MatchList[nsre.regexp.Match[Out]]

For a given sequence of tokens, generates all the matches that were detected.

Notes

If you think of your regular expression the way you would when using the re module, then you’re going to get 0 or 1 match. However, if your Matchers can match several options, you might end up with two or more matches at the same time.

For this reason, the MatchList object provides shortcuts so you don’t have to skim through the list of matches if you don’t want to.

Please note that inside a match, the capture groups do not match in “parallel”: only the root MatchList provides several matching options. What is inside each match is just its content.

Parameters
  • seq – Sequence that you would like to test

  • join_trails – If all your output items are going to be characters, you can set this to True in order to receive trails that are strings instead of lists of characters.

class nsre.regexp.Match(start_pos: int, children: Mapping[str, List[Match]], trail: Sequence[Out])

Represents a match for a capture group in the regular expression.

class nsre.regexp.MatchList

List of matches. It’s just a convenience wrapper around a tuple that makes it easier to get a specific group from the first item of the list.
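To make the “convenience around a tuple” idea concrete, here is a minimal sketch of how such a shortcut could behave. MatchSketch and MatchListSketch are hypothetical stand-ins written for this illustration, not the classes from nsre.regexp, and the rule that a group name resolves to the first occurrence of that group in the first match is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class MatchSketch:
    """Hypothetical stand-in carrying the three documented Match fields."""
    start_pos: int
    children: Dict[str, List["MatchSketch"]] = field(default_factory=dict)
    trail: Sequence = ()


class MatchListSketch(tuple):
    """Tuple of matches where a string key is looked up in the first match."""

    def __getitem__(self, key):
        if isinstance(key, str):
            # Assumption: a group name means "first occurrence of that
            # group inside the first match of the list".
            first = tuple.__getitem__(self, 0)
            return first.children[key][0]
        # Integer indexes and slices behave like a regular tuple.
        return tuple.__getitem__(self, key)


user = MatchSketch(start_pos=0, trail="remy.sanchez")
domain = MatchSketch(start_pos=13, trail="with-madrid.com")
root = MatchSketch(start_pos=0, children={"user": [user], "domain": [domain]})
matches = MatchListSketch([root])
```

With this sketch, matches["user"] is the same thing as matches[0].children["user"][0], which is the kind of shortcut that saves skimming through the list when there is only one match of interest.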

nsre.regexp.ast_to_graph(root: nsre.ast.Node) → networkx.classes.digraph.DiGraph

You will create your regular expression with a specific syntax which is transformed into an AST. However, the regular expression engine expects to navigate in a graph. As it is too complicated to navigate inside the AST directly, this function transforms the AST into an actual graph.

Notes

Since each node, except the nsre.ast.Final ones, has child nodes, the idea is to insert nodes into the graph one at a time and then to work on each new node to transform it into its content.

Also, there is implicitly an _Initial and a _Terminal node. The graph exploration starts from the initial node, and the regular expression is considered to match if, once the input sequence is entirely consumed, you can transition to the terminal node.
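The acceptance rule can be sketched with a tiny hand-written graph. This is only an illustration under simplifying assumptions: the graph is a plain dictionary instead of a networkx DiGraph, every node other than the two sentinels consumes exactly one token, and the names accepts, edges and expected are made up for the example.

```python
INITIAL, TERMINAL = "_Initial", "_Terminal"


def accepts(edges, expected, tokens):
    """Return True if the tokens lead from _Initial to a node that can
    transition to _Terminal once the input is entirely consumed.

    edges: node -> set of successor nodes
    expected: node -> the token this node consumes
    """
    current = {INITIAL}
    for tok in tokens:
        # Follow every edge whose target node accepts the current token.
        current = {
            nxt
            for node in current
            for nxt in edges.get(node, ())
            if nxt != TERMINAL and expected[nxt] == tok
        }
        if not current:
            return False
    # Input consumed: it is a match only if _Terminal is reachable now.
    return any(TERMINAL in edges.get(node, ()) for node in current)


edges = {INITIAL: {"B"}, "B": {"C"}, "C": {TERMINAL}}
expected = {"B": "b", "C": "c"}
```

Here accepts(edges, expected, "bc") is True, while "b" fails because B cannot reach the terminal node and "bcc" fails because nothing is left to consume the last token.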

For example, say you have a node A which is a concatenation of B and C. Suppose that the code looks like this:

>>> from nsre import *
>>> c = Final(Eq('c'))
>>> b = Final(Eq('b'))
>>> a = b + c
>>> g = ast_to_graph(a)

Then the first graph you’re going to get is

_Initial -> A -> _Terminal

But then the algorithm is going to transform A into its content and you’ll end up with the new graph

_Initial -> B -> C -> _Terminal

And so on if B and C have content of their own (they don’t in the current example).

The way to transform a node into its content depends on the node type, of course. That’s why you’ll find in this file a bunch of _explore_* methods which are actually the ways to transform a specific node into a graph.

The overall algorithm here is to have a to-do list (the “explore” variable) which contains the set of currently unexplored nodes. When a node is transformed into its content, the newly inserted nodes are also added to the to-do list and will be explored at the next iteration. This goes on and on until the whole AST has been transformed into a graph.
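The to-do list loop can be sketched as follows. This is not the code of ast_to_graph(): Leaf and Concat are hypothetical node types, the graph is a plain set of edges rather than a networkx DiGraph, the to-do list is a list rather than a set, and only concatenation is handled. It only shows how a composite node gets replaced by its content until nothing is left to explore.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leaf:
    """Hypothetical node without content (plays the role of a Final)."""
    name: str


@dataclass(frozen=True)
class Concat:
    """Hypothetical composite node: its parts must match one after another."""
    name: str
    parts: tuple


def expand(root):
    # Start with _Initial -> root -> _Terminal.
    edges = {("_Initial", root), (root, "_Terminal")}
    explore = [root]  # to-do list of nodes not yet explored
    while explore:
        node = explore.pop()
        if isinstance(node, Leaf):
            continue  # leaves have no content to expand
        preds = [a for a, b in edges if b == node]
        succs = [b for a, b in edges if a == node]
        # Remove the composite node and splice a chain of its parts in.
        edges = {(a, b) for a, b in edges if node not in (a, b)}
        edges |= {(p, node.parts[0]) for p in preds}
        edges |= {(node.parts[-1], s) for s in succs}
        edges |= set(zip(node.parts, node.parts[1:]))
        # New nodes are explored at a later iteration.
        explore.extend(node.parts)
    return edges


def names(edges):
    """Render edges with node names so they are easy to read."""
    label = lambda n: n if isinstance(n, str) else n.name
    return {(label(a), label(b)) for a, b in edges}


b, c = Leaf("B"), Leaf("C")
a = Concat("A", (b, c))
graph = names(expand(a))
```

For the concatenation example above, graph contains exactly the edges _Initial -> B, B -> C and C -> _Terminal. Nesting a Concat inside another one goes through the same loop, one extra iteration per composite node.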

Another detail is that capture groups are indicated with a start and a stop marker on the edges. Each edge can potentially contain in its data a “start_captures” or a “stop_captures” list. They contain the names, in order, of the capture groups to start or stop. A capture starts right after its start marker and ends right before its stop marker.
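The effect of these markers can be sketched by walking along one path of the graph. The function collect_captures below is hypothetical and assumes a path with one edge before each token plus a final edge to the terminal node. Processing the stop markers before the start markers of the same edge is also an assumption. Only the “start_captures” and “stop_captures” keys come from the description above.

```python
def collect_captures(edge_data, tokens):
    """Collect the trail of each capture group along a single path.

    edge_data: one data dict per traversed edge, in order; edge i is
    crossed just before token i is consumed, the last one leads to the
    terminal node.
    """
    assert len(edge_data) == len(tokens) + 1
    open_groups = {}  # group name -> tokens captured so far
    done = {}         # group name -> finished trail
    for i, data in enumerate(edge_data):
        for name in data.get("stop_captures", []):
            # The capture ends right before the stop marker.
            done[name] = open_groups.pop(name)
        for name in data.get("start_captures", []):
            # The capture begins right after the start marker.
            open_groups[name] = []
        if i < len(tokens):
            for trail in open_groups.values():
                trail.append(tokens[i])
    return done


path = [
    {"start_captures": ["user"]},    # _Initial -> 'a'
    {},                              # 'a' -> 'b'
    {"stop_captures": ["user"]},     # 'b' -> '@'
    {"start_captures": ["domain"]},  # '@' -> 'c'
    {},                              # 'c' -> 'd'
    {"stop_captures": ["domain"]},   # 'd' -> _Terminal
]
captures = collect_captures(path, "ab@cd")
```

The trails come out as lists of characters, here ['a', 'b'] for user and ['c', 'd'] for domain. Joining them with "".join() gives the kind of string trail that join_trails=True is described as producing.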

See also

_explore_concatenation(), _explore_alternation(), _explore_maybe(), _explore_any_number(), _explore_capture()