Undebt: Pattern Utilities

Undebt’s undebt.pattern package exposes various modules full of functions and grammar elements for use in writing pattern files, all documented here.

undebt.pattern.util

tokens_as_list(assert_len=None, assert_len_in=None)

Decorator used to wrap replace functions that converts the parsed tokens into a list. assert_len checks that the tokens have exactly the given length, while assert_len_in checks that the length of the tokens is in the provided list.

tokens_as_dict(assert_keys=None, assert_keys_in=None)

Decorator used to wrap replace functions that converts the parsed tokens into a dictionary, with keys assigned by calling grammar elements with the desired key as the argument. assert_keys checks that the keys in the token dictionary are a subset of the given keys, while assert_keys_in checks that the given keys are a subset of the keys in the token dictionary.

condense(item)

Modifies a grammar element to parse to a single token instead of many different tokens by concatenating the parsed tokens together.

addspace(item)

Equivalent to condense but also adds a space delimiter in-between the concatenated tokens.

quoted(string)

Returns a grammar element that matches a string containing string.

leading_whitespace(text)

Returns the whitespace at the beginning of text.

trailing_whitespace(text)

Returns the whitespace at the end of text.

in_string(location, code)

Determines if, at the given location in the code, there is an enclosing non-multiline string.

fixto(item, output)

Modifies a grammar element to always parse to the same fixed output.

debug(item)

Modifies a grammar element to print the tokens that it matches.

attach(item, action)

Modifies a grammar element to parse to the result of calling action on the tokens produced by that grammar element.

sequence(grammar, n)

Creates a grammar element that matches exactly n of the input grammar.

undebt.pattern.common

INDENT Matches any amount of indentation at the start of a line.

PARENS, BRACKETS, BRACES Grammar elements that match an open parenthesis / bracket / brace to the corresponding closing parenthesis / bracket / brace.

NAME Grammar element that matches a variable name.

DOTTED_NAME Grammar element to match either one or more NAME separated by DOT.

NUM Grammar element to match a number.

STRING Grammar element that matches a string.

TRIPLE_QUOTE_STRING, TRIPLE_DBL_QUOTE_STRING, TRIPLE_SGL_QUOTE_STRING Grammar elements that match different types of multi-line strings.

NL = Literal("\n")

DOT = Literal(".")

LPAREN = Literal("(")

RPAREN = Literal(")")

COMMA = Literal(",")

COLON = Literal(":")

COMMA_IND, LPAREN_IND, IND_RPAREN Same as COMMA, LPAREN, and RPAREN, but allow for an INDENT after (for COMMA_IND and LPAREN_IND) or before (for IND_RPAREN).

LINE_START Matches the start of a line, either after a new line, or at the start of the file.

NO_BS_NL Matches a new line not preceded by a backslash.

START_OF_FILE Grammar element that only matches at the very beginning of the file.

END_OF_FILE Grammar element that only matches at the very end of the file.

SKIP_TO_TEXT Skips parsing position to the next non-whitespace character. To see the skipped text in a token, use originalTextFor(PREVIOUS_GRAMMAR_ELEMENT + SKIP_TO_TEXT) where PREVIOUS_GRAMMAR_ELEMENT is just whatever comes before SKIP_TO_TEXT in your grammar.

SKIP_TO_TEXT_OR_NL Same as SKIP_TO_TEXT, but won’t skip over new lines.

ANY_CHAR Grammar element that matches any one character, including new lines, but not non-newline whitespace. To exclude newlines, just do ~NL + ANY_CHAR.

WHITE Normally, whitespace between grammar elements is ignored when they are added together. Put WHITE in-between to capture that whitespace as a token.

NL_WHITE Same as WHITE but also matches new lines.

undebt.pattern.lang

Contains common patterns for a variety of languages. For example, for patterns specific to the Python grammar, use undebt.pattern.lang.python.

undebt.pattern.lang.python

EXPR Matches any valid Python expression.

EXPR_LIST, EXPR_IND_LIST Matches one or more EXPR separated by COMMA for EXPR_LIST or COMMA_IND for EXPR_IND_LIST.

PARAM, PARAMS Matches one of (PARAM), or at least one of (PARAMS), the valid Python function parameters (arg, kwarg=val, *args, **kwargs).

ATOM Matches a single valid Python atom (that is, an expression without operators).

TRAILER, TRAILERS Matches one of (TRAILER), or any number of (TRAILERS), the valid Python trailers (attribute access, function call, indexing, etc.).

ATOM_BASE Matches an ATOM without any TRAILERS attached to it.

OP Matches any valid Python operator.

BINARY_OP Matches a valid Python binary operator.

ASSIGN_OP Matches a valid Python assignment operator.

UNARY_OP Matches a valid Python unary operator.

UNARY_OP_ATOM Matches an ATOM potentially preceded by unary operator(s).

HEADER Matches imports, comments, and strings at the start of a file. Used to determine where to insert the basic style extra.

undebt.pattern.interface

get_pattern_for_extra(extra)

Returns a (grammar, replace) tuple describing a pattern to insert extra after undebt.pattern.python.HEADER.

get_patterns(*pattern_modules)

Returns a list containing a advanced style patterns list for each pattern module in pattern_modules. The resulting list can be passed to undebt.cmd.logic.process.

undebt.cmd.logic

process(patterns, text)

Where patterns is a list of advanced style patterns lists, applies the specified patterns to the given text and returns the transformed version. Usually used in conjunction with undebt.pattern.interface.get_patterns.