Undebt¶
This is the documentation for Undebt. See below for a list of all of the articles hosted here.
Undebt: Command Line Interface¶
Install it¶
$ git clone https://github.com/Yelp/undebt.git
$ cd undebt
$ pip install .
Read it¶
$ undebt --help
usage: undebt [-h] --pattern MODULE [--verbose] [--dry-run]
              [FILE [FILE...]]

positional arguments:
  FILE [FILE...]        paths to files to be modified (if not passed uses
                        stdin)

optional arguments:
  -h, --help            show this help message and exit
  --pattern MODULE, -p MODULE
                        pattern definition modules
  --verbose, -v
  --dry-run, -d         only print to stdout; do not overwrite files
Try it out¶
$ undebt -p undebt.examples.method_to_function ./tests/inputs/method_to_function_input.txt
$ git diff
diff --git a/tests/inputs/method_to_function_input.txt b/tests/inputs/method_to_function_input.txt
index f268ab9..7681c63 100644
--- a/tests/inputs/method_to_function_input.txt
+++ b/tests/inputs/method_to_function_input.txt
@@ -1,13 +1,14 @@
+from function_lives_here import function
something before code pattern
@decorator([Class])
def some_function(self, abc, xyz):
"""herp the derp while also derping and herping"""
cde = fgh(self.l)
ijk = cde.herp(
- opq_foo=FOO(abc).method()
+ opq_foo=function(FOO(abc))
)['str']
lmn = cde.herp(
- opq_foo=FOO(xyz).method()
+ opq_foo=function(FOO(xyz))
)['str']
bla bla bla
for str_data in derp_data['data']['strs']:
@@ -16,8 +17,8 @@ bla bla bla
rst.uvw(
CTA_BUSINESS_PLATFORM_DISABLED_LOG,
"derp {derp_foo} herp {herp_foo}".format(
- derp_foo=FOO(derp_foo).method(),
- herp_foo=FOO(herp_foo).method(),
+ derp_foo=function(FOO(derp_foo)),
+ herp_foo=function(FOO(herp_foo)),
),
)
something after code pattern
Tips and Tricks¶
Most of these will make use of `xargs`.
Using with `grep`/`git grep` to find files¶
grep -l <search-text> **/*.css | xargs undebt -p <pattern-module>
# Use git grep if you only want to search tracked files
git grep -l <search-text> | xargs undebt -p <pattern-module>
Using `find` to limit to a particular extension¶
find . -name '*.js' | xargs grep -l <search-text> | xargs undebt -p <pattern-module>
Using `xargs` to work in parallel¶
`xargs` takes a `-P` flag, which specifies the maximum number of processes to use.
git grep -l <search-text> | xargs -P <numprocs> undebt -p <pattern-module>
Undebt: Pattern Files¶
Undebt requires a pattern file that describes what to replace and how to replace it. There are two different ways to write pattern files: basic style, and advanced style. Unless you know you need multi-pass parsing, you should use basic style by default.
Basic Style¶
If you don’t know what style you should be using, you should be using basic style. When writing a basic style pattern, you must define the following names in your pattern file:
- `grammar` defines what pattern you want to replace, and must be a pyparsing grammar object.
- `replace` is a function of one argument, the tokens produced by `grammar`, that returns the string they should be replaced with, or `None` to do nothing (this is the single-argument form; multi-argument is also allowed as documented below).
- (optional) `extra` can be set to a string that will be added to the beginning (after the standard Python header) of any file in which there’s at least one match for `grammar` and in which `extra` does not already appear in the header (this feature is commonly used for adding in imports).
That sounds complicated, but it’s actually very simple. To start learning more, it’s recommended you check out Undebt’s example patterns and pattern utilities.
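As a rough illustration of the `replace` contract described above (return a string to substitute, or `None` to leave the match alone), here is a minimal sketch that uses a regular expression as a stand-in for a pyparsing grammar. It is not how Undebt applies patterns internally; `GRAMMAR`, `SKIP_NAMES`, and `apply_pattern` are hypothetical names invented for this example, and the `FOO(...).method()` shape is borrowed from the diff shown in "Try it out".

```python
import re

# Stand-in for a pyparsing grammar: matches FOO(<name>).method()
GRAMMAR = re.compile(r"FOO\((\w+)\)\.method\(\)")

# Hypothetical rule for this sketch: names listed here are left untouched.
SKIP_NAMES = {"legacy"}


def replace(tokens):
    """Return the replacement string, or None to leave the match as-is."""
    (name,) = tokens
    if name in SKIP_NAMES:
        return None
    return "function(FOO({}))".format(name)


def apply_pattern(text):
    """Apply `replace` to every match of GRAMMAR in `text`."""
    def substitute(match):
        result = replace(list(match.groups()))
        # A None result means "do nothing": keep the original text.
        return match.group(0) if result is None else result
    return GRAMMAR.sub(substitute, text)
```

Calling `apply_pattern("x = FOO(abc).method()")` rewrites the call, while a match whose name is in `SKIP_NAMES` passes through unchanged.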
Advanced Style¶
Unlike basic style, advanced style allows you to use custom multi-pass parsing; if that’s not something you need, you should use basic style. When writing an advanced style pattern, you need only define one name:
- `patterns` is a list of `(grammar, replace)` tuples, where each tuple in the list is only run if the previous one succeeded.
If `patterns` is defined, Undebt will ignore any definitions of `grammar`, `replace`, and `extra`. Instead, all of that information should go into the `patterns` list.
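To make the idea of gated passes concrete, the sketch below runs a list of passes in order and stops as soon as one of them fails to match. It uses regular expressions as stand-ins for grammars and assumes that "succeeded" means "matched at least once"; both are assumptions of this example, not a description of Undebt's internals. `PASSES`, `run_pass`, and `run_passes` are hypothetical names.

```python
import re

# Regex stand-ins for grammars; each pass is a (regex, replace) pair.
PASSES = [
    # Pass 1: only detects the import; returning None changes nothing.
    (re.compile(r"^from __future__ import unicode_literals$", re.MULTILINE),
     lambda match: None),
    # Pass 2: drops a `u` prefix that directly precedes a quote character.
    (re.compile(r"\bu(?=['\"])"), lambda match: ""),
]


def run_pass(regex, replace, text):
    """Apply one pass and report whether it matched at least once."""
    matches = 0

    def substitute(match):
        nonlocal matches
        matches += 1
        result = replace(match)
        return match.group(0) if result is None else result

    return regex.sub(substitute, text), matches > 0


def run_passes(passes, text):
    """Run passes in order, stopping at the first one that does not match."""
    for regex, replace in passes:
        text, matched = run_pass(regex, replace, text)
        if not matched:
            break
    return text
```

With this arrangement the second pass only touches files in which the first pass found the `unicode_literals` import. A real grammar would also avoid matching inside string contents, which this regex does not attempt.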
As an example, you can replicate the behavior of the basic style `extra` by doing the following:
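from undebt.pattern.lang.python import HEADER
from undebt.pattern.util import tokens_as_list

# `extra` (a string) and `patterns` (a list) are assumed to be defined
# earlier in the same pattern file.

@tokens_as_list(assert_len=1)
def extra_replace(tokens):
    # tokens[0] is the text matched by HEADER
    if extra not in tokens[0]:
        return tokens[0] + extra + "\n"
    else:
        return None  # the header already contains `extra`

patterns.append((HEADER, extra_replace))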
Or equivalently but more succinctly:
from undebt.pattern.interface import get_pattern_for_extra

patterns.append(
    get_pattern_for_extra(
        extra
    )
)
Multi-Argument Replace¶
In both styles, when writing a `replace` function, it is sometimes useful to have access to the parsing location in the file and/or the text of the original file. If your `replace` function takes two arguments, it will be passed `location, tokens`, and for three arguments, it will get `text, location, tokens`. This will work even if you are using one of the `tokens_as_list` or `tokens_as_dict` decorators.
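The calling convention above can be pictured as a dispatch on the number of parameters the `replace` function declares. The following sketch shows one way such a dispatch could work using `inspect.signature`; it is an illustration only, not Undebt's actual mechanism, and `call_replace` and the three sample functions are hypothetical names.

```python
import inspect


def call_replace(replace, text, location, tokens):
    """Call `replace` with the arguments implied by its parameter count."""
    arity = len(inspect.signature(replace).parameters)
    if arity == 1:
        return replace(tokens)
    if arity == 2:
        return replace(location, tokens)
    if arity == 3:
        return replace(text, location, tokens)
    raise TypeError("replace must take 1, 2, or 3 arguments")


def one_arg(tokens):
    return "".join(tokens)


def two_args(location, tokens):
    return "{}@{}".format("".join(tokens), location)


def three_args(text, location, tokens):
    # Report the 1-based line number on which the match starts.
    line = text.count("\n", 0, location) + 1
    return "{} (line {})".format("".join(tokens), line)
```

The three-argument form is the one to reach for when the replacement depends on context outside the match, such as the line number or the rest of the file.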
Undebt: Examples¶
The undebt.examples package contains various example pattern files. These example patterns can either simply be used as they are to make use of the transformation they describe, or used as templates to build your own pattern files.
undebt.examples.nl_at_eof¶
(Source)
A toy example to add a new line (`"\n"`) to the end of files that lack one.
Example of:
- use of the `tokens_as_list` decorator to define a `replace` function with assert checks
- negative lookahead using the `~` operator
- match any character with `ANY_CHAR`
- match the end of a file with `END_OF_FILE`
undebt.examples.dbl_quote_docstring¶
(Source)
Changes all `'''` strings that can be changed to `"""` strings.
Example of:
- return `None` from `replace` to do nothing
- match a `'''` string using `TRIPLE_SGL_QUOTE_STRING`
undebt.examples.class_inherit_object¶
(Source)
Changes classes that inherit from nothing to inherit from `object`, which makes sure they behave as Python 3 new-style classes instead of Python 2 old-style classes.
Example of:
- `Optional` to optionally match something
- `.suppress` method to prevent an object from appearing in the parsed tokens
- `Keyword` to match an individual word
- `INDENT` to match the beginning of a line and any leading whitespace
- `NAME` to match any variable name
undebt.examples.hex_to_bitshift¶
(Source)
Replaces hex flags with bitshift flags.
Example of:
- `Literal` to match a specific literal
- `Combine` to match a series of tokens without any whitespace in-between
- `Word` to match a word made up of a set of characters
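The arithmetic behind this kind of rewrite is simple: a hex flag with exactly one bit set, such as `0x10`, equals `1 << 4`. The sketch below works through that conversion on a string literal. It assumes only single-bit flags are rewritten, which may differ from what the example pattern actually does, and `hex_flag_to_bitshift` is a hypothetical helper name.

```python
def hex_flag_to_bitshift(literal):
    """Rewrite a single-bit hex literal such as "0x10" as "1 << 4"."""
    value = int(literal, 16)
    if value <= 0 or value & (value - 1):
        return None  # not exactly one bit set; leave it unchanged
    return "1 << {}".format(value.bit_length() - 1)
```

Returning `None` for values that are not a single bit mirrors the `replace` convention of returning `None` to do nothing.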
undebt.examples.exec_function¶
(Source)
Changes instances of the Python 2 style `exec code in globals, locals` exec statement to the universal Python style `exec(code, globals, locals)` (which will work on Python 2.7 and Python 3).
Example of:
- using `tokens_as_list` to assert multiple possible token list lengths
- `ATOM` to match a Python atom
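For reference, the target form of this transformation looks like the following when run under Python 3. The variable names are made up for the example; the old statement form is shown only in a comment because it is a syntax error on Python 3.

```python
code = "result = base + 1"
global_names = {"base": 41}
local_names = {}

# Python 2 statement form (not valid on Python 3):
#     exec code in global_names, local_names
# Function-call form produced by the rewrite:
exec(code, global_names, local_names)
```

After the call, the assignment made by `code` lands in `local_names`, while `base` is looked up in `global_names`.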
undebt.examples.attribute_to_function¶
(Source)
Transforms uses of `.attribute` into calls to `function`, and adds `from function_lives_here import function` whenever an instance of `function` is added.
Example of:
- use of `extra` to add an import statement
- multiple possible patterns using the `|` operator
- `ZeroOrMore` to match any number of a pattern
- `PARENS`, `BRACKETS` to match anything inside matching parentheses and brackets
- `ATOM_BASE` to match a trailerless Python atom
undebt.examples.method_to_function¶
(Source)
Slightly more complicated version of `attribute_to_function` that finds a method call instead of an attribute access, and makes sure that method call is not on `self`.
undebt.examples.sqla_count¶
(Source)
Transforms inefficient SQLAlchemy `.count()` queries into more efficient `.scalar()` queries that don’t create a subquery.
Example of:
- use of the `tokens_as_dict` decorator to define a `replace` function with assert checks
- grammar element function calling to label tokens in the resulting `tokens_as_dict` dictionary
- using `leading_whitespace` and `trailing_whitespace` to extract whitespace in a `replace` function
undebt.examples.remove_unused_import¶
(Source)
Removes `from function_lives_here import function` if `function` does not appear anywhere else in the file.
Example of:
- using a multi-argument `replace` function
- using `HEADER` to analyze the header of a Python file
undebt.examples.contextlib_nested¶
(Source)
Transforms uses of `contextlib.nested` into multiple clauses in a `with` statement. Respects usage with `as` and without `as`.
Example of:
- using `tokens_as_dict` to assert multiple possible dictionary keys
- `EXPR` to match a Python expression
- `COMMA_IND`, `LPAREN_IND`, `IND_RPAREN` to match optional indentation at particular points
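The shape of the rewritten code can be seen in the small runnable sketch below. The `resource` context manager and the `log` list are invented for the illustration; the `contextlib.nested` form appears only in a comment because that function does not exist in Python 3.

```python
from contextlib import contextmanager


@contextmanager
def resource(name, log):
    """Toy context manager that records when it is entered and exited."""
    log.append("enter " + name)
    try:
        yield name
    finally:
        log.append("exit " + name)


log = []

# Before (Python 2 only):
#     with contextlib.nested(resource("a", log), resource("b", log)) as (a, b):
# After, using multiple clauses in one `with` statement:
with resource("a", log) as a, resource("b", log) as b:
    log.append("body {} {}".format(a, b))
```

The managers are entered left to right and exited in reverse order, which matches the nesting that `contextlib.nested` expressed.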
undebt.examples.remove_needless_u_specifier¶
(Source)
In files where `from __future__ import unicode_literals` appears, removes unnecessary `u` before strings.
Example of:
- an advanced style pattern file making use of multi-pass parsing
- using `in_string` to determine if the match location is inside of a string
- `originalTextFor` to make grammar elements parse to the original text that matched them
- `STRING` to match any valid string
undebt.examples.swift¶
(Source)
Transforms uses of `if let where` from Swift 2.2 to the updated syntax in Swift 3.0.
Example of:
- using Undebt to transform a language that isn’t Python
Note: It’s possible that the `EXPR` grammar element used won’t match all Swift expressions; if you are concerned about this, you should define a custom `EXPR` corresponding to the syntax of a Swift expression.
Undebt: Pattern Utilities¶
Undebt’s `undebt.pattern` package exposes various modules full of functions and grammar elements for use in writing pattern files, all documented here.
undebt.pattern.util¶
tokens_as_list(assert_len=None, assert_len_in=None)
Decorator used to wrap `replace` functions that converts the parsed tokens into a list. `assert_len` checks that the tokens have exactly the given length, while `assert_len_in` checks that the length of the tokens is in the provided list.
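The behavior described for `tokens_as_list` can be approximated in plain Python as shown below. This is a simplified stand-in written for illustration, not the decorator shipped in `undebt.pattern.util`; the name `tokens_as_list_sketch` and the two sample `replace` functions are hypothetical.

```python
import functools


def tokens_as_list_sketch(assert_len=None, assert_len_in=None):
    """Simplified stand-in for the decorator described above."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(tokens):
            tokens = list(tokens)
            if assert_len is not None:
                # Exactly this many tokens are expected.
                assert len(tokens) == assert_len, tokens
            if assert_len_in is not None:
                # Any of the listed lengths is acceptable.
                assert len(tokens) in assert_len_in, tokens
            return func(tokens)
        return wrapper
    return decorator


@tokens_as_list_sketch(assert_len=1)
def add_newline(tokens):
    return tokens[0] + "\n"


@tokens_as_list_sketch(assert_len_in=(2, 3))
def join_args(tokens):
    return ", ".join(tokens)
```

The length checks act as a guard against a grammar that produces a different token layout than the `replace` function expects, failing loudly instead of producing a wrong rewrite.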
tokens_as_dict(assert_keys=None, assert_keys_in=None)
Decorator used to wrap `replace` functions that converts the parsed tokens into a dictionary, with keys assigned by calling grammar elements with the desired key as the argument. `assert_keys` checks that the keys in the token dictionary are a subset of the given keys, while `assert_keys_in` checks that the given keys are a subset of the keys in the token dictionary.
condense(item)
Modifies a grammar element to parse to a single token instead of many different tokens by concatenating the parsed tokens together.
addspace(item)
Equivalent to `condense` but also adds a space delimiter in-between the concatenated tokens.
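In terms of the resulting token, the difference between the two helpers comes down to the delimiter used when joining. The functions below operate on plain lists of strings to show that difference; they are hypothetical illustrations and do not wrap grammar elements the way `condense` and `addspace` do.

```python
def condense_tokens(tokens):
    """Concatenate tokens into one string with no delimiter."""
    return "".join(tokens)


def addspace_tokens(tokens):
    """Concatenate tokens into one string with a single space between them."""
    return " ".join(tokens)
```

For example, the tokens `["obj", ".", "attr"]` read naturally when condensed, while `["not", "x"]` needs the space.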
quoted(string)
Returns a grammar element that matches a string containing `string`.
leading_whitespace(text)
Returns the whitespace at the beginning of `text`.
trailing_whitespace(text)
Returns the whitespace at the end of `text`.
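A plain-Python approximation of the two whitespace helpers is sketched below, along with a typical use: keeping the original spacing around a rewritten token. The `_sketch` functions and `rewrite_keeping_whitespace` are hypothetical names, and the real helpers may treat edge cases differently.

```python
def leading_whitespace_sketch(text):
    """Return the run of whitespace at the start of `text`."""
    return text[:len(text) - len(text.lstrip())]


def trailing_whitespace_sketch(text):
    """Return the run of whitespace at the end of `text`."""
    return text[len(text.rstrip()):]


def rewrite_keeping_whitespace(original, new_core):
    """Swap the non-whitespace core of `original` for `new_core`."""
    return (leading_whitespace_sketch(original)
            + new_core
            + trailing_whitespace_sketch(original))
```

This pattern is useful inside a `replace` function when a token was captured together with its surrounding indentation or line ending.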
in_string(location, code)
Determines if, at the given location in the code, there is an enclosing non-multiline string.
fixto(item, output)
Modifies a grammar element to always parse to the same fixed `output`.
debug(item)
Modifies a grammar element to print the tokens that it matches.
attach(item, action)
Modifies a grammar element to parse to the result of calling `action` on the tokens produced by that grammar element.
sequence(grammar, n)
Creates a grammar element that matches exactly `n` of the input grammar.
undebt.pattern.common¶
INDENT Matches any amount of indentation at the start of a line.
PARENS, BRACKETS, BRACES Grammar elements that match an open parenthesis / bracket / brace to the corresponding closing parenthesis / bracket / brace.
NAME Grammar element that matches a variable name.
DOTTED_NAME
Grammar element to match one or more `NAME` separated by `DOT`.
NUM Grammar element to match a number.
STRING Grammar element that matches a string.
TRIPLE_QUOTE_STRING, TRIPLE_DBL_QUOTE_STRING, TRIPLE_SGL_QUOTE_STRING Grammar elements that match different types of multi-line strings.
NL = Literal("\n")
DOT = Literal(".")
LPAREN = Literal("(")
RPAREN = Literal(")")
COMMA = Literal(",")
COLON = Literal(":")
COMMA_IND, LPAREN_IND, IND_RPAREN
Same as `COMMA`, `LPAREN`, and `RPAREN`, but allow for an `INDENT` after (for `COMMA_IND` and `LPAREN_IND`) or before (for `IND_RPAREN`).
LINE_START Matches the start of a line, either after a new line, or at the start of the file.
NO_BS_NL Matches a new line not preceded by a backslash.
START_OF_FILE Grammar element that only matches at the very beginning of the file.
END_OF_FILE Grammar element that only matches at the very end of the file.
SKIP_TO_TEXT
Skips parsing position to the next non-whitespace character. To see the skipped text in a token, use `originalTextFor(PREVIOUS_GRAMMAR_ELEMENT + SKIP_TO_TEXT)` where `PREVIOUS_GRAMMAR_ELEMENT` is just whatever comes before `SKIP_TO_TEXT` in your grammar.
SKIP_TO_TEXT_OR_NL
Same as `SKIP_TO_TEXT`, but won’t skip over new lines.
ANY_CHAR
Grammar element that matches any one character, including new lines, but not non-newline whitespace. To exclude newlines, just do `~NL + ANY_CHAR`.
WHITE
Normally, whitespace between grammar elements is ignored when they are added together. Put `WHITE` in-between to capture that whitespace as a token.
NL_WHITE
Same as `WHITE` but also matches new lines.
undebt.pattern.lang¶
Contains common patterns for a variety of languages. For example, for patterns specific to the Python grammar, use `undebt.pattern.lang.python`.
undebt.pattern.lang.python¶
EXPR Matches any valid Python expression.
EXPR_LIST, EXPR_IND_LIST
Matches one or more `EXPR` separated by `COMMA` for `EXPR_LIST` or `COMMA_IND` for `EXPR_IND_LIST`.
PARAM, PARAMS
Matches one of (`PARAM`), or at least one of (`PARAMS`), the valid Python function parameters (`arg`, `kwarg=val`, `*args`, `**kwargs`).
ATOM Matches a single valid Python atom (that is, an expression without operators).
TRAILER, TRAILERS
Matches one of (`TRAILER`), or any number of (`TRAILERS`), the valid Python trailers (attribute access, function call, indexing, etc.).
ATOM_BASE
Matches an `ATOM` without any `TRAILERS` attached to it.
OP Matches any valid Python operator.
BINARY_OP Matches a valid Python binary operator.
ASSIGN_OP Matches a valid Python assignment operator.
UNARY_OP Matches a valid Python unary operator.
UNARY_OP_ATOM
Matches an `ATOM` potentially preceded by unary operator(s).
HEADER
Matches imports, comments, and strings at the start of a file. Used to determine where to insert the basic style `extra`.
undebt.pattern.interface¶
get_pattern_for_extra(extra)
Returns a `(grammar, replace)` tuple describing a pattern to insert `extra` after `undebt.pattern.lang.python.HEADER`.
get_patterns(*pattern_modules)
Returns a list containing an advanced style `patterns` list for each pattern module in `pattern_modules`. The resulting list can be passed to `undebt.cmd.logic.process`.
undebt.cmd.logic¶
process(patterns, text)
Where `patterns` is a list of advanced style `patterns` lists, applies the specified patterns to the given text and returns the transformed version. Usually used in conjunction with `undebt.pattern.interface.get_patterns`.
Undebt: Using pyparsing¶
While Undebt’s parsing utilities are very helpful and provide much of the necessary functionality for writing a `grammar`, all of the objects are pyparsing objects, and thus it is often necessary and/or useful to use `pyparsing` utilities.
While the official pyparsing documentation is a great resource, most of the more advanced utilities there will usually not be necessary. This documentation is an overview of those that are most likely to be useful.
Operators¶
+ (And)
Adding two grammar elements produces a new grammar element that matches the first one, then the second one, with optional intervening whitespace.
| (Or)
Oring two grammar elements produces a new grammar element that attempts to match the first one, then if that fails, attempts to match the second one.
~ (Negative Lookahead)
Inverting a grammar element produces a new grammar element that produces no tokens and matches only if the inverted grammar doesn’t match. Using a negative lookahead also doesn’t advance the current parsing position.
^ (Match Longest)
Similar to `|`, but matches the longest of the grammar elements that match, instead of the first grammar element that matches.
Functions¶
Literal(str)
Creates a grammar element that matches `str` exactly.
Keyword(str)
Creates a grammar element that matches `str` only if it is surrounded by non-letters.
Optional(...)
Creates a grammar element that matches zero or one of the contained grammar element.
ZeroOrMore(...)
Creates a grammar element that matches zero or more of the contained grammar element.
OneOrMore(...)
Creates a grammar element that matches one or more of the contained grammar element.
originalTextFor(...)
Modifies a grammar element to produce only a single token that is the original text that was matched by that grammar element.
Word(charset)
Creates a grammar element that matches a word made of characters in `charset`.
SkipTo(...)
Skips parsing position to the next match for the contained objects.
Combine(...)
Forces any grammar elements added together inside of `Combine` to not match intervening whitespace and produce only a single token.
Regex(str)
Creates a grammar element that matches `str` as a regular expression.
Undebt: Contributing¶
Getting Started¶
Undebt’s development is taking place on GitHub, so please go ahead and fork the repository if you want to begin contributing.
You’ll then want to get a local copy of the code base:
git clone git@github.com:<your-username>/undebt.git
Getting Setup¶
It is highly recommended that you create a virtual environment before installing the project dependencies.
You can achieve both (create a virtualenv and install dependencies) with:
make dev
Running the Tests¶
Undebt uses tox for testing.
You can run the entire test suite:
make test
Or, run an individual environment:
tox -e py35 # you probably need to be in a virtualenv
Note: If you do not have the required dependencies (such as the Python interpreter) for each tox environment, you will receive an error. Avoid this by passing the `--skip-missing-interpreters` option.
Adding Documentation¶
Undebt’s documentation is formatted using reStructuredText and hosted on RTD. Please try to follow the existing style and organization patterns.
You can test your contribution with:
make docs
Your new, local documentation will be available at `docs/build/html/`.