Solving this problem with regular expressions (the Unix way) is a brilliant idea: you no longer have to worry about the original problem, because you're too preoccupied with finding the right pattern.
Bill has a problem.
Bill attempts to use regex to solve his problem.
Bill has two problems.
@@Alche_mist tbh regex makes me faster in a lot of ways. I keep handy examples around after figuring them out.
When you're first starting out with regex, it definitely feels like adding another problem, but once you get the hang of the basics and save the old expressions you've made, it gets easier, imo.
@@cryp0g00n4 There are some languages/syntaxes where a regex can't be used. HTML, for example.
He said that was a nightmare!
This video is such perfect timing for me, I could watch a whole series on static analysis for python.
Wow, I didn't expect to see homework assignments in a YouTube video. Makes me feel I should pay for this lesson!
2:01 Wow!! Whenever I've applied Python code to other Python code, I've treated it as a string. That's awesome. I've been meaning to create a tool that'll crawl over GitHub repos and extract code quality metrics (probably to learn which repos are worth checking out)... this would make that task *much* easier.
This might also make your task easier!! pypi.org/project/wily/
@@mCoding Never heard of this but looks helpful, thanks again!
I like how the use of "probably" implies that you yourself don't even know why you're doing it
This is a great example for the ast module 😊
Entertaining and informative as always. Thank you for your work.
Another great topic choice James! Judging by comments, I am probably not alone in wishing for a sophisticated examination into how far you can get with a static analysis of Python code, where construction of the structure of objects is dynamic. Python does not force you to declare class or object members before you introduce them wherever you like. And the type of a variable is often not easy to determine statically, unless the code relies on some heuristics, which could be quite tricky. So how does flake8 (and its underlying tools) deal with that, or does it even attempt to? Does it have any chance of determining whether a mention of myvar.myfeld = 1 is legit, or a typo for myvar.myfield?
This is one of those times where the dynamism of Python makes it hard for tools to give good feedback. If you are adding attributes at runtime, there is no way to know with certainty whether it was a typo or not (something something halting problem...). What you can do is hold yourself to a self-imposed static requirement that you will _not_ add any attribute that is not named, e.g., within the class's dunder init. Then if you see a name set outside of init, you know it's wrong if it isn't listed in init. This kind of thing requires type information in addition to the AST if you want to handle cases outside of class methods, though, so you may need to write a mypy plugin to achieve something like that.
@@mCoding Thanks for the reply. I adopted a policy long ago of adding all attributes in dunder init, as that enables the IDE to provide lookup and completion for them, and generally helps maintain sanity. It seems to me that we should enable whatever static validation is feasible, and focus runtime testing on runtime (logical) errors. Indeed, type info is the next level up, and Python's runtime type resolution can make that difficult, but in a large proportion of cases the lateness of type resolution is not necessary, and the type could be deduced or specified statically (especially with type hints).
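A minimal sketch of that self-imposed rule, assuming the convention that every attribute is first assigned as `self.<name>` inside `__init__`. It only looks at `self.x = ...` style assignments inside methods of the same class (as the reply says, anything beyond that needs type information), and the function name `find_undeclared_attributes` is hypothetical:

```python
import ast


def _self_attr_targets(func):
    """Return every `self.<name>` attribute that is assigned to inside func."""
    if not func.args.args:
        return []  # e.g. a staticmethod with no arguments
    self_name = func.args.args[0].arg  # usually "self"
    found = []
    for node in ast.walk(func):
        # Attribute nodes in Store context are assignment targets
        # (plain, augmented and annotated assignments all count).
        if isinstance(node, ast.Attribute) and isinstance(node.ctx, ast.Store):
            if isinstance(node.value, ast.Name) and node.value.id == self_name:
                found.append(node)
    return found


def find_undeclared_attributes(source):
    """Yield (line, class, attr) for self.<attr> assignments not declared in __init__."""
    for cls in ast.walk(ast.parse(source)):
        if not isinstance(cls, ast.ClassDef):
            continue
        methods = [n for n in cls.body if isinstance(n, ast.FunctionDef)]
        declared = set()
        for m in methods:
            if m.name == "__init__":
                declared.update(a.attr for a in _self_attr_targets(m))
        for m in methods:
            if m.name == "__init__":
                continue
            for a in _self_attr_targets(m):
                if a.attr not in declared:
                    yield a.lineno, cls.name, a.attr


SOURCE = '''
class Point:
    def __init__(self):
        self.myfield = 0

    def update(self):
        self.myfield = 1
        self.myfeld = 1  # typo
'''
print(list(find_undeclared_attributes(SOURCE)))  # [(8, 'Point', 'myfeld')]
```

It would still miss `obj.myfeld = 1` written outside the class, which is exactly the case where type information is needed.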
Very well explained and exercises are cool!
Glad you like them!
Honestly, I'd use '\s+.*import' to find local imports (though I never write function-level imports to begin with)
Another possibility: ASTs are fun for writing DSLs. You can write an extremely basic transpiler from a subset of Python to C#/C++ in just an evening (especially if you're not afraid of littering the output with so many parentheses that even a Lisper would blush).
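To give a flavour of that, here is a toy sketch that only handles arithmetic expressions: an `ast.NodeVisitor` that emits a fully parenthesized C-style expression (valid in both C# and C++). Anything beyond numbers, names, unary minus and `+ - * /` is rejected, and the names `ToC` and `to_c` are hypothetical:

```python
import ast

# Python operator node types -> C operator spellings.
_BINOPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


class ToC(ast.NodeVisitor):
    """Translate a tiny subset of Python expressions into C syntax."""

    def visit_Expression(self, node):
        return self.visit(node.body)

    def visit_BinOp(self, node):
        op = _BINOPS.get(type(node.op))
        if op is None:
            raise NotImplementedError(type(node.op).__name__)
        # Parenthesize everything so C precedence rules never matter.
        # Caveat: Python's / is true division, while int / int truncates in C,
        # so a real transpiler would need type information here.
        return f"({self.visit(node.left)} {op} {self.visit(node.right)})"

    def visit_UnaryOp(self, node):
        if isinstance(node.op, ast.USub):
            return f"(-{self.visit(node.operand)})"
        raise NotImplementedError(type(node.op).__name__)

    def visit_Name(self, node):
        return node.id

    def visit_Constant(self, node):
        if isinstance(node.value, bool) or not isinstance(node.value, (int, float)):
            raise NotImplementedError(repr(node.value))
        return repr(node.value)

    def generic_visit(self, node):
        # Any node type without a visit_ method is outside the subset.
        raise NotImplementedError(type(node).__name__)


def to_c(expr):
    return ToC().visit(ast.parse(expr, mode="eval"))


print(to_c("a * (b + 2) - c / 4"))  # ((a * (b + 2)) - (c / 4))
```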
'\s+.*import' this is not a good idea as this will match commented imports or a string that has import in it.
@@KASANITEJ Oh no, I will have to manually skip whole 0-2 results (most likely zero).
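For comparison, a minimal AST-based sketch of the local-import check (not the video's code; `LocalImportFinder` is a hypothetical name). Comments never reach the AST and the contents of strings are just constants, so neither can cause a false positive, and an indented but module-level `try: import ...` is not reported either. The `'\s+.*import'` pattern is applied here with `re.match` at the start of each line, which is an assumption about how it would be used:

```python
import ast
import re

SAMPLE = '''
try:
    import ujson as json
except ImportError:
    import json

def f():
    # import os  (commented out)
    msg = "remember to import sys"
    import math
    return math.pi
'''


class LocalImportFinder(ast.NodeVisitor):
    """Collect (line, module) for imports nested inside any function."""

    def __init__(self):
        self.depth = 0  # how many function bodies we are currently inside
        self.found = []

    def _visit_function(self, node):
        self.depth += 1
        self.generic_visit(node)
        self.depth -= 1

    visit_FunctionDef = visit_AsyncFunctionDef = _visit_function

    def visit_Import(self, node):
        if self.depth:
            self.found.extend((node.lineno, alias.name) for alias in node.names)

    def visit_ImportFrom(self, node):
        if self.depth:
            self.found.append((node.lineno, node.module))


finder = LocalImportFinder()
finder.visit(ast.parse(SAMPLE))
print(finder.found)  # [(10, 'math')]

# The '\s+.*import' pattern, matched at the start of each line.
regex_hits = [n for n, line in enumerate(SAMPLE.splitlines(), 1)
              if re.match(r"\s+.*import", line)]
print(regex_hits)  # [3, 5, 8, 9, 10]
```

So on this sample the regex reports four lines that are not local imports (two conditional module-level imports, a comment and a string), while the AST version reports only line 10.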
Missed ya!
Glad to be back doing Python!
Very good timing
I am learning the same stuff, but with ANTLR.
Yeah, thanks; I'd never heard of ANTLR.
Very informative. Thanks James.
Is there any reason, beyond style, to put each check function in its own class instead of making them methods of the class that goes through the AST?
This was just a logical separation of concerns. Imagine you have 10s or 100s of custom checks. You won't want 100s of functions in your single NodeVisitor class, you will want to break it up into individual units that each do one check. Some checks may also be more complex and require their own helper functions, so using a class makes it easy to tell which helper functions go with which checks.
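A rough sketch of that separation of concerns, under assumptions: each check is its own class with a `check` classmethod (mirroring the `check(cls, node, errors)` signature of the video's `LocalImportsNotAllowed`), and a single `NodeVisitor` hands each node to every check interested in that node type. `ErrorInfo`, `NoStarImports`, `NoBareExcept`, `CHECKS`, `Visitor` and the `X10x` codes are hypothetical, not the video's code:

```python
import ast
from typing import List, NamedTuple


class ErrorInfo(NamedTuple):
    line: int
    col: int
    msg: str


class NoStarImports:
    """One unit, one check: flag `from module import *`."""
    node_type = ast.ImportFrom

    @classmethod
    def check(cls, node: ast.ImportFrom, errors: List[ErrorInfo]) -> None:
        if any(alias.name == "*" for alias in node.names):
            errors.append(ErrorInfo(node.lineno, node.col_offset, "X101 star import"))


class NoBareExcept:
    """One unit, one check: flag `except:` with no exception class."""
    node_type = ast.ExceptHandler

    @classmethod
    def check(cls, node: ast.ExceptHandler, errors: List[ErrorInfo]) -> None:
        if node.type is None:
            errors.append(ErrorInfo(node.lineno, node.col_offset, "X102 bare except"))


CHECKS = [NoStarImports, NoBareExcept]


class Visitor(ast.NodeVisitor):
    """Walks the tree once and hands each node to the checks that want it."""

    def __init__(self) -> None:
        self.errors: List[ErrorInfo] = []

    def generic_visit(self, node: ast.AST) -> None:
        for check in CHECKS:
            if isinstance(node, check.node_type):
                check.check(node, self.errors)
        super().generic_visit(node)  # keep walking into the children


v = Visitor()
v.visit(ast.parse("from os import *\ntry:\n    pass\nexcept:\n    pass\n"))
print(v.errors)
```

Adding a check then means writing one new class (with whatever helpers it needs) and appending it to `CHECKS`; the visitor itself never grows.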
Is it possible to modify the AST?
For example, rename all the functions in a file.
Or go from the AST back to code.
Yes you can use ast.NodeTransformer to modify the AST, combined with ast.unparse() to get back to text. Note that ast.unparse(ast.parse(...)) does not necessarily give back the original code though, and you may want to run it through black afterwards to normalize the code style. You can also use a library like rope.
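A minimal sketch of the rename-all-functions example with `ast.NodeTransformer` and `ast.unparse()` (Python 3.9+). The `old_` prefix and the class name `RenameFunctions` are made up for illustration; it renames function definitions and plain-name references to them, and it naively renames any local variable that happens to share a function's name:

```python
import ast


class RenameFunctions(ast.NodeTransformer):
    def __init__(self, prefix):
        self.prefix = prefix
        self.renamed = {}  # original name -> new name

    def visit_Module(self, node):
        # First pass: collect every function defined anywhere in the module.
        for n in ast.walk(node):
            if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef)):
                self.renamed[n.name] = self.prefix + n.name
        # Second pass: generic_visit rewrites definitions and references.
        return self.generic_visit(node)

    def visit_FunctionDef(self, node):
        node.name = self.renamed[node.name]
        return self.generic_visit(node)

    visit_AsyncFunctionDef = visit_FunctionDef

    def visit_Name(self, node):
        if node.id in self.renamed:
            node.id = self.renamed[node.id]
        return node


source = """
def area(r):
    return 3.14159 * r * r

print(area(2))
"""
tree = RenameFunctions("old_").visit(ast.parse(source))
print(ast.unparse(tree))
```

Method calls like `obj.area()` are `ast.Attribute` nodes and are left alone here, which is one reason a refactoring library is usually safer for real renames.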
I have used this to implement a syntax-level macro facility. I was looking for a way to provide both blocking versions of API calls and nonblocking ones that used async/await. The code ends up being 90% identical, but forcing non-async clients to call async functions is fiddly, and trying to use “await” in a non-coroutine function is a syntax error. ASTs provided the answer.
For a nontrivial example of this in action, see my “Seaskirt” wrapper for the Asterisk telephony engine.
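This is not the Seaskirt code, just a minimal sketch of the general idea: write the API once with `async def`, then derive a blocking twin by rewriting the AST, turning each `AsyncFunctionDef` into a `FunctionDef` and each `await expr` into plain `expr`. It assumes every awaited callee has a synchronous counterpart available under the same name, and it does not handle `async for` / `async with`. `Desugar` and `make_blocking` are hypothetical names:

```python
import ast
import textwrap


class Desugar(ast.NodeTransformer):
    def visit_AsyncFunctionDef(self, node):
        self.generic_visit(node)  # rewrite the body first
        # FunctionDef and AsyncFunctionDef have the same fields.
        sync = ast.FunctionDef(**{f: getattr(node, f, None) for f in node._fields})
        return ast.copy_location(sync, node)

    def visit_Await(self, node):
        self.generic_visit(node)
        return node.value  # `await expr` -> `expr`


def make_blocking(source, namespace):
    """Compile blocking versions of the async functions in `source` into `namespace`."""
    tree = Desugar().visit(ast.parse(textwrap.dedent(source)))
    ast.fix_missing_locations(tree)
    exec(compile(tree, "<blocking>", "exec"), namespace)
    return namespace


ASYNC_SRC = """
async def double_plus_one(x):
    y = await double(x)
    return y + 1
"""


def double(x):  # blocking counterpart of an awaited coroutine
    return 2 * x


ns = make_blocking(ASYNC_SRC, {"double": double})
print(ns["double_plus_one"](3))  # 7
```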
@@lawrencedoliveiro9104 I wanted to do exactly the same as you.
Would you share your code (even if it's broken)?
Amazing! Thanks James.
tbh, I would have just allowed imports only at the very beginning of the file (ignoring comments). That way local imports are illegal, because no non-comment code is allowed before the imports, and all of this is solved. But the AST is still very interesting.
It makes me think: is it possible to connect an AST to the running program? Like, my dream is a function that prints its source file name and line number when executed. Maybe it's not to do with ASTs... I'm not sure how to approach it.
Would it be possible to set up PyCharm to also use these custom linting checks as part of its IntelliSense?
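That rule is also easy to enforce with the `ast` module, since comments never appear in the tree. A minimal sketch, assuming a module docstring is allowed before the imports; `misplaced_imports` is a hypothetical name. Note it is stricter than the video's check: it also rejects module-level imports that come after other code, including conditional `try: import ...` blocks:

```python
import ast


def misplaced_imports(source):
    """Return line numbers of imports that are not part of the leading import block."""
    tree = ast.parse(source)
    allowed = set()  # ids of the import nodes in the leading block
    for i, stmt in enumerate(tree.body):
        is_docstring = (
            i == 0
            and isinstance(stmt, ast.Expr)
            and isinstance(stmt.value, ast.Constant)
            and isinstance(stmt.value.value, str)
        )
        if is_docstring:
            continue
        if not isinstance(stmt, (ast.Import, ast.ImportFrom)):
            break  # the first real statement ends the import block
        allowed.add(id(stmt))
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, (ast.Import, ast.ImportFrom)) and id(node) not in allowed
    )


SRC = '''"""Module docstring."""
# a comment is fine here
import os
from sys import argv

x = 1
import json

def f():
    import math
'''
print(misplaced_imports(SRC))  # [7, 10]
```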
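As suspected, this one doesn't need ASTs: at runtime CPython exposes the current stack frame, which knows its code object's file name and the line being executed. A minimal sketch using the standard `inspect` module (`inspect.currentframe()` may return `None` on Python implementations without stack-frame support); `here` and `greet` are hypothetical names:

```python
import inspect


def here():
    """Return (file name, line number) of the line that called here()."""
    frame = inspect.currentframe()
    caller = frame.f_back if frame is not None else None
    if caller is None:
        return ("<unknown>", 0)
    try:
        return caller.f_code.co_filename, caller.f_lineno
    finally:
        del frame, caller  # don't keep frames alive longer than needed


def greet():
    # Prints the file and line of this very statement each time it runs.
    print("greet() running at %s:%d" % here())


greet()
```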
Not that I'm aware of, but maybe by writing a plugin.
Hi James, thanks for the video. I'm having trouble running the flake8_mcoding.py file. I get the following error. Does this have to do with my Python version? I have 3.8.10 installed.
line 20, in LocalImportsNotAllowed
def check(cls, node: ast.FunctionDef, errors: list[Flake8ASTErrorInfo]) -> None:
TypeError: 'type' object is not subscriptable
Yes, 3.9 is required for the type hints. You can delete all the type hints that use [], or upgrade to 3.9 (or even 3.10, since that is the current version). Sorry for the inconvenience!
You can also do "from typing import List" and replace "list[...]" with "List[...]"; 3.9 was when Python started allowing built-in types like list to be subscripted in type hints (PEP 585), without the need for any imported helpers.
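Both workarounds side by side, as a sketch for Python 3.7/3.8 (`ErrorInfo` is a stand-in for the video's `Flake8ASTErrorInfo`, not its real definition). `typing.List` works on older versions regardless; alternatively, `from __future__ import annotations` (PEP 563, Python 3.7+) stops annotations from being evaluated when the function is defined, so `list[...]` inside an annotation no longer raises. It does not help outside annotations, e.g. `x = list[int]` still fails on 3.8:

```python
from __future__ import annotations  # must be the first statement in the file

import ast
from typing import List, NamedTuple


class ErrorInfo(NamedTuple):  # stand-in for Flake8ASTErrorInfo
    line: int
    msg: str


# Option 1: typing.List works with or without the __future__ import.
def check_old_style(node: ast.FunctionDef, errors: List[ErrorInfo]) -> None:
    errors.append(ErrorInfo(node.lineno, "checked " + node.name))


# Option 2: thanks to the __future__ import this annotation is kept as a
# string and never evaluated, so list[...] is fine even on 3.7/3.8.
def check_new_style(node: ast.FunctionDef, errors: list[ErrorInfo]) -> None:
    errors.append(ErrorInfo(node.lineno, "checked " + node.name))
```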
For imports specifically, I think it would be easy enough to just flag "any time 'from' or 'import' is indented", but I'm sure advanced code has things like conditional imports and whatnot that are indented, so yeah, seems you're right overall.
Funnily enough, I wasn't able to figure out how to do the eval exercise in a way that would *not* catch the "sneakily assign eval to something else and use that" case.
(I actually just visit all the names, and if any of them is eval, I yield an error.)
That wasn't the edge case; it was globals()['eval'](...).
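A sketch that covers both cases from this exchange, assuming Python 3.9+ (where a subscript's key is stored directly in `node.slice`): flag any reference to the bare name `eval` (which also catches `e = eval`), plus subscripts whose key is the literal string `'eval'`, such as `globals()['eval']`. It is still a heuristic: `getattr(__builtins__, 'ev' + 'al')` slips through because the name only exists at runtime, and attribute access like `builtins.eval` is deliberately not flagged, since `.eval` is also an ordinary method name. `find_eval` is a hypothetical name:

```python
import ast


def find_eval(source):
    """Return line numbers where eval is referenced or looked up by a literal key."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        # Any use of the bare name `eval`, called or not (catches `e = eval`).
        if isinstance(node, ast.Name) and node.id == "eval":
            hits.append(node.lineno)
        # Lookups such as globals()['eval'] (Python 3.9+ AST layout).
        elif isinstance(node, ast.Subscript):
            key = node.slice
            if isinstance(key, ast.Constant) and key.value == "eval":
                hits.append(node.lineno)
    return sorted(hits)


SRC = """
x = eval("1 + 1")
e = eval
y = globals()['eval']("2")
z = getattr(__builtins__, 'ev' + 'al')
"""
print(find_eval(SRC))  # [2, 3, 4]
```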
Hello. I am working on research in Change Impact Analysis (CIA). If I compare the ASTs of two different versions of a Python file, will this tool be useful for CIA?
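One possible starting point is the `ast` module itself. A minimal sketch, assuming "impact" is approximated at the granularity of top-level functions: `ast.dump()` omits line and column attributes by default, so a function compares equal across versions when only its formatting, comments or position changed. A real CIA tool would also have to follow calls between functions, which this does not attempt; `changed_functions` is a hypothetical name:

```python
import ast


def _functions(source):
    """Map each top-level function name to a formatting-independent dump of it."""
    return {
        node.name: ast.dump(node)
        for node in ast.parse(source).body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


def changed_functions(old_source, new_source):
    old, new = _functions(old_source), _functions(new_source)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "modified": sorted(n for n in old.keys() & new.keys() if old[n] != new[n]),
    }


OLD = """
def f(x):
    return x + 1

def g():
    pass
"""
NEW = """
# reformatted and commented, but same logic
def f( x ):
    return x+1

def g():
    return 42

def h():
    pass
"""
print(changed_functions(OLD, NEW))
# {'added': ['h'], 'removed': [], 'modified': ['g']}
```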
Thanks! This is really helpful information.
I really wish version control systems like Git worked on the AST of the code instead of the text. It would have a lot of advantages.
The problem with the AST is that you lose all formatting, like whitespace, comments, etc.
You could use the CST, but I don't see the advantage there (and actually, I think it should be possible to do that with git hooks and some hacks).
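A quick way to see that loss, assuming Python 3.9+ for `ast.unparse()`: parse some source and turn it straight back into text. The comment disappears and the spacing is normalized:

```python
import ast

source = '''
x = [1,2,   3]  # keep this comment
if x:
    print( "hi" )
'''
print(ast.unparse(ast.parse(source)))
```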
discord gang
I feel smarter with every video and dumber about everything I still don't know. 1010/1010
good shit my man
YO this is great
I just wrote a 10k word article on writing your own ast based linter from scratch, coincidence? :P
I'm not familiar with your article, but feel free to post a link, I'm sure some may find it a useful resource.
the link disappears when I post
@@sadhlife what can I google to find your article?
@@arisweedler4703 learn Python ASTs by building your own linter
Fabulous article! Thanks for going so thoroughly, and please keep the posts coming! 😍
This is what I've been looking for for the past 3 days. 😊
r"^\s+?(?!#)(import\s+\w+|from\s+\w+\s+import\s+\w+)"
I get what you're saying, but finding local imports with regex might not have been the best example to use, since that one would actually be pretty easy. You would just need to match "(from ...) import ...", preceded by at least one whitespace character. EZPZ
Your regex matches a multiline string that contains a local import! Like """
def f():
import x
"""
Try again!
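To check the reply's claim concretely, here is the regex from above applied with `re.MULTILINE` (an assumption: the commenter didn't say which flags they used, but without it `^` only matches at the start of the whole file) to a file whose only "import" lives inside a string. As a bonus, because `\s` also matches newlines, a blank line before a top-level import is enough to trigger it:

```python
import ast
import re

PATTERN = re.compile(
    r"^\s+?(?!#)(import\s+\w+|from\s+\w+\s+import\s+\w+)", re.MULTILINE
)

SOURCE = '''
DOC = """
def f():
    import x
"""
'''

print(PATTERN.findall(SOURCE))  # ['import x']
# The AST sees no import statements at all: the text is just a string constant.
print([n for n in ast.walk(ast.parse(SOURCE)) if isinstance(n, (ast.Import, ast.ImportFrom))])

# \s matches "\n", so an empty line before a module-level import also matches.
print(PATTERN.findall("x = 1\n\nimport os\n"))  # ['import os']
```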
Help me
Do JavaScript AST, baby
Local import regex :P
/ +from\s+\w+\s+import.*/
It's better to use a CST: th-cam.com/video/ASRqxDGutpA/w-d-xo.html
False, trying to work with *any form* of Regular Expression will be an absolute nightmare