Yeah I've been bitten by this exact thing and also happens with `Iterable[str]`. I've just resorted to isinstance-ing to explicitly check for str. No great solution to it right now as you've mentioned.
The real solution is to introduce a char type. Then a str would no longer be a sequence of str, it'd be a sequence of char. Maybe in Python 4 ;-) C has a similar problem in that it actually doesn't have a string type at all and there's no way to differentiate between a pointer to a single character and a pointer to what should be interpreted as a "string". Wouldn't the real solution to actually have both types and not conflate the two.
Looks like this works in pyright: from typing import overload, Sequence from warnings import deprecated # python 3.13 + # from typing_extensions import deprecated # python 3.12 - @overload @deprecated('v must not be a string') def first(v: str) -> ...: ... @overload def first(v: Sequence[str]) -> ...: ... def first(v: Sequence[str]) -> ...: # actual code ... first(['foo']) first(('a',)) first('foo')
Topical video, I ran into this exact issue last week. It took me on a long rabbit hole to determine "what even make an variable a Sequence?", and I learned it is a Protocol, and I learned all about Protocol types in Python, and eventually came to the same realization as you, that there needs to be a type "difference" operator, the opposite of Union, to negate `str`. I also tried `Iterable[str]`, (because afterall, the join command only asks for an iterator, not a Sequence), but I got the same issue, `str` is also a valid iterator of strings. Eventually I just went back to list[str]|tuple[str,...].
Similar case is with datetime being a subclass of date but you cannot compare their instances (violating Liskov substitution principle). I believe it could be handled with type checker flags.
Another issue with using list as a parameter type is its invariance. It becomes especially vicious if you type a parameter as `list[StrPath]` (where `StrPath` is an alias for `str | os.PathLike[str]`). There"s still plenty of those in setuptools...
Python has always had this issue, even when typing.Sequence was collections.abc.Sequence. In fact, using just the `typing` module, *Strings are indistinguishable from tuples* 🤦♀ Apart from sprinkling in `isinstance(str)` checks everywhere, there's no way to separate or specify "iterable of strings and not just one string" in a way a human generally understand those words.
A first class fix is sorely needed. Typing "list" is simple. However Sequence and list have different covariance, which can easily lead to further typing challenges if using generics. For now, installing useful-types, and adding "from useful_types import SequenceNotStr" works great (for Sequences, if not Iterables).
Well, it's a hack. Then you have to exclude str every time you use Sequence[str] or Iterator[str] or Iterable[str] or Generator[str] or generic Sequence[T] or ...
I started trying to write a mypy plugin to specifically handle type subtraction and I can get _reasonably_ far with just signature validation -- but not with internal type handling unfortunately I really think this needs to be PEP'd and made first class though
@calebparks8318, I agree that having an idiomatic way is best, e.g. a PEP. I am referring to @yaroslavdon 's suggestion of using a custom tool/plugin to add the feature in the absence of a PEP.
python is ducky and beautiful near pseudocode syntax. strongly typed and performant it aint (which is fine) - so why are we bending the language so hard to be a goose? (or maybe a gopher...)
I tried once the overload approach, in which str input is annotated to return NoReturn, but sadly this will do much worse. The typechecker will simply ignore code after the function call with str input 😅
According to docs Never and NoReturn are supposed to be treated identically by typecheckers. I think Never is aimply newer and more intuitive, especially when used for function arguments @@jh0ker
You forgot another annoyance besides ('echo') not being a tuple, ('echo' 'hello' 'world') is also a single string. I absolutely hate implicit string concatenation
@@philipreinhold1180 How often do you want to put such a long string directly in the source though? That honestly doesn't seem like a common enough usecase to justify such a quirk.
This issue really upsets me too! As a workaround, i now often simply use list[str], and dont bother with Sequence[str] as a type. It is much easier to make the user convert thier iterator to a list, and it is pretty rare when you actually get any performance benefit from passing iterable custom types (even for quite stupidly large lists)
@@anthonywritescodewhoops, I just rewatched the video and realized you already mention type intersection. That's what I get for multitasking! A not so fun way to fix this in the current type system is with overloads. @overload def foo(x: str) -> NoReturn: ... @overload def foo(x: Sequence[str]) -> str: ... def foo(x: Sequence[str]) -> str: assert isinstance(x, str) return "".join(x) reveal_type(foo(["a", "b"])). # str reveal_type(foo("a")) # Never in mypy, NoReturn in pyright This seems to mostly work, but man are overloads ugly
the overload doesn't actually help. Never isn't an error unless it's used (and in many cases indistinguishable from None beyond making mypy not type check the rest of the function (since it doesn't check unreachable code))
You can do that if you are willing to use a library. Using beartype: # Type hint matching any non-string sequence *WHOSE ITEMS ARE ALL STRINGS.* SequenceNonstrOfStr = Annotated[Sequence[str], ~IsInstance[str]]
I had a try but I feel like something is wrong though and I'm also not fully aware of everything python does so it might be totally missing the point. I'm enjoying your videos by the they're great for learning. from typing import Union from collections.abc import Sequence class Minus: def __init__(self,ref,*types) -> None: self.ref=ref self.__args__=Union[*set(types)]
def __instancecheck__(self,obj: object) -> bool: return isinstance(obj,self.ref) and not isinstance(obj,self.__args__)
that perhaps works for runtime checking but it's not helpful in this situation where the desire is static checking. type annotations (generally) are only checked by static tools and not at runtime (yes of course there's wacky things like pydantic and friends but I'm really discussing static checking here)
I always find myself handling such cases in the first lines of the function with: cmd_args = [cmd_args] if isinstance(cmd_args, str) else cmd_args I hate it but it's a necessary evil...
What about a function returning a TypeGuard or TypeIs? It could just return its argument at runtime, though there would still be a small overhead from the function call.
Also got tricked by this, maybe converting to union of popular types like list, tuple, set will cover 95% of users in general case, but it’s obvious trade-off
Hi, could anyone elaborate on the disadvantages of just allowing list[str]? Yes, you're kind of annoying the caller by "forcing" them to provide a list. But why is it so bad? Thanks.
That's true. I guess the relevance of the performance penalty for the full copy depends on the context though. Having re-watched your video, I think I'd go for the union of a hand picked set of sequences, at the expense of custom sequences.
What about overloading with Never or NoReturn, something like @overload def run(items: str) -> Never: ... @overload def run(items: Sequence[str]) -> int: ... def run(items): if isinstance(items, str): raise TypeError("Expected a sequence of strings, not a single string") # Process the sequence of strings ...
yeah and that's what the actual code does -- however this was a lower level function where the command was already a tuple in all actual cases and the test was just incorrect
I see here, try Sequence from Sequence then execute command with arguments expect IndexError execute a joined nothing string command. Not a good or only solution for that.
Maybe Iterable[str] is more precise. Also, the type depends on what shlex.join takes. The docs say "Concatenate the tokens of the list split_command and return a string." Obviously any iterable works. Another option would be to accept *cmd_args: str. I feel like this would make it harder to fudge up the function call.
Iterable[str] has exactly the same problem. and yes *args: str avoids the problem (as others have suggested below) but it isn't a solution to the root cause
@@anthonywritescode very true! I really wish we had intersection types etc in Python. Another thing is, that it is currently not possible to type potentially partial args/kwargs like in curry functions. I played around with typing.ParamSpec and typing.Concatenate, but I think it can't be done.
I would change the implementation of run_command to runtime type check if it was passed a str, and if it was, it changes it to cmd_args=(cmd_args,) making it behave they way the caller intended when they accidentally wrote run_command(('echo')) instead of run_command(('echo', )), then the actual behaviour agrees with the type checking as str becomes a valid thing to pass.
many below have suggested this and it's a terrible idea. it directly encourages the nonsense like ('echo hello',) which is exactly what the types are there to prevent
If I were faced with the task of excluding a type from possible types, in this case I would write something like: ```py @typing.overload def run_command(cmd_args: str) -> typing.Never: ... @typing.overload def run_command(cmd_args: Sequence[str]) -> int: ... def run_command(cmd_args: Sequence[str]) -> int: print(f"$ {shlex.join(cmd_args)}") return subprocess.call(cmd_args) ```
Unless user inputs can never get into that list, you're gonna pay the price of sanitizing them at some point! That has to be dynamic. (Even if no user input can get into the code now, it could in 3 years when someone reuses the code.) In that case, one problem in the code is accepting a list or tuple when you probably need a custom class that forces those str's to follow the rules. @@anthonywritescode
Nothing like being both the host and the topic on a show called "all code sucks".
"Debugging is like being the murderer _and_ the investigator in a crime drama."
Yeah I've been bitten by this exact thing and also happens with `Iterable[str]`.
I've just resorted to isinstance-ing to explicitly check for str. No great solution to it right now as you've mentioned.
The real solution is to introduce a char type. Then a str would no longer be a sequence of str, it'd be a sequence of char. Maybe in Python 4 ;-)
C has a similar problem in that it actually doesn't have a string type at all and there's no way to differentiate between a pointer to a single character and a pointer to what should be interpreted as a "string".
Wouldn't the real solution to actually have both types and not conflate the two.
Immediately get the issue when seeing the title! I think just like we have Union (|), allow something like a minus is really a good idea in this case
Looks like this works in pyright:
from typing import overload, Sequence
from warnings import deprecated # python 3.13 +
# from typing_extensions import deprecated # python 3.12 -
@overload
@deprecated('v must not be a string')
def first(v: str) -> ...: ...
@overload
def first(v: Sequence[str]) -> ...: ...
def first(v: Sequence[str]) -> ...:
# actual code
...
first(['foo'])
first(('a',))
first('foo')
only if you force deprecations as errors
Topical video, I ran into this exact issue last week. It took me on a long rabbit hole to determine "what even make an variable a Sequence?", and I learned it is a Protocol, and I learned all about Protocol types in Python, and eventually came to the same realization as you, that there needs to be a type "difference" operator, the opposite of Union, to negate `str`.
I also tried `Iterable[str]`, (because afterall, the join command only asks for an iterator, not a Sequence), but I got the same issue, `str` is also a valid iterator of strings.
Eventually I just went back to list[str]|tuple[str,...].
Similar case is with datetime being a subclass of date but you cannot compare their instances (violating Liskov substitution principle). I believe it could be handled with type checker flags.
Very proud of myself for figuring what this was gonna be before your demo
Another issue with using list as a parameter type is its invariance. It becomes especially vicious if you type a parameter as `list[StrPath]` (where `StrPath` is an alias for `str | os.PathLike[str]`). There"s still plenty of those in setuptools...
Python has always had this issue, even when typing.Sequence was collections.abc.Sequence. In fact, using just the `typing` module, *Strings are indistinguishable from tuples* 🤦♀
Apart from sprinkling in `isinstance(str)` checks everywhere, there's no way to separate or specify "iterable of strings and not just one string" in a way a human generally understand those words.
Python's Fractal Strings made me laugh a lot when I learned that's what they are
A first class fix is sorely needed. Typing "list" is simple. However Sequence and list have different covariance, which can easily lead to further typing challenges if using generics.
For now, installing useful-types, and adding "from useful_types import SequenceNotStr" works great (for Sequences, if not Iterables).
I thought that by the maturity of mypy we'd have a way to forbid some specific types but I guess I was wrong... Is it something hard to implement?
It seems to be a bit hard; however, the main issue is the lack of an accepted PEP.
can you workaround this by following each `Sequence[str]` with `if isinstance(arg, str): raise ValueError`?
yes as below, but this is going to be an ugly sprinkling everywhere for a problem that _should_ be something statically determinable
Looks like it's a fundamental issue that is caused by the fact that str is a Sequence[str] and not Sequence[char], but that would require python 4 :(
Typecheckers *could* special case it. Or a type-exclusion (the inverse of a `Union`, let's say `Not`) could be added.
Well, it's a hack. Then you have to exclude str every time you use Sequence[str] or Iterator[str] or Iterable[str] or Generator[str] or generic Sequence[T] or ...
What about using `typing.Annotated` and relying on the type-checker for some metadata magic?
I started trying to write a mypy plugin to specifically handle type subtraction and I can get _reasonably_ far with just signature validation -- but not with internal type handling unfortunately
I really think this needs to be PEP'd and made first class though
basedmypy has intersection type support
@@barakatzirIt's always better to have a PEP rather than going with other solutions.
@@barakatzirSo does beartype.
@calebparks8318, I agree that having an idiomatic way is best, e.g. a PEP. I am referring to @yaroslavdon 's suggestion of using a custom tool/plugin to add the feature in the absence of a PEP.
Do you just not want to pay the penalty of a runtime assert that checks `isinstance(cmd_args, str)`?
python is ducky and beautiful near pseudocode syntax. strongly typed and performant it aint (which is fine) - so why are we bending the language so hard to be a goose? (or maybe a gopher...)
I think he just wants it to be enforced statically
an isinstance check when there's a type checker is admitting defeat
@@anthonywritescode I mean that's a perspective. Seems more like cutting off one's nose to spite one's face, but your code!
to be clear you're suggesting adding potentially hundreds of if statements and slow isinstance calls -- and I'm the the one cutting my nose?
I tried once the overload approach, in which str input is annotated to return NoReturn, but sadly this will do much worse. The typechecker will simply ignore code after the function call with str input 😅
What about Never?
According to docs Never and NoReturn are supposed to be treated identically by typecheckers. I think Never is aimply newer and more intuitive, especially when used for function arguments @@jh0ker
You forgot another annoyance besides ('echo') not being a tuple, ('echo' 'hello' 'world') is also a single string. I absolutely hate implicit string concatenation
Implicit string concatenation is great for splitting a string (without newlines) over multiple lines without requiring runtime concatenation.
C has it.
@@philipreinhold1180 How often do you want to put such a long string directly in the source though? That honestly doesn't seem like a common enough usecase to justify such a quirk.
@@volbla all the time when writing informative error messages. It should require a newline though, I never use the form with just a space
This issue really upsets me too! As a workaround, i now often simply use list[str], and dont bother with Sequence[str] as a type. It is much easier to make the user convert thier iterator to a list, and it is pretty rare when you actually get any performance benefit from passing iterable custom types (even for quite stupidly large lists)
forcing a full copy just to match types is so wasteful
Had this exact problem multiple times. Had to add runtime checks that checks for Sequence that are not string. Very annoying.
If we ever get Intersection and Not types, something awkward like `Sequence[str] & ~str` would solve this
Gotta love our type system ¯\_(ツ)_/¯
surely if we got intersection all the other set operators would probably work and difference would be sufficient
@@anthonywritescodewhoops, I just rewatched the video and realized you already mention type intersection. That's what I get for multitasking!
A not so fun way to fix this in the current type system is with overloads.
@overload
def foo(x: str) -> NoReturn: ...
@overload
def foo(x: Sequence[str]) -> str: ...
def foo(x: Sequence[str]) -> str:
assert isinstance(x, str)
return "".join(x)
reveal_type(foo(["a", "b"])). # str
reveal_type(foo("a")) # Never in mypy, NoReturn in pyright
This seems to mostly work, but man are overloads ugly
the overload doesn't actually help. Never isn't an error unless it's used (and in many cases indistinguishable from None beyond making mypy not type check the rest of the function (since it doesn't check unreachable code))
You can do that if you are willing to use a library. Using beartype: # Type hint matching any non-string sequence *WHOSE ITEMS ARE ALL STRINGS.*
SequenceNonstrOfStr = Annotated[Sequence[str], ~IsInstance[str]]
@@anthonywritescode oh wow, yikes
I had a try but I feel like something is wrong though and I'm also not fully aware of everything python does so it might be totally missing the point. I'm enjoying your videos by the they're great for learning.
from typing import Union
from collections.abc import Sequence
class Minus:
def __init__(self,ref,*types) -> None:
self.ref=ref
self.__args__=Union[*set(types)]
def __instancecheck__(self,obj: object) -> bool:
return isinstance(obj,self.ref) and not isinstance(obj,self.__args__)
def __or__(self,type: type|tuple[type]) -> Union[type]:
return Union[self,type]
def __ror__(self,type: type|tuple[type]) -> Union[type]:
return Union[self,type]
def __class_getitem__(cls,types: tuple[type,...]) -> object:
return cls(*types)
isinstance("asdfasdfsfd",Minus[Sequence[str],str])
that perhaps works for runtime checking but it's not helpful in this situation where the desire is static checking. type annotations (generally) are only checked by static tools and not at runtime (yes of course there's wacky things like pydantic and friends but I'm really discussing static checking here)
Maybe there could be mypy plugin that only allows str when stated explicitly.
I always find myself handling such cases in the first lines of the function with:
cmd_args = [cmd_args] if isinstance(cmd_args, str) else cmd_args
I hate it but it's a necessary evil...
now you try and run ['echo hello'] which is exactly the nonsense the type system is trying to protect you from
What about a function returning a TypeGuard or TypeIs? It could just return its argument at runtime, though there would still be a small overhead from the function call.
I'm not sure what you mean -- could you elaborate?
My bad. I thought I had an idea that would work but when I started coding I realized it was incomplete. 😳
Also got tricked by this, maybe converting to union of popular types like list, tuple, set will cover 95% of users in general case, but it’s obvious trade-off
Hi, could anyone elaborate on the disadvantages of just allowing list[str]?
Yes, you're kind of annoying the caller by "forcing" them to provide a list.
But why is it so bad?
Thanks.
you would require a full copy to convert from a perfectly valid other Sequence type
That's true.
I guess the relevance of the performance penalty for the full copy depends on the context though.
Having re-watched your video, I think I'd go for the union of a hand picked set of sequences, at the expense of custom sequences.
@@anthonywritescodeThx for replying btw and for the video of course. Great content, as usual.
List is also invariant. Meaning you can't pass a list of subclasses of string if I recall.
What about overloading with Never or NoReturn, something like
@overload
def run(items: str) -> Never: ...
@overload
def run(items: Sequence[str]) -> int: ...
def run(items):
if isinstance(items, str):
raise TypeError("Expected a sequence of strings, not a single string")
# Process the sequence of strings
...
like 5 or 6 others have suggested this, see my replies below (this is way way worse and doesn't actually solve the problem to begin with!)
@@anthonywritescode yikes ok. that's totally unintuitive, in many other languages you're just not allowed to get out a Never type...
yo dawg, I heard you like strings
One option is `run_command(*args)`, but it does have its own downsides.
yeah and that's what the actual code does -- however this was a lower level function where the command was already a tuple in all actual cases and the test was just incorrect
HOWEVER! (I knew this was coming!~)
I see here, try Sequence from Sequence then execute command with arguments expect IndexError execute a joined nothing string command. Not a good or only solution for that.
what
Maybe Iterable[str] is more precise. Also, the type depends on what shlex.join takes. The docs say "Concatenate the tokens of the list split_command and return a string." Obviously any iterable works.
Another option would be to accept *cmd_args: str. I feel like this would make it harder to fudge up the function call.
Iterable[str] has exactly the same problem. and yes *args: str avoids the problem (as others have suggested below) but it isn't a solution to the root cause
@@anthonywritescode very true! I really wish we had intersection types etc in Python. Another thing is, that it is currently not possible to type potentially partial args/kwargs like in curry functions. I played around with typing.ParamSpec and typing.Concatenate, but I think it can't be done.
yeah there's a special mypy plugin for partial for example -- though I find the need for arbitrary currying rare
@@anthonywritescode lol partial really is special cased in mypy! XD
perhaps functool's single dispatch could be used to catch invocations with string and raise not implemented error
it's trivial to solve at runtime (and not interesting) but does not help you with the static time typing problem
I would change the implementation of run_command to runtime type check if it was passed a str, and if it was, it changes it to cmd_args=(cmd_args,) making it behave they way the caller intended when they accidentally wrote run_command(('echo')) instead of run_command(('echo', )), then the actual behaviour agrees with the type checking as str becomes a valid thing to pass.
now you end up with nonsense ('echo hello',) which is exactly what the types were there to prevent
Check if the input is explicitly a string using `isinstance(cmd_args,str)` and then, if it is, convert it to a list using `cmd_args = [cmd_args]`
many below have suggested this and it's a terrible idea. it directly encourages the nonsense like ('echo hello',) which is exactly what the types are there to prevent
The single-element tuple syntax is the reason I use list syntax everywhere. It just looks so wrong to have func((x,)); func([x]) looks so much better.
The one thing pytype got right 🙈
interesting....
If I were faced with the task of excluding a type from possible types, in this case I would write something like:
```py
@typing.overload
def run_command(cmd_args: str) -> typing.Never: ...
@typing.overload
def run_command(cmd_args: Sequence[str]) -> int: ...
def run_command(cmd_args: Sequence[str]) -> int:
print(f"$ {shlex.join(cmd_args)}")
return subprocess.call(cmd_args)
```
a few others have suggested the same, it's actually worse! check the other comments for an explanation
if isinstance(cmd_arg, str):
cmd_arg = (cmd_arg,)
see the other comments as to why this is worse! several have already suggested this bad idea
Unless user inputs can never get into that list, you're gonna pay the price of sanitizing them at some point! That has to be dynamic. (Even if no user input can get into the code now, it could in 3 years when someone reuses the code.)
In that case, one problem in the code is accepting a list or tuple when you probably need a custom class that forces those str's to follow the rules.
@@anthonywritescode
if isinstance(cmd_args, str): cmd_args = (cmd_args,)
It should be type safe