Sequence[str] is a tiny landmine (all code sucks) #10

แชร์
ฝัง
  • เผยแพร่เมื่อ 27 ธ.ค. 2024

ความคิดเห็น • 109

  • @sethmlarson
    @sethmlarson หลายเดือนก่อน +45

    Nothing like being both the host and the topic on a show called "all code sucks".

    • @ryanoftinellb
      @ryanoftinellb หลายเดือนก่อน +7

      "Debugging is like being the murderer _and_ the investigator in a crime drama."

  • @fridgedigga
    @fridgedigga หลายเดือนก่อน +24

    Yeah I've been bitten by this exact thing and also happens with `Iterable[str]`.
    I've just resorted to isinstance-ing to explicitly check for str. No great solution to it right now as you've mentioned.

  • @FunkyELF
    @FunkyELF หลายเดือนก่อน +34

    The real solution is to introduce a char type. Then a str would no longer be a sequence of str, it'd be a sequence of char. Maybe in Python 4 ;-)
    C has a similar problem in that it actually doesn't have a string type at all and there's no way to differentiate between a pointer to a single character and a pointer to what should be interpreted as a "string".
    Wouldn't the real solution to actually have both types and not conflate the two.

  • @shunitsu__
    @shunitsu__ หลายเดือนก่อน +11

    Immediately get the issue when seeing the title! I think just like we have Union (|), allow something like a minus is really a good idea in this case

  • @troubledCod7-hy2ex
    @troubledCod7-hy2ex หลายเดือนก่อน +10

    Looks like this works in pyright:
    from typing import overload, Sequence
    from warnings import deprecated # python 3.13 +
    # from typing_extensions import deprecated # python 3.12 -
    @overload
    @deprecated('v must not be a string')
    def first(v: str) -> ...: ...
    @overload
    def first(v: Sequence[str]) -> ...: ...
    def first(v: Sequence[str]) -> ...:
    # actual code
    ...
    first(['foo'])
    first(('a',))
    first('foo')

    • @anthonywritescode
      @anthonywritescode  หลายเดือนก่อน +2

      only if you force deprecations as errors

  • @flubba86
    @flubba86 หลายเดือนก่อน +1

    Topical video, I ran into this exact issue last week. It took me on a long rabbit hole to determine "what even make an variable a Sequence?", and I learned it is a Protocol, and I learned all about Protocol types in Python, and eventually came to the same realization as you, that there needs to be a type "difference" operator, the opposite of Union, to negate `str`.
    I also tried `Iterable[str]`, (because afterall, the join command only asks for an iterator, not a Sequence), but I got the same issue, `str` is also a valid iterator of strings.
    Eventually I just went back to list[str]|tuple[str,...].

  • @maciekmx
    @maciekmx หลายเดือนก่อน +4

    Similar case is with datetime being a subclass of date but you cannot compare their instances (violating Liskov substitution principle). I believe it could be handled with type checker flags.

  • @ember2081
    @ember2081 หลายเดือนก่อน +2

    Very proud of myself for figuring what this was gonna be before your demo

  • @avasam06
    @avasam06 หลายเดือนก่อน +2

    Another issue with using list as a parameter type is its invariance. It becomes especially vicious if you type a parameter as `list[StrPath]` (where `StrPath` is an alias for `str | os.PathLike[str]`). There"s still plenty of those in setuptools...

  • @aneroid
    @aneroid หลายเดือนก่อน +4

    Python has always had this issue, even when typing.Sequence was collections.abc.Sequence. In fact, using just the `typing` module, *Strings are indistinguishable from tuples* 🤦‍♀
    Apart from sprinkling in `isinstance(str)` checks everywhere, there's no way to separate or specify "iterable of strings and not just one string" in a way a human generally understand those words.

  • @RoamingAdhocrat
    @RoamingAdhocrat หลายเดือนก่อน +4

    Python's Fractal Strings made me laugh a lot when I learned that's what they are

  • @jamesparrott7799
    @jamesparrott7799 หลายเดือนก่อน

    A first class fix is sorely needed. Typing "list" is simple. However Sequence and list have different covariance, which can easily lead to further typing challenges if using generics.
    For now, installing useful-types, and adding "from useful_types import SequenceNotStr" works great (for Sequences, if not Iterables).

  • @mrswats
    @mrswats หลายเดือนก่อน +4

    I thought that by the maturity of mypy we'd have a way to forbid some specific types but I guess I was wrong... Is it something hard to implement?

    • @calebparks8318
      @calebparks8318 หลายเดือนก่อน +1

      It seems to be a bit hard; however, the main issue is the lack of an accepted PEP.

  • @RoamingAdhocrat
    @RoamingAdhocrat หลายเดือนก่อน +3

    can you workaround this by following each `Sequence[str]` with `if isinstance(arg, str): raise ValueError`?

    • @anthonywritescode
      @anthonywritescode  หลายเดือนก่อน +11

      yes as below, but this is going to be an ugly sprinkling everywhere for a problem that _should_ be something statically determinable

  • @grihabor
    @grihabor หลายเดือนก่อน +4

    Looks like it's a fundamental issue that is caused by the fact that str is a Sequence[str] and not Sequence[char], but that would require python 4 :(

    • @avasam06
      @avasam06 หลายเดือนก่อน

      Typecheckers *could* special case it. Or a type-exclusion (the inverse of a `Union`, let's say `Not`) could be added.

    • @grihabor
      @grihabor หลายเดือนก่อน

      Well, it's a hack. Then you have to exclude str every time you use Sequence[str] or Iterator[str] or Iterable[str] or Generator[str] or generic Sequence[T] or ...

  • @yaroslavdon
    @yaroslavdon หลายเดือนก่อน +2

    What about using `typing.Annotated` and relying on the type-checker for some metadata magic?

    • @anthonywritescode
      @anthonywritescode  หลายเดือนก่อน +4

      I started trying to write a mypy plugin to specifically handle type subtraction and I can get _reasonably_ far with just signature validation -- but not with internal type handling unfortunately
      I really think this needs to be PEP'd and made first class though

    • @barakatzir
      @barakatzir หลายเดือนก่อน

      basedmypy has intersection type support

    • @calebparks8318
      @calebparks8318 หลายเดือนก่อน

      @@barakatzirIt's always better to have a PEP rather than going with other solutions.

    • @1rbroderi
      @1rbroderi หลายเดือนก่อน

      ​@@barakatzirSo does beartype.

    • @barakatzir
      @barakatzir หลายเดือนก่อน

      @calebparks8318, I agree that having an idiomatic way is best, e.g. a PEP. I am referring to @yaroslavdon 's suggestion of using a custom tool/plugin to add the feature in the absence of a PEP.

  • @VACatholic
    @VACatholic หลายเดือนก่อน +11

    Do you just not want to pay the penalty of a runtime assert that checks `isinstance(cmd_args, str)`?

    • @thirtysixnanoseconds1086
      @thirtysixnanoseconds1086 หลายเดือนก่อน +2

      python is ducky and beautiful near pseudocode syntax. strongly typed and performant it aint (which is fine) - so why are we bending the language so hard to be a goose? (or maybe a gopher...)

    • @cianotico
      @cianotico หลายเดือนก่อน

      I think he just wants it to be enforced statically

    • @anthonywritescode
      @anthonywritescode  หลายเดือนก่อน +18

      an isinstance check when there's a type checker is admitting defeat

    • @VACatholic
      @VACatholic หลายเดือนก่อน

      @@anthonywritescode I mean that's a perspective. Seems more like cutting off one's nose to spite one's face, but your code!

    • @anthonywritescode
      @anthonywritescode  หลายเดือนก่อน +1

      to be clear you're suggesting adding potentially hundreds of if statements and slow isinstance calls -- and I'm the the one cutting my nose?

  • @barakatzir
    @barakatzir หลายเดือนก่อน +2

    I tried once the overload approach, in which str input is annotated to return NoReturn, but sadly this will do much worse. The typechecker will simply ignore code after the function call with str input 😅

    • @jh0ker
      @jh0ker หลายเดือนก่อน

      What about Never?

    • @barakatzir
      @barakatzir หลายเดือนก่อน +1

      ​According to docs Never and NoReturn are supposed to be treated identically by typecheckers. I think Never is aimply newer and more intuitive, especially when used for function arguments @@jh0ker

  • @FunkyELF
    @FunkyELF หลายเดือนก่อน +11

    You forgot another annoyance besides ('echo') not being a tuple, ('echo' 'hello' 'world') is also a single string. I absolutely hate implicit string concatenation

    • @philipreinhold1180
      @philipreinhold1180 หลายเดือนก่อน +6

      Implicit string concatenation is great for splitting a string (without newlines) over multiple lines without requiring runtime concatenation.

    • @renren_does_programming
      @renren_does_programming หลายเดือนก่อน

      C has it.

    • @volbla
      @volbla หลายเดือนก่อน

      ​@@philipreinhold1180 How often do you want to put such a long string directly in the source though? That honestly doesn't seem like a common enough usecase to justify such a quirk.

    • @philipreinhold1180
      @philipreinhold1180 หลายเดือนก่อน

      @@volbla all the time when writing informative error messages. It should require a newline though, I never use the form with just a space

  • @EngineerNick
    @EngineerNick หลายเดือนก่อน

    This issue really upsets me too! As a workaround, i now often simply use list[str], and dont bother with Sequence[str] as a type. It is much easier to make the user convert thier iterator to a list, and it is pretty rare when you actually get any performance benefit from passing iterable custom types (even for quite stupidly large lists)

    • @anthonywritescode
      @anthonywritescode  หลายเดือนก่อน +1

      forcing a full copy just to match types is so wasteful

  • @devmishra18
    @devmishra18 หลายเดือนก่อน

    Had this exact problem multiple times. Had to add runtime checks that checks for Sequence that are not string. Very annoying.

  • @dougmercer
    @dougmercer หลายเดือนก่อน +3

    If we ever get Intersection and Not types, something awkward like `Sequence[str] & ~str` would solve this
    Gotta love our type system ¯\_(ツ)_/¯

    • @anthonywritescode
      @anthonywritescode  หลายเดือนก่อน +1

      surely if we got intersection all the other set operators would probably work and difference would be sufficient

    • @dougmercer
      @dougmercer หลายเดือนก่อน

      ​@@anthonywritescodewhoops, I just rewatched the video and realized you already mention type intersection. That's what I get for multitasking!
      A not so fun way to fix this in the current type system is with overloads.
      @overload
      def foo(x: str) -> NoReturn: ...
      @overload
      def foo(x: Sequence[str]) -> str: ...
      def foo(x: Sequence[str]) -> str:
      assert isinstance(x, str)
      return "".join(x)
      reveal_type(foo(["a", "b"])). # str
      reveal_type(foo("a")) # Never in mypy, NoReturn in pyright
      This seems to mostly work, but man are overloads ugly

    • @anthonywritescode
      @anthonywritescode  หลายเดือนก่อน +1

      the overload doesn't actually help. Never isn't an error unless it's used (and in many cases indistinguishable from None beyond making mypy not type check the rest of the function (since it doesn't check unreachable code))

    • @1rbroderi
      @1rbroderi หลายเดือนก่อน

      You can do that if you are willing to use a library. Using beartype: # Type hint matching any non-string sequence *WHOSE ITEMS ARE ALL STRINGS.*
      SequenceNonstrOfStr = Annotated[Sequence[str], ~IsInstance[str]]

    • @dougmercer
      @dougmercer หลายเดือนก่อน

      @@anthonywritescode oh wow, yikes

  • @BenTonks-xh3mc
    @BenTonks-xh3mc หลายเดือนก่อน +1

    I had a try but I feel like something is wrong though and I'm also not fully aware of everything python does so it might be totally missing the point. I'm enjoying your videos by the they're great for learning.
    from typing import Union
    from collections.abc import Sequence
    class Minus:
    def __init__(self,ref,*types) -> None:
    self.ref=ref
    self.__args__=Union[*set(types)]

    def __instancecheck__(self,obj: object) -> bool:
    return isinstance(obj,self.ref) and not isinstance(obj,self.__args__)

    def __or__(self,type: type|tuple[type]) -> Union[type]:
    return Union[self,type]

    def __ror__(self,type: type|tuple[type]) -> Union[type]:
    return Union[self,type]

    def __class_getitem__(cls,types: tuple[type,...]) -> object:
    return cls(*types)

    isinstance("asdfasdfsfd",Minus[Sequence[str],str])

    • @anthonywritescode
      @anthonywritescode  หลายเดือนก่อน +2

      that perhaps works for runtime checking but it's not helpful in this situation where the desire is static checking. type annotations (generally) are only checked by static tools and not at runtime (yes of course there's wacky things like pydantic and friends but I'm really discussing static checking here)

  • @thiemofischer2230
    @thiemofischer2230 หลายเดือนก่อน

    Maybe there could be mypy plugin that only allows str when stated explicitly.

  • @dimboump
    @dimboump หลายเดือนก่อน +8

    I always find myself handling such cases in the first lines of the function with:
    cmd_args = [cmd_args] if isinstance(cmd_args, str) else cmd_args
    I hate it but it's a necessary evil...

    • @anthonywritescode
      @anthonywritescode  หลายเดือนก่อน +3

      now you try and run ['echo hello'] which is exactly the nonsense the type system is trying to protect you from

  • @mathman316
    @mathman316 หลายเดือนก่อน

    What about a function returning a TypeGuard or TypeIs? It could just return its argument at runtime, though there would still be a small overhead from the function call.

    • @anthonywritescode
      @anthonywritescode  หลายเดือนก่อน

      I'm not sure what you mean -- could you elaborate?

    • @mathman316
      @mathman316 หลายเดือนก่อน

      My bad. I thought I had an idea that would work but when I started coding I realized it was incomplete. 😳

  • @SQADIqw
    @SQADIqw หลายเดือนก่อน

    Also got tricked by this, maybe converting to union of popular types like list, tuple, set will cover 95% of users in general case, but it’s obvious trade-off

  • @er63438
    @er63438 หลายเดือนก่อน

    Hi, could anyone elaborate on the disadvantages of just allowing list[str]?
    Yes, you're kind of annoying the caller by "forcing" them to provide a list.
    But why is it so bad?
    Thanks.

    • @anthonywritescode
      @anthonywritescode  หลายเดือนก่อน +2

      you would require a full copy to convert from a perfectly valid other Sequence type

    • @er63438
      @er63438 หลายเดือนก่อน

      That's true.
      I guess the relevance of the performance penalty for the full copy depends on the context though.
      Having re-watched your video, I think I'd go for the union of a hand picked set of sequences, at the expense of custom sequences.

    • @er63438
      @er63438 หลายเดือนก่อน

      ​@@anthonywritescodeThx for replying btw and for the video of course. Great content, as usual.

    • @avasam06
      @avasam06 หลายเดือนก่อน

      List is also invariant. Meaning you can't pass a list of subclasses of string if I recall.

  • @calvinkhor890
    @calvinkhor890 หลายเดือนก่อน

    What about overloading with Never or NoReturn, something like
    @overload
    def run(items: str) -> Never: ...
    @overload
    def run(items: Sequence[str]) -> int: ...
    def run(items):
    if isinstance(items, str):
    raise TypeError("Expected a sequence of strings, not a single string")
    # Process the sequence of strings
    ...

    • @anthonywritescode
      @anthonywritescode  หลายเดือนก่อน

      like 5 or 6 others have suggested this, see my replies below (this is way way worse and doesn't actually solve the problem to begin with!)

    • @calvinkhor890
      @calvinkhor890 หลายเดือนก่อน

      @@anthonywritescode yikes ok. that's totally unintuitive, in many other languages you're just not allowed to get out a Never type...

  • @jamesarthurkimbell
    @jamesarthurkimbell หลายเดือนก่อน +1

    yo dawg, I heard you like strings

  • @IvanHabunek
    @IvanHabunek หลายเดือนก่อน +2

    One option is `run_command(*args)`, but it does have its own downsides.

    • @anthonywritescode
      @anthonywritescode  หลายเดือนก่อน

      yeah and that's what the actual code does -- however this was a lower level function where the command was already a tuple in all actual cases and the test was just incorrect

  • @James-l5s7k
    @James-l5s7k หลายเดือนก่อน

    HOWEVER! (I knew this was coming!~)

  • @antonboch1244
    @antonboch1244 หลายเดือนก่อน

    I see here, try Sequence from Sequence then execute command with arguments expect IndexError execute a joined nothing string command. Not a good or only solution for that.

  • @insanecbrotha
    @insanecbrotha หลายเดือนก่อน

    Maybe Iterable[str] is more precise. Also, the type depends on what shlex.join takes. The docs say "Concatenate the tokens of the list split_command and return a string." Obviously any iterable works.
    Another option would be to accept *cmd_args: str. I feel like this would make it harder to fudge up the function call.

    • @anthonywritescode
      @anthonywritescode  หลายเดือนก่อน +1

      Iterable[str] has exactly the same problem. and yes *args: str avoids the problem (as others have suggested below) but it isn't a solution to the root cause

    • @insanecbrotha
      @insanecbrotha หลายเดือนก่อน

      @@anthonywritescode very true! I really wish we had intersection types etc in Python. Another thing is, that it is currently not possible to type potentially partial args/kwargs like in curry functions. I played around with typing.ParamSpec and typing.Concatenate, but I think it can't be done.

    • @anthonywritescode
      @anthonywritescode  หลายเดือนก่อน

      yeah there's a special mypy plugin for partial for example -- though I find the need for arbitrary currying rare

    • @insanecbrotha
      @insanecbrotha หลายเดือนก่อน

      @@anthonywritescode lol partial really is special cased in mypy! XD

  • @nigeltan7766
    @nigeltan7766 27 วันที่ผ่านมา

    perhaps functool's single dispatch could be used to catch invocations with string and raise not implemented error

    • @anthonywritescode
      @anthonywritescode  26 วันที่ผ่านมา

      it's trivial to solve at runtime (and not interesting) but does not help you with the static time typing problem

  • @leeachristie
    @leeachristie หลายเดือนก่อน

    I would change the implementation of run_command to runtime type check if it was passed a str, and if it was, it changes it to cmd_args=(cmd_args,) making it behave they way the caller intended when they accidentally wrote run_command(('echo')) instead of run_command(('echo', )), then the actual behaviour agrees with the type checking as str becomes a valid thing to pass.

    • @anthonywritescode
      @anthonywritescode  หลายเดือนก่อน +1

      now you end up with nonsense ('echo hello',) which is exactly what the types were there to prevent

  • @MyKo101AB
    @MyKo101AB หลายเดือนก่อน

    Check if the input is explicitly a string using `isinstance(cmd_args,str)` and then, if it is, convert it to a list using `cmd_args = [cmd_args]`

    • @anthonywritescode
      @anthonywritescode  หลายเดือนก่อน

      many below have suggested this and it's a terrible idea. it directly encourages the nonsense like ('echo hello',) which is exactly what the types are there to prevent

  • @maleldil1
    @maleldil1 หลายเดือนก่อน

    The single-element tuple syntax is the reason I use list syntax everywhere. It just looks so wrong to have func((x,)); func([x]) looks so much better.

  • @fsouza
    @fsouza หลายเดือนก่อน +1

    The one thing pytype got right 🙈

  • @EW-mb1ih
    @EW-mb1ih หลายเดือนก่อน

    interesting....

  • @d17-d7e
    @d17-d7e หลายเดือนก่อน

    If I were faced with the task of excluding a type from possible types, in this case I would write something like:
    ```py
    @typing.overload
    def run_command(cmd_args: str) -> typing.Never: ...
    @typing.overload
    def run_command(cmd_args: Sequence[str]) -> int: ...
    def run_command(cmd_args: Sequence[str]) -> int:
    print(f"$ {shlex.join(cmd_args)}")
    return subprocess.call(cmd_args)
    ```

    • @anthonywritescode
      @anthonywritescode  หลายเดือนก่อน +1

      a few others have suggested the same, it's actually worse! check the other comments for an explanation

  • @yehoshualevine
    @yehoshualevine หลายเดือนก่อน

    if isinstance(cmd_arg, str):
    cmd_arg = (cmd_arg,)

    • @anthonywritescode
      @anthonywritescode  หลายเดือนก่อน

      see the other comments as to why this is worse! several have already suggested this bad idea

    • @yehoshualevine
      @yehoshualevine หลายเดือนก่อน +1

      Unless user inputs can never get into that list, you're gonna pay the price of sanitizing them at some point! That has to be dynamic. (Even if no user input can get into the code now, it could in 3 years when someone reuses the code.)
      In that case, one problem in the code is accepting a list or tuple when you probably need a custom class that forces those str's to follow the rules.
      @@anthonywritescode

  • @skewty
    @skewty หลายเดือนก่อน

    if isinstance(cmd_args, str): cmd_args = (cmd_args,)

    • @d17-d7e
      @d17-d7e หลายเดือนก่อน

      It should be type safe