Search-and-replace Pandas values with "where" and "mask"

  • Published on Dec 31, 2024

Comments • 4

  • @imothar 7 months ago +1

    Another great video👍 Just wondering if there was any specific reason why you did not use pd.NA? Perhaps it's the same result in the end when it comes to floats 🤷

    • @ReuvenLerner 7 months ago

      The future of Pandas is clearly pd.NA, and I should use it more! But in this particular case, it didn't make a difference: using either np.nan or pd.NA turns the dtype into float64. That's because the standard int64 dtype isn't nullable, so Pandas has to upcast the column to floats and store the missing value as NaN. If, however, you set the dtype to Int64 (note the capital "I"), then pd.NA does what you (and I) want: the values stay integers.
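The dtype behavior described in this reply can be checked with a short sketch. It is not taken from the video: the variable names are illustrative, and it assumes pandas and NumPy are installed. It uses `mask`, which replaces values wherever the condition is True.

```python
import numpy as np
import pandas as pd

# Default integer Series: int64, which has no way to store a missing value.
plain = pd.Series([10, 20, 30, 40, 50])

# Replacing a value with either missing-value marker upcasts to float64.
with_nan = plain.mask(plain == 40, np.nan)
with_na = plain.mask(plain == 40, pd.NA)

# Nullable integer Series (capital "I"): the missing value is stored as pd.NA.
nullable = pd.Series([10, 20, 30, 40, 50], dtype="Int64")
nullable_na = nullable.mask(nullable == 40, pd.NA)

print(plain.dtype, with_nan.dtype, with_na.dtype, nullable_na.dtype)
```

Only the last Series should keep an integer dtype; the two built from the plain int64 Series are expected to report float64.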

  • @marcinpohl3264 7 months ago +1

    How do I use np.nan in a way that does NOT change ints to floats?

    • @ReuvenLerner 7 months ago +1

      NaN is a float. So if you want to have NaN in an int column, then the ints will need to change to floats.
      HOWEVER, if you create your series with a nullable type, then you can use pd.NA instead of np.nan, and you'll be all set. That's because pd.NA is compatible with a wide variety of types:
      In [11]: import pandas as pd; from pandas import Series
      In [12]: s = Series([10, 20, 30, 40, 50])
      In [13]: s.loc[3] = pd.NA
      In [14]: s
      Out[14]:
      0    10.0
      1    20.0
      2    30.0
      3     NaN
      4    50.0
      dtype: float64
      In [15]: s = Series([10, 20, 30, 40, 50], dtype='Int64')
      In [16]: s.loc[3] = pd.NA
      In [17]: s
      Out[17]:
      0      10
      1      20
      2      30
      3    <NA>
      4      50
      dtype: Int64
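When the Series already exists with the default int64 dtype, one possible approach is to convert it with `astype("Int64")` before introducing missing values. The sketch below is an illustration rather than something shown in the thread; the data and variable names are hypothetical. It uses `where`, which keeps values where the condition is True and replaces the rest.

```python
import pandas as pd

# Hypothetical data: an existing Series with the default int64 dtype.
s = pd.Series([10, 20, 30, 40, 50])

# Convert to the nullable integer dtype before adding missing values.
s_nullable = s.astype("Int64")

# Keep every value except 40, which is replaced with pd.NA.
kept = s_nullable.where(s_nullable != 40, pd.NA)

print(kept.dtype)         # Int64
print(kept.isna().sum())  # 1
print(kept.sum())         # 110, because missing values are skipped by default
```

Because the dtype stays Int64, the remaining values are still integers and aggregations such as `sum` skip the missing entry.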