Another great video👍 Just wondering if there was any specific reason why you did not use pd.NA? Perhaps it's the same result in the end, when it comes to floats 🤷
The future of pandas is clearly pd.NA, and I should use it more! But in this particular case, it didn't make a difference: Assigning either np.nan or pd.NA turns the dtype into float64. That's because the standard int64 dtype isn't nullable, so pandas has to upcast to float and store the missing value as NaN. If, however, you were to set the dtype to be Int64 (note the capital "I"), then using pd.NA would indeed do what you (and I) want: the integers stay integers, and the missing value shows up as <NA>.
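Here's a minimal sketch of that comparison. The helper name `with_missing` and the sample values are just mine for illustration, not anything from the video:

```python
import numpy as np
import pandas as pd


def with_missing(marker, dtype=None):
    """Build a small integer Series, then overwrite one element with `marker`."""
    s = pd.Series([10, 20, 30, 40, 50], dtype=dtype)
    s.loc[3] = marker
    return s


# Default int64: neither marker fits in an integer, so pandas upcasts to float64.
plain_nan = with_missing(np.nan)
plain_na = with_missing(pd.NA)

# Nullable Int64 (capital "I"): the integers stay integers.
nullable_na = with_missing(pd.NA, dtype="Int64")

print(plain_nan.dtype, plain_na.dtype, nullable_na.dtype)
```

The first two dtypes should come out as float64 and the third as Int64, which is the "no difference in this case" point above.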
How do I use np.nan in a way that does NOT change ints to floats?
NaN is a float. So if you want to have NaN in an int column, then the ints will need to change to floats.
HOWEVER, if you create your Series with a nullable dtype such as Int64, then you can use pd.NA instead of np.nan, and you'll be all set. That's because pd.NA is compatible with a wide variety of types:
In [12]: s = Series([10, 20, 30, 40, 50])
In [13]: s.loc[3] = pd.NA
In [14]: s
Out[14]:
0    10.0
1    20.0
2    30.0
3     NaN
4    50.0
dtype: float64
In [15]: s = Series([10, 20, 30, 40, 50], dtype='Int64')
In [16]: s.loc[3] = pd.NA
In [17]: s
Out[17]:
0      10
1      20
2      30
3    <NA>
4      50
dtype: Int64
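If the Series already exists with the default int64 dtype, one option is to convert it with astype("Int64") before assigning the missing value. This is a rough sketch with made-up values rather than the only way to do it. The last part assumes the floats are all whole numbers, since the cast fails otherwise:

```python
import numpy as np
import pandas as pd

# An existing Series with the default (non-nullable) int64 dtype.
s = pd.Series([10, 20, 30, 40, 50])

# Convert to the nullable integer dtype first, then insert the missing value.
s = s.astype("Int64")
s.loc[3] = pd.NA

print(s.dtype)  # Int64
print(s.sum())  # 110 (missing values are skipped by default)

# A Series that was already upcast to float64 can be converted back,
# as long as every non-missing value is a whole number.
upcast = pd.Series([10.0, 20.0, np.nan])
restored = upcast.astype("Int64")
print(restored.dtype)  # Int64
```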