One simple method would be to assign the default value first and then perform two loc calls:
In [66]:
df = pd.DataFrame({'x':[0,-3,5,-1,1]})
df

Out[66]:
   x
0  0
1 -3
2  5
3 -1
4  1

In [69]:
df['y'] = 0
df.loc[df['x'] < -2, 'y'] = 1
df.loc[df['x'] > 2, 'y'] = -1
df

Out[69]:
   x  y
0  0  0
1 -3  1
2  5 -1
3 -1  0
4  1  0
If you wanted to use np.where, then you could do it with a nested np.where:
In [77]:
df['y'] = np.where(df['x'] < -2 , 1, np.where(df['x'] > 2, -1, 0))
df

Out[77]:
   x  y
0  0  0
1 -3  1
2  5 -1
3 -1  0
4  1  0
So here the outer np.where tests the first condition: where x is less than -2, it returns 1. For the remaining rows, the inner np.where tests the other condition: where x is greater than 2, it returns -1; otherwise it returns 0.
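As a self-contained check, the sketch below runs both approaches on the same sample frame and confirms that they label every row identically. The helper names label_with_loc and label_with_where are made up for illustration; they are not part of pandas or NumPy.

```python
import numpy as np
import pandas as pd


def label_with_loc(df):
    """Assign the default first, then overwrite rows matching each condition."""
    out = df.copy()
    out['y'] = 0
    out.loc[out['x'] < -2, 'y'] = 1
    out.loc[out['x'] > 2, 'y'] = -1
    return out


def label_with_where(df):
    """Express the same rules as a nested np.where."""
    out = df.copy()
    out['y'] = np.where(out['x'] < -2, 1, np.where(out['x'] > 2, -1, 0))
    return out


df = pd.DataFrame({'x': [0, -3, 5, -1, 1]})
loc_result = label_with_loc(df)
where_result = label_with_where(df)

print(loc_result['y'].tolist())    # [0, 1, -1, 0, 0]
print(where_result['y'].tolist())  # [0, 1, -1, 0, 0]
```

Both helpers work on a copy, so the original frame is left untouched and the two results can be compared side by side.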
Timings:
In [79]:
%timeit df['y'] = np.where(df['x'] < -2 , 1, np.where(df['x'] > 2, -1, 0))
1000 loops, best of 3: 1.79 ms per loop

In [81]:
%%timeit
df['y'] = 0
df.loc[df['x'] < -2, 'y'] = 1
df.loc[df['x'] > 2, 'y'] = -1
100 loops, best of 3: 3.27 ms per loop
So for this sample dataset the np.where method is nearly twice as fast (1.79 ms versus 3.27 ms per loop).
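The figures above come from a five-row frame, so the gap may look different on larger data. Here is a rough sketch for rerunning the comparison outside IPython with the standard-library timeit module. The frame size, the random seed, and the function names with_where and with_loc are arbitrary assumptions, and the numbers it prints will vary by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed benchmark input: 100,000 random integers in [-5, 5].
rng = np.random.default_rng(0)
big = pd.DataFrame({'x': rng.integers(-5, 6, size=100_000)})


def with_where():
    # Nested np.where, assigned in place as in the timing above.
    big['y'] = np.where(big['x'] < -2, 1, np.where(big['x'] > 2, -1, 0))


def with_loc():
    # Default value first, then two loc assignments.
    big['y'] = 0
    big.loc[big['x'] < -2, 'y'] = 1
    big.loc[big['x'] > 2, 'y'] = -1


# Sanity check before timing: both functions must produce the same column.
with_where()
from_where = big['y'].to_numpy().copy()
with_loc()
from_loc = big['y'].to_numpy().copy()
same = np.array_equal(from_where, from_loc)
print("results match:", same)

runs = 20
t_where = timeit.timeit(with_where, number=runs)
t_loc = timeit.timeit(with_loc, number=runs)
print(f"np.where: {t_where / runs * 1e3:.3f} ms per call")
print(f"loc:      {t_loc / runs * 1e3:.3f} ms per call")
```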