Group by index + column in pandas

From pandas version 0.20.1 this is simpler:

Strings passed to DataFrame.groupby() as the by parameter may now reference either column names or index level names

import numpy as np
import pandas as pd

# Two index levels, named 'first' and 'second' (these are not columns)
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]

index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])

df = pd.DataFrame({'A': [1, 1, 1, 1, 2, 2, 3, 3],
                   'B': np.arange(8)}, index=index)

print(df)

              A  B
first second      
bar   one     1  0
      two     1  1
baz   one     1  2
      two     1  3
foo   one     2  4
      two     2  5
qux   one     3  6
      two     3  7

# 'second' is an index level name, 'A' is a column name
print(df.groupby(['second', 'A']).sum())
          B
second A   
one    1  2
       2  4
       3  6
two    1  4
       2  5
       3  7
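
If you need to be explicit about which key is an index level (for example, on a pandas version older than 0.20), the same grouping can be written with pd.Grouper(level=...). The following is a minimal sketch that rebuilds the example frame from above and checks that both spellings agree; the variable names by_name and by_grouper are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame: 'first' and 'second' are index levels
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
df = pd.DataFrame({'A': [1, 1, 1, 1, 2, 2, 3, 3],
                   'B': np.arange(8)}, index=index)

# Plain strings: pandas resolves 'second' to the index level and 'A' to the column
by_name = df.groupby(['second', 'A']).sum()

# Explicit form: pd.Grouper(level=...) names the index level directly
by_grouper = df.groupby([pd.Grouper(level='second'), 'A']).sum()

# Both spellings should produce the same grouped result
print(by_name.equals(by_grouper))
```

The explicit form is more verbose, but it makes clear at a glance which keys come from the index and which from the columns.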
