Why did the ‘reset_index(drop=True)’ function unexpectedly remove a column?

As the saying goes, “What happens in your interpreter stays in your interpreter”. It’s impossible to explain the discrepancy without seeing the full history of commands entered into both Python interactive sessions.

However, it is possible to venture a guess:

df.reset_index(drop=True) drops the current index of the DataFrame and replaces it with an index of increasing integers. It never drops columns.
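
As a quick sanity check of that claim, here is a minimal sketch using a made-up frame with a non-default index (the name frame and the index values 10, 20, 30 are only for illustration). The index gets renumbered, while the columns are left alone:

```python
import pandas as pd

# Hypothetical frame with a non-default integer index.
frame = pd.DataFrame({'foo': [1, 2, 3]}, index=[10, 20, 30])

# drop=True discards the old index instead of turning it into a column.
renumbered = frame.reset_index(drop=True)

print(list(renumbered.index))    # [0, 1, 2]
print(list(renumbered.columns))  # ['foo']
```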

So, in your interactive session, _worker_id was a column. In your co-worker’s interactive session, _worker_id must have been an index level.

The visual difference can be somewhat subtle. For example, below, df has a _worker_id column while df2 has a _worker_id index level:

In [190]: df = pd.DataFrame({'foo':[1,2,3], '_worker_id':list('ABC')}); df
Out[190]: 
  _worker_id  foo
0          A    1
1          B    2
2          C    3

In [191]: df2 = df.set_index('_worker_id', append=True); df2
Out[191]: 
              foo
  _worker_id     
0 A             1
1 B             2
2 C             3

Notice that the name _worker_id appears one line below foo when it is an index level, and on the same line as foo when it is a column. That is the only visual clue you get when looking at the str or repr of a DataFrame.
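
If you would rather not rely on the printed layout, you can ask the DataFrame directly. The following is only a sketch; where_is is a hypothetical helper name, not something pandas provides:

```python
import pandas as pd

df = pd.DataFrame({'foo': [1, 2, 3], '_worker_id': list('ABC')})
df2 = df.set_index('_worker_id', append=True)

def where_is(frame, name):
    """Report whether `name` is a column, an index level, or absent."""
    if name in frame.columns:
        return 'column'
    if name in frame.index.names:
        return 'index level'
    return 'absent'

print(where_is(df, '_worker_id'))   # column
print(where_is(df2, '_worker_id'))  # index level
```

Running the same check in both interactive sessions should show where the two DataFrames differ.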

So to repeat: When _worker_id is a column, the column is unaffected by df.reset_index(drop=True):

In [194]: df.reset_index(drop=True)
Out[194]: 
  _worker_id  foo
0          A    1
1          B    2
2          C    3

But _worker_id is dropped when it is part of the index:

In [195]: df2.reset_index(drop=True)
Out[195]: 
   foo
0    1
1    2
2    3
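
If the goal is to keep _worker_id when it is an index level, one possible approach (a sketch, assuming the same df2 as above; the names restored and kept_as_index are just for illustration) is to reset only the levels you want to change:

```python
import pandas as pd

df = pd.DataFrame({'foo': [1, 2, 3], '_worker_id': list('ABC')})
df2 = df.set_index('_worker_id', append=True)

# Move only the _worker_id level back into a column,
# then renumber whatever index remains.
restored = df2.reset_index(level='_worker_id').reset_index(drop=True)
print(sorted(restored.columns))  # ['_worker_id', 'foo']

# Alternatively, drop only the outer integer level (level 0)
# and keep _worker_id as the index.
kept_as_index = df2.reset_index(level=0, drop=True)
print(kept_as_index.index.name)  # _worker_id
```

Passing level restricts reset_index to the named levels, so drop=True only discards what you explicitly select.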
