pd.concat
requires that the indices be unique. To remove rows with duplicate indices, use
df = df.loc[~df.index.duplicated(keep='first')]
import pandas as pd from pandas import Timestamp df1 = pd.DataFrame( {'price': [0.7286, 0.7286, 0.7286, 0.7286], 'side': [2, 2, 2, 2], 'timestamp': [1451865675631331, 1451865675631400, 1451865675631861, 1451865675631866]}, index=pd.DatetimeIndex(['2000-1-1', '2000-1-1', '2001-1-1', '2002-1-1'])) df2 = pd.DataFrame( {'bid': [0.7284, 0.7284, 0.7284, 0.7285, 0.7285], 'bid_size': [4000000, 4000000, 5000000, 1000000, 4000000], 'offer': [0.7285, 0.729, 0.7286, 0.7286, 0.729], 'offer_size': [1000000, 4000000, 4000000, 4000000, 4000000]}, index=pd.DatetimeIndex(['2000-1-1', '2001-1-1', '2002-1-1', '2003-1-1', '2004-1-1'])) df1 = df1.loc[~df1.index.duplicated(keep='first')] # price side timestamp # 2000-01-01 0.7286 2 1451865675631331 # 2001-01-01 0.7286 2 1451865675631861 # 2002-01-01 0.7286 2 1451865675631866 df2 = df2.loc[~df2.index.duplicated(keep='first')] # bid bid_size offer offer_size # 2000-01-01 0.7284 4000000 0.7285 1000000 # 2001-01-01 0.7284 4000000 0.7290 4000000 # 2002-01-01 0.7284 5000000 0.7286 4000000 # 2003-01-01 0.7285 1000000 0.7286 4000000 # 2004-01-01 0.7285 4000000 0.7290 4000000 result = pd.concat([df1, df2], axis=0) print(result) bid bid_size offer offer_size price side timestamp 2000-01-01 NaN NaN NaN NaN 0.7286 2 1.451866e+15 2001-01-01 NaN NaN NaN NaN 0.7286 2 1.451866e+15 2002-01-01 NaN NaN NaN NaN 0.7286 2 1.451866e+15 2000-01-01 0.7284 4000000 0.7285 1000000 NaN NaN NaN 2001-01-01 0.7284 4000000 0.7290 4000000 NaN NaN NaN 2002-01-01 0.7284 5000000 0.7286 4000000 NaN NaN NaN 2003-01-01 0.7285 1000000 0.7286 4000000 NaN NaN NaN 2004-01-01 0.7285 4000000 0.7290 4000000 NaN NaN NaN
Note there is also pd.join
, which can join DataFrames based on their indices, and handle non-unique indices based on the how
parameter. Rows with duplicate index are not removed.
In [94]: df1.join(df2) Out[94]: price side timestamp bid bid_size offer \ 2000-01-01 0.7286 2 1451865675631331 0.7284 4000000 0.7285 2000-01-01 0.7286 2 1451865675631400 0.7284 4000000 0.7285 2001-01-01 0.7286 2 1451865675631861 0.7284 4000000 0.7290 2002-01-01 0.7286 2 1451865675631866 0.7284 5000000 0.7286 offer_size 2000-01-01 1000000 2000-01-01 1000000 2001-01-01 4000000 2002-01-01 4000000 In [95]: df1.join(df2, how='outer') Out[95]: price side timestamp bid bid_size offer offer_size 2000-01-01 0.7286 2 1.451866e+15 0.7284 4000000 0.7285 1000000 2000-01-01 0.7286 2 1.451866e+15 0.7284 4000000 0.7285 1000000 2001-01-01 0.7286 2 1.451866e+15 0.7284 4000000 0.7290 4000000 2002-01-01 0.7286 2 1.451866e+15 0.7284 5000000 0.7286 4000000 2003-01-01 NaN NaN NaN 0.7285 1000000 0.7286 4000000 2004-01-01 NaN NaN NaN 0.7285 4000000 0.7290 4000000