I think your problem is that you are expecting `np.append` to add the column in place. Because of how NumPy data is stored, what it actually does is create a copy of the joined arrays:
```
Returns
-------
append : ndarray
    A copy of `arr` with `values` appended to `axis`.  Note that
    `append` does not occur in-place: a new array is allocated and
    filled.  If `axis` is None, `out` is a flattened array.
```
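One quick way to see that the input is left untouched is to compare shapes and memory before and after the call. This is a minimal sketch with a small made-up array. `np.shares_memory` is used only to show that the result is a fresh allocation.

```python
import numpy as np

a = np.zeros((3, 2))
col = np.ones((3, 1))

# The return value is thrown away here, so `a` keeps its original shape.
np.append(a, col, axis=1)
print(a.shape)                 # (3, 2)

# Keep the result instead: it is a new array, not a view of `a`.
b = np.append(a, col, axis=1)
print(b.shape)                 # (3, 3)
print(np.shares_memory(a, b))  # False
```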
So you need to save the output, e.g. `all_data = np.append(...)`:
```python
import numpy as np

my_data = np.random.random((210, 8))  # recfromcsv('LIAB.ST.csv', delimiter='\t')
new_col = my_data.sum(1)[..., None]   # None keeps (n, 1) shape
new_col.shape
# (210, 1)
all_data = np.append(my_data, new_col, 1)
all_data.shape
# (210, 9)
```
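The `[..., None]` step matters because, when an axis is given, `np.append` passes its inputs on to `np.concatenate`, which requires them to have the same number of dimensions. The sketch below uses a small hypothetical 4×3 array to contrast the raw 1-D row sums with the reshaped column. `s[:, np.newaxis]` and `s.reshape(-1, 1)` are shown as equivalent ways to get the `(n, 1)` shape.

```python
import numpy as np

m = np.arange(12, dtype=float).reshape(4, 3)
s = m.sum(1)            # shape (4,): 1-D, so it cannot be joined along axis 1

try:
    np.append(m, s, 1)
except ValueError as err:
    print("1-D column rejected:", err)

# Three equivalent ways to turn the sums into a (4, 1) column.
c1 = s[..., None]
c2 = s[:, np.newaxis]
c3 = s.reshape(-1, 1)
assert c1.shape == c2.shape == c3.shape == (4, 1)

out = np.append(m, c1, 1)
print(out.shape)        # (4, 4)
print(out[:, -1])       # the row sums 3, 12, 21, 30
```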
Alternative ways:
```python
all_data = np.hstack((my_data, new_col))
# or
all_data = np.concatenate((my_data, new_col), 1)
```
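A sketch to check that these calls really produce the same array, compared with `np.array_equal`. It also shows `np.column_stack`, which is not part of the answer above but accepts the 1-D sums directly and turns them into a column itself.

```python
import numpy as np

my_data = np.random.random((210, 8))
new_col = my_data.sum(1)[..., None]   # (210, 1)

a = np.append(my_data, new_col, 1)
b = np.hstack((my_data, new_col))
c = np.concatenate((my_data, new_col), 1)
assert np.array_equal(a, b) and np.array_equal(b, c)

# column_stack promotes 1-D inputs to columns, so the [..., None] step is not needed here.
d = np.column_stack((my_data, my_data.sum(1)))
assert np.array_equal(a, d)
print(d.shape)  # (210, 9)
```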
I believe the only difference between these three functions (as well as `np.vstack`) is their default behavior when `axis` is unspecified:

- `concatenate` assumes `axis = 0`
- `hstack` assumes `axis = 1`, unless the inputs are 1-D, in which case `axis = 0`
- `vstack` assumes `axis = 0`, after adding an axis if the inputs are 1-D
- `append` flattens the arrays
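These default-axis rules can be checked on a pair of small 1-D arrays and a pair of 2-D arrays. The arrays are made up for illustration, and the shapes in the comments follow from the rules listed above.

```python
import numpy as np

u, v = np.array([1, 2, 3]), np.array([4, 5, 6])   # 1-D, shape (3,)
A, B = np.ones((2, 3)), np.zeros((2, 3))          # 2-D, shape (2, 3)

print(np.concatenate((u, v)).shape)  # (6,)   axis 0
print(np.hstack((u, v)).shape)       # (6,)   1-D inputs, so axis 0
print(np.vstack((u, v)).shape)       # (2, 3) each input becomes (1, 3), then axis 0
print(np.append(u, v).shape)         # (6,)   flattened

print(np.concatenate((A, B)).shape)  # (4, 3) axis 0
print(np.hstack((A, B)).shape)       # (2, 6) axis 1
print(np.vstack((A, B)).shape)       # (4, 3) axis 0
print(np.append(A, B).shape)         # (12,)  no axis given, both flattened
```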
Based on your comment, and looking more closely at your example code, I now believe that what you are probably looking to do is add a field to a record array. You imported both `genfromtxt`, which returns a structured array, and `recfromcsv`, which returns the subtly different record array (`recarray`). You used `recfromcsv`, so right now `my_data` is actually a `recarray`. That means most likely `my_data.shape == (210,)`, since recarrays are 1-D arrays of records, where each record is a tuple with the given dtype.
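To see why `my_data.sum(1)` no longer applies, here is a small structured array standing in for the CSV data (the field names `a`, `b`, `c` are made up). It is 1-D even though it has three named columns. `structured_to_unstructured` from `numpy.lib.recfunctions` (available since NumPy 1.16) is shown as one possible way back to an ordinary 2-D float array; it is an alternative route, not part of the original answer.

```python
import numpy as np
from numpy.lib.recfunctions import structured_to_unstructured

data = np.array([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)],
                dtype=[('a', float), ('b', float), ('c', float)])
rec = data.view(np.recarray)   # same memory, with attribute access to fields

print(data.shape)        # (2,): one record per row, not a (2, 3) grid
print(data.dtype.names)  # ('a', 'b', 'c')
print(rec.a)             # the 'a' column via recarray attribute access

# Convert to a plain (2, 3) float array when ordinary column arithmetic is wanted.
plain = structured_to_unstructured(data)
print(plain.shape)       # (2, 3)
print(plain.sum(1))      # row totals 6 and 15
```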
So you could try this:
```python
import numpy as np
from numpy.lib.recfunctions import append_fields

x = np.random.random(10)
y = np.random.random(10)
z = np.random.random(10)
data = np.array(list(zip(x, y, z)), dtype=[('x', float), ('y', float), ('z', float)])
data = np.recarray(data.shape, data.dtype, buf=data)
data.shape
# (10,)
tot = data['x'] + data['y'] + data['z']  # sum(axis=1) won't work on recarray
tot.shape
# (10,)
all_data = append_fields(data, 'total', tot, usemask=False)
all_data
# array([(0.4374783740738456 , 0.04307289878861764, 0.021176067323686598, 0.5017273401861498),
#        (0.07622262416466963, 0.3962146058689695 , 0.27912715826653534 , 0.7515643883001745),
#        (0.30878532523061153, 0.8553768789387086 , 0.9577415585116588 , 2.121903762680979 ),
#        (0.5288343561208022 , 0.17048864443625933, 0.07915689716226904 , 0.7784798977193306),
#        (0.8804269791375121 , 0.45517504750917714, 0.1601389248542675 , 1.4957409515009568),
#        (0.9556552723429782 , 0.8884504475901043 , 0.6412854758843308 , 2.4853911958174133),
#        (0.0227638618687922 , 0.9295332854783015 , 0.3234597575660103 , 1.275756904913104 ),
#        (0.684075052174589  , 0.6654774682866273 , 0.5246593820025259 , 1.8742119024637423),
#        (0.9841793718333871 , 0.5813955915551511 , 0.39577520705133684 , 1.961350170439875 ),
#        (0.9889343795296571 , 0.22830104497714432, 0.20011292764078448 , 1.4173483521475858)],
#       dtype=[('x', '<f8'), ('y', '<f8'), ('z', '<f8'), ('total', '<f8')])
all_data.shape
# (10,)
all_data.dtype.names
# ('x', 'y', 'z', 'total')
```
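If you would rather not depend on `numpy.lib.recfunctions`, a similar result can be built by hand: allocate an empty array with the extended dtype and copy each field across. This is a sketch of that alternative, not what `append_fields` does internally, and the helper name `add_field` is made up for illustration. It assumes a packed dtype with no padding bytes, since padding would show up as unnamed entries in `dtype.descr`.

```python
import numpy as np

def add_field(arr, name, values, dtype=float):
    """Hypothetical helper: return a new structured array with one extra field."""
    # Assumes a packed dtype (no padding), so descr lists only the named fields.
    new_dtype = np.dtype(arr.dtype.descr + [(name, dtype)])
    out = np.empty(arr.shape, dtype=new_dtype)
    for field in arr.dtype.names:   # copy the existing columns
        out[field] = arr[field]
    out[name] = values              # fill the new column
    return out

data = np.zeros(3, dtype=[('x', float), ('y', float), ('z', float)])
data['x'] = [1, 2, 3]
data['y'] = [10, 20, 30]
data['z'] = [100, 200, 300]

all_data = add_field(data, 'total', data['x'] + data['y'] + data['z'])
print(all_data.dtype.names)  # ('x', 'y', 'z', 'total')
print(all_data['total'])     # totals 111, 222, 333
```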