Convert A Numpy Array To A Structured Array

Given that the width in bytes of each row in a numpy array equals the total width of the fields in a structure defined by a dtype, is there a simple way to convert such a numpy array to a structured array?

Solution 1:

In [222]: x = np.array([[ 0,  2,  3,  4,  5], [ 0, 12, 13, 14, 15]])
In [223]: dt = np.dtype([('checksum','u2'), ('word', 'B', (3,))])

I know from past use that genfromtxt can handle relatively complex dtypes:

In [224]: np.savetxt('temp', x[:,1:], fmt='%d')
In [225]: cat temp
2 3 4 5
12 13 14 15
In [226]: data = np.genfromtxt('temp', dtype=dt)
In [227]: data
Out[227]: 
array([( 2, [ 3,  4,  5]), (12, [13, 14, 15])],
      dtype=[('checksum', '<u2'), ('word', 'u1', (3,))])

I haven't dug into its code to see how it maps the flat row data onto the dtype.

It turns out that the unstructured_to_structured function I mentioned in a comment can also handle your dtype:

In [228]: import numpy.lib.recfunctions as rf
In [229]: rf.unstructured_to_structured(x[:,1:],dtype=dt)
Out[229]: 
array([( 2, [ 3,  4,  5]), (12, [13, 14, 15])],
      dtype=[('checksum', '<u2'), ('word', 'u1', (3,))])

For simpler dtypes, I and others have often recommended turning the list of lists into a list of tuples, since np.array expects one tuple per record when given a structured dtype.

In [230]: [tuple(row) for row in x[:,1:]]
Out[230]: [(2, 3, 4, 5), (12, 13, 14, 15)]
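
The list of tuples is only the intermediate step; it still has to be passed to np.array along with the dtype. A minimal sketch, assuming a hypothetical flat dtype `simple_dt` with field names `a` to `d` (these are illustrative, not from the question). It also shows that the nested `dt` above needs nested tuples rather than flat ones:

```python
import numpy as np

x = np.array([[0, 2, 3, 4, 5], [0, 12, 13, 14, 15]])

# Hypothetical flat dtype: four scalar fields, one per column.
simple_dt = np.dtype([('a', 'u2'), ('b', 'u1'), ('c', 'u1'), ('d', 'u1')])

# One tuple per record; a list of lists would not be accepted here.
flat_rows = [tuple(row) for row in x[:, 1:]]
simple = np.array(flat_rows, dtype=simple_dt)

# The dtype from the answer has a (3,) subarray field, so each record
# must be a 2-tuple: (checksum, [word0, word1, word2]).
dt = np.dtype([('checksum', 'u2'), ('word', 'B', (3,))])
nested_rows = [(r[0], r[1:]) for r in x[:, 1:].tolist()]
nested = np.array(nested_rows, dtype=dt)
```

The flat tuples work only when every field is a scalar; for a dtype with subarray fields, the tuple structure has to mirror the dtype structure.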

Many of the recfunctions use a field-by-field copy:

In [231]: res = np.zeros(2, dtype=dt)
In [232]: res
Out[232]: 
array([(0, [0, 0, 0]), (0, [0, 0, 0])],
      dtype=[('checksum', '<u2'), ('word', 'u1', (3,))])
In [233]: res['checksum']= x[:,1]
In [234]: res['word']
Out[234]: 
array([[0, 0, 0],
       [0, 0, 0]], dtype=uint8)
In [235]: res['word'] = x[:,2:]
In [236]: res
Out[236]: 
array([( 2, [ 3,  4,  5]), (12, [13, 14, 15])],
      dtype=[('checksum', '<u2'), ('word', 'u1', (3,))])
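
The two assignments above can be folded into a loop over the dtype's fields. This is only a sketch of the idea, not how recfunctions is actually implemented; the helper name `fill_by_field` is made up for illustration, and it assumes the dtype contains only scalar and subarray fields (no nested structs):

```python
import numpy as np

def fill_by_field(cols, dt):
    """Copy the columns of a 2-D array into a new structured array, field by field."""
    dt = np.dtype(dt)
    n = cols.shape[0]
    res = np.zeros(n, dtype=dt)
    start = 0
    for name in dt.names:
        shape = dt[name].shape           # () for scalar fields, (3,) for 'word'
        width = int(np.prod(shape))      # number of columns this field consumes
        block = cols[:, start:start + width]
        res[name] = block.reshape((n,) + shape)
        start += width
    return res

x = np.array([[0, 2, 3, 4, 5], [0, 12, 13, 14, 15]])
dt = np.dtype([('checksum', 'u2'), ('word', 'B', (3,))])
res = fill_by_field(x[:, 1:], dt)
```

Like the manual version, this copies values and casts them to each field's type; it does not reinterpret bytes.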

byte view

I missed the fact that you wanted to repack bytes. My answer above treats each input row as 4 integers that are assigned to the 4 slots in the compound dtype. But with uint8 input and u2 and u1 slots, you want to view the 5 bytes of each row with the new dtype, not make a new array.

In [332]: dt
Out[332]: dtype([('checksum', '<u2'), ('word', 'u1', (3,))])
In [333]: arr = np.array([(1,2,3,4,5),
     ...:     (11,12,13,14,15)], dtype = np.uint8)
In [334]: arr.view(dt)
Out[334]: 
array([[( 513, [ 3,  4,  5])],
       [(3083, [13, 14, 15])]],
      dtype=[('checksum', '<u2'), ('word', 'u1', (3,))])

view leaves a trailing dimension of size 1, which we need to remove:

In [335]: _.shape
Out[335]: (2, 1)
In [336]: arr.view(dt).reshape(2)
Out[336]: 
array([( 513, [ 3,  4,  5]), (3083, [13, 14, 15])],
      dtype=[('checksum', '<u2'), ('word', 'u1', (3,))])
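
The view-and-reshape step can be wrapped with a check on the row width. A rough sketch, assuming the input is a 2-D uint8 array; the helper name `rows_as_records` is hypothetical:

```python
import numpy as np

def rows_as_records(arr, dt):
    """View each row of a 2-D uint8 array as one record of dtype dt."""
    dt = np.dtype(dt)
    if arr.dtype != np.uint8 or arr.ndim != 2:
        raise ValueError("expected a 2-D uint8 array")
    if arr.shape[1] != dt.itemsize:
        raise ValueError("row width in bytes must equal dt.itemsize")
    # view() needs a contiguous last axis; this copies only if arr is not
    # already C-contiguous.
    arr = np.ascontiguousarray(arr)
    # view() gives shape (n, 1); reshape(-1) drops the trailing axis.
    return arr.view(dt).reshape(-1)

dt = np.dtype([('checksum', '<u2'), ('word', 'u1', (3,))])
arr = np.array([(1, 2, 3, 4, 5), (11, 12, 13, 14, 15)], dtype=np.uint8)
rec = rows_as_records(arr, dt)
```

Using reshape(-1) instead of reshape(2) avoids hard-coding the number of rows. Because the result is a view of a contiguous input, writing to `rec` also changes `arr`.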

and changing the endianness of the u2 field:

In [337]: dt = np.dtype([('checksum','>u2'), ('word', 'B', (3,))])
In [338]: arr.view(dt).reshape(2)
Out[338]: 
array([( 258, [ 3,  4,  5]), (2828, [13, 14, 15])],
      dtype=[('checksum', '>u2'), ('word', 'u1', (3,))])
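
To see where 513 and 258 come from, the first two bytes of each row can be decoded by hand with Python's int.from_bytes and compared with the viewed field. This is just a cross-check sketch; the `results` dict is an illustrative name:

```python
import numpy as np

arr = np.array([(1, 2, 3, 4, 5), (11, 12, 13, 14, 15)], dtype=np.uint8)

results = {}
for order, code in (("little", "<u2"), ("big", ">u2")):
    dt = np.dtype([("checksum", code), ("word", "B", (3,))])
    rec = arr.view(dt).reshape(-1)
    # Decode bytes 0 and 1 of each row manually with the same byte order.
    expected = [int.from_bytes(row[:2].tobytes(), order) for row in arr]
    assert rec["checksum"].tolist() == expected
    results[order] = expected

# little-endian: 1 + 2*256 = 513;  big-endian: 1*256 + 2 = 258
```

Only the interpretation of the two checksum bytes changes; the 'word' bytes are single-byte values and are unaffected by byte order.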
