How To Store An Array In Hdf5 File Which Is Too Big To Load In Memory?
Is there any way to store an array in an HDF5 file when the array is too big to load into memory? If I do something like this:
f = h5py.File('test.hdf5', 'w')
f['mydata'] = np.zeros(2**32)
I
Solution 1:
According to the h5py documentation, you can use create_dataset to create a chunked dataset stored in the HDF5 file. The data lives on disk, and only the parts you read or write pass through memory. Example:
>>> import h5py
>>> f = h5py.File('test.h5', 'w')
>>> arr = f.create_dataset('mydata', (2**32,), chunks=True)
>>> arr
<HDF5 dataset "mydata": shape (4294967296,), type "<f4">
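If you would rather not rely on the defaults (float32 and an automatically guessed chunk shape), create_dataset also accepts an explicit dtype and chunk shape. The following is only a minimal sketch: the helper name create_big_dataset, the chunk length of 2**16, and the small demo size are assumptions chosen for illustration, not values from the answer above.

```python
import os
import tempfile

import h5py


def create_big_dataset(path, n, dtype="float64", chunk_len=2**16):
    """Create a 1-D chunked dataset of length n without building it in memory."""
    with h5py.File(path, "w") as f:
        dset = f.create_dataset(
            "mydata",
            shape=(n,),
            dtype=dtype,
            chunks=(chunk_len,),  # explicit chunk shape instead of chunks=True
            fillvalue=0,          # value read back for parts never written
        )
        # Copy out plain Python/NumPy metadata before the file is closed.
        return dset.shape, dset.dtype, dset.chunks


# Small demo size so the sketch runs quickly; pass n=2**32 for the full case.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.h5")
shape, dtype, chunks = create_big_dataset(demo_path, n=1_000_000)
print(shape, dtype, chunks)
```

The chunk length here must not exceed the dataset length, and a good value depends on how you plan to read the data back.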
Slicing the HDF5 dataset returns NumPy arrays.
>>> arr[:10]
array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)
>>> type(arr[:10])
numpy.ndarray
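Because each slice comes back as an ordinary in-memory array, you can reduce over a dataset that does not fit in memory by reading one block at a time. This is a rough sketch of that idea; blockwise_mean, the block length, and the tiny demo dataset are hypothetical and only there to keep the example runnable.

```python
import os
import tempfile

import h5py
import numpy as np


def blockwise_mean(dset, block_len):
    """Mean of a 1-D dataset, holding at most block_len elements in memory."""
    total = 0.0
    n = dset.shape[0]
    for start in range(0, n, block_len):
        # The slice is read from disk as a NumPy array; the last one may be shorter.
        block = dset[start:start + block_len]
        total += block.sum(dtype=np.float64)
    return total / n


path = os.path.join(tempfile.mkdtemp(), "read_demo.h5")
with h5py.File(path, "w") as f:
    d = f.create_dataset("mydata", (1000,), dtype="f4", chunks=(128,))
    d[:] = np.arange(1000, dtype="f4")  # small enough to write in one go
    mean = blockwise_mean(d, block_len=128)
print(mean)
```

Choosing block_len as a multiple of the chunk length usually avoids reading the same chunk twice.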
You can assign values just as you would with a NumPy array.
>>> arr[3:5] = 3
>>> arr[:6]
array([ 0., 0., 0., 3., 3., 0.], dtype=float32)
I don't know if this is the most efficient way, but you can iterate over the whole array one chunk at a time, for instance to fill it with random values:
>>> import numpy as np
>>> for i in range(0, arr.size, arr.chunks[0]):
...     arr[i:i + arr.chunks[0]] = np.random.randn(arr.chunks[0])
...
>>> arr[:5]
array([ 0.62833798, 0.03631227, 2.00691652, -0.16631022, 0.07727782], dtype=float32)
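The loop above assumes the dataset length is an exact multiple of the chunk length; otherwise the last assignment would have a shape mismatch. A slightly more defensive version, sketched here with a hypothetical helper fill_random and a deliberately awkward demo length of 1000, clamps the final block:

```python
import os
import tempfile

import h5py
import numpy as np


def fill_random(dset, block_len, seed=0):
    """Fill a 1-D dataset with normal random values, one block at a time."""
    rng = np.random.default_rng(seed)
    n = dset.shape[0]
    for start in range(0, n, block_len):
        stop = min(start + block_len, n)  # clamp the last, possibly partial, block
        dset[start:stop] = rng.standard_normal(stop - start)


path = os.path.join(tempfile.mkdtemp(), "write_demo.h5")
with h5py.File(path, "w") as f:
    d = f.create_dataset("mydata", (1000,), dtype="f4", chunks=(128,))
    fill_random(d, block_len=d.chunks[0])  # 1000 is not a multiple of 128
    data = d[:]  # the demo is small enough to read back whole
print(data.shape, data[-3:])
```

Only one block of random numbers exists in memory at any time, so the same function should work unchanged on the 2**32-element dataset, given enough disk space and patience.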