
How To Store An Array In Hdf5 File Which Is Too Big To Load In Memory?

Is there any way to store an array in an HDF5 file when the array is too big to load into memory? If I do something like this:

f = h5py.File('test.hdf5','w')
f['mydata'] = np.zeros(2**32)

I

Solution 1:

According to the documentation, you can use create_dataset to create a chunked dataset that is stored in the HDF5 file rather than in memory. Example:

>>> import h5py
>>> f = h5py.File('test.h5', 'w')
>>> arr = f.create_dataset('mydata', (2**32,), chunks=True)
>>> arr
<HDF5 dataset "mydata": shape (4294967296,), type "<f4">
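
As a minimal sketch of the same idea on a smaller scale, the snippet below passes the dtype and chunk shape explicitly instead of relying on chunks=True. The file location, dataset size and chunk length are illustrative assumptions, not values from the answer.

```python
import os
import tempfile

import h5py

# Illustrative values (assumptions): small enough that the sketch runs quickly.
path = os.path.join(tempfile.mkdtemp(), "sketch.h5")
n_items = 1_000_000
chunk_len = 65_536

with h5py.File(path, "w") as f:
    # Only the shape, dtype and chunk layout are declared here;
    # no array of n_items values is built in memory.
    dset = f.create_dataset(
        "mydata", shape=(n_items,), dtype="f4", chunks=(chunk_len,)
    )
    dset_shape = dset.shape
    dset_dtype = dset.dtype
    dset_chunks = dset.chunks
    first_values = dset[:5]  # only this slice is read into memory
```

Choosing the chunk shape yourself can help when you already know how the data will be read back, since HDF5 reads and writes whole chunks.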

Slicing the HDF5 dataset returns NumPy arrays, so only the requested slice is read into memory.

>>> arr[:10]
array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.], dtype=float32)
>>> type(arr[:10])
numpy.ndarray

You can set values as you would for a NumPy array.

>>> arr[3:5] = 3
>>> arr[:6]
array([ 0.,  0.,  0.,  3.,  3.,  0.], dtype=float32)

I don't know if this is the most efficient way, but you can iterate over the whole array one chunk at a time, for instance to set it to random values:

>>> import numpy as np
>>> for i in range(0, arr.size, arr.chunks[0]):
...     arr[i: i+arr.chunks[0]] = np.random.randn(arr.chunks[0])
...
>>> arr[:5]
array([ 0.62833798,  0.03631227,  2.00691652, -0.16631022,  0.07727782], dtype=float32)
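
The chunk-wise loop can also be wrapped in a small helper that copes with a dataset whose length is not a multiple of the chunk length. This is only a sketch: fill_in_blocks is a hypothetical helper name, and the file location, sizes and random seed are assumptions chosen so the example runs quickly.

```python
import os
import tempfile

import h5py
import numpy as np


def fill_in_blocks(dset, block_len, rng):
    """Fill a 1-D dataset with standard-normal values, one block at a time."""
    n = dset.shape[0]
    for start in range(0, n, block_len):
        stop = min(start + block_len, n)  # the last block may be shorter
        dset[start:stop] = rng.standard_normal(stop - start)


path = os.path.join(tempfile.mkdtemp(), "blocks.h5")
with h5py.File(path, "w") as f:
    # 10_000 is deliberately not a multiple of the chunk length 4_096.
    dset = f.create_dataset("mydata", shape=(10_000,), dtype="f4", chunks=(4_096,))
    fill_in_blocks(dset, dset.chunks[0], np.random.default_rng(0))
    tail = dset[-4:]
    # Reading everything back is fine here only because the dataset is small.
    n_nonzero = int(np.count_nonzero(dset[:]))
```

Only one block of values is held in memory at any time, so the same loop should scale to datasets much larger than RAM.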
