
Pyspark Matrix Accumulator

I want to additively populate a matrix with values inferred from an RDD using a PySpark accumulator, but I found the docs a bit unclear. Adding a bit of background, just in case it's r…

Solution 1:

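Aha! I think I got it. At the end of the day, the accumulator still needs to add its own pieces to itself: when Spark merges each task's partial value into the driver's copy, it calls addInPlace again, this time with a whole matrix instead of an index. So, change addInPlace to: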

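def addInPlace(self, mAdd, lIndex):
    if isinstance(lIndex, list):
        # a [row, col] index sent by a task: increment that cell
        mAdd[lIndex[0], lIndex[1]] += 1
    else:
        # a partial matrix from another task: add it element-wise
        mAdd += lIndex
    return mAdd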

So now it increments the matching cell when it is given a [row, col] list, and when the partial results are merged after the populate_sparse function loop, it adds the matrices to itself to build my final matrix.
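A minimal sketch of the full accumulator, assuming a dense NumPy matrix (the original mentions `populate_sparse`, so the real code may use a SciPy sparse matrix instead). `MatrixAccumulatorParam` and `simulate` are hypothetical names. Only `zero` and `addInPlace` come from PySpark's `AccumulatorParam` interface; the `simulate` helper just mimics the two kinds of calls Spark makes (per-task index adds, then a driver-side merge of partial matrices) so the logic can be checked without a cluster.

```python
import numpy as np

try:
    from pyspark import AccumulatorParam
except ImportError:
    # Assumption: fall back to a plain base class so the merge logic can be
    # exercised locally when PySpark is not installed.
    AccumulatorParam = object


class MatrixAccumulatorParam(AccumulatorParam):
    """Counts [row, col] hits in a dense NumPy matrix (hypothetical name)."""

    def zero(self, value):
        # Each task starts from an all-zero matrix shaped like the initial value.
        return np.zeros_like(value)

    def addInPlace(self, mAdd, lIndex):
        if isinstance(lIndex, list):
            # Task side: acc.add([i, j]) increments a single cell.
            mAdd[lIndex[0], lIndex[1]] += 1
        else:
            # Driver side: a finished task's partial matrix is merged in.
            mAdd += lIndex
        return mAdd


def simulate(shape, partitions):
    """Hypothetical helper: one partial matrix per partition, then a merge."""
    param = MatrixAccumulatorParam()
    initial = np.zeros(shape)
    partials = []
    for indices in partitions:
        local = param.zero(initial)
        for i, j in indices:
            local = param.addInPlace(local, [i, j])
        partials.append(local)
    total = initial.copy()
    for partial in partials:
        total = param.addInPlace(total, partial)
    return total


result = simulate((3, 3), [[(0, 0), (1, 2)], [(0, 0), (2, 1)]])
```

On a real cluster the same class would be passed as `sc.accumulator(np.zeros((n, n)), MatrixAccumulatorParam())`, each task would call `acc.add([i, j])` (for example inside `rdd.foreach(...)`), and `acc.value` on the driver would hold the merged matrix. Checking `isinstance(lIndex, list)` is what lets one `addInPlace` serve both call paths.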
