Pyspark Matrix Accumulator
I want to additively populate a matrix with values inferred from an RDD using a PySpark accumulator; I found the docs a bit unclear. Adding a bit of background, just in case it's r
Solution 1:
Aha! I think I got it. The accumulator, at the end of the day, still needs to add its own pieces to itself. So, change addInPlace to:
    def addInPlace(self, mAdd, lIndex):
        if type(lIndex) == list:
            mAdd[lIndex[0], lIndex[1]] += 1
        else:
            mAdd += lIndex
        return mAdd
So now it increments the indexed entry when it is given a list, and adds the partial matrices to itself after the populate_sparse function loop, which creates my final matrix.
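To see why addInPlace has to handle both cases, here is a minimal sketch that simulates the two kinds of calls in plain Python, without Spark. The Matrix class, the add_in_place function and the two "tasks" are hypothetical stand-ins for illustration; in the real job the matrix type and the index pairs come from the accumulator and populate_sparse.

```python
# Plain-Python simulation (no Spark needed) of the two kinds of calls
# that addInPlace receives. Matrix is a hypothetical stand-in for the
# real matrix type used by the accumulator.

class Matrix:
    """Tiny dense matrix supporting m[i, j] and m += other."""

    def __init__(self, n_rows, n_cols):
        self.rows = [[0] * n_cols for _ in range(n_rows)]

    def __getitem__(self, key):
        i, j = key
        return self.rows[i][j]

    def __setitem__(self, key, value):
        i, j = key
        self.rows[i][j] = value

    def __iadd__(self, other):
        # Element-wise addition of another matrix of the same shape.
        for i, row in enumerate(other.rows):
            for j, value in enumerate(row):
                self.rows[i][j] += value
        return self


def add_in_place(m_add, l_index):
    if isinstance(l_index, list):
        # Update case: bump the cell named by [row, col].
        m_add[l_index[0], l_index[1]] += 1
    else:
        # Merge case: fold another partial matrix into this one.
        m_add += l_index
    return m_add


# Two hypothetical tasks each count their own index pairs ...
task_a = Matrix(2, 2)
for pair in [[0, 0], [0, 1]]:
    add_in_place(task_a, pair)

task_b = Matrix(2, 2)
for pair in [[0, 1], [1, 1]]:
    add_in_place(task_b, pair)

# ... then the partial results are merged into one final matrix.
total = Matrix(2, 2)
for partial in (task_a, task_b):
    add_in_place(total, partial)

print(total.rows)  # [[1, 2], [0, 1]]
```

If the merge branch were missing, the last loop would fail, because a partial matrix is not an index list. That matches the behaviour described above: the same function is used both for adding an index and for combining matrices.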