Python Data Frame: Cumulative Sum Of Column Until Condition Is Reached And Return The Index
Solution 1:
Opt - 1:
You could compute the cumulative sum using cumsum. Then use np.isclose with it's inbuilt tolerance parameter to check if the values present in this series lies within the specified threshold of 15 +/- 2. This returns a boolean array.
Through np.flatnonzero, return the ordinal values of the indices for which the True condition holds. We select the first instance of a True value.
Finally, use .iloc to retrieve value of the column name you require based on the index computed earlier.
val = np.flatnonzero(np.isclose(df.Num_Albums.cumsum().values, 15, atol=2))[0]
df['Num_authors'].iloc[val] # for faster access, use .iat
4
When performing np.isclose on the series later converted to an array:
np.isclose(df.Num_Albums.cumsum().values, 15, atol=2)
array([False, False, True, False, False, False], dtype=bool)
Opt - 2:
Use pd.Index.get_loc on the cumsum calculated series which also supports a tolerance parameter on the nearest method.
val = pd.Index(df.Num_Albums.cumsum()).get_loc(15, 'nearest', tolerance=2)
df.get_value(val, 'Num_authors')
4Opt - 3:
Use idxmax to find the first index of a True value for the boolean mask created after sub and abs operations on the cumsum series:
df.get_value(df.Num_Albums.cumsum().sub(15).abs().le(2).idxmax(), 'Num_authors')
4Solution 2:
I think you can directly add a column with the cumulative sum as:
In [3]: df
Out[3]:
index Num_Albums Num_authors
0 0 10 4
1 1 1 5
2 2 4 4
3 3 7 1000
4 4 1 44
5 5 3 8
In [4]: df['cumsum'] = df['Num_Albums'].cumsum()
In [5]: df
Out[5]:
index Num_Albums Num_authors cumsum
0 0 10 4 10
1 1 1 5 11
2 2 4 4 15
3 3 7 1000 22
4 4 1 44 23
5 5 3 8 26
And then apply the condition you want on the cumsum column. For instance you can use where to get the full row according to the filter. Setting the tolerance tol:
In [18]: tol = 2
In [19]: cond = df.where((df['cumsum']>=15-tol)&(df['cumsum']<=15+tol)).dropna()
In [20]: cond
Out[20]:
index Num_Albums Num_authors cumsum
22.04.04.015.0Solution 3:
This could even be done as following code:
def your_function(df):
sum=0
index=-1
for i indf['Num_Albums'].tolist():
sum+=i
index+=1
ifsum == ( " your_condition " ):
return (index,df.loc([df.Num_Albums==i,'Num_authors']))
This would actually return a tuple of your index and the corresponding value of Num_authors as soon as the "your condition" is reached.
or could even be returned as an array by
def your_function(df):
sum=0
index=-1
for i indf['Num_Albums'].tolist():
sum+=i
index+=1
ifsum == ( " your_condition " ):
return df.loc([df.Num_Albums==i,'Num_authors']).index.values
I am not able to figure out the condition you mentioned of the cumulative sum as when to stop summing so I mentioned it as " your_condition " in the code!!
I am also new so hope it helps !!
Post a Comment for "Python Data Frame: Cumulative Sum Of Column Until Condition Is Reached And Return The Index"