Python Data Frame: Cumulative Sum Of Column Until Condition Is Reached And Return The Index
Solution 1:
Opt - 1:
You could compute the cumulative sum using cumsum
. Then use np.isclose
with it's inbuilt tolerance parameter to check if the values present in this series lies within the specified threshold of 15 +/- 2. This returns a boolean array.
Through np.flatnonzero
, return the ordinal values of the indices for which the True
condition holds. We select the first instance of a True
value.
Finally, use .iloc
to retrieve value of the column name you require based on the index computed earlier.
val = np.flatnonzero(np.isclose(df.Num_Albums.cumsum().values, 15, atol=2))[0]
df['Num_authors'].iloc[val] # for faster access, use .iat
4
When performing np.isclose
on the series
later converted to an array:
np.isclose(df.Num_Albums.cumsum().values, 15, atol=2)
array([False, False, True, False, False, False], dtype=bool)
Opt - 2:
Use pd.Index.get_loc
on the cumsum
calculated series which also supports a tolerance
parameter on the nearest
method.
val = pd.Index(df.Num_Albums.cumsum()).get_loc(15, 'nearest', tolerance=2)
df.get_value(val, 'Num_authors')
4
Opt - 3:
Use idxmax
to find the first index of a True
value for the boolean mask created after sub
and abs
operations on the cumsum
series:
df.get_value(df.Num_Albums.cumsum().sub(15).abs().le(2).idxmax(), 'Num_authors')
4
Solution 2:
I think you can directly add a column with the cumulative sum as:
In [3]: df
Out[3]:
index Num_Albums Num_authors
0 0 10 4
1 1 1 5
2 2 4 4
3 3 7 1000
4 4 1 44
5 5 3 8
In [4]: df['cumsum'] = df['Num_Albums'].cumsum()
In [5]: df
Out[5]:
index Num_Albums Num_authors cumsum
0 0 10 4 10
1 1 1 5 11
2 2 4 4 15
3 3 7 1000 22
4 4 1 44 23
5 5 3 8 26
And then apply the condition you want on the cumsum
column. For instance you can use where
to get the full row according to the filter. Setting the tolerance tol
:
In [18]: tol = 2
In [19]: cond = df.where((df['cumsum']>=15-tol)&(df['cumsum']<=15+tol)).dropna()
In [20]: cond
Out[20]:
index Num_Albums Num_authors cumsum
22.04.04.015.0
Solution 3:
This could even be done as following code:
def your_function(df):
sum=0
index=-1
for i indf['Num_Albums'].tolist():
sum+=i
index+=1
ifsum == ( " your_condition " ):
return (index,df.loc([df.Num_Albums==i,'Num_authors']))
This would actually return a tuple of your index and the corresponding value of Num_authors as soon as the "your condition" is reached.
or could even be returned as an array by
def your_function(df):
sum=0
index=-1
for i indf['Num_Albums'].tolist():
sum+=i
index+=1
ifsum == ( " your_condition " ):
return df.loc([df.Num_Albums==i,'Num_authors']).index.values
I am not able to figure out the condition you mentioned of the cumulative sum as when to stop summing so I mentioned it as " your_condition " in the code!!
I am also new so hope it helps !!
Post a Comment for "Python Data Frame: Cumulative Sum Of Column Until Condition Is Reached And Return The Index"