Coding Variables With Pandas Timeseries
Solution 1:
I've been asked via email if I ever found a solution to what I wanted to do here, so I'm sharing what I've been doing to date. This might not be the canonical way of using pandas, but it's sufficed for me.
In short, I've split my data into a couple of data frames.
The first, data, is as above, but I only use the columns which correspond to single values, like trial, stimuli, resp, and rt.
For my time series data, I use two additional data frames, one for the x-coordinate data, and one for the y. Although there's probably a more elegant way of generating these, my code does the following.
data.iloc[0]
time_stamp 21/11/201313:06
subject 1276270
trial 0
stimuli 14
resp 2
rt 1145
x [-0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0....
y [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
t [1, 26, 26, 35, 45, 55, 65, 75, 85, 95, 105, 1...
Name: 0, dtype: object
data['nx'], data['ny'] = zip(*
[even_time_steps(x, y, t)
for x, y, t, in zip(data.x, data.y, data.t)])
# Using function even_time_steps from package squeak
# https://github.com/EoinTravers/Squeak
# Simpler applications could use
# data['nx'] = [pd.TimeSeries(x) for y in data['x']]
# data['ny'] = [pd.TimeSeries(x) for y in data['y']]
# Seperate DataFrames
nx = pd.concat(list(data.nx), axis=1).T
ny = pd.concat(list(data.ny), axis=1).T
# Remove redundant columns
redundant = ['nx', 'ny', 'x', 'y'] # etc...
data = data.drop(redundant, axis=1)
# Important - reindex data
data.index = range(len(data)) # 0, 1, 2, ..., len(data)
Now data contains all my coding information, nx all my x-coordinate information, and ny my y coordinate information.
nx.head()
0 1 2 3 4 5 6 7 8 9 ... 91
0 0 0 0 0 0 0 0 0 0 0.00000 ... 0.953960
1 0 0 0 0 0 0 0 0 0 0.00099 ... 1.000000
2 0 0 0 0 0 0 0 0 0 0.00000 ... 1.010000
3 0 0 0 0 0 0 0 0 0 0.00000 ... 0.870396
4 0 0 0 0 0 0 0 0 0 0.00000 ... 1.000000
92 93 94 95 96 97 98 99 100
0 0.993564 1.000000 1.000000 1.00000 1.000000 1 1 1 1
1 1.000000 1.000000 1.000000 1.00000 1.000000 1 1 1 1
2 1.010000 1.008812 1.003960 1.00000 1.000000 1 1 1 1
3 0.906238 0.936931 0.973564 0.98604 0.993366 1 1 1 1
4 1.000000 1.000000 1.000000 1.00000 1.000000 1 1 1 1
[5 rows x 101 columns]
Finally, to select specific subsets of the x and y data, according to coding variables stored in data, I just take the index of the relevant subset of data
subject1_index = data[data.subject==1].index
print subject1_index
Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
36, 37, 38, 39], dtype='int64')
and select a matching subset of nx and ny using the iloc method.
sub1_x = nx.iloc[subject1_index]
sub1_y = ny.iloc[subject1_index]
foriin subject1_index:
plt.plot(nx.iloc[i], ny.iloc[i], 'r', alpha=.3)
plt.plot(sub1_x.mean(), sub1_y.mean(), 'r', linewidth=2)

EDIT: For completeness, note that a lot of my analysis requires long format data (and is carried out in R). Again, there may be a more elegant way of doing this (so use at your own risk!), but my code goes (note, this is real code, from a different dataset, and I haven't bothered to change the variable names to match the original example):
# Long format data
wide_data = data.copy()
steps = nx.columns
for i in steps:
wide_data['nx_%i' % i] = nx[i]
wide_data['ny_%i' % i] = ny[i]
id_vars = ['subject_nr', 'condition', 'count_trial_sequence',
'trial_id', 'choice', 'accuracy']
# Long data with 'nx' as the variable
long_data = pd.melt(wide_data, id_vars=id_vars, value_vars = ['nx_%i' % i for i in steps])
long_data['step'] = long_data.variable.map(lambda s: int(s[3:]))
long_data['nx'] = long_data.value
# Same with 'ny'
tmp_long = pd.melt(wide_data, id_vars=id_vars, value_vars = ['ny_%i' % i for i in steps])
# Combine in single data frame
long_data['ny'] = tmp_long['value']
del tmp_long
long_data = long_data.drop(['variable', 'value'], axis=1)
long_data.to_csv(os.path.join('data', 'long_data.csv'))
long_data.head()
Out[41]:
subject_nr condition count_trial_sequence trial_id choice accuracy
0505250022 A 013 rsp1 True1505250022 A 116 rsp1 True2505250022 B 22 rsp2 False3505250022 B 30 rsp1 False4505250022 C 433 rsp2 False
step nx ny
00001000200030004000
Post a Comment for "Coding Variables With Pandas Timeseries"