Reference Dataframe Value Corresponding To Column Header
Solution 1:
We can use numpy indexing as recommended by the documentation on Looking up values by index/column labels as the replacement for the deprecated DataFrame.lookup
.
With factorize
Select
and reindex
:
idx, cols = pd.factorize(df['Select'])
df['value'] = (
df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]
)
Notice 1: If there is a value in the factorized column that does not correspond to a column header, the resulting value will be
NaN
(which indicates missing Data).Notice 2: Both indexers need to be 0 based range indexes (compatible with numpy indexing).
np.arange(len(df))
creates the range index based on the length of the DataFrame and therefore works in all cases.
However, if the DataFrame already has a compatible index (like in this example) df.index
can be used directly.
idx, cols = pd.factorize(df['Select'])
df['value'] = (
df.reindex(cols, axis=1).to_numpy()[df.index, idx]
)
df
:
Area 1234Selectvalue0225433462342314536543214136267342911143113543519224521942127394322343
Another option is Index.get_indexer
:
df['value'] = df.to_numpy()[
df.index.get_indexer(df.index),
df.columns.get_indexer(df['Select'])
]
- Notice: The same condition as above applies, if
df.index
is already a contiguous 0-Based index (compatible with numpy indexing) we can usedf.index
directly, instead of processing it withIndex.get_indexer
:
df['value'] = df.to_numpy()[
df.index,
df.columns.get_indexer(df['Select'])
]
df
:
Area 1234Selectvalue0225433462342314536543214136267342911143113543519224521942127394322343
Warning For get_indexer
: if there is a value in Select
that does not correspond to a column header the return value is -1
which will return the value from the last column in the DataFrame (since python supports negative indexing relative to the end). This is not as safe as NaN
since it will return a value from the Select
Column which is numeric and it may be difficult to tell the Data is invalid immediately.
Sample Program:
import pandas as pd
df = pd.DataFrame({
'Select': ['B', 'A', 'C', 'D'],
'A': [47, 2, 51, 95],
'B': [56, 88, 10, 56],
'C': [70, 73, 59, 56]
})
df['value'] = df.to_numpy()[
df.index,
df.columns.get_indexer(df['Select'])
]
print(df)
Notice in the last row the Select column is D
but it pulls the value from C
which is the last column in the DataFrame (-1). This is not immediately apparent that the lookup failed/is incorrect.
Select AB C value
0B475670561A2887322 C 511059593 D 95565656 # <- Value from C
Compared with factorize
:
idx, cols = pd.factorize(df['Select'])
df['value'] = (
df.reindex(cols, axis=1).to_numpy()[df.index, idx]
)
Notice in the last row the Select column is D
and the corresponding value is NaN which is used in pandas to indicate missing data.
Select AB C value
0B47567056.01A288732.02 C 51105959.03 D 955656 NaN # <- Missing Data
Setup and imports:
import numpy as np # (Only needed is using np.arange)import pandas as pd
df = pd.DataFrame({
'Area': [22, 45, 67, 54, 21],
1: [54, 36, 34, 35, 27],
2: [33, 54, 29, 19, 39],
3: [46, 32, 11, 22, 43],
4: [23, 14, 14, 45, 22],
'Select': [4, 1, 3, 2, 3]
})
Solution 2:
Try .apply
with axis=1
. Inside the lambda
you can use value from Select
column to reference the value:
df["Value"] = df.apply(lambda x: x[x["Select"]], axis=1)
print(df)
Prints:
Area 1 2 3 4 Select Value
0 22 54 33 46 23 4 23
1 45 36 54 32 14 1 36
2 67 34 29 11 14 3 11
3 54 35 19 22 45 2 19
4 21 27 39 43 22 3 43
Post a Comment for "Reference Dataframe Value Corresponding To Column Header"