Use the pipe method for a fluent pandas API
`pipe` is a DataFrame method that accepts a function. By default, `pipe` assumes the first argument of this function is a DataFrame and passes the current DataFrame to it, so each step receives the result of the previous one.
The function should also return a DataFrame if you want to continue chaining. It can return any other value, but only if it is the last step in the chain.
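As a minimal sketch of a chain whose last step returns something other than a DataFrame: the helper names `keep_above` and `mean_of` and the small `fruits` frame below are made up for illustration and are not part of pandas.

```python
import pandas as pd

# Small illustrative frame (assumed data, not the article's sample DataFrame)
fruits = pd.DataFrame({"fruit": ["Banana", "Orange", "Apple"], "kcal": [89, 47, 52]})

def keep_above(frame, col, threshold):
    """Intermediate step: returns a DataFrame, so chaining can continue."""
    return frame[frame[col] > threshold]

def mean_of(frame, col):
    """Final step: returns a plain float, so the chain has to end here."""
    return float(frame[col].mean())

avg_kcal = (
    fruits
    .pipe(keep_above, col="kcal", threshold=50)
    .pipe(mean_of, col="kcal")
)
print(avg_kcal)  # 70.5 (mean of 89 and 52)
```

Appending another `.pipe(...)` after `mean_of` would fail, because a float has no `pipe` method.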
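This is valuable because a chain reads top to bottom in the order the steps actually run. Compare that with SQL, where the written order of the clauses (SELECT before FROM and WHERE) does not match the order in which they are evaluated, or with nested function calls, which have to be read from the inside out.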
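To make the reading-order point concrete, here is a small sketch comparing nested calls with the equivalent `pipe` chain. The helpers `drop_low_kcal` and `upper_names` are hypothetical names chosen for this example.

```python
import pandas as pd

fruits = pd.DataFrame({"fruit": ["Banana", "lemon", "lime"], "kcal": [89, 15, 30]})

def drop_low_kcal(frame, minimum=20):
    """Keep rows whose kcal is at least `minimum`."""
    return frame[frame["kcal"] >= minimum]

def upper_names(frame):
    """Return a copy with the fruit names upper-cased."""
    out = frame.copy()
    out["fruit"] = out["fruit"].str.upper()
    return out

# Nested calls: the step that runs first is written last (inside out).
nested = upper_names(drop_low_kcal(fruits, minimum=20))

# pipe chain: steps are written in the order they run.
chained = (
    fruits
    .pipe(drop_low_kcal, minimum=20)
    .pipe(upper_names)
)

print(chained.equals(nested))  # True
```

Both forms compute the same result; the chain only changes how the steps are laid out on the page.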
Create a sample dataframe
# Import modules
import pandas as pd
# Example dataframe
raw_data = {
    'fruit': ['Banana', 'Orange', 'Apple', 'lemon', 'lime', 'plum'],
    'color': ['yellow', 'orange', 'red', 'yellow', 'green', 'purple'],
    'kcal': [89, 47, 52, 15, 30, 28],
    'size_cm': [20, 10, 9, 7, 5, 4],
}
df = pd.DataFrame(raw_data, columns=['fruit', 'color', 'kcal', 'size_cm'])
df
| | fruit | color | kcal | size_cm |
---|---|---|---|---|
0 | Banana | yellow | 89 | 20 |
1 | Orange | orange | 47 | 10 |
2 | Apple | red | 52 | 9 |
3 | lemon | yellow | 15 | 7 |
4 | lime | green | 30 | 5 |
5 | plum | purple | 28 | 4 |
Define a function that adds a constant to a column, then chain it with pipe
def add_to_col(df, col='kcal', n=200):
    # A DataFrame is mutable, so work on a copy to avoid modifying the caller's data
    ret = df.copy()
    ret[col] = ret[col] + n
    return ret
(df
 .pipe(add_to_col)                       # adds 200 to kcal
 .pipe(add_to_col, col='size_cm', n=10)  # adds 10 to size_cm
 .head(5)
)
| | fruit | color | kcal | size_cm |
---|---|---|---|---|
0 | Banana | yellow | 289 | 30 |
1 | Orange | orange | 247 | 20 |
2 | Apple | red | 252 | 19 |
3 | lemon | yellow | 215 | 17 |
4 | lime | green | 230 | 15 |
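`pipe` only assumes by default that the DataFrame is the function's first argument. When a function expects the data elsewhere, `pipe` also accepts a `(callable, data_keyword)` tuple naming the keyword that should receive the DataFrame. The sketch below assumes a hypothetical helper, `scale_col`, whose DataFrame parameter is called `frame` and comes second.

```python
import pandas as pd

def scale_col(factor, frame, col="kcal"):
    """Hypothetical helper: the DataFrame is NOT the first argument."""
    out = frame.copy()  # copy so the input frame is left untouched
    out[col] = out[col] * factor
    return out

fruits = pd.DataFrame({"fruit": ["Banana", "Orange"], "kcal": [89, 47]})

# The tuple tells pipe to pass the DataFrame as the keyword argument `frame`;
# the remaining positional argument (2) becomes `factor`.
scaled = fruits.pipe((scale_col, "frame"), 2)

print(scaled["kcal"].tolist())  # [178, 94]
```

The original `fruits` frame is unchanged, because `scale_col` works on a copy.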