Filter or select rows of a DataFrame containing values in a list
In this article we will learn to filter the rows of a DataFrame based on the values contained in one of its columns. This is similar to the "Filter" functionality of Excel. Let's first create our DataFrame:
# Import modules
import pandas as pd
# Example dataframe
raw_data = {
    'fruit': ['Banana', 'Orange', 'Apple', 'lemon', 'lime', 'plum'],
    'color': ['yellow', 'orange', 'red', 'yellow', 'green', 'purple'],
    'kcal': [89, 47, 52, 15, 30, 28],
}
df = pd.DataFrame(raw_data, columns=['fruit', 'color', 'kcal'])
df
| | fruit | color | kcal |
|---|---|---|---|
| 0 | Banana | yellow | 89 |
| 1 | Orange | orange | 47 |
| 2 | Apple | red | 52 |
| 3 | lemon | yellow | 15 |
| 4 | lime | green | 30 |
| 5 | plum | purple | 28 |
If we want to extract all the rows where the value of the color column is yellow, we would proceed like so:
df[df["color"] == "yellow"]
| | fruit | color | kcal |
|---|---|---|---|
| 0 | Banana | yellow | 89 |
| 3 | lemon | yellow | 15 |
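
To make it clearer why this works, here is a minimal sketch that splits the one-liner into two steps. The names mask and yellow_fruits are illustrative only and do not come from the original example. The comparison produces a boolean Series, and indexing the DataFrame with that Series keeps the rows marked True.

```python
import pandas as pd

# Same example data as above, rebuilt so the snippet runs on its own
df = pd.DataFrame({
    "fruit": ["Banana", "Orange", "Apple", "lemon", "lime", "plum"],
    "color": ["yellow", "orange", "red", "yellow", "green", "purple"],
    "kcal": [89, 47, 52, 15, 30, 28],
})

# Comparing a column to a value yields a boolean Series, one entry per row
# ("mask" is an illustrative name, not part of the original example)
mask = df["color"] == "yellow"
print(mask.tolist())  # [True, False, False, True, False, False]

# Indexing the DataFrame with the mask keeps only the rows marked True
yellow_fruits = df[mask]
print(yellow_fruits)
```

Note that the original index labels (0 and 3) are preserved in the filtered result.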
Now, if we want to filter the DataFrame by a list of values, we can use the isin method instead, like this:
df[df["color"].isin(["yellow", "red"])]
| | fruit | color | kcal |
|---|---|---|---|
| 0 | Banana | yellow | 89 |
| 2 | Apple | red | 52 |
| 3 | lemon | yellow | 15 |
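
As a small extension, the sketch below stores the list in a variable and also shows the opposite filter. The names wanted_colors, in_list, selected and excluded are hypothetical and chosen for illustration. Since isin returns a boolean Series, it can be inverted with the ~ operator to keep the rows whose value is not in the list.

```python
import pandas as pd

# Same example data as above, rebuilt so the snippet runs on its own
df = pd.DataFrame({
    "fruit": ["Banana", "Orange", "Apple", "lemon", "lime", "plum"],
    "color": ["yellow", "orange", "red", "yellow", "green", "purple"],
    "kcal": [89, 47, 52, 15, 30, 28],
})

# Hypothetical variable holding the values we want to keep
wanted_colors = ["yellow", "red"]

# isin returns a boolean Series: True where the color is in the list
in_list = df["color"].isin(wanted_colors)

# Rows whose color is in the list
selected = df[in_list]
print(selected)

# Rows whose color is NOT in the list (~ inverts the boolean Series)
excluded = df[~in_list]
print(excluded)
```

Keep in mind that the matching is exact and case-sensitive, so "Yellow" would not match "yellow".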