Select rows from a Pandas DataFrame based on values in a column


Import modules

import pandas as pd
                

Create some dummy data

raw_data = {'name': ['Willard Morris', 'Al Jennings', 'Omar Mullins', 'Spencer McDaniel'],
            'age': [20, 19, 22, 21],
            'favorite_color': ['blue', 'blue', 'yellow', 'green'],
            'grade': [88, 92, 95, 70]}

df = pd.DataFrame(raw_data)
df.head()
   age favorite_color  grade              name
0   20           blue     88    Willard Morris
1   19           blue     92       Al Jennings
2   22         yellow     95      Omar Mullins
3   21          green     70  Spencer McDaniel

Select rows based on column value:

# To select rows whose column value equals a scalar, use ==:
df.loc[df['favorite_color'] == 'yellow']
   age favorite_color  grade          name
2   22         yellow     95  Omar Mullins
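
It can help to look at the boolean mask that == produces before it is passed to .loc. The following is a minimal sketch that rebuilds the dummy data so it runs on its own. The names mask and yellow_rows are introduced here for illustration and are not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Willard Morris', 'Al Jennings', 'Omar Mullins', 'Spencer McDaniel'],
    'age': [20, 19, 22, 21],
    'favorite_color': ['blue', 'blue', 'yellow', 'green'],
    'grade': [88, 92, 95, 70],
})

# Comparing a column to a scalar gives a boolean Series, one entry per row
mask = df['favorite_color'] == 'yellow'
print(mask.tolist())  # [False, False, True, False]

# .loc keeps only the rows where the mask is True
yellow_rows = df.loc[mask]
print(yellow_rows['name'].tolist())  # ['Omar Mullins']
```

Storing the mask in a variable is optional, but it makes the condition easy to inspect or reuse.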

Select rows whose column value is in an iterable array:

# To select rows whose column value is in an iterable (here a list named array), use isin:
array = ['yellow', 'green']
df.loc[df['favorite_color'].isin(array)]
   age favorite_color  grade              name
2   22         yellow     95      Omar Mullins
3   21          green     70  Spencer McDaniel
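
As a rough check on what isin does, the sketch below compares it with the longer form that chains == comparisons with | (or). It also passes a set instead of a list, since isin accepts either. The names colors, in_colors and either are assumptions made for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Willard Morris', 'Al Jennings', 'Omar Mullins', 'Spencer McDaniel'],
    'age': [20, 19, 22, 21],
    'favorite_color': ['blue', 'blue', 'yellow', 'green'],
    'grade': [88, 92, 95, 70],
})

# isin accepts a set as well as a list
colors = {'yellow', 'green'}
in_colors = df.loc[df['favorite_color'].isin(colors)]

# The same selection written as two == comparisons joined with |
either = df.loc[(df['favorite_color'] == 'yellow') | (df['favorite_color'] == 'green')]

print(in_colors.equals(either))  # True
```

With more than two or three candidate values, isin is usually the easier form to read.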

Select rows based on multiple column conditions:

# To select rows based on multiple conditions, combine them with & and wrap each comparison in parentheses:
array = ['yellow', 'green']
df.loc[(df['age'] == 21) & df['favorite_color'].isin(array)]
   age favorite_color  grade              name
3   21          green     70  Spencer McDaniel
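
The parentheses matter because & and | bind more tightly than comparison operators such as == and > in Python. The sketch below shows & (and) next to | (or) on the same dummy data. The conditions and the names high_and_blue and high_or_green are chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Willard Morris', 'Al Jennings', 'Omar Mullins', 'Spencer McDaniel'],
    'age': [20, 19, 22, 21],
    'favorite_color': ['blue', 'blue', 'yellow', 'green'],
    'grade': [88, 92, 95, 70],
})

# & keeps rows where both conditions are True
high_and_blue = df.loc[(df['grade'] > 90) & (df['favorite_color'] == 'blue')]
print(high_and_blue['name'].tolist())  # ['Al Jennings']

# | keeps rows where at least one condition is True
high_or_green = df.loc[(df['grade'] > 90) | (df['favorite_color'] == 'green')]
print(high_or_green['name'].tolist())  # ['Al Jennings', 'Omar Mullins', 'Spencer McDaniel']
```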

Select rows where column does not equal a value:

# To select rows whose column value does not equal a scalar, use !=:
df.loc[df['favorite_color'] != 'yellow']
   age favorite_color  grade              name
0   20           blue     88    Willard Morris
1   19           blue     92       Al Jennings
3   21          green     70  Spencer McDaniel
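
The same operator works on numeric columns. As a small sketch, the example below splits the dummy data on age with == and != and checks that the two selections together cover every row exactly once. The names is_21 and not_21 are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Willard Morris', 'Al Jennings', 'Omar Mullins', 'Spencer McDaniel'],
    'age': [20, 19, 22, 21],
    'favorite_color': ['blue', 'blue', 'yellow', 'green'],
    'grade': [88, 92, 95, 70],
})

# Rows where age equals 21, and rows where it does not
is_21 = df.loc[df['age'] == 21]
not_21 = df.loc[df['age'] != 21]

print(is_21['name'].tolist())   # ['Spencer McDaniel']
print(not_21['name'].tolist())  # ['Willard Morris', 'Al Jennings', 'Omar Mullins']

# On this data (no missing values) the two selections cover every row once
print(len(is_21) + len(not_21) == len(df))  # True
```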

Select rows whose column value is not in an iterable array:

# To select rows whose column value is not in an iterable, put ~ in front of the isin condition:
array = ['yellow', 'green']
df.loc[~df['favorite_color'].isin(array)]
   age favorite_color  grade            name
0   20           blue     88  Willard Morris
1   19           blue     92     Al Jennings
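
The ~ operator inverts any boolean mask, not only one produced by isin. The sketch below first shows the inverted isin mask and then applies ~ to a combined condition. The names in_array, not_in, combined and rest are assumptions made for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Willard Morris', 'Al Jennings', 'Omar Mullins', 'Spencer McDaniel'],
    'age': [20, 19, 22, 21],
    'favorite_color': ['blue', 'blue', 'yellow', 'green'],
    'grade': [88, 92, 95, 70],
})

array = ['yellow', 'green']

# ~ flips every True to False and every False to True
in_array = df['favorite_color'].isin(array)
print((~in_array).tolist())  # [True, True, False, False]

not_in = df.loc[~in_array]
print(not_in['name'].tolist())  # ['Willard Morris', 'Al Jennings']

# ~ can also negate a combined condition; wrap the whole expression in parentheses
combined = (df['age'] == 21) & in_array
rest = df.loc[~combined]
print(rest.index.tolist())  # [0, 1, 2]
```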



Copyright © Erik Rood 2020