Writing data to a sql database with pandas
When manipulating you data using pandas, it is sometimes useful to store a dataframe. Pandas provides multiple ways to export dataframes. The most common consist in exporting to a csv, a pickle, to hdf or to excel. However, exporting to a sql database can prove very useful. Indeed, having a well structured database is a great for storing all the data related to your analysis in one place.
In this tutorial, we will see how to store a dataframe in a new table of a sqlite dataframe. Note than it would also work with any other sql database a long as you change the connxion to the one that suits your needs.
First let's import pandas and sqlite3
import pandas as pd
import sqlite3
from sqlite3 import Error
# Example dataframe
raw_data = {'fruit': ['Banana', 'Orange', 'Apple', 'lemon', "lime", "plum"],
'color': ['yellow', 'orange', 'red', 'yellow', "green", "purple"],
'kcal': [89, 47, 52, 15, 30, 28]
}
df = pd.DataFrame(raw_data, columns = ['fruit', 'color', 'kcal'])
df
fruit | color | kcal | |
---|---|---|---|
0 | Banana | yellow | 89 |
1 | Orange | orange | 47 |
2 | Apple | red | 52 |
3 | lemon | yellow | 15 |
4 | lime | green | 30 |
5 | plum | purple | 28 |
Now that the DataFrame has been created, let's push it to the sqlite database called mydatabase.db in a new table called tutyfrutty
db_file = ".\\mydatabase.db"
try :
with sqlite3.connect(db_file) as conn:
df.to_sql("tutyfrutty", conn)
except Error as e:
print(e)
except ValueError :
print("The TABLE tutyfrutty already exists, read below to understand how to handle this case")
Note that if the table tutyfrutty was already existing, the to_sql function will return a ValueError.
This is where, the if_exists option comes into play. Let's look at the docstring of this function :
"""
if_exists : {'fail', 'replace', 'append'}, default 'fail'
- fail: If table exists, do nothing.
- replace: If table exists, drop it, recreate it, and insert data.
- append: If table exists, insert data. Create if does not exist.
"""
Let's say, I want to update my dataframe with some new rows
df.loc[len(df)+1] = ['Cranberry', 'red', 308]
df
fruit | color | kcal | |
---|---|---|---|
0 | Banana | yellow | 89 |
1 | Orange | orange | 47 |
2 | Apple | red | 52 |
3 | lemon | yellow | 15 |
4 | lime | green | 30 |
5 | plum | purple | 28 |
7 | Cranberry | red | 308 |
8 | Cranberry | red | 308 |
I can now replace the table with the new values using the "replace" option
db_file = ".\\mydatabase.db"
try :
with sqlite3.connect(db_file) as conn:
df.to_sql("tutyfrutty", conn, if_exists="replace")
except Error as e:
print(e)