Writing data to a sql database with pandas

When manipulating you data using pandas, it is sometimes useful to store a dataframe. Pandas provides multiple ways to export dataframes. The most common consist in exporting to a csv, a pickle, to hdf or to excel. However, exporting to a sql database can prove very useful. Indeed, having a well structured database is a great for storing all the data related to your analysis in one place.

In this tutorial, we will see how to store a dataframe in a new table of a sqlite dataframe. Note than it would also work with any other sql database a long as you change the connxion to the one that suits your needs.

First let's import pandas and sqlite3

import pandas as pd
import sqlite3
from sqlite3 import Error
# Example dataframe

raw_data  = {'fruit': ['Banana', 'Orange', 'Apple', 'lemon', "lime", "plum"], 
        'color': ['yellow', 'orange', 'red', 'yellow', "green", "purple"], 
        'kcal': [89, 47, 52, 15, 30, 28]
    }

df = pd.DataFrame(raw_data, columns = ['fruit', 'color', 'kcal'])
df
fruit color kcal
0 Banana yellow 89
1 Orange orange 47
2 Apple red 52
3 lemon yellow 15
4 lime green 30
5 plum purple 28

Now that the DataFrame has been created, let's push it to the sqlite database called mydatabase.db in a new table called tutyfrutty

db_file = ".\\mydatabase.db"
try :
    with sqlite3.connect(db_file) as conn:
        df.to_sql("tutyfrutty", conn)
except Error as e:
    print(e)
except ValueError :
    print("The TABLE tutyfrutty already exists, read below to understand how to handle this case")

Note that if the table tutyfrutty was already existing, the to_sql function will return a ValueError.

This is where, the if_exists option comes into play. Let's look at the docstring of this function :

"""
if_exists : {'fail', 'replace', 'append'}, default 'fail'
    - fail: If table exists, do nothing.
    - replace: If table exists, drop it, recreate it, and insert data.
    - append: If table exists, insert data. Create if does not exist.
"""

Let's say, I want to update my dataframe with some new rows

df.loc[len(df)+1] = ['Cranberry', 'red', 308]
df
fruit color kcal
0 Banana yellow 89
1 Orange orange 47
2 Apple red 52
3 lemon yellow 15
4 lime green 30
5 plum purple 28
7 Cranberry red 308
8 Cranberry red 308

I can now replace the table with the new values using the "replace" option

db_file = ".\\mydatabase.db"
try :
    with sqlite3.connect(db_file) as conn:
        df.to_sql("tutyfrutty", conn, if_exists="replace")
except Error as e:
    print(e)