Tuesday, August 26, 2014

%sql: To Pandas and Back

A Pandas DataFrame has a nice to_sql(table_name, sqlalchemy_engine) method that saves itself to a database.

The only trouble is that coming up with the SQLAlchemy Engine object is a little bit of a pain, and if you're using the IPython %sql magic, your %sql session already has an SQLAlchemy engine anyway. So I created a bogus PERSIST pseudo-SQL command that simply calls to_sql with the open database connection:

%sql PERSIST mydataframe

The result is that your data can make a very convenient round-trip from your database, to Pandas and whatever transformations you want to apply there, and back to your database:

In [1]: %load_ext sql

In [2]: %sql postgresql://@localhost/
Out[2]: u'Connected: @'

In [3]: ohio = %sql select * from cities_of_ohio;
246 rows affected.

In [4]: df = ohio.DataFrame()

In [5]: montgomery = df[df['county']=='Montgomery County']

In [6]: %sql PERSIST montgomery
Out[6]: u'Persisted montgomery'

In [7]: %sql SELECT * FROM montgomery
11 rows affected.
[(27L, u'Brookville', u'5,884', u'Montgomery County'),
 (54L, u'Dayton', u'141,527', u'Montgomery County'),
 (66L, u'Englewood', u'13,465', u'Montgomery County'),
 (81L, u'Germantown', u'6,215', u'Montgomery County'),
 (130L, u'Miamisburg', u'20,181', u'Montgomery County'),
 (136L, u'Moraine', u'6,307', u'Montgomery County'),
 (157L, u'Oakwood', u'9,202', u'Montgomery County'),
 (180L, u'Riverside', u'25,201', u'Montgomery County'),
 (210L, u'Trotwood', u'24,431', u'Montgomery County'),
 (220L, u'Vandalia', u'15,246', u'Montgomery County'),
 (230L, u'West Carrollton', u'13,143', u'Montgomery County')]