Friday, May 03, 2013

ipython-sql for multi-database comparisons

For my newest ipython-sql trick, I needed to compare some queries run across different databases. How hard would it be to get side-by-side results into tidy IPython Notebook output?

Not hard at all, it turns out, if you're willing to violate basic principles of human decency.

That's an itty-bitty image, so here's the crazy part:

class SQL_Comparitor(object):
    def __init__(self, *connection_strings):
        self.connection_strings = connection_strings
    def run(self, qry):
        dframes = []
        for connection_string in self.connection_strings:
            result = %sql $connection_string $qry

Did you catch that? I used %sql magic and IPython variable substitution inside an instance method. It feels so wrong! But it works! Provided you're running within IPython, of course; normal Python will not under any circumstances run an unholy perlish abomination like this. I'm just really amazed that we can use IPython tricks inside class definitions, but it's real.

Since the result is a Pandas DataFrame, it's easy to apply transformations. For instance, say you only want the rows where the values are different:

diff = results[results['svr1/db1_Value'] != results['svr2/db2_Value']]

I'm not sure how to distribute this class, since it's small and it's not actually valid Python, just valid IPython. For now I've made a gist (and its nbviewer version).


stonebig said...

Hi Catherine,

It is awesome, magic, and simple : thank you very very much !

If I dear ask : would you have another example for the reverse case : feeding a sql server from a pandas dataframe ?


Unknown said...

For the reverse direction, you should probably use calchipan or pandasql.

If you really want, you can do it with %sql with something like

%sql INSERT INTO my_table (col1, col2) VALUES '$col1', '$col2'

... except then you need to worry about escaping single-quotes in your values, and so forth.