Catherine: pyOraGeek: 2011

Thursday, October 27, 2011

MySQL Bizarro World

Getting used to MySQL has been a real challenge for me. Most everything I know about databases is backward in this MySQL world.

In the REAL WORLD,
table names are case-insensitive.

In MySQL WORLD,
table names are case-sensitive. Maybe. Depending on what platform you're running on.

in the REAL WORLD,
queries against VARCHAR data are case-sensitive.

in the MySQL WORLD,
queries against VARCHAR data are case-insensitive. (So is every other use of the data. Which can cause immeasurable pain w/r/t UNIQUE constraints if you weren't prepared for it.)

in the REAL WORLD,
database connections are expensive, and complex queries are optimized well. If you can connect to the database once and issue a single complex query, you'll get much better performance than if you make repeated connections and issue large numbers of simple queries.

in the MySQL WORLD,
database connections are cheap, and complex queries are optimized badly. If you make repeated connections and issue large numbers of simple queries, you'll get much better performance than if you connect to the database once and issue a single complex query.

I think the conclusion is obvious: MySQL was written by programmers from Superman's Bizarro World.

Incidentally, the Dayton Oracle User Group is planning a MySQL-themed meeting in the mid-term future. If you'd like to get involved - as an attendee or a speaker - let me know!

Hello!

Thursday, October 20, 2011

HTSQL answers

HTSQL slides

Thanks to the great audiences at my HTSQL talks at Ohio Linuxfest and Dayton IEEE! (And to the folks who will come see me at Columbus Code Camp on Saturday.) I've promised you answers to some of your questions that stumped me, and (for OLF people) been criminally slow at getting them to you. So here you are! Some of the answers come from my own research, but I've also inserted quotes directly from the creator of HTSQL, Clark Evans... the embedded quotes are from Clark.

1. How do you restrict access via HTSQL?

http://htsql.org/doc/install.html#security

First, consider carefully which database user account you use to run htsql-ctl serve, and assign only the rights that user (representing your HTSQL users) should legitimately have.

Second, you can (and probably should) close down port 8080 (or whatever port you're serving HTSQL on) on your machine's firewall, and route all traffic through a webserver like Apache. (My HTRAF directions tell how to do that.) Then you can apply whatever authentication, IP limits, etc. you need at the webserver level.

If you need multiple groups to access your data with varying levels of permission, it's easy to run multiple instances of HTSQL as multiple
database users, route those instances through Apache, and restrict them at the Apache level appropriately.

This is a great answer. A few more items:

If the database is static (updated periodically), want to put varnish or something on the front. When you make another "data push" you could run though common queries to warm the cache. This is what we do for demo.htsql.org so that queries in our tutorial don't even hit the server.

For PostgreSQL, there is also a ``select timeout`` you can set using the "tweak.timeout" plugin, it can help a little bit with accidental denial of service. Basically, it cancels a query if your query runs over a particular number of seconds. If other databases have this ability, we could add a similar feature.

There is a also a ``autolimit`` that you can apply, this adds a LIMIT X to every query. In the current HTSQL implementation, all the results have to fit into memory: so, you can kill the backend process by creating a large result set. We'll fix this problem sometime next year... if you have a friendly audience, this generally isn't a problem. This plugin helps ensure users don't "accidently" create a big result though.

Even with these two enabled, you can still make queries that bring down either the HTSQL server (via memory exhaustion on big result) or the Database (via memory or cpu denial). So, some caution is advised if you give *direct* HTSQL access since you're letting arbitrary queries be created and such.

One solid way to handle this is separate the "trusted" users who need to create queries from "untrusted" users who are just running canned reports and dashboards. There's a "ssi" demo for doing this and we'll improve on it later. Basically, you have .htsql files server side with canned queries in them. You then limit users to only access .htsql saved queries. It'd be great to have this more automated... the demo code is just that: a proof of concept.

2. How can you paginate results?

You can request a "page" of results with HTSQL's limit() function. The optional second argument is an offset:

http://demo.htsql.org/course.limit(10)
http://demo.htsql.org/course.limit(10,10)
http://demo.htsql.org/course.limit(10,20)

There is a tweak.autolimit available to keep users from killing off their browsers with mistakenly broad queries. For example:

htsql-ctl -E tweak.autolimit:limit=10 serve pgsql://user:pwd@host/database

Users may also want to consider a browser like Chrome, where a runaway tab won't lock up the entire browser.

There's no way currently to have HTML results automatically insert "Next Page"-type links. Keep in mind that users aren't likely to genuinely want to page manually through very long result sets anyway; they'd probably be better off narrowing their queries rather than searching long lists by eyeball.

There are plans for HTRAF to generate automatically paginated tables at some point in the future.

3. Performance-wise, how does HTSQL respond under intense loads?

The best way to perform under intense loads is to not perform at all (see varnish above). The HTSQL server is also stateless: you can load balance as many copies as you need to meet demand.

As far as a single-process goes, the time spent converting HTSQL-to-SQL isn't really that significant compared to the query execution. The result handling isn't bad... although we've not done any load testing.

The SQL generation is probably the most important part. I think HTSQL generates quite clear/clean SQL, although your mileage may vary. To make any real judgments you need test queries and test data and profile it. Often times a 30 line SQL query that looks a bit ugly/repetitive will outperform a "hand-optimized" SQL query that is 12 lines.

We (Prometheus Research) [would] be delighted to help figure out why HTSQL performs badly if you have a specific query and test data set for us!

Anecdotally, for big (1-3 page) queries we have converted from SQL to HTSQL, the HTSQL equivalent is ~40% less source code and an order of magnitude more readable and maintainable. The SQL we then generate is typically bigger, sometimes 2x the size, but so far, the performance in the samples we have has been as-good-if-not-better than the original.

4. What about outer joins in HTSQL?

All HTSQL joins are LEFT OUTER joins - rows from the "driving" table are always included in the results, whether or not there are also records in the joined tables... it's a natural consequence of the driving table always determining the size of the result set. (If you specifically want to exclude rows that don't have counterparts in the joined tables, you can use ?exists(joined-table). More about that here.

Wednesday, October 19, 2011

Columbus Code Camp

I'll be speaking on HTSQL this Saturday at Columbus Code Camp. It was a great event last year, and I'm expecting even better this year. Check out the schedule - it's an excellent set of "I gotta see that!" topics.

Hope to see you there! (Unless you're at Southwest Ohio GiveCamp, of course - sorry I have to miss my GiveCamp friends this year.)

Sunday, September 11, 2011

HTRAF setup, from zero

As I mentioned in my OLF talk, setting up HTSQL is crazy-easy.

HTRAF, however - the library that lets you do the gorgeous graphics - takes a few more steps. The documentation doesn't spell out the webserver-specific aspects of the setup, which may confuse you if you aren't an experienced webserver admin. So, here's my expanded version "getting started with HTRAF". My directions assume an Ubuntu machine.

1.

sudo easy_install htsql

It doesn't have to be on the same machine you use for HTRAF, just a machine that your HTRAF machine can contact.

2. Start up HTSQL:

htsql-ctl serve postgres://username:password@hostname/databasename

3. On your HTRAF machine, install a web server.

sudo apt-get install apache2

4. Get the HTRAF library. The simplest thing is to put it right under your Apache DocumentRoot:


sudo su - www-data
cd /var/www/
wget http://htsql.org/download/HTRAF-latest.zip
unzip HTRAF-latest.zip

5. You need to handle HTSQL requests via your Apache server. (If you try contacting the HTSQL server directly from your webpages, users' browsers are likely to block you, thinking that the site includes a cross-site scripting attack.)

So you'll need to change your Apache server configuration by adding ProxyPass and ProxyPassReverse directives. Apache configuration files are structured differently on different distributions; on mine, I used

gksudo gedit /etc/apache2/sites-enabled/000-default

to add


    ProxyPass /htsql/ http://localhost:8080/
    ProxyPassReverse /htsql/ http://localhost:8080/

just within the <VirtualHost *:80> directive.

6. Next, you need to enable mod_proxy on your Apache, so that it knows what to do with a ProxyPass.


cd /etc/apache2
sudo cp mods-available/proxy_*.* mods-enabled/

7. Now restart Apache so that the new settings will take effect.

sudo service apache2 restart

8. Test it out! Hit this with your web browser:
http://localhost:8080/htsql/a_table_from_your_database

9. Now write a webpage that includes HTRAF elements calling HTSQL! Here's a minimal example.


<html>
<head>
     <script type="text/javascript"
     src="HTRAF-2.0.0b1/htraf/htraf.js"
     data-htsql-version="2"
     data-htsql-prefix="/htsql">
     </script>
    <link rel="stylesheet" type="text/css" 
        href="HTRAF-2.0.0b1/htraf/htraf-02.css"/>
</head>
<body>
<select id="school"
    data-htsql="/school{code, name}?exists(department)">
    </select>
<h3>Departments</h3>
<table id="department" data-hide-column-0="yes"
    data-htsql="/department{code, name,
                     count(course) :as '%23 of courses'}
         ?school_code=$school&name~$department_name"
    data-ref="school department_name">
</table>
</body>
</html>

Save it as /var/www/minimal.html and view it at http://localhost/minimal.html.

8. Hey, that table looks bland! If you preferred the colors I showed in my talk, you can use this CSS, which I copied from HTRAF's demo a few months ago. Save it to /var/www/HTRAF-2.0.0b1/htraf/htraf-02.css and change the stylesheet in your <head> to


    <link rel="stylesheet" type="text/css" 
    href="HTRAF-2.0.0b1/htraf/htraf-02.css"/>

HTSQL slides posted

Thanks to my Ohio Linuxfest audience for your attention and interest! My slides from yesterday are posted:

Your Database, Exposed: HTSQL

Shortly, I'll also post a summary of the questions I was asked that I didn't have firm answers for.

As usual, I had great time at OLF. As usual, I brought my voice to its knees by talking to awesome people in the noisy exhibit hall before my talk... I'm going to ask for a morning speaking slot next time I speak! Thanks, everybody!

Wednesday, September 07, 2011

Ohio LinuxFest

I feel bad for not blogging after PyOhio. I just have trouble finding words. Along with PyCon, it's a sort of family reunion for me.

But anyway - next conference: Ohio LinuxFest. This weekend, so sign up now now now move move move - I think today is the pre-reg deadline. It's always a great event, draws people from all over the East and sometimes further. Look for our PyOhio table to have a Python chat (or help staff the table, and introduce other attendees to Python joy). OLF is one of the best places in the region for midway mingling.

I'm speaking on HTSQL. I spoke on it at Indiana LinuxFest in the spring, too, only this time it follows several months of using it seriously at work. The experience has only made me more enthusiastic about HTSQL. Check it out... there's still time to be an early adopter and sneer at everybody else after it becomes famous.

See you in Columbus! Register now!

Monday, August 08, 2011

MSSQL to CSV

Searching for ways to dump CSV from a MS SQL Server brings up recommendations to buy various third-party tools.

Excuse me? Buy tools for something any Pythonista can do in seventeen lines?

You'll need the pyodbc module, which you should have anyway because it rocks.


import csv

import pyodbc



cnxn = pyodbc.connect(

       '''DRIVER={SQL Server};

          SERVER=localhost;

          DATABASE=mydb;

          UID=myname;

          PWD=mypwd''')

curs = cnxn.cursor()



def write_table(tblname):

    with open(tblname+'.csv', 'wb') as outfile:

        writer = csv.writer(outfile)

        curs.execute('SELECT * FROM %s' % tblname)

        writer.writerows(curs.fetchall())



curs.execute('SELECT name FROM sys.tables')

table_names = curs.fetchall()

for table_name in table_names:

    write_table(table_name.name)

Go on, get more sophisticated with the hardcoded connect string, etc.

AND A NOTE: If the ultimate destination of your .csv is to be imported into another database, you'd better distinguish between empty strings and NULLS. To do that, replace


        writer.writerows(curs.fetchall())

with


        for row in curs:

            writer.writerow(['NULL' if r is None else r 

                             for r in row])

AND, if you end up importing this .csv into MySQL, you'll want to set ESCAPED BY '' in your LOAD DATA statement, or else backslashes will start mucking up your field boundaries. (Thanks to Shannon -jj Behrens for saving my sanity on that one). Here's my script to consume the files:


for fn in `ls *.csv`

do

  tbl=${fn%.*}

  mysql myinst -e  "SET FOREIGN_KEY_CHECKS=0; 

                    DELETE FROM $tbl; 

                    LOAD DATA LOCAL INFILE '$fn' 

                      INTO TABLE $tbl 

                      COLUMNS TERMINATED BY ',' 

                      OPTIONALLY ENCLOSED BY '\"' 

                      ESCAPED BY '';"

done

Wednesday, July 27, 2011

Analytic functions

A life history of unforgettable moments of clarity:

1983: Jesus
1994: Special relativity
2011: Analytic functions

I've known for a long time that I really should make use of analytic database functions. I think I've RTFMed a half-dozen times over the years, sometimes fumbling through an example or two, but never really getting them. Tom Kyte's appearance at the Dayton-Oracle User Group finally made it clear to me.

Fortunately, (a different delivery of) Tom's talk was recorded. Go, watch it! You don't have to live without analytic functions one more day!

In my case, I'd been tormented by questions about "the most recent record where..." in a MySQL database; the complex classic SQL queries I was using performed horribly in MySQL. MySQL doesn't have analytic functions, so I converted the reporting database to PostgreSQL. It paid off enormously; analytic functions cut several queries from multiple days to a few minutes. I am a Believer.

Thursday, July 14, 2011

KACE M300

We interrupt this blog with a message from my sponsor.

One of the cool things about working at KACE is helping make a product that people genuinely need and want. The job of keeping up with an organization's computers, networks, software, licenses, patches, security exposure, etc. is growing to near hopelessness for IT pros without good tools. KACE appliances are effective, affordable, and simple for medium and larger enterprises.

Small operations have different standards of what's "affordable" and "simple", though - so today KACE introduces the M300, a systems management appliance cheap and simple enough for virtually anybody.

I've only had a tiny, peripheral role in bringing out the M300, but it still feels really good. Better systems management is something the world truly needs - too many human souls pour way too many hours into the scutwork part of IT. I hope the M300 can get a lot more people get free from that.

Wednesday, July 06, 2011

Tom Kyte in Dayton

Tom Kyte of asktom.oracle.com is coming to Dayton on July 14!

This is crazy-big for Dayton. Tom is one of the most amazing Oracle experts out there, with an immense knowledge of Oracle and a gift for getting straight to the heart of questions and problems; you really need to see him in action.

Thanks so much to Vicki Blommel for bringing this huge win to Dayton - when I was leading the Dayton Oracle User Group, I never even dreamed of bringing somebody this big to town.

Monday, March 28, 2011

HTSQL and Indiana LinuxFest

Thanks and congratulations to Lord Drachenblut and the rest of the volunteer crew for putting on Indiana LinuxFest! They did a great job. I thought the participation was really impressive, looking and feeling more like a well-established conference than a brand-new one. Nothing will displace Ohio LinuxFest in my heart, naturally, but it looks like the community in this region can support and be bolstered by two annual conferences. It's almost enough to blunt the pain of missing PyCon... well, OK, not really. But it was still great. I got a chance to see Tarus Balog again, who's so gracious it's crazy, and meet a whole mess of FLOSS's other heroes... Thanks to all of them for making the trip to Indiana!

As a result of ILF, my determination to immerse myself in Puppet reached new heights. Earlier, I'd assumed it was only for people who administer servers by the dozens or hundreds, but now I see how much even my small server admin tasks need the sort of order and repeatability that Puppet can help give me.

I also got a chance to present on HTSQL, a new passion of mine since I first saw Clark Evan's submission for his PyCon talk. My slides are here (and, for once, they actually cover the bulk of what I said). However, you'll probably want to watch video from the PyCon talk by Clark Evans himself; he goes into much more detail than I did.

In a word, HTSQL is a gorgeously simple approach to database reporting. How gorgeous? Take a look at this dashboard. How simple? Go back to that dashboard and hit View Source. It's absurdly concise and comprehensible thanks to the HTRAF JavaScript library and HTSQL queries that fit comfortably and clearly into a URL, like this. Go spend some time playing with HTSQL if you care about data at all.

Tuesday, March 22, 2011

Indiana LinuxFest

First off, thanks to everybody who was at my Python intro at Millenicon. You can find the talk materials at http://catherinedevlin.pythoneers.com/ under "Talks".

Second, have you seen the great schedule at Indiana LinuxFest this weekend? They've really put together a lot of great material. (And I don't just say that because I'll be presenting on HTSQL.) I'm especially pleased that they have talks that will cater to my new love affair with Puppet. There's still time to sign up... hope to see you there!

Friday, March 18, 2011

Amateur Computer Programming

It's true: I missed PyCon 2011. Unbelievable. We have a sick horse who can't be left alone even for six hours, and our last-ditch scheme to get him to the OSU veterinary hospital, just before we were to leave for PyCon, foundered (ha!) with a broken-down truck and trailer on a small rural road. In the dark. And snow. Wasn't a good night. (Sigh) Next year, with feeling!

Anyway, onward. Tonight I'm on a panel at Millennicon (a straight-up SF con) titled "Amateur Computer Programming". I wanted to gather links to some short screencasts to give people an idea of some of the interesting tools out there.

resolverone.com lets you program spreadsheets
sikuli.org drives your whole system by a script
greasespot.net lets you reprogram somebody else's web app
seleniumhq.org, a web testing framework
renpy.org - easy visual novel authoring
pygame.org - game writing

Monday, February 21, 2011

mysql rocks

No, I don't mean the MySQL server, about which my feelings are still very mixed. I'm still very keenly looking forward to moving to PostgreSQL. I mean mysql, the command-line client.

It's awesome! All those features that I've been gradually adding to sqlpython - at the cost of slow execution, and with my personal late-night bad quality control - most of them have been in mysql for years. Flexible output formats, easy interaction with the OS environment, and so forth. A fat toolbox of goodies.

So, how about this, Oracle? Oracle SQL*Plus has been neglected and underpowered for ages. It can't hold a candle to the mysql client for features. Why not assign a few programmers to extend the mysql client so that it works with Oracle as well as MySQL? Why should the world have to deal with a different client program for every database back-end? I can really imagine mysql evolving into the One True SQL Client.

Finally, I have to praise the MySQL Cookbook from O'Reilly. For somebody moving to MySQL from another database, it's the only way to fly. Almost all MySQL books I've seen assume that it's your first database, and spend 90% of the text explaining SQL fundamentals. Based on my happy experience with the Python Cookbook, I thought the MySQL Cookbook would do better, and I was right. It cuts straight to the stuff that isn't obvious but is very, very useful.

(Hi. Yes, I've been quiet. Yes, I love the new job. Yes, it's a lot of work, particularly during this learning-the-ropes phase.)