At Dayton Dynamic Languages last night, we watched a great video, "Git for Ages 4 and Up". ("WARNING: CHOKING HAZARD - Small Parts. Not For Children Under 4 Years.") We had a great time, and I learned a bunch of things that I literally put to use for work on the bus home that night.
So, what's the difference between watching a video at home and watching it at a user group meeting? The all-important pause key. Every couple minutes, one of us would say, "Wait, what?"; we'd pause and discuss. Discussion is always the best part of a user event; anytime I help organize an event where the attendees don't start talking to each other, I'm disappointed. DDL never has a problem with discussion, but the video acts as a great tool to help seed and drive the conversation.
Now I'm thinking that all sorts of groups may want to add an occasional Movie Night to their regular meeting schedule. There are plenty of great video sources like Python Miro Community, but watching with your friends adds enormously to the experience.
[Hi to all my friends at CodeMash! Sorry I'm not there... next year in Sanduskalem, right?]
Thursday, January 12, 2012
Thursday, October 27, 2011
MySQL Bizarro World

Getting used to MySQL has been a real challenge for me. Most everything I know about databases is backward in this MySQL world.
In the REAL WORLD,
table names are case-insensitive.
In MySQL WORLD,
table names are case-sensitive. Maybe. Depending on what platform you're running on.
in the REAL WORLD,
queries against VARCHAR data are case-sensitive.
in the MySQL WORLD,
queries against VARCHAR data are case-insensitive. (So is every other use of the data. Which can cause immeasurable pain w/r/t UNIQUE constraints if you weren't prepared for it.)
in the REAL WORLD,
database connections are expensive, and complex queries are optimized well. If you can connect to the database once and issue a single complex query, you'll get much better performance than if you make repeated connections and issue large numbers of simple queries.
in the MySQL WORLD,
database connections are cheap, and complex queries are optimized badly. If you make repeated connections and issue large numbers of simple queries, you'll get much better performance than if you connect to the database once and issue a single complex query.
I think the conclusion is obvious: MySQL was written by programmers from Superman's Bizarro World.
Incidentally, the Dayton Oracle User Group is planning a MySQL-themed meeting in the mid-term future. If you'd like to get involved - as an attendee or a speaker - let me know!
Hello!
Thursday, October 20, 2011
HTSQL answers
HTSQL slides
Thanks to the great audiences at my HTSQL talks at Ohio Linuxfest and Dayton IEEE! (And to the folks who will come see me at Columbus Code Camp on Saturday.) I've promised you answers to some of your questions that stumped me, and (for OLF people) been criminally slow at getting them to you. So here you are! Some of the answers come from my own research, but I've also inserted quotes directly from the creator of HTSQL, Clark Evans... the embedded quotes are from Clark.
1. How do you restrict access via HTSQL?
http://htsql.org/doc/install.html#security
First, consider carefully which database user account you use to run htsql-ctl serve, and assign only the rights that user (representing your HTSQL users) should legitimately have.
Second, you can (and probably should) close down port 8080 (or whatever port you're serving HTSQL on) on your machine's firewall, and route all traffic through a webserver like Apache. (My HTRAF directions tell how to do that.) Then you can apply whatever authentication, IP limits, etc. you need at the webserver level.
If you need multiple groups to access your data with varying levels of permission, it's easy to run multiple instances of HTSQL as multiple
database users, route those instances through Apache, and restrict them at the Apache level appropriately.
2. How can you paginate results?
You can request a "page" of results with HTSQL's limit() function. The optional second argument is an offset:
http://demo.htsql.org/course.limit(10)
http://demo.htsql.org/course.limit(10,10)
http://demo.htsql.org/course.limit(10,20)
There is a tweak.autolimit available to keep users from killing off their browsers with mistakenly broad queries. For example:
Users may also want to consider a browser like Chrome, where a runaway tab won't lock up the entire browser.
There's no way currently to have HTML results automatically insert "Next Page"-type links. Keep in mind that users aren't likely to genuinely want to page manually through very long result sets anyway; they'd probably be better off narrowing their queries rather than searching long lists by eyeball.
There are plans for HTRAF to generate automatically paginated tables at some point in the future.
3. Performance-wise, how does HTSQL respond under intense loads?
4. What about outer joins in HTSQL?
All HTSQL joins are LEFT OUTER joins - rows from the "driving" table are always included in the results, whether or not there are also records in the joined tables... it's a natural consequence of the driving table always determining the size of the result set. (If you specifically want to exclude rows that don't have counterparts in the joined tables, you can use ?exists(joined-table). More about that here.
Thanks to the great audiences at my HTSQL talks at Ohio Linuxfest and Dayton IEEE! (And to the folks who will come see me at Columbus Code Camp on Saturday.) I've promised you answers to some of your questions that stumped me, and (for OLF people) been criminally slow at getting them to you. So here you are! Some of the answers come from my own research, but I've also inserted quotes directly from the creator of HTSQL, Clark Evans... the embedded quotes are from Clark.
1. How do you restrict access via HTSQL?
http://htsql.org/doc/install.html#security
First, consider carefully which database user account you use to run htsql-ctl serve, and assign only the rights that user (representing your HTSQL users) should legitimately have.
Second, you can (and probably should) close down port 8080 (or whatever port you're serving HTSQL on) on your machine's firewall, and route all traffic through a webserver like Apache. (My HTRAF directions tell how to do that.) Then you can apply whatever authentication, IP limits, etc. you need at the webserver level.
If you need multiple groups to access your data with varying levels of permission, it's easy to run multiple instances of HTSQL as multiple
database users, route those instances through Apache, and restrict them at the Apache level appropriately.
This is a great answer. A few more items:
If the database is static (updated periodically), want to put varnish or something on the front. When you make another "data push" you could run though common queries to warm the cache. This is what we do for demo.htsql.org so that queries in our tutorial don't even hit the server.
For PostgreSQL, there is also a ``select timeout`` you can set using the "tweak.timeout" plugin, it can help a little bit with accidental denial of service. Basically, it cancels a query if your query runs over a particular number of seconds. If other databases have this ability, we could add a similar feature.
There is a also a ``autolimit`` that you can apply, this adds a LIMIT X to every query. In the current HTSQL implementation, all the results have to fit into memory: so, you can kill the backend process by creating a large result set. We'll fix this problem sometime next year... if you have a friendly audience, this generally isn't a problem. This plugin helps ensure users don't "accidently" create a big result though.
Even with these two enabled, you can still make queries that bring down either the HTSQL server (via memory exhaustion on big result) or the Database (via memory or cpu denial). So, some caution is advised if you give *direct* HTSQL access since you're letting arbitrary queries be created and such.
One solid way to handle this is separate the "trusted" users who need to create queries from "untrusted" users who are just running canned reports and dashboards. There's a "ssi" demo for doing this and we'll improve on it later. Basically, you have .htsql files server side with canned queries in them. You then limit users to only access .htsql saved queries. It'd be great to have this more automated... the demo code is just that: a proof of concept.
2. How can you paginate results?
You can request a "page" of results with HTSQL's limit() function. The optional second argument is an offset:
http://demo.htsql.org/course.limit(10)
http://demo.htsql.org/course.limit(10,10)
http://demo.htsql.org/course.limit(10,20)
There is a tweak.autolimit available to keep users from killing off their browsers with mistakenly broad queries. For example:
htsql-ctl -E tweak.autolimit:limit=10 serve pgsql://user:pwd@host/databaseUsers may also want to consider a browser like Chrome, where a runaway tab won't lock up the entire browser.
There's no way currently to have HTML results automatically insert "Next Page"-type links. Keep in mind that users aren't likely to genuinely want to page manually through very long result sets anyway; they'd probably be better off narrowing their queries rather than searching long lists by eyeball.
There are plans for HTRAF to generate automatically paginated tables at some point in the future.
3. Performance-wise, how does HTSQL respond under intense loads?
The best way to perform under intense loads is to not perform at all (see varnish above). The HTSQL server is also stateless: you can load balance as many copies as you need to meet demand.
As far as a single-process goes, the time spent converting HTSQL-to-SQL isn't really that significant compared to the query execution. The result handling isn't bad... although we've not done any load testing.
The SQL generation is probably the most important part. I think HTSQL generates quite clear/clean SQL, although your mileage may vary. To make any real judgments you need test queries and test data and profile it. Often times a 30 line SQL query that looks a bit ugly/repetitive will outperform a "hand-optimized" SQL query that is 12 lines.
We (Prometheus Research) [would] be delighted to help figure out why HTSQL performs badly if you have a specific query and test data set for us!
Anecdotally, for big (1-3 page) queries we have converted from SQL to HTSQL, the HTSQL equivalent is ~40% less source code and an order of magnitude more readable and maintainable. The SQL we then generate is typically bigger, sometimes 2x the size, but so far, the performance in the samples we have has been as-good-if-not-better than the original.
4. What about outer joins in HTSQL?
All HTSQL joins are LEFT OUTER joins - rows from the "driving" table are always included in the results, whether or not there are also records in the joined tables... it's a natural consequence of the driving table always determining the size of the result set. (If you specifically want to exclude rows that don't have counterparts in the joined tables, you can use ?exists(joined-table). More about that here.
Wednesday, October 19, 2011
Columbus Code Camp
I'll be speaking on HTSQL this Saturday at Columbus Code Camp. It was a great event last year, and I'm expecting even better this year. Check out the schedule - it's an excellent set of "I gotta see that!" topics.
Hope to see you there! (Unless you're at Southwest Ohio GiveCamp, of course - sorry I have to miss my GiveCamp friends this year.)
Hope to see you there! (Unless you're at Southwest Ohio GiveCamp, of course - sorry I have to miss my GiveCamp friends this year.)
Sunday, September 11, 2011
HTRAF setup, from zero
As I mentioned in my OLF talk, setting up HTSQL is crazy-easy.
HTRAF, however - the library that lets you do the gorgeous graphics - takes a few more steps. The documentation doesn't spell out the webserver-specific aspects of the setup, which may confuse you if you aren't an experienced webserver admin. So, here's my expanded version "getting started with HTRAF". My directions assume an Ubuntu machine.
1.
2. Start up HTSQL:
3. On your HTRAF machine, install a web server.
4. Get the HTRAF library. The simplest thing is to put it right under your Apache DocumentRoot:
5. You need to handle HTSQL requests via your Apache server. (If you try contacting the HTSQL server directly from your webpages, users' browsers are likely to block you, thinking that the site includes a cross-site scripting attack.)
So you'll need to change your Apache server configuration by adding ProxyPass and ProxyPassReverse directives. Apache configuration files are structured differently on different distributions; on mine, I used
6. Next, you need to enable mod_proxy on your Apache, so that it knows what to do with a ProxyPass.
7. Now restart Apache so that the new settings will take effect.
8. Test it out! Hit this with your web browser:
http://localhost:8080/htsql/a_table_from_your_database
9. Now write a webpage that includes HTRAF elements calling HTSQL! Here's a minimal example.
8. Hey, that table looks bland! If you preferred the colors I showed in my talk, you can use this CSS, which I copied from HTRAF's demo a few months ago. Save it to /var/www/HTRAF-2.0.0b1/htraf/htraf-02.css and change the stylesheet in your <head> to
HTRAF, however - the library that lets you do the gorgeous graphics - takes a few more steps. The documentation doesn't spell out the webserver-specific aspects of the setup, which may confuse you if you aren't an experienced webserver admin. So, here's my expanded version "getting started with HTRAF". My directions assume an Ubuntu machine.
1.
sudo easy_install htsqlIt doesn't have to be on the same machine you use for HTRAF, just a machine that your HTRAF machine can contact.
2. Start up HTSQL:
htsql-ctl serve postgres://username:password@hostname/databasename
3. On your HTRAF machine, install a web server.
sudo apt-get install apache2
4. Get the HTRAF library. The simplest thing is to put it right under your Apache DocumentRoot:
sudo su - www-data
cd /var/www/
wget http://htsql.org/download/HTRAF-latest.zip
unzip HTRAF-latest.zip
5. You need to handle HTSQL requests via your Apache server. (If you try contacting the HTSQL server directly from your webpages, users' browsers are likely to block you, thinking that the site includes a cross-site scripting attack.)
So you'll need to change your Apache server configuration by adding ProxyPass and ProxyPassReverse directives. Apache configuration files are structured differently on different distributions; on mine, I used
gksudo gedit /etc/apache2/sites-enabled/000-defaultto add
just within the <VirtualHost *:80> directive.
ProxyPass /htsql/ http://localhost:8080/
ProxyPassReverse /htsql/ http://localhost:8080/
6. Next, you need to enable mod_proxy on your Apache, so that it knows what to do with a ProxyPass.
cd /etc/apache2
sudo cp mods-available/proxy_*.* mods-enabled/
7. Now restart Apache so that the new settings will take effect.
sudo service apache2 restart
8. Test it out! Hit this with your web browser:
http://localhost:8080/htsql/a_table_from_your_database
9. Now write a webpage that includes HTRAF elements calling HTSQL! Here's a minimal example.
<html>
<head>
<script type="text/javascript"
src="HTRAF-2.0.0b1/htraf/htraf.js"
data-htsql-version="2"
data-htsql-prefix="/htsql">
</script>
<link rel="stylesheet" type="text/css"
href="HTRAF-2.0.0b1/htraf/htraf-02.css"/>
</head>
<body>
<select id="school"
data-htsql="/school{code, name}?exists(department)">
</select>
<h3>Departments</h3>
<table id="department" data-hide-column-0="yes"
data-htsql="/department{code, name,
count(course) :as '%23 of courses'}
?school_code=$school&name~$department_name"
data-ref="school department_name">
</table>
</body>
</html>
Save it as /var/www/minimal.html and view it at http://localhost/minimal.html.8. Hey, that table looks bland! If you preferred the colors I showed in my talk, you can use this CSS, which I copied from HTRAF's demo a few months ago. Save it to /var/www/HTRAF-2.0.0b1/htraf/htraf-02.css and change the stylesheet in your <head> to
<link rel="stylesheet" type="text/css"
href="HTRAF-2.0.0b1/htraf/htraf-02.css"/>
HTSQL slides posted
Thanks to my Ohio Linuxfest audience for your attention and interest! My slides from yesterday are posted:
Your Database, Exposed: HTSQL
Shortly, I'll also post a summary of the questions I was asked that I didn't have firm answers for.
As usual, I had great time at OLF. As usual, I brought my voice to its knees by talking to awesome people in the noisy exhibit hall before my talk... I'm going to ask for a morning speaking slot next time I speak! Thanks, everybody!
Your Database, Exposed: HTSQL
Shortly, I'll also post a summary of the questions I was asked that I didn't have firm answers for.
As usual, I had great time at OLF. As usual, I brought my voice to its knees by talking to awesome people in the noisy exhibit hall before my talk... I'm going to ask for a morning speaking slot next time I speak! Thanks, everybody!
Wednesday, September 07, 2011
Ohio LinuxFest
I feel bad for not blogging after PyOhio. I just have trouble finding words. Along with PyCon, it's a sort of family reunion for me.
But anyway - next conference: Ohio LinuxFest. This weekend, so sign up now now now move move move - I think today is the pre-reg deadline. It's always a great event, draws people from all over the East and sometimes further. Look for our PyOhio table to have a Python chat (or help staff the table, and introduce other attendees to Python joy). OLF is one of the best places in the region for midway mingling.
I'm speaking on HTSQL. I spoke on it at Indiana LinuxFest in the spring, too, only this time it follows several months of using it seriously at work. The experience has only made me more enthusiastic about HTSQL. Check it out... there's still time to be an early adopter and sneer at everybody else after it becomes famous.
See you in Columbus! Register now!
But anyway - next conference: Ohio LinuxFest. This weekend, so sign up now now now move move move - I think today is the pre-reg deadline. It's always a great event, draws people from all over the East and sometimes further. Look for our PyOhio table to have a Python chat (or help staff the table, and introduce other attendees to Python joy). OLF is one of the best places in the region for midway mingling.
I'm speaking on HTSQL. I spoke on it at Indiana LinuxFest in the spring, too, only this time it follows several months of using it seriously at work. The experience has only made me more enthusiastic about HTSQL. Check it out... there's still time to be an early adopter and sneer at everybody else after it becomes famous.
See you in Columbus! Register now!
Monday, August 08, 2011
MSSQL to CSV
Searching for ways to dump CSV from a MS SQL Server brings up recommendations to buy various third-party tools.
Excuse me? Buy tools for something any Pythonista can do in seventeen lines?
You'll need the pyodbc module, which you should have anyway because it rocks.
Go on, get more sophisticated with the hardcoded connect string, etc.
AND A NOTE: If the ultimate destination of your .csv is to be imported into another database, you'd better distinguish between empty strings and NULLS. To do that, replace
with
AND, if you end up importing this .csv into MySQL, you'll want to set ESCAPED BY '' in your LOAD DATA statement, or else backslashes will start mucking up your field boundaries. (Thanks to Shannon -jj Behrens for saving my sanity on that one). Here's my script to consume the files:
Excuse me? Buy tools for something any Pythonista can do in seventeen lines?
You'll need the pyodbc module, which you should have anyway because it rocks.
import csv
import pyodbc
cnxn = pyodbc.connect(
'''DRIVER={SQL Server};
SERVER=localhost;
DATABASE=mydb;
UID=myname;
PWD=mypwd''')
curs = cnxn.cursor()
def write_table(tblname):
with open(tblname+'.csv', 'wb') as outfile:
writer = csv.writer(outfile)
curs.execute('SELECT * FROM %s' % tblname)
writer.writerows(curs.fetchall())
curs.execute('SELECT name FROM sys.tables')
table_names = curs.fetchall()
for table_name in table_names:
write_table(table_name.name)
Go on, get more sophisticated with the hardcoded connect string, etc.
AND A NOTE: If the ultimate destination of your .csv is to be imported into another database, you'd better distinguish between empty strings and NULLS. To do that, replace
writer.writerows(curs.fetchall())
with
for row in curs:
writer.writerow(['NULL' if r is None else r
for r in row])
AND, if you end up importing this .csv into MySQL, you'll want to set ESCAPED BY '' in your LOAD DATA statement, or else backslashes will start mucking up your field boundaries. (Thanks to Shannon -jj Behrens for saving my sanity on that one). Here's my script to consume the files:
for fn in `ls *.csv`
do
tbl=${fn%.*}
mysql myinst -e "SET FOREIGN_KEY_CHECKS=0;
DELETE FROM $tbl;
LOAD DATA LOCAL INFILE '$fn'
INTO TABLE $tbl
COLUMNS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '\"'
ESCAPED BY '';"
done
Subscribe to:
Posts (Atom)

