Catherine: pyOraGeek

Wednesday, October 07, 2009

Stop! Drop that field!

For PyOhio registration, we used a nice service called eventbrite. It worked great, but I have one big problem with it: it collected way too much data from registrants. It got us the data we needed, but it also asked for home addresses, gender, job title, company... all data we had no legitimate need for or plans to use, probably just because the fields are in the eventbrite form template. Entering it was a pointless nuisance for our attendees, and maybe some were actually put off by the length or intrusiveness of the registration form. (Dave Stanek, if you're reading this, let's see if we can change that for next year.)

We are so not the only offenders in this department. It's everywhere, it's endemic. At website after website, we're asked to provide information of no apparent relevance to the sites' purposes. It's so easy to throw field after field into a data collection form; templates are provided with every conceivable field already in place; and - well, why not? Isn't more data better?

No. No, it's not. Excess data takes time, clutters databases, obscures important data, increases risks of data leakage. In interpersonal interactions, we always have the option of asking "Why do you need to know that?", or just giving people that funny look that tells them they're going out of bounds. On paper forms, we can leave fields blank. Automated forms with field validation cut those safeguards off and open the door to compulsive collection syndrome. The one defense people do have against intrusive electronic forms - lying - ruins data quality, and false data is much worse than no data at all.

We need an ethos of restraint in data collection, of always asking, "Why am I collecting this field?" Data collection needs to be seen as something that is not pure good, but something that has a cost to weigh against the benefit. Not collecting data is often the responsible choice, and we need to teach each other that.

Monday, October 05, 2009

PyCon talk review

We've got a record number of volunteers working on PyCon's Program Committee - the group that reviews talk proposals and decides which ones go on the schedule. And it's a good thing, because we've also got a record number of proposals - 179! (For comparison, PyCon 2008 got 118.)

Right now, we're in the fun part - going through the proposed talks and yelling, "Oooh! Ooooh! I want that one!" Just looking through the proposed talks is a great Python education all by itself - you find out about useful packages and techniques you'd never known were out there.

The tough part comes later - when we have to winnow the list down. Without exception, there are talks I want to see that won't make the cut. Accepting all the good talks would be great, but we'd need a week of PyCon, and three days for the core conference are all we figure most attendees can spare. (My proposal for a round-the-clock talk schedule met only chuckles. Then again, with late-night Open Spaces, we already come dangerously close to a round-the-clock schedule...)

Tuesday, September 29, 2009

Ohio LinuxFest

Wooo, Ohio LinuxFest!

My reStructuredText slides are at catherinedevlin.pythoneers.com, down at the bottom of the page. Thanks to everybody who attended and gave great feedback!

I arrived Friday morning this time and spent a good chunk of the day at the Hackathon, working with Mark Borgerding on his idea for a new educational game for Childsplay. We made some good progress, especially since we were both 100% newbies to pygame! I enjoyed myself and learned a thing or two. Hopefully we'll be able to finish the game up remotely over the next several weeks.

Next came an impromptu lightning talk session (I looooove lightning talks) where I gave an extremely badly-organized (but well-received) glimpse at sqlpython.

I helped Todd Trichler from Oracle Technology Network with his demonstration of Oracle's free offerings. We had a good group, and the next morning Todd gave away his ENORMOUS box of Oracle software in about an hour.

The hallway track was, as usual, excellent. William McVey and Eric Floehr used the PyOhio table to stir up interest in CincyPy and Central Ohio Pythonistas, and Monday's inaugural COPy meeting had 27 attendees! Score! I enjoyed talking with people so much that I found myself on the verge of going hoarse just 90 minutes before my talk. Eek!

I had a great time. Congratulations and thank you to the OLF organizers and sponsors. Once a year, you make Ohio feel like anything but a technology backwater!

Sunday was the Diversity in Open Source workshop, which deserves its own post. To be continued...

Thursday, September 24, 2009

Python at Ohio LinuxFest

The Ohio LinuxFest fun starts tomorrow! Here are the Python-related activities there that I know of.

Of the Friday hackathon projects, I believe that schoolsplay and sendoff are Python-based.
http://www.ohiolinux.org/hackathon.html

Zenoss Community day on Friday (Zenoss is a Python product):
http://www.ohiolinux.org/zenoss.html

Python for Linux System Administration - Vern Ceder
10 AM Saturday
http://www.ohiolinux.org/talks.html#PYTHON

reStructuredText - Plain Text gets Superpowers - me
5 PM Saturday
http://www.ohiolinux.org/talks.html#TEXT

PyOhio booth - all day Saturday (though not always staffed). All Python groups should take advantage of it shamelessly - bring your literature! - and anybody who wants to hang around there and have Python-related conversations with people, that's fantastic.

And, of course, don't forget Oracle's event at OLF.

See you there!

Thursday, September 17, 2009

easy_install no longer easy on Vista

Since my Windows machine was upgraded from XP to Vista, managing Python packages has become absolutely horrible. Here's what I've puzzled out so far, with much wailing and gnashing of teeth:

1. No matter what rights your primary account has, you need to run easy_install from a Run as Administrator window - otherwise, easy_install runs in a separate window which pops up, flashes some feedback at you for a microsecond or so, then disappears, leaving you with absolutely no record of whether the install worked, or why it failed. There doesn't seem to be any way to log the results to a file.

2. After installing any module that is deployed as an .egg into site-packages, you need to go and edit its permissions manually to give your account read privileges on the egg. (Giving your account privileges on the whole site-packages directory does not help.) Until you do, import newmodule will fail with ImportError: no module named newmodule on your account - but will succeed when run from a Run as Administrator window.
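A quick way to find eggs that your everyday account can't read is to try opening each one as that account. Here's a minimal sketch (modern Python 3, untested on Vista itself); unreadable_eggs is a hypothetical helper name, and it assumes the eggs sit directly in the interpreter's site-packages directory as reported by sysconfig. It actually opens each egg rather than trusting os.access(), since the os module docs warn that I/O can still fail when access() says it would succeed.

```python
import os
import sysconfig


def unreadable_eggs(site_dir=None):
    """Return paths of .egg entries in site_dir that this account cannot read."""
    if site_dir is None:
        # purelib is normally this interpreter's site-packages directory
        site_dir = sysconfig.get_paths()["purelib"]
    problems = []
    for name in sorted(os.listdir(site_dir)):
        if not name.endswith(".egg"):
            continue
        path = os.path.join(site_dir, name)
        try:
            if os.path.isdir(path):
                os.listdir(path)        # unpacked egg: can we list it?
            else:
                with open(path, "rb"):  # zipped egg: can we open it?
                    pass
        except OSError:                 # PermissionError, missing target, etc.
            problems.append(path)
    return problems
```

Run unreadable_eggs() from a normal (non-Administrator) prompt and fix permissions on whatever it lists.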

This is bad news. I fought my way through because I'm a dedicated Pythonista; how many Vista-using Py-curious are going to give up on Python because module installation now requires such hacks?

(Don't forget - Ohio LinuxFest registration ends tomorrow at noon! Move, move, move! You'll hurt my feelings if you don't go.)

Friday, August 28, 2009

Enthought's reStructuredText editor

Enthought has produced a wonderful tool for getting into reStructuredText: a side-by-side WYSIWYG rST editor.

Getting it installed, however, just about killed me. Here are the steps I finally puzzled out for Ubuntu 9.04. Miss any steps - or even change the order - and you'll get error messages that don't help even slightly.

sudo apt-get update
sudo apt-get install python-setuptools python-vtk
sudo easy_install -U numpy
sudo easy_install -U docutils sphinx TraitsBackendQt[nonets] AppTools[nonets]

[EDIT] If you're not on 9.04, or you just want to be on the safe side, it doesn't hurt to sudo apt-get install python-dev python-qt4 at the beginning of the whole process.

The other odd thing is that this lovely editor apparently has no name, and certainly no handy start script or presence in any menu. Mine got installed at /usr/local/lib/python2.6/dist-packages/AppTools-3.3.0-py2.6.egg/enthought/rst/app.py; your best bet to find yours is probably sudo updatedb; locate -r rst/app.py$
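If locate isn't handy, a few lines of Python can search the import path instead. This is a sketch that assumes the editor lives at enthought/rst/app.py, either directly on the path or inside an unpacked *.egg directory, as in my path above; find_rst_editor is a made-up name.

```python
import glob
import os
import sys


def find_rst_editor(search_dirs=None):
    """Return paths to enthought/rst/app.py found under search_dirs."""
    if search_dirs is None:
        search_dirs = sys.path
    hits = []
    for d in search_dirs:
        if not d or not os.path.isdir(d):
            continue  # skip '' and stale path entries
        # plain package layout, or an egg directory that is itself on sys.path
        candidates = [os.path.join(d, "enthought", "rst", "app.py")]
        # unpacked eggs sitting inside a site-packages/dist-packages dir
        candidates += sorted(glob.glob(os.path.join(d, "*.egg", "enthought", "rst", "app.py")))
        for path in candidates:
            path = os.path.normpath(path)
            if os.path.isfile(path) and path not in hits:  # avoid duplicates
                hits.append(path)
    return hits
```

Running it with the same python you'll put in the launcher script means you find the copy that python will actually import.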

Then, set up a bash script to make it usable. I'm calling it "rsted". sudo nano /usr/local/bin/rsted and fill it with:

#!/bin/bash
python /usr/local/lib/python2.6/dist-packages/AppTools-3.3.0-py2.6.egg/enthought/rst/app.py $*

... sudo chmod +x /usr/local/bin/rsted, and live happily ever after.

(The $* in /usr/local/bin/rsted doesn't actually do anything - the editor doesn't seem to accept arguments like a filename - but I'm being hopeful for the future.)

Friday, August 14, 2009

PyCon 2010: Call for Proposals

In the opinion of most attendees I talked to, PyCon 2009 was the best one yet. If you need an excuse to come to PyCon 2010... well, what better excuse could there be than, "I'm speaking"?

Call for proposals — PyCon 2010 — http://us.pycon.org/2010/

Due date: October 1st, 2009

Want to showcase your skills as a Python Hacker? Want to have hundreds of people see your talk on the subject of your choice? Have some hot button issue you think the community needs to address, or have some package, code or project you simply love talking about? Want to launch your master plan to take over the world with python?

PyCon is your platform for getting the word out and teaching something new to hundreds of people, face to face.

Previous PyCon conferences have had a broad range of presentations, from reports on academic and commercial projects to tutorials and case studies on many subjects. All conference speakers are volunteers and come from a myriad of backgrounds. Some are new speakers, some are old speakers. Everyone is welcome, so bring your passion and your code! We’re looking to you to help us top the previous years of success PyCon has had.

PyCon 2010 is looking for proposals to fill the formal presentation tracks. The PyCon conference days will be February 19-22, 2010 in Atlanta, Georgia, preceded by the tutorial days (February 17-18), and followed by four days of development sprints (February 22-25).

Online proposal submission is open now! Proposals will be accepted through October 1st, with acceptance notifications coming out on November 15th. For the detailed call for proposals, please see:

http://us.pycon.org/2010/conference/proposals/

For videos of talks from previous years – check out:

http://pycon.blip.tv

We look forward to seeing you in Atlanta!

Tuesday, August 11, 2009

BLOBs in sqlpython

Obviously, you can't query BLOBs in a command-line SQL tool.

Unless, of course, that tool is sqlpython. Bwa ha ha ha.

reStructuredText talk at OLF

It's official - I'm on the schedule!

reStructuredText: Plain Text Gets Superpowers
September 26, 2009, 5 - 6pm
at Ohio LinuxFest

Greater Columbus Convention Center

400 North High Street
Columbus, OH 43215 USA

Introduction to reStructuredText, a simple single-source format that can generate documents in HTML, PDF, .odt, and many other formats.

I also see tasty-looking talks on "Python for Linux System Administration", a "Sysadmins' Rosetta Stone" talk that should help me port my Ubuntu skills to Red Hat, and gobs more - plus the Diversity in Open Source workshop. This will be a great year!

Spend Like a Pirate Day

Why do we Americans continue to carry around drab $1 bills, struggling to cram mushy, wrinkled paper into vending machine readers, when we could carry gleaming, clinking, golden doubloons?

Admire the gleam and the weight. This is the proper sensory experience for money!

I just found out you can buy boxes of 250 coins directly from the mint. Granted, there's $5 shipping, so you're paying $255 to get $250, but your credit card kickback should cover that. Then you can eschew that lame ATM for months! Let's face it, you only use cash for little purchases these days anyway. Make every cash transaction enjoyable!

When you receive your coins, feel free to run your hands through them several times, purring, "Arrrrrr! Thar's treasure for ye, me mateys!" Do NOT, however, bury them in a sturdy wooden chest and draw a map with a dotted line and an X. I know it's tempting! I want to do it, too! But the point is to get more of these beauties into circulation.

Arrrrrr!

Monday, August 10, 2009

workaround: easy_install windows vanish

My employer-mandated Vista machine has gone almost unused for months because of an infuriating quirk in Vista's command-prompt operation. I've come as close to completely forgetting how to use Windows as I've ever been. Giles Thomas (of Resolver Systems) saved me.

The problem: Vista runs programs like easy_install in a new cmd window, separate from the one they were invoked in. The instant the program terminates, the new window is closed, and any messages it returned - like error messages - are lost. No, redirecting the messages with > and 2> does not work. "Why would you want to see error messages, anyway? They're so geeky and depressing!" *gum snap*
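One thing worth trying when shell redirection fails is to let Python capture the output itself. Below is a minimal sketch (Python 3.7+; run_and_log is a hypothetical helper) that runs a command, merges stderr into stdout, and saves everything to a log file. Whether this keeps easy_install's output visible on Vista depends on how the separate window gets spawned, so treat it as an experiment, not a known fix.

```python
import subprocess


def run_and_log(args, log_path):
    """Run a command, save its combined stdout/stderr to log_path.

    Returns the command's exit status so the caller can check for failure.
    """
    result = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # keep errors in the same stream
        text=True,
    )
    with open(log_path, "w") as log:
        log.write("$ " + " ".join(args) + "\n")
        log.write(result.stdout)
    return result.returncode
```

For example, run_and_log([sys.executable, "-m", "easy_install", "cx_Oracle"], "install.log") would run easy_install through the current interpreter, assuming your setuptools still ships an easy_install module.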

Thus, I was unable to install cx_Oracle, and had to turn to my trusty Ubuntu machine for absolutely every pyOraGeekish task.

Giles blogged a lifesaver workaround. If you can run the cmd window as Administrator in the first place, you are entrusted with the awesome power and responsibility of being allowed to view your own error messages.

So, now I can blow the dust off my Vista machine. Wow, controlling the font size of a cmd window is absolutely as primitive as it was in Windows 3.1. Giles, got a workaround for this, too?

Tuesday, August 04, 2009

Oracle at Ohio LinuxFest

It's not official yet - so you can't find it at the Ohio LinuxFest website - but it looks like Oracle will be a sponsor and exhibitor this year. They're planning to do an Oracle-on-Linux installfest. If you'd like to get your first taste of Oracle on Linux, sign up for LinuxFest (it's free) and prepare to have a blast.

If you're already pretty good at Oracle-on-Linux and would like to help others get started, send me email! I hope to gather a small group of volunteers to help out at the installfest.

bug reports from the public

From "Dr. Kronkheit and His Only Living Patient":

SMITH: Doctor, it hurts when I do this.
DALE: Don't do that.

Dear Enterprise Rent-A-Car,

That was called a "bug report". It was not actually a request for a condescending message about how I can still rent a car by navigating the website in a different way. I knew that.

I took the time to write up detailed instructions for reproducing the bug as a professional courtesy to your developers. Unfortunately, they will never see my message, since your customer service is managed strictly from a "deal with nuisance customers" point of view. Looks like I wasted a couple seconds of your time as well as several minutes of my own.

RESOLVED: If I ever work on a project large enough to have a customer service department separate from development, I will insist upon bridging this gap. I will make sure customer service has the access, knowledge, and encouragement to communicate constantly with me. I will regard customer service as a valuable conduit for end-user feedback rather than a distant, uninteresting group of non-colleagues.

logging is ugly

Instrumenting your code - whether with a PL/SQL package like Quest Error Manager as Steven Feuerstein suggests, Python's logging module, or whatever - is an important part of writing good code.

I don't. Not very often, anyway. When I do, I often delete the logging calls as soon as the code is more or less working. The biggest reason is the ugliness.


import datetime
import logging

def days_ago(ndays):
    logging.info('days_ago called with arg %s' % str(ndays))
    logging.info('arg type %s' % type(ndays))
    try:
        ndays = int(ndays)
        logging.info('argument converted to integer %d' % ndays)
        current_date = datetime.datetime.now()
        logging.info('current date is %s' % str(current_date))
        result = current_date - datetime.timedelta(ndays)
    except ValueError as e:
        logging.error('Error converting ndays to integer:')
        logging.error(str(e))
        result = None
    logging.info('returning from days_ago: %s' % str(result))
    return result

EWWWWWW! That is ugly! It completely disrupts the comfortable reading of the code. It buries the actual purpose and actions of the function under a steaming heap of chatter. It offends everything that I value in beauty and readability. What to do?

One solution might be a code editor that would toggle the visibility of all logging calls in a program. You could leave them invisible most of the time, and only look at the logging statements when you have a specific reason to. I can see two problems with that, though.

1. The logging calls themselves could get out of synch with the functioning code. This could be partially addressed by having logging calls become visible automatically whenever code adjacent to them is changed.
2. This would create code which is readable and beautiful in my editor, but ugly when somebody else tries to read it. Perhaps we could cook up a convention whereby a header comment defines the suggested hiding of logging calls for each program, and most editors could be trained to recognize and respect these suggestions?
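A crude, read-only approximation of the toggle can be sketched with Python's ast module: parse the source, drop every statement that is a bare logging.something(...) call, and print what's left. This is only a "reading view", not an editor feature. It needs Python 3.9+ for ast.unparse, it only recognizes calls spelled logging.xxx(...), and ast.unparse discards comments and original formatting. hide_logging is a made-up name.

```python
import ast


class _DropLoggingCalls(ast.NodeTransformer):
    """Remove expression statements like logging.info(...)."""

    def visit_Expr(self, node):
        call = node.value
        if (isinstance(call, ast.Call)
                and isinstance(call.func, ast.Attribute)
                and isinstance(call.func.value, ast.Name)
                and call.func.value.id == "logging"):
            return None  # returning None deletes the statement
        return node


def hide_logging(source):
    """Return source with logging.* call statements stripped out."""
    tree = _DropLoggingCalls().visit(ast.parse(source))
    for node in ast.walk(tree):
        # A block that held nothing but logging still needs a statement.
        if isinstance(getattr(node, "body", None), list) and not node.body:
            node.body.append(ast.Pass())
        # A try whose only finally-clause was logging needs one too.
        if isinstance(node, ast.Try) and not node.handlers and not node.finalbody:
            node.finalbody.append(ast.Pass())
    return ast.unparse(tree)
```

Fed the days_ago function above, it returns just the try/except and the return, with pass filling the except clause that held nothing but logging.error calls.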

I don't have the answer for this. I'd love to hear ideas.

EDIT: Gary Bernhardt tweets, "Anything that you want your editor to be able to hide (comments, setters/getters, logging) shouldn't exist".

An interesting idea... how to eliminate logging? A good logging decorator could log arguments, return values, and error messages for any decorated function - and that level of information could suffice, if the functions are very fine-grained. Having to write fine-grained functions is the sort of constraint that might improve programming style, too, much the way unit testing demands well-defined functions. I think I'll try this philosophy and see if I can make it work.
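Here's a minimal sketch of such a decorator in modern Python (logged is a made-up name, not a standard-library function), followed by days_ago trimmed down to its actual work.

```python
import datetime
import functools
import logging


def logged(func):
    """Log each call's arguments, its return value, and any exception."""
    log = logging.getLogger(func.__module__)

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        log.info("%s called with args=%r kwargs=%r", func.__name__, args, kwargs)
        try:
            result = func(*args, **kwargs)
        except Exception:
            log.exception("%s raised", func.__name__)  # includes traceback
            raise
        log.info("%s returned %r", func.__name__, result)
        return result

    return wrapper


@logged
def days_ago(ndays):
    """Return the datetime ndays days before now."""
    return datetime.datetime.now() - datetime.timedelta(days=int(ndays))
```

One behavioral difference from the chatty version: a non-numeric argument now raises ValueError (after being logged) instead of returning None, which is arguably what callers should see anyway.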

It's not much help for the PL/SQL side of things, though. I'm trying to imagine if there's some way to make an analog to Python's decorators in PL/SQL.

Tuesday, July 28, 2009

Database comics

Clearly, it's somebody's responsibility to create a Hall of Fame for database-related comic strips. I may as well start gathering them here.

Fault-tolerance comic on key-value stores

What else is out there?

Sunday, July 26, 2009

now that's agile

My favorite anecdote from PyOhio 2009:

William McVey prepared slides for his Sunday PyOhio presentation using reStructuredText and rst2s5, but he wasn't satisfied with S5's presentation quality. He tried rst2odp to generate an OpenOffice Impress document instead, but it failed him.

So he convened a Saturday night sprint on rst2odp at PyOhio. Working past midnight, a small team fixed the rst2odp flaws. William regenerated his slides and presented successfully on Sunday.

Finally, in a mighty feat of recursion, he described the whole adventure in a lightning talk Sunday evening, using slides generated by rst2odp, including a slide that contained the source code of the lightning talk he was giving, including the slide with the source code...

Oracle - Linux - Python tutorial slides (PyOhio)

Todd Trichler from OTN is beginning his half of our presentation while I post... my half's materials are here.

Thursday, July 23, 2009

doubleplus PyOhio


>>> 2009 > (2 * 2008)
True

PyOhio 2008 had 93 registrants. PyOhio 2009 has more than doubled that number already. Hooray!

Thursday, July 02, 2009

PyOhio registration, schedule, sponsors...

Things are really coming together!

PyOhio registration is open. 38 people have signed up in just over a day!
The talk schedule is up. 25 talks, my goodness.
Oracle Technology Network and Intellovations have stepped forward as sponsors.
Oracle is also sending Todd Trichler to cooperate with me on a two-hour tutorial on Oracle/Python/Linux. I'm enjoying the preparations so far - I think you'll like it if you have any interest in databases.

Wednesday, June 24, 2009

don't need no stinking rules engine

There's a whole class of programs called "rules engines". The idea is to remove the details of a process from the hard-code of the program, store them externally, and view/modify them easily. The engine then converts the rules, stored in some sort of custom format, back into an executable form at runtime.

In my experience, Python is an effective rule engine. Thanks to Python's readability, you can store business rules as snippets of Python code - in textfiles, a database table, or wherever you prefer - and business users should be able to read them comfortably. After that, a very lightweight Python program can load the rules and the relevant data and use exec() or eval() to apply the rules to it.

One of my main projects is an example of this. It's a program that synchronizes data between two Oracle databases. That sounds easy, but business details complicate it enormously:

Table and column naming, structure, and normalization differ
Only some rows are transferred, according to a complex set of business rules
Only some columns are transferred. Column values are combined, split, truncated, have functions applied, etc. Again, governed by a jungle of business rules
The business rules change continually
Rules must be documented. Letting documentation get out of synch with implemented rules is very bad.
Users may demand explanations for each decision made by the program, down to the row and column level

My first take on the problem was a large hard-coded PL/SQL procedure. What a nightmare!

Later, I rewrote the rules as snippets of Python. Each rule is stored in a database table along with the dates it takes effect and expires, the person authorizing the rule, and a justification. This readable, self-documenting set of rules can also answer questions like, "Why did things change since last month?".
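Answering "what changed since last month?" falls out almost for free once each rule carries its effective dates. A sketch under stated assumptions: rules are plain Python objects mirroring the id/start/end/code/authorized/reason columns of the rules table, the end date is treated as exclusive, and a missing end date means the rule hasn't expired. Rule, active_rules, and rule_changes are hypothetical names.

```python
import datetime
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    id: int
    code: str                      # Python expression or statement text
    start: datetime.date
    end: Optional[datetime.date]   # None means "still in effect"
    authorized: str
    reason: str

    def active_on(self, day):
        """True if the rule is in effect on the given date (end exclusive)."""
        return self.start <= day and (self.end is None or day < self.end)


def active_rules(rules, day):
    """Map rule id -> rule for every rule in effect on day."""
    return {r.id: r for r in rules if r.active_on(day)}


def rule_changes(rules, before, after):
    """Return (added, dropped) rules between two dates, sorted by id."""
    old, new = active_rules(rules, before), active_rules(rules, after)
    added = [new[i] for i in sorted(new.keys() - old.keys())]
    dropped = [old[i] for i in sorted(old.keys() - new.keys())]
    return added, dropped
```

Because each Rule carries authorized and reason, printing the added and dropped lists already reads like an explanation.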

Unfortunately, I didn't know much about object-relational mappers when I wrote it, so the program has clunky data-fetching code. I'm currently working on a third version of the program that uses SQLAlchemy; the resulting program is very short. Broadly, here's what it does:

- Queries the row-level and column-level rules from their respective tables

- Fetches a row from the local database (ours) and the corresponding row from the remote database (theirs)

- The heart of the engine:


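def sync_row(ours, theirs, row_rules, column_rules):
    # (illustrative wrapper name so the fragment runs; rules are code strings)
    data = {'ours': ours, 'theirs': theirs}
    for row_rule in row_rules:
        if not eval(row_rule, data):
            return  # a row-level rule vetoed this row; transfer nothing
    for column_rule in column_rules:
        exec(column_rule, data)  # column rules assign to theirs.<column>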

Actually, the data dict also includes definitions of a few functions that some of the rules invoke. The function names are chosen to be self-explanatory to business users. For example:


def fiscalYear(inDate):
    # Fiscal year runs October-September: Oct-Dec count toward the next year.
    if inDate.month > 9:
        result = inDate.year + 1
    else:
        result = inDate.year
    return result

data = {'ours': ours, 'theirs': theirs, 'fiscalYear': fiscalYear}

I suppose it wouldn't be too hard to put the function definitions themselves in among the rules, then include locals() in with data, as long as execution order is controlled (easily done by putting execution_order columns in the rules tables). It hasn't been necessary for my project.

- Now row_rules has eval()-able entries like:

  id  start   end     code                           authorized   reason
   1  1/1/06  7/1/09  ours.funded == "Y"             Bob          I said so

and column_rules has exec()-able entries like:

  id  start   end     code                           authorized   reason
   1  2/2/08          if (ours.value > 1000):        Steve        to annoy Bob
                          theirs.value = ours.value

- Add some logging and a "test-run" capability which reports on the changes without actually performing them. (It uses sqlalchemy.orm.attributes.get_history() for this; if you do the same, make sure to create the Session with autoflush=False, or intermediate flushes might clear the history.)

I suppose I could try writing a sample rules-engine implementation in simple, general terms, that people could crib from for their own "rules engine" applications. I wonder if that would be helpful, or if just the general idea is enough guidance.
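As a starting point, here's one minimal, general sketch in modern Python (not the SQLAlchemy-based program described above). Rules are (id, code) pairs; row rules are eval()'d and column rules exec()'d against a shared namespace, and every decision is recorded so users can ask "why?". apply_rules and its return format are hypothetical. Because eval() and exec() run arbitrary code, rule text must come only from a trusted, access-controlled table.

```python
def apply_rules(ours, theirs, row_rules, column_rules, helpers=None):
    """Apply row rules (eval) and column rules (exec) to one pair of rows.

    row_rules and column_rules are lists of (rule_id, code) pairs;
    helpers is an optional dict of functions the rules may call.
    Returns (transferred, explanations).
    """
    namespace = {"ours": ours, "theirs": theirs}
    namespace.update(helpers or {})
    explanations = []
    for rule_id, code in row_rules:
        if not eval(code, namespace):
            explanations.append(f"row rule {rule_id} rejected the row")
            return False, explanations
    for rule_id, code in column_rules:
        before = dict(vars(theirs))     # snapshot attributes to diff later
        exec(code, namespace)
        changed = {k: v for k, v in vars(theirs).items()
                   if k not in before or before[k] != v}
        if changed:
            explanations.append(f"column rule {rule_id} set {changed}")
    return True, explanations
```

A test run could pass a copy of theirs (say, from copy.copy) and just report the explanations without saving anything.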