Monday, April 23, 2007

good errors vs. bad errors

One of the things I love most about PostgreSQL is its beautiful error messages. For instance, let's say I've inexplicably forgotten to add my home state to a table of states, thus violating a foreign key constraint when I try to insert a Minnesotan city. Here it is in Oracle:

SQL> insert into city values ('International Falls','MN');
insert into city values ('International Falls','MN')
*
ERROR at line 1:
ORA-02291: integrity constraint (MYSCHEMA.FK_CITY_STATE) violated - parent key not found

OK, that doesn't suck. In this case, it's pretty easy to figure out the problem, especially since I gave my constraint a meaningful name. However, there are cases where the foreign key is not so obvious - maybe it involves multiple columns - and I'm sometimes sent off to look up details on what FK_CITY_STATE involves. Since DESCRIBE won't work on a constraint, that means either an intricate query joining several data dictionary views (USER_CONSTRAINTS, USER_CONS_COLUMNS and friends), or a graphical tool like Toad or SQL Developer.

Now I'll make the same mistake in psql.

mydb=# insert into city values (2,'International Falls','MN');
ERROR: insert or update on table "city" violates foreign key constraint "fk_city_state"
DETAIL: Key (state)=(MN) is not present in table "state".

Now that's clarity! It's one of the things I love about PostgreSQL. After all, the database has all this detail easily accessible to it; why shouldn't it deliver it to me along with the error message? It's even more important when an error like this is thrown from somewhere deep inside some code, where it may not be obvious to me which statement or which row is responsible for it - having the offending value displayed is incredibly convenient. It's considerate programming - no more, no less.
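Application code can be considerate in the same way. Here's a minimal sketch in Python 3, using the standard library's sqlite3 as a stand-in database (SQLite's own message is just "FOREIGN KEY constraint failed"). The table layout, the make_demo_db and insert_city helpers, and the wording of the re-raised message are all my assumptions for illustration, not anything taken from Oracle or PostgreSQL.

```python
import sqlite3


def make_demo_db():
    """Build a throwaway in-memory database with a state/city foreign key."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE state (code TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE city ("
        " name TEXT NOT NULL,"
        " state TEXT NOT NULL REFERENCES state(code))"
    )
    conn.execute("INSERT INTO state (code) VALUES ('OH')")
    return conn


def insert_city(conn, name, state):
    """Insert a city; on failure, re-raise with the offending value attached."""
    try:
        conn.execute(
            "INSERT INTO city (name, state) VALUES (?, ?)", (name, state)
        )
    except sqlite3.IntegrityError as exc:
        # Check whether the parent key is really what's missing,
        # so the detail we report is accurate rather than guessed.
        parent = conn.execute(
            "SELECT 1 FROM state WHERE code = ?", (state,)
        ).fetchone()
        if parent is None:
            detail = f'Key (state)=({state}) is not present in table "state".'
        else:
            detail = str(exc)
        raise ValueError(
            f"insert into city failed for {name!r}. DETAIL: {detail}"
        ) from exc
```

Calling insert_city(conn, 'International Falls', 'MN') against this demo database raises a ValueError that names the missing key, so whoever reads the traceback doesn't have to guess which row was responsible.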

I'll say this for Oracle, though - its standardized ABC-12345 error codes are really handy for looking errors up, whether on Metalink or Google.

In any case, either of these SQL environments has good error reporting. But what about Oracle Enterprise Manager?


Where am I supposed to go with this? Shall we go search on Metalink for "An error has occurred"? Does someone at Oracle imagine that this message is, in some mysterious way, useful? Maybe there's some logfile somewhere on the server that will tell me more... maybe... somewhere. Maybe. Somewhere.

Seriously, folks. OEM has been around for a long time now. It still throws lots of errors, often through no apparent fault of the DBA (I got this one simply by trying to access the "Administration" tab). But I would forgive throwing lots of errors, if the errors were clear, and if they included information to facilitate finding out more. This, however, I cannot forgive, and it is all too typical of my experiences with OEM. (Well, a few years back, the errors tended to be screenfuls of Java error-stack vomit; now most of those seem to have been "cleaned up" into these poker-faced non-messages. That's progress? It's like Windows 3.1 all over again.)

So, if you see me rolling my eyes next time an Oracle rep shows a slide telling us how wonderful, lovely OEM makes DBAs' jobs so much easier... this is why.

Friday, April 20, 2007

Hello Penguicon

I'm having a fantastic time at Penguicon so far...

except, of course, for being unable to connect my laptop to the projector for my talk. Aaaaaaaaaah!

If you were in my audience, 1) thank you for bearing with me, 2) I hope you managed to get something out of it, and 3) I'm so glad you came here!

The full tarball of talk material is available here. You can use it to look through the plain-text version of the talk (independently available here), or install vpython on your system and run the demos (I recommend running 'python solarSim4.py', then looking at the code), or actually install Bruce on your system so you can run 'socrates.py pyIntro.soc' and see the presentation the way you should have seen it at the con.

Thanks for being a great, involved - not to mention forgiving - audience!

Thursday, April 12, 2007

Excel reports from TurboGears

Last week, I had the chance to design, build, and deploy a TurboGears application to outside customers for the first time ever. It was awesome! It was a simple survey form - well, simple in concept, but various questions were dependent in various ways on the answers to other questions, so I did have to get into the JavaScript, praise be to MochiKit.

The application owner wanted to see his survey results in (hmm, can you guess?) MS Excel. "No problem", I said, and then was surprised at how much trouble I had figuring out how to do that straight from TG. AFAIK, TurboGears doesn't have a handy way to output CSV. I could certainly produce a webpage with CSV data, and even use the default method in my controller to serve it with a .CSV extension ("http://myserver/report.csv"), but it still had webpage-type headers and thus Excel still didn't know what to do with it. And no, I wasn't about to tell him to cut-and-paste.
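For what it's worth, the "webpage-type headers" problem comes down to the Content-Type the server sends. Here's a rough sketch in plain Python 3 WSGI, deliberately not TurboGears' own API (I don't want to guess at that): it builds the CSV with the standard csv module and labels the response as text/csv. The sample rows, the rows_to_csv helper, and the csv_app name are made up for illustration.

```python
import csv
import io

# Hypothetical survey results; a real app would pull these from its database.
HEADER = ["question", "answer"]
ROWS = [["Q1", "yes"], ["Q2", "no, thanks"]]


def rows_to_csv(header, rows):
    """Serialize a header and rows to CSV text, quoting where needed."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def csv_app(environ, start_response):
    """Minimal WSGI app that serves the report as a CSV download."""
    body = rows_to_csv(HEADER, ROWS).encode("utf-8")
    start_response(
        "200 OK",
        [
            # These two headers are what tell the client "this is a
            # spreadsheet-ish file", not the .csv in the URL.
            ("Content-Type", "text/csv; charset=utf-8"),
            ("Content-Disposition", 'attachment; filename="report.csv"'),
            ("Content-Length", str(len(body))),
        ],
    )
    return [body]
```

To try it locally, the standard library's wsgiref.simple_server.make_server can host csv_app on a spare port.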

What I eventually hit on was to provide the results as a simple HTML table, then take advantage of the fact that Excel can open a webpage that contains (only) a table rather nicely. Then I gave him a batchfile that simply said

"c:\Program Files\Microsoft Office\Office11\excel.exe" http://myserver/report

... double-clicking it gets a live view of the data.

If he'd been cooler, the batchfile could just as well have said
"c:\Program Files\OpenOffice.org 2.2\program\scalc.exe" http://myserver/report
- I tried that, too, it works.
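The page behind that URL needs to be nothing more than a bare table. A minimal sketch of rendering one in Python 3 (the rows_to_html_table helper and its arguments are hypothetical; this is not the code from my actual controller):

```python
from html import escape


def rows_to_html_table(header, rows):
    """Render a header and rows as a bare HTML table, escaping cell text."""
    head = "".join(f"<th>{escape(str(h))}</th>" for h in header)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(c))}</td>" for c in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"
```

Escaping matters here because free-text survey answers can contain characters like < and &, which would otherwise break the table.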

Actually, I didn't serve it under http://myserver/report, I served it under https://myserver/reportWithHideousGUIDblahblah4242rtfm22hike. That was my quick-and-dirty way to provide some basic security - the report URL is unguessable, so unless he shares the link, it should be safe. On the other hand, anybody who can sniff his request can extract the magic URL and use it. Can anybody comment on how much of a risk that is? There's no real sensitivity to the data in this case, but it would be nice to know if this the-URL-is-the-password scheme is a worthwhile shortcut or is terribly dumb.
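If you go the unguessable-URL route, the token at least ought to come from a proper random source rather than from my imagination. A sketch in Python 3 using the standard secrets and hmac modules; the /report/ path layout and the helper names are assumptions for illustration, and this says nothing about whether the scheme as a whole is wise.

```python
import hmac
import secrets

# Generated once and stored somewhere safe; roughly 256 bits of randomness.
REPORT_TOKEN = secrets.token_urlsafe(32)


def report_path(token):
    """Build the secret path under which the report is served."""
    return f"/report/{token}"


def is_authorized(path, token=REPORT_TOKEN):
    """Compare the requested path to the secret one in constant time."""
    return hmac.compare_digest(
        path.encode("utf-8"), report_path(token).encode("utf-8")
    )
```

The constant-time comparison keeps the check from leaking how many leading characters of a guess were right.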

Tuesday, April 03, 2007

Presenting at Penguicon

Well, two weeks from Friday, I'll be presenting on Python in Michigan... wearing a Starfleet uniform, if the vendor area opens in time.

Yes, I'll be at Penguicon. It's an open-source convention - it's an SF convention - it's a dessert topping! Troy, MI, April 20-22. I've never been there before, and I expect to enjoy it. The schedule included two higher-level Python talks, so it seemed to me like somebody should introduce Python so newbies could get the background they need to absorb the higher-level stuff. Wish me luck!
Introduction to Python, April 20 7:30 - 8:30 PM, Catherine Devlin

Django the Python framework for rapid web app development, April 21 9:30 - 10:30 AM, Dan Scott

Patterns of Data Access implemented in SQLAlchemy, April 21 6:00 - 7:00 PM, Mark Ramm
I also hope to make it to
PostgreSQL & Explaining Explain, April 20 10:00 - 11:00 PM, Aaron Thul
and to sit on the Constructed Languages panel (I'm an Esperantist)... and not to sleep, apparently.

My presentation will be more or less a repeat of the one I gave at Dayton Perl Mongers - it went well there, and the planetary theme should be even more appropriate for Penguicon. In fact, now that I've got a good talk to give, I'm starting to look around for any old place, any old audience in the area I can introduce Python to. Suggestions are welcome! centralOH@python.org is gaining membership pretty nicely, but we're still going to need more to turn it into a really thriving in-person group. Hopefully I can lure a bunch of new people into the joys of Python.

Monday, April 02, 2007

Feeling the love from Oracle

The Oracle Technology Network has rolled out two new features deeply gratifying to the PyOraGeek...Thank you, OTN! This attention is important for more than just the benefits to those of us already hooked on Python. Oracle's interest carries a lot of weight; when Oracle embraced Linux several years ago, it was a big step forward in establishing Linux's credibility in the corporate world. I think we can expect a boost, too.

Wednesday, March 14, 2007

Intro to Python presentation

I've put the presentation for tonight's Intro to Python talk here.

It's not exactly the "slides" - it's a Bruce presentation. Bruce is great for live demos, especially for Python. Unfortunately, there's no "slide deck" that you can review statically - if you want to use it, you have to install Bruce and run it. The TAR I posted has a README with some basic instructions.

Since many conference organizers want a slide deck, one nice addition to Bruce would be a way to generate a slide deck. (No, I'm not volunteering... yet.) The other wart I ran into was the difficulty of changing the font sizes for the interpreter. Setting values in Socrates !CONFIG directives and config.py didn't work; I had to dig down into baseinterp.py and hard-code new font size values. That one, I probably could patch, if I get time.

Anyway, I did end up using Visual Python to do some fun solar system simulations. That should hold audience attention, I think...

Thursday, March 08, 2007

Dayton, meet Python. Python, Dayton.

I'm giving an introduction to Python at the next Dayton Perl Mongers meeting:
(Caution: the hCalendar creator didn't work for me with Opera or IE, only Firefox.)

If you're in the area, and still haven't learned Python (how have you put up with this blog?), please drop by!

I think that, to keep some visual interest up, I'll use Python's turtle module for some of my demonstrations. Not exactly snooty enterprise stuff, but turtle graphics are cute. Either that, or possibly vPython, which is certainly more impressive - but a bit more complex and less intuitive, possibly wrong for an introduction.

(I haven't forgotten my promise to post about PyCon. I'm waiting for presentation links to be published so I can refer to them.)

Monday, March 05, 2007

Geek Event Aggregator design notes

I finally created a central page for the Geek Event Aggregator - mostly it's design notes, but it's also a home less ephemeral than a bunch of blog posts:

http://catherine.devlin.googlepages.com/geekEventAggregator.html

Monday, February 26, 2007

Thanks to Google

... the Geek Event Aggregator is working again. Shortly after my post about the upload to gCal not working, I got an email from Ryan Boyd, of Google's Calendar team:
Someone brought the following post to my attention and I thought I'd follow up and see if I can be of any assistance... The problem that you are experiencing (500 error: Cannot access the calendar you requested) is due to the large size of (number of events in) the calendar.
Wow! Thanks to whoever passed my whining on to him... let me never speak ill of this social web stuff!

With that information, I was able to get it working again, so go ahead and check out that APL conference near your uncle in Kalamazoo.

Better yet, I was able to give my Lightning Talk on it, and it was very well received. The Lightning Talks are so popular that PyCon doesn't even schedule anything opposite them anymore, so basically the entire conference body of 593 people (!!!) was there; I've never spoken before so many in my life. I'm glad that didn't occur to me until afterward.

In the talk, I briefly outlined the architecture. At some point I'll post about it here. The basic principle is: HTML interpretation is hard, let's go dateutil.parser.parse()ing!
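To give a flavor of that principle, here is a rough Python 3 sketch. It is not the Aggregator's code, and it swaps in a standard-library regular expression where the real thing leans on dateutil: throw away the markup, then hunt through the leftover text for anything shaped like "Month DD, YYYY". The TextExtractor class and find_dates function are made-up names.

```python
import re
from datetime import date
from html.parser import HTMLParser

MONTH_NAMES = [
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
]
MONTHS = {name: number for number, name in enumerate(MONTH_NAMES, start=1)}

# Matches e.g. "March 24, 2007"; a stand-in for a far more flexible parser.
DATE_RE = re.compile(
    r"\b(" + "|".join(MONTH_NAMES) + r")\s+(\d{1,2}),\s+(\d{4})\b"
)


class TextExtractor(HTMLParser):
    """Collect only the text nodes of a page, ignoring all tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def find_dates(html):
    """Return every plausible calendar date mentioned in an HTML page."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    text = " ".join(extractor.chunks)
    found = []
    for month, day, year in DATE_RE.findall(text):
        try:
            found.append(date(int(year), MONTHS[month], int(day)))
        except ValueError:
            # Something like "February 30, 2007": date-shaped but not a date.
            continue
    return found
```

Everything about the page's structure is ignored, which is exactly why the approach is both comprehensive and not especially accurate.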

Yeah, PyCon! I just got back, and I've got so much to say; expect a flurry of blog posts as soon as I can get to it. I'm as happy as a clam. As several clams! As several unusually happy clams!

Thursday, February 15, 2007

Geek Event Aggravator

I swear to you, the code that gathers data for the Geek Event Aggregator is running great.  It's harvesting over 5,000 events, with ever-improving completeness and accuracy.  

Unfortunately... the code that ships the harvested data up to Google Calendar - which worked perfectly at first - is now falling flat, with gCal returning 
"Error: Cannot access the calendar you requested." 
when I try to ship the new data up, or to clear out the old data - thus the redundant entries.  (No, it's not a simple password problem, I did check that...) 

I suspect that I triggered something at gCal that said, "Hey, there's way too much activity on this calendar, it must be some evil bot, lock it down."
  
I guess this is the problem with mashups.  It's great to springboard off the work of total strangers - but you're at the mercy of a black box.  Maybe I can buttonhole somebody from Google at PyCon next week (holy cow!  PyCon next week!) and ask for a contact somewhere in the bowels of the gCal team who might be willing to help me troubleshoot.  The sad thing is, this pretty much sinks my plan to do a Lightning Talk on it.

So anyway, my apologies if you've been trying to use the Aggregator and have found only stale and redundant information.  My box is overladen with fresh, hot event data; I just have to figure out a way to get it out to you.  I tried Swivel, but can't seem to get the data there, either.  And the old-fashioned solution - the app running in the Oracle Application Express sandbox - can only receive data by manual uploads, and then only in small chunks; I would have to do about fifty mouse clicks every time I wanted to refresh the Aggregator's data, and I'm sorry, but I'm not a clicker.

The ultimate solution, of course, is to build my own app on my own web server, or on a web host with permissive policies on installing software.  I don't have those resources right now, unfortunately.  (sigh)

[I apologize; Blogger has gone loony on the line breaks for this post, and I can't seem to fix it. Grumble. All my web resources are letting me down today.]

Sunday, February 11, 2007

Presenting at Dayton-Cincinnati Code Camp

The following topics you submitted have been selected to be included in this year’s Code Camp:

· Data in Python, from ASCII to ZODB

Yay! Be in West Chester, OH (i.e., Northern Cincinnati, basically) on March 24 for a fun, free day of Code Camp. Most of the content is .NET, but they're very gracious about encouraging other content, too.

My talk is going to be a broad survey of ways you can get data of various sorts into and out of Python programs - not just RDBMS, but everything from the simplest to the most elaborate of data infrastructures. It's a lot to take on in an hour, but I think it'll be good for raising awareness of the breadth of possibilities - even for non-Python programmers (you poor things). I know I've learned quite a bit already just outlining the talk.

Thursday, February 08, 2007

held for ransom by Vista

I attended an MSDN event here in Dayton last fall, where I was given a beta CD of Vista. I finally got around to putting it on my Windows partition last month. The install went pretty smoothly - it had an option to upgrade the XP partition, preserving the applications and data already present.

I didn't really play with it long enough to form a firm opinion on Vista, but the experiment ended rather badly. Vista popped up a window telling me that the trial period had expired, and it was time to pay up. I had to go to work at that point, so I shut down the machine, brought it into the office, and booted it up again. Then, the "trial period expired" window and a browser window - supplied so that I could buy a license at the MS site - were the only functionality that Vista would enable.
I couldn't go to my desktop, Windows Explorer, a command prompt, or anything that would enable me to access my hard drive. Files that I had created under my paid-for XP were now inaccessible to me; no way was provided to copy them off before reinstalling XP. My hard disk was, in essence, being held hostage; paying for a Vista license was the ransom.

I tried to reinstall XP from its CD while preserving the data on the partition, but I got some frightening error messages in the process and aborted before risking the whole partition.

This could have been a serious crisis, but for two things -
  • I do all my real work on the Linux partition, and didn't really have anything to preserve from the Windows side. I just wanted to take a look around to verify that.
  • The Windows NTFS partition was perfectly accessible from Ubuntu, so I copied what I needed that way - didn't even need a thumb drive, just dragged icons onto the Ubuntu desktop. Hooray! I hadn't realized that read-only access to NTFS was so foolproof from Linux.
But what if I hadn't been a Linux freak? This could have been a real sackcloth-and-ashes event.

[If it hadn't been a dual-boot machine, I think I could have accomplished the same thing with an Ubuntu install CD, booting from it and copying the files to a USB drive.]

I don't think MS planned it this way. They just failed to plan a soft landing for their software's trial period ending. I'll remember that next time I play with a demo.

Wednesday, February 07, 2007

Microsoft Expression Web

Let me say two things in praise of Microsoft today:

1. They sponsored CodeMash.

2. I really, really like their new HTML editor, Expression Web. I like the split-screen, with the cursor that simultaneously shows each element in the WYSIWYG and HTML panes; I like the ease of generating in-line styles and moving them easily and cleanly to a separate CSS file; I *love* the clean, uncluttered HTML it generates (*so* unlike FrontPage, ptui).

And, to my amazement, it's not Microsofty at all. It doesn't generate IIS-dependent code, you can preview the rendering in Firefox, etc. as easily as in IE. It's... respectful!

I'm officially asking my boss to buy us a copy. The one problem is... I prefer to work on Linux. Yes, I have a dual-boot machine, but going back and forth is a pain. So - can anybody recommend their favorite Linux-compatible HTML editors? Has anybody else tried Expression Web? Can you speak to how its features compare? Honestly, I've pretty much stuck with a text editor all this time, and I don't know if Expression Web's features are unique or routine.

Monday, January 29, 2007

centralOH@python.org

There's a new mailing list, centralOH@python.org, for Python users in and around Central Ohio. If this describes you, please join us!

http://mail.python.org/mailman/listinfo/centraloh


The Cleveland area already has a wonderful Python users' group - clepy. Those of us too far from Cleveland have been out of luck, though. We hope to use the new mailing list to gather together a bunch of Python users from the area, then begin to plan some live events. Eventually, perhaps we'll have a full-fledged, regularly meeting user group. First, though, we need to contact each other!

So, please, if you know anybody within shouting distance of central Ohio, let them know about centralOH@python.org !

Friday, January 26, 2007

Pardon, your template is showing.


Forgetting to apply data to your template is a pretty understandable mistake. But why are quantities like $baseCommittmentNumber queried live, while "6th movie" and "7th movie" are hard-coded? I hope they weren't just too lazy to generate ordinal forms of $baseCommittmentNumber+1 and +2 on the fly!
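Generating those ordinals really isn't much work. A minimal sketch in Python 3; the ordinal function and the base_commitment_number variable are my own illustrative names, and I'm assuming a base of 5 because that's what "6th" and "7th" imply.

```python
def ordinal(n):
    """Return the English ordinal form of a positive integer, e.g. 6 -> '6th'."""
    # 11th, 12th and 13th are the exceptions to the last-digit rule.
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


base_commitment_number = 5  # assumed value, implied by "6th" and "7th"
labels = [f"{ordinal(base_commitment_number + k)} movie" for k in (1, 2)]
```

With the assumed base of 5, labels holds the two strings that the mailing hard-coded.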

Incidentally, when all is said and done, the Columbia House deal works out to a minimum of ($0.50 X 5) + ($15 X 1) + ($20 X 3) + ($2 X 9) = $95.50 for 9 DVDs.
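For anyone who wants to check that sum, here is a tiny Python 3 sketch. It works in cents to sidestep floating-point rounding; the terms are just the four price-times-quantity pairs from the sentence above, and the helper names are made up.

```python
# (unit price in cents, quantity) for each term of the sum quoted above
TERMS = [(50, 5), (1500, 1), (2000, 3), (200, 9)]


def total_cents(terms):
    """Add up price * quantity over all terms, in whole cents."""
    return sum(price * quantity for price, quantity in terms)


def format_dollars(cents):
    """Format a whole number of cents as a dollar amount."""
    return f"${cents // 100}.{cents % 100:02d}"


print(format_dollars(total_cents(TERMS)))
```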

Thursday, January 25, 2007

photos on business cards

Joe Wirtley (a CodeMash attendee) blogged an appeal for folks to include a photo of themselves on their blogs. I think it's a great idea - this weird bloggy world of electronic chatter is the most useful when we keep it linked to the world of flesh-and-blood. It's a shame to think that I've probably walked right past people at conferences whose writings I knew well.

I'm thinking of taking the appeal a step further. I love being able to recognize people and call them by name. Unfortunately, I'm a nerd, and I'm not very good at it. I struggle to find techniques to empower my socially deficient brain. Often, after I get someone's business card, I rush away to jot notes on the back - notes about what they're involved in and what we talked about, but also notes on their appearance. This is a desperate attempt to enable myself to recognize them the next time we meet, probably months or years in the future. The trouble is, a few phrases like "tall; curly hair" or "beard, glasses" really aren't enough.

I fantasized today about buying a cameraphone, snapping a photo of each person I meet, then finding a tag-based filing tool so that I could store the photos with the people's names and keywords about where I'm likely to see them again. Then, before arriving at an event, I could review the photos of people I'd be likely to see there; when I arrived, I'd call out to them by name and feel so cool.

Now, just imagine if somebody gave me a business card with their picture on it. Wow! I'd be so grateful! It would short-circuit this entire dilemma!

It's almost time for me to get new cards printed, and I could lead this incredibly useful trend by putting my own photo on them. Except... I'm nervous that it will come across funny, like I'm a wannabe model or something. Traditionally, geeks are supposed to imagine themselves as not really even having faces, as just being disembodied intelligences free-floating in the ether. The thing is, that's really not a useful myth. Talking to each other - face-to-face when we can - has very real value.

Maybe I should go ahead and do it anyway, and not worry about what people think. Maybe just being "the weirdo who puts her picture on her business cards" isn't so bad - for networking purposes, being weird isn't as bad as being forgotten. It's not "goofiness", it's "branding", right?

Hmm. Opinions? Would anybody join me on this?

Monday, January 22, 2007

CodeMash summary

Wow, CodeMash was FUN! It was chock-full of the kind of inquisitive, energetic people who make you glad to be a geek, both in the formal sessions and the informal sociable stuff.

I'll give just a few highlights, hopefully ones that haven't already been blogged to death by other CodeMash bloggers...

Even for someone who already uses TurboGears a bit, Kevin Dangoor's TG talk was great. Almost too high-energy to follow, but I got great ideas like using an in-memory SQLite back end for development testing (thus not worrying about repairing the database after testing).
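The in-memory trick is easy to see with nothing but Python 3's standard sqlite3 module. This sketch is not TurboGears configuration (that lives in the framework's own settings, which I won't guess at here), and the table and function names are invented; it just shows why there's nothing to repair afterward.

```python
import sqlite3


def fresh_test_db():
    """Return a brand-new in-memory database with the test schema loaded."""
    # ":memory:" gives each connection its own private, empty database
    # that vanishes when the connection is closed.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE survey_response (id INTEGER PRIMARY KEY, answer TEXT)"
    )
    return conn


def count_responses(conn):
    """Count the rows currently in the survey_response table."""
    return conn.execute("SELECT COUNT(*) FROM survey_response").fetchone()[0]
```

Each test can call fresh_test_db(), make whatever mess it likes, and close the connection; the next test starts from a clean slate.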

It was nice to have Kevin's co-author Mark Ramm there, too. Questioning the authors beats leafing through a book any day... I just ordered my copy of the TG book for $26.99 from Nerdbooks. Amazon can slightly beat that price ($25.64), but at Nerdbooks I could toss in mildly outdated books on Unix and DB2 for $2-3 each.

David Stanek's talk on "Python Web Services" was actually more important, I thought, for the language-independent description of the relative merits of REST, XML-RPC, and SOAP-based services than it was for the Python-specific content. If you were off at some .NET talk instead, and you've always been a little fuzzy about how those work and relate to one another, go read his slide deck.

As a data geek, I really perked up at Scott Guthrie's keynote on LINQ. It's a new language component that basically extends the SQL idea (describe what data you want, let the computer worry about how to get it) beyond just RDBMS and into flat files, XML, web services... wherever the data may be squirreled away, it'll be a way to go get it with a single consistent syntax. Cool! I'll be eagerly looking forward to IronPython implementations of LINQ. So, Microsoft - thank you for LINQ, thank you for being one of the CodeMash sponsors, and I forgive you for Vista overwriting the GRUB boot loader on my dual-boot machine this morning (but only because I found a tutorial on how to repair it).

James Ward demonstrated ooooh-aaaah stuff you can do with Flex, which wasn't surprising. More importantly, I learned that the Flex/Flash world has really opened up ("opened" as in "open source"). When I looked into it a few years ago, it seemed that you needed a $mucho-K Flash server to even dip in. That's not the way I do things; I build a minor app or two - something that will really be used, not a development-only toy - before I know whether I really want to run with a technology. Now, most of the stack is open-source and free; only a couple optional (admittedly very convenient) components are commercially licensed. Adobe's new mostly-open-source business model for Flex permits my kind of experimentation; good for them!

Kevin Dahlhausen's talk on Test-Driven Development filled an embarrassing gap in my understanding of TDD - I'd heard of mock objects, but had never really figured out what role they play in TDD.

I took part in two Open Space discussions - one on women in IT, one on social networking for nerds - that were important enough that I think I'll give them their own blog posts in the next few days.

I heard Martin Fowler's name come up so many times that I finally screamed (internally), "Okay OKAY I'll read his stuff, already"!

In a week or so, I'll update this post to include hotlinks to the talk slide decks; most of them aren't posted yet.

Anyway... so much glorious fun. With CodeMash in January and PyCon in February, who dares call winter a blah time?

Thursday, January 18, 2007

CodeMash

I'm here at CodeMash in Sandusky! I'd thought about blogging the whole conference real-time, but I think I'll only be doing a little of that... my laptop is fearfully heavy, and dragging it around all day would seriously inhibit my enjoyment of the Hallway Track (the stuff you learn chatting informally with people between sessions).

Anyway, so far, CodeMash is AWESOME. Very professionally run, and with excellent content. It began last night with a great expert panel discussion on languages - the sort of experts who seem to know a dozen different languages in more depth than you know your very favorite. It feels so good to have something like this here in Ohio.

A couple notes from the language panel...
  • As a Pythonista, I was very gratified that the whole panel had a lot of knowledge of and respect for Python; it seemed to be mentioned more than any other single language!
  • The panel took up the question of "which language should new computer science students learn?" The consensus was that sticking to any one language is a real mistake. Every language carries its own limitations and patterns of thought; learning new languages broadens the ways you have to think about a problem. Teaching students only one language risks hard-coding the assumptions and limitations of that language right into their brains. The rule of thumb "learn a new language every year" came up.
  • As for specific languages, Python was the first recommended... and assembly ended up also very highly recommended.
Anyway, it's going to be a great conference. I'll try to update tonight. There will be a lot of tough decisions about which talks to go to, and I may decide against some of the Python talks - partly because I might be already beyond some of them in skill, and partly to honor the cross-language spirit of CodeMash.

Friday, January 12, 2007

PyCon and CodeMash

I'll be at PyCon, Feb. 23-25 in Dallas!


I'll also be at CodeMash in Sandusky!


I've mentioned both these conferences, but this is a good time to remind you of both of them, because the early-bird PyCon registration is almost upon us (Jan. 15) and CodeMash itself is almost upon us.

After two conferences in two months, my employer will probably manacle me to the server for the rest of the year, but it'll be worth it. I'd been coy up until now as to whether I'd actually make it to these events; my personal schedule has been wildly chaotic. But now I've got reservations in hand, so I'll see you there!

Wednesday, January 10, 2007

Geek Event Aggregator is back

Re-introducing the Geek Event Aggregator - the world's most comprehensive and least accurate compilation of technology events! The Aggregator regularly scans a huge variety of event announcement websites and gathers the information into a single location. Now, when the VIC-20 Users' Group schedules a meeting down the street from your in-laws in Pittsburgh, you'll know it.

(Yes, I've introduced this before, almost a year ago. But a LOT has changed, and the accuracy is much better now.)

The Aggregator uploads its data into a Google Calendar account - the most convenient way to access it is as a Google Calendar user (which I really do recommend.) Here's what you do.
  1. Log into your Google Calendar account and choose "Manage Calendars".

  2. Click "add calendar"

  3. Search for "geek event aggregator", then select "Add Calendar".

  4. Return to your calendar home. Oops, your calendar now looks dense with events. That's because you're seeing an unfiltered list of all events, everywhere. Better un-check "geekEventAggregator".

  5. Now you can make real use of the Aggregator - search for the city, state, or region of interest. The Aggregator attaches several keywords to each event based on its location.

  6. Don't take anything the Aggregator reports at face value! Double-check it all. Remember, it's just a poor computer trying to read webpages that were written for humans. It doesn't even attempt to find the time of day, it just declares that everything runs 9-5 Eastern time (who cares if it's in Johannesburg?) I expect to keep gradually improving the accuracy as time goes by.
If you don't have and don't want a Google Calendar account, you can also access it through a static HTML page, an ICS document, or an XML feed. In that case, you don't get Google's handy Search button, and you'll need to use your own patience/creativity to find events in the places you want.

The Geek Event Aggregator is written in Python (of course!) Behind the scenes, it uses TurboGears and PostgreSQL. (No, you don't see a TG-based web application; the web app generates an annotated version of each scanned webpage, which I can use to troubleshoot the scan; thus, the improved accuracy of this second-generation Aggregator.)

I expect to present a Lightning Talk on it at PyCon 2007 (you are going to PyCon, aren't you?)

Thursday, December 07, 2006

Swivel

Darn. I didn't think of Swivel. Then again, you didn't think of it, either.

Swivel is described as "YouTube for data". Not just for uploading and viewing data... but also for madly cross-tabulating. Wow!

As a database professional, I'm kind of embarrassed that two physicists came up with it. But I'm the spouse of a physicist... so I'm kind of proud, too.

Anyway, I was interested in seeing what sort of data was in there on gender issues... so I searched Swivel on "gender" and got... no hits. Wow. I decided to look for some data worth uploading - say, that frightening "Balancing the Equation" study showing a steady drop of women working in information technology. Well, that study appears to be in send-us-cash-and-we'll-mail-it-to-you form; in fact, I don't see a lot of raw, upload-worthy data out on the web at all. Hmm. Any ideas?

There's got to be some mailing list of Swivel contributors out there somewhere; I guess I should find it and ask them where they find their raw data.

I'm dying to know what RDBMS they run on...

Tuesday, December 05, 2006

PyCon 2007

PyCon is coming! February 23-25, 2007 in Dallas!

I had the unexpected honor this year of serving on the Program Committee - the people who read the submitted abstracts and argue about which talks to include. The decisions were not easy! We had more excellent proposals than we could fit into the schedule... I wonder if we'll discuss scheduling four days for PyCon 2008. I think we've outgrown three days! We do have plenty of time set aside for Open Space talks and Lightning Talks, however, and I hope the presenters who didn't get into the regular schedule will show their stuff there.

Just reading the proposals was a great educational experience - I picked up news and ideas that have helped me quite a bit already. The actual conference is going to be incredible - though it will probably include plenty of those "Noooo! I want to be in all three seminars at once!" moments.

The hotel's conference rate is actually a good deal for the quality of the hotel - I know that's not always the case at conferences - but the number of rooms available at that rate is pretty limited, so consider getting your reservation right away. With so much good stuff going on from morning till night, it's really nice to "commute" via the elevator.

Monday, December 04, 2006

I'm all grown up!

I'm finally a serious open-source participant!

... by which I mean...
  • I was using a software package (SQLAlchemy),
  • found a place where it didn't meet my needs,
  • submitted an enhancement, together with test case,
  • and had the enhancement taken up into the project trunk.
I've got a second (less trivial) enhancement submitted but not yet incorporated, too. If you're using SQLAlchemy and you need to use reflection on tables that you're reaching through a database link, you'll need to do it via synonyms and grab code from this ticket. I hope it'll get into the trunk eventually, but it's a low-priority specialty need. (Unless, of course, you need it, in which case there's nothing low-priority about it!)

I guess I've been through the process before with SQLPython, via private email with its creator... it just felt so formal this time, with a Bug Tracker and Ticket Numbers and everything.

Anyway, the whole process takes a little puzzling out... but wow, it's FUN!

Tuesday, November 14, 2006

An agile January to you

There's going to be a fantastic week of agile programming activities in this area this coming January!

Begin with The TurboGears Jam in Ann Arbor, MI, Jan. 14 - 16.

On the 17th, drive two hours southeast to Sandusky, OH.

Then, Jan. 18 - 19, attend CodeMash at Kalahari Resort.

I don't know if they planned these events to dovetail so well, but Bruce Eckel is deeply involved in both, so perhaps they did.

Girls in IT

This Friday, there's a conference at Sinclair Community College here in Dayton (and other sites in Ohio) called We Are IT Day, designed to encourage high-school girls' participation in information technology.

We Are IT Day website

They're still accepting "lunch buddies" - technology professional women who can chat with a small group of girls over lunch.

And, if you know a girl here in Dayton who should be there, try to make sure she can go!

(Sorry I didn't publicize this earlier - I hadn't found a website for it.)

Tuesday, November 07, 2006

IronPython and Oracle

In my post on "Oracle-free Oracle access", I speculated that using IronPython with Oracle's .NET tools might be an effective way to access the database, but I'd never actually tried it.

Since then, I've been contacted by two people who've not only done it successfully, they've written up nice descriptions with code.
I recommend both blogs - they're in my feed reader.

If you can't read Bernd's post, here's my translation of the text that precedes his script:
Here follows the code for a simple IronPython program which enables the interactive input of SQL statements and output from an Oracle database. I use the ODP, which apparently is included with Oracle 10g XE, because I have not installed it separately. In addition, I use the well-known HR demo schema.

After startup, an example session might look like this:

Monday, November 06, 2006

Ubuntu: Linux for Cats



Our kitty knows his Linux.

Sunday, November 05, 2006

TurboGears and Oracle

This weekend saw the fulfillment of a lifelong dream - I got TurboGears working against an Oracle database!

For general information, I recommend the ToDo list tutorial and Splee's post on SQLAlchemy/TG. But there are some particulars you'll need to know to work with Oracle... so here's a super-basic example to demonstrate.
  1. After installing TurboGears, run at the command prompt:
    tg-admin quickstart --sqlalchemy
  2. In dev.cfg, replace
    sqlalchemy.dburi="sqlite:///devdata.sqlite"
    with
    sqlalchemy.dburi="oracle://scott:tiger@orcl"

    [EDIT April 25, 2007:] Unless you've specifically configured your Oracle database to support Unicode (and maybe even if you have - I'm still fuzzy on this part), you'll also need to set
    sqlalchemy.convert_unicode=True

    If you decide to leave it out, then start getting
    SQLError: (NotSupportedError) Variable_TypeByValue(): unhandled data
    type unicode
    you'll know you needed this parameter set.
  3. Add the following to model.py:
    from turbogears.database import bind_meta_data
    bind_meta_data()
    from sqlalchemy.ext.assignmapper import assign_mapper
    emp_table = Table("emp", metadata, autoload=True)
    class Emp(object):
        pass
    assign_mapper(session.context, Emp, emp_table)
  4. To controllers.py, add

    import model
    then add to the Root class

    @expose(template="myProjectName.templates.emps")
    def emps(self):
        emps_list = model.Emp.select()
        return dict(emps=emps_list)
  5. Copy templates/welcome.kid to templates/emps.kid, and replace the document body with

    <ul>
    <li py:for="emp in emps">
    ${emp.ename} : ${emp.job}
    </li>
    </ul>
  6. From the command prompt, run
    python start-myProjectName.py
  7. Point a browser at http://localhost:8080/emps
There - you're listing data from SCOTT.EMP!

You can, of course, manually define the columns of your tables; using autoload is simply more convenient and error-proof. It'll only work against tables that have a primary key, though. If you don't use autoload, you don't need to call bind_meta_data in model.py.
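If the dburi from step 2 looks cryptic, it may help to see it pulled apart. This is only a sketch using Python 3's standard urllib.parse, not SQLAlchemy's own parser. The describe_dburi helper is a made-up name, and reading the part after the @ as a TNS alias is an assumption about how the Oracle dialect treats it.

```python
from urllib.parse import urlsplit


def describe_dburi(dburi):
    """Split a database URI into labelled parts (hypothetical helper)."""
    parts = urlsplit(dburi)
    return {
        "dialect": parts.scheme,    # which database backend to use
        "user": parts.username,     # schema / account name
        "password": parts.password,
        "target": parts.hostname,   # assumed here to be the TNS alias
    }


info = describe_dburi("oracle://scott:tiger@orcl")
for key in ("dialect", "user", "password", "target"):
    print("%-8s %s" % (key, info[key]))
```

Seen this way, swapping the sqlite URI for the oracle one is just a change of dialect plus the usual login details.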

TurboGears has recently added SQLAlchemy, as an alternative to SQLObject, for its database-access layer. I don't know much about their relative merits, but it seems like SQLAlchemy may be more friendly to a database-centered (as opposed to object-programming-centered) point of view. In any case, SQLAlchemy has Oracle support, whereas SQLObject's Oracle support still hasn't been integrated into the main codebase. Thus, I'm using the SQLAlchemy flavor of TurboGears.

Tuesday, October 31, 2006

Oracle Net unaccountability

I have a complaint. After seven years of Oracle experience, ORA-12154: TNS:could not resolve the connect identifier specified ought to be in my past.

I installed a standard Oracle 10.2 client on a fresh, new machine, only to find that the sqlplus.exe in {ORACLE_HOME}/bin was not looking in {ORACLE_HOME}/network/admin for its TNSNAMES.ORA. I don't know where it was looking, or why. I searched for any of those annoying stray TNSNAMES.ORA files, and there weren't any, but that doesn't mean that Oracle Net wasn't looking for TNSNAMES in all the wrong places. I eventually gave up and set the TNS_ADMIN environment variable, but that was an unsatisfying brute-force solution; I want to know why SQLPLUS wasn't looking in the standard place in its own home for TNSNAMES.ORA, but apparently I'll never know.

The problem is that Oracle Net gives you no feedback about what went wrong when something goes wrong. Did it find a TNSNAMES.ORA, but hit a syntax error in it? Did it find a TNSNAMES.ORA, but not the one you expected it to? Did it not find a TNSNAMES.ORA at all? Well, you'll just have to guess. Yes, you can trace Oracle Net; you need to insert directives like TRACE_LEVEL_CLIENT=user and TRACE_DIRECTORY_CLIENT into SQLNET.ORA. Ah, but which SQLNET.ORA? Well, that's the problem - if Oracle Net isn't looking where you expect it to for TNSNAMES.ORA, it won't see your SQLNET.ORA either.

It was a big improvement when TNSPING was upgraded to report on which SQLNET.ORA it was using. (And, indeed, in my case, TNSPING reported that it was using {ORACLE_HOME}/network/admin/sqlnet.ora, as you would expect, which is probably why TNSPING could resolve my service names just fine.) We really need a similar improvement in Oracle Net in general - some troubleshooting information that's transmitted in the error message every time TNS resolution fails.
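Until Oracle Net reports this itself, a small script can at least show which of the expected files exist. Here is a minimal sketch in Python 3 that checks only the two locations mentioned above (TNS_ADMIN and ORACLE_HOME/network/admin). Oracle Net's real search order includes other places that this does not attempt to reproduce, and tns_candidates is a hypothetical helper name.

```python
import os


def tns_candidates(environ=None):
    """Return (label, path, exists) for each tnsnames.ora location we can guess.

    Only TNS_ADMIN and ORACLE_HOME/network/admin are checked; this is an
    illustration, not a reimplementation of Oracle Net's lookup rules.
    """
    env = os.environ if environ is None else environ
    guesses = []
    tns_admin = env.get("TNS_ADMIN")
    if tns_admin:
        guesses.append(("TNS_ADMIN", os.path.join(tns_admin, "tnsnames.ora")))
    oracle_home = env.get("ORACLE_HOME")
    if oracle_home:
        guesses.append(
            ("ORACLE_HOME",
             os.path.join(oracle_home, "network", "admin", "tnsnames.ora"))
        )
    return [(label, path, os.path.isfile(path)) for label, path in guesses]


for label, path, exists in tns_candidates():
    print("%-12s %s  [%s]" % (label, path, "found" if exists else "MISSING"))
```

It won't tell you where SQL*Plus is actually looking, but it does rule out the simplest explanations quickly.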

Have I simply missed the memo on some good way to troubleshoot these problems? If you know of one, please let me know.

While researching it, I did find a pretty nice resource - ora-code.com. Their ORA-12154 page is a more concise and relevant checklist than anything I know of on MetaLink. Unfortunately, they lack a search box, so the best way to find their pages is simply to Google for "ora-code 12154".

There. Now if Oracle magazine ever asks me "what one improvement I'd like to see in Oracle", I have my answer ready. That and PL/Python, of course.

Sunday, October 22, 2006

resetPwd.py

I just posted a little script at http://sqlwrappy.sourceforge.net/resetPwd.py for password changing; a full description is at http://sqlwrappy.sourceforge.net/resetPwd.html. You may like to use it / borrow from it if
  • You need to provide for nontechnical users who must field "Can you reset my password?" requests (the designed purpose)
  • You want a relatively robust command-line way to collect Oracle account login information
  • You want to crib code for a command-line pick-from-this-list loop
I packaged it into the 0.1.2 release of sqlWrap.py, too.

Friday, October 20, 2006

The destructive power of stereotypes

I hope that this study receives all the attention it deserves.

"Women exposed to bogus scientific theories linking their gender to poor math skills did worse on arithmetic tests than others..."

This is why people who care get so upset about "harmless personal opinions" about women being inherently worse at science/math. They're not harmless. They push women away from technical excellence. Yes, that bothers me, and no passive-aggressive whining about "political correctness" will make me accept it in demure silence.

It's especially remarkable that the study showed an easily measurable effect of just one claim of male superiority. Now imagine the cumulative effect of hearing such claims, again and again, over an entire lifetime... it's no wonder that the women who do end up in technology are the exceptionally flinty ones.

Sunday, October 08, 2006

meaning of LAMP

One of the great things I learned at Ohio LinuxFest came from Jeff Waugh's talk. I bet you've seen that LAMP acronym around, and wondered what it stood for.

Linux
Apache
Most scripting languages begin with "P"
PostgreSQL

Now, that makes a lot more sense.

Monday, October 02, 2006

Oracle BoF at LinuxFest

Thank God, the babysitter, and a wonderful spouse, I did get to go to Ohio LinuxFest after all. Hooray!

With one talk each on MySQL and PostgreSQL, it was a good day for database enthusiasts. Maybe that's why I saw quite a few Oracle users there.

The organizers also invited attendees to put together impromptu Birds-of-a-Feather (BoF) sessions. I was tempted to throw together one for Oracle, but I felt like my impromptu idea needed more preparation... I wanted to have a list of topics to seed discussion if it dragged.

So, what kind of topics would you suggest for an Oracle-Linux BoF? Here are some that occurred to me.
  • Experiences with Oracle on various Linux distributions
  • Oracle's ancient Apache 1.3 HTTPServer; is it ever going to get to 2.0? Can you get mod_plsql working on Apache 2.0?
  • Open-source SQL*Plus alternatives (actually, someday I hope to talk at OOUG on this)
  • Experiences with RAC on Linux, Oracle Cluster File System, etc.?
Please, add topics to this list! And, if you're going to be at next year's Ohio LinuxFest, come be part of an OraBoF!

Oracle-free Oracle access

I got an interesting question from Guido d'Amico... he wants to use Python scripts to access Oracle databases from machines with no Oracle software installed. Between us, we came up with these options.
  • cx_Oracle and DCOracle2: These "classic" DB-API2 modules both rely on the OCI (Oracle Call Interface), a piece of software distributed by Oracle. (I believe all comparable means for accessing Oracle from other languages rely on the OCI, too.) There's just no way around that - you need some sort of Oracle client installed on the machine you're using them from.
    You don't have to bulk up your machine with a full-blown standard Oracle client, though.
    • Oracle Instant Client is lightweight (85 MB on my Windows box), free, and redistributable. For better and worse, it comes as a simple zipped set of files - if you want any environment variables set (ORACLE_HOME, PATH), you need to do that yourself.
    • OracleXEClient is likewise lightweight (72 MB) and free, and very easy to install.
    Neither of these options comes with a /network/admin/ folder, which might be a little confusing - unless you want to make your connections with Easy Connect, you'll need to set up $ORACLE_HOME/network/admin/tnsnames.ora by yourself.
  • You can use ODBC. mxODBC has been around for a while, but is not free for commercial use. pyODBC is free, and I hadn't actually heard of it until I researched this question - maybe I'll review it (or at least find a review) sometime soon.
  • You can use JDBC from Jython. Andy Todd and Przemek Piotrowski have blog posts detailing this.
  • You can go to IronPython and... um... OK, I've never yet done database access from IronPython, but I assume that using ODT.NET from IronPython is easy enough.

    EDIT: Przemek Piotrowski has not just made it work, he's posted a tutorial on ODP under IronPython. Thanks, Przemek!
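For the Easy Connect route mentioned above (the one that lets you skip tnsnames.ora), the connect string is just host, port and service name glued together. A minimal sketch in Python 3; the function names and the host dbhost.example.com are made up for illustration, and 1521 is the usual default listener port.

```python
def easy_connect(host, service_name, port=1521):
    """Build an Easy Connect identifier of the form //host:port/service_name."""
    return "//%s:%d/%s" % (host, port, service_name)


def logon_string(user, password, host, service_name, port=1521):
    """Build a user/password@identifier string as typed at a SQL*Plus prompt."""
    return "%s/%s@%s" % (user, password, easy_connect(host, service_name, port))


# Hypothetical host and service; substitute your own.
print(logon_string("scott", "tiger", "dbhost.example.com", "orcl"))
```

The same identifier should be usable anywhere a TNS alias would otherwise go, which is what makes the missing /network/admin/ folder tolerable.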

Tuesday, September 12, 2006

Ohio LinuxFest

Ohio LinuxFest is coming Sep. 30!

I absolutely loved it last year. It looks like I won't be able to make it this year... *weeps bitter tears* - unless, that is, one of you would like to babysit a ten-year-old? No? Bleah, Saturday events are not parent-friendly. (Actually, I did meet one attendee last year who brought his ten-year-old. I can't imagine pulling that off with ours, though.)

But anyway, you can give me some comfort by going yourself and having a fantastic time. They've promised extra Linux-newbie stuff this year, too. It will be better than Cats; you will want to see it again and again.

Qnxo price weirdness

Let me say, first off, that I deeply appreciate everything Steven Feuerstein does for the Oracle community.

But the marketing of Qnxo has just gotten weird. My boss asked me for my wishlist for the software budget, and I tried to put Qnxo on it. On Qnxo's "buy it" page, I got
Qnxo is, for the time being, available only on a trial basis. The trial version offers the full range of Qnxo functionality and will work for 30 days after install. If you would like to continue using Qnxo after that point, please visit the Support page and fill out the Contact form. We will then provide you with a key that will enable Qnxo for use until the end of 2006. If you have any questions, please also submit your question through the Support page.
I thought refusing to give customers a straight up-front answer on a price was a hallmark of $10K+ software. Last I heard, Qnxo was $175.

If it were open-source, I'd dive in and use it. If it were proprietary with a clear price, we could decide whether to buy it. But this? I'm supposed to make it a part of my development process based on the hope that, when we're ultimately given a price, it'll be acceptable? I don't see how anybody can do that. I know we can't.

[EDIT 09-09-2007: Steven himself (!) just notified me via his comment that Qnxo is now QCGU, and it's free of charge. Thanks, Steven!]

Monday, September 11, 2006

dual-boot Ubuntu/Windows

I have a new laptop from my boss! A Dell XPS M1710; it feels a bit like an SUV of laptops... it weighs more than I do, comes with an on-board fusion reactor, and emits a menacing red glow from around its edges. It's just chock-full of gigabytes and megahertz and stuff. I will never complain that my boss skimps on our hardware. I think I need a mule to carry it, though.

Anyway, since I'm torn between my love of Linux and my workplace's Windows mandate, I decided to make the machine mirror my split. I need a dual-boot Windows/Ubuntu machine. Since this is the first time I've done so, I'm really glad it was on a clean machine... it was not as smooth as vanilla Ubuntu.

Here's what I did. You should skip step 2.
  1. Found good documentation on the Ubuntu wiki
  2. Let standard Ubuntu installer attempt to shrink the main Windows (NTFS) partition, to give me room for Linux partitions. I ended up with partitions of type "unknown". Oops!
  3. Re-installed Windows from its CD, this time setting its NTFS partition up with about 1/3 the total disk space and leaving the rest uncommitted.
  4. Followed the instructions from the "Issues with Windows XP and NTFS" section of WindowsDualBoot: created a System Rescue CD, booted from it, and used run_qtparted to redo the partitioning.
  5. Created a single physical partition - the last one of the four I'm allowed.
  6. Within the final physical partition, created a linux-swap partition (4 GB for my 2 GB of memory). Divided the rest of the disk between an ext3 partition for Linux root and a FAT32 partition for data that Windows and Linux will be able to share.
  7. Started the Ubuntu installer again; chose "manually edit the partition table", and assigned the swap and ext3 partitions to Ubuntu.
Accepting the defaults, my FAT32 partition got labelled "sda5" under Ubuntu and "e:" under Windows. Sure enough, both OS's can access it OK.

Success! One dual boot laptop, hooray! (Yes, I know, if I'd been really bold, I would have used Xen to run both OS's virtually. Maybe next time.)

Wednesday, August 30, 2006

Blogging tools

I've seen so many blogging tools over the past year or so... many of them seem like interesting projects that have generated lots of excitement.

Here's the part I don't get, though. Why? Had people honestly been going around saying, "Wow, entering my blog entries conventionally is such a time drain"?

I blog a few times a month, and the challenge is having something worth saying and choosing words to say it well - not slinging those words into Blogger's standard posting interface. It's hard to imagine how shaving seconds off that could honestly be worth choosing, installing, and understanding a blogging tool, much less writing one. Maybe if I blogged six times a day, but who would read that?

Maybe someone who "gets it" can explain it to me?

I suspect that this is one of these cases that's being driven by the coolness of the solutions, not the actual need for them. Mind you, I've got nothing against that. I have often spent three hours writing code to avoid a one-hour manual job. (Which is not as illogical as it sounds, because when you finish the manual job, somebody's bound to say, "Oh, I'm so sorry, but we need it done again..." Not that a logical consideration of that possibility is what drives me; I do it because it's fun.)

Friday, August 25, 2006

dumpfile diving

The Oracle EXP utility generates a dumpfile which, although technically binary, actually contains lots of readable ASCII. Sometimes I need to see this information, especially since dumpfiles don't provide a way (that I know of) of summarizing their contents and the conditions of their generation. In a perfect world, we'd never have to work with dumpfiles burned onto scratched CDs by unknown parties and abandoned in the dusty corner behind the recycle bin, but this is not a perfect world.

Anyway, I just went through several techniques of examining a mysterious.dmp, and thought I'd share the experience. Much of it would apply to delving into any mixed binary/ascii file.
  • The worst way: more mysterious.dmp

    gave me frightening glyphs and angry beeps (xset b off to stop those), like R2-D2 invoking dread Cthulhu. Worse, my session would henceforth speak to me only in proto-Sumerian. I could kill the terminal window and open a new one, of course, but I still spent several hours plastering smooth curves over all the office's sharp corners, just to be safe.

  • Not quite so bad: less mysterious.dmp

    let me look at the file, making harmless marks of the binary characters, and didn't mangle my session's character set. Yay! I do need to make a habit of using less instead of more.

  • Still pretty painful: grep -a "what I'm looking for" mysterious.dmp

    The -a flag makes grep look into a file even though it's binary. It has no proper idea of where lines end in a binary file, though, so your hits can be really long. I had better luck grepping the files that resulted from the operations below.

  • Good: imp me@mydb show=Y file=mysterious.dmp full=y log=mysterious_contents.txt

    This gives you a clean-looking file (well, except for all the gratuitous quotation marks). It's also the only technique I know that you can use on Windows (without Cygwin). You don't get data contents, though, just DDL.

  • Best: strings mysterious.dmp > mysterious_contents.txt

    This GNU strings utility is really great! You get the ASCII, the whole ASCII, and nothing but the ASCII, quick and clean.
What I still don't have, though, is a way to examine a dumpfile and find out exactly what command was used to generate it. That would give all kinds of fantastic information: What instance was it? Which user did the export? Was it full? ... and so forth. If you know of a way to query a dumpfile for this kind of information, please comment!
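If you're stuck on a machine without strings, the core idea is simple enough to approximate: pull out every run of printable ASCII of at least a few characters. A rough sketch in Python 3 follows. The ascii_strings name and the sample bytes are invented, and the real utility has options and edge cases this ignores.

```python
import re


def ascii_strings(data, min_length=4):
    """Return runs of printable ASCII (space through ~) at least min_length long."""
    # Build the pattern as text, then encode it so it can search bytes.
    pattern = re.compile(("[\\x20-\\x7e]{%d,}" % min_length).encode("ascii"))
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


# Made-up bytes standing in for a chunk of a dumpfile.
sample = b"\x00\x01CREATE TABLE emp\x00\xffab\x00GRANT SELECT\x07"
for text in ascii_strings(sample):
    print(text)

# For a real file, read it in binary mode first:
#     with open("mysterious.dmp", "rb") as f:
#         found = ascii_strings(f.read())
```

Short runs like "ab" are dropped on purpose, since most two- and three-byte "words" in a binary file are just noise.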

Wednesday, August 23, 2006

xubuntu

Xubuntu is a variant of Ubuntu that uses the lightweight Xfce desktop environment, making it a good choice for low-powered systems.

Or so they say. So I dusted off (literally) a Gateway Solo laptop (Pentium II, 300 MHz, 64 MB RAM) and replaced its Windows 98 with Xubuntu 6.06. And that's what I'm posting from now! Go, Xubuntu! The only problem is that a naughty kitten tore several keys out of the keyboard several years ago, making typing difficult. And he would have to get the 'e', the little stinker.

I'm using a 2-year-old wireless card with it, though. I failed with my first attempt to make its old ethernet card work, and decided to take Tim Almond's advice and just use a known compatible device rather than go into ethernet-card archaeology.

Hmm, now there's a new working laptop in the house. Oh, the possibilities...

Monday, August 07, 2006

History

Cleanup of obsolete material continues.

I'm holding a boxed set of Oracle 7.3.4 Server software. It's still shrink-wrapped.

Can I really throw this out? 7.3.4 is where I started, after all. Then again, if I keep it, am I like the people who imagine that their comic book "investments" will pay off one day?

I love working in IT... yet it can be horrifying, realizing just how ephemeral our constructions are. If you're a mason, your work may outlive you by millennia. If you're a geek and you want your work to outlive you, you'd better get very sick or take up some dangerous hobbies.

Tuesday, August 01, 2006

obsolete books

Following a recent run of absurdly good luck at user-group drawings, I must face the fact that my cubicle will not tolerate my current inventory of books.

"So throw them out". But... but... books represent knowledge, how can I just throw them out? Especially the ones that I never did get around to devouring... sure, those skills may have proven irrelevant to my work, but it's stuff I never learned! How can I give up on learning it, send Knowledge away unlearned?

Ahem. Anyway, psychological issues aside, does anybody know a good destination for mildly obsolete technical books? The recycle bin seems so brutal, yet how can I find somebody who'd want them? This is mostly Oracle and Java stuff averaging five years old...

Thursday, July 13, 2006

Digital cholesterol

That's my new term for the performance-clogging stuff that big-enterprise IT departments automatically install to user desktops via the enterprise network. Every week, a bit more gets piped in without my foreknowledge or consent, gradually crippling my machine.

I do my serious work on my Ubuntu laptop, which is barred from my workplace's network; I download software from home and transfer my finished products to work by USB drive. At first that seemed like an unfortunate price I had to pay; now it's looking more like a blessing. My Ubuntu laptop sizzles along as fast as the day I first booted it, while my plugged-in Windows machine creaks and groans and is slowly becoming unusable.

Thursday, July 06, 2006

sqlWrap and oraDifference, packaged right

In the past few months, I've written sqlWrap.py, a database connection convenience wrapper, and oraDifference.py (which depends on sqlWrap.py). I didn't have any particularly sane way to distribute them, though, and I apologize to anybody who made the attempt.

Well, it may amount to delusions of grandeur, but I registered them as a SourceForge project. Now they have
  • A single, sane place for downloads, properly versioned
  • A regular distutils python installer: unzip it, run
    python setup.py install
    , and everything goes where it belongs (sqlWrap.py in your Python library, oraDifference.py in your python Scripts directory)
  • a Windows executable installer (oooh, aaah)
  • Homepages with documentation: for sqlWrap.py and oraDifference.py
Because oraDifference.py depends on sqlWrap.py, I put them together in a single install for simplicity.

This has been my first time working with Python's distutils module (so much easier than I expected!) and SourceForge (not so much). It's been fun!

Saturday, June 03, 2006

sqlpython enhancements

I told you that Luca Canali's sqlpython is wonderfully easy to customize.

I probably should have also told you that it's dangerously addictive to customize. I kind of went out of control, and produced sqlpyPlus.py, a module of enhancements to sqlpython.

- SQL*Plus-style bind variables
- Query result stored in special bind variable ":_" if one row, one item
- SQL buffer with list, run, ed, get, etc.; unlike SQL*Plus, buffer stores session's full history
- @script.sql loads and runs (like SQL*Plus)
- ! runs operating-system command
- SQL*Plus-style describe, spool
- write sends query result directly to file
- comments shows table and column comments
- compare ... to ... graphically compares results of two queries
- commands are case-insensitive
- show and set to control sqlpython parameters

sqlpyPlus.py is not as clean and elegant as sqlpython - that's one reason I put it in a separate module, so that you can keep it separate from the original sqlpython and your own homemade enhancements. But it should cover pretty much everything you usually use SQL*Plus for, plus some goodies I hope you'll like.

[EDIT: Since I wrote this, Luca has wrapped an enhanced and debugged version of sqlpyplus into his distribution of sqlpython itself. Now you should simply go and get or upgrade sqlpython, and you'll have these goodies automatically.]

Wednesday, May 17, 2006

The missing Mercurial manual

I think I've found my ideal solution for version control. I used bzr for a few weeks, and appreciated its distributed nature - no single repository has to be the ultimate authority, so it works well for machines that can't all be connected to the same network. I use a travelling USB drive to sync my machines, so it was perfect... except that bzr can take several minutes for just a few merges. It got annoying when the merge was keeping me from leaving work.

Mercurial is also distributed, but it's very, very fast. It's even (gasp) well-documented! There are, however, a few things I think a newbie should know up-front.
  • The tutorial assumes you'll start by copying an existing mercurial repository. I wanted my own, though, and it took me several tries to figure out that I needed to do this:
    ~/existing$ hg init
    ~/existing$ hg add foo.bar
    ~/existing$ hg commit
    ~/existing$ cd ..
    ~$ hg clone existing newdir

    Later on, if I make new files in existing, they won't get to newdir until I
    ~/existing$ hg add newfoo.bar
    ~/existing$ hg commit
    ~/existing$ cd ../newdir
    ~/newdir$ hg pull
    ~/newdir$ hg update

    pull brings fresh metadata in from the other repository, storing it in .hg, but doesn't actually go on to update the files themselves. That's what update does. It took a while to get it, but now that I do, it seems intuitive and helps me feel in control.
  • Your EDITOR or HGEDITOR environment variable must be set, or else hg commit will hurl you pitilessly into vi (*nix) or throw up its hands and scoff at you (Windows).
  • But, if you use gedit and already have another file open in it, you get
    ~/existing$ hg commit
    transaction abort!
    rollback completed

    I suspect that may happen with other multi-file editors, too. I'll use export HGEDITOR=pico.

Tuesday, May 16, 2006

Python in OTN again

Here's a cheer to Przemek Piotrowski for his recent OTN article, Build a Rapid Web Development Environment for Python Server Pages and Oracle. Python Server Pages are just one of the 10^22 ways to build a web application with Python, but his methodical instructions would be useful for doing anything remotely related (including installation of Oracle XE and mod_python).

Friday, May 05, 2006

SQLpython - a SQL client of your very own

Luca Canali has written SQLpython, a lovely new SQL command-line tool for Oracle.

Right now, the most popular SQL command-line tools are
  • SQLPlus, included with Oracle, is sometimes great, sometimes annoying, and impossible to modify (source code not available).
  • gqlplus is open-source. It's written in C, though, which means (to my mind) that you'll need all of your strength and all of your courage if you want to modify it.
So, download sqlpython.tar, untar it, put sqlpython.py and mysqlpy.py somewhere handy (like your Python library), and then:
$ python
>>> import mysqlpy
SQL.NoConnection> connect hr/hr@xe
SQL.xe> select * from employees;
Now comes the fun part! Open up mysqlpy.py and sqlpython.py and start modifying. They're very basic right now, but very clean, concise, easy to understand, and easy to modify. For instance, I wanted to be able to issue Python commands like this:
SQL.xe> py print 'This is a python command';
This is a python command.
So I added this method to mysqlpy:
    def do_py(self, arg):
        exec(arg)
That's all I did - not one keystroke more - and it works. Now that's extensibility!
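The do_ prefix is the convention of the standard library's cmd module, where any method named do_something becomes a "something" command at the prompt. Here is a stripped-down sketch in Python 3 of why a two-line method is enough, assuming SQLpython builds on that convention. MiniShell and its namespace dictionary are stand-ins for illustration, not SQLpython's actual classes.

```python
import cmd


class MiniShell(cmd.Cmd):
    """Hypothetical stand-in for a cmd-based SQL shell."""

    prompt = "SQL.demo> "

    def __init__(self):
        cmd.Cmd.__init__(self)
        self.namespace = {}  # Python names created by 'py' commands live here

    def do_py(self, arg):
        """py <statement> : run one Python statement."""
        # Strip a trailing semicolon so SQL habits don't trip us up.
        exec(arg.rstrip(";"), self.namespace)

    def do_quit(self, arg):
        """quit : leave the shell."""
        return True  # a true return value ends cmdloop()


shell = MiniShell()
shell.onecmd("py answer = 6 * 7")
shell.onecmd("py print('answer is', answer)")
# shell.cmdloop() would start the interactive prompt instead.
```

Keeping a separate namespace dictionary is a choice made for this sketch; it lets variables set by one py command survive to the next.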

If you're not an Oracle person and you're envious, as far as I can tell, it should be easy to modify SQLpython to use any DB-API2 adapter.

Friday, April 28, 2006

Stored procedures from cx_Oracle

A couple of people have asked me about calling Oracle stored procedures from cx_Oracle. It's taken me a while to answer, because... I didn't know! I'd only had experience doing them the 'dumb' way:

>>> ora = cx_Oracle.Connection('scott/tiger@orcl')
>>> curs = ora.cursor()
>>> curs.execute('begin myStoredProc(:a); end;',{'a':'the letter a'})

... but, of course, that won't do if (for instance) you want OUT variables. So I did a little research. cx_Oracle provides callproc and callfunc, but using them can get squirrely. Say you have PROCEDURE times_two(n IN NUMBER, result OUT NUMBER).
>>> n = 1
>>> curs.callproc('times_two',[2, n])
[2, 4]
>>> n
1
In other words, if you just pass a regular Python variable to callproc, the value won't actually change, OUT mode notwithstanding. If you want the new value, you'll just have to assign it there from callproc's return value.

Alternately, you can prepare the way by setting up your in/out variable as an instance of a special cx_Oracle object type, as follows...
>>> n = curs.var(cx_Oracle.NUMBER)
>>> curs.callproc('times_two',[5,n])
[5, 10.0]
>>> n
<cx_Oracle.NUMBER object at 0xb7cf2480>
>>> n.getvalue()
10.0
Pre-setting a variable's type? Calling .getvalue() just to see the contents? What an un-Pythonic pain! As far as I know, for the time being, cx_Oracle and PL/SQL procedures with IN-OUT parameters are simply two great tastes that do not taste great together. You can do it, you just won't feel like you're having Pythonic fun.
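The reason a plain variable can't be updated in place is ordinary Python behavior rather than anything Oracle-specific: the procedure call receives the value 1, not the name n, so only a mutable holder object can carry a result back. The following Python 3 sketch imitates both styles without touching a database. FakeVar and fake_callproc_times_two are invented stand-ins, not part of cx_Oracle.

```python
class FakeVar(object):
    """Invented stand-in for the holder object that curs.var() returns."""

    def __init__(self):
        self._value = None

    def _store(self, value):
        self._value = value

    def getvalue(self):
        return self._value


def fake_callproc_times_two(params):
    """Imitate PROCEDURE times_two(n IN NUMBER, result OUT NUMBER)."""
    n_in, out = params
    result = n_in * 2
    if isinstance(out, FakeVar):
        out._store(result)      # a holder object can be filled in place
    return [n_in, result]       # either way, the new values come back in a list


# Style 1: plain variable; take the OUT value from the return list.
n = 1
returned = fake_callproc_times_two([2, n])
print(returned, n)

# Style 2: holder object; read the OUT value back with getvalue().
holder = FakeVar()
fake_callproc_times_two([5, holder])
doubled = holder.getvalue()
print(doubled)
```

Unpacking the returned list is usually the less painful of the two styles when there are only one or two OUT parameters.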

On the plus side, if the stored function or procedure is within a PL/SQL package, callproc accepts that in the way you'd guess:
>>> curs.callproc('multiplication_package.times_two',[5,n])
[5, 10.0]

Oh, and it looks like sqlWrap.py wasn't handling .callproc. I've posted a correction.

Saturday, April 22, 2006

IOUG Collaborate! handouts

If you're on your way to Collaborate!, and getting annoyed at the way you need to search manually for each session just to download its session materials, this script may be handy. It lets you grab the session materials from your personal itinerary.

Be gentle, it was written in a huge hurry.

#!/usr/bin/python
"""Creates a version of your Collaborate! personal itinerary with links to
session materials.

To use:
0. Make sure your machine has Python. www.python.org
1. Login to your personal itinerary at
http://iougew.prod.web.sba.com/displaymod/ITIntro.cfm?conference_id=44
2. Once your personal itinerary is showing, use Save As to save the webpage
to your hard drive. Name it PersonalIT.cfm.html. (This should be the
default name.)
3. Put this script in the same directory with PersonalIT.cfm.html.
4. Run the script by issuing 'python makeLinks.py' at the command prompt.
5. Open the generated file PersonalIT.withLinks.cfm.html with a browser.
6. The (find materials) links for each title will search for session materials.

By Catherine Devlin (catherinedevlin.blogspot.com)"""
import re, urllib
titleRe = re.compile(r'(Title:</td>\s+<td.*?>(<a href.*?>(.*?)</a>))', re.DOTALL | re.MULTILINE)
f = open('PersonalIT.cfm.html')
contents = f.read()
newContents = contents
f.close()
sessions = titleRe.finditer(contents)
sessionLinks = [s.groups()[1:] for s in sessions]
for (wholeLink, title) in sessionLinks:
    # Append a "find materials" search link after each session's title link.
    withNewLink = '%s <a href="http://iougew.prod.web.sba.com/proceedingmod/SearchEvents.cfm?conference_id=44&searchType=2&title=%s">(find materials)</a>' % (wholeLink, urllib.quote(title))
    newContents = newContents.replace(wholeLink, withNewLink)
newFile = open('PersonalIT.withLinks.cfm.html','w')
newFile.write(newContents)
newFile.close()

Monday, April 17, 2006

oraDifference.py

oraDifference.py - a tool for comparing items that differ between two Oracle schemas. The basic idea is to leverage the excellent graphical diff/merge tools available for file comparison and conveniently use them to inspect database object differences.

There are many programs that can compare two database schemas and tell you which objects are defined differently between them. That's really not good enough, though, because you then need to tediously dig into the definition of each (allegedly) differing object by hand, and perform any desired reconciliation by hand.

I wrote oraDifference.py to make comparing and reconciling schemas more convenient. For example, let's say you have the SCOTT schema in production and development instances. Stored function MYFUNC is defined in both, but the definition differs. View MYVIEW is defined only in development. Then running
python oraDifference.py scott@prod scott@dev
will generate the following batch files (Win) or shell scripts (*nix):
  • oraDifferenceResults/FUNCTION/MYFUNC.bat, which will invoke a graphical diff/merge tool showing you precisely where MYFUNC's definition differs between the two instances
  • oraDifferenceResults/FUNCTION/MYFUNC-copy-SCOTT-DEV.bat, which will write DEV's definition of MYFUNC into PROD
  • oraDifferenceResults/VIEW/missingFrom-SCOTT-PROD/MYVIEW.sql, the definition of MYVIEW
  • oraDifferenceResults/VIEW/missingFrom-SCOTT-PROD/MYVIEW-copy-SCOTT-DEV.bat, which writes MYVIEW into PROD
For now, you have to do the work of getting oraDifference.py (and sqlWrap.py) manually and putting them someplace appropriate. I do intend to wrap them up in a proper distutils distribution (maybe even with an .egg).

I'm posting this now because I find it really useful already. You may find some of my design decisions quirky - for instance, I mush all one-liner items (grants, synonyms) into big files by category, rather than making separate files for each grant or synonym. It's Python, though, so you should be able to tweak it to meet your tastes. Also, you can tweak the process oraDifference.py uses to decide whether two objects differ. I have always been annoyed that I can't stop TOAD's "Schema Compare" tool from turning up dozens of "differences" that I consider false hits. With oraDifference.py, you can just get in there and change it.
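To give a flavor of the kind of tweak meant here, the sketch below shows one possible rule for deciding whether two definitions differ: ignore whitespace-only changes. It is an illustration only, not the code in oraDifference.py; normalize and definitions_differ are hypothetical names.

```python
import re


def normalize(source):
    """Collapse every run of whitespace to a single space.

    Caveat: this also collapses whitespace inside string literals,
    which may or may not be what you want.
    """
    return re.sub(r'\s+', ' ', source).strip()


def definitions_differ(first, second):
    """True only when the two definitions differ beyond whitespace."""
    return normalize(first) != normalize(second)


prod = 'CREATE VIEW myview AS\n  SELECT ename FROM emp'
dev = 'CREATE VIEW myview AS SELECT ename   FROM emp'
print(definitions_differ(prod, dev))
```

A stricter or looser rule (ignoring case, comments, or storage clauses, say) would be a matter of changing normalize.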

Eventually I hope to release something that will look polished and final, but for now, feel free to use it, and re-code any part that doesn't match your preferences - and let me know about any of your changes that you think should go into everybody's version.

For fairness, I'll mention some other options I found for schema comparison...
  • LivingLogic's oradiff.py (part of ll.orasql) is the closest to oraDifference.py. It also compares the text for each object, but it outputs in unified diff format (or unidiff). If you can read unidiff comfortably - welcome, advanced extraterrestrial visitor! The oradiff.py source code looks tidy and well-organized, but it's still not obvious to me how to tweak it.
  • schemaCompare, a Java program, was registered at SourceForge in June 2002, but has not yet released any files. I conceived oraDifference.py about two weeks ago. Not to suggest that this implies anything about the relative productivity of various languages. (jab, jab)

Friday, April 14, 2006

Python Core for Oracle

He put it in a comment, but it bears repeating:

Przemek Piotrowski has written up Python Core for Oracle, a set of instructions to put a top-to-bottom data-driven webserver stack on your machine in about half an hour. The installation is surprisingly straightforward. It's all fully functional, and it's all free.

This is the Golden Age!

Thursday, April 13, 2006

Cheetah templating

Yesterday, Python's string.Template failed me, so it was finally time to learn Cheetah.

I wanted to use templates like
'my list has $len($myList) objects; the first is named $myList[0].name.upper()'
... but, of course, that sort of stuff is impossible with string.Template. I created a bunch of code to populate a dictionary to pass to string.Template, but that was clunky, and defeated the purpose of having a template that clearly describes its own contents. In Cheetah, it's perfectly straightforward.

from Cheetah.Template import Template
tmplt = 'my list has $len($myList) objects; the first is named $myList[0].name.upper()'
print Template(tmplt, [locals(),globals()])


The second argument is the list of dictionaries Cheetah will search for matches to variables in the Template. Using [locals(), globals()] is my way to cheat and say, "Look wherever the interpreter would".
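For comparison, here is roughly what the standard-library workaround looks like. It is a sketch with made-up data: the Item class and the placeholder names count and first_name are hypothetical. The standard library's Template only substitutes plain names, so every computed value has to be worked out ahead of time and handed over in a dictionary.

```python
from string import Template


class Item(object):
    """Hypothetical object with a name attribute, for illustration."""
    def __init__(self, name):
        self.name = name


my_list = [Item('alpha'), Item('beta')]

# Expressions such as len(...) or .name.upper() cannot live in the
# template itself; they are computed here and given plain names.
values = {
    'count': len(my_list),
    'first_name': my_list[0].name.upper(),
}

tmplt = Template('my list has $count objects; the first is named $first_name')
report = tmplt.substitute(values)
print(report)
```

The template no longer says where its values come from, which is exactly the clunkiness that Cheetah avoids.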

Wednesday, April 12, 2006

Oracle XE and Ubuntu

WOW. I just installed Oracle XE on my Ubuntu machine. I absolutely cannot believe how easy it was. This is - honest to goodness - all I did.
  1. Download oracle-xe_10.2.0.1-1.0_i386.deb
  2. su - root
  3. dpkg -i oracle-xe_10.2.0.1-1.0_i386.deb
    It ran for maybe thirty seconds - so short, I was certain there had been an error!
  4. /etc/init.d/oracle-xe configure (it told me to do that)
  5. pointed Firefox at http://127.0.0.1:8080/apex (it requested that, too)
  6. Started using the database (plus its included Application Express).
The entire installation took less than five minutes. Unbelievable! "This is Oracle?"

The only glitches I've gotten so far were when using Python's cx_Oracle against XE, and I've puzzled them out. (I don't know whether other people will get these glitches; they could have resulted from some residue of the full-fledged 10.2 Oracle that was on the machine before.)
  • import cx_Oracle gave ImportError: libclntsh.so.10.1: cannot open shared object file: No such file or directory until I set LD_LIBRARY_PATH=/usr/lib/oracle/xe/app/oracle/product/10.2.0/server/lib/
  • conn = cx_Oracle.Connection('scott/tiger@xe') gave RuntimeError: Unable to acquire Oracle environment handle until I set ORACLE_HOME=/usr/lib/oracle/xe/app/oracle/product/10.2.0/server


[EDIT Oct. 9, 2007: Configuring cx_Oracle with Oracle XE has turned out to be harder than expected. See my new blog post.]

[EDIT Mar. 6, 2008: Great instructions for installing straight from Oracle's repository with apt-get here]

Tuesday, March 28, 2006

summary of Oracle/Python discussion

My OTN article on Oracle and Python was kept very brief, to be non-intimidating and to fit within OTN's preferred length. If you've come here, though, you're ready for the rest of the story! I'll use this post to summarize that discussion.
  • Python+Oracle on other Linux distributions - see Andy Todd's blog entry
  • alternatives to fetchone(): fetchmany(), fetchall(), and looping directly on the cursor - see my last entry and this comment
  • Passing an argument to split(), to avoid errors on more complex init.ora parameters - see this comment

Friday, March 24, 2006

OTN article addendum

If you've read my new article at the Oracle Technology Network, Wrapping Your Brain Around Oracle + Python, thank you! I'd like to add a few more details about fetching rows with cx_Oracle that can make your code even cleaner.

Several times, I demonstrate getting rows from a cursor by means of the cursor's .fetchone() method. .fetchone() is used in loops like this:
curs.execute(<some query>, <bind variables>)
aRowOfData = curs.fetchone()
while aRowOfData:
    <commands>
    aRowOfData = curs.fetchone()
Another, more concise, alternative was not mentioned in the article. The cursor object itself can be iterated over, like this:
curs.execute(<some query>, <bind variables>)
for aRowOfData in curs:
    <commands>
The effect is the same, but it works with two fewer lines of code.

Finally, the .fetchall() method, bringing the entire result set into a list at once, was only briefly mentioned in the article, but it would probably be preferable to .fetchone() for the small result sets we'll find in places like v$parameter. Only when a result set is very large (or your computer is very memory-limited) do you need to worry about .fetchall()'s impact on your system's available memory.
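The three fetching styles can be compared side by side without an Oracle instance. The sketch below uses the standard library's sqlite3 module as a stand-in for cx_Oracle, on the assumption that any DB-API 2.0 cursor behaves alike here; the parameter table and its rows are made up. Note that sqlite3 uses ? placeholders where cx_Oracle uses named binds.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
curs = conn.cursor()
curs.execute('CREATE TABLE parameter (name TEXT, value TEXT)')
curs.executemany('INSERT INTO parameter VALUES (?, ?)',
                 [('processes', '150'), ('sessions', '170')])

query = 'SELECT name, value FROM parameter ORDER BY name'

# 1. fetchone() in a while loop
curs.execute(query)
via_fetchone = []
row = curs.fetchone()
while row:
    via_fetchone.append(row)
    row = curs.fetchone()

# 2. iterating over the cursor itself
curs.execute(query)
via_iteration = [row for row in curs]

# 3. fetchall(), which loads the whole result set into a list at once
curs.execute(query)
via_fetchall = curs.fetchall()

print(via_fetchone == via_iteration == via_fetchall)
conn.close()
```

All three lists hold the same rows; the difference is only in how much code you write and how much of the result set sits in memory at once.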

Sunday, March 12, 2006

Alice: corrupting the youth

I wish I could remember which PyCon delegate told me about Alice. It's a graphical environment for programming animations in a very kid-friendly fashion, yet full of solid, object-oriented programming goodness. This generation's LOGO, I suppose.

I was hoping it would fire our nine-year-old's interest in computers. I think it's working; Star Wars: Battlefront never produced such delighted shrieks and giggles. Maybe it works too well. He refused dessert to spend the extra couple minutes with Alice. Choosing code over food - I always thought of that as late-stage geekery.

Cheer: Alice works hard to be girl-friendly. Boo: Not available for Linux.

Tuesday, March 07, 2006

new and improved sqlWrap.py

After reading the Python Cookbook and attending PyCon, I have enormously improved sqlWrap.py, my Python module for handling DB-API 2.0 connections conveniently.

Also at PyCon, I learned that I could be accused of having duplicated some existing projects. Like them, I allow tuple-like, dict-like, and object-like access to fields. I like mine better, though; it requires less preparation, and has really nice reporting methods. For example,

conn = sqlWrap.SqliteConnection('myDb.sqlite')
print conn.select('myTable').xml()


... is all you need to get an XML report on myTable. Similar reporting methods exist for
  • tables in pp (prettyprint), xhtml, reStructuredText
  • transposed tables in pp, xhtml, reStructuredText
  • SQL INSERT statements

Hey, if everybody gets to write their own web app platform, why shouldn't I write my own DB-API wrapper?

Thursday, March 02, 2006

PyCon2006


No, Guido doesn't know I have this... I snagged it from the badge reuse box. You keep your smelly old rock star T-shirts, I'm keeping this.

So... PyCon. Wow. It was wonderful, of course. What else could it be? Put 400 people with that much intelligence, creativity, and energy in one place, and it can hardly help but be wonderful.

What surprised me is that I found so many good ideas no matter what I was doing at PyCon. Whether I was in a talk that I expected to benefit hugely from, or a "Probably useless but I guess I'll go anyway" talk, or just chatting between or after sessions, I seemed to learn good stuff constantly just the same. The serendipity was worth more than the "planned" learning, and maybe that's what makes conventions so much more useful than formal books and classes and so forth.

Friday, February 17, 2006

Python Cookbook

This may seem silly, but I really have to sing the praises of the Python Cookbook, 2nd Edition. Everybody else has known about it forever, but I just got my copy a month ago.

I am blown away. I can't believe how good this book is, far beyond any other programming book I've ever known. I'm now completely embarrassed about the quality of the code I wrote without it, and tempted to stay up all night refactoring everything. In fact, I've already largely rewritten sqlWrap.py based on what I've learned.

I'm afraid I dawdled about buying my copy because I never found the recipes at ActiveState all that compelling - occasionally nice, but usually nothing to jump and scream about. I figured the Cookbook would just be a bunch of them, bound together. Not so - the careful selection and excellent discussion make it amazing. It's like having a master programmer at your elbow to guide you.

Friday, February 10, 2006

The Geek Event Aggregator is ready!


The Geek Event Aggregator is more-or-less ready for prime time! It now collects many more events. Better yet, it is very easy to feed more event sources, so it is set for even more growth. In other words,

PLEASE FEED THE AGGREGATOR NEW SITE SUGGESTIONS!

Some design notes:

After fooling around with complex regular expressions, Beautiful Soup, etc., I found a quick-and-dirty way that works better. The Aggregator downloads a page's HTML, replaces all tags with carriage returns, breaks the remaining text into lines, and checks those lines for ones that appear to contain recognizable future dates. The Aggregator assumes each such date is an upcoming event date. (There are many reasons dates get put on websites, but future dates almost always refer to meetings or events.)

Wow, human beings have many, many, many ways to write dates. Fortunately, Python's dateutil module can recognize most of them - I mainly have to modify it to avoid false hits (like interpreting '.' as 'today', or '2006' alone as 'Jan 1, 2006').
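The tag-stripping approach can be sketched in a few lines. This is an illustration under stated assumptions, not the Aggregator's code: a short list of strptime formats stands in for dateutil, a line only counts if the whole line is a date, English month names are assumed, and the names parse_date, future_dates and the sample page are hypothetical.

```python
import re
from datetime import datetime

TAG_RE = re.compile(r'<[^>]*>')

# Stand-in for dateutil: a handful of explicit formats (an assumption).
DATE_FORMATS = ('%B %d, %Y', '%b %d, %Y', '%Y-%m-%d', '%m/%d/%Y')


def parse_date(text):
    """Return a datetime if the whole line matches a known format, else None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            pass
    return None


def future_dates(html, now):
    """Yield (line_number, date) for each line holding a date later than now."""
    # Replace every tag with a line break, then look at what is left.
    lines = TAG_RE.sub('\n', html).splitlines()
    for number, line in enumerate(lines):
        found = parse_date(line.strip())
        if found is not None and found > now:
            yield number, found


page = ('<html><body><h1>Blahware Group</h1>'
        '<p>Next meeting</p><p>March 9, 2006</p>'
        '<p>Last meeting</p><p>January 12, 2006</p></body></html>')

for number, found in future_dates(page, now=datetime(2006, 2, 10)):
    print(number, found.date())
```

Passing now in explicitly, instead of calling datetime.now() inside the function, keeps the example repeatable.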

Some sites already aggregate events from several groups and places together; for those, the Aggregator uses a slightly different algorithm. It finds the dates as above, then finds the location and event name by their line-number position relative to the date. (A human being (me) needs to provide relative line-number positions for those values in advance; for instance, one site may always list location on the line immediately after the event date, and that fact is recorded in advance.)
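The relative-position idea might look something like this. It is a sketch with hypothetical names (SITE_OFFSETS, event_at) and made-up sample lines, and it assumes the page has already been broken into lines and the date line located.

```python
# Hypothetical per-site settings, recorded by a human in advance:
# where the event name and location sit relative to the date line.
SITE_OFFSETS = {'name': -1, 'location': 1}


def event_at(lines, date_index, offsets=SITE_OFFSETS):
    """Build an event dict from the lines around a recognized date line."""
    event = {'date': lines[date_index]}
    for field, offset in offsets.items():
        position = date_index + offset
        # Skip any field whose offset points outside the page.
        if 0 <= position < len(lines):
            event[field] = lines[position]
    return event


lines = ['Blahware Users Group', 'March 9, 2006', 'Columbus, OH',
         'Other Users Group', 'April 4, 2006', 'Dayton, OH']

print(event_at(lines, 1))
print(event_at(lines, 4))
```

Each multi-event site would get its own offsets dictionary, since every site lays out its listings differently.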

For the multi-event sites, the Aggregator has a decent but kludgey algorithm to parse city and region, despite the great variety of ways to write a location. Part of that relies on a list of recognized city names. It can be used for the single-event sites, too; if a recognizable city name is in the site's Title, or in the text in the form "Blah Blah City Blah Blahware Group", the Aggregator can find it. But if your event is in Athens, GA, the Aggregator thinks it's in Greece.

HTML DB makes it very convenient to build a web interface to the data. Oracle is also very gracious to host a free sandbox for HTML DB projects (which is where the Aggregator lives right now.) Unfortunately, as far as I can see, HTML DB doesn't support RESTful interfaces, or serving up pages as XML. That's a pity, because this cries out to be a REST web service. Maybe eventually I'll buy/find a place to host the web app in TurboGears or something.

For now, I've given up on feeding upcoming.org. Upcoming mandates an actual street address, which is just too hard to find automatically. I'd still like to pull from it, although I'll have to check their legal requirements, and - dare I say it? - I don't know if it will really have many relevant events I don't already have.

I'm sorry the events are so U.S./Canada-centric. It only has a handful of events from elsewhere, and it doesn't break down regions within other countries. (What's wrong with going from St. Petersburg to Novosibirsk for a meeting, anyway? Isn't that what the Trans-Siberian Railway is for?) You can help fix this by suggesting new sites to scan, and volunteering to introduce regional granularity for other countries.

Actually, because I've been the only one to feed the Aggregator so far, the events are Ohio-centric. You folks in benighted backwaters like California and New York are just going to have to feed it your own favorite sites if you want to fix that.

Some of the many things that produce misses and false hits:
  • Dates without years. Somebody puts on the website, "Our next meeting is Nov. 19." The Aggregator can't tell that they haven't updated the site since 2004.
  • Years must be on the same line as the rest of the date. If you say,
    2005 events:
    Feb. 14
    Aug. 12
    ... the Aggregator doesn't see the "2005", and believes there are events on Feb. 14 and Aug. 12 of this year.
  • Frames. Well, you can't blame it; frames mess up everybody. But if you can dig into the HTML source and puzzle out the URL of the frame with the data, then that can be read. That's what I did for the OKCOUG webpage, for example.

Thursday, February 02, 2006

Geek Event Aggregator's future

The strangest thing happened to me this morning. The clouds split open, and a beam of light shone down onto me. (Since I was sitting in my cubicle, this in itself was odd enough.) A Voice spoke from Heaven, and said,

"Catherine, remember that somewhat cheesy Geek Event Aggregator you put on HTML DB a while ago?"

"Yes, Lord?" I said. (That seemed like the obvious response. That the Almighty would speak with hyperlinks didn't seem too surprising.)

"Well, I have heard the cries of my people. I want you to rewrite your aggregator to both consume and provide events in iCalendar format. Also, interface it with upcoming.org - both to download and to upload. Behold, Python libraries for upcoming.org have already been written for you. And I'm thinking that Beautiful Soup might help you pluck event descriptions from the God-awful HTML jungles you find them in."

"It sounds wonderful, Lord," I said. "But when am I supposed to do this? I mean, you did just drop custody of our Godson into our lap. Time is not exactly abundant right now."

There was a long pause, then the Voice said, "Let me get back to you on that."