Wednesday, August 10, 2005

Geek event aggregator

Click me!

I've written up an aggregator script (Python, of course) that browses an assortment of event announcement webpages and parses out the name/city/state/date information. Then I made an HTML DB application to serve it up according to your state or province (sorry, non-North-Americans).

No, I haven't written a brilliant AI to figure this out. It's just a bunch of regular expressions.
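As a rough illustration of the approach (not the actual aggregator code), here is a minimal sketch of one such regular expression. The announcement format, the `EVENT_PATTERN` name, and the `parse_events` helper are all assumptions made up for this example; each real source page would need its own pattern.

```python
import re
from datetime import datetime

# Hypothetical announcement format: "Name, City, ST - Month D, YYYY".
# Real event pages differ, so every source would need its own pattern.
EVENT_PATTERN = re.compile(
    r"(?P<name>[^,\n]+),\s*"
    r"(?P<city>[^,\n]+),\s*"
    r"(?P<state>[A-Z]{2})\s*-\s*"
    r"(?P<date>[A-Z][a-z]+ \d{1,2}, \d{4})"
)

def parse_events(text):
    """Return one dict per announcement found in the text."""
    events = []
    for match in EVENT_PATTERN.finditer(text):
        event = match.groupdict()
        event["name"] = event["name"].strip()
        event["city"] = event["city"].strip()
        # Convert the matched date string into a date object.
        event["date"] = datetime.strptime(event["date"], "%B %d, %Y").date()
        events.append(event)
    return events

sample = "Ohio Python Meetup, Dayton, OH - September 14, 2005\n"
for event in parse_events(sample):
    print(event["name"], event["city"], event["state"], event["date"])
```

Named groups keep the name/city/state/date fields readable, which helps when each new source needs a slightly different pattern.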

To do:
  1. Mine from more event sources
  2. Refactor the source code so it doesn't embarrass me and I can post it
  3. Debug multiple hits for some events (InOUG especially)
  4. Add non-North-American region support
  5. Automate the mining (currently I kick it off and upload to HTML DB by hand)
  6. Get somebody to host the app on a more prominent site
  7. Provide access to the pure XML as generated by the aggregator, so others don't have to go through the heck of parsing I do
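
For item 7, the XML output could be produced along these lines. This is only a sketch: the element names (`events`, `event`, `name`, `city`, `state`, `date`, `link`) and the `events_to_xml` helper are assumptions for illustration, not the aggregator's actual format.

```python
import xml.etree.ElementTree as ET

# Hypothetical field names; the real aggregator's XML layout may differ.
FIELDS = ("name", "city", "state", "date", "link")

def events_to_xml(events):
    """Serialize a list of event dicts into a single XML string."""
    root = ET.Element("events")
    for event in events:
        node = ET.SubElement(root, "event")
        for field in FIELDS:
            # Missing fields become empty elements rather than errors.
            ET.SubElement(node, field).text = str(event.get(field, ""))
    return ET.tostring(root, encoding="unicode")

print(events_to_xml([
    {"name": "Ohio Python Meetup", "city": "Dayton",
     "state": "OH", "date": "2005-09-14"},
]))
```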

2 comments:

Daniel said...

Hey, cool.

I like the geographical breakdown; do you have a GIS component (to make smaller regions easy)?

What happens when people want to add new sources? I assume that's handled by hand right now...

Catherine said...

Nothing so sophisticated! "Region" is defined as "in a next-nearest-neighbor state" (a neighbor's neighbor). It's quick but quirky... New York and Minnesota are in the same "region" (thanks to blooter Ontario), while Maine and Massachusetts are not!
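
A minimal sketch of that next-nearest-neighbor test, assuming a "region" also includes the state itself and its direct neighbors. The `NEIGHBORS` table is a small, partial, illustrative adjacency list, and `in_same_region` is a hypothetical helper, not the application's actual code.

```python
# Partial, illustrative adjacency table (state/province -> direct neighbors).
# A real table would cover every state and province.
NEIGHBORS = {
    "NY": {"ON", "QC", "VT", "MA", "CT", "NJ", "PA"},
    "ON": {"NY", "MN", "MI", "QC", "MB"},
    "MN": {"ON", "MB", "ND", "SD", "IA", "WI"},
    "ND": {"MN", "SD", "MT", "SK", "MB"},
}

def in_same_region(a, b):
    """True if b is a, a neighbor of a, or a neighbor of a neighbor of a."""
    if a == b:
        return True
    near_a = NEIGHBORS.get(a, set())
    near_b = NEIGHBORS.get(b, set())
    if b in near_a or a in near_b:
        return True
    # Next-nearest neighbors share at least one common neighbor.
    return bool(near_a & near_b)

print(in_same_region("NY", "MN"))  # both border Ontario in this table
print(in_same_region("NY", "ND"))  # no shared neighbor in this table
```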

The aggregator basically considers "city" to be a dumb string; once it knows my event is in Ohio, it doesn't care if the city is "Dayton" or "Over next to them big mountain thingies". Hooking into some geo-web service would be cool, but would require pinning down the city.

As for updating, right now I just go to the site, add it to the hard-coded list of targets, and puzzle out a regex to catch the name/location/date/link information. But maybe I'll improve it if this ends up growing out of control...