I've written an aggregator script (Python, of course) that browses an assortment of event-announcement web pages and parses out each event's name, city, state, and date. Then I built an HTML DB application that serves those events up by state or province (sorry, non-North-Americans).
No, I haven't written a brilliant AI to figure this out. It's just a bunch of regular expressions.
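To give a feel for what "a bunch of regular expressions" means here, below is a minimal sketch of one such pattern. The HTML snippet, the event names, and the `EVENT_RE`/`parse_events` names are all made up for illustration; every real source page has its own layout and needs its own hand-tuned regex.

```python
import re

# Made-up markup for two event listings; real pages each look different.
SAMPLE_PAGE = """
<li><a href="/events/fall">Buckeye Users Meeting</a> - Dayton, OH - 2005-10-14</li>
<li><a href="/events/spring">Twin Cities Users Group</a> - Minneapolis, MN - 2005-11-02</li>
"""

# One regex with named groups pulls out link, name, city, state, and date.
EVENT_RE = re.compile(
    r'<a href="(?P<link>[^"]+)">(?P<name>[^<]+)</a>\s*-\s*'
    r'(?P<city>[^,<]+),\s*(?P<state>[A-Z]{2})\s*-\s*'
    r'(?P<date>\d{4}-\d{2}-\d{2})'
)

def parse_events(html):
    """Return one dict per regex match found in the page."""
    return [m.groupdict() for m in EVENT_RE.finditer(html)]

for event in parse_events(SAMPLE_PAGE):
    print(event["name"], event["city"], event["state"], event["date"])
```

Named groups keep the extraction readable: `groupdict()` hands back a ready-made record instead of positional tuples.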
To do:
- Mine from more event sources
- Refactor the source code so it doesn't embarrass me and I can post it
- Debug duplicate hits for some events (InOUG especially)
- Add non-North-American region support
- Automate the mining (currently I kick it off and upload to HTML DB by hand)
- Get somebody to host the app on a more prominent site
- Provide access to the raw XML generated by the aggregator, so others don't have to go through the parsing hassle I do
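Two of those items (duplicate hits and publishing XML) could be approached roughly as below. This is only a sketch under my own assumptions: it treats events with the same normalized name, state, and date as duplicates, and the `<events>`/`<event>` element names are invented, not the aggregator's actual output format.

```python
import xml.etree.ElementTree as ET

def dedupe(events):
    """Drop repeat hits, treating same name (case/space-insensitive), state, and date as one event."""
    seen = set()
    unique = []
    for ev in events:
        key = (ev["name"].strip().lower(), ev["state"], ev["date"])
        if key not in seen:
            seen.add(key)
            unique.append(ev)
    return unique

def events_to_xml(events):
    """Serialize event dicts to an XML string; element names are illustrative only."""
    root = ET.Element("events")
    for ev in events:
        node = ET.SubElement(root, "event")
        for field in ("name", "city", "state", "date", "link"):
            ET.SubElement(node, field).text = ev.get(field, "")
    return ET.tostring(root, encoding="unicode")

# Two made-up hits for the same event, scraped with slightly different text.
hits = [
    {"name": "Example Users Group", "city": "Indianapolis", "state": "IN", "date": "2005-09-20", "link": "/a"},
    {"name": "example users group ", "city": "Indy", "state": "IN", "date": "2005-09-20", "link": "/b"},
]
print(events_to_xml(dedupe(hits)))
```

The first hit wins, so the order sources are mined in decides which city spelling and link survive.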
1 comment:
Nothing so sophisticated! "Region" is defined as "in a next-nearest-neighbor state" (a neighbor's neighbor). It's quick but quirky: New York and Minnesota are in the same "region" (thanks to bloated Ontario), while Maine and Massachusetts are not!
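Here's a rough sketch of that neighbor's-neighbor rule. The border list is a small hand-picked subset, not the app's real table, and I'm assuming the home state and its direct neighbors also count as part of the region.

```python
from collections import defaultdict

# Illustrative subset of shared borders (not a complete map of North America).
BORDERS = [
    ("NY", "ON"), ("NY", "PA"), ("NY", "VT"),
    ("ON", "MN"), ("ON", "MI"), ("ON", "QC"),
    ("MN", "WI"), ("MN", "IA"),
]

def build_adjacency(pairs):
    """Turn a list of border pairs into a symmetric neighbor map."""
    adj = defaultdict(set)
    for a, b in pairs:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def region(home, adj):
    """Home state/province, its neighbors, and its neighbors' neighbors."""
    near = {home} | adj[home]
    for n in adj[home]:
        near |= adj[n]
    return near

ADJ = build_adjacency(BORDERS)
print("MN" in region("NY", ADJ))  # True, by way of Ontario
```

Because provinces sit in the same table as states, a big province like Ontario links places that are nowhere near each other, which is exactly the New York/Minnesota quirk.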
The aggregator basically treats "city" as a dumb string; once it knows my event is in Ohio, it doesn't care whether the city is "Dayton" or "Over next to them big mountain thingies". Hooking into a geocoding web service would be cool, but it would require pinning down the city first.
As for updating: right now I just go to the site, add it to the hard-coded list of targets, and puzzle out a regex to catch the name/location/date/link information. But maybe I'll improve that if this ends up growing out of control...
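A hard-coded target list like that might look something like this sketch. The `TARGETS` table, the `example-oug` key, the example.org URL, and the pattern are hypothetical; the point is that adding a source means adding one entry with its own regex. The page fetcher is passed in so the parsing can be tested without network access.

```python
import re
from typing import Callable

# Hypothetical target list: each source gets a URL plus its own hand-written regex.
TARGETS = {
    "example-oug": (
        "http://www.example.org/events.html",
        re.compile(
            r'<b>(?P<name>[^<]+)</b>\s*(?P<city>[^,]+),\s*'
            r'(?P<state>[A-Z]{2})\s+(?P<date>\d{1,2}/\d{1,2}/\d{4})'
        ),
    ),
}

def mine(fetch: Callable[[str], str]):
    """Run every target's regex over its page and tag each hit with its source."""
    events = []
    for source, (url, pattern) in TARGETS.items():
        page = fetch(url)
        for m in pattern.finditer(page):
            event = m.groupdict()
            event["source"] = source
            event["link"] = url
            events.append(event)
    return events
```

For real runs, `fetch` could be something like `lambda u: urllib.request.urlopen(u).read().decode("utf-8", "replace")`; for testing, a lambda returning canned HTML works just as well.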