Wednesday, October 07, 2009

Stop! Drop that field!

For PyOhio registration, we used a nice service called eventbrite. It worked great, but I have one big problem with it: it collected way too much data from registrants. It got us the data we needed, but it also asked for home addresses, gender, job title, company... all data we had no legitimate need for or plans to use, probably just because the fields are in the eventbrite form template. Entering it was a pointless nuisance for our attendees, and some may actually have been put off by the length or intrusiveness of the registration form. (Dave Stanek, if you're reading this, let's see if we can change that for next year.)

We are so not the only offenders in this department. It's everywhere, it's endemic. At website after website, we're asked to provide information of no apparent relevance to the sites' purposes. It's so easy to throw field after field into a data collection form; templates are provided with every conceivable field already in place; and - well, why not? Isn't more data better?

No. No, it's not. Excess data takes time to enter and process, clutters databases, obscures important data, and increases the risk of data leakage. In interpersonal interactions, we always have the option of asking "Why do you need to know that?", or just giving people that funny look that tells them they're going out of bounds. On paper forms, we can leave fields blank. Automated forms with field validation cut those safeguards off and open the door to compulsive collection syndrome. The one defense people do have against intrusive electronic forms - lying - ruins data quality, and false data is much worse than no data at all.

We need an ethos of restraint in data collection, of always asking, "Why am I collecting this field?" Data collection needs to be seen not as a pure good, but as something with a cost to weigh against its benefit. Not collecting data is often the responsible choice, and we need to teach each other that.

Monday, October 05, 2009

PyCon talk review

We've got a record number of volunteers working on PyCon's Program Committee - the group that reviews talk proposals and decides which ones go on the schedule. And it's a good thing, because we've also got a record number of proposals - 179! (For comparison, PyCon 2008 got 118.)

Right now, we're in the fun part - going through the proposed talks and yelling, "Oooh! Ooooh! I want that one!" Just looking through the proposed talks is a great Python education all by itself - you find out about useful packages and techniques you never knew were out there.

The tough part comes later - when we have to winnow the list down. Inevitably, some talks I want to see won't make the cut. Accepting all the good talks would be great, but we'd need a week of PyCon, and three days for the core conference are all we figure most attendees can spare. (My proposal for a round-the-clock talk schedule met only chuckles. Then again, with late-night Open Spaces, we already come dangerously close to a round-the-clock schedule...)