Wednesday, May 21, 2014


I went down a refactoring rabbit hole on ddl-generator and ended up pulling out the portion that pulls in data from various file formats. Perhaps it will be useful to others.

>>> from data_dispenser.sources import Source
>>> for row in Source('animals.csv'):
...     print(row)
OrderedDict([('name', 'Alfred'), ('species', 'wart hog'), ('kg', '22'), ('notes', 'loves turnips')])
OrderedDict([('name', 'Gertrude'), ('species', 'polar bear'), ('kg', '312.7'), ('notes', 'deep thinker')])
OrderedDict([('name', 'Emily'), ('species', 'salamander'), ('kg', '0.3'), ('notes', '')])

Basically, I wanted a consistent way to consume rows of data - no matter where those rows come from. Right now, JSON, CSV, YAML, etc. all require separate libraries, each with its own API. This abstracts all that out, for reading purposes; now each data source is just a Source.

I'd love bug reports, and sample files to test against. And feel free to contribute patches! For example, it wouldn't be hard to add MS Excel as a data source.

No comments: