I went down a refactoring rabbit hole on ddl-generator and ended up pulling out the portion that pulls in data from various file formats. Perhaps it will be useful to others.
>>> from data_dispenser.sources import Source >>> for row in Source('animals.csv'): ... print(row) ... OrderedDict([('name', 'Alfred'), ('species', 'wart hog'), ('kg', '22'), ('notes', 'loves turnips')]) OrderedDict([('name', 'Gertrude'), ('species', 'polar bear'), ('kg', '312.7'), ('notes', 'deep thinker')]) OrderedDict([('name', 'Emily'), ('species', 'salamander'), ('kg', '0.3'), ('notes', '')])
Basically, I wanted a consistent way to consume rows of data - no matter where those rows come from. Right now, JSON, CSV, YAML, etc. all require separate libraries, each with its own API. This abstracts all that out, for reading purposes; now each data source is just a Source.
I'd love bug reports, and sample files to test against. And feel free to contribute patches! For example, it wouldn't be hard to add MS Excel as a data source.