NoVa-Py Talk: Building a Python Package
One of the most popular uses of the API for DocumentCloud, the document research/publishing platform where I work, is to bulk-upload hundreds or thousands of documents. People usually hack their own code together to do this, sometimes using the Python or Ruby wrappers for the API.
After talking with users and hearing their thoughts about the workflow (a desire to have a record of each file’s URL once uploaded, for example), I saw an opportunity to add some luxury to the process. A couple of months, a lot of research, and a few bruises later, I had my first Python package: pneumatic.
pneumatic does a few things to make life easier. It grabs information about each uploaded file and saves it in a SQLite database, which you can dump to a CSV file. It uses Python’s multiprocessing module to try to add some speed (recognizing that this is a network-bound task). And it scans all subfolders for files, which is handy when you obtain a collection of files organized that way.
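The three features above can be sketched with the standard library alone. This is a minimal, hypothetical sketch, not pneumatic's actual code: `upload_file` is a placeholder that fakes an upload and returns an invented example.com URL instead of calling the DocumentCloud API, and the `uploads` table and its columns are made up for illustration. Because uploading is network-bound, the sketch uses `multiprocessing.dummy.Pool`, the thread-backed counterpart of `multiprocessing.Pool` with the same interface; importing `Pool` from `multiprocessing` instead would run the work in separate processes.

```python
import csv
import os
import sqlite3
from multiprocessing.dummy import Pool  # thread-backed Pool with the multiprocessing API


def find_files(root):
    """Return every file path under root, descending into all subfolders."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    return sorted(paths)


def upload_file(path):
    """Hypothetical placeholder for a real DocumentCloud upload.

    It makes no network call; it derives a title from the filename and
    returns an invented example.com URL so the sketch runs offline.
    """
    title = os.path.splitext(os.path.basename(path))[0]
    return (path, title, "https://example.com/documents/" + title)


def upload_all(paths, db_path, workers=4):
    """Upload files concurrently and save one row per file in SQLite."""
    with Pool(workers) as pool:
        results = pool.map(upload_file, paths)  # map keeps input order
    conn = sqlite3.connect(db_path)
    with conn:  # commits the inserts when the block exits cleanly
        conn.execute(
            "CREATE TABLE IF NOT EXISTS uploads (path TEXT, title TEXT, url TEXT)"
        )
        conn.executemany("INSERT INTO uploads VALUES (?, ?, ?)", results)
    conn.close()
    return results


def dump_to_csv(db_path, csv_path):
    """Write every recorded upload to a CSV file with a header row."""
    conn = sqlite3.connect(db_path)
    cursor = conn.execute("SELECT path, title, url FROM uploads")
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([column[0] for column in cursor.description])
        writer.writerows(cursor)
    conn.close()
```

Calling `upload_all(find_files("my_documents"), "uploads.db")` and then `dump_to_csv("uploads.db", "uploads.csv")` would walk every subfolder of a hypothetical `my_documents` directory, record one row per file, and export the results. Keeping the database as the record of truth means the CSV can be regenerated at any time.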
Learning about Python packaging was as much a part of the project as creating the library itself. The folks at the Northern Virginia Python Users Group were kind enough to invite me recently to share what I learned. Click through the title card to view the slides.