Articles in category: python

How to automate extracting tables from PDFs, using Tabula

11 April 2017 - One of my colleagues needs tables extracted from a few hundred PDFs. There’s an excellent tool called Tabula that I frequently use, but you have to process each PDF manually. However, it turns out you can also automate the process. For those like me who didn’t know, here’s how it works.

Amsterdam has room for another 2.1 million bicycle racks

20 June 2016 - Amsterdam has a persistent shortage of bicycle racks. Bicycle professor Marco te Brömmelstroet argues that this is really a matter of making choices: the space occupied by four parked cars could easily accommodate 30 bicycle racks.

Assignment 1-3

23 March 2016 - A little background: I’m using the Outlook on Life Surveys dataset and I’m interested in the relation between union membership and political participation (background here). The third assignment is similar to the second one, except that we’re required to do some data management before outputting the data. I’ll therefore submit an adapted version of the programme and blog post from the previous assignment.

Coursera Data Analysis and Interpretation

12 March 2016 - I was initially introduced to R by Nathan Yau’s Visualize This, but I subsequently learned a lot about R through some of the courses in Brian Caffo, Roger Peng and Jeff Leek’s Data Science Specialization at Coursera. In fact, the specialization was a reason for me to postpone switching from R to Python.

Base versus ggplot2

12 February 2016 - Yesterday, stats guru Jeff Leek confessed the ultimate unpopular opinion in data science: «I don’t use ggplot2 and I get nervous when other people do» (if you haven’t a clue what this is about, you may want to skip this post altogether). His confession was met with ridicule, more ridicule, and an occasional «oh my god I thought I was the only one!».