champagne anarchist | armchair activist


How to make a d3js force layout stay within the chart area - even with multiple components

For a post on analysing networks of corporate control, I wanted to create some network graphs with d3.js. The new edition of Scott Murray’s great book on d3.js, which is updated to version 4, contains a good example to get you started. However, I was still struggling with some practical issues, as the chart below illustrates (reload the page to see the problem develop).

A large part of the graph drifts out of the chart area, and the problem only gets worse on a mobile screen. But I figured out some sort of solution.

As Murray explains, you can vary the strength value of the force layout. Positive values attract, negative values repel. The default value is -30. You could set d3.forceManyBody().strength(-3) to create a more compact graph.

Of course, the ideal setting will depend on screen size. You could vary the strength value according to screen width. While you’re at it, you may also want to vary the radius of nodes and the stroke-width of edges. For example with something like this:

// wider screens get a stronger repulsion and thicker edges
if(w > 380){
    var strength = -3;
    var r = 3;
    var sw = 0.3;
} else {
    var strength = -1;
    var r = 3;
    var sw = 0.15;
}

Now this may make the graph more compact, but it doesn’t solve one specific problem: components not connected to the rest of the chart will still drift out of the chart area. In my example, there are four components: a large one, and three pairs of nodes that are only connected to each other and not to the rest of the graph.

The way in which I dealt with this was to create four different graphs and attach the small components to a forceCenter at the margin of the chart area. For example, d3.forceCenter().x(0.1 * w).y(0.9 * h) will put one of them in the bottom left corner. Here’s the result:

It’s still a lot of code - I can’t help feeling there should be a more efficient way to do this. Also, it’s slightly weird that the small components immediately freeze, whereas the large one takes its time to develop into its final shape. And the text labels could be improved. But at least it seems to work.
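
One way to cut down on the repetition might be to detect the components automatically instead of building four graphs by hand. Below is a minimal sketch of that idea as a preprocessing step in Python, not the code used for the chart above. It assumes nodes are numbered from 0 and edges are (source, target) pairs; the function names and the list of corner positions are hypothetical (only the 0.1 and 0.9 fractions come from the example above). Each component gets a centre expressed as fractions of w and h, which could then be passed to a d3.forceCenter() per component.

```python
def find_components(n_nodes, edges):
    """Return connected components as sorted lists of node indices, largest first."""
    neighbours = {i: set() for i in range(n_nodes)}
    for source, target in edges:
        neighbours[source].add(target)
        neighbours[target].add(source)
    seen, components = set(), []
    for start in range(n_nodes):
        if start in seen:
            continue
        # depth-first walk over everything reachable from this node
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for other in neighbours[node]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        components.append(sorted(component))
    return sorted(components, key=len, reverse=True)


# Hypothetical corner positions, as fractions of chart width and height
CORNERS = [(0.1, 0.9), (0.9, 0.9), (0.9, 0.1), (0.1, 0.1)]


def assign_centres(components):
    """Centre the largest component; park the others in the corners in turn."""
    centres = [(0.5, 0.5)]
    for i in range(len(components) - 1):
        centres.append(CORNERS[i % len(CORNERS)])
    return list(zip(components, centres))
```

With more than five components the corners would be reused, so a real chart would probably need a smarter placement rule.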

How to automate extracting tables from PDFs, using Tabula

One of my colleagues needs tables extracted from a few hundred PDFs. There’s an excellent tool called Tabula that I frequently use, but you have to process each PDF manually. However, it turns out you can also automate the process. For those like me who didn’t know, here’s how it works.

Command line tool

You can download tabula-java’s jar here (I had no idea what a jar is, but apparently it’s a format to bundle Java files). You also need a recent version of Java. Note that on a Mac, Terminal may still use an old version of Java even if you have a newer version installed. The problem and how to solve it are discussed here.

For this example, create a project folder and store the jar in a subfolder script. Store the PDFs you want to process in a subfolder data/pdf and create an empty subfolder data/csv.

On a Mac, open Terminal, use cd to navigate to your project folder and run the following code (make sure the version number of the tabula jar is correct):

for i in data/pdf/*.pdf; do java -jar script/tabula-0.9.2-jar-with-dependencies.jar -n -p all -a 29.75,43.509,819.613,464.472 -o "${i//pdf/csv}" "$i"; done

On Windows, open the command prompt, use cd to navigate to your project folder and run the following code (again, make sure the version number of the tabula jar is correct):

for %i in (data/pdf/*.pdf) do java -jar script/tabula-0.9.2-jar-with-dependencies.jar -n -p all -a 29.75,43.509,819.613,464.472 -o data/csv/%~ni.csv data/pdf/%i

The settings you can use are described here. The examples above use the following settings:

  • -n: stands for nospreadsheet; use this if the tables in the PDF don’t have gridlines.
  • -p all: look for tables in all pages of the document. Alternatively, you can specify specific pages.
  • -a (area): the portion of the page to analyse; default is the entire page. You can choose to omit this setting, which may be a good idea when the location or size of tables varies. On the other hand, I’ve had a file where tables from one specific page were not extracted unless I set the area variable. The area is defined by coordinates that you can obtain by analysing one PDF manually with the Tabula app and exporting the result not as csv, but as script.
  • -o: the name of the file to write the csv to.

In my experience, you may need to tinker a bit with the settings to get the results right. Even so, Tabula will sometimes get the rows right but incorrectly or inconsistently identify cells within a row. You may be able to solve this using regex.
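
As a rough illustration of the regex approach, here is a sketch in Python for one hypothetical case: Tabula has merged a text label and a trailing number into a single cell. The pattern, the function name and the sample value are assumptions made for the example; the right pattern will depend on what your own tables look like.

```python
import re

# Hypothetical pattern: any text, whitespace, then a number such as 1,234.50
CELL_PATTERN = re.compile(r'^(?P<label>.*?)\s+(?P<amount>-?\d[\d.,]*)$')


def split_merged_cell(cell):
    """Split a merged cell into (label, amount); amount is '' if no number is found."""
    text = cell.strip()
    match = CELL_PATTERN.match(text)
    if match is None:
        return text, ''
    return match.group('label'), match.group('amount')


print(split_merged_cell('Acme Holding 1,234.50'))
```

Cells that don’t match are returned unchanged, so it’s easy to check afterwards which rows still need attention.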

Python (and R)

There’s a Python wrapper, tabula-py, that will turn PDF tables into Pandas dataframes. As with tabula-java, you need a recent version of Java. Here’s an example of how you can use tabula-py:

import tabula
import os
import pandas as pd
folder = 'data/pdf/'
paths = [folder + fn for fn in os.listdir(folder) if fn.endswith('.pdf')]
for path in paths:
    df = tabula.read_pdf(path, encoding = 'latin1', pages = 'all', area = [29.75,43.509,819.613,464.472], nospreadsheet = True)
    path = path.replace('pdf', 'csv')
    df.to_csv(path, index = False)

Using the Python wrapper, I needed to specify the encoding. I ran into a problem when I tried to extract tables with varying sizes from multi-page PDFs. I think it’s the same problem as reported here. From the response, I gather the problem may be addressed in future versions of tabula-py.

For those who use R, there’s also an R wrapper for tabula, tabulizer. I haven’t tried it myself.

Call tabula-java from Python

[Update 2 May 2017] - I realised there’s another way, which is to call tabula-java from Python. Here’s an example:

import os
pdf_folder = 'data/pdf'
csv_folder = 'data/csv'
base_command = 'java -jar tabula-0.9.2-jar-with-dependencies.jar -n -p all -f TSV -o {} {}'
for filename in os.listdir(pdf_folder):
    pdf_path = os.path.join(pdf_folder, filename)
    csv_path = os.path.join(csv_folder, filename.replace('.pdf', '.csv'))
    command = base_command.format(csv_path, pdf_path)
    # run tabula-java for this file
    os.system(command)

This solves tabula-py’s problem with multi-page PDFs containing tables with varying sizes.
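
A variant of the loop above, sketched with subprocess.run from the standard library. Passing the command as a list means file names with spaces need no quoting, and the return code shows which PDFs failed. The function names are hypothetical; the jar name and the tabula flags are the ones used above, and the sketch assumes the jar sits in the folder you run the script from.

```python
import os
import subprocess

# Same jar as above; adjust the version number to the one you downloaded
JAR = 'tabula-0.9.2-jar-with-dependencies.jar'


def build_command(pdf_path, csv_path, jar=JAR):
    """Return the tabula-java call as a list of arguments."""
    return ['java', '-jar', jar, '-n', '-p', 'all', '-f', 'TSV', '-o', csv_path, pdf_path]


def convert_folder(pdf_folder='data/pdf', csv_folder='data/csv'):
    """Run tabula-java on every PDF in pdf_folder; return the names that failed."""
    failed = []
    for filename in sorted(os.listdir(pdf_folder)):
        if not filename.lower().endswith('.pdf'):
            continue  # skip anything that is not a PDF, e.g. hidden system files
        pdf_path = os.path.join(pdf_folder, filename)
        csv_path = os.path.join(csv_folder, os.path.splitext(filename)[0] + '.csv')
        result = subprocess.run(build_command(pdf_path, csv_path))
        if result.returncode != 0:
            failed.append(filename)
    return failed
```

Calling convert_folder() from the project folder would process everything in data/pdf and return a list of the files Java could not handle.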

Embedding D3.js charts in a responsive website

UPDATE - better approach here.

For a number of reasons, I like to use D3.js for my charts. However, I’ve been struggling for a while to get them to behave properly on my blog which has a responsive theme. I’ve tried quite a few solutions from Stack Overflow and elsewhere but none seemed to work.

I want to embed the chart using an iframe. The width of the iframe should adapt to the column width and the height to the width of the iframe, maintaining the aspect ratio of the chart. The chart itself should fill up the iframe. Preferably, when people rotate their phone, the size of the iframe and its contents should update without the need to reload the entire page.

Styling the iframe

Smashing Magazine has described a solution for embedding videos. You enclose the iframe in a div and use css to add a padding of, say, 40% to that div (the percentage depending on the aspect ratio you want). You can then set both width and height of the iframe itself to 100%. Here’s an adapted version of the code:

.container_chart_1 {
    position: relative;
    padding-bottom: 40%;
    height: 0;
    overflow: hidden;
}
.container_chart_1 iframe {
    position: absolute;
    left: 0;
    width: 100%;
    height: 100%;
}

<div class ='container_chart_1'>
<iframe src='' frameborder='0' scrolling = 'no' id = 'iframe_chart_1'></iframe>
</div>

Making the chart adapt to the iframe size

The next question is how to make the D3 chart adapt to the dimensions of the iframe. Here’s what I thought might work but didn’t: in the chart, obtain the dimensions of the iframe using window.innerWidth and window.innerHeight (minus 16px - something to do with scrollbars apparently?) and use those to define the size of your chart.

Using innerWidth and innerHeight seemed to work - until I tested it on my iPhone. Upon loading a page it starts out OK, but then the update function increases the size of the chart until only a small detail is visible in the iframe (rotate your phone to replicate this). Apparently, iOS returns not the dimensions of the iframe but something else when innerWidth and innerHeight are used. I didn’t have that problem when I tested on an Android phone.

Adapt to the iframe size: Alternative solution

Here’s an alternative approach for making the D3 chart adapt to the dimensions of the iframe. Set width to the width of the div that the chart is appended to (or to the width of the body) and set height to width * aspect ratio. Here’s the relevant code:

var aspect_ratio = 0.4;
var frame_width = $('#chart_2').width();
var frame_height = aspect_ratio * frame_width;

The disadvantage of this approach is that you’ll have to set the aspect ratio in two places: both in the css for the div containing the iframe and in the html-page that is loaded in the iframe. So if you decide to change the aspect ratio, you’ll have to change it in both places. Other than that, it appears to work.

Reloading the chart upon window resize

Then write a function that reloads the iframe content upon window resize, so as to adapt the size of the chart when people rotate their phone. Note that on mobile devices, scrolling may trigger the window resize. You don’t want to reload the contents of the iframe each time someone scrolls the page. To prevent this, you may add a check whether the window width has changed (a trick I picked up here). Also note that with Drupal, you need to use jQuery instead of $.

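// remember the current width so that scroll-triggered resizes can be ignored
var width = jQuery(window).width();
jQuery(window).resize(function(){
    if(jQuery(window).width() != width){
        // re-assigning src reloads the iframe content
        document.getElementById('iframe_chart_1').src = document.getElementById('iframe_chart_1').src;
        width = jQuery(window).width();
    }
});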

In case you know a better way - do let me know!

FYI, here’s the chart used as illustration in its original context.

How to export cycling junctions to your Garmin

Cycling junctions (fietsknooppunten) are handy. You check beforehand which junctions you want to pass, write the numbers on a slip of paper and stick it to the top tube of your frame with transparent tape. But sometimes signs are missing and you lose your way. In the worst case, you get hopelessly stuck in a dreary suburb of Almere.

The solution is simple: export the route to your Garmin (it does need to have navigation). It’s explained clearly here. The author switches off the maps on the Garmin. I don’t, which makes it even simpler. Here are the steps:

  • Go to the route planner of the Fietsersbond.
  • Click the «LF en knooppunten» button and zoom in until the junctions become visible.
  • Click the junction where you want to start and click «Van» in the popup.
  • Click each junction you want to pass and click «Via» in the popup.
  • Click the junction where you want to finish and click «Naar» in the popup.
  • Click the green «Plan route» button.
  • In the left-hand bar, click «GPS» and, on the next screen, «GPX bestand».
  • The file is saved on your computer. Connect the Garmin to your computer via USB, drop the file into the «Garmin/NewFiles» folder and disconnect the Garmin again.

Your Garmin automatically turns it into a course. This is on a Mac; it may work differently on Windows. And for the geeks: here I read how you can show your GPX file on a Leaflet map.

Step by step: creating an R package

With the help of posts by Hillary Parker and trestletech I managed to create my first R package in RStudio (here’s why). It wasn’t as difficult as I thought and it seems to work. Below is a basic step-by-step description of how I did it (this assumes you have one or more R functions to include in your package, preferably in separate R-script files):

If you want, you can upload the package to Github. Other people will then be able to install it: