Weather data

16 February 2022

Hourly weather data is available from observations, but also from models that estimate local weather conditions - a bit like weather forecasts for the past. In this article, I compare estimates from Oikolab and ERA5 to observations from the Royal Netherlands Meteorological Institute (KNMI), for three locations in the Netherlands.

Weather data

I’ve used hourly weather data from KNMI to analyse how cycling speed varies with weather conditions. Of course, KNMI data is only available for the Netherlands.

An alternative source of weather data is Oikolab. Oikolab's data is based on ERA5, a European project that uses data from a variety of sources, including weather stations, weather balloons, aircraft, ships and satellites, as input for models that estimate weather conditions for specific locations and moments in time. There are two versions available: ERA5 (which Oikolab uses) and ERA5 Land (which is more detailed).

Oikolab is a paid service. They currently offer a pay-as-you-go plan which will let you download 1,500 units per month for free, with one unit corresponding to one month of data for one variable at one location. The main reason why you might want to use Oikolab rather than the original ERA5 data is that downloading and processing is faster and easier.

I’m not knowledgeable about weather data, and ERA5 simulations are a black box to me. To better understand the data, I compared Oikolab and ERA5 estimates to KNMI observations. I downloaded hourly data for the year 2020, for the locations of three KNMI weather stations: Schiphol Airport, De Bilt, and Maastricht.


Temperature

The table below compares temperature estimates from Oikolab, ERA5 and ERA5 Land with observations from KNMI (degrees Celsius; KNMI measures at a height of 1.5 metres, whereas the ERA5 data used here refers to a height of 2 metres).

location    measure  oikolab   era5  era5_land
schiphol    corr        0.99   0.99       0.98
            bias       -0.15  -0.15      -0.04
            MAE         0.72   0.72       0.87
maastricht  corr        0.99   0.99       0.99
            bias       -0.10  -0.10      -0.04
            MAE         0.56   0.56       0.80
de_bilt     corr        0.99   0.99       0.98
            bias       -0.05  -0.05      -0.00
            MAE         0.56   0.56       0.87

The table shows the correlation coefficient (corr); the bias (a positive bias implies that estimates tend to be higher than KNMI observations); and the mean absolute error (MAE), that is, the mean absolute difference between estimates and KNMI observations.

There's a very strong correlation between estimates and observations: 0.98 or higher at all three locations.
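The three measures described above can be sketched in a few lines of pandas. This is a minimal illustration rather than the code used for the tables: the function name `compare` and the sample values are made up, and bias is assumed to be the mean of estimate minus observation.

```python
import pandas as pd


def compare(estimates: pd.Series, observations: pd.Series) -> dict:
    """Return correlation, bias and MAE of estimates against observations."""
    # Align on the shared index and drop hours where either value is missing
    both = pd.concat([estimates, observations], axis=1, keys=["est", "obs"]).dropna()
    diff = both["est"] - both["obs"]
    return {
        "corr": both["est"].corr(both["obs"]),  # Pearson correlation
        "bias": diff.mean(),                    # positive: estimates run high
        "MAE": diff.abs().mean(),               # mean absolute difference
    }


# Made-up hourly temperatures (degrees Celsius), for illustration only
hours = pd.date_range("2020-01-01", periods=4, freq="h")
knmi = pd.Series([3.0, 4.0, 5.0, 6.0], index=hours)
era5 = pd.Series([2.5, 4.0, 4.5, 6.5], index=hours)
result = compare(era5, knmi)
```

With these sample values the bias is -0.125 and the MAE is 0.375: the negative and positive differences partly cancel in the bias, but not in the MAE.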

Wind speed

The table below compares wind speed estimates from Oikolab, ERA5 and ERA5 Land with observations from KNMI (m/s at a height of 10m).

location    measure  oikolab   era5  era5_land
schiphol    corr        0.91   0.91       0.92
            bias        0.03   0.03      -0.19
            MAE         0.96   0.96       0.96
maastricht  corr        0.88   0.88       0.87
            bias       -0.19  -0.19      -0.68
            MAE         0.87   0.87       1.02
de_bilt     corr        0.89   0.89       0.89
            bias        0.70   0.70       0.26
            MAE         0.99   0.99       0.78

For wind speed, the correlation between estimates and observations is somewhat weaker (0.87 to 0.92), but still strong.

As the chart below illustrates, there’s an interesting pattern for average wind speed values by hour of the day.

Two things stand out. First, KNMI observations have a more pronounced day-night pattern than ERA5, ERA5 Land and Oikolab estimates (ERA5 is not shown, but is very similar to Oikolab). Second, there's a drop in ERA5 estimated wind speed at 10am UTC (but not in ERA5 Land estimates). Interestingly, both phenomena were also found in a study that compared ERA5 estimates to observational data in Sweden, at least for coastal and inland locations (see figures 5a and b; the drop occurs at 12 CEST, which corresponds to 10am UTC).

As for the drop in wind speed, the authors of the Swedish study indicate that the cause of this ‘undesirable feature’ is still unknown, although they suggest it may be caused by an imperfection of the weather model. The phenomenon is also discussed here, and a related phenomenon (temperature) here.
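For readers who want to reproduce the hour-of-day averages, here is a minimal sketch. It assumes an hourly pandas Series with a datetime index; the function name `mean_by_hour` and the sample data are illustrative and not taken from the original analysis.

```python
import pandas as pd


def mean_by_hour(wind_speed: pd.Series) -> pd.Series:
    """Average an hourly series by the hour of the day (0-23) of its index."""
    return wind_speed.groupby(wind_speed.index.hour).mean()


# Two made-up days of hourly wind speed (m/s), with timestamps in UTC
index = pd.date_range("2020-01-01", periods=48, freq="h", tz="UTC")
speed = pd.Series([4.0] * 24 + [6.0] * 24, index=index)
profile = mean_by_hour(speed)
```

When comparing sources this way, it's worth checking that all series use the same time zone and the same convention for labelling the hour, otherwise the curves will be shifted relative to each other.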

Wind direction

The chart below compares wind directions provided by Oikolab to those observed by KNMI (D3.js code for the chart here).

At all three locations, the dominant wind directions are between south and west. In ERA5 estimates, this pattern appears to be more pronounced and there appears to be a bit of a shift from south to west. It doesn’t seem to matter whether you use Oikolab, ERA5 or ERA5 Land: they all deviate somewhat from KNMI observations, in similar ways.
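One way to put numbers on such a comparison is to bin directions into compass sectors and compare the share of hours per sector. The sketch below uses eight sectors of 45 degrees; the function name `sector_shares` and the sample values are made up. It assumes directions in degrees from 0 up to 360, so any special codes a source may use for calm or variable wind would need to be removed first.

```python
import numpy as np
import pandas as pd

SECTORS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def sector_shares(direction_deg: pd.Series) -> pd.Series:
    """Share of hours per 45-degree compass sector centred on N, NE, E, ..."""
    values = direction_deg.dropna().to_numpy(dtype=float)
    # Shift by half a sector so that 337.5 to 22.5 degrees falls into "N"
    codes = np.floor(((values + 22.5) % 360) / 45).astype(int)
    counts = np.bincount(codes, minlength=len(SECTORS))
    return pd.Series(counts / counts.sum(), index=SECTORS)


# Made-up wind directions in degrees (direction the wind comes from)
observed = pd.Series([350, 10, 200, 225, 250, 270, 225, 180])
shares = sector_shares(observed)
```

Running the same function on estimates and observations gives two distributions that can be compared sector by sector.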


Precipitation

The table below compares precipitation estimates from Oikolab, ERA5 and ERA5 Land with observations from KNMI (mm).

location    measure  oikolab   era5  era5_land
schiphol    corr        0.58   0.58       0.30
            bias       -0.01  -0.01       1.06
            MAE         0.11   0.11       1.09
maastricht  corr        0.56   0.56       0.29
            bias       -0.01  -0.01       0.84
            MAE         0.09   0.09       0.87
de_bilt     corr        0.54   0.54       0.29
            bias       -0.02  -0.02       0.90
            MAE         0.10   0.10       0.94

For precipitation, the correlation between KNMI observations and estimates is weaker than for other variables: 0.54 to 0.58 for Oikolab and ERA5, and about 0.3 for ERA5 Land. This was perhaps to be expected: «Care should be taken when comparing model parameters with observations, because observations are often local to a particular point in space and time, rather than representing averages over a model grid box.» I'll return to this in the discussion below.

Air pressure

The table below compares air pressure estimates from Oikolab, ERA5 and ERA5 Land with observations from KNMI (Pa).

location    measure   oikolab      era5  era5_land
schiphol    corr         1.00      1.00       1.00
            bias        27.05     26.97      15.92
            MAE         30.07     30.06      30.16
maastricht  corr         1.00      1.00       1.00
            bias     -1271.98  -1257.50   -1362.08
            MAE       1271.98   1257.50    1362.08
de_bilt     corr         1.00      1.00       1.00
            bias       -38.65    -40.25    -134.90
            MAE         39.28     40.80     134.94

There's a very strong correlation between KNMI observations and estimates. For Maastricht, ERA5 and Oikolab report substantially lower air pressure than KNMI. This is probably because KNMI corrects for elevation. The Maastricht weather station is the most elevated in the country, at 114.3 metres above sea level.
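A rough back-of-the-envelope check suggests that elevation could indeed explain a gap of this size. The sketch below uses the isothermal barometric formula. The sea-level pressure of 101,325 Pa and the temperature of 10 °C are assumptions, the function name `pressure_at_height` is made up, and the elevation of the ERA5 grid cell may differ from that of the station.

```python
import math

G = 9.80665    # gravitational acceleration, m/s^2
M = 0.0289644  # molar mass of dry air, kg/mol
R = 8.31446    # universal gas constant, J/(mol K)


def pressure_at_height(p_sea_level: float, height_m: float, temp_k: float) -> float:
    """Isothermal barometric formula: pressure (Pa) at height_m above sea level."""
    return p_sea_level * math.exp(-G * M * height_m / (R * temp_k))


# Assumed values: standard sea-level pressure and an air temperature of 10 °C
p_sea = 101325.0
p_maastricht = pressure_at_height(p_sea, 114.3, 283.15)
difference = p_maastricht - p_sea  # negative: pressure is lower at the station
```

Under these assumptions the difference comes out at roughly -1,400 Pa, which is the same order of magnitude as the bias found for Maastricht.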


Discussion

Overall, weather data from Oikolab and ERA5 is broadly similar to KNMI observations, although there appear to be some systematic differences, such as the drop in wind speed at 10am UTC. Air pressure and temperature estimates in particular show a very strong correlation with KNMI observations. For wind speed, the correlation is somewhat weaker but still strong. For precipitation, the correlation is much weaker.

It appears that correlations are weaker for phenomena that show more local variation. For example, temperatures will normally not be very different for two nearby locations, but it may rain at one and not at the other. ERA5 (and Oikolab) estimates can be thought of as averages for an area. In the case of rain, these may well be different from what is observed at a specific location in that area. Note that this doesn’t necessarily mean that observational data is superior. For example, if you want to know if it rains at the Ronde Hoep, observations at Schiphol Airport may not necessarily give you the correct answer.

Despite its finer grid, ERA5 Land doesn't seem to correlate better with KNMI observations; for precipitation, the correlation is in fact clearly weaker.


ERA5 and ERA5 Land data is available from Copernicus (for some reason I had to file a separate request to get ERA5 total precipitation data). To get data for a specific location, enter the coordinates of that location as the West and South values; and enter slightly higher values for East and North. That way, you’ll get data for one grid point that coincides with the location you’re interested in.
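If you request the data through a script rather than the web form, the same trick can be expressed as a small helper. This is a sketch under assumptions: the function name `single_point_area` and the step of 0.01 degrees are made up, the coordinates are approximate, and the North, West, South, East order follows what the Copernicus download form and API use for the area (check the current documentation before relying on it). The step should stay smaller than the grid spacing.

```python
def single_point_area(lat: float, lon: float, step: float = 0.01) -> list:
    """Return [North, West, South, East] for a small box with (lat, lon) as its south-west corner."""
    return [round(lat + step, 4), lon, lat, round(lon + step, 4)]


# Approximate coordinates of De Bilt (an assumption; check the station metadata)
area = single_point_area(52.10, 5.18)
```

The resulting list can then be used as the area in a download request.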

The data comes in the GRIB file format. I used the xarray and cfgrib Python packages to convert the data to a pandas DataFrame. Note that you'll need to calculate wind speed and direction from the components of the wind (u and v).
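The wind calculation can be sketched as follows. The column names `u10` and `v10` and the function name `add_wind` are assumptions and may differ from what your file contains. Direction follows the meteorological convention: the direction the wind comes from, in degrees clockwise from north.

```python
import numpy as np
import pandas as pd


def add_wind(df: pd.DataFrame, u: str = "u10", v: str = "v10") -> pd.DataFrame:
    """Add wind speed (m/s) and meteorological wind direction (degrees) columns."""
    out = df.copy()
    out["wind_speed"] = np.hypot(out[u], out[v])
    # Direction the wind blows FROM, clockwise from north
    out["wind_direction"] = (180 + np.degrees(np.arctan2(out[u], out[v]))) % 360
    return out


# In practice the frame might come from something like
# xarray.open_dataset(path, engine="cfgrib").to_dataframe()
# Made-up components: westerly, southerly and north-easterly wind
frame = pd.DataFrame({"u10": [5.0, 0.0, -3.0], "v10": [0.0, 4.0, -3.0]})
wind = add_wind(frame)
```

A positive u component means the air moves towards the east, so a wind with u = 5 and v = 0 comes from the west and gets a direction of 270 degrees.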
