“My dear open data, I love you, because you are open. And I want to open you up even more. So that everyone can use you. And reuse you. Forever.” Ronald Pijnenburg (34) wrote these words during a science communication workshop, in a love letter to his field: open data in the earth sciences. His goal is to make scientific data as easy as possible for everyone to reuse.
Pijnenburg studied earth sciences in Utrecht. “That seemed like a nice combination of science subjects and the romance of the earth. Chemistry and physics can sometimes be a bit dry, but volcanoes and earthquakes are a lot more spectacular.” He was particularly interested in the processes deep inside the earth.
“My PhD research was on the Groningen earthquakes. What exactly is happening there? There was a great deal of interest in that after the 2012 earthquake in Huizinge.” That earthquake had a magnitude of 3.6, the strongest ever measured in Groningen. It led to a major investigation by the gas producer NAM. “My project was part of that.”
The natural gas in Groningen is trapped in a layer of sandstone at a depth of three kilometres. The sandstone consists of sand grains stuck together, with pore space between them, and the natural gas sits in those pores. “When you pump that gas up, the sandstone compacts a little. My research question was how this sandstone compacts.”
Pijnenburg simulated the conditions at that depth, such as high pressure and temperature, in a lab. He looked at what happens when gas is extracted from sandstone. “Part of the compaction of the sandstone is elastic, just like a stiff spring, and part is permanent, for example because sand grains shift relative to each other or break, or because clay between the grains is compressed.” The elastic compaction occurs immediately upon gas extraction, but the permanent compaction may continue for some time, depending on exactly what is happening. “We’re not sure whether that is the case. If it is, the effect becomes smaller and smaller over time.”
Gas extraction in Groningen has been taking place since 1963. “Unfortunately, you cannot conduct experiments that last decades, so any long-term effects must be modelled. Other researchers are working on that.”
Worked in Mozambique, Kazakhstan and Australia
Pijnenburg spent his entire studies and PhD on the Utrecht campus. “In between, I took a few years off to work in mining in Mozambique, Kazakhstan and Australia.” He is now back in exactly the same place, where he leads a project to make geoscientific data more widely available. “I looked in detail at one geological process for four years, and then I wanted to zoom out again.”
“Scientists, including geologists, produce a lot of data. Much of it is publicly available, both to make our own research transparent and to make future investigations more efficient. But the problem is that people can’t find it, because it is scattered all over the place. In the Netherlands alone, you have to search for geoscientific data on six different websites. Abroad, there are hundreds.”
Pijnenburg explains that when a dataset is published, it comes with a description of what it is about: the so-called metadata. Think of the researcher, the location or the research method. “A lot can go wrong there. If a scientist has tagged the data with the term ‘illite’, a type of clay, someone searching for ‘clay’ won’t find it. It would be useful if they could, because otherwise you can miss data. On top of that, there are regular typos.”
One big data catalog
Pijnenburg tackles both problems, scattered data and incomplete metadata, by contributing to one large European catalog for earth science data. The data itself stays in its original location, but it becomes much easier to find thanks to the streamlined metadata. “It actually works like Funda, the Dutch property website: all the data can be found on one website, and you can filter on what you are looking for, for example volcanology as a subject or electron microscopy as a method. The data from my own research is of course included too.” The platform went online last week.
Setting up this catalog was easier said than done, says Pijnenburg. “There are so many scientists from so many disciplines involved, and they all deliver their data differently. Satellite images are easy to reuse, but lab data is a bigger challenge: it is often so specialised that even other scientists find it difficult to reuse. Our ultimate aim is to offer the data as a kind of Binas, the well-known reference book for science students.”
“It is good that scientists often make their data public, but there is still a step to be taken. After all, if no one can find the data, it is of no use to anyone. A data catalog like this offers a solution.”