Alexei Vella
Texas Rep. Lamar Smith, who heads the U.S. House Committee on Science, Space and Technology, is on the hunt for wasteful use of government funds.
As part of the search, the Republican demanded that the National Science Foundation show paperwork on selected research grants. One project that caught his eye is the Paleobiology Database, which is currently maintained by UW-Madison’s Department of Geoscience.
According to Ars Technica, a technology news website, “Most of the grants that were picked out seemed to be judged primarily on whether their titles sounded silly to people unfamiliar with the field.”
But when the congressional committee targeted the Paleobiology Database, it evidently overlooked one of the database’s heavy users.
“The energy industry uses it because fossils come from sedimentary rock, and that’s where fossil fuels are found,” says Shanan Peters, Dean L. Morgridge Professor of Geoscience, who is steward of the database. “Dean Morgridge got his master’s degree from our Geosciences Department in 1954. He then led the team that discovered the Prudhoe Bay oil deposits in Alaska, which is one of the largest oil fields in North America.”
The database is a computerized record compiling field and museum research on fossils uncovered by paleontologists from all over the world. “It’s a crown jewel of the discipline and has contributed to scientific research and the energy industry,” says Peters.
The database has had several physical homes over the past 15 years. During that time, around 300 volunteer paleobiologists have read more than 40,000 documents and entered the data by hand.
“Paleontologists spend their careers studying specific fossil groups, documenting where and when they occur and describing new species,” Peters adds. “But if you want to answer the question of how many species were alive on the planet 10 million years ago, and how biodiversity on the planet has changed in response to climate perturbations and asteroid impacts and so on, then we need to have the work of all these individuals aggregated into one place so we can ask questions about the large-scale history of life.”
But there are hundreds of thousands of publications on fossils out there. Humans can’t read and enter them fast enough. Peters has found a way to speed up the process. That’s one of the reasons the fossil database has found a home at UW-Madison.
Peters has been working with Miron Livny and other scientists at the campus Center for High Throughput Computing, who are racing to perform increasingly sophisticated data analysis. They use a system called DeepDive to extract data, including facts buried in the text, tables and figures of scientific journals. Together they are building an infrastructure that will support machine reading across many sciences.
According to their recent article in the Public Library of Science, DeepDive is often more accurate than humans, and it is much, much faster. Says Peters: “It is a natural to use the Paleobiology Database as a test case for pitting the DeepDive machine against human experts.”