Fresh off the arXiv: Data mining for better materials synthesis
There remains an element of alchemical guesswork in developing new materials with useful functional properties. I remember a favorite anecdote of my graduate advisor, Laura Greene. Starting in the 1950s, Bernd Theodor Matthias developed a list of rules for successfully designing new superconductors. These “Matthias’ rules” were as follows:
- high symmetry is good, cubic symmetry is the best
- high density of electronic states is good
- stay away from oxygen
- stay away from magnetism
- stay away from insulators
- stay away from theorists
The discovery of high-temperature superconductors in 1986 threw all of these rules (except perhaps the last?) right out the window. Still today, serendipity plays an integral role in the discovery of new superconductors. The iron-based superconductors, the most recently discovered family, were born out of Hideo Hosono’s group somewhat stumbling upon 3 K superconductivity in LaOFeP, followed by clever doping down the pnictogen line (arsenic for phosphorus) and other element-swapping tricks. La(O,F)FeAs, with a more interesting transition temperature of 26 K, followed, and the avalanche of discovery was let loose from there.
Hosono’s lab was working to impart semiconductors with magnetic properties by synthesizing and characterizing a wide variety of compounds. LaOFeAs fell short in that application, but Hosono claims he had a hunch that his pnictides would be superconducting, an instinct built from years of research.
“By having researched such materials for many years, I feel that I was able to cultivate an eye for looking at materials,” Hosono said.
How do we codify this type of instinct? How could we automatically, but intelligently, canvass the vast array of potential compounds and not only find those with interesting properties, but actually select for a particular functionality? Since 2009, Hosono has led what I’d call industrial-scale materials discovery. By 2013, he and collaborators had synthesized about 1000 candidate materials, of which about 25 were superconducting. That is a 2.5% success rate from some of the best and most experienced materials minds on earth.
Can machines do better? Can we do better, with the right tools?
So now you can imagine why this paper caught my eye on the arXiv this week: “Data Mining for better material synthesis: the case of pulsed laser deposition of complex oxides”. Collaborators at Oak Ridge National Laboratory, the University of Tennessee, and Xi’an Jiaotong University in China have developed a software tool to guide materials design.
The group focused on growing oxide films by pulsed laser deposition (PLD). Their software mines the literature to extract synthesis parameters and functional properties of previously studied materials and employs crowdsourcing to improve the accuracy of that data. This information is organized in a searchable repository open to additions and mining by the community. An example of the data visualization capabilities is shown below.
They’ve also developed simple machine-learning tools to analyze this wealth of data and extract information that helps relate growth parameters to functional properties. Their data and code are available on GitHub.
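To give a feel for what "relating growth parameters to functional properties" can mean at its very simplest, here is a minimal sketch of my own. It is not the authors' code and does not use their data. The records, the parameter names (`temperature_C`, `pressure_mTorr`), and the `rank_parameters` helper are all hypothetical, and I've swapped in a plain Pearson correlation for whatever machine-learning methods the paper actually uses.

```python
from math import sqrt

# Hypothetical toy records, NOT taken from the paper or its repository.
# Each row: (substrate temperature in C, oxygen pressure in mTorr,
#            a made-up functional property value)
RECORDS = [
    (600.0, 10.0, 1.0),
    (650.0, 50.0, 2.0),
    (700.0, 100.0, 3.0),
    (750.0, 200.0, 4.0),
]
NAMES = ["temperature_C", "pressure_mTorr"]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rank_parameters(records, names):
    """Rank growth parameters by |r| against the property (last column)."""
    prop = [row[-1] for row in records]
    scores = {
        name: pearson([row[i] for row in records], prop)
        for i, name in enumerate(names)
    }
    # Strongest linear relationship first
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


if __name__ == "__main__":
    for name, r in rank_parameters(RECORDS, NAMES):
        print(f"{name}: r = {r:.3f}")
```

A correlation like this only captures linear, one-parameter-at-a-time trends, so think of it as a first look at a table of growth conditions rather than a model of film growth.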
The authors hope to fill a gap in recent efforts towards materials design. They point out the limitations of purely theoretical approaches, like the materials genome, which doesn’t address what materials can actually be fabricated and under what conditions. They mention new databases for sharing experimental data and methods: the Materials Data Facility, Materials Innovation Network, and the Materials Data Curation System. The software proposed here is a general tool for automatically extracting synthesis parameters and functional property information. The authors emphasize that their approach is universal and can be applied to other methods of synthesis besides PLD.
Importantly, the authors point out that recording failures in materials development, not just successes, is crucial to generating future successes. They foresee labs contributing this information, which is difficult to find in published work, from their internal logs. They acknowledge that the database currently requires considerable human effort. Automation will require applying natural language processing technology and developing new tools to analyze and extract data from figures.
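Real natural language processing is well beyond a blog post, but a toy version of the extraction step can be sketched with regular expressions. Everything here is an assumption of mine: the `PATTERNS` table, the `extract_parameters` function, the units I chose to look for, and the example sentence are all made up for illustration and are not from the paper's software.

```python
import re

# Hypothetical patterns for three common PLD growth parameters.
# Each captures a number followed by a unit; real papers phrase
# these in many more ways than a few regexes can cover.
NUMBER = r"(\d+(?:\.\d+)?)"
PATTERNS = {
    "temperature_C": re.compile(NUMBER + r"\s*(?:°|deg)?\s*C\b"),
    "pressure_mTorr": re.compile(NUMBER + r"\s*mTorr\b"),
    "fluence_J_per_cm2": re.compile(NUMBER + r"\s*J/cm(?:2|²|\^2)"),
}


def extract_parameters(text):
    """Return the first value found for each known parameter in the text."""
    found = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            found[name] = float(match.group(1))
    return found


if __name__ == "__main__":
    # Made-up sentence in the style of an experimental methods section
    sentence = (
        "Films were grown at 700 °C in 100 mTorr of oxygen "
        "with a laser fluence of 1.5 J/cm2."
    )
    print(extract_parameters(sentence))
```

Even this toy shows why the authors lean on crowdsourcing for now: a pattern only finds what it was written to expect, and anything reported in a table or a figure is invisible to it.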
I have a hunch this type of “big data” analysis for materials science will explode in the next decades. There is so much room for fascinating (and potentially lucrative) discovery.
What do you think about the role of big data and materials discovery? Where will the next revolution in materials synthesis come from? Weigh in, in the comments!