Metric Learning for High-Throughput Combinatorial Data Sets

2019-10-31T13:03:24Z (GMT) by Kiran Vaddi, Olga Wodo
Materials design and discovery through the high-throughput exploration of materials space has been recognized as a new paradigm in materials science. However, typical high-throughput exploration methods produce high-dimensional, highly diverse data sets, which makes it challenging to extract the key features and patterns that could guide the discovery process. Unraveling these patterns is nontrivial because the underlying physical phenomena are often uncertain and the latent variables governing performance are largely unknown. In this paper, we discuss challenges related to designing a data analytics tool for clustering high-throughput measurements performed on a compositional library of materials. The critical aspects of our methodology are (i) learning the similarity measure rather than using a fixed one (e.g., Euclidean distance, dynamic time warping) and (ii) imposing similarity in the composition space. Our methodology is based on a multitask learning approach formulated to account for the composition neighborhoods that are specific to compositional libraries. We demonstrate the advantages of our methodology on a library of cyclic voltammetry curves generated for model multimetal catalysts, as well as on X-ray diffraction patterns from experimental studies. We also compare our approach with the current state-of-the-art methods used in similar problems. This work has important implications for designing high-throughput exploration of materials, including catalysts for electrochemical systems such as fuel cells and metal-air batteries.
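To make the contrast between a fixed and a learned similarity measure concrete, the following is a minimal sketch and not the multitask formulation of the paper. It assumes a toy library of six samples ordered along a one-dimensional composition axis, each described by a hypothetical two-feature measurement, and it learns a Mahalanobis matrix by whitening the differences between composition-adjacent samples, in the spirit of relevant component analysis. The names `curves`, `neighbor_pairs`, `learn_metric`, and `learned_distance` are illustrative assumptions.

```python
import numpy as np

def euclidean(x, y):
    """Fixed similarity measure: plain Euclidean distance."""
    return float(np.linalg.norm(x - y))

def learn_metric(curves, neighbor_pairs, reg=1e-3):
    """Learn a Mahalanobis matrix M from composition-neighbor pairs.

    Assumption: samples adjacent in composition should be similar, so
    directions that vary strongly between neighbors are down-weighted.
    M is the inverse of the regularized covariance of pair differences.
    """
    diffs = np.array([curves[i] - curves[j] for i, j in neighbor_pairs])
    cov = diffs.T @ diffs / len(diffs)
    cov += reg * np.eye(cov.shape[0])  # keep the matrix invertible
    return np.linalg.inv(cov)

def learned_distance(x, y, M):
    """Learned similarity measure: sqrt((x - y)^T M (x - y))."""
    d = x - y
    return float(np.sqrt(max(d @ M @ d, 0.0)))

# Hypothetical toy measurements (rows = samples ordered by composition).
# Feature 0 fluctuates between neighbors; feature 1 shifts once, between
# samples 2 and 3.
curves = np.array([
    [0.0, 0.0],
    [1.0, 0.1],
    [-1.0, 0.0],
    [0.5, 1.0],
    [-0.5, 1.1],
    [1.0, 1.0],
])

# Hypothetical composition neighborhood: each sample and its successor.
neighbor_pairs = [(i, i + 1) for i in range(len(curves) - 1)]

M = learn_metric(curves, neighbor_pairs)

# Compare how far sample 0 is from sample 2 and from sample 5.
for name, dist in [("euclidean", euclidean),
                   ("learned", lambda a, b: learned_distance(a, b, M))]:
    print(name, dist(curves[0], curves[2]), dist(curves[0], curves[5]))
```

In this toy setting the learned matrix puts more weight on the feature that is stable across composition neighbors, so samples on opposite sides of the shift end up relatively farther apart than under the Euclidean distance. A pairwise distance matrix built from `learned_distance` could then be passed to any distance-based clustering routine.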