Progress in first-principles simulation codes and supercomputing capabilities has given rise to the so-called high-throughput (HT) ab initio approach, enabling the identification of many new compounds for a variety of applications (e.g., lithium batteries and photovoltaics). As a result, a number of databases have become available online, providing access to various properties of materials, though mainly ground-state ones. Indeed, for more complex properties (e.g., linear or higher-order responses), the HT approach remains out of reach because of the CPU time required. To overcome this limitation, machine learning approaches have recently attracted much attention in the framework of materials design.
In this talk, I will review recent progress in the emerging field of materials informatics. I will briefly introduce the OPTIMADE API [1], which was developed for searching the leading materials databases (such as AFLOW, the Materials Cloud, the Materials Project, NOMAD, and OQMD) using the same queries. I will then introduce the MODNet framework and its recent developments for predicting materials properties [2-5]. This approach, which is particularly well suited to limited datasets, relies on a feedforward neural network and the selection of physically meaningful features. Finally, I will illustrate the power of materials informatics, which combines high-throughput ab initio calculations and machine learning, through a few recent examples.
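To make the idea of "the same queries" concrete, here is a minimal sketch of how one OPTIMADE filter string could be turned into request URLs for several databases. The provider names and base URLs are hypothetical placeholders, not real endpoints, and the helper `build_structure_query` is introduced only for illustration. The `/v1/structures` endpoint, the `filter` and `page_limit` parameters, and the filter grammar follow the OPTIMADE specification as I understand it. No network request is made.

```python
from urllib.parse import urlencode

# Hypothetical provider base URLs, for illustration only. Each real
# database publishes its own OPTIMADE base URL in its documentation.
PROVIDERS = {
    "provider_a": "https://provider-a.example.org/optimade",
    "provider_b": "https://provider-b.example.org/optimade",
}

# One filter, written once: binary compounds containing both Li and O.
FILTER = 'elements HAS ALL "Li", "O" AND nelements=2'


def build_structure_query(base_url, optimade_filter, page_limit=10):
    """Return the URL of a structures query for one provider (illustrative helper)."""
    params = urlencode({"filter": optimade_filter, "page_limit": page_limit})
    return f"{base_url.rstrip('/')}/v1/structures?{params}"


# The same filter string is reused unchanged for every provider.
urls = {name: build_structure_query(base, FILTER) for name, base in PROVIDERS.items()}

for name, url in urls.items():
    print(name, url)
```

Only the base URL changes from one database to the next. The filter itself is untouched, which is the point of a common API.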
[1] C.W. Andersen et al., Sci. Data 8, 217 (2021).
[2] P.‑P. De Breuck, G. Hautier, and G.‑M. Rignanese, npj Comput. Mater. 7, 83 (2021).
[3] P.‑P. De Breuck, M.L. Evans, and G.‑M. Rignanese, J. Phys.: Condens. Matter 33, 404002 (2021).
[4] P.‑P. De Breuck, G. Heymans, and G.‑M. Rignanese, J. Mater. Inf. 2, 10 (2022).
[5] X. Liu, P.‑P. De Breuck, L. Wang, and G.‑M. Rignanese, npj Comput. Mater. 8, 233 (2022).