Universal Hyper Active Learning: A Data Pipeline to Accelerate Materials Discovery

Building accurate machine learning models of atomic interactions requires carefully curated training datasets, but generating these datasets is often the hardest and most time-consuming part of the process. ase-uhal is a Python tool that automates and accelerates this data generation step by steering atomistic simulations towards the configurations the model finds most informative, avoiding redundant calculations. Its 'universal' extension to the hyperactive learning (HAL) approach makes it compatible with the new generation of foundation models that can be fine-tuned for specific applications, and introduces a batched workflow that significantly improves throughput over existing methods. ase-uhal is available via 'pip install ase-uhal' and integrates with the widely used ASE ecosystem. The approach is demonstrated on an InGaP alloy system, showing that models fitted to diverse training data outperform those fitted to randomly sampled data.
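To make the idea of steering a simulation towards informative configurations concrete, the following is a minimal sketch and not the ase-uhal API. It assumes that model uncertainty is estimated as the disagreement (standard deviation) across a small committee of energy predictions, and that the simulation is biased by lowering the mean energy by that disagreement scaled by a strength parameter. The function names, the parameter tau, and the toy numbers are all hypothetical and chosen only for illustration.

```python
import numpy as np


def committee_uncertainty(predictions):
    """Standard deviation across committee members for each configuration.

    predictions: array of shape (n_members, n_configs) holding energies.
    """
    return np.std(predictions, axis=0)


def biased_energy(predictions, tau):
    """Committee mean energy lowered by tau times the committee disagreement.

    Assumption: a larger tau pushes the simulation harder towards
    configurations on which the committee members disagree.
    """
    mean = np.mean(predictions, axis=0)
    return mean - tau * committee_uncertainty(predictions)


def select_most_informative(predictions, n_select):
    """Indices of the n_select configurations with the largest disagreement."""
    sigma = committee_uncertainty(predictions)
    return np.argsort(sigma)[::-1][:n_select]


# Toy energies (hypothetical values) from a 3-member committee
# for 4 candidate configurations.
preds = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [1.0, 2.5, 3.0, 3.0],
    [1.0, 1.5, 3.0, 5.0],
])

# Configurations 3 and 1 show the largest disagreement, so they are
# the ones that would be sent for reference calculations.
picked = select_most_informative(preds, 2)
```

In this toy setting the committee agrees exactly on configurations 0 and 2, so labelling them would add little new information. Selecting by disagreement avoids those redundant calculations, whereas random sampling would pick them just as often as the others.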