High-Accuracy Excited-State Reference Benchmark Dataset for Organic Semiconductors

The BenchmarkSet1500 resource theme provides a dataset of multireference excited-state calculations for 1,500 small organic semiconductors, together with the Python-based workflow used to generate them. It is designed for researchers in organic electronics and data-driven chemistry who need reliable, reproducible excited-state data, and for those developing machine learning models or screening pipelines. By combining standardised computational workflows with multi-level electronic structure methods (TD-DFT, CASSCF, NEVPT2), the resource theme enables reproducible data generation and delivers an AI-ready dataset suitable for structure-property analysis, direct comparison of quantum chemistry methods, and molecular design.

Data Sources

BenchmarkSet1500: High-Accuracy Excited-State Reference Benchmark Dataset for Organic Semiconductors

BenchmarkSet1500 is an open-access multireference excited-state database established to provide the first dedicated high-accuracy benchmark set for organic semiconductor research. The repository comprises 1,500 small organic molecules with consistently computed vertical excited-state properties, obtained using state-averaged complete active space self-consistent field (SA-CASSCF) and strongly contracted N-electron valence state second-order perturbation theory (SC-NEVPT2), alongside the full reproducible workflow code used to generate the dataset. The dataset focuses on systems where single-reference approaches (e.g. TD-DFT) are known to fail, including molecules exhibiting strong static correlation and inverted singlet-triplet gaps. BenchmarkSet1500 is designed to support rigorous method benchmarking, systematic assessment of theory-level performance, development of predictive models, and screening for technologically relevant organic semiconductors. The dataset is available in two forms: (1) a data collection with one entry per molecule, each containing curated metadata, optimised geometries (at the B3LYP/6-31G* level of theory), complete electronic-structure output files, and computed excited-state energies and oscillator strengths for low-lying singlet and triplet states; and (2) a consolidated machine-learning-ready CSV file that aggregates all molecules with their structural descriptors and excited-state properties for immediate integration into data-driven workflows.
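
As an illustration of how the consolidated CSV might be used, the following minimal Python sketch computes singlet-triplet gaps and flags molecules with an inverted gap (S1 below T1). The column names (`molecule_id`, `s1_ev`, `t1_ev`) and the sample rows are hypothetical placeholders chosen for this example; they are not taken from the actual file, so adjust them to match the published CSV header.

```python
import csv
import io

# Hypothetical CSV layout: column names and values are illustrative
# placeholders, NOT taken from the actual BenchmarkSet1500 file.
SAMPLE_CSV = """molecule_id,s1_ev,t1_ev
mol_0001,4.90,3.95
mol_0002,1.05,1.20
mol_0003,4.60,4.10
"""


def singlet_triplet_gaps(csv_text, s1_col="s1_ev", t1_col="t1_ev"):
    """Return {molecule_id: E(S1) - E(T1)} in eV for every row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["molecule_id"]: float(row[s1_col]) - float(row[t1_col])
        for row in reader
    }


def inverted_gap_ids(gaps):
    """Return the sorted IDs of molecules whose S1 lies below T1."""
    return sorted(mol for mol, gap in gaps.items() if gap < 0.0)


gaps = singlet_triplet_gaps(SAMPLE_CSV)
inverted = inverted_gap_ids(gaps)
print(inverted)
```

To run this against a downloaded file, replace `SAMPLE_CSV` with the file contents (for example, read via `open(path, newline="").read()`) and pass the real column names through `s1_col` and `t1_col`.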