AI-Ready Datasets
AI-ready datasets give physical sciences researchers data they can trust and reuse. By standardising formats, capturing essential metadata, and making experimental and simulation outputs findable and interoperable, they remove the hidden labour of data cleaning and preparation. This lets researchers focus on analysis and discovery, and it enables AI tools to work reliably across instruments, facilities, and disciplines.

This resource theme brings together datasets designed to support AI and machine learning workflows. They range from general-purpose collections that researchers can shape for their own models to task-specific datasets that already include annotations and predefined training, validation, and test splits. Every record includes one or more data files accompanied by a metadata description in the Croissant format (https://mlcommons.org/working-groups/data/croissant/), so that structure, provenance, and context are captured in a machine-readable way.
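As a rough illustration of what "machine readable" means in practice, the sketch below reads a Croissant-style JSON-LD description with Python's standard library and lists the files and record fields it declares. The embedded metadata is a hypothetical, heavily trimmed example (the dataset name, file name, and field names are invented, and a real Croissant file carries an "@context" block and further properties), so treat it as an assumption-laden sketch rather than a description of any record in this theme.

```python
import json

# Hypothetical, trimmed Croissant-style metadata, for illustration only.
# Real descriptions include an "@context" block and more properties.
EXAMPLE_METADATA = """
{
  "@type": "sc:Dataset",
  "name": "example-diffraction-scans",
  "distribution": [
    {
      "@type": "cr:FileObject",
      "@id": "scans.csv",
      "name": "scans.csv",
      "contentUrl": "scans.csv",
      "encodingFormat": "text/csv"
    }
  ],
  "recordSet": [
    {
      "@type": "cr:RecordSet",
      "name": "scans",
      "field": [
        {"@type": "cr:Field", "name": "angle", "dataType": "sc:Float"},
        {"@type": "cr:Field", "name": "intensity", "dataType": "sc:Float"}
      ]
    }
  ]
}
"""


def summarise(metadata: dict) -> dict:
    """Collect the dataset name, its files, and the fields of each record set."""
    files = [
        (item.get("name"), item.get("encodingFormat"))
        for item in metadata.get("distribution", [])
    ]
    record_sets = {
        record_set.get("name"): [f.get("name") for f in record_set.get("field", [])]
        for record_set in metadata.get("recordSet", [])
    }
    return {"name": metadata.get("name"), "files": files, "record_sets": record_sets}


summary = summarise(json.loads(EXAMPLE_METADATA))
print(summary["name"])
for file_name, encoding in summary["files"]:
    print("file:", file_name, encoding)
for set_name, fields in summary["record_sets"].items():
    print("record set:", set_name, fields)
```

For real records, the mlcroissant Python library from MLCommons is likely a better starting point than hand-parsing the JSON, since it is designed to validate the description and load records from it.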

