The itwinAI framework, developed by CERN and the Jülich Supercomputing Centre (JSC), supports the development, training, and maintenance of AI-based methods for scientific applications. It is a core module of the interTwin project, which aims to co-design and implement an interdisciplinary Digital Twin Engine. itwinAI streamlines the AI lifecycle by offering user-friendly core functionalities such as distributed training, hyperparameter optimization, and a model registry.
Distributed Training: itwinAI simplifies distributing existing training code across multiple GPUs and nodes and automates the training workflow. It builds on widely used backends, including PyTorch Distributed Data Parallel (DDP), TensorFlow distributed strategies, and Horovod.
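The idea behind data-parallel training can be sketched in a few lines of plain Python. This is a conceptual illustration only, not itwinAI or PyTorch code: the workers are simulated in a single process, and all function names and the toy dataset are assumptions made for the example. Each simulated worker computes a gradient on its own shard of the data, the gradients are averaged (the role an all-reduce plays in backends such as DDP or Horovod), and every worker applies the same update.

```python
# Conceptual sketch of data-parallel training; workers are simulated in-process.
# All names here are hypothetical and are not part of itwinAI or PyTorch.

def shard(data, world_size):
    """Split the dataset round-robin, one shard per simulated worker."""
    return [data[rank::world_size] for rank in range(world_size)]

def local_gradient(w, samples):
    """Gradient of the mean squared error for the model y = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in samples) / len(samples)

def all_reduce_mean(values):
    """Stand-in for an all-reduce: every worker ends up with the average."""
    return sum(values) / len(values)

def train(data, world_size=4, lr=0.01, steps=200):
    """Gradient descent where each step averages per-shard gradients."""
    w = 0.0
    shards = shard(data, world_size)
    for _ in range(steps):
        grads = [local_gradient(w, s) for s in shards]
        w -= lr * all_reduce_mean(grads)
    return w

# Toy dataset generated from y = 3 * x, so the fitted weight should approach 3.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
w_fit = train(data)
```

Because the shards have equal size, the averaged shard gradient equals the gradient over the full dataset, so the simulated distributed run follows the same trajectory as single-worker training.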
Hyperparameter Optimization: itwinAI's hyperparameter optimization functionality makes improving model accuracy more efficient. Researchers can explore hyperparameter spaces systematically instead of tuning parameters by hand. This functionality is built on Ray Tune.
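As a rough illustration of automated hyperparameter search, the sketch below implements plain random search in standard Python. It does not use Ray Tune or itwinAI, whose search algorithms and schedulers are more capable; the objective function, the search space, and all names are hypothetical and chosen only to make the example self-contained.

```python
import random

# Conceptual sketch of random search; not Ray Tune or itwinAI code.

def objective(config):
    """Hypothetical validation loss, minimal at lr=0.1 and momentum=0.9."""
    return (config["lr"] - 0.1) ** 2 + (config["momentum"] - 0.9) ** 2

def sample(space, rng):
    """Draw one configuration uniformly from each (low, high) range."""
    return {name: rng.uniform(low, high) for name, (low, high) in space.items()}

def random_search(space, num_samples=50, seed=0):
    """Evaluate num_samples random configurations and return the best one."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    trials = []
    for _ in range(num_samples):
        config = sample(space, rng)
        trials.append((objective(config), config))
    best_loss, best_config = min(trials, key=lambda t: t[0])
    return best_loss, best_config, trials

# Assumed search space for the illustration.
space = {"lr": (0.001, 0.5), "momentum": (0.0, 0.99)}
best_loss, best_config, trials = random_search(space)
```

In a real workflow the objective would train and validate a model for each configuration, which is why running trials in parallel on HPC resources matters.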
Model Registry: itwinAI provides a model registry for logging and storing models together with their performance metrics, so that runs can be compared and analyzed. The backend uses MLflow for model management.
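What a model registry records can be sketched with a small in-memory structure. This is an assumption-laden illustration, not the MLflow or itwinAI API: the class names, the model name "detector-sim", and the metric values are all made up for the example and do not represent measured results.

```python
from dataclasses import dataclass, field

# Conceptual sketch of a model registry; not MLflow or itwinAI code.

@dataclass
class ModelVersion:
    """One registered version of a model with its parameters and metrics."""
    name: str
    version: int
    params: dict
    metrics: dict = field(default_factory=dict)

class ModelRegistry:
    """Hypothetical in-memory registry keyed by model name."""

    def __init__(self):
        self._versions = {}

    def register(self, name, params, metrics):
        """Store a new version; version numbers start at 1 per model name."""
        versions = self._versions.setdefault(name, [])
        entry = ModelVersion(name, len(versions) + 1, dict(params), dict(metrics))
        versions.append(entry)
        return entry

    def best(self, name, metric, higher_is_better=True):
        """Return the version with the best value of the given metric."""
        pick = max if higher_is_better else min
        return pick(self._versions[name], key=lambda v: v.metrics[metric])

# Illustrative entries; the metric values are invented for the example.
registry = ModelRegistry()
registry.register("detector-sim", {"lr": 0.01}, {"val_accuracy": 0.91})
registry.register("detector-sim", {"lr": 0.001}, {"val_accuracy": 0.94})
best = registry.best("detector-sim", "val_accuracy")
```

A production registry additionally persists model artifacts and metadata, which is the part delegated to the MLflow backend.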
itwinAI has been deployed and tested on JSC's HDFML cluster and has been integrated with interLink, a framework within interTwin designed to offload compute-intensive tasks from cloud to high-performance computing (HPC) resources.
The versatility of itwinAI is evident in its application across various scientific domains, including its contributions to Detector Simulation in High-Energy Physics and Fire Risk Modeling in climate research. This framework stands as a valuable resource for researchers, data scien