Performance Benchmarking of Tensor Trains for accelerated Quantum-Inspired Homogenization on TPU, GPU and CPU architectures
Authors
Sascha H. Hauck, Matthias Kabel, Nicolas R. Gauger
Abstract
Recent advances in high-resolution CT-imaging technology are creating a new class of ultra-high-resolution microstructural datasets that challenge the limits of traditional homogenization approaches. While state-of-the-art FFT-based homogenization techniques remain effective for moderate dataset sizes, their memory footprint and computational cost grow rapidly with increasing resolution, making them increasingly inefficient for industrial-scale problems. To address these challenges, the recently developed Superfast Fourier Transform (SFFT)-based homogenization algorithm leverages the memory-efficient low-rank representations of Tensor Trains (TTs), which reduce the storage and computational requirements of large-scale homogenization problems. Originally developed for CPUs, SFFT-based homogenization efficiently handles high-resolution datasets, assuming the underlying data is well-behaved. In this work, we investigate the performance of fundamental TT operations on modern hardware accelerators using the JAX framework. This benchmarking study, comparing CPUs, GPUs, and TPUs, evaluates execution times and computational efficiency. Building on these insights, we adapt the SFFT-based homogenization algorithm for use on accelerators, achieving speed-ups of up to 10x relative to the CPU implementation, thus paving the way for the treatment of previously infeasible dataset sizes. Our results show that GPUs and TPUs achieve comparable performance in realistic scenarios, despite the relative immaturity of the TPU ecosystem, demonstrating the potential of both architectures to accelerate quantum-inspired techniques for industrial-scale simulations, particularly for homogenization problems.
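The memory savings the abstract attributes to Tensor Trains come from representing a large d-dimensional tensor as a chain of small three-way "cores" obtained by sequential SVDs (the TT-SVD scheme). The sketch below, written in plain NumPy rather than the paper's JAX implementation, illustrates this low-rank representation on a toy tensor; the function names `tt_svd` and `tt_to_full` are illustrative and not taken from the paper.

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Decompose a full tensor into Tensor-Train cores via sequential,
    rank-truncated SVDs (the classic TT-SVD sweep)."""
    dims = tensor.shape
    d = len(dims)
    cores = []
    r_prev = 1
    mat = tensor.reshape(r_prev * dims[0], -1)
    for k in range(d - 1):
        U, S, Vt = np.linalg.svd(mat, full_matrices=False)
        r = min(max_rank, len(S))
        # Core k has shape (r_{k}, n_k, r_{k+1}).
        cores.append(U[:, :r].reshape(r_prev, dims[k], r))
        # Carry the remainder into the next unfolding.
        mat = (np.diag(S[:r]) @ Vt[:r]).reshape(r * dims[k + 1], -1)
        r_prev = r
    cores.append(mat.reshape(r_prev, dims[-1], 1))
    return cores

def tt_to_full(cores):
    """Contract the TT cores back into the full tensor."""
    result = cores[0]
    for core in cores[1:]:
        result = np.tensordot(result, core, axes=([-1], [0]))
    # Drop the dummy boundary ranks of size 1.
    return result[0, ..., 0]

# Toy example: a rank-1 tensor is reproduced exactly by a rank-2 TT,
# while storing only small cores instead of the full array.
t = np.einsum('i,j,k->ijk', np.arange(1., 5.), np.arange(1., 6.), np.arange(1., 7.))
cores = tt_svd(t, max_rank=2)
print(np.allclose(tt_to_full(cores), t))
```

For well-behaved (low-rank) data this is the regime where TT storage scales linearly in the number of dimensions, which is what makes the SFFT-based approach attractive for ultra-high-resolution microstructures.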