Performance Benchmarking of Tensor Trains for accelerated Quantum-Inspired Homogenization on TPU, GPU and CPU architectures

Published 2 days ago · Version 1 · arXiv:2512.07811

Authors

Sascha H. Hauck, Matthias Kabel, Nicolas R. Gauger

Categories

cond-mat.mtrl-sci

Abstract

Recent advances in high-resolution CT-imaging technology are creating a new class of ultra-high-resolution micro-structural datasets that challenge the limits of traditional homogenization approaches. While state-of-the-art FFT-based homogenization techniques remain effective for moderately sized datasets, their memory footprint and computational cost grow rapidly with increasing resolution, making them increasingly inefficient for industrial-scale problems. To address these challenges, the recently developed Superfast Fourier Transform (SFFT)-based homogenization algorithm leverages the memory-efficient low-rank representations of Tensor Trains (TTs), which reduce the storage and computational requirements of large-scale homogenization problems. Developed for CPU usage, SFFT-based homogenization efficiently handles high-resolution datasets, assuming the underlying data is well-behaved. In this work, we investigate the performance of fundamental TT operations on modern hardware accelerators using the JAX framework. This benchmarking study, comparing CPUs, GPUs, and TPUs, evaluates execution times and computational efficiency. Building on these insights, we adapt the SFFT-based homogenization algorithm for use on accelerators, achieving speed-ups of up to 10x relative to the CPU implementation, thus paving the way for the treatment of previously infeasible dataset sizes. Our results show that GPUs and TPUs achieve comparable performance in realistic scenarios, despite the relative immaturity of the TPU ecosystem, demonstrating the potential of both architectures to accelerate quantum-inspired techniques for industrial-scale simulations, particularly for homogenization problems.
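To give a sense of the low-rank TT representation the abstract refers to, below is a minimal NumPy sketch of the classical TT-SVD scheme, which factors a d-way tensor into a chain of 3-way cores via sequential truncated SVDs. This is an illustrative sketch of the general technique, not the authors' SFFT implementation; the function names `tt_svd` and `tt_reconstruct` are hypothetical.

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Decompose a d-way tensor into a Tensor Train (a list of 3-way
    cores of shape (r_left, n_k, r_right)) via sequential truncated SVDs."""
    dims = tensor.shape
    cores = []
    rank = 1
    mat = tensor.reshape(rank * dims[0], -1)
    for k in range(len(dims) - 1):
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        r_new = min(max_rank, len(s))
        # Left factor becomes the k-th core; the rest is carried forward.
        cores.append(u[:, :r_new].reshape(rank, dims[k], r_new))
        mat = (np.diag(s[:r_new]) @ vt[:r_new]).reshape(r_new * dims[k + 1], -1)
        rank = r_new
    cores.append(mat.reshape(rank, dims[-1], 1))
    return cores

def tt_reconstruct(cores):
    """Contract the TT cores back into the full tensor."""
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=(-1, 0))
    return out.squeeze(axis=(0, -1))  # drop the boundary ranks of size 1

# A 4x4x4 tensor stored as three small cores instead of 64 dense entries;
# with sufficiently large max_rank the decomposition is exact.
T = np.random.default_rng(0).normal(size=(4, 4, 4))
cores = tt_svd(T, max_rank=4)
print(np.allclose(tt_reconstruct(cores), T))
```

When `max_rank` is smaller than the tensor's true TT ranks, the same scheme yields a compressed approximation, which is the storage saving the paper exploits for large homogenization problems.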

Published Dec 8, 2025