PaperSwipe

Model-Free Assessment of Simulator Fidelity via Quantile Curves

Published 1 day agoVersion 1arXiv:2512.05024

Authors

Garud Iyengar, Yu-Shiou Willy Lin, Kaizheng Wang

Categories

stat.MEcs.AIcs.LG

Abstract

Simulation of complex systems originated in manufacturing and queuing applications. It is now widely used for large-scale, ML-based systems in research, education, and consumer surveys. However, characterizing the discrepancy between simulators and ground truth remains challenging for increasingly complex, machine-learning-based systems. We propose a computationally tractable method to estimate the quantile function of the discrepancy between the simulated and ground-truth outcome distributions. Our approach focuses on output uncertainty and treats the simulator as a black box, imposing no modeling assumptions on its internals, and hence applies broadly across many parameter families, from Bernoulli and multinomial models to continuous, vector-valued settings. The resulting quantile curve supports confidence interval construction for unseen scenarios, risk-aware summaries of sim-to-real discrepancy (e.g., VaR/CVaR), and comparison of simulators' performance. We demonstrate our methodology in an application assessing LLM simulation fidelity on the WorldValueBench dataset spanning four LLMs.

Model-Free Assessment of Simulator Fidelity via Quantile Curves

1 day ago
v1
3 authors

Categories

stat.MEcs.AIcs.LG

Abstract

Simulation of complex systems originated in manufacturing and queuing applications. It is now widely used for large-scale, ML-based systems in research, education, and consumer surveys. However, characterizing the discrepancy between simulators and ground truth remains challenging for increasingly complex, machine-learning-based systems. We propose a computationally tractable method to estimate the quantile function of the discrepancy between the simulated and ground-truth outcome distributions. Our approach focuses on output uncertainty and treats the simulator as a black box, imposing no modeling assumptions on its internals, and hence applies broadly across many parameter families, from Bernoulli and multinomial models to continuous, vector-valued settings. The resulting quantile curve supports confidence interval construction for unseen scenarios, risk-aware summaries of sim-to-real discrepancy (e.g., VaR/CVaR), and comparison of simulators' performance. We demonstrate our methodology in an application assessing LLM simulation fidelity on the WorldValueBench dataset spanning four LLMs.

Authors

Garud Iyengar, Yu-Shiou Willy Lin, Kaizheng Wang

arXiv ID: 2512.05024
Published Dec 4, 2025

Click to preview the PDF directly in your browser