
TRACE: A Framework for Analyzing and Enhancing Stepwise Reasoning in Vision-Language Models

arXiv:2512.05943 (v1)
Published Dec 5, 2025

Authors

Shima Imani, Seungwhan Moon, Lambert Mathias, Lu Zhang, Babak Damavandi

Categories

cs.AI

Abstract

Reliable mathematical and scientific reasoning remains an open challenge for large vision-language models. Standard final-answer evaluation often masks reasoning errors, allowing silent failures to persist. To address this gap, we introduce TRACE, a framework for Transparent Reasoning And Consistency Evaluation that diagnoses reasoning trajectories rather than only end results. At its core, TRACE leverages Auxiliary Reasoning Sets (ARS): compact pairs of sub-questions and answers that decompose a complex problem into intermediate steps. By evaluating those steps with consistency-based metrics, TRACE exposes failures that standard evaluation overlooks. Our experiments show that consistency across ARS correlates with final-answer correctness and helps pinpoint the reasoning steps where failures arise, offering actionable signals for model improvement. Furthermore, TRACE defines confidence regions that distinguish reliable from unreliable reasoning paths, supporting effective filtering, debugging, and model refinement.
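
As a rough illustration of the idea the abstract describes, the sketch below scores a reasoning trajectory by its agreement with an Auxiliary Reasoning Set and applies a threshold to mimic a confidence region. The names (`AuxStep`, `consistency_score`, `CONF_THRESHOLD`), the exact-match metric, and the toy model are assumptions for illustration only, not the paper's actual API or method.

```python
# Minimal sketch of the TRACE idea from the abstract. All identifiers and
# the exact-match consistency metric are hypothetical stand-ins, not the
# paper's implementation.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AuxStep:
    """One Auxiliary Reasoning Set entry: a sub-question with its expected answer."""
    sub_question: str
    expected_answer: str


def consistency_score(steps: List[AuxStep], model: Callable[[str], str]) -> float:
    """Fraction of ARS sub-questions the model answers consistently.

    Exact-match agreement here is a stand-in for whatever consistency-based
    metric TRACE actually uses over intermediate steps.
    """
    if not steps:
        return 0.0
    agreed = sum(
        1 for step in steps
        if model(step.sub_question).strip().lower() == step.expected_answer.strip().lower()
    )
    return agreed / len(steps)


# Hypothetical threshold separating reliable from unreliable reasoning paths
# (the "confidence regions" mentioned in the abstract).
CONF_THRESHOLD = 0.8


def is_reliable(steps: List[AuxStep], model: Callable[[str], str]) -> bool:
    """Classify a trajectory as reliable if its ARS consistency clears the threshold."""
    return consistency_score(steps, model) >= CONF_THRESHOLD


if __name__ == "__main__":
    # Toy "model" that answers sub-questions from a fixed lookup table.
    answers = {"What is 2 + 3?": "5", "Is 5 even?": "no"}
    toy_model = lambda q: answers.get(q, "unknown")

    ars = [AuxStep("What is 2 + 3?", "5"), AuxStep("Is 5 even?", "no")]
    score = consistency_score(ars, toy_model)
    print(f"consistency = {score:.2f}, reliable = {is_reliable(ars, toy_model)}")
```

In this toy setup, a trajectory whose sub-question answers all agree with the ARS scores 1.0 and is filtered as reliable, which mirrors the abstract's use of consistency for filtering and debugging.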
