Boosting Cross-Architectural Emulation Performance by Foregoing the Intermediate Representation Model
Authors
Amy Iris Parker
Categories
Abstract
As more applications utilize virtualization and emulation to run mission-critical tasks, the performance requirements of emulated and virtualized platforms continue to rise. Hardware virtualization is not universally available for all systems, and is incapable of emulating CPU architectures, requiring software emulation to be used. QEMU, the premier cross-architecture emulator for Linux and some BSD systems, currently uses dynamic binary translation (DBT) through intermediate representations using its Tiny Code Generator (TCG) model. While using intermediate representations of translated code allows QEMU to quickly add new host and guest architectures, it creates additional steps in the emulation pipeline which decrease performance. We construct a proof of concept emulator to demonstrate the slowdown caused by the usage of intermediate representations in TCG; this emulator performed up to 35x faster than QEMU with TCG, indicating substantial room for improvement in QEMU's design. We propose an expansion of QEMU's two-tier engine system (Linux KVM versus TCG) to include a middle tier using direct binary translation for commonly paired architectures such as RISC-V, x86, and ARM. This approach provides a slidable trade-off between development effort and performance depending on the needs of end users.
Boosting Cross-Architectural Emulation Performance by Foregoing the Intermediate Representation Model
Categories
Abstract
As more applications utilize virtualization and emulation to run mission-critical tasks, the performance requirements of emulated and virtualized platforms continue to rise. Hardware virtualization is not universally available for all systems, and is incapable of emulating CPU architectures, requiring software emulation to be used. QEMU, the premier cross-architecture emulator for Linux and some BSD systems, currently uses dynamic binary translation (DBT) through intermediate representations using its Tiny Code Generator (TCG) model. While using intermediate representations of translated code allows QEMU to quickly add new host and guest architectures, it creates additional steps in the emulation pipeline which decrease performance. We construct a proof of concept emulator to demonstrate the slowdown caused by the usage of intermediate representations in TCG; this emulator performed up to 35x faster than QEMU with TCG, indicating substantial room for improvement in QEMU's design. We propose an expansion of QEMU's two-tier engine system (Linux KVM versus TCG) to include a middle tier using direct binary translation for commonly paired architectures such as RISC-V, x86, and ARM. This approach provides a slidable trade-off between development effort and performance depending on the needs of end users.
Authors
Amy Iris Parker
Click to preview the PDF directly in your browser