qemu-devel

[RFC]: port of embedded x86-mini disassembler to QEMU


From: Michael Clark
Subject: [RFC]: port of embedded x86-mini disassembler to QEMU
Date: Fri, 10 Jan 2025 23:08:21 +1300

a note to announce a port of the x86-mini disassembler to QEMU.

- https://github.com/michaeljclark/qemu/tree/x86-mini
- https://github.com/michaeljclark/x86/tree/x86-mini

# x86-mini

the x86-mini library is a lightweight x86 encoder, decoder, and
disassembler that uses extensions to the Intel instruction set
metadata format to encode modern VEX/EVEX instructions and legacy
instructions using a parameterized LEX (legacy extension) format.

- metadata-driven disassembler with Intel format output.
- written in C11 for compatibility with projects written in C.
- low-level instruction encoder and decoder use <= 32 bytes.
- python tablegen program to generate C tables from CSV metadata.
- metadata table tool to inspect operand encode and decode tables.
- carefully checked machine-readable instruction set metadata.
- support for REX/VEX/EVEX and preliminary support for REX2.
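for readers less familiar with the prefix formats listed above, here is
a minimal C sketch of the REX prefix (the byte 0100WRXB) and of how its
R/X/B bits extend the 3-bit ModRM/SIB register fields to reach r8-r15.
the names `rex_bits`, `rex_encode`, `rex_decode` and `ext_reg` are
hypothetical and are not the x86-mini API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch, not the x86-mini API. REX is 0100WRXB:
 * W selects 64-bit operand size; R, X and B extend ModRM.reg,
 * SIB.index and ModRM.rm/SIB.base to 4-bit register numbers. */
typedef struct {
    bool w, r, x, b;
} rex_bits;

/* Pack the four flags into a REX prefix byte (0x40-0x4F). */
static uint8_t rex_encode(rex_bits f)
{
    return (uint8_t)(0x40 | (f.w << 3) | (f.r << 2) | (f.x << 1) | f.b);
}

/* True if the byte has the REX pattern; only a prefix in 64-bit mode,
 * since 0x40-0x4F are INC/DEC opcodes in 16/32-bit modes. */
static bool is_rex(uint8_t byte)
{
    return (byte & 0xF0) == 0x40;
}

/* Unpack a REX prefix byte into its four flags. */
static rex_bits rex_decode(uint8_t byte)
{
    assert(is_rex(byte));
    rex_bits f = {
        .w = (byte >> 3) & 1,
        .r = (byte >> 2) & 1,
        .x = (byte >> 1) & 1,
        .b = byte & 1,
    };
    return f;
}

/* Combine a 3-bit ModRM/SIB field with its REX extension bit,
 * giving a register number in the range 0-15. */
static unsigned ext_reg(unsigned field3, bool rex_ext)
{
    assert(field3 < 8);
    return (rex_ext ? 8u : 0u) | field3;
}
```

the same split into "base field plus extension bits" carries over to
VEX/EVEX and REX2, which store the extension bits (some inverted) in
different positions.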

the x86-mini encoder and decoder library has been written from
scratch to be modern and as simple as possible while covering recent
additions to the Intel and AMD 64-bit instruction sets, such as the
EVEX encodings for recent AVX-512 extensions. it was written with
Intel APX in mind, so REX2/EVEX encodings for APX are coming soon.

## interest to the QEMU community

- x86-mini is fast. raw decode performance is ~100-200MiB/sec.
- x86-mini is small. 5 files, ~5 KLOC or ~13 KLOC including tables.
- x86-mini is complete and includes the latest AVX-512 extensions.
- x86-mini is easy to extend and uses extended Intel format metadata.
- x86-mini is documented with detailed info on the metadata format.
- x86-mini has CLI tools for searching x86 instruction set metadata.

## technical notes

- the decoder is table-based and uses a metadata interpreter.
- the decode table is ~66KiB with a ~150KiB acceleration trie.
- there are currently 3658 opcode entries active on x86-64
  which expands to 4775 table entries due to parameterization.
- it could be made faster by vectorizing the prefix decoder and by
  generating decode templates from the metadata, so that metadata
  interpretation happens at compile time (consteval), eliminating
  some L1 D$ traffic.
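the prefix stage mentioned above is the part that vectorization would
target. as a rough illustration, here is a scalar, table-driven prefix
scan in C that also derives the effective operand size from 66h and
REX.W. operand size is shown as one plausible kind of parameterization;
that is an assumption, not a description of the x86-mini tables. all
names are hypothetical, and the sketch ignores VEX/EVEX/REX2 and
mandatory-prefix rules.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, not x86-mini's decoder: classify each byte via a
 * 256-entry table and loop one byte at a time. A vectorized decoder
 * could classify a block of bytes at once instead. */
enum {
    PFX_NONE = 0,
    PFX_LEGACY = 1, /* F0 F2 F3 2E 36 3E 26 64 65 66 67 */
    PFX_REX = 2,    /* 40-4F, treated as REX only in 64-bit mode */
};

typedef struct {
    uint8_t opsize;   /* 1 if a 66h operand-size prefix was seen */
    uint8_t rep;      /* last F2/F3 seen, or 0 */
    uint8_t rex;      /* REX byte if it directly precedes the opcode */
    size_t len;       /* number of prefix bytes consumed */
} prefixes;

static uint8_t pfx_class[256];

/* Fill the classification table; must be called once before scanning. */
static void pfx_init(void)
{
    static const uint8_t legacy[] = {
        0xF0, 0xF2, 0xF3, 0x2E, 0x36, 0x3E, 0x26, 0x64, 0x65, 0x66, 0x67
    };
    for (size_t i = 0; i < sizeof legacy; i++)
        pfx_class[legacy[i]] = PFX_LEGACY;
    for (unsigned b = 0x40; b <= 0x4F; b++)
        pfx_class[b] = PFX_REX;
}

/* Consume prefixes. A REX only takes effect when it is the last prefix
 * before the opcode, so a legacy prefix after a REX cancels it. */
static prefixes pfx_scan(const uint8_t *p, size_t n, int mode64)
{
    prefixes out = {0};
    while (out.len < n) {
        uint8_t b = p[out.len];
        uint8_t c = pfx_class[b];
        if (c == PFX_LEGACY) {
            if (b == 0x66) out.opsize = 1;
            if (b == 0xF2 || b == 0xF3) out.rep = b;
            out.rex = 0;
        } else if (c == PFX_REX && mode64) {
            out.rex = b;
        } else {
            break; /* first opcode byte */
        }
        out.len++;
    }
    return out;
}

/* Effective operand size for an instruction whose default size is 32
 * bits (32-bit code segment or 64-bit mode): REX.W overrides 66h. */
static unsigned operand_size(const prefixes *pf, int mode64)
{
    if (mode64 && (pf->rex & 0x08)) return 64;
    return pf->opsize ? 16 : 32;
}
```

a single "add r/m, reg" style entry therefore resolves to 16, 32 or 64
bits depending only on prefix state, which is why a table can store one
parameterized entry instead of three concrete ones.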

after cherry-picking the commit, one can test host and target
disassembly support. e.g. for an x86-64 target on an x86-64 host:

$ echo aaa | qemu-x86_64 -d in_asm,out_asm /usr/bin/openssl sha256

## caveats and limitations

- supports 32-bit and 64-bit disassembly, and in theory 16-bit.
- 16-bit is designed in, but its base/index address formats are not done yet.
- x86-64 is exhaustively fuzz-tested against the LLVM disassembler.
- but x86-mini is new and hasn't been battle-tested in production.
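for context on the 16-bit gap: 16-bit addressing has no SIB byte and
instead selects a fixed base/index pair from ModRM.rm. a minimal C
sketch of that table, with hypothetical names (`ea16`, `ea16_decode`)
that are not x86-mini's:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of 16-bit ModRM memory operands.
 * Register numbers follow the usual x86 encoding order. */
enum { R_NONE = -1, R_BX = 3, R_BP = 5, R_SI = 6, R_DI = 7 };

typedef struct {
    int base;          /* R_BX, R_BP or R_NONE */
    int index;         /* R_SI, R_DI or R_NONE */
    unsigned disp_len; /* displacement bytes after ModRM: 0, 1 or 2 */
} ea16;

/* rm: 0=[bx+si] 1=[bx+di] 2=[bp+si] 3=[bp+di] 4=[si] 5=[di] 6=[bp] 7=[bx] */
static const int8_t ea16_base[8]  = { R_BX, R_BX, R_BP, R_BP,
                                      R_NONE, R_NONE, R_BP, R_BX };
static const int8_t ea16_index[8] = { R_SI, R_DI, R_SI, R_DI,
                                      R_SI, R_DI, R_NONE, R_NONE };

/* Decode the memory form of a 16-bit ModRM byte into *ea.
 * Returns 0 on success, -1 for register forms (mod=11). */
static int ea16_decode(uint8_t modrm, ea16 *ea)
{
    assert(ea != NULL);
    unsigned mod = modrm >> 6, rm = modrm & 7;
    if (mod == 3) return -1;
    ea->base = ea16_base[rm];
    ea->index = ea16_index[rm];
    ea->disp_len = mod == 1 ? 1 : mod == 2 ? 2 : 0;
    if (mod == 0 && rm == 6) { /* special case: [disp16], no registers */
        ea->base = R_NONE;
        ea->disp_len = 2;
    }
    return 0;
}
```

the special case at mod=00 rm=110 mirrors the 32-bit mod=00 rm=101
disp32 form, which is part of why 16-bit support needs its own table
rather than reusing the SIB path.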

if you already link with Capstone, then it does not offer many
immediate benefits; however, I think it is potentially useful as a
small embeddable disassembler and worth evaluating for inclusion.

## rationale

I worked on the QEMU disassembler while working on the QEMU RISC-V
target back in 2017/2018 and I was curious about vector support.
it seemed at the time that TCG vector support was piecemeal, plus
the old x86 disassembler seemed messy and incomplete. I also needed
an MIT-licensed disassembler to enable use in a commercial product.
basically, I was looking for a lightweight symmetric x86 instruction
encoder and decoder library in pure C with simple build requirements.
that is what prompted this initiative.

it would be nice to have an x86 disassembler that builds out of the
box, as I find QEMU's built-in tracing extremely useful. given that x86
is a popular target, a small embedded disassembler might be practical.

## summary and conclusion

at minimum, the metadata may be useful for x86 EVEX support. I note
that `tests/tcg/i386/x86.csv` is already in the source tree. the
x86-mini metadata is also based on x86-csv, but numerous inaccuracies
have been fixed and legacy instructions have been converted to the new
LEX format. in effect, the metadata has been fuzz-tested against LLVM
for x86-64, and ISA coverage is on the order of ~99.7%. the main branch
of the linked repo has a procedural fuzzer for metadata-based
instruction synthesis that could be useful for generating test cases
for QEMU.
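as a rough idea of what procedural instruction synthesis can look like,
here is a hypothetical C sketch (not the fuzzer in the linked repo) that
enumerates all 256 ModRM bytes for one opcode and emits complete
instructions with zeroed SIB and displacement bytes. it assumes 32/64-bit
address size with no 67h prefix; a real generator would be driven by the
CSV operand metadata rather than a hard-coded opcode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: number of SIB + displacement bytes following a
 * ModRM byte in 32/64-bit addressing. sib is consulted only when a SIB
 * byte is present (rm=100), for the mod=00 base=101 disp32 case. */
static size_t modrm_tail_len(uint8_t modrm, uint8_t sib)
{
    unsigned mod = modrm >> 6, rm = modrm & 7;
    size_t len = 0;
    if (mod == 3) return 0;                        /* register operand */
    if (rm == 4) {                                 /* SIB byte follows */
        len += 1;
        if (mod == 0 && (sib & 7) == 5) len += 4;  /* no base: disp32 */
    }
    if (mod == 0 && rm == 5) len += 4; /* disp32 (RIP-relative in 64-bit) */
    if (mod == 1) len += 1;            /* disp8 */
    if (mod == 2) len += 4;            /* disp32 */
    return len;
}

/* Emit opcode, ModRM, then a zero SIB (base=000, so no extra disp32)
 * and a zero displacement. out needs room for 7 bytes. Returns length. */
static size_t synth_modrm(uint8_t opcode, uint8_t modrm, uint8_t *out)
{
    size_t n = 0;
    out[n++] = opcode;
    out[n++] = modrm;
    size_t tail = modrm_tail_len(modrm, 0);
    for (size_t i = 0; i < tail; i++)
        out[n++] = 0;
    return n;
}

/* Write all 256 ModRM variants of opcode back to back into buf, which
 * must hold at least 256 * 7 bytes. Returns the total bytes written. */
static size_t synth_all(uint8_t opcode, uint8_t *buf)
{
    size_t total = 0;
    for (unsigned m = 0; m < 256; m++)
        total += synth_modrm(opcode, (uint8_t)m, buf + total);
    return total;
}
```

the resulting byte stream could be fed to two disassemblers and the
outputs diffed, which is the general shape of the differential testing
against LLVM described above.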

I am kind of throwing this over the fence, although the code is quite
self-contained, and my stress and mental health are now under control.
also, I have not yet run checkpatch.pl on this code; it is a preview.

x86-mini submaintainer.
Michael Clark.
--


