I assume the huge .h files are autogenerated? If so, QEMU cannot use them without including the human-readable sources in the tree.
I can see how that might be interesting for x86 virtualization where you have only one target and therefore you can get rid of the capstone dependency. At the same time, other virtualization targets like arm64 and RISC-V are going to become more and more important, not less, and not having to maintain a disassembler ourselves as part of QEMU is also a big plus...
Paolo
- https://github.com/michaeljclark/x86/tree/x86-mini
# x86-mini
the x86-mini library is a lightweight x86 encoder, decoder, and
disassembler that uses extensions to the Intel instruction set
metadata format to encode modern VEX/EVEX instructions and legacy
instructions using a parameterized LEX (legacy extension) format.
- metadata-driven disassembler with Intel format output.
- written in C11 for compatibility with projects written in C.
- low-level instruction encoder and decoder use <= 32 bytes.
- python tablegen program to generate C tables from CSV metadata.
- metadata table tool to inspect operand encode and decode tables.
- carefully checked machine-readable instruction set metadata.
- support for REX/VEX/EVEX and preliminary support for REX2.
the x86-mini x86 encoder and decoder library has been written from
scratch to be modern and as simple as possible while also covering
recent additions to the Intel and AMD 64-bit instruction sets, such
as the EVEX encodings for recent AVX-512 extensions. it is designed
with the REX2/EVEX encodings for Intel APX in mind, and support for
them is coming soon.
## interest to the QEMU community
- x86-mini is fast. raw decode performance is ~100-200MiB/sec.
- x86-mini is small. 5 files, ~5 KLOC or ~13 KLOC including tables.
- x86-mini is complete and includes the latest AVX-512 extensions.
- x86-mini is easy to extend and uses extended Intel format metadata.
- x86-mini is documented with detailed info on the metadata format.
- x86-mini has CLI tools for searching x86 instruction set metadata.
## technical notes
- the decoder is table-based and uses a metadata interpreter.
- the decode table is ~66KiB with a ~150KiB acceleration trie.
- there are currently 3658 opcode entries active on x86-64
which expands to 4775 table entries due to parameterization.
- it could be made faster by vectorizing the prefix decoder and
generating decode templates from the metadata so that metadata
interpretation happens at compile time (consteval), eliminating
some L1 D$ traffic.
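for context on the vectorization idea, a scalar legacy-prefix scan looks roughly like the hypothetical sketch below: one data-dependent branch per byte. a vectorized version could instead compare a 16-byte window against all eleven legacy prefix values at once and locate the first non-prefix byte with a mask and a count-trailing-zeros. this is an assumption about what such a loop could look like, not x86-mini's actual code; `scan_legacy_prefixes` and the `PFX_*` flags are made-up names.

```c
#include <stddef.h>
#include <stdint.h>

/* hypothetical legacy prefix flags, for illustration only */
enum {
    PFX_LOCK  = 1u << 0, /* F0 */
    PFX_REPNE = 1u << 1, /* F2 */
    PFX_REP   = 1u << 2, /* F3 */
    PFX_OSIZE = 1u << 3, /* 66 */
    PFX_ASIZE = 1u << 4, /* 67 */
    PFX_SEG   = 1u << 5  /* 26 2E 36 3E 64 65 */
};

/* scan legacy prefixes one byte at a time, recording which were seen.
 * returns the number of prefix bytes consumed. an x86 instruction is at
 * most 15 bytes long, so the scan never looks further than that.
 * a REX/REX2/VEX/EVEX byte is not handled here; it ends the scan. */
static size_t scan_legacy_prefixes(const uint8_t *p, size_t n, unsigned *flags)
{
    size_t i;
    *flags = 0;
    if (n > 15)
        n = 15;
    for (i = 0; i < n; i++) {
        switch (p[i]) {
        case 0xF0: *flags |= PFX_LOCK;  break;
        case 0xF2: *flags |= PFX_REPNE; break;
        case 0xF3: *flags |= PFX_REP;   break;
        case 0x66: *flags |= PFX_OSIZE; break;
        case 0x67: *flags |= PFX_ASIZE; break;
        case 0x26: case 0x2E: case 0x36:
        case 0x3E: case 0x64: case 0x65:
            *flags |= PFX_SEG;
            break;
        default:
            return i; /* first non-prefix byte */
        }
    }
    return i;
}
```

for example, `rep stosq` (`F3 48 AB`) consumes one legacy prefix and stops at the REX.W byte.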
after cherry-picking the commit, one can test host and target
disassembly support. e.g. for an x86-64 target on an x86-64 host:
$ echo aaa | qemu-x86_64 -d in_asm,out_asm /usr/bin/openssl sha256
## caveats and limitations
- supports 32-bit and 64-bit disassembly, and theoretically 16-bit.
- designed to support 16-bit, but the 16-bit base+index addressing
formats are not done yet.
- x86-64 is exhaustively fuzz-tested against the LLVM disassembler.
- but x86-mini is new and hasn't been battle-tested in production.
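the 16-bit base/index addressing formats are the eight ModRM r/m encodings of 16-bit addressing (Intel SDM Vol. 2, Table 2-1), where mod=00 with r/m=110 means a bare disp16 rather than [bp]. a minimal sketch of formatting them, assuming the caller has already fetched and sign-extended the displacement; `fmt_mem16` is a hypothetical helper, not part of x86-mini.

```c
#include <stdint.h>
#include <stdio.h>

/* 16-bit ModRM r/m base/index combinations (Intel SDM Vol. 2, Table 2-1) */
static const char *const rm16_base[8] = {
    "bx+si", "bx+di", "bp+si", "bp+di", "si", "di", "bp", "bx"
};

/* hypothetical helper: format the memory operand of a 16-bit ModRM byte.
 * disp must already be fetched and sign-extended (disp8 for mod 1,
 * disp16 for mod 2 or for mod 0 with rm 6). returns the snprintf
 * length, or -1 for a register operand (mod 3). */
static int fmt_mem16(uint8_t modrm, int16_t disp, char *buf, size_t len)
{
    unsigned mod = modrm >> 6, rm = modrm & 7;

    if (mod == 3)
        return -1;
    if (mod == 0 && rm == 6) /* [disp16] replaces [bp] */
        return snprintf(buf, len, "[0x%04x]", (unsigned)(uint16_t)disp);
    if (mod == 0)
        return snprintf(buf, len, "[%s]", rm16_base[rm]);
    if (disp < 0)
        return snprintf(buf, len, "[%s-0x%x]", rm16_base[rm],
                        (unsigned)-(int)disp);
    return snprintf(buf, len, "[%s+0x%x]", rm16_base[rm], (unsigned)disp);
}
```

the mod=00/r/m=110 special case is the main trap: without it, `[bp]` with no displacement would be emitted for what is really an absolute address.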
if you already link with capstone then it doesn't provide many
immediate benefits; however, I think it is worth evaluating as a
small embeddable disassembler for potential inclusion.
## rationale
I worked on the QEMU disassembler while working on the QEMU RISC-V
target back in 2017/2018 and I was curious about vector support.
it seemed at the time that TCG vector support was piecemeal, plus
the old x86 disassembler seemed messy and incomplete. I also needed
an MIT-licensed disassembler to enable use in a commercial product.
basically, I was looking for a lightweight symmetric x86 instruction
encoder and decoder library in pure C with simple build requirements.
that is what prompted this initiative.
it would be nice to have an x86 disassembler that builds out-of-the-box,
as I find QEMU's built-in tracing extremely useful. given that x86 is
a popular target, a small embedded disassembler might be practical.
## summary and conclusion
at minimum, the metadata may be useful for x86 EVEX support. note
that I see `tests/tcg/i386/x86.csv` in the source tree. the x86-mini
metadata is also based on x86-csv but has had numerous inaccuracies
fixed, and legacy instructions have been converted to the new LEX format.
in effect, the metadata has been fuzz-tested against LLVM for x86-64,
and ISA coverage is approximately 99.7%. the main branch of the
linked repo has a procedural fuzzer for metadata-based instruction
synthesis that could be useful for generating test cases for QEMU.
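to give a flavour of metadata-based instruction synthesis, here is a minimal, hypothetical C sketch that enumerates every register-register form of one hard-coded entry, `ADD r/m32, r32` (opcode `01 /r`). a real generator would read the opcode and operand constraints from the CSV metadata instead of hard-coding them; `synth_rr` is an illustrative name, not the fuzzer in the linked repo. each generated case could then be disassembled by QEMU and by a reference disassembler and the outputs compared.

```c
#include <stddef.h>
#include <stdint.h>

/* hypothetical generator: build "op /r" register-register encodings.
 * ModRM uses mod=11, reg = source register, rm = destination register.
 * out must hold 8*8 entries of 2 bytes; returns the number written.
 * only the 8 legacy registers are covered (no REX, so no r8..r15). */
static size_t synth_rr(uint8_t opcode, uint8_t out[][2])
{
    size_t n = 0;
    for (unsigned reg = 0; reg < 8; reg++) {
        for (unsigned rm = 0; rm < 8; rm++) {
            out[n][0] = opcode;
            out[n][1] = (uint8_t)(0xC0 | (reg << 3) | rm);
            n++;
        }
    }
    return n;
}
```

for example, the case at index 24 is `01 D8` (reg=3, rm=0), which decodes as `add eax, ebx`.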
I am kind of throwing this over the fence, although the code is quite
self-contained and my stress and mental health are now under control.
also I have not yet run checkpatch.pl on this code. it is a preview.
x86-mini submaintainer.
Michael Clark.
--