Re: [RFC]: port of embedded x86-mini disassembler to QEMU
From: Michael Clark
Date: Sat, 11 Jan 2025 18:10:52 +1300
User-agent: Mozilla Thunderbird
On 1/11/25 05:05, Paolo Bonzini wrote:
> On Fri, 10 Jan 2025, 14:03 Michael Clark <michael@anarch128.org> wrote:
>> On 1/11/25 00:07, Paolo Bonzini wrote:
>>> On Fri, 10 Jan 2025, 10:52 Michael Clark <michael@anarch128.org> wrote:
>>>> a note to announce a port of the x86-mini disassembler to QEMU.
>>>> - https://github.com/michaeljclark/qemu/tree/x86-mini
>>> I assume the huge .h files are autogenerated? If so, QEMU cannot use them
>>> without including the human-readable sources in the tree.
>> yes indeed. there is an x86_tablegen.py python script in the other repo,
>> but it is not in the current patch. it would be somewhat easy to read
>> the tables from CSV files directly into arrays at the expense of several
>> more milliseconds during startup. the revised operand formats map
>> relatively strictly to enum definitions with string tables in the source,
>> so a reader in C would not be impossible.
> Building the tables at compile time is fine, only leaving out the script is
> not.
okay. now it's even smaller: the checked-in generated tables (x86-enums.inc
and x86-tables.inc) are gone and are generated at build time by
scripts/x86-tablegen.py. I don't mind if folks kick the tyres on the code,
but I'm not asking for it to be merged. I'd want it to be in better shape
if folks decide they like it, and I don't want to expose anyone, myself
included, to metadata format changes. so maybe in 6-18 months' time.
$ git show | diffstat
b/disas/meson.build | 93
b/disas/x86-core.c | 5
b/disas/x86.h | 2
b/scripts/x86-tablegen.py | 540 +++
disas/x86-enums.inc | 1883 -----------
disas/x86-tables.inc | 6218 --------------------------------------
6 files changed, 636 insertions(+), 8105 deletions(-)
$ wc -l disas/x86*[ch] scripts/x86-tablegen.py
2846 disas/x86-core.c
92 disas/x86-disas.c
1758 disas/x86.h
540 scripts/x86-tablegen.py
5236 total
$ wc -l disas/x86-data/*.csv | tail -5
4 disas/x86-data/x86_waitpkg.csv
2 disas/x86-data/x86_wbnoinvd.csv
145 disas/x86-data/x86_x87.csv
3 disas/x86-data/x86_xsaveopt.csv
3731 total
x86_fuzzer.c in the other repo might be interesting: ~900 LOC showing
instruction set metadata reflection for generative fuzzing. it has a
tiny combinatorial expansion algorithm with constraints for conditional
domain reduction to prune the graph. e.g. I only vary imm when RAX is
the base register, plus various other pruning heuristics such as riz
(the SIB "no index" pseudo-register). it still needs some random
synthesis, but I will get to that at some point.
https://github.com/michaeljclark/x86/blob/trunk/tests/x86_fuzzer.c
it exposes some of the internals of the codec. the tables have at most
two opcode bytes, one of which may be ModRM. the codec converts
REX.W/VEX/EVEX/prefix/map into trie page selectors. it also expands out
ModRM masks when multiple instructions share the same opcode byte with
different mod values, like this curiosity where the same opcode is used
for two instructions, one with a register operand and one with a memory
operand:
f3 0f c7 f7 senduipi edi
f3 0f c7 77 01 vmxon qword ptr [rdi + 1]
so I spent some time on instrumentation (the output is trimmed for width).
$ ./build/x86_opcodes -o -g | egrep '(senduipi|vmxon|cmpxchg8b)'
| cmpxchg8b m64 | c7 08 | ff f8 | .lex.0f.w0 c7 /1 .lock |
| cmpxchg8b m64 | c7 48 | ff f8 | .lex.0f.w0 c7 /1 .lock |
| cmpxchg8b m64 | c7 88 | ff f8 | .lex.0f.w0 c7 /1 .lock |
| vmxon m64 | c7 30 | ff f8 | .lex.f3.0f.w0 c7 /6 |
| vmxon m64 | c7 70 | ff f8 | .lex.f3.0f.w0 c7 /6 |
| vmxon m64 | c7 b0 | ff f8 | .lex.f3.0f.w0 c7 /6 |
| senduipi rw | c7 f0 | ff f8 | .lex.f3.0f.w0 c7 /6 |
the last column of the x86_opcodes output shows the LEX format, which
adds suffixes such as 'wx' and 'ww' for default-32-bit and default-64-bit
operand sizes. it looks a lot like VEX/EVEX, and all legacy instructions
have been converted to this form, so it is much easier to reason about
decoding them. in any case, folks might be curious and decide they want
to experiment with it.
$ echo aaa | qemu-x86_64 -d in_asm,out_asm /usr/bin/openssl sha256
I see SSE2/AVX instructions in the output and it doesn't crash. :D
it's quite eye-opening to see the codegen. it would be nice to get
qemu-system to use the MMU by running the translator with the hypervisor
(HV) on.
btw Intel XED is pretty neat but its build system is totally alien.
Michael.