qemu-devel

Re: [RFC]: port of embedded x86-mini disassembler to QEMU


From: Michael Clark
Subject: Re: [RFC]: port of embedded x86-mini disassembler to QEMU
Date: Sat, 11 Jan 2025 18:10:52 +1300
User-agent: Mozilla Thunderbird

On 1/11/25 05:05, Paolo Bonzini wrote:
> On Fri, 10 Jan 2025, 14:03 Michael Clark <michael@anarch128.org> wrote:
>
>> On 1/11/25 00:07, Paolo Bonzini wrote:
>>> On Fri, 10 Jan 2025, 10:52 Michael Clark <michael@anarch128.org> wrote:
>>>> a note to announce a port of the x86-mini disassembler to QEMU.
>>>>
>>>> - https://github.com/michaeljclark/qemu/tree/x86-mini
>>> I assume the huge .h files are autogenerated? If so, QEMU cannot use them
>>> without including the human-readable sources in the tree.
>> yes indeed. there is an x86_tablegen.py python script in the other repo
>> but it is not in the current patch. it would be somewhat easy to read
>> the tables from CSV files directly into arrays at the expense of several
>> more milliseconds during startup. the revised operand formats map
>> relatively strictly to enum definitions with string tables in the source
>> so a reader in C would not be impossible
>
> Building the tables at compile time is fine, only leaving out the script is
> not.


okay. now it's even smaller. I don't mind if folks kick the tyres on the
code but I'm not asking for it to be merged. I'd want it to be in better
shape if folks decide they like it. and I don't want to expose anyone
including myself to metadata format changes. so maybe 6-18 months time.

$ git show | diffstat
 b/disas/meson.build       |   93
 b/disas/x86-core.c        |    5
 b/disas/x86.h             |    2
 b/scripts/x86-tablegen.py |  540 +++
 disas/x86-enums.inc       | 1883 -----------
 disas/x86-tables.inc      | 6218 --------------------------------------
 6 files changed, 636 insertions(+), 8105 deletions(-)

$ wc -l disas/x86*[ch] scripts/x86-tablegen.py
  2846 disas/x86-core.c
    92 disas/x86-disas.c
  1758 disas/x86.h
   540 scripts/x86-tablegen.py
  5236 total

$ wc -l disas/x86-data/*.csv | tail -5
     4 disas/x86-data/x86_waitpkg.csv
     2 disas/x86-data/x86_wbnoinvd.csv
   145 disas/x86-data/x86_x87.csv
     3 disas/x86-data/x86_xsaveopt.csv
  3731 total
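
For readers who want a feel for the CSV-to-table step discussed earlier in the thread, here is a minimal sketch of the idea. It is not the actual x86-tablegen.py: the column names (`name`, `operands`, `encoding`), the `x86_op_` enum prefix, and both helper functions are assumptions made for illustration. Only the three sample rows are taken from the opcode listing in this message.

```python
import csv
import io

# Hypothetical CSV layout; the real column names in disas/x86-data may differ.
SAMPLE = """\
name,operands,encoding
cmpxchg8b,m64,.lex.0f.w0 c7 /1 .lock
vmxon,m64,.lex.f3.0f.w0 c7 /6
senduipi,rw,.lex.f3.0f.w0 c7 /6
"""

def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

def emit_enum(rows, prefix="x86_op_"):
    """Emit a C enum with one member per distinct instruction name."""
    seen = []
    for row in rows:
        if row["name"] not in seen:
            seen.append(row["name"])
    lines = ["enum {"]
    lines += [f"    {prefix}{name}," for name in seen]
    lines.append("};")
    return "\n".join(lines)

rows = load_rows(SAMPLE)
print(emit_enum(rows))
```

Running a script of this kind from the build system keeps the human-readable CSV as the source of truth and regenerates the .inc files on each build.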

x86_fuzzer.c in the other repo might be interesting. ~900 LOC showing
instruction set metadata reflection for generative fuzzing. it has a
tiny combinatorial expansion algorithm with constraints for conditional
domain reduction to prune the graph. like I vary imm when we have RAX
as the base register and various other pruning heuristics such as riz.
it needs some random synthesis but I will get to that at some point.

https://github.com/michaeljclark/x86/blob/trunk/tests/x86_fuzzer.c
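
The general shape of a combinatorial expansion with constraint-based pruning can be sketched as follows. This is not the code in x86_fuzzer.c; the `expand` function, the field names `base` and `imm`, and the sample rule are all hypothetical, and the rule is only one possible reading of the RAX heuristic mentioned above.

```python
def expand(domains, constraints):
    """Yield every assignment over `domains` that passes all constraints.

    domains: dict mapping a field name to its list of candidate values.
    constraints: predicates on a (possibly partial) assignment; a False
    result prunes the whole subtree below that partial assignment.
    """
    names = list(domains)

    def walk(i, partial):
        if not all(ok(partial) for ok in constraints):
            return  # prune: no completion of this partial assignment is kept
        if i == len(names):
            yield dict(partial)
            return
        for value in domains[names[i]]:
            partial[names[i]] = value
            yield from walk(i + 1, partial)
            del partial[names[i]]

    yield from walk(0, {})

def imm_varies_only_for_rax(p):
    # Hypothetical rule: non-zero immediates are explored only with rax.
    if "base" not in p or "imm" not in p:
        return True  # not enough fields assigned yet to decide
    return p["base"] == "rax" or p["imm"] == 0

domains = {"base": ["rax", "rcx", "rdx"], "imm": [0, 1, 0x7F]}
cases = list(expand(domains, [imm_varies_only_for_rax]))
print(len(cases), "of 9 combinations kept")
```

Checking constraints on partial assignments is what makes this a domain reduction and not a filter over the full cross product: a rejected prefix is never expanded further.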

it exposes some of the internals of the codec. the tables exclusively
have a maximum of two opcode bytes with one conditionally being ModRM.
the codec converts REX.W/VEX/EVEX/prefix/map into trie page selectors.
it also expands out ModRM masks if multiple instructions straddle the
same opcode byte with different mod values. like this curiosity where
the same opcode is used for two instructions, one as reg, one as mem.

  f3 0f c7 f7          senduipi    edi
  f3 0f c7 77 01       vmxon       qword ptr [rdi + 1]
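
To see why these two encodings differ, it helps to split the ModRM byte (the byte after c7) into its fields. The sketch below uses the standard x86 ModRM layout; the helper name `split_modrm` is made up for this example.

```python
def split_modrm(byte):
    """Split a ModRM byte into its (mod, reg, rm) bit fields."""
    mod = (byte >> 6) & 0b11   # addressing mode; 3 selects a register operand
    reg = (byte >> 3) & 0b111  # register number or opcode extension (/digit)
    rm = byte & 0b111          # register or memory base register
    return mod, reg, rm

for modrm in (0xF7, 0x77):
    mod, reg, rm = split_modrm(modrm)
    kind = "register" if mod == 3 else "memory"
    print(f"{modrm:02x}: mod={mod} reg={reg} rm={rm} -> {kind} operand")
```

Both bytes carry reg=6 (the /6 opcode extension) and rm=7, so only the mod field separates the register form from the memory form.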

so I spent time on instrumentation (output is snipped due to width).

$ ./build/x86_opcodes -o -g | egrep '(senduipi|vmxon|cmpxchg8b)'
| cmpxchg8b m64     | c7 08 | ff f8 | .lex.0f.w0 c7 /1 .lock   |
| cmpxchg8b m64     | c7 48 | ff f8 | .lex.0f.w0 c7 /1 .lock   |
| cmpxchg8b m64     | c7 88 | ff f8 | .lex.0f.w0 c7 /1 .lock   |
| vmxon m64         | c7 30 | ff f8 | .lex.f3.0f.w0 c7 /6      |
| vmxon m64         | c7 70 | ff f8 | .lex.f3.0f.w0 c7 /6      |
| vmxon m64         | c7 b0 | ff f8 | .lex.f3.0f.w0 c7 /6      |
| senduipi rw       | c7 f0 | ff f8 | .lex.f3.0f.w0 c7 /6      |

this shows you the LEX format which has new suffixes like 'wx' and 'ww'
for default 32 or default 64 operand sizes. looks a lot like VEX/EVEX.
and all legacy instructions have been converted to use this form so it
is much easier to reason about decoding them. in any case, folks might
be curious and decide they want to experiment with it.
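
As a rough illustration of how regular these encoding strings are, the sketch below splits the ones shown in the listing into fields. The grammar is inferred from those few strings only and is an assumption, not the project's actual parser: the set of prefix bytes, the field names, and the `parse_encoding` function are all hypothetical.

```python
# Assumed set of mandatory prefix bytes; anything else before w* is the map.
PREFIXES = {"66", "f2", "f3"}

def parse_encoding(text):
    """Split a string like '.lex.f3.0f.w0 c7 /6' into named fields."""
    tokens = text.split()
    head = tokens[0].lstrip(".").split(".")  # e.g. ['lex', 'f3', '0f', 'w0']
    fields = {"type": head[0], "prefix": None, "map": None, "w": None,
              "opcode": [], "ext": None, "flags": []}
    for part in head[1:]:
        if part in PREFIXES:
            fields["prefix"] = part
        elif part.startswith("w"):
            fields["w"] = part
        else:
            fields["map"] = part
    for tok in tokens[1:]:
        if tok.startswith("/"):
            fields["ext"] = tok[1:]           # opcode extension, e.g. '6'
        elif tok.startswith("."):
            fields["flags"].append(tok[1:])   # trailing flags such as 'lock'
        else:
            fields["opcode"].append(int(tok, 16))
    return fields

print(parse_encoding(".lex.f3.0f.w0 c7 /6"))
print(parse_encoding(".lex.0f.w0 c7 /1 .lock"))
```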

$ echo aaa | qemu-x86_64 -d in_asm,out_asm /usr/bin/openssl sha256

I see SSE2/AVX instructions in the output and it doesn't crash. :D
it's quite eye-opening to see the codegen. it would be nice to get
qemu-system to use the MMU by running the translator with the HV on.

btw Intel XED is pretty neat but its build system is totally alien.

Michael.


