Re: [RFC]: port of embedded x86-mini disassembler to QEMU

qemu-devel



From:	Michael Clark
Subject:	Re: [RFC]: port of embedded x86-mini disassembler to QEMU
Date:	Sat, 11 Jan 2025 09:09:09 +1300
User-agent:	Mozilla Thunderbird

On 1/11/25 05:05, Paolo Bonzini wrote:

On Fri, 10 Jan 2025 at 14:03, Michael Clark <michael@anarch128.org> wrote:

On 1/11/25 00:07, Paolo Bonzini wrote:

On Fri, 10 Jan 2025 at 10:52, Michael Clark <michael@anarch128.org> wrote:

a note to announce a port of the x86-mini disassembler to QEMU.

- https://github.com/michaeljclark/qemu/tree/x86-mini


I assume the huge .h files are autogenerated? If so, QEMU cannot use them
without including the human-readable sources in the tree.


yes indeed. there is an x86_tablegen.py python script in the other repo
but it is not in the current patch. it would be somewhat easy to read
the tables from CSV files directly into arrays at the expense of several
more milliseconds during startup. the revised operand formats map
relatively strictly to enum definitions with string tables in the source,
so a reader in C would not be impossible.
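
For illustration only, here is a minimal sketch of the compile-time direction: a Python generator that turns CSV rows into the text of a C array. It is not the actual x86_tablegen.py. The column names, the `x86_opc_row` struct name and the `x86_opc_table` array name are all assumptions made for this example. The two sample rows are the standard `90+rd` and `87 /r` encodings of XCHG.

```python
import csv
import io

# Hypothetical CSV layout. The real x86-mini tables and x86_tablegen.py are
# not shown in this thread, so these column names are assumptions.
SAMPLE_CSV = """mnemonic,opcode,encoding
xchg,0x90,+rd
xchg,0x87,/r
"""

def emit_c_table(csv_text, array_name="x86_opc_table"):
    """Turn CSV rows into the text of a C array initializer."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # The struct and array names are placeholders for this sketch.
    lines = ["static const struct x86_opc_row %s[] = {" % array_name]
    for row in rows:
        lines.append('    { "%s", %s, "%s" },' % (
            row["mnemonic"], row["opcode"], row["encoding"]))
    lines.append("};")
    return "\n".join(lines) + "\n"
```

Calling `emit_c_table(SAMPLE_CSV)` returns a string that a build step could write to a generated header, which keeps the human-readable CSV and the script in the tree alongside the output.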



Building the tables at compile time is fine, only leaving out the script is
not.

fair enough. I wanted to test the disassembler and I figured out how to
do that with both QEMU host and target. I haven't learned how to create
generative dependencies in meson yet but it can't be as bad as CMake.

QEMU running openssl is a pretty good torture test. I am going to spend
time analyzing the -d in_asm,out_asm logs for openssl. I don't yet have
a pseudo alias translation step so NOP still shows as XCHG eax,eax.
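
A pseudo alias step of that kind could be sketched roughly as follows. This is an assumption about its shape, not code from x86-mini: the `apply_alias` name and the (prefixes, mnemonic, operands) tuple layout are hypothetical. It only covers the two well-known aliases of the literal `XCHG eax,eax` decode.

```python
def apply_alias(prefixes, mnemonic, operands):
    """Map a literal decode onto its conventional alias, if one applies.

    prefixes: set of prefix names such as {"rep"} (hypothetical layout)
    operands: tuple of register names as printed by the literal decoder
    """
    if mnemonic == "xchg" and operands == ("eax", "eax"):
        if "rep" in prefixes:
            # REP XCHG eax,eax is the literal form of PAUSE.
            return (prefixes - {"rep"}, "pause", ())
        # Plain XCHG eax,eax is the literal form of NOP.
        return (prefixes, "nop", ())
    # Everything else, including XCHG eax,r8d, is passed through unchanged.
    return (prefixes, mnemonic, operands)
```

For example, `apply_alias({"rep"}, "xchg", ("eax", "eax"))` yields the PAUSE alias, while a decode with `r8d` as the second operand is left as XCHG.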

and fuzzing x86_64 was extremely interesting as it uncovered some
hardware bugs that led to historic findings inside the QEMU translator.
so I know that the level of accuracy is somewhat good. for example:


  NOP -> XCHG eax,eax
  REX.B XCHG eax,eax -> XCHG eax,r8d
  PAUSE -> REP NOP -> REP XCHG eax,eax
  REX.B PAUSE -> REP REX.B XCHG eax,eax -> REP XCHG eax,r8d

it seems Intel filters out REX.B for NOP but not REP NOP. and I know
what QEMU does. it does what one expects. unused REP is undefined but
typically is ignored for non string instructions with the exception of
0F, 0F38, 0F3A where REP/F3 is interpreted as part of the opcode. but
Intel has made REP XCHG eax,r8d act like REP NOP. I haven't tested this
out on AMD hardware but I consider it a silicon bug on Intel. there is a
test case on this binutils issue. in any case, this is in QEMU history.


- https://sourceware.org/bugzilla/show_bug.cgi?id=32462

- https://www.blackhat.com/docs/us-17/thursday/us-17-Domas-Breaking-The-x86-ISA.pdf
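
To make the four literal decodings listed above concrete, here is a small sketch of a 64-bit-mode decode of the `90+rd` row alone, with an optional F3 (REP) prefix and an optional REX prefix. It is not taken from x86-mini or QEMU: the `decode_xchg_eax` name is hypothetical, it ignores REX.W and all other prefixes, and it shows only the literal reading of the bytes, not what any particular CPU executes.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi",
         "r8d", "r9d", "r10d", "r11d", "r12d", "r13d", "r14d", "r15d"]

def decode_xchg_eax(code):
    """Literal decode of [F3] [REX] 90+rd as XCHG eax,r32 (sketch only)."""
    i = 0
    rep = False
    rex_b = 0
    if code[i] == 0xF3:              # REP/REPE legacy prefix
        rep = True
        i += 1
    if (code[i] & 0xF0) == 0x40:     # REX prefix; bit 0 is REX.B
        rex_b = code[i] & 1
        i += 1
    opcode = code[i]
    if (opcode & 0xF8) != 0x90:
        raise ValueError("not a 90+rd opcode")
    # REX.B extends the 3-bit register field in the opcode to 4 bits.
    reg = REG32[(rex_b << 3) | (opcode & 7)]
    return ("rep " if rep else "") + "xchg eax," + reg
```

With this sketch, the byte sequences `90`, `41 90`, `F3 90` and `F3 41 90` decode to the four right-hand forms in the list above. An alias step would then rename the first and third to NOP and PAUSE.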

I can see how that might be interesting for x86 virtualization where you
have only one target and therefore you can get rid of the capstone
dependency. At the same time, other virtualization targets like arm64 and
RISC-V are going to become more and more important, not less, and not
having to maintain a disassembler ourselves as part of QEMU is also a big
plus...

yes indeed. but in an ideal world the encoders and decoders are matched
pairs. I would like to work on a translator or interpreter that uses the
same codec as the disassembler



Ok, that makes sense. QEMU already has a decoder that is very table-based
though the tables are hand written. I am not wed to it though—as long as
the code generators remain more or less unmodified, I would love to only
keep "this is how the operands are prepared for use in the IR emitters"
and make the details of x86 decoding Someone Else's Problem. So if you can
kill most (certainly not all) of the tables in
target/i386/tcg/decode-new.c.inc that would be interesting.

(I am sure you'd find some underspecified and/or wrong parts of the x86
spec, too :) For example many VEX classes are bollocks, plus some more
examples hinted at at the top of that file).

yes indeed. the metadata in the Intel SDM is littered with mistakes such
as field transpositions, typos and missing data. I would hazard a guess
that maybe ~71% of the metadata is usable in a machine readable manner.
given that LLVM tablegen has its own format, I consider x86-mini the
source of truth for metadata derived from the Intel format. I haven't
fuzz tested against NASM yet, but I found a small number of errors in
LLVM, albeit mostly in instructions that are not used in anger.


Michael.

Paolo

anyway, in fact it is just yet another disassembler at this point, but
the codec emitter works. it doesn't have an arch-neutral TCG-like
API and IR to drive it yet.
