[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Question on implementation detail of `temp_sync`
From: |
lrwei |
Subject: |
Re: Question on implementation detail of `temp_sync` |
Date: |
Thu, 6 Aug 2020 23:53:18 +0800 |
> lrwei <[hidden email]> writes:
>
> > Sorry for the unintentional sending of an uncompleted message.
>
> Questions about the internals of the TCG are very much in the remit of
> qemu-devel so are likely to get missed on qemu-discuss which is more
> aimed at user questions.
Thanks for your reply! Next time I will send my questions there.
>
> <re-pasted to fix html noise>
>
> > I understands that the current code works, but gets confused on why
> > `ts` needs to be loaded in to a register when `free_or_dead` is not
> > set.
>
> It isn't, the break leaves the switch statement once it stores the
> constant to memory.
It seems to me that if `free_or_dead` was zero, further `tcg_out_sti`
and `break` in the if-block won't be executed at all. And that is
exactly what confused me.
>
> > For example in the following scenario:
> > movi_i32 r0, 0x1
> > add_i32 r1, r1, r0
> > ...
> > (where r0 is not used any more, and both r0 and r1 are globals)
>
> > If I am not mistaken, the code gen procedure of the first IR will call
> > `temp_sync` with `free_or_dead` not set, which load the constant in to
> > a register and store it back to memory. At this time, `r0` will be
> > `TEMP_VAL_REG` instead of `TEMP_VAL_CONST`, so the following IR can't
> > embed this constant operand in the assembly instruction it produces.
> > Also, this results in a seemingly useless register allocation (, why
> > don't the further use of r0 use the constant directly?)
>
> Is this what you are actually seeing generated? If you run with -d
> in_asm,op,op_opt,out_asm it should be clear what actually happened.
The above example is originally made up to illustrate my question, but
I modified the frontend to directly emit the above IR (meaninglessly),
and here is the result:
---- Pasting output of Qemu ----
QEMU 5.0.93 monitor - type 'help' for more information
(qemu) OP:
ld_i32 tmp0,env,$0xfffffffffffffff0
movi_i32 tmp1,$0x0
brcond_i32 tmp0,tmp1,lt,$L0
---- 00000000
movi_i32 r0,$0x1
add_i32 r1,r1,r0
movi_i32 pc,$0x4
call hlt,$0x8,$0,env
exit_tb $0x0
set_label $L0
exit_tb $0x7f8510000043
OP after optimization and liveness analysis:
ld_i32 tmp0,env,$0xfffffffffffffff0 dead: 1 pref=0xffff
movi_i32 tmp1,$0x0 pref=0xffff
brcond_i32 tmp0,tmp1,lt,$L0 dead: 0 1
---- 00000000
movi_i32 r0,$0x1 sync: 0 pref=0xffff
add_i32 r1,r1,r0 sync: 0 dead: 0 1 2 pref=0xffff
movi_i32 pc,$0x4 sync: 0 dead: 0 pref=0xffff
call hlt,$0x8,$0,env dead: 0
set_label $L0
exit_tb $0x7f8510000043
OUT: [size=72]
0x7f8510000100: mov -0x10(%rbp),%ebx [tb header & initial
instruction]
0x7f8510000103: test %ebx,%ebx
0x7f8510000105: jl 0x7f8510000130
0x7f851000010b: mov $0x1,%ebx
0x7f8510000110: mov %ebx,0x0(%rbp)
0x7f8510000113: mov 0x4(%rbp),%r12d
0x7f8510000117: add %r12d,%ebx
0x7f851000011a: mov %ebx,0x4(%rbp)
0x7f851000011d: movl $0x4,0x80(%rbp)
0x7f8510000127: mov %rbp,%rdi
0x7f851000012a: callq *0x10(%rip) # 0x7f8510000140
0x7f8510000130: lea -0xf4(%rip),%rax # 0x7f8510000043
0x7f8510000137: jmpq 0x7f8510000018
data: [size=8]
0x7f8510000140: .quad 0x0000556e15d000d0
---- End of Qemu output ----
The constant is first loaded into `ebx` and then synced back to
its memory location. And further addition is of register format.
And I tried it again after remove the `free_or_dead` check in
`temp_sync`, it gives:
---- Pasting output of Qemu ----
QEMU 5.0.93 monitor - type 'help' for more information
(qemu) OP:
ld_i32 tmp0,env,$0xfffffffffffffff0
movi_i32 tmp1,$0x0
brcond_i32 tmp0,tmp1,lt,$L0
---- 00000000
movi_i32 r0,$0x1
add_i32 r1,r1,r0
movi_i32 pc,$0x4
call hlt,$0x8,$0,env
exit_tb $0x0
set_label $L0
exit_tb $0x7f9db0000043
OP after optimization and liveness analysis:
ld_i32 tmp0,env,$0xfffffffffffffff0 dead: 1 pref=0xffff
movi_i32 tmp1,$0x0 pref=0xffff
brcond_i32 tmp0,tmp1,lt,$L0 dead: 0 1
---- 00000000
movi_i32 r0,$0x1 sync: 0 pref=0xffff
add_i32 r1,r1,r0 sync: 0 dead: 0 1 2 pref=0xffff
movi_i32 pc,$0x4 sync: 0 dead: 0 pref=0xffff
call hlt,$0x8,$0,env dead: 0
set_label $L0
exit_tb $0x7f9db0000043
OUT: [size=72]
0x7f9db0000100: mov -0x10(%rbp),%ebx [tb header & initial
instruction]
0x7f9db0000103: test %ebx,%ebx
0x7f9db0000105: jl 0x7f9db000012d
0x7f9db000010b: movl $0x1,0x0(%rbp)
0x7f9db0000112: mov 0x4(%rbp),%ebx
0x7f9db0000115: inc %ebx
0x7f9db0000117: mov %ebx,0x4(%rbp)
0x7f9db000011a: movl $0x4,0x80(%rbp)
0x7f9db0000124: mov %rbp,%rdi
0x7f9db0000127: callq *0x13(%rip) # 0x7f9db0000140
0x7f9db000012d: lea -0xf1(%rip),%rax # 0x7f9db0000043
0x7f9db0000134: jmpq 0x7f9db0000018
data: [size=8]
0x7f9db0000140: .quad 0x0000564aed2120d0
---- End of Qemu output ----
This syncs the constant directly back to memory and use `inc`
instead of register add for `add_i32`, which is kind of the intuitive
thing to do. But I'm not sure whether it will break something else.
Given the above result, I'm curious about why the register is load here.
Is it because that repeatedly embedding the same large constant in
instructions inflates the code? (or it could be other reasons?) Looking
forward to your further reply!
Thanks,
lrwei
>
> > So I wonder whether there is any reason for this loading a constant
> > into register, I'll be very appreciated if someone can point out the
> > reason for me.
>
> <snip>
>
> >
> >
> > Thanks in advance.
> > lrwei
> > ------------------Original------------------
> > From: "lrwei"<[hidden email]>
> > Date: Tue, Aug 4, 2020 12:06 PM
> > To: "qemu-discuss"<[hidden email]>
> > Subject: Question on implementation detail of `temp_sync`
> >
> <re-pasted fixing html noise>
>
> > Hello to the list,
> > Recently I have been studying the code of TCG, and get confused by
> > the following detail in function `temp_sync` in tcg/tcg.c:
>
> > case TEMP_VAL_CONST:
> > /* If we're going to free the temp immediately, then we won't
> > require it later in a register, so attempt to store the
> > constant to memory directly. */
> > if (free_or_dead
> > && tcg_out_sti(s, ts->type, ts->val,
> > ts->mem_base->reg, ts->mem_offset)) {
> > break;
> > }
> > temp_load(s, ts, tcg_target_available_regs[ts->type],
> > allocated_regs, preferred_regs);
> > /* fallthrough */
>
> > Would it be better to remove the `free_or_dead` in the if statement,
> > i.e. turn the function to be:
>
> > case TEMP_VAL_CONST:
> > if (tcg_out_sti(s, ts->type, ts->val,
> > ts->mem_base->reg, ts->mem_offset)) {
> > break;
> > }
> > temp_load(s, ts, tcg_target_available_regs[ts->type],
> > allocated_regs, preferred_regs);
> > /* fallthrough */
>
>
> --
> Alex Bennée