Re: 5.0.0-rc3 : Opcode 1f 12 0f 00 (7ce003e4) leaked temporaries
From: BALATON Zoltan
Subject: Re: 5.0.0-rc3 : Opcode 1f 12 0f 00 (7ce003e4) leaked temporaries
Date: Fri, 17 Apr 2020 18:08:38 +0200 (CEST)
User-agent: Alpine 2.22 (BSF 395 2020-01-19)
On Fri, 17 Apr 2020, Peter Maydell wrote:
> On Fri, 17 Apr 2020 at 12:00, BALATON Zoltan <address@hidden> wrote:
>> On Fri, 17 Apr 2020, Peter Maydell wrote:
>>> And yes, debug
>>> is slower (it builds QEMU without optimization enabled
>>> so it's easier to debug QEMU in gdb, and it turns on
>>> various extra sanity checks).
>>
>> Last time I looked, I found that it effectively disables the TB cache (at
>> least with PPC) because one of those checks forces a flush, which is the
>> main source of the slowness with --enable-debug. I'm not sure if this could
>> be avoided. I didn't know about the --disable-tcg-debug option Philippe
>> suggested, so I haven't tested that.
>
> It's not supposed to disable TB caching, and in my experience
> it does not (no TB caching at all is incredibly slow). If it's
> doing that on PPC that would be worth investigating.
> I do almost all of my work and local testing with --enable-debug,
> so I notice if it's slowed down to the extent that "no TB caching"
> would involve. It is naturally slower than the non-debug config
> both because of some extra checking and also because all the C
> code is being compiled at -O0 rather than -O2.
I've dug up what I wrote when I found this, but that was in an off-list
thread and it looks like I never reported it to the list. Here it is again
for reference. I haven't redone the profiling to verify it, but I think
it's probably still the same:
On Wed, 11 Jul 2018, BALATON Zoltan wrote:
> QEMU v3.0.0-rc0 was tagged with all our patches, and the last fb_addr
> patch is queued for 3.0, so we are on track for 3.0 to be able to
> boot Amiga-like OSes.
>
> I've done some profiling of booting an installed AmigaOS from an HD image
> and these are the top suspects:
>
> samples        %  linenr info          symbol name
> 1852798  30.4937  cpu.h:450            cpu_tb_jmp_cache_clear
>  312412   5.1417  mmu_helper.c:754     mmubooke_check_tlb
>  277472   4.5667  mmu_helper.c:610     ppcemb_tlb_check
>  256472   4.2211  mmu_helper.c:823     mmubooke_get_physical_address
>   95438   1.5707  object.c:622         object_dynamic_cast_assert
>   95264   1.5679  sm501_template.h:62  draw_line16_32
>   89675   1.4759  tb-lookup.h:23       tb_lookup__cpu_state
>   88646   1.4590  object.c:711         object_class_dynamic_cast_assert
>   82575   1.3590  cpu-exec.c:514       cpu_handle_interrupt
>   74578   1.2274  cputlb.c:924         victim_tlb_hit
>   70437   1.1593  tb-lookup.h:23       tb_lookup__cpu_state
>   67647   1.1133  tcg.c:2680           check_regs
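To put the 30.4937% share of cpu_tb_jmp_cache_clear in perspective, here is a minimal C sketch of an Amdahl's-law style upper bound. It assumes that the share of profiler samples approximates the share of run time, and that eliminating the clears entirely would leave everything else unchanged. The helper names max_speedup and jmp_cache_clear_bound are hypothetical. Under those assumptions, removing the clears alone could make that particular run at most about 1.44x faster.

```c
#include <assert.h>

/* Amdahl-style bound: if a fraction `share` of run time is spent in work
 * that is removed completely and nothing else changes, the best possible
 * speedup is 1 / (1 - share). Assumption: sample share ~ time share. */
static double max_speedup(double share)
{
    assert(share >= 0.0 && share < 1.0);
    return 1.0 / (1.0 - share);
}

/* 30.4937 % of samples were attributed to cpu_tb_jmp_cache_clear in the
 * profile above. */
static double jmp_cache_clear_bound(void)
{
    return max_speedup(30.4937 / 100.0);
}
```

The bound covers only the removal of the clears. As Peter notes above, the non-debug build is also compiled at -O2 instead of -O0, so the overall difference between the two configurations can be larger than this figure.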
On Sun, 15 Jul 2018, BALATON Zoltan wrote:
> I've realised that the --enable-debug configure option (which I was
> always using for development) makes things really slow, as it enables
> some additional checks. So I took the profile again after compiling QEMU
> without this option, and the previous culprit is now gone. (It looks like
> the TLB is still flushed a lot, but at least the tb_jmp_cache is not
> cleared without --enable-debug, which makes things considerably faster.)
>
> Here's the new profile of booting an installed system from an HD image:
> samples        %  linenr info            symbol name
> -------------------------------------------------------------------------------
>  270078  19.8513  cputlb.c:114           tlb_flush_nocheck
>  270078  19.8513  cputlb.c:114           tlb_flush_nocheck
>  270078  19.8513  cputlb.c:114           tlb_flush_nocheck
>  270078  19.8513  cputlb.c:114           tlb_flush_nocheck [self]
> -------------------------------------------------------------------------------
>  126866   9.3249  mmu_helper.c:1353      get_physical_address
>  126866   9.3249  mmu_helper.c:1353      get_physical_address
>  126866   9.3249  mmu_helper.c:1353      get_physical_address
>  126866   9.3249  mmu_helper.c:1353      get_physical_address [self]
> -------------------------------------------------------------------------------
>  108213   7.9539  mmu_helper.c:614       ppcemb_tlb_check.isra.7
>  108213   7.9539  mmu_helper.c:614       ppcemb_tlb_check.isra.7
>  108213   7.9539  mmu_helper.c:614       ppcemb_tlb_check.isra.7
>  108213   7.9539  mmu_helper.c:614       ppcemb_tlb_check.isra.7 [self]
> -------------------------------------------------------------------------------
>  101977   7.4955  cpu-exec.c:656         cpu_exec
>  101977   7.4955  cpu-exec.c:656         cpu_exec
>  101977   7.4955  cpu-exec.c:656         cpu_exec
>  101977   7.4955  cpu-exec.c:656         cpu_exec [self]
> -------------------------------------------------------------------------------
>   69533   5.1108  exec-all.h:410         helper_lookup_tb_ptr
>   69533   5.1108  exec-all.h:410         helper_lookup_tb_ptr
>   69533   5.1108  exec-all.h:410         helper_lookup_tb_ptr
>   69533   5.1108  exec-all.h:410         helper_lookup_tb_ptr [self]
>      19   0.0014  optimize.c:592         tcg_optimize
>       3  2.2e-04  optimize.c:179         tcg_opt_gen_movi.isra.2
>       2  1.5e-04  tcg.h:732              init_ts_info
>       1  7.4e-05  tcg-target.inc.c:526   tcg_out_opc.isra.10
>       1  7.4e-05  tcg-target.inc.c:1153  tgen_arithi
>       1  7.4e-05  tcg-target.inc.c:744   tcg_out_modrm_sib_offset
>       1  7.4e-05  tcg-target.inc.c:913   tcg_out_movi
>       1  7.4e-05  optimize.c:149         tcg_opt_gen_mov
> -------------------------------------------------------------------------------
>   55120   4.0514  object.c:711           object_class_dynamic_cast_assert
>   55120   4.0514  object.c:711           object_class_dynamic_cast_assert
>   55120   4.0514  object.c:711           object_class_dynamic_cast_assert
>   55120   4.0514  object.c:711           object_class_dynamic_cast_assert [self]
> -------------------------------------------------------------------------------
>   54952   4.0391  cputlb.c:606           tlb_set_page_with_attrs
>   54952   4.0391  cputlb.c:606           tlb_set_page_with_attrs
>   54952   4.0391  cputlb.c:606           tlb_set_page_with_attrs
>   54952   4.0391  cputlb.c:606           tlb_set_page_with_attrs [self]
> -------------------------------------------------------------------------------
>   49256   3.6204  cputlb.c:924           victim_tlb_hit
>   49256   3.6204  cputlb.c:924           victim_tlb_hit
>   49256   3.6204  cputlb.c:924           victim_tlb_hit
>   49256   3.6204  cputlb.c:924           victim_tlb_hit [self]
> -------------------------------------------------------------------------------
>       4  2.9e-04  core.c:404             usb_handle_packet
>   47881   3.5193  object.c:622           object_dynamic_cast_assert
>   47881   3.5193  object.c:622           object_dynamic_cast_assert
>   47881   3.5193  object.c:622           object_dynamic_cast_assert
>   47881   3.5193  object.c:622           object_dynamic_cast_assert [self]
> -------------------------------------------------------------------------------
>   29562   2.1729  qht.c:487              qht_lookup_custom
>   29562   2.1729  qht.c:487              qht_lookup_custom
>   29562   2.1729  qht.c:487              qht_lookup_custom
>   29562   2.1729  qht.c:487              qht_lookup_custom [self]
> -------------------------------------------------------------------------------
>   26002   1.9112  cpus.c:347             cpu_get_clock
>   26002   1.9112  cpus.c:347             cpu_get_clock
>   26002   1.9112  cpus.c:347             cpu_get_clock
>   26002   1.9112  cpus.c:347             cpu_get_clock [self]
> -------------------------------------------------------------------------------
On Thu, 19 Jul 2018, BALATON Zoltan wrote:
> I don't remember now if I took this profile with or without
> --enable-debug, but I've found that with --enable-debug this
> cpu_tb_jmp_cache_clear happens a lot due to some check function called
> with debug, but goes away (at least on other OSes) when debug is not
> enabled. So this may not be that important, but it should probably be
> verified again with AmigaOS. Not blowing away the TB cache all the time
> does make it faster, but still not fast enough to reach hardware speed; I
> still see a lot of tlb_flush calls, even without debug enabled.
So it probably does not disable the TB cache, but it does interfere with an
important optimisation, which makes PPC emulation run very slowly. I'm not
sure about other targets.
Regards,
BALATON Zoltan
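The "important optimisation" in the last paragraph is presumably the per-CPU tb_jmp_cache, the structure that cpu_tb_jmp_cache_clear wipes in the first profile. Below is a minimal C model of why clearing it too often hurts even though translated blocks are kept. It makes several assumptions: the cache is modelled as a direct-mapped array indexed by a hash of the guest PC, and a slow fallback stands in for the global TB hash table (qht in the second profile). All names (jmp_cache_hash, lookup_tb, slow_lookup, jmp_cache_clear, simulate), the 12-bit size and the hash function are hypothetical illustrations, not QEMU's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: sizes and names are illustrative assumptions. */
#define JMP_CACHE_BITS 12
#define JMP_CACHE_SIZE (1u << JMP_CACHE_BITS)
#define POOL_SIZE 256

typedef struct TB {
    uint64_t pc;                    /* guest PC the block was translated from */
} TB;

typedef struct CPU {
    TB *jmp_cache[JMP_CACHE_SIZE];  /* fast per-CPU lookup, direct-mapped */
    unsigned long fast_hits;        /* lookups answered by the jump cache */
    unsigned long slow_lookups;     /* lookups that went to the global table */
} CPU;

/* Already-translated blocks; stands in for the shared TB hash table. */
static TB pool[POOL_SIZE];
static size_t pool_used;

static void cpu_init(CPU *cpu)
{
    memset(cpu, 0, sizeof(*cpu));
}

static unsigned jmp_cache_hash(uint64_t pc)
{
    return (unsigned)(((pc >> 2) ^ (pc >> (2 + JMP_CACHE_BITS)))
                      & (JMP_CACHE_SIZE - 1));
}

/* Slow path: search the global table, "translating" a new block on a miss. */
static TB *slow_lookup(CPU *cpu, uint64_t pc)
{
    cpu->slow_lookups++;
    for (size_t i = 0; i < pool_used; i++) {
        if (pool[i].pc == pc) {
            return &pool[i];
        }
    }
    assert(pool_used < POOL_SIZE);
    pool[pool_used].pc = pc;
    return &pool[pool_used++];
}

static TB *lookup_tb(CPU *cpu, uint64_t pc)
{
    unsigned h = jmp_cache_hash(pc);
    TB *tb = cpu->jmp_cache[h];

    if (tb != NULL && tb->pc == pc) {
        cpu->fast_hits++;
        return tb;
    }
    tb = slow_lookup(cpu, pc);
    cpu->jmp_cache[h] = tb;         /* refill the fast path */
    return tb;
}

/* Wipe only the jump cache; the translated blocks in `pool` survive. */
static void jmp_cache_clear(CPU *cpu)
{
    memset(cpu->jmp_cache, 0, sizeof(cpu->jmp_cache));
}

/* Run a loop of n blocks `rounds` times, optionally clearing the jump
 * cache at the start of every round. Returns the number of slow lookups. */
static unsigned long simulate(CPU *cpu, unsigned n, unsigned rounds,
                              int clear_each_round)
{
    for (unsigned r = 0; r < rounds; r++) {
        if (clear_each_round) {
            jmp_cache_clear(cpu);
        }
        for (unsigned i = 0; i < n; i++) {
            lookup_tb(cpu, 0x1000 + 4u * (uint64_t)i);
        }
    }
    return cpu->slow_lookups;
}
```

In this model, a loop of 100 blocks run 10 times does 100 slow lookups when the cache is left alone and 1000 when it is wiped every round. The translated code is still there, so this is not "no TB caching", but every lookup pays the slow-path cost. That matches the observation that debug does not disable the TB cache yet still slows PPC emulation badly.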