qemu-ppc

Re: 5.0.0-rc3 : Opcode 1f 12 0f 00 (7ce003e4) leaked temporaries


From: BALATON Zoltan
Subject: Re: 5.0.0-rc3 : Opcode 1f 12 0f 00 (7ce003e4) leaked temporaries
Date: Fri, 17 Apr 2020 18:08:38 +0200 (CEST)
User-agent: Alpine 2.22 (BSF 395 2020-01-19)

On Fri, 17 Apr 2020, Peter Maydell wrote:
> On Fri, 17 Apr 2020 at 12:00, BALATON Zoltan <address@hidden> wrote:
>> On Fri, 17 Apr 2020, Peter Maydell wrote:
>>> And yes, debug
>>> is slower (it builds QEMU without optimization enabled
>>> so it's easier to debug QEMU in gdb, and it turns on
>>> various extra sanity checks.)
>>
>> Last time I've looked I've found it effectively disables TB cache (at
>> least with PPC) because one of those checks forces a flush which is the
>> main source of the slowness with --enable-debug. Not sure if this could be
>> avoided, I didn't know about --disable-tcg-debug Philippe suggested so
>> haven't tested that.
>
> It's not supposed to disable TB caching, and in my experience
> it does not (no TB caching at all is incredibly slow). If it's
> doing that on PPC that would be worth investigating.
>
> I do almost all of my work and local testing with --enable-debug,
> so I notice if it's slowed down to the extent that "no TB caching"
> would involve. It is naturally slower than the non-debug config
> both because of some extra checking and also because all the C
> code is being compiled at -O0 rather than -O2.

I've dug up what I wrote when I found this, but that was in an off-list thread and it looks like I never reported it to the list. Here it is again for reference. I haven't redone the profiling to verify it, but I think it's probably still the same:

On Wed, 11 Jul 2018, BALATON Zoltan wrote:
QEMU v3.0.0-rc0 was tagged with all our patches, and the last fb_addr patch is queued for 3.0, so we are on track for 3.0 to be able to boot Amiga-like OSes.

I've done some profiling of booting an installed AmigaOS from an HD image and these are the top suspects:

samples  %        linenr info                 symbol name
1852798  30.4937  cpu.h:450                   cpu_tb_jmp_cache_clear
312412    5.1417  mmu_helper.c:754            mmubooke_check_tlb
277472    4.5667  mmu_helper.c:610            ppcemb_tlb_check
256472    4.2211  mmu_helper.c:823            mmubooke_get_physical_address
95438     1.5707  object.c:622                object_dynamic_cast_assert
95264     1.5679  sm501_template.h:62         draw_line16_32
89675     1.4759  tb-lookup.h:23              tb_lookup__cpu_state
88646     1.4590  object.c:711                object_class_dynamic_cast_assert
82575     1.3590  cpu-exec.c:514              cpu_handle_interrupt
74578     1.2274  cputlb.c:924                victim_tlb_hit
70437     1.1593  tb-lookup.h:23              tb_lookup__cpu_state
67647     1.1133  tcg.c:2680                  check_regs
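
To put the numbers above in proportion, here is a minimal sketch that parses the top four rows and derives two figures from them: an estimate of the total sample count, and the combined share of the three mmu_helper.c functions. The names parse_profile, PROFILE, estimated_total and mmu_share are made up for this illustration, and grouping the three helpers together is my own assumption, not something the profiler reports.

```python
# Rows copied from the flat profile above; only the top four are used here.
PROFILE = """\
1852798  30.4937  cpu.h:450                   cpu_tb_jmp_cache_clear
312412    5.1417  mmu_helper.c:754            mmubooke_check_tlb
277472    4.5667  mmu_helper.c:610            ppcemb_tlb_check
256472    4.2211  mmu_helper.c:823            mmubooke_get_physical_address
"""


def parse_profile(text):
    """Return (symbol, samples, percent) tuples from whitespace-separated rows."""
    rows = []
    for line in text.strip().splitlines():
        samples, percent, _linenr, symbol = line.split()
        rows.append((symbol, int(samples), float(percent)))
    return rows


rows = parse_profile(PROFILE)

# Estimate the total number of samples from the first row:
# total ~= samples / (percent / 100).
_, top_samples, top_percent = rows[0]
estimated_total = top_samples / (top_percent / 100.0)

# Combined share of the three functions that live in mmu_helper.c
# (hypothetical grouping, chosen only for this illustration).
mmu_share = sum(pct for sym, _, pct in rows if sym != "cpu_tb_jmp_cache_clear")

print(f"estimated total samples: {estimated_total:.0f}")
print(f"mmu_helper.c share: {mmu_share:.4f}%")
```

Even taken together, the three MMU helpers account for less than half of the share attributed to cpu_tb_jmp_cache_clear in this profile.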

On Sun, 15 Jul 2018, BALATON Zoltan wrote:
I've realised that the --enable-debug configure option (which I was always using for development) makes things really slow, as it enables some additional checks. So I took the profile again after compiling QEMU without this option, and the previous culprit is now gone. (It looks like the TLB is still flushed a lot, but at least the tb_jmp_cache is not cleared without --enable-debug, which makes things considerably faster.) Here's the new profile of booting an installed system from an HD image:

samples  %        linenr info                 symbol name
-------------------------------------------------------------------------------
  270078   19.8513  cputlb.c:114                tlb_flush_nocheck
270078   19.8513  cputlb.c:114                tlb_flush_nocheck
  270078   19.8513  cputlb.c:114                tlb_flush_nocheck
  270078   19.8513  cputlb.c:114                tlb_flush_nocheck [self]
-------------------------------------------------------------------------------
  126866    9.3249  mmu_helper.c:1353           get_physical_address
126866    9.3249  mmu_helper.c:1353           get_physical_address
  126866    9.3249  mmu_helper.c:1353           get_physical_address
  126866    9.3249  mmu_helper.c:1353           get_physical_address [self]
-------------------------------------------------------------------------------
  108213    7.9539  mmu_helper.c:614            ppcemb_tlb_check.isra.7
108213    7.9539  mmu_helper.c:614            ppcemb_tlb_check.isra.7
  108213    7.9539  mmu_helper.c:614            ppcemb_tlb_check.isra.7
  108213    7.9539  mmu_helper.c:614            ppcemb_tlb_check.isra.7 [self]
-------------------------------------------------------------------------------
  101977    7.4955  cpu-exec.c:656              cpu_exec
101977    7.4955  cpu-exec.c:656              cpu_exec
  101977    7.4955  cpu-exec.c:656              cpu_exec
  101977    7.4955  cpu-exec.c:656              cpu_exec [self]
-------------------------------------------------------------------------------
  69533     5.1108  exec-all.h:410              helper_lookup_tb_ptr
69533     5.1108  exec-all.h:410              helper_lookup_tb_ptr
  69533     5.1108  exec-all.h:410              helper_lookup_tb_ptr
  69533     5.1108  exec-all.h:410              helper_lookup_tb_ptr [self]
  19        0.0014  optimize.c:592              tcg_optimize
  3        2.2e-04  optimize.c:179              tcg_opt_gen_movi.isra.2
  2        1.5e-04  tcg.h:732                   init_ts_info
  1        7.4e-05  tcg-target.inc.c:526        tcg_out_opc.isra.10
  1        7.4e-05  tcg-target.inc.c:1153       tgen_arithi
  1        7.4e-05  tcg-target.inc.c:744        tcg_out_modrm_sib_offset
  1        7.4e-05  tcg-target.inc.c:913        tcg_out_movi
  1        7.4e-05  optimize.c:149              tcg_opt_gen_mov
-------------------------------------------------------------------------------
  55120     4.0514  object.c:711                object_class_dynamic_cast_assert
55120     4.0514  object.c:711                object_class_dynamic_cast_assert
  55120     4.0514  object.c:711                object_class_dynamic_cast_assert
  55120     4.0514  object.c:711                object_class_dynamic_cast_assert [self]
-------------------------------------------------------------------------------
  54952     4.0391  cputlb.c:606                tlb_set_page_with_attrs
54952     4.0391  cputlb.c:606                tlb_set_page_with_attrs
  54952     4.0391  cputlb.c:606                tlb_set_page_with_attrs
  54952     4.0391  cputlb.c:606                tlb_set_page_with_attrs [self]
-------------------------------------------------------------------------------
  49256     3.6204  cputlb.c:924                victim_tlb_hit
49256     3.6204  cputlb.c:924                victim_tlb_hit
  49256     3.6204  cputlb.c:924                victim_tlb_hit
  49256     3.6204  cputlb.c:924                victim_tlb_hit [self]
-------------------------------------------------------------------------------
  4        2.9e-04  core.c:404                  usb_handle_packet
  47881     3.5193  object.c:622                object_dynamic_cast_assert
47881     3.5193  object.c:622                object_dynamic_cast_assert
  47881     3.5193  object.c:622                object_dynamic_cast_assert
  47881     3.5193  object.c:622                object_dynamic_cast_assert [self]
-------------------------------------------------------------------------------
  29562     2.1729  qht.c:487                   qht_lookup_custom
29562     2.1729  qht.c:487                   qht_lookup_custom
  29562     2.1729  qht.c:487                   qht_lookup_custom
  29562     2.1729  qht.c:487                   qht_lookup_custom [self]
-------------------------------------------------------------------------------
  26002     1.9112  cpus.c:347                  cpu_get_clock
26002     1.9112  cpus.c:347                  cpu_get_clock
  26002     1.9112  cpus.c:347                  cpu_get_clock
  26002     1.9112  cpus.c:347                  cpu_get_clock [self]
-------------------------------------------------------------------------------

On Thu, 19 Jul 2018, BALATON Zoltan wrote:
I don't remember now whether I took this profile with or without --enable-debug, but I've found that with --enable-debug this cpu_tb_jmp_cache_clear happens a lot, due to some check function that is called in debug builds, and that it goes away (at least on other OSes) when debug is not enabled. So this may not be that important, but it should probably be verified again with AmigaOS. Not blowing the tb_cache all the time does make it faster, but still not fast enough to reach hardware speed; I still see a lot of tlb_flush, even without debug enabled.

So it probably does not disable the TB cache, but it does interfere with an important optimisation, and that makes ppc emulation run very slowly. I'm not sure about other targets.
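
To illustrate why clearing a small lookup cache very often can hurt this much even though the translated blocks themselves are kept, here is a toy model. It is not QEMU's code: ToyJumpCache, run, the 12-bit cache size, and the eight-instruction loop are all assumptions made up for this sketch. It only shows a direct-mapped cache in front of a slower table, and counts how often the slow path is taken with and without frequent clears.

```python
class ToyJumpCache:
    """Toy direct-mapped cache in front of a slower table lookup."""

    def __init__(self, bits=12):
        self.size = 1 << bits
        self.slots = [None] * self.size
        self.table = {}          # stands in for the slower global lookup
        self.slow_lookups = 0    # how many times the fast path missed

    def lookup(self, pc):
        idx = (pc >> 2) & (self.size - 1)
        entry = self.slots[idx]
        if entry is not None and entry[0] == pc:
            return entry[1]      # fast path: cache hit
        # Slow path: consult the table, then refill the cache slot.
        self.slow_lookups += 1
        block = self.table.setdefault(pc, f"block@{pc:#x}")
        self.slots[idx] = (pc, block)
        return block

    def clear(self):
        """Drop every cached entry; the slower table keeps its contents."""
        self.slots = [None] * self.size


def run(trace, clear_every=None):
    """Replay a trace of PCs and return the number of slow-path lookups."""
    cache = ToyJumpCache()
    for i, pc in enumerate(trace, 1):
        cache.lookup(pc)
        if clear_every and i % clear_every == 0:
            cache.clear()
    return cache.slow_lookups


# A hypothetical guest loop: the same 8 addresses executed 1000 times.
trace = [0x1000 + 4 * i for i in range(8)] * 1000

misses_without_clears = run(trace)
misses_with_clears = run(trace, clear_every=8)
print(misses_without_clears, misses_with_clears)
```

In this model nothing is ever retranslated, which matches the observation that the TB cache is not really disabled; the cost comes from taking the slow lookup path on every block once the fast cache is wiped after each pass through the loop.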

Regards,
BALATON Zoltan


