qemu-devel

notdirty_write thrashing in simple for loop


From: Mark Watson
Subject: notdirty_write thrashing in simple for loop
Date: Tue, 18 May 2021 11:54:55 +0200

Hi,

I'm trying to implement my own machine for Amiga emulation using a software CPU and FPGA hardware. The machine consists of a large malloc'ed RAM block plus some FPGA hardware mmapped elsewhere into the memory space.

I'm using QEMU in system mode to emulate a 68040 on an ARM Cortex-A9 host.

It is working, though I'm investigating a strange performance issue.

I'm looking for advice from the specialist(s) on accel/tcg/cputlb.c about where to look next in debugging this, please.
  
To investigate the performance issue I tried to break it down to the simplest possible case. I can reproduce it with a simple for loop (compiled without optimisation).
    for (int i = 0; i != 0xffffff; ++i)
    {
        if ((i & 0xffff) == 0)
        {
        }
    }
Running it in user mode on the same host takes ~0.6 seconds. In the built-in 'virt' m68k machine running Linux it takes 1.3 seconds.
However, in my machine under AmigaOS it typically takes five and a half minutes! Occasionally it seems to run at the correct speed of <2 seconds, though I have yet to identify why. These are the logs of the captured code before it goes into the main chain loop.
I have verified that this slowdown is not due to slow access to the FPGA memory area, i.e. there are no accesses to that memory region while the loop runs.
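
A self-contained version of the test loop with timing might look like the sketch below. The function names, the `volatile` counter, the hit counter, and the use of the standard `clock()` call are additions for illustration and are not taken from the original test program. The `volatile` qualifier is only there so that each increment remains a real store to the stack even if optimisation is enabled.

```c
#include <stdint.h>
#include <time.h>

/* Same loop shape as in the message. The counter is volatile (an
 * assumption added here) so every increment is a store to the stack. */
static uint32_t run_loop(void)
{
    volatile uint32_t i;
    uint32_t hits = 0;

    for (i = 0; i != 0xffffff; ++i) {
        if ((i & 0xffff) == 0) {
            ++hits; /* hypothetical: count how often the branch is taken */
        }
    }
    return hits;
}

/* Runs the loop once and returns the processor time used, in seconds,
 * as reported by the standard C clock() function. */
static double time_loop_seconds(uint32_t *hits_out)
{
    clock_t start = clock();
    uint32_t hits = run_loop();
    clock_t end = clock();

    if (hits_out != NULL) {
        *hits_out = hits;
    }
    return (double)(end - start) / (double)CLOCKS_PER_SEC;
}
```

Calling `time_loop_seconds()` from `main` and printing the result in each environment (user mode, the Linux guest, the AmigaOS guest) should give figures comparable to the ones quoted above, provided `clock()` is available and reliable in the guest C library.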

I took a look in gdb while running this loop to see what is going on. Initially I was surprised that I didn't find execution inside the code shown under 'OUT:', but I guess it makes sense that it has to call into the framework for memory access. I noticed that a lot of calls into glib are made, with g_tree_lookup called particularly often. This is caused by notdirty_write being called thousands of times, each time going through page_collection_lock and tb_invalidate_phys_page_fast. I presume this happens every time "i" is incremented on the stack, which clearly has a huge overhead.

Even being able to get a proper stack trace from gdb would be very helpful in understanding this. I tried configuring QEMU with '--enable-debug' but still do not get a proper stack when I attach to it. I'm not sure whether this is because it is running dynamically compiled code before calling into this.
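
One thing that could be checked here is whether the stack slot holding "i" lies in the same guest page as translated code, since QEMU tracks pages containing translated code in order to detect self-modifying code. That this is the cause in this case is an assumption, not something established in this message. The helper below is a hypothetical sketch: the names and the 4096-byte page size are illustrative, and the real granularity is whatever the target's page size is in the QEMU build.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed guest page size; must be a power of two. */
#define ASSUMED_PAGE_SIZE ((uintptr_t)4096)

/* Rounds an address down to the start of its page. */
static uintptr_t page_base(uintptr_t addr, uintptr_t page_size)
{
    return addr & ~(page_size - 1);
}

/* Returns true if both guest addresses fall within the same page. */
static bool same_page(uintptr_t a, uintptr_t b, uintptr_t page_size)
{
    return page_base(a, page_size) == page_base(b, page_size);
}
```

Feeding it the guest address of the loop code and the guest stack pointer, for example `same_page(code_addr, stack_addr, ASSUMED_PAGE_SIZE)`, would show whether the two overlap at page granularity.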

Thanks,
Mark
