qemu-devel

Re: [INFO] Some preliminary performance data


From: Aleksandar Markovic
Subject: Re: [INFO] Some preliminary performance data
Date: Sat, 9 May 2020 14:37:52 +0200

On Sat, May 9, 2020 at 13:37 Laurent Desnogues
<address@hidden> wrote:
>
> On Sat, May 9, 2020 at 12:17 PM Aleksandar Markovic
> <address@hidden> wrote:
> > On Wed, May 6, 2020 at 13:26 Alex Bennée <address@hidden> wrote:
> >
> > > This is very much driven by how much code generation vs running you see.
> > > In most of my personal benchmarks I never really notice code generation
> > > because I give my machines large amounts of RAM so code tends to stay
> > > resident so not need to be re-translated. When the optimiser shows up
> > > it's usually accompanied by high TB flush and invalidate counts in "info
> > > jit" because we are doing more translation than we usually do.
> > >
> >
> > Yes, I think the machine was set up with only 128 MB of RAM.
> >
> > That would actually be an interesting experiment for Ahmed - to
> > measure the impact of the amount of RAM on performance.
> >
> > But it looks like, at least for machines with little RAM, the
> > translation phase will take a significant percentage.
> >
> > I am attaching the call graph for the translation phase for "Hello
> > World" built for mips and emulated by QEMU (tb_gen_code() and its
> > callees).
>

Hi, Laurent,

"Hello world" was taken as an example where code generation is
dominant, to illustrate how the code-generation overhead is
distributed performance-wise (showing the dominance of a single
function).

While "Hello world" by itself is not a significant benchmark, it
conveys useful information: it shows the combined overhead of QEMU
linux-user executable initialization and of the code generation spent
on emulating the loading of the target executable and the printing
of a simple message. This overhead can be roughly subtracted from
the result of a meaningful benchmark.
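The rough subtraction described above can be sketched in a few lines
of Python (the timing numbers are invented placeholders for
illustration, not real measurements):

```python
# Hypothetical wall-clock timings (seconds) for qemu linux-user runs.
# These numbers are illustrative placeholders, not real measurements.
t_hello = 0.05        # "Hello World": QEMU init + loader + trivial codegen
t_benchmark = 12.40   # a meaningful benchmark under the same QEMU build

# The fixed startup and codegen overhead measured by "Hello World"
# can be roughly subtracted to estimate the benchmark's own cost.
t_workload = t_benchmark - t_hello
print(f"estimated workload time: {t_workload:.2f} s")
```

This is of course only a first-order estimate, since a real benchmark
also triggers additional code generation of its own.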

Booting a virtual machine is a legitimate scenario for measuring
performance, and perhaps even for attempting to improve it.

Everything should be measured - code generation, JIT-ed code
execution, and helper execution - in all cases, and checked for
departures from expected behavior.

Let's say that we emulate a benchmark that basically runs some
code in a loop, or an algorithm. One would expect that, as the
number of loop iterations or the size of the algorithm's data
increases, code generation becomes less and less significant,
converging to zero. Still, this should be confirmed by experiment,
not taken for granted.
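The expected convergence can be sketched with a toy cost model
(the constants are invented for illustration; the real behavior is
exactly what the experiment should measure):

```python
# Toy cost model: total emulation time = one-off translation cost
# plus a per-iteration execution cost. Constants are assumptions.
T_CODEGEN = 2.0      # fixed code-generation time (arbitrary units)
T_PER_ITER = 0.001   # JIT-ed execution time per loop iteration

def codegen_fraction(iterations):
    """Share of total time spent in code generation."""
    total = T_CODEGEN + T_PER_ITER * iterations
    return T_CODEGEN / total

# As the iteration count grows, code generation's share shrinks
# toward zero - under this model.
for n in (10, 1_000, 100_000, 10_000_000):
    print(n, round(codegen_fraction(n), 4))
```

The model ignores effects such as TB flushes and re-translation under
memory pressure, which is precisely why measurement must confirm it.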

I think limiting measurements to only, let's say, the execution of
JIT-ed code (if that is what you implied) is a logical mistake.
The right conclusions should be drawn from the complete
picture, shouldn't they?

Yours,
Aleksandar

> Sorry if I'm stating the obvious but both "Hello World" and a
> Linux boot will exhibit similar behaviors with low reuse of
> translated blocks, which means translation will show up in
> profiles as a lot of time is spent in translating blocks that
> will run once.  If you push in that direction you might reach
> the conclusion that a non-JIT simulator is faster than QEMU.
>
> You will have to carefully select the tests you run: you need
> a large spectrum, from Linux boot and "Hello World" up to
> synthetic benchmarks.
>
> Again sorry if that was too trivial :-)
>
> Laurent


