Re: Measuring Enigma's performance: A paradox?
From: Andreas Lochmann
Subject: Re: Measuring Enigma's performance: A paradox?
Date: Mon, 12 Apr 2021 02:15:25 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.7.1
Hi,
On 11.04.21 at 15:32, shadowphrogg32642342@gmail.com wrote:
> With the statusbar closed, there's more screen real estate to update, no?
No, the total number of pixels decreases. With "deactivated" I meant:
Nothing is drawn in that region. It would still show whatever was shown
in this area before the level started.
> On Sun, Apr 11, 2021 at 11:24 AM Daniel Heck <mail@dheck.net> wrote:
> Hm... it's hard to know what's going on without more information.
> Have you tried using a profiler (e.g. "gprof" or "oprofile" on
> Linux) to figure out _where_ the time is being spent? In general, I
> find CPU time to be a relatively poor proxy for performance, not
> just because it is influenced by things like CPU scaling, but also
> because it usually doesn't account for time spent waiting (I/O,
> memory, external devices) and time spent in other parts of the
> system (the operating system, hardware drivers, other processes).
Yes. In December I called valgrind/callgrind/KCacheGrind on the
1.3-alpha-version. It showed that 70 % of the costs were due to
SDL_UpperBlit, called via ecl::Surface::blit.
For the current performance tests, however, I needed something faster,
something that Enigma could measure by itself. So I implemented the
--measureperformance option, which is based on CPU time (without system
calls). This way I could analyse 125 runs of a self-solving level this
weekend, testing several possible pixel formats for SDL2. I found
ARGB8888 to be superior to all other pixel formats (see latest commit to
master), but I needed a fast-n-dirty method to pull this off.
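(In case anyone wants to reproduce this kind of measurement outside of
Enigma: on Linux, the user CPU time of a process, i.e. time spent in our
own code as opposed to system time spent in the kernel, can be read via
getrusage(). The following is only a minimal sketch of that idea, not
the actual --measureperformance code.)

    // Minimal sketch: read the process's user CPU time on Linux/POSIX.
    // Illustration only, not the actual --measureperformance implementation.
    #include <sys/resource.h>
    #include <cstdio>

    static double user_cpu_seconds() {
        struct rusage ru;
        getrusage(RUSAGE_SELF, &ru);  // statistics for the current process
        // ru_utime = user time only; ru_stime (system time) is ignored here
        return ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6;
    }

    int main() {
        double start = user_cpu_seconds();
        // ... run the self-solving level here ...
        double used = user_cpu_seconds() - start;
        std::printf("used %.2f CPU-seconds\n", used);
        return 0;
    }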
> That being said, Enigma's rendering "engine" is indeed antiquated
> and not a good fit for the way modern computers update the screen.
> Rendering in software and uploading the resulting image to the GPU
> is simply not efficient any more. Ideally, we would use the
> SDL_Render API to let the GPU do all of the drawing so that we have
> to transfer as little image data between the CPU and the GPU as
> possible. But this would require a significant rewrite of the
> display code...
I tend to disagree here. Yes, Enigma's performance is abysmal for what
it tries to accomplish, but on a relatively modern computer, even with
integrated graphics, it's still fast enough to look smooth. Let's say:
"Relative performance" is bad, but "absolute performance" is still okay-ish.
On older hardware, however, this is different: Enigma might run slowly,
and this is what I want to improve, namely how Enigma is experienced on
old hardware. And that old hardware might not be able to use hardware
acceleration.
Well, yes, I'm performing my tests on modern hardware, which is not
quite in line with what I'm trying to accomplish, I know ;-)
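(For reference, the SDL_Render path Daniel describes would, in its very
simplest form, look roughly like the sketch below: keep rendering the
frame in software, but upload it once per frame into a streaming texture
and let the GPU present and scale it. This is only an illustration of
the general approach, not a proposal for the actual display code; window
size, pixel format and error handling are made up for the example, and a
full port would presumably render individual tiles as textures instead
of uploading whole frames.)

    // Rough sketch of a minimal SDL_Render setup with a streaming texture.
    // Purely illustrative; sizes, format and error handling are simplified.
    #include <SDL.h>

    int main(int, char **) {
        SDL_Init(SDL_INIT_VIDEO);
        SDL_Window *win = SDL_CreateWindow("Enigma", SDL_WINDOWPOS_CENTERED,
                                           SDL_WINDOWPOS_CENTERED, 640, 480, 0);
        SDL_Renderer *ren = SDL_CreateRenderer(win, -1, SDL_RENDERER_ACCELERATED);
        SDL_Texture *tex = SDL_CreateTexture(ren, SDL_PIXELFORMAT_ARGB8888,
                                             SDL_TEXTUREACCESS_STREAMING, 640, 480);
        SDL_Surface *frame = SDL_CreateRGBSurfaceWithFormat(
            0, 640, 480, 32, SDL_PIXELFORMAT_ARGB8888);

        bool running = true;
        while (running) {
            SDL_Event e;
            while (SDL_PollEvent(&e))
                if (e.type == SDL_QUIT) running = false;

            // ... software-render the frame into 'frame' as before ...

            // Upload the finished frame once and let the GPU present it.
            SDL_UpdateTexture(tex, nullptr, frame->pixels, frame->pitch);
            SDL_RenderClear(ren);
            SDL_RenderCopy(ren, tex, nullptr, nullptr);
            SDL_RenderPresent(ren);
        }

        SDL_FreeSurface(frame);
        SDL_DestroyTexture(tex);
        SDL_DestroyRenderer(ren);
        SDL_DestroyWindow(win);
        SDL_Quit();
        return 0;
    }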
> Now that I think of it, have you experimented with the way screen
> updates are handled in ecl::Screen::flush_updates()? When there are
> more than 200 updated regions on the screen, the function simply
> updates its entire contents, which might be related to your
> observation that drawing more is faster.
Good idea ... I did four more experiments, in otherwise the same setup
as yesterday, and reproduced yesterday's results to be sure. Summary:
WITH statusbar, SOMETIMES update_all: 13.0 s (<- default)
WITH statusbar, ALWAYS update_all: 15.4 s
WITH statusbar, NEVER update_all: 13.3 s
NO statusbar, SOMETIMES update_all: 13.9 s
NO statusbar, ALWAYS update_all: 21.0 s (no typo!)
NO statusbar, NEVER update_all: 14.5 s
So ... first of all, it seems like your choice of 200 updated regions
hits a sweet spot. Also: if Enigma always flushes the entire screen, the
difference between having the statusbar and not having it gets even more
pronounced. What this means ... I have no idea.
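(For anyone reading along without the source at hand, the heuristic we
are talking about is roughly the following, written here with plain SDL2
calls; the real ecl::Screen::flush_updates() differs in detail. The idea
is that each partial update carries some fixed overhead, so beyond a
certain number of rectangles a single full-screen update wins.)

    // Simplified sketch of the "more than 200 dirty regions -> update
    // everything" heuristic, expressed with plain SDL2 calls.
    // Illustration only; not the literal ecl::Screen code.
    #include <SDL.h>
    #include <vector>

    void flush_updates(SDL_Window *window, std::vector<SDL_Rect> &dirty_rects) {
        if (dirty_rects.size() > 200) {
            // Many small regions: one full update is cheaper than
            // issuing hundreds of partial ones.
            SDL_UpdateWindowSurface(window);
        } else if (!dirty_rects.empty()) {
            // Few regions: only push the changed rectangles to the screen.
            SDL_UpdateWindowSurfaceRects(window, dirty_rects.data(),
                                         static_cast<int>(dirty_rects.size()));
        }
        dirty_rects.clear();
    }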
In related news: there actually is some kind of "video cache in the CPU":
https://en.wikipedia.org/wiki/Uncacheable_speculative_write_combining
It has existed since at least November 1998, and is therefore relevant to
old hardware as well.
Cheers
Andreas
> - Daniel
> On 10. Apr 2021, at 18:23, Andreas Lochmann
> <and.lochmann@googlemail.com> wrote:
>
> Hi everyone,
>
> I'm currently performing some experiments to improve Enigma's
> performance in the graphics department. For this, I measure the CPU
> time used to solve certain self-solving levels, particularly one
> with smooth scrolling, because this is our Achilles' heel right now.
>
> I noticed that Enigma uses less CPU time when something else runs
> in the background, like a web video. This is easily explained by the
> CPU frequency stepping up. However, it seems like this effect
> appears even when I activate/deactivate Enigma's own status bar
> (the one counting up the time and displaying the level title) under
> full CPU utilisation. Let me explain:
>
> I first launched several prime generators in the background, so
> my cores were 100% utilised and the CPU frequency was at its maximum.
> With the status bar activated, Enigma uses on average 13.05 CPU-seconds
> for a specific task. When I completely deactivate the status bar, the
> same task takes about 13.87 CPU-seconds -- and this even though drawing
> the status bar itself has to be done within the same 13.05 CPU-seconds.
> (This is not a statistical fluke. For the average I used four runs
> each time, and all four runs WITH status bar needed consistently
> less time than WITHOUT status bar. And I did similar, slightly
> different experiments before, all showing the same paradoxical
> behaviour.)
>
> How can it be that drawing the status bar still leads to less CPU
> time used? Is "video memory cache in CPU" a thing? (Remember that
> Enigma relies on software rendering for graphics.)
>
> Maybe one of you knows this and can help me out?
>
> Because: If this turns out to be real, it might actually be faster
> for Enigma to draw more on each time step. After all, we have
> lots of code trying to reduce the blit count.