Re: Measuring Enigma's performance: A paradox?

From: Andreas Lochmann
Subject: Re: Measuring Enigma's performance: A paradox?
Date: Mon, 12 Apr 2021 02:15:25 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.7.1


Am 11.04.21 um 15:32 schrieb shadowphrogg32642342@gmail.com:
With the statusbar closed, there's more screen real estate to update, no?

No, the total number of pixels decreases. With "deactivated" I meant: Nothing is drawn in that region. It would still show whatever was shown in this area before the level started.

On Sun, Apr 11, 2021 at 11:24 AM Daniel Heck <mail@dheck.net> wrote:

    Hm... it's hard to know what's going on without more information.
    Have you tried using a profiler (e.g. "gprof" or "oprofile" on
    Linux) to figure out _where_ the time is being spent? In general, I
    find CPU time to be a relatively poor proxy for performance, not
    just because it is influenced by things like CPU scaling, but also
    because it usually doesn't account for time spent waiting (I/O,
    memory, external devices) and time spent in other parts of the
    system (the operating system, hardware drivers, other processes).

Yes. In December I called valgrind/callgrind/KCacheGrind on the 1.3-alpha-version. It showed that 70 % of the costs were due to SDL_UpperBlit, called via ecl::Surface::blit.

For the current performance tests however, I needed something faster, something that Enigma could measure by itself. So I implemented the --measureperformance option, which is based on CPU time (without system calls). This way I could analyse 125 runs of a self-solving level this weekend, testing several possible pixel formats for SDL2. I found ARGB888 to be superior to all other pixel formats (see latest commit to master), but I needed a fast-n-dirty method to pull this off.
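For readers who want to try something similar, here is a minimal sketch of measuring user-mode CPU time (which excludes time spent inside system calls) next to wall-clock time. It is not the actual --measureperformance code. The names Timing, user_cpu_now and measure are made up for illustration, and it assumes a POSIX system that provides getrusage().

```cpp
#include <chrono>
#include <sys/resource.h>  // POSIX getrusage(); assumed available (Linux, macOS)

// Hypothetical result type, not taken from Enigma's sources.
struct Timing {
    double user_cpu_seconds;  // CPU time spent in user mode only
    double wall_seconds;      // elapsed real time, including any waiting
};

// User-mode CPU time consumed by this process so far, in seconds.
inline double user_cpu_now() {
    struct rusage ru{};
    getrusage(RUSAGE_SELF, &ru);
    return static_cast<double>(ru.ru_utime.tv_sec) +
           static_cast<double>(ru.ru_utime.tv_usec) / 1e6;
}

// Runs work() once and reports both clocks, so the two can be compared.
template <typename Work>
Timing measure(Work&& work) {
    const double cpu_start = user_cpu_now();
    const auto wall_start = std::chrono::steady_clock::now();
    work();
    const auto wall_end = std::chrono::steady_clock::now();
    const double cpu_end = user_cpu_now();

    Timing t;
    t.user_cpu_seconds = cpu_end - cpu_start;
    t.wall_seconds = std::chrono::duration<double>(wall_end - wall_start).count();
    return t;
}
```

Comparing the two numbers gives a rough idea of how much time went into waiting or into the kernel, which is the part that user CPU time alone does not show.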

    That being said, Enigma's rendering "engine" is indeed antiquated
    and not a good fit for the way modern computers update the screen.
    Rendering in software and uploading the resulting image to the GPU
    is simply not efficient any more. Ideally, we would use the
    SDL_Render API to let the GPU do all of the drawing so that we have
    to transfer as little image data between the CPU and the GPU as
    possible. But this would require a significant rewrite of the
    display code...

I tend to disagree here. Yes, Enigma's performance is abysmal for what it tries to accomplish, but on a relatively modern computer, even with integrated graphics, it's still fast enough to look smooth. Let's say: "Relative performance" is bad, but "absolute performance" is still okay-ish.

On older hardware, however, things are different: Enigma might run slowly, and this is what I want to improve, namely how Enigma is experienced on old hardware. And this old hardware might not be able to use hardware acceleration.

Well, yes, I'm performing my tests on modern hardware, not quite what I try to accomplish, I know ;-)

    Now that I think of it, have you experimented with the way screen
    updates are handled in ecl::Screen::flush_updates()? When there are
    more than 200 updated regions on the screen, the function simply
    updates its entire contents, which might be related to your
    observation that drawing more is faster.

Good idea ... I did four more experiments, in otherwise the same setup as yesterday, and reproduced yesterday's results to be sure. Summary:

WITH statusbar, SOMETIMES update_all: 13.0 s (<- default)
WITH statusbar, ALWAYS update_all: 15.4 s
WITH statusbar, NEVER update_all: 13.3 s
NO statusbar, SOMETIMES update_all: 13.9 s
NO statusbar, ALWAYS update_all: 21.0 s (no typo!)
NO statusbar, NEVER update_all: 14.5 s

So ... first of all, it seems like your choice of 200 updated regions hits a sweet spot. Also: If Enigma just always flushes everything every time, the difference between having the statusbar and not having it gets even more pronounced. What this means ... I have no idea.
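The three variants compared above (SOMETIMES, ALWAYS, NEVER update_all) can be sketched roughly as follows. This only illustrates the decision, it is not the real ecl::Screen::flush_updates() code. Rect, FlushPolicy, kMaxDirtyRects and regions_to_flush are hypothetical names; only the threshold of 200 regions comes from this thread.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical rectangle type; Enigma has its own.
struct Rect {
    int x, y, w, h;
};

// The three experiment variants from the table above.
enum class FlushPolicy { Threshold, Always, Never };

// "More than 200 updated regions" is the limit mentioned in this thread.
constexpr std::size_t kMaxDirtyRects = 200;

// Decides which rectangles get pushed to the screen for one frame.
std::vector<Rect> regions_to_flush(const std::vector<Rect>& dirty,
                                   const Rect& whole_screen,
                                   FlushPolicy policy = FlushPolicy::Threshold) {
    if (policy == FlushPolicy::Always) {
        return {whole_screen};  // ALWAYS update_all
    }
    if (dirty.empty()) {
        return {};              // nothing changed, nothing to flush
    }
    if (policy == FlushPolicy::Threshold && dirty.size() > kMaxDirtyRects) {
        return {whole_screen};  // SOMETIMES update_all: too many small regions
    }
    return dirty;               // flush each dirty region on its own
}

// Total number of pixels covered by a list of rectangles (overlaps count twice).
long pixel_count(const std::vector<Rect>& rects) {
    long total = 0;
    for (const Rect& r : rects) {
        total += static_cast<long>(r.w) * r.h;
    }
    return total;
}
```

With pixel_count() one can at least check how many pixels each variant transfers per frame, which might help to separate "more pixels" from "more update calls" when interpreting the numbers.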

In related news: There actually is some kind of "video cache in cpu":
It has existed since at least November 1998, and is therefore relevant to old hardware as well.


    - Daniel

     > On 10. Apr 2021, at 18:23, Andreas Lochmann
    <and.lochmann@googlemail.com> wrote:
     > Hi everyone,
     > I'm currently performing some experiments to improve Enigma's
    performance in the graphics department. For this, I measure the CPU
    time used to solve certain self-solving levels, particularly one
    with smooth scrolling, because this is our Achilles' heel right now.
     > I noticed that Enigma uses less CPU time when something else runs
    in the background, like a web video. This is easily explained by the
    CPU frequency stepping up. However, it seems like this effect
    appears even when I activate/deactivate Enigma's own status bar
    (the one counting up the time and displaying the level title) under
    full utilisation. Let me explain:
     > I first launched several prime generators in the background, so
    my cores were 100% utilised and CPU frequency at its maximum. With
    status bar activated, Enigma uses on average 13.05 CPU-seconds for a
    specific task. When I completely deactivate the status bar, the same
    task takes about 13.87 CPU-seconds -- and this although drawing the
    status bar itself has to be done within the same 13.05 CPU-seconds.
    (This is not a statistical fluke. For the average I used four runs
    each time, and all four runs WITH status bar needed consistently
    less time than WITHOUT status bar. And I did similar, slightly
    different experiments before, all showing the same paradoxical
    result.)
     > How can it be that drawing the status bar still leads to less CPU
    time used? Is "video memory cache in CPU" a thing? (Remember that
    Enigma relies on software acceleration for graphics.)
     > Maybe one of you knows this and can help me out?
     > Because: If this turns out to be real, it might make Enigma
    faster if it has to draw more on each time step. After all, we have
    lots of code trying to reduce the blit count.
