Re: Measuring Enigma's performance: A paradox?
From: Andreas Lochmann
Subject: Re: Measuring Enigma's performance: A paradox?
Date: Mon, 12 Apr 2021 02:15:25 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.7.1
Hi,
On 11.04.21 at 15:32, shadowphrogg32642342@gmail.com wrote:
> With the statusbar closed, there's more screen real estate to update, no?
No, the total number of pixels decreases. With "deactivated" I meant:
Nothing is drawn in that region. It would still show whatever was shown
in this area before the level started.
> On Sun, Apr 11, 2021 at 11:24 AM Daniel Heck <mail@dheck.net> wrote:
> Hm... it's hard to know what's going on without more information.
> Have you tried using a profiler (e.g. "gprof" or "oprofile" on
> Linux) to figure out _where_ the time is being spent? In general, I
> find CPU time to be a relatively poor proxy for performance, not
> just because it is influenced by things like CPU scaling, but also
> because it usually doesn't account for time spent waiting (I/O,
> memory, external devices) and time spent in other parts of the
> system (the operating system, hardware drivers, other processes).
Yes. In December I called valgrind/callgrind/KCacheGrind on the
1.3-alpha-version. It showed that 70 % of the costs were due to
SDL_UpperBlit, called via ecl::Surface::blit.
For the current performance tests, however, I needed something faster,
something that Enigma could measure by itself. So I implemented the
--measureperformance option, which is based on CPU time (without system
calls). This way I could analyse 125 runs of a self-solving level this
weekend, testing several possible pixel formats for SDL2. I found
ARGB8888 to be superior to all other pixel formats (see latest commit to
master), but I needed a fast-n-dirty method to pull this off.
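(In case anyone wants to reproduce this kind of measurement outside of
Enigma: on Linux, the user CPU time of a process, i.e. time spent in our
own code as opposed to system time spent in the kernel, can be read via
getrusage(). The following is only a minimal sketch of that idea, not
the actual --measureperformance code.)

    // Minimal sketch: read the process's user CPU time on Linux/POSIX.
    // Illustration only, not the actual --measureperformance implementation.
    #include <sys/resource.h>
    #include <cstdio>

    static double user_cpu_seconds() {
        struct rusage ru;
        getrusage(RUSAGE_SELF, &ru);  // statistics for the current process
        // ru_utime = user time only; ru_stime (system time) is ignored here
        return ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6;
    }

    int main() {
        double start = user_cpu_seconds();
        // ... run the self-solving level here ...
        double used = user_cpu_seconds() - start;
        std::printf("used %.2f CPU-seconds\n", used);
        return 0;
    }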
> That being said, Enigma's rendering "engine" is indeed antiquated
> and not a good fit for the way modern computers update the screen.
> Rendering in software and uploading the resulting image to the GPU
> is simply not efficient any more. Ideally, we would use the
> SDL_Render API to let the GPU do all of the drawing so that we have
> to transfer as little image data between the CPU and the GPU as
> possible. But this would require a significant rewrite of the
> display code...
I tend to disagree here. Yes, Enigma's performance is abysmal for what
it tries to accomplish, but on a relatively modern computer, even with
integrated graphics, it's still fast enough to look smooth. Let's say:
"Relative performance" is bad, but "absolute performance" is still okay-ish.
On older hardware, however, this is different: Enigma might run slowly,
and this is what I want to improve, namely how Enigma is experienced on
old hardware. And that old hardware might not be able to use hardware
acceleration.
Well, yes, I'm performing my tests on modern hardware, which is not
quite in line with what I'm trying to accomplish, I know ;-)
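(For reference, the SDL_Render path Daniel describes would, in its very
simplest form, look roughly like the sketch below: keep rendering the
frame in software, but upload it once per frame into a streaming texture
and let the GPU present and scale it. This is only an illustration of
the general approach, not a proposal for the actual display code; window
size, pixel format and error handling are made up for the example, and a
full port would presumably render individual tiles as textures instead
of uploading whole frames.)

    // Rough sketch of a minimal SDL_Render setup with a streaming texture.
    // Purely illustrative; sizes, format and error handling are simplified.
    #include <SDL.h>

    int main(int, char **) {
        SDL_Init(SDL_INIT_VIDEO);
        SDL_Window *win = SDL_CreateWindow("Enigma", SDL_WINDOWPOS_CENTERED,
                                           SDL_WINDOWPOS_CENTERED, 640, 480, 0);
        SDL_Renderer *ren = SDL_CreateRenderer(win, -1, SDL_RENDERER_ACCELERATED);
        SDL_Texture *tex = SDL_CreateTexture(ren, SDL_PIXELFORMAT_ARGB8888,
                                             SDL_TEXTUREACCESS_STREAMING, 640, 480);
        SDL_Surface *frame = SDL_CreateRGBSurfaceWithFormat(
            0, 640, 480, 32, SDL_PIXELFORMAT_ARGB8888);

        bool running = true;
        while (running) {
            SDL_Event e;
            while (SDL_PollEvent(&e))
                if (e.type == SDL_QUIT) running = false;

            // ... software-render the frame into 'frame' as before ...

            // Upload the finished frame once and let the GPU present it.
            SDL_UpdateTexture(tex, nullptr, frame->pixels, frame->pitch);
            SDL_RenderClear(ren);
            SDL_RenderCopy(ren, tex, nullptr, nullptr);
            SDL_RenderPresent(ren);
        }

        SDL_FreeSurface(frame);
        SDL_DestroyTexture(tex);
        SDL_DestroyRenderer(ren);
        SDL_DestroyWindow(win);
        SDL_Quit();
        return 0;
    }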
> Now that I think of it, have you experimented with the way screen
> updates are handled in ecl::Screen::flush_updates()? When there are
> more than 200 updated regions on the screen, the function simply
> updates its entire contents, which might be related to your
> observation that drawing more is faster.
Good idea ... I did four more experiments, in otherwise the same setup
as yesterday, and reproduced yesterday's results to be sure. Summary:
WITH statusbar, SOMETIMES update_all: 13.0 s (<- default)
WITH statusbar, ALWAYS update_all: 15.4 s
WITH statusbar, NEVER update_all: 13.3 s
NO statusbar, SOMETIMES update_all: 13.9 s
NO statusbar, ALWAYS update_all: 21.0 s (no typo!)
NO statusbar, NEVER update_all: 14.5 s
So ... first of all, it seems like your choice of 200 updated regions
hits a sweet spot. Also: if Enigma always flushes the entire screen, the
difference between having the statusbar and not having it gets even more
pronounced. What this means ... I have no idea.
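(For anyone reading along without the source at hand, the heuristic we
are talking about is roughly the following, written here with plain SDL2
calls; the real ecl::Screen::flush_updates() differs in detail. The idea
is that each partial update carries some fixed overhead, so beyond a
certain number of rectangles a single full-screen update wins.)

    // Simplified sketch of the "more than 200 dirty regions -> update
    // everything" heuristic, expressed with plain SDL2 calls.
    // Illustration only; not the literal ecl::Screen code.
    #include <SDL.h>
    #include <vector>

    void flush_updates(SDL_Window *window, std::vector<SDL_Rect> &dirty_rects) {
        if (dirty_rects.size() > 200) {
            // Many small regions: one full update is cheaper than
            // issuing hundreds of partial ones.
            SDL_UpdateWindowSurface(window);
        } else if (!dirty_rects.empty()) {
            // Few regions: only push the changed rectangles to the screen.
            SDL_UpdateWindowSurfaceRects(window, dirty_rects.data(),
                                         static_cast<int>(dirty_rects.size()));
        }
        dirty_rects.clear();
    }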
In related news: there actually is some kind of "video cache in the CPU":
https://en.wikipedia.org/wiki/Uncacheable_speculative_write_combining
It has existed since at least November 1998, and is therefore relevant to
old hardware as well.
Cheers
Andreas
> - Daniel
> On 10. Apr 2021, at 18:23, Andreas Lochmann
> <and.lochmann@googlemail.com> wrote:
>
> Hi everyone,
>
> I'm currently performing some experiments to improve Enigma's
> performance in the graphics department. For this, I measure the CPU
> time used to solve certain self-solving levels, particularly one
> with smooth scrolling, because this is our Achilles' heel right now.
>
> I noticed that Enigma uses less CPU time when something else runs
> in the background, like a web video. This is easily explained by the
> CPU frequency stepping up. However, it seems like this effect
> appears even when I activate/deactivate Enigma's own status bar
> (the one counting up the time and displaying the level title) under
> full CPU utilisation. Let me explain:
>
> I first launched several prime generators in the background, so
> my cores were 100% utilised and the CPU frequency was at its maximum.
> With the status bar activated, Enigma uses on average 13.05 CPU-seconds
> for a specific task. When I completely deactivate the status bar, the
> same task takes about 13.87 CPU-seconds -- and this even though drawing
> the status bar itself has to be done within the same 13.05 CPU-seconds.
> (This is not a statistical fluke. For the average I used four runs
> each time, and all four runs WITH status bar needed consistently
> less time than WITHOUT status bar. And I did similar, slightly
> different experiments before, all showing the same paradoxical
> behaviour.)
>
> How can it be that drawing the status bar still leads to less CPU
> time used? Is "video memory cache in CPU" a thing? (Remember that
> Enigma relies on software rendering for graphics.)
>
> Maybe one of you knows this and can help me out?
>
> Because: If this turns out to be real, it might actually be faster
> for Enigma to draw more on each time step. After all, we have
> lots of code trying to reduce the blit count.