From: Alex Bennée
Subject: Re: [RFC PATCH v4 10/10] util/bufferiszero: Add sve acceleration for aarch64
Date: Fri, 16 Feb 2024 11:05:24 +0000
User-agent: mu4e 1.11.28; emacs 29.1

Richard Henderson <richard.henderson@linaro.org> writes:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>
> RFC because I've not benchmarked this on real hw, only run it
> through qemu for validation.
>
<snip>
>
> +#ifdef CONFIG_SVE_OPT
> +static unsigned accel_index;
> +static void __attribute__((constructor)) init_accel(void)
> +{
> +    accel_index = (cpuinfo & CPUINFO_SVE ? 2 : 1);
> +    buffer_is_zero_accel = accel_table[accel_index];
> +}

This really needs to be:

-    accel_index = (cpuinfo & CPUINFO_SVE ? 2 : 1);
+    unsigned info = cpuinfo_init();
+    accel_index = (info & CPUINFO_SVE ? 2 : 1);

because otherwise you are relying on constructor initialisation order,
and on the Graviton 3 I built on it wasn't detecting SVE. With that
change I get this from
"perf record ./tests/unit/test-bufferiszero -m thorough":

  51.17%  test-bufferisze  test-bufferiszero  [.] buffer_is_zero_sve
  18.92%  test-bufferisze  test-bufferiszero  [.] buffer_is_zero_simd
  18.02%  test-bufferisze  test-bufferiszero  [.] buffer_is_zero_int_ge256
   7.67%  test-bufferisze  test-bufferiszero  [.] buffer_is_zero_ool
   4.09%  test-bufferisze  test-bufferiszero  [.] test_1

but as I mentioned before it would be nice to have a proper benchmark
for the buffer utils as I'm sure the unit test would be prone to noise.
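
For illustration, the constructor-ordering hazard described above can be
reproduced in a minimal standalone sketch. Every name below (demo_cpuinfo,
demo_cpuinfo_init, DEMO_INFO_SVE and so on) is a hypothetical stand-in,
not QEMU's actual code, and the explicit constructor priorities (a GCC/Clang
extension) are only there to force the unlucky ordering deterministically.
The probe pretends SVE is present instead of querying the host.

```c
#include <assert.h>
#include <stdio.h>

#define DEMO_INFO_ALWAYS (1u << 0)  /* hypothetical: set once the probe has run */
#define DEMO_INFO_SVE    (1u << 1)  /* hypothetical feature bit */

static unsigned demo_cpuinfo;       /* stays zero until the probe has run */

/* Idempotent probe: safe to call from any constructor, in any order. */
static unsigned demo_cpuinfo_init(void)
{
    if (demo_cpuinfo) {
        return demo_cpuinfo;        /* already probed, reuse the cached value */
    }
    /* Assumption: a real probe would ask the host; here SVE is faked. */
    demo_cpuinfo = DEMO_INFO_ALWAYS | DEMO_INFO_SVE;
    assert(demo_cpuinfo != 0);
    return demo_cpuinfo;
}

static unsigned index_from_global;
static unsigned index_from_call;

/* Priority 101 runs before 102, so the raw global is still zero here. */
static void __attribute__((constructor(101))) demo_init_accel(void)
{
    /* Reading the global directly depends on who ran first. */
    index_from_global = (demo_cpuinfo & DEMO_INFO_SVE) ? 2 : 1;
    /* Calling the probe does not depend on ordering. */
    index_from_call = (demo_cpuinfo_init() & DEMO_INFO_SVE) ? 2 : 1;
}

/* The constructor that would normally have filled in the global. */
static void __attribute__((constructor(102))) demo_probe_ctor(void)
{
    demo_cpuinfo_init();
}

unsigned demo_index_from_global(void) { return index_from_global; }
unsigned demo_index_from_call(void)   { return index_from_call; }

void demo_report(void)
{
    printf("raw global: index %u, init call: index %u\n",
           index_from_global, index_from_call);
}
```

Under the forced ordering, calling demo_report() from main() prints
"raw global: index 1, init call: index 2": the direct read of the global
misses the feature bit, while the call to the idempotent probe finds it.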
--
Alex Bennée
Virtualisation Tech Lead @ Linaro