discuss-gnuradio

Volk sqrt ARM performance


From: Jeff R
Subject: Volk sqrt ARM performance
Date: Sat, 7 Oct 2023 18:22:19 -0400

I adapted a simple Volk sqrt program for an ARM1176JZ-S processor to compare Volk's performance with the C runtime library (CRTL) sqrt, and the results are puzzling: Volk comes out about five times slower. The program below prints:


dur_VolkSqrt=(0.000000)0.001721 dur_CRTLSqrt=(0.000000)0.000318


The cpu_features tool reports the following processor information. The asimd flag suggests that NEON is supported.


~/volk-3.0.0/build# cpu_features/list_cpu_features
arch            : aarch64
implementer     :  65 (0x41)
variant         :   0 (0x00)
part            : 3336 (0xD08)
revision        :   3 (0x03)
flags           : asimd,cpuid,crc32,fp


Why is Volk so much slower than the CRTL here? I may be missing something obvious. Thank you in advance.


Here’s the test program:



// g++ -I /usr/local/include/volk volk_sqrt.cpp -o volk_sqrt -L /usr/local/lib64/ -lvolk
// export LD_LIBRARY_PATH=/usr/local/lib64; ./volk_sqrt

#include <stdio.h>
#include <math.h>
#include <volk.h>
#include <limits.h>
#include <time.h>
#include <sys/time.h>

// Wall-clock time in seconds (microsecond resolution).
double get_wall_time()
{
    struct timeval time;

    if (gettimeofday(&time, NULL))
    {
        // Handle error
        return 0;
    }
    return (double)time.tv_sec + (double)time.tv_usec * .000001;
}

int main()
{
    double walStop;
    double walStart;
    double dur_VolkSqrt;
    double dur_CRTLSqrt;
    // Unsigned to match the loop indices and Volk's num_points argument.
    const unsigned int N = 1024*16;

    // Volk-aligned buffers, so the aligned (_a) kernel may be used.
    size_t alignment = volk_get_alignment();
    float* in = (float*)volk_malloc(sizeof(float)*N, alignment);
    float* out = (float*)volk_malloc(sizeof(float)*N, alignment);

    for (unsigned int ii = 0; ii < N; ++ii)
    {
        in[ii] = (float)(ii*ii);
    }

    // Time one call of the aligned Volk kernel (the first Volk call in this process).
    walStart = get_wall_time();
    volk_32f_sqrt_32f_a(out, in, N);
    //volk_32f_sqrt_32f(out, in, N);
    walStop = get_wall_time();
    dur_VolkSqrt = walStop - walStart;

    // Time the equivalent scalar loop using the C runtime sqrt.
    walStart = get_wall_time();
    for (unsigned int ii = 0; ii < N; ++ii)
    {
        out[ii] = sqrt(in[ii]);
    }
    walStop = get_wall_time();
    dur_CRTLSqrt = walStop - walStart;

    // Per-element time in parentheses, then the total, both in seconds.
    printf("dur_VolkSqrt=(%f)%f dur_CRTLSqrt=(%f)%f\n", dur_VolkSqrt/N, dur_VolkSqrt, dur_CRTLSqrt/N, dur_CRTLSqrt);

    volk_free(in);
    volk_free(out);
    return 0;
}

