[Discuss-gnuradio] *much* faster filtering
From: Eric Blossom
Subject: [Discuss-gnuradio] *much* faster filtering
Date: Tue, 10 May 2005 15:41:13 -0700
User-agent: Mutt/1.5.6i
Thanks to some fine assembly language hacking by Stephane Fillod, we now
have SSE and 3DNow! versions of the guts of the "fcc" and "ccf" FIR
filters. "fcc" is float input, complex output, complex taps. "ccf"
is complex input, complex output, float taps. The "ccf" variant is
especially handy when working with the USRP, since we're generally
dealing with complex baseband data.
On the P4, the new SSE code is about 8 times faster than the generic
version for "fcc" and "ccf" (roughly 7.6x and 8.2x in the benchmarks below)!
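
For readers unfamiliar with the naming, here is a minimal reference sketch of what the two dot products compute. One output sample of a FIR filter is a dot product between the taps and a window of input samples. The function names and the plain-Python style are illustrative assumptions only; this is not the GNU Radio source, and it says nothing about how the SSE or 3DNow! versions are written.

```python
# Illustrative reference only: hypothetical names, not the GNU Radio code.

def dotprod_fcc(samples, taps):
    """float input, complex taps -> complex output."""
    acc = 0j
    for x, h in zip(samples, taps):
        # one real sample times one complex tap
        acc += x * h
    return acc


def dotprod_ccf(samples, taps):
    """complex input, float taps -> complex output."""
    re = 0.0
    im = 0.0
    for x, h in zip(samples, taps):
        # a real tap scales the I and Q parts independently
        re += x.real * h
        im += x.imag * h
    return complex(re, im)


if __name__ == "__main__":
    print(dotprod_fcc([1.0, 2.0], [1 + 1j, 0.5 - 1j]))
    print(dotprod_ccf([1 + 2j, 3 - 1j], [0.5, 2.0]))
```

In both cases each tap costs two real multiply-adds, and the real and imaginary accumulators never mix, which is one plausible reason these loops map well onto SIMD instructions.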
----------------------------------------------------------------
Pentium M (1.4 GHz)
address@hidden tests]$ ./benchmark_dotprod_fcc
generic: taps: 256 input: 4e+07 cpu: 110.310 taps/sec: 9.283e+07
SSE: taps: 256 input: 4e+07 cpu: 22.379 taps/sec: 4.576e+08
address@hidden tests]$ ./benchmark_dotprod_ccf
generic: taps: 256 input: 4e+07 cpu: 118.765 taps/sec: 8.622e+07
SSE: taps: 256 input: 4e+07 cpu: 22.093 taps/sec: 4.635e+08
address@hidden tests]$ ./benchmark_dotprod_fff
generic: taps: 256 input: 4e+07 cpu: 16.966 taps/sec: 6.035e+08
SSE: taps: 256 input: 4e+07 cpu: 11.194 taps/sec: 9.148e+08
Athlon 1800+ MP (1.5 GHz)
address@hidden tests]$ ./benchmark_dotprod_fcc
generic: taps: 256 input: 4e+07 cpu: 106.544 taps/sec: 9.611e+07
3DNow!: taps: 256 input: 4e+07 cpu: 17.698 taps/sec: 5.786e+08
SSE: taps: 256 input: 4e+07 cpu: 21.805 taps/sec: 4.696e+08
address@hidden tests]$ ./benchmark_dotprod_ccf
generic: taps: 256 input: 4e+07 cpu: 102.456 taps/sec: 9.994e+07
3DNow!: taps: 256 input: 4e+07 cpu: 16.247 taps/sec: 6.303e+08
SSE: taps: 256 input: 4e+07 cpu: 21.743 taps/sec: 4.71e+08
address@hidden tests]$ ./benchmark_dotprod_fff
generic: taps: 256 input: 4e+07 cpu: 13.662 taps/sec: 7.495e+08
3DNow!: taps: 256 input: 4e+07 cpu: 8.252 taps/sec: 1.241e+09
SSE: taps: 256 input: 4e+07 cpu: 9.982 taps/sec: 1.026e+09
P4 (1.7 GHz)
address@hidden tests]$ ./benchmark_dotprod_fcc
generic: taps: 256 input: 4e+07 cpu: 144.956 taps/sec: 7.064e+07
SSE: taps: 256 input: 4e+07 cpu: 18.968 taps/sec: 5.399e+08
address@hidden tests]$ ./benchmark_dotprod_ccf
generic: taps: 256 input: 4e+07 cpu: 152.732 taps/sec: 6.705e+07
SSE: taps: 256 input: 4e+07 cpu: 18.525 taps/sec: 5.528e+08
address@hidden tests]$ ./benchmark_dotprod_fff
generic: taps: 256 input: 4e+07 cpu: 18.059 taps/sec: 5.67e+08
SSE: taps: 256 input: 4e+07 cpu: 6.792 taps/sec: 1.508e+09
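
The reported taps/sec figures are consistent with taps * input / cpu, for example 256 * 4e7 / 110.310 is about 9.283e+07. The short sketch below recomputes throughput and the generic-to-SSE speedup from the P4 numbers above. The helper names are hypothetical and this is not part of the benchmark programs themselves.

```python
# Hypothetical helpers for reading the benchmark output; not part of GNU Radio.

def taps_per_sec(n_taps, n_input, cpu_seconds):
    """Tap multiplications per second of CPU time."""
    return n_taps * n_input / cpu_seconds


def speedup(generic_cpu, simd_cpu):
    """How many times faster the SIMD version ran than the generic one."""
    return generic_cpu / simd_cpu


# CPU seconds (generic, SSE) copied from the P4 (1.7 GHz) results above.
P4_CPU = {
    "fcc": (144.956, 18.968),
    "ccf": (152.732, 18.525),
    "fff": (18.059, 6.792),
}

if __name__ == "__main__":
    for name, (generic, sse) in P4_CPU.items():
        rate = taps_per_sec(256, 4e7, sse)
        print(f"{name}: SSE {rate:.3e} taps/sec, {speedup(generic, sse):.2f}x faster")
```

The same two functions can be applied to the Pentium M and Athlon rows to compare SSE against 3DNow! on equal terms.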