From: Tom Rondeau
Subject: Re: [Discuss-gnuradio] QT GUI time sink (float) unnecessary memmove()?
Date: Sun, 29 Mar 2015 17:20:22 -0700
When testing, I used 5 float streams running at over 150 Msps each, with 15 microsecond bursts of 50 MHz about 10 microseconds apart. I used enough x points to see two bursts on the GUI. Normal trigger. (Free or auto trigger might be too taxing.)
-Regards
Andy
On March 28, 2015 8:06:08 PM EDT, Tom Rondeau <address@hidden> wrote:
On Sat, Mar 28, 2015 at 12:50 PM, Andy Walls <address@hidden> wrote:
On Sat, 2015-03-28 at 14:45 -0400, Andy Walls wrote:
> Hi Tom:
>
>
> On Sat, 2015-03-28 at 11:12 -0700, Tom Rondeau wrote:
> > On Sat, Mar 28, 2015 at 11:00 AM, Andy Walls
> > <address@hidden> wrote:
>
> > Can this memmove() be safely skipped
> >
> > https://github.com/gnuradio/gnuradio/blob/master/gr-qtgui/lib/time_sink_f_impl.cc#L627
> [snip]
> > The volk_32f_convert_64f_u_avx() call is unavoidable as Qwt
> > wants
> > doubles for plotting and not floats. But it might also be able
> > to be
> > deferred to the very end when the decision to plot is known
> > for sure.
> > (But that's more surgery than I care to take on at the
> > moment.)
>
>
> > But thinking about the volk convert function, that's both copying the
> > data from the input buffer into the internal buffer as well as
> > performing the conversion. We can't just hold data in the input since
> > we don't want to back up the data until we're ready to plot both with
> > timing and with a full enough buffer -- it's just sampling a section
> > at a time and drops everything in between.
>
> Right.
>
> > That part could be converted into a memcpy instead of the volk
> > convert. Then, when we're ready to plot, we call the volk convert that
> > also does the move from d_start to 0, so it combines those two
> > elements.
>
> Yeah, that's the surgery part. :) It would require adding a new set of
> buffers to hold float objects, and then convert them when a
> determination to plot was made.
>
> This also affects the memmove() of the tail for the trigger delay. It
> would operate on the new set of float buffers (vs the buffers holding
> doubles).
>
> > Thoughts on those proposals?
Your proposal for implementing memcpy() and deferring volk_*() to do the
conversion and "memmove" in one step is great! :)
I just implemented it, and the time_sink_f thread has gone from 41.5%
CPU down to 29.1% CPU in my tests. :) memcpy() now dominates the
thread, but that's to be expected.
With my initial hack:
> CPU: Intel Sandy Bridge microarchitecture, speed 3.5e+06 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
> samples % image name symbol name
> 78158 39.0737 libvolk.so.0.0.0 volk_32f_convert_64f_u_avx
> 22777 11.3870 no-vmlinux /no-vmlinux
> 13972 6.9851 libgnuradio-qtgui-3.7.7git.so.0.0.0 gr::qtgui::time_sink_f_impl::_test_trigger_slope(float const*) const
> 7781 3.8900 libgnuradio-qtgui-3.7.7git.so.0.0.0 gr::qtgui::time_sink_f_impl::_test_trigger_norm(int, std::vector<void const*, std::allocator<void const*> >)
> 7236 3.6175 libpthread-2.18.so pthread_mutex_lock
> 6163 3.0811 libgnuradio-runtime-3.7.7git.so.0.0.0 boost::detail::sp_counted_base::release()
> 5942 2.9706 libpthread-2.18.so pthread_mutex_unlock
> 4947 2.4732 libgnuradio-runtime-3.7.7git.so.0.0.0 gr::block_executor::run_one_iteration()
> 3826 1.9127 libgnuradio-runtime-3.7.7git.so.0.0.0 gr::block_detail::input(unsigned int)
> 3555 1.7773 libstdc++.so.6.0.19 /usr/lib64/libstdc++.so.6.0.19
> 3206 1.6028 libc-2.18.so __memmove_ssse3_back
> [...]
With my implementation of your suggestion:
CPU: Intel Sandy Bridge microarchitecture, speed 3.5e+06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 90000
samples % image name symbol name
27595 35.6051 libc-2.18.so __memcpy_sse2_unaligned
12225 15.7736 no-vmlinux /no-vmlinux
4051 5.2269 libpthread-2.18.so pthread_mutex_lock
3739 4.8243 libgnuradio-runtime-3.7.7git.so.0.0.0 boost::detail::sp_counted_base::release()
3362 4.3379 libpthread-2.18.so pthread_mutex_unlock
2876 3.7108 libgnuradio-runtime-3.7.7git.so.0.0.0 gr::block_executor::run_one_iteration()
2364 3.0502 libgnuradio-runtime-3.7.7git.so.0.0.0 gr::block_detail::input(unsigned int)
2091 2.6980 libstdc++.so.6.0.19 /usr/lib64/libstdc++.so.6.0.19
1388 1.7909 libgnuradio-runtime-3.7.7git.so.0.0.0 gr::tpb_detail::notify_upstream(gr::block_detail*)
1138 1.4683 libc-2.18.so __memmove_ssse3_back
[...]
2 0.0026 libvolk.so.0.0.0 __volk_32f_convert_64f_d
[...]
1 0.0013 libvolk.so.0.0.0 volk_32f_convert_64f_a_avx
Regards,
Andy
Andy,
Excellent!
I've got a few other minor patches for some things, I'll put this in there too and test on my end as well.
Tom