Re: 3D versus 2D Indexing and the Speed Thereof
From: Bob Weigel
Subject: Re: 3D versus 2D Indexing and the Speed Thereof
Date: Thu, 12 Apr 2007 17:10:14 -0400
User-agent: KMail/1.8.2
>
> // First do one element to force the copy-on-write
> elem(dest_offset) = source.elem (source_offset);
> raw_source = &(source.rep->data[source_offset]);
> raw_dest = &(rep->data[dest_offset] );
>
> for (octave_idx_type i = 0; i < block_count; i++)
> {
> memcpy (raw_dest, raw_source, sizeof(T)*element_count);
> raw_source += source_stride;
> raw_dest += dest_stride;
> }
> }
>
I am not sure if this is related, but I have wanted to speed up Octave's
repmat function (and zeros and ones). As currently implemented, repmat.m
uses some impressive vectorization tricks, but it is orders of magnitude
slower than other implementations. The approach I had in mind is the
method used in repmat.c at
http://research.microsoft.com/~minka/software/lightspeed.
I don't fully understand the C++ code that was posted in this thread, but I
suspect that it uses the same method as the "lightspeed" repmat.c. If it
does, then ignore this post.
The relevant part of repmat.c is as follows.
/* repeat a block of memory rep times */
void memrep(char *dest, size_t chunk, int rep)
{
#if 0
  /* slow way */
  int i;
  char *p = dest;
  for(i=1;i<rep;i++) {
    p += chunk;
    memcpy(p, dest, chunk);
  }
#else
  /* fast way */
  if(rep == 1) return;
  memcpy(dest + chunk, dest, chunk);
  if(rep & 1) {
    dest += chunk;
    memcpy(dest + chunk, dest, chunk);
  }
  /* now repeat using a block twice as big */
  memrep(dest, chunk<<1, rep>>1);
#endif
}
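
The doubling idea behind the "fast way" can also be written without
recursion. The following is only a sketch of that idea, not code from
lightspeed or from Octave: the name memrep_iter and the size_t repeat count
are my own assumptions. It assumes the first chunk bytes of the buffer
already hold the block to repeat and that the buffer has room for
rep * chunk bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of lightspeed's repmat.c).
 * Fill buf with `rep` consecutive copies of its first `chunk` bytes.
 * Each memcpy copies from the region that is already filled, so the
 * filled region roughly doubles per pass and about log2(rep) calls
 * are made instead of rep - 1. */
static void memrep_iter(char *buf, size_t chunk, size_t rep)
{
    if (rep < 2 || chunk == 0)
        return;                      /* nothing to replicate */

    size_t total = chunk * rep;      /* bytes in the finished buffer */
    size_t filled = chunk;           /* bytes that are already valid */

    while (filled < total) {
        size_t n = filled;           /* try to double the filled region */
        if (n > total - filled)
            n = total - filled;      /* last pass: copy only the remainder */
        /* source [0, n) and destination [filled, filled + n) never overlap */
        memcpy(buf + filled, buf, n);
        filled += n;
    }
}
```

For example, after copying "abc" into the start of a 15-byte buffer,
memrep_iter(buf, 3, 5) should leave the buffer holding "abcabcabcabcabc".
Unlike the recursive version above, this sketch returns early for a repeat
count of zero instead of relying on the caller never to pass one.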