Shuffling elements in a dataset (fwd)
From: Ted Harding
Subject: Shuffling elements in a dataset (fwd)
Date: Fri, 21 Feb 1997 23:54:21 +0000 (GMT)
( Re Message From: address@hidden )
>
> I have the need to randomize the order (shuffle) of very large
> datasets. The way I devised, randomly sampling with elimination, is
> not very efficient. Is there a better way, using octave's matrix
> manipulation?
>
> My way:
>
> nm = num = rows(data);
>
> for i=1:num
> rn = ceil(rand * nm--);
> new_data(i,:) = data(rn,:);
> data(rn,:) = [];
> endfor
>
> Better way: perhaps creating a vector of unique indexes? But how to
> do this?
>
> idx = 1:rows(data);
> now shuffle idx
> new_data = data(idx,:)
>
> Of course, this is the same problem in one dimension...
I find that something like
[dummy,ix] = sort(rand(1,rows(x))); new_x = x(ix,:);
seems pretty fast: 0.04 secs for 10000 rows and 0.05 secs for 100000 rows
or for 1000000 on a 386-DX/25MHz; 0.003 secs for 10000 rows and 0.004 secs
for 100000 rows or for 1000000 on a Pentium-120, i.e. almost independent
of the number of rows. However, for 10000000 rows it starts swapping and
takes a while (48 MB RAM). The above timings are for 1 column only; reduce
sizes pro rata for extra columns (RAM limit).
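
To make the one-liner above easier to try out, here is a minimal sketch on a
small matrix. The contents of x are hypothetical example data, and the two
assert lines are only a sanity check that the result is a reordering of the
original rows; they are not part of the method itself.

```octave
% Hypothetical example data: 5 rows, 2 columns.
x = [1 10; 2 20; 3 30; 4 40; 5 50];

% Draw one random key per row; the sort order of the keys is then a
% random ordering of the row indices 1:rows(x).
[dummy, ix] = sort(rand(1, rows(x)));

% Reorder whole rows at once with the shuffled index vector.
new_x = x(ix, :);

% Sanity checks: ix is a permutation of 1:rows(x), and new_x holds
% exactly the rows of x, only in a different order.
assert(sort(ix), 1:rows(x));
assert(sortrows(new_x), sortrows(x));
```

In current Octave versions, ix = randperm(rows(x)) should give such an index
vector directly; the sort-based form shown here is the one from this message.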
Ted. (address@hidden)