help-octave

## Re: Split data depending on values

From: Juan Pablo Carbajal | Subject: Re: Split data depending on values | Date: Fri, 13 Jul 2018 00:13:24 +0200

```
Hi,
Please do not send private e-mails; keep the conversation on the mailing list.

Observations:
1. Comparison (boolean) operations broadcast. Check, e.g.:
x = (0:10).';
[3 6] <= x
ans =

0  0
0  0
0  0
1  0
1  0
1  0
1  1
1  1
1  1
1  1
1  1
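
# A small sketch of where this leads (illustrative; the name bin is
# arbitrary): summing the broadcast comparisons along each row counts
# how many thresholds each x has reached, which gives a bin label per
# point without any loop.
x = (0:10).';
bin = sum ([3 6] <= x, 2);   # 0 for x < 3, 1 for 3 <= x < 6, 2 for x >= 6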

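2. You can meshgrid your parameters, i.e. build every combination of bin edges at once and test all bins in a single broadcast comparison: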

criteria = @(X,a,b,c,d) (a <= X(:,1) & X(:,1) < b) & (c <= X(:,2) & X(:,2) < d);
M = rand (1e4, 2);
np = 60;
p = linspace (0, 1, np);

tic
[A C] = meshgrid (p(1:end-1));
[B D] = meshgrid (p(2:end));
a = A(:).'; b = B(:).'; c = C(:).'; d = D(:).';

tf = criteria(M,a, b, c, d);
# Each bin holds a different number of points, so store the slices in a cell array
npar = size (tf, 2);
M_selected = cell (npar, 1);
for i = 1:npar
  M_selected{i} = M(tf(:,i),:);
endfor
toc

tic
M_selected2 = cell (np-1, np-1);
for i = 1:np-1
  for j = 1:np-1
    tf = criteria (M, p(i), p(i+1), p(j), p(j+1));
    M_selected2{j,i} = M(tf,:);
  endfor
endfor
toc

success = cellfun(@(x,y) all((x==y)(:)), M_selected, M_selected2(:));
all (success)

I get
Elapsed time is 0.35539 seconds.
Elapsed time is 0.496975 seconds.
ans = 1

You can check that the two algorithms have the same complexity (every
point is tested against every bin), and the gain is marginal (about
0.36 s vs 0.50 s here), so you can stick to your loopy version. In this
particular example you are doing much of the work needed to build a
histogram of a 2D variable; check hist3 in the statistics package for
similar work.
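
# A sketch of a loop-free alternative (illustrative only; the names
# idx_by_bin and M_selected3 are made up here). Each point is binned
# once with lookup and the row indices are grouped by bin, so the cost
# is one sort, O(N log N), instead of testing all N points against all
# (np-1)^2 bins. It also keeps the row indices, which is useful when
# other data is "paired" with M.
M = rand (1e4, 2);
np = 60;
p = linspace (0, 1, np);
nb = np - 1;

# lookup (p, v) returns k with p(k) <= v < p(k+1), the same half-open
# convention as criteria above; 0 or np means "outside the edges".
ix = lookup (p, M(:,1));
iy = lookup (p, M(:,2));
inside = ix >= 1 & ix <= nb & iy >= 1 & iy <= nb;
rows_in = find (inside);

# Linear bin number: row = y-bin, column = x-bin, as in M_selected2{j,i}.
bin = sub2ind ([nb nb], iy(inside), ix(inside));
[~, order] = sort (bin);                  # sort is stable: rows stay ascending
counts = accumarray (bin, 1, [nb*nb 1]);  # points per bin (a 2D histogram)
idx_by_bin = reshape (mat2cell (rows_in(order), counts, 1), nb, nb);
M_selected3 = cellfun (@(r) M(r,:), idx_by_bin, "UniformOutput", false);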

Cheers

On Thu, Jul 12, 2018 at 3:26 PM Matthieu Chourrout
>
> Hi,
>
> Thank you for your answer. In fact, I realized my function was too
> complicated.
>
> I want to achieve what you did there but with different values, and I was
> trying to do it w/o any 'for' loop. After thinking about your code, this is
> what I wanted to do:
>
> M = rand(500, 2);
> criteria = @(a,b,c,d) M(:,1) >= a & M(:,2) >= c & M(:,1) < b & M(:,2) < d;
>
> A = linspace(0,1,10);
> result = cell(9, 9);
> for i = 1:9
>         for j = 1:9
>                 result{i,j} = M(criteria(A(i),A(i+1),A(j),A(j+1)), :);
>         endfor
> endfor
>
> Is there a way to get rid of the 'for' loops at this point?
>
> Best regards,
>
> Matthieu CHOURROUT
>
>
> ----- Original Message -----
>
> From: "Juan Pablo Carbajal" <address@hidden>
> Sent: Thursday, 12 July 2018 14:20:27
> Subject: Re: Split data depending on values
>
> Hi,
>
> I am not sure I understood your problem fully. Also your code doesn't
> help much because I can't run it. Always provide a **simple** running
> example to get good answers.
>
> In general, any N-by-k matrix can be filtered by rows or columns. You
> need to define the criteria as a function that returns booleans matching
> the number of rows or columns, e.g. an array of N booleans to filter
> rows.
>
> A simple example: let's say you want all the rows whose row
> vector has a Euclidean norm between a and b
>
> function tf = criteria (X, a, b)
>   Xnorm = sqrt (sumsq (X,2));
>   tf = Xnorm > a & Xnorm < b;
> endfunction
>
> M = randn (500, 2);
> tf = criteria (M, 0.5, 1);
> m = M(tf,:);
> plot(M(:,1), M(:,2), 'o;data;', m(:,1), m(:,2), 'x;selected;')
> t = linspace (0,2*pi,100).';
> Bu = [cos(t), sin(t)];
> hold on
> plot([0.5 1] .* Bu(:,1), [0.5 1] .* Bu(:,2), '-g')
> hold off
> axis tight
>
> You can generalize this to columns. If you want a 2D filter, then your
> booleans should have the same shape as your matrix, e.g.
> tf = false(10,2); tf(1:3,1) = true; tf(7:10, 1) = true; tf(4:6, 2) = true;
> will select sections of the first and second column of a 10-by-2 matrix
> (M(tf) returns the selected elements as a single column vector).
>
> On Wed, Jul 11, 2018 at 3:55 PM Matthieu Chourrout
> >
> > Hi,
> >
> > I currently have a set of randomly spread data in a 2048-by-2048 grid
> > (non-integer values between 1 and 2048), arranged in an N-by-2 matrix with
> > X,Y coordinates, and I would like to split these data for further analysis,
> > starting from 4 quarters of 1024-by-1024 (but with the idea of splitting it
> > again).
> > I also need to be able to extract the indexes as some other data is
> > "paired" with this.
> >
> > It's the same idea as the BLOCKPROC function, but I'm not working with an
> > image at this point.
> >
> > To be more precise, here is my data:
> >
> > colors = "rgb";
> > for channel_id = 1:3
> >    regions.(colors(channel_id)) = regionprops(labels(:,:,2), ...
> > filtered_image(:,:,channel_id),'Area','SubarrayIdx','BoundingBox','WeightedCentroid','MaxIntensity','MinIntensity','PixelValues');
> >    centers_of_mass.(colors(channel_id)) = cat(1, regions.(colors(channel_id)).WeightedCentroid);
> > endfor
> > % centers_of_mass.r, centers_of_mass.g and centers_of_mass.b are my N-by-2 matrices
> >
> > I tried to use some sorting and then split at the different thresholds, but
> > is there perhaps a more efficient way?
> >
> >
> > function [new_idx, splitting_idx] = split(data_to_rearrange, splitting_values)
> >  splitting_idx = [0 0 0];
> >
> >  [rearranged_data, new_idx] = sortrows(data_to_rearrange, [1 2]);
> >  splitting_idx(2) = -1 + find(rearranged_data(:,1) > splitting_values(1),1);
> >
> >  [rearranged_data(1:splitting_idx(2), :), tmp_idx] = sortrows(rearranged_data(1:splitting_idx(2), :), [2 1]);
> >  new_idx(1:splitting_idx(2)) = new_idx(1:splitting_idx(2))(tmp_idx);
> >  splitting_idx(1) = -1 + find(rearranged_data(1:splitting_idx(2),2) > splitting_values(2),1);
> >  [rearranged_data(splitting_idx(2)+1:end, :), tmp_idx] = sortrows(rearranged_data(splitting_idx(2)+1:end, :), [2 1]);
> >  new_idx(splitting_idx(2)+1:end) = new_idx(splitting_idx(2)+1:end)(tmp_idx);
> >  splitting_idx(3) = -1 + splitting_idx(2) + find(rearranged_data(splitting_idx(2)+1:end,2) > splitting_values(2),1);
> > endfunction
> >
> >
> >
> > With this function and the piece of code below, I can achieve it for the
> > quarters. But now I want to split it further.
> >
> >
> >
> > [I4,s4] = split(centers_of_mass.g,[acquisition_parameters.camera.center.x acquisition_parameters.camera.center.y]);
> >  [I4,s4] = split(centers_of_mass.g,[1024 1024]);
> >  centers_of_mass.g = centers_of_mass.g(I4,:);
> >  centers_of_mass.r = centers_of_mass.r(I4,:);
> >  centers_of_mass.b = centers_of_mass.b(I4,:);
> >
> >  splitted_centers_of_mass.mean_g = [ nanmean(centers_of_mass.g(1:s4(1),:)) ; ...
> >   nanmean(centers_of_mass.g(s4(1)+1:s4(2),:)) ; ...
> >   nanmean(centers_of_mass.g(s4(2)+1:s4(3),:)) ; ...
> >   nanmean(centers_of_mass.g(s4(3)+1:end,:)) ];
> >  splitted_centers_of_mass.mean_r = [ nanmean(centers_of_mass.r(1:s4(1),:)) ; ...
> >   nanmean(centers_of_mass.r(s4(1)+1:s4(2),:)) ; ...
> >   nanmean(centers_of_mass.r(s4(2)+1:s4(3),:)) ; ...
> >   nanmean(centers_of_mass.r(s4(3)+1:end,:)) ];
> >  splitted_centers_of_mass.mean_b = [ nanmean(centers_of_mass.b(1:s4(1),:)) ; ...
> >   nanmean(centers_of_mass.b(s4(1)+1:s4(2),:)) ; ...
> >   nanmean(centers_of_mass.b(s4(2)+1:s4(3),:)) ; ...
> >   nanmean(centers_of_mass.b(s4(3)+1:end,:)) ];
> >
> >
> > Or maybe I could use BLOCKPROC earlier in my script (the problem with this
> > solution is that I may also split some of my labeled groups and get
> > duplicates or corrupted data on the edges...)?
> >
> >
> > Matthieu CHOURROUT
> >
> > PS: I remember reading a thread that mentioned a function to export pastable
> > variables, so that you could work directly on my data, but I never managed to
> > find it again. If you know it, could you mention it too?
> >
>

```
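
The original question (quoted above) computes per-quarter `nanmean`s after a custom `split` function and asks how to split further. Here is a minimal loop-free sketch of that step. It is not a solution from the thread. It assumes the same half-open convention as the `criteria` function in the reply (`edges(k) <= v < edges(k+1)`) and uses hypothetical stand-in data, with `com` in place of `centers_of_mass`. Each point is binned once with `lookup`, and per-block sums and counts are accumulated with `accumarray`. NaN entries are skipped as `nanmean` would skip them.

```octave
## Hypothetical stand-in for centers_of_mass.g/.r/.b: N-by-2 [x y]
## coordinates in [1, 2048); one NaN row mimics a missing centroid.
N = 1000;
com.g = 1 + 2047 * rand (N, 2);
com.r = com.g + 0.5 * randn (N, 2);
com.b = com.g + 0.5 * randn (N, 2);
com.r(5, :) = NaN;

## Block edges along x and y: [1 1024 2048] gives the four quarters,
## [1 512 1024 1536 2048] would give 16 blocks, and so on.
edges = [1 1024 2048];
nb = numel (edges) - 1;

## Bin every point once, using the green channel as the reference.
valid = all (isfinite (com.g), 2);
ix = zeros (N, 1);
iy = zeros (N, 1);
ix(valid) = lookup (edges, com.g(valid, 1));
iy(valid) = lookup (edges, com.g(valid, 2));
valid = valid & ix >= 1 & ix <= nb & iy >= 1 & iy <= nb;

## Linear block index: row = y-block, column = x-block of an nb-by-nb grid.
bin = sub2ind ([nb nb], iy(valid), ix(valid));
nq = nb * nb;

## Per-block mean of each column, ignoring NaN entries.
block_mean = struct ();
for ch = {"r", "g", "b"}
  V = com.(ch{1})(valid, :);
  ok = ! isnan (V);
  V(! ok) = 0;                          # NaNs add nothing to the sums
  S = [accumarray(bin, V(:,1), [nq 1]), accumarray(bin, V(:,2), [nq 1])];
  C = [accumarray(bin, double (ok(:,1)), [nq 1]), ...
       accumarray(bin, double (ok(:,2)), [nq 1])];
  block_mean.(ch{1}) = S ./ C;          # nq-by-2; NaN where a block is empty
endfor

## Row indices of the original data in each block (for the "paired" data);
## sort is stable, so indices stay in ascending order within a block.
rows_valid = find (valid);
[~, order] = sort (bin);
block_rows = reshape (mat2cell (rows_valid(order), accumarray (bin, 1, [nq 1]), 1), nb, nb);
```

Refining the grid only means changing `edges`. `block_rows{k}` holds the row indices of block `k` and can index the other channels or any paired data directly. Points lying exactly on the last edge (2048) are excluded under this convention, so the last edge may need adjusting for real data.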