igraph-help
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [igraph] slow rewire performance after subsetting a graph


From: Tamás Nepusz
Subject: Re: [igraph] slow rewire performance after subsetting a graph
Date: Fri, 21 Dec 2012 12:20:50 +0100

Hi Murat,

So I've spent a bit of time with debugging and here's the explanation. It's a 
long story.

First of all, H in your example is not a clone of G in any sense; it is a 
completely independent object, so that's not the issue here.

Second, igraph's internal data structures are optimized for queries, not 
modifications. In this sense, rewiring is probably the worst thing that one can 
do with a graph because it involves mutation operators (deletion and addition 
of edges) nearly as frequently as query operations (checking whether two nodes 
are connected or not). Therefore, most of the time in rewire() is spent by 
removing and adding edges, not by edge selection.

Third, when you specify niter=X, it means that igraph will do X _trials_ to 
rewire edges, no matter whether a particular trial succeeds or fails (because 
the selected edge pair cannot be rewired without creating loop or multiple 
edges). So, the observed speed depends not only on niter but also on the 
structure of your graph -- some graphs are easier to rewire than others because 
it is less likely that the algorithm selects an edge pair that cannot be 
swapped with a double edge swap.

Fourth, we have recently noticed a minor bug in igraph_rewire which resulted in 
the sampling not being done uniformly; in other words, some of the edges had a 
bigger chance of being selected for rewiring than others. In your particular 
case, this bug means that when you are rewiring G, a higher fraction of the 
rewiring trials fail and therefore you perceive that the rewiring is faster -- 
but actually, the function is actually performing less actual double edge swaps 
in the case of G than in the case of H. (The previous sampling method was 
sensitive on the presence of isolated vertices).

Fifth, we have already fixed the above mentioned bug in the development version 
(which is not released yet), but it actually made igraph_rewire even slower 
because most of the edge swaps actually succeed now. For your particular graph, 
about 94% of the swaps succeed, resulting in an execution time of about 13 
seconds on my computer for 5*10^5 rewiring trials. (The success percentage was 
less than 30% in the old version for your graph). Actually, I haven't noticed 
that igraph_rewire became _this_ slow until you made me double-check my code, 
so the next step for me will be to speed up igraph_rewire a bit. This will 
probably be done by temporarily switching the graph representation from an 
indexed edge list to an adjacency list instead, then doing the rewiring on the 
adjacency list (which can be mutated way faster), and converting the adjacency 
list back to a graph. A few issues are still open, though; for instance, it 
would be preferable (at least for me) to try and keep the edge attributes while 
rewiring, and I'm not really sure how to do that yet.

Anyway, I have filed a bug for this in our bug tracker; feel free to subscribe 
to it if you wish to get notified if we manage to fix it:

https://bugs.launchpad.net/igraph/+bug/1092868

-- 
T.

On 21 Dec 2012, at 04:14, Murat Tasan <address@hidden> wrote:

> hi all - i've noticed some drastic performance changes in the rewire method 
> after i added some clean-up steps prior to rewiring a graph.
> this was done using the R package (v0.6, with 1-indexing):
> 
> G is a random graph where i artificially add a collection of zero-degree 
> vertices, which i'll remove later to demonstrate the speed issue.
> 
> > G <- graph.data.frame(data.frame(v1 = sample(100, 100, replace = TRUE), v2 
> > = sample(100, 100, replace = TRUE)), vertices = data.frame(1:200))
> > H <- delete.vertices(G, which(degree(G) == 0))
> 
> the second line removes any zero-degree vertices from the graph object.
> the edges (and any connected vertices) remain the same between G and H.
> 
> > G
> IGRAPH DN-- 200 100 --
> + attr: name (v/c)
> > H
> IGRAPH DN-- 86 100 --
> + attr: name (v/c)
> 
> now some basic re-wiring.
> 
> > system.time(rewire(G, niter = 5e5))
>    user  system elapsed
>   1.070   0.000   1.078
> > system.time(rewire(H, niter = 5e5))
>    user  system elapsed
>   4.930   0.000   4.929
> 
> i'm guessing "H" is kept in memory as a view on a backing "G", leading to the 
> slow-down as boundary checks have to be made and calls forwarded to the 
> underlying object.
> for some large (real data) graphs i'm working with, the slow-down is 
> magnified by a lot and becomes a substantial problem.
> if my "view" guess is right, is there any way in R to force "H" to be treated 
> as a completely new object (i.e. a clone-like call)?
> 
> cheers, and thanks for any insight here,
> 
> -m
> _______________________________________________
> igraph-help mailing list
> address@hidden
> https://lists.nongnu.org/mailman/listinfo/igraph-help




reply via email to

[Prev in Thread] Current Thread [Next in Thread]