Re: [igraph] slow rewire performance after subsetting a graph

On Fri, Dec 21, 2012 at 6:20 AM, Tamás Nepusz <address@hidden> wrote:

Hi Murat,

So I've spent a bit of time with debugging and here's the explanation. It's a long story.

First of all, H in your example is not a clone of G in any sense; it is a completely independent object, so that's not the issue here.

Second, igraph's internal data structures are optimized for queries, not modifications. In this sense, rewiring is probably the worst thing that one can do with a graph because it involves mutation operators (deletion and addition of edges) nearly as frequently as query operations (checking whether two nodes are connected or not). Therefore, most of the time in rewire() is spent by removing and adding edges, not by edge selection.

Third, when you specify niter=X, it means that igraph will do X _trials_ to rewire edges, no matter whether a particular trial succeeds or fails (because the selected edge pair cannot be rewired without creating loop or multiple edges). So, the observed speed depends not only on niter but also on the structure of your graph -- some graphs are easier to rewire than others because it is less likely that the algorithm selects an edge pair that cannot be swapped with a double edge swap.

Fourth, we have recently noticed a minor bug in igraph_rewire which resulted in the sampling not being done uniformly; in other words, some of the edges had a bigger chance of being selected for rewiring than others. In your particular case, this bug means that when you are rewiring G, a higher fraction of the rewiring trials fail and therefore you perceive that the rewiring is faster -- but actually, the function is actually performing less actual double edge swaps in the case of G than in the case of H. (The previous sampling method was sensitive on the presence of isolated vertices).

Fifth, we have already fixed the above mentioned bug in the development version (which is not released yet), but it actually made igraph_rewire even slower because most of the edge swaps actually succeed now. For your particular graph, about 94% of the swaps succeed, resulting in an execution time of about 13 seconds on my computer for 5*10^5 rewiring trials. (The success percentage was less than 30% in the old version for your graph). Actually, I haven't noticed that igraph_rewire became _this_ slow until you made me double-check my code, so the next step for me will be to speed up igraph_rewire a bit. This will probably be done by temporarily switching the graph representation from an indexed edge list to an adjacency list instead, then doing the rewiring on the adjacency list (which can be mutated way faster), and converting the adjacency list back to a graph. A few issues are still open, though; for instance, it would be preferable (at least for me) to try and keep the edge attributes while rewiring, and I'm not really sure how to do that yet.

Anyway, I have filed a bug for this in our bug tracker; feel free to subscribe to it if you wish to get notified if we manage to fix it:

https://bugs.launchpad.net/igraph/+bug/1092868

--
T.

On 21 Dec 2012, at 04:14, Murat Tasan <address@hidden> wrote:

> hi all - i've noticed some drastic performance changes in the rewire method after i added some clean-up steps prior to rewiring a graph.
> this was done using the R package (v0.6, with 1-indexing):
>
> G is a random graph where i artificially add a collection of zero-degree vertices, which i'll remove later to demonstrate the speed issue.
>
> > G <- graph.data.frame(data.frame(v1 = sample(100, 100, replace = TRUE), v2 = sample(100, 100, replace = TRUE)), vertices = data.frame(1:200))
> > H <- delete.vertices(G, which(degree(G) == 0))
>
> the second line removes any zero-degree vertices from the graph object.
> the edges (and any connected vertices) remain the same between G and H.
>
> > G
> IGRAPH DN-- 200 100 --
> + attr: name (v/c)
> > H
> IGRAPH DN-- 86 100 --
> + attr: name (v/c)
>
> now some basic re-wiring.
>
> > system.time(rewire(G, niter = 5e5))
> user system elapsed
> 1.070 0.000 1.078
> > system.time(rewire(H, niter = 5e5))
> user system elapsed
> 4.930 0.000 4.929
>
> i'm guessing "H" is kept in memory as a view on a backing "G", leading to the slow-down as boundary checks have to be made and calls forwarded to the underlying object.
> for some large (real data) graphs i'm working with, the slow-down is magnified by a lot and becomes a substantial problem.
> if my "view" guess is right, is there any way in R to force "H" to be treated as a completely new object (i.e. a clone-like call)?
>
> cheers, and thanks for any insight here,
>
> -m

> _______________________________________________
> igraph-help mailing list
> address@hidden
> https://lists.nongnu.org/mailman/listinfo/igraph-help

_______________________________________________
igraph-help mailing list
address@hidden
https://lists.nongnu.org/mailman/listinfo/igraph-help

From:	Murat Tasan
Subject:	Re: [igraph] slow rewire performance after subsetting a graph
Date:	Fri, 21 Dec 2012 16:16:57 -0500