[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [igraph] slow rewire performance after subsetting a graph
From: |
Tamás Nepusz |
Subject: |
Re: [igraph] slow rewire performance after subsetting a graph |
Date: |
Fri, 21 Dec 2012 12:20:50 +0100 |
Hi Murat,
So I've spent a bit of time with debugging and here's the explanation. It's a
long story.
First of all, H in your example is not a clone of G in any sense; it is a
completely independent object, so that's not the issue here.
Second, igraph's internal data structures are optimized for queries, not
modifications. In this sense, rewiring is probably the worst thing that one can
do with a graph because it involves mutation operators (deletion and addition
of edges) nearly as frequently as query operations (checking whether two nodes
are connected or not). Therefore, most of the time in rewire() is spent by
removing and adding edges, not by edge selection.
Third, when you specify niter=X, it means that igraph will do X _trials_ to
rewire edges, no matter whether a particular trial succeeds or fails (because
the selected edge pair cannot be rewired without creating loop or multiple
edges). So, the observed speed depends not only on niter but also on the
structure of your graph -- some graphs are easier to rewire than others because
it is less likely that the algorithm selects an edge pair that cannot be
swapped with a double edge swap.
Fourth, we have recently noticed a minor bug in igraph_rewire which resulted in
the sampling not being done uniformly; in other words, some of the edges had a
bigger chance of being selected for rewiring than others. In your particular
case, this bug means that when you are rewiring G, a higher fraction of the
rewiring trials fail and therefore you perceive that the rewiring is faster --
but actually, the function is actually performing less actual double edge swaps
in the case of G than in the case of H. (The previous sampling method was
sensitive on the presence of isolated vertices).
Fifth, we have already fixed the above mentioned bug in the development version
(which is not released yet), but it actually made igraph_rewire even slower
because most of the edge swaps actually succeed now. For your particular graph,
about 94% of the swaps succeed, resulting in an execution time of about 13
seconds on my computer for 5*10^5 rewiring trials. (The success percentage was
less than 30% in the old version for your graph). Actually, I haven't noticed
that igraph_rewire became _this_ slow until you made me double-check my code,
so the next step for me will be to speed up igraph_rewire a bit. This will
probably be done by temporarily switching the graph representation from an
indexed edge list to an adjacency list instead, then doing the rewiring on the
adjacency list (which can be mutated way faster), and converting the adjacency
list back to a graph. A few issues are still open, though; for instance, it
would be preferable (at least for me) to try and keep the edge attributes while
rewiring, and I'm not really sure how to do that yet.
Anyway, I have filed a bug for this in our bug tracker; feel free to subscribe
to it if you wish to get notified if we manage to fix it:
https://bugs.launchpad.net/igraph/+bug/1092868
--
T.
On 21 Dec 2012, at 04:14, Murat Tasan <address@hidden> wrote:
> hi all - i've noticed some drastic performance changes in the rewire method
> after i added some clean-up steps prior to rewiring a graph.
> this was done using the R package (v0.6, with 1-indexing):
>
> G is a random graph where i artificially add a collection of zero-degree
> vertices, which i'll remove later to demonstrate the speed issue.
>
> > G <- graph.data.frame(data.frame(v1 = sample(100, 100, replace = TRUE), v2
> > = sample(100, 100, replace = TRUE)), vertices = data.frame(1:200))
> > H <- delete.vertices(G, which(degree(G) == 0))
>
> the second line removes any zero-degree vertices from the graph object.
> the edges (and any connected vertices) remain the same between G and H.
>
> > G
> IGRAPH DN-- 200 100 --
> + attr: name (v/c)
> > H
> IGRAPH DN-- 86 100 --
> + attr: name (v/c)
>
> now some basic re-wiring.
>
> > system.time(rewire(G, niter = 5e5))
> user system elapsed
> 1.070 0.000 1.078
> > system.time(rewire(H, niter = 5e5))
> user system elapsed
> 4.930 0.000 4.929
>
> i'm guessing "H" is kept in memory as a view on a backing "G", leading to the
> slow-down as boundary checks have to be made and calls forwarded to the
> underlying object.
> for some large (real data) graphs i'm working with, the slow-down is
> magnified by a lot and becomes a substantial problem.
> if my "view" guess is right, is there any way in R to force "H" to be treated
> as a completely new object (i.e. a clone-like call)?
>
> cheers, and thanks for any insight here,
>
> -m
> _______________________________________________
> igraph-help mailing list
> address@hidden
> https://lists.nongnu.org/mailman/listinfo/igraph-help