
Re: [igraph] 'decompose.graph' versus 'clusters'


From: David Hunkins
Subject: Re: [igraph] 'decompose.graph' versus 'clusters'
Date: Sat, 19 Jul 2008 10:38:51 -0700

Thanks, Gabor. Don't you take a break on the weekends?! I'm going to try what you suggest, since that seems the best path to finding the most interesting 'connectors' in the large clusters. By eliminating clusters up to size 5, I can get rid of 73% of the clusters, which should shrink the c in the O(c(E+V)) running time. I am loath to kill the clusters up to size 10, since those represent 'interesting' green growth of the social network (and dropping them only removes another 10%).

Also, thanks for the new version of betweenness -- I'll let you know the results. Because 'clusters' appears to be performing so well, could you look at the following and tell me what you think?

I ran some tests comparing 'clusters' with 'decompose.graph' and got the following results, which suggest that 'clusters' is very much faster than 'decompose.graph':

vertices    clusters     decompose.graph
5k          0.005 sec    1.155 sec
10k         0.008 sec    3.824 sec
20k         0.008 sec    12.257 sec
5M          2.905 sec    <did not finish after 24 hours>
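
For reference, timings like these can be collected with system.time(); a minimal, untested sketch, assuming a test graph 'g':

library(igraph)

# elapsed time for each operation on the same graph 'g'
t.cl  <- system.time(cl <- clusters(g))
t.dec <- system.time(comps <- decompose.graph(g))
t.cl["elapsed"]
t.dec["elapsed"]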

'clusters' is amazingly fast. I hadn't realized that constructing the subgraphs from a large graph would be so computationally intensive compared with just identifying the members of each cluster. But I wonder: if the subgraphs were constructed at the same time as the cluster members are detected, would it run much faster? I realize I'm in over my head, so feel free to let that one go...

Dave

David Hunkins
im: davehunkins
415 336-8965


On Jul 19, 2008, at 2:18 AM, Gabor Csardi wrote:

Hi David,

yes, you're right: decompose.graph is not O(V+E), it is in fact O(c(V+E)),
where 'c' is the number of components. I'll correct that in the docs.

'clusters' gives back the membership of each vertex in its 'membership'
component, so you could use this to create the subgraphs yourself. But doing
that for every component does not make much sense: it is exactly what
decompose.graph does internally, so it would be just as slow.
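
For a single component, though, this is cheap. A minimal sketch for extracting the largest component (untested; assuming the 0.5-era API, where component ids in 'membership' are zero-based and subgraph() builds the induced subgraph):

library(igraph)

cl    <- clusters(g)                       # membership + component sizes
giant <- which.max(cl$csize) - 1           # id of the largest component
                                           # (zero-based, by assumption)
g.max <- subgraph(g, V(g)[cl$membership == giant])  # induced subgraph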

What you can try is to eliminate the trivial components from your graph
first, i.e. the ones with one or two vertices, maybe up to ten, and then (if
far fewer components are left) decompose the graph. Remember, however, that
you cannot run betweenness on a graph with a hundred thousand vertices or
more. Most networks have a giant component, so if you have 5M vertices in
the full graph, you might still end up with 1M in the largest component.
Check this first with 'clusters'.
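
A sketch of this filtering step (untested; the size-5 threshold is just an example, and again zero-based component ids are assumed):

library(igraph)

cl    <- clusters(g)
max(cl$csize)                              # size of the giant component

small <- which(cl$csize <= 5) - 1          # ids of components with <= 5 vertices
keep  <- V(g)[!(cl$membership %in% small)] # vertices outside those components
g.big <- subgraph(g, keep)                 # trivial components removed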

I've been working on speeding up betweenness.estimate; it is much better
now, but of course I'm still not sure that it is fast enough for your graph.
That depends on the graph structure as well, not only on the size of the
graph. You can give it another try; here is the new package:
http://cneurocvs.rmki.kfki.hu/igraph/download/igraph_0.6.tar.gz

I think a viable approach could be to
1) eliminate the small clusters from the graph,
2) decompose the remainder into components, and
3) run betweenness.estimate on the components, with cutoff=2 or 3.
It is a question, however, whether such a small cutoff is enough. A sketch
of the whole pipeline follows.
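
Putting the three steps together (an untested sketch under the same assumptions as above; the size-5 threshold and cutoff=3 are just examples):

library(igraph)

# 1) drop the trivial components (size <= 5 here)
cl    <- clusters(g)
small <- which(cl$csize <= 5) - 1          # zero-based component ids assumed
g.big <- subgraph(g, V(g)[!(cl$membership %in% small)])

# 2) decompose what is left into connected components
comps <- decompose.graph(g.big)

# 3) estimated betweenness per component, with a small cutoff
bet <- lapply(comps, betweenness.estimate, cutoff = 3)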

Speeding up decompose.graph has been on the TODO list for a long time; I
have given it higher priority now.

G.

On Fri, Jul 18, 2008 at 01:31:35PM -0700, David Hunkins wrote:
Hi, I'm working on a large disconnected graph (5M vertices, 10M edges, 500k
clusters). I'm trying to reduce the time it takes to compute betweenness for
each vertex by breaking the graph up into connected components. Decompose.graph
does this in a very convenient way, since it returns graph objects that I can
run betweenness on:

comps <- decompose.graph(g10k)            # one graph object per component
for (i in 1:length(comps)) {
  # write vertex id and betweenness, one CSV row per vertex
  write(rbind(V(comps[[i]])$id, betweenness(comps[[i]])),
        file = "outfile", nc = 2, sep = ",", append = TRUE)
}

However, decompose.graph is very slow compared with clusters, which appears
to do the same thing in almost no time. (I can compute no.clusters on my
graph in a few seconds, whereas decompose.graph, run on the same graph, does
not finish in 24 hours.) The docs for the C functions indicate that
'clusters' and 'decompose.graph' both have O(V+E) time complexity, but I
have not found this to be true.

It appears that others have selected 'clusters' to partition large graphs:

http://lists.gnu.org/archive/html/igraph-help/2007-12/msg00046.html

Does anybody have some R 'glue code' that makes clusters return a list of
graphs like decompose.graph does? (I'm an R newbie.) Or other suggestions /
clarifications?
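
One possible glue function (an untested sketch with a hypothetical name, assuming zero-based component ids as in igraph 0.5; note that rebuilding every subgraph this way is essentially what decompose.graph already does, so it is unlikely to be faster):

library(igraph)

# split a graph into per-component subgraphs via clusters()
clusters.to.graphs <- function(g) {
  cl <- clusters(g)
  lapply(seq_len(cl$no) - 1,               # component ids 0 .. no-1
         function(i) subgraph(g, V(g)[cl$membership == i]))
}

comps <- clusters.to.graphs(g10k)          # behaves like decompose.graph(g10k)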

Thanks,

Dave

David Hunkins
address@hidden
im: davehunkins






--
Csardi Gabor <address@hidden>    UNIL DGM



