[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [igraph] "reasonable" size graphs for community detection
From: |
Minh Nguyen |
Subject: |
Re: [igraph] "reasonable" size graphs for community detection |
Date: |
Mon, 30 Jul 2012 12:13:11 +1000 |
Hi Ross,
On Mon, Jul 30, 2012 at 11:44 AM, Ross Gayler <address@hidden> wrote:
> I want to find communities in a graph of ~1M vertices and will be
> running the software on a desktop PC with 16GB RAM and running 64-bit
> linux. Is it reasonable to expect I could perform that kind of
> analysis on that kind of hardware? What order of magnitude run-time
> should I expect (seconds, hours, days, weeks)?
The algorithm of (Blondel et al. 2008)
http://arxiv.org/abs/arXiv:0803.0476
is very fast. Its R interface is
http://igraph.sourceforge.net/doc/R/multilevel.community.html
and its C interface is
http://igraph.sourceforge.net/doc/html/ch22s06.html#igraph_community_multilevel
I haven't used the R interface. But from my experience with its C
interface, I was able to extract communities from the whole DBLP
dataset
http://www.informatik.uni-trier.de/~ley/db/
in about one hour. (I can't remember exactly the time, but I'm sure it
was less than 24 hours.) The DBLP dataset has over 1,058,000 nodes.
Using the R interface would take a little longer I suspect. Other
algorithms that are implemented in igraph have required over a day to
extract communities from DBLP. In particular, I don't recommend that
you use the following to handle networks with over one million nodes:
* http://igraph.sourceforge.net/doc/R/fastgreedy.community.html
* http://igraph.sourceforge.net/doc/R/label.propagation.community.html
* http://igraph.sourceforge.net/doc/R/leading.eigenvector.html
> The above case of one big graph is probably the worst case scenario.
> I suspect that the edge density is rather low and that the 1M vertices
> can probably be partitioned into thousands of totally disjoint
> subgraphs (which may consitute a reasonable definition of communities
> - but I want to allow for the possibility that there may be
> communities within large, but loosely connected subgraphs). Would
> that partitioning have a large impact on the runtime of community
> detection?
I don't know. It can depend on which algorithm you use to extract
communities. If you want a quick and dirty answer, use (Blondel et
al. 2008).
--
Regards,
Minh Van Nguyen
http://bit.ly/mvngu