igraph-help
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [igraph] Slow community detection -- Infomap with problems for large


From: Tamas Nepusz
Subject: Re: [igraph] Slow community detection -- Infomap with problems for large graphs?
Date: Mon, 11 Jul 2016 17:11:55 +0200

Hi,

> I just want to report, that igraph-R Infomap has still not finished computing 
> after 165,657 minutes (16+ weeks). Compared to the same community detection 
> done in C++ implementation (4 min) this is a factor of 41414. That makes me 
> wonder whether or not the Infomap implementation in igraph may have a problem.

The Infomap implementation in igraph is identical to the last
implementation of Infomap that was released by the authors under the
GNU General Public License - the only part that was added is the
conversion to/from igraph's graph data type. Unfortunately this was a
long time ago -- Infomap has improved a lot, and as far as I know it
was completely rewritten from scratch. The rewritten C++
implementation was released under the AGPL, which is not compatible
with igraph's license, so we cannot include that in igraph.

So, there are three possible cases here:

1. There is a bug somewhere in the original implementation of Infomap
that we have included in igraph that yields such a large runtime for
large graphs.

2. There is a bug between the conversion from/to igraph's data type
and the Infomap algorithm, and this is the cause of the large runtime
that you saw.

3. There is no bug, it just happens to be the case that the old
Infomap implementation is not that performant on large graphs.

Comparing the implementation in igraph with the _current_ (i.e.
re-written, AGPL-based) Infomap implementation does not rule out any
of these possibilities, unfortunately.

> It does seem to work fine for small graphs but does not stop computing in a 
> reasonable time for large graphs. Is there a way for me to provide you with 
> more detailed information?

Profiling the current implementation could help us pinpoint where most
of the CPU time is spent. I could try profiling the implementation
with Instruments.app on Mac OS X, but I'm quite short on spare time so
it would help me a lot if you could upload an example graph somewhere
and create a small, self-contained C file that loads the graph and
runs Infomap on it. I could then compile it and load it into
Instruments.app to find the hotspots, and then we'll see whether it
could be fixed easily without modifying the source of the algorithm
too much.

T.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]