igraph-help
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [igraph] Performance issue regarding when calculating induced_subgra


From: Tamas Nepusz
Subject: Re: [igraph] Performance issue regarding when calculating induced_subgra
Date: Tue, 26 Apr 2016 11:23:22 +0200

Hi,

> Loop without rbinding:
>
> system.time(for (i in 1:100) {induced_subgraph(g, clustm[clustm[, ".id"] ==
> i, 2])$vector })
>    user  system elapsed
>   16.96    1.28   18.24
You have left out probably the most time-consuming part of the entire
operation: the rbinding part :) The differences will not be obvious at
such a small scale (i.e. 100 clusters only), though. The problem with
rbind() is that it adds a new row to a matrix by creating a whole new
matrix, adding the new row to the copy and then returning the copy.
The copying will take a significant amount of time once the size of
the matrix becomes large. So, by all means, don't use rbind(), use
lapply().

> Based on the last one and on a rough calculation, preparing induced subgraph 
> 100 times takes approx. 11.4 seconds and lapply is the rest.
Well, it all depends on the sizes of the induced subgraphs - preparing
a larger induced subgraph is going to be much slower than preparing a
small one. So, if the sizes of your clusters are not evenly
distributed, taking the first few of them might not be a reliable
sample from which you could reasonably extrapolate the whole runtime.

> Do you have any idea how to further speed up the calculation?
I'm not sure whether this will be a speedup in the end or a slowdown,
but here's an alternative approach. It seems like the bottleneck in
your code is the repeated calls to induced_subgraph(), and the only
reason why we need that is because the authority scores cannot be
calculated by igraph on a part of a graph only. However, the authority
scores are simply the principal eigenvector of A^T * A, where A is the
adjacency matrix of the graph and A^T is its transposed.
_Theoretically_, you could use as_adjacency_matrix(g, sparse=T) to get
a sparse representation of the adjacency matrix, then for each
cluster, take the appropriate submatrix B of A and call the arpack()
function to find the principal eigenvector of B^T * B by providing it
with a multiplication function that takes an arbitrary vector and
multiplies it by B^T * B. Something along the lines of:

library(Matrix)
a <- get.adjacency(g, sparse=T)

get.authority.score.for.cluster <- function (members) {
    b <- a[members, members]
    bt <- t(b)
    func <- function(x, extra=NULL) { as.vector(bt %*% (b %*% x)) }
    scores <- arpack(func, options=list(n=length(members), nev=1,
ncv=3), sym=T)$vectors
    scores <- scores / max(scores)
}

lapply(get.authority.score.for.cluster, clust)

I haven't tested the code above so it might not work or give erroneous
results, but it shows the general idea. I'm not sure whether it would
be faster, though - there could be another bottleneck here, which is
that the arpack() function (which runs in the C layer) has to call the
matrix multiplication function 'func' in the R layer, and the
translation between the two layers is costly. But at least it saves
you from having to call induced_subgraph().

T.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]