Re: [igraph] Performance issue regarding when calculating induced

Hello Tamas,

Thank you for your answer! I tried what you suggested, but there was no significant
improvement in the runtime. I collected some results below:

Loop without rbinding:

system.time(for (i in 1:100) {induced_subgraph(g, clustm[clustm[, ".id"] == i, 2])$vector })
   user  system elapsed 
  16.96    1.28   18.24

lapply:

system.time(lapply(clust[1:100], function(x) { authority_score(induced_subgraph(g, x))$vector }))
   user  system elapsed 
  13.93    0.95   14.92

Calculating only induced subgraphs:

system.time(c(induced_subgraph(g, clust[[1]]),induced_subgraph(g, clust[[2]]),induced_subgraph(g, clust[[3]]),induced_subgraph(g, clust[[4]]),induced_subgraph(g, clust[[5]])))
   user  system elapsed 
   0.55    0.03    0.57

Based on the last one and a rough calculation (0.57 s for 5 subgraphs, scaled up to
100), preparing the induced subgraphs 100 times takes approx. 11.4 seconds, and the
remainder of the lapply() runtime is everything else. Do you have any idea how to
further speed up the calculation?

Two ideas came to mind, but I don't really know how to implement them. Both are
based on the assumption that the problem is the size of the graph. The first is
that I could reduce the runtime if I could delete from the original graph those edges
that have a node in a cluster that has already been processed. However, the impl
parameter of induced_subgraph() offers the create_from_scratch and copy_and_delete
options, and I don't know how to do this without a full scan of graph g for each node
in the cluster. The second idea is to somehow pair the cluster ids with the vertices
of the original graph g and use them as a kind of index, which could help to avoid
checking the whole graph.

If you could suggest how to implement the above solutions, or if you have a better
idea, I would be very grateful!

Thank you in advance,

Roland
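
The second idea above (using the cluster ids as an index on the vertices) could look
roughly like the sketch below. It is only an illustration under assumptions: the toy
graph from sample_gnp() stands in for the real network, the variable names are made
up, and whether it is actually faster on the 3.000.000-edge graph has not been
measured here. The point is that the edge list and the membership vector are read
once, and each cluster's graph is then built from its own edges only.

```r
library(igraph)

# Toy undirected graph standing in for the real network (hypothetical data).
set.seed(42)
g <- sample_gnp(200, 0.05)

# One cluster id per vertex: this is the "index" from vertex to cluster.
memb <- as.integer(membership(cluster_label_prop(g)))

# Endpoints of every edge as vertex ids, looked up once for the whole graph.
el <- as_edgelist(g, names = FALSE)
from_cl <- memb[el[, 1]]
to_cl <- memb[el[, 2]]

# Keep only edges whose two endpoints share a cluster, grouped by cluster id.
intra <- from_cl == to_cl
edges_by_cluster <- split(which(intra), from_cl[intra])

# Build each cluster's graph from its own rows of the edge list.
# Vertices are named after their ids in g; clusters without any internal
# edge (e.g. single-vertex clusters) do not appear in the result.
sub_graphs <- lapply(edges_by_cluster, function(eids) {
  graph_from_data_frame(as.data.frame(el[eids, , drop = FALSE]),
                        directed = FALSE)
})
```

Each element of sub_graphs can then be passed to authority_score() in the same way
as the result of induced_subgraph() in the lapply() call above.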

Hi,

I suspect that rbind() is the bottleneck in your for loop because
it always creates a copy of your existing matrix when adding a new row
to it; use lapply() instead:

auth_scr <- lapply(clust, function(x) {
  authority_score(induced_subgraph(g, x))$vector
})

T.
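
To illustrate the rbind() point in isolation, here is a small base-R sketch. It
assumes made-up score vectors (the `pieces` list is hypothetical and merely stands in
for the per-cluster results), and the function names are invented for the example. It
compares growing a matrix row by row with collecting everything first and combining
once; both produce the same one-column matrix.

```r
# Hypothetical stand-in for the per-cluster score vectors (not the real data).
set.seed(1)
pieces <- lapply(1:2000, function(i) runif(sample(1:5, 1)))

# Growing with rbind(): every iteration copies all rows accumulated so far.
grow <- function(pieces) {
  out <- as.matrix(pieces[[1]])
  for (p in pieces[-1]) {
    out <- rbind(out, as.matrix(p))
  }
  out
}

# Collect first, combine once: a single allocation for the final result.
combine_once <- function(pieces) {
  as.matrix(unlist(pieces))
}

t_grow <- system.time(a <- grow(pieces))
t_once <- system.time(b <- combine_once(pieces))
```

Comparing t_grow and t_once shows how much of the loop time is spent on copying
rather than on the graph computations themselves.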


On Sat, Apr 23, 2016 at 10:55 AM, Roland Sztaho <address@hidden> wrote:
> Hello Guys,
>
> I would like to prepare some stats based on clusters made by
> cluster_label_prop() method. When I use authority_score() and
> induced_subgraph() within a loop, it shows weak performance and I am looking
> for a workaround to speed up the code. My network contains approx. 3.000.000
> edges and 250.000 clusters. I think calculating these measures for the
> clusters without the loop could be a good starting point; however, I didn't
> find a way to do this in the docs.
>
> R version 3.2.5 (2016-04-14)
> igraph package version 1.0.1
> OS: windows 7 64bit
>
> Here is what I have so far:
>
> library("igraph")
> library("plyr")
> setwd("C:/Projects/R/igraph")
>
> g_in = read.csv("my_network.csv", sep=" ")
> g = graph.data.frame(g_in, directed=FALSE)
>
> clust = groups(cluster_label_prop(g, weights=E(g)$weight))
> clust_count <- length(clust)
> clustm <- as.matrix(ldply((clust), data.matrix))
> clustm1 <- clustm[clustm[, ".id"] == 1, 2]
>
> g_sub = induced_subgraph(g, clustm1)
> auth_scr = as.matrix(authority_score(g_sub)$vector)
> for (i in 2:clust_count) {
>   g_sub = induced_subgraph(g, clustm[clustm[, ".id"] == i, 2])
>   auth_scr = rbind(as.matrix(auth_scr),
>                    as.matrix(authority_score(g_sub)$vector))
> }
>
> Any suggestions would be great!
>
> Thanks,
> Roland

From:	Roland Sztaho
Subject:	Re: [igraph] Performance issue regarding when calculating induced_subgra
Date:	Tue, 26 Apr 2016 09:29:24 +0200
