From: David Hunkins
Subject: Re: [igraph] 'decompose.graph' versus 'clusters'
Date: Tue, 22 Jul 2008 15:56:04 -0700
Hi Gabor,

I tried what you suggested, and I can see the merit of the approach. On my first attempt, I trimmed all clusters smaller than 20 members:

trim <- function(G) {
  cls <- clusters(G)
  # vertex and cluster ids are 0-based in this igraph version, hence the -1
  smallcls <- which(cls$csize < 20) - 1
  ids_to_remove <- which(cls$membership %in% smallcls) - 1
  delete.vertices(G, ids_to_remove)
}

I then removed the largest cluster using:

remove_largest <- function(G) {
  cls <- clusters(G)
  maxcsize <- max(cls$csize)
  ids_in_largest <- which(cls$membership %in% (which(cls$csize == maxcsize) - 1)) - 1
  other_ids <- which(cls$membership %in% (which(cls$csize < maxcsize) - 1)) - 1
  list(delete.vertices(G, other_ids), delete.vertices(G, ids_in_largest))
}

I took the second component returned, and was able to decompose it and run betweenness on it:

tween <- function(G, OF) {
  comps <- decompose.graph(G)
  for (i in 1:length(comps)) {
    # one "id,betweenness" line per vertex
    write(rbind(V(comps[[i]])$id, betweenness(comps[[i]])),
          file = OF, nc = 2, sep = ",", append = TRUE)
  }
}

This gives me betweenness data for the large clusters (but not the small ones or the largest one), about 200K vertices out of my set of 5M vertices. I would really like to get the betweenness measure for the entire dataset, and I think it's within reach. I tried adding an additional step:

partition <- function(G) {
  cls <- clusters(G)
  g0ids <- which(cls$membership %% 4 == 0) - 1
  g1ids <- which(cls$membership %% 4 == 1) - 1
  g2ids <- which(cls$membership %% 4 == 2) - 1
  g3ids <- which(cls$membership %% 4 == 3) - 1
  list(delete.vertices(G, c(g1ids, g2ids, g3ids)),
       delete.vertices(G, c(g0ids, g2ids, g3ids)),
       delete.vertices(G, c(g0ids, g1ids, g3ids)),
       delete.vertices(G, c(g0ids, g1ids, g2ids)))
}

That is, I partitioned the set into four graphs and applied the same process to each, this time on an Amazon EC2 instance with 15GB of physical memory, 4 CPU cores, and a 64-bit Fedora OS. (I ran each partition in a separate, parallel instance of R.) Each partition contains about 600K vertices and 600K edges, and consists of about 60K clusters.

However, each time I run this, all of the instances terminate independently about four hours later (at slightly different times) with the following error:

Error: protect(): protection stack overflow
Error: protect(): protection stack overflow
Execution halted

The error occurs while decompose.graph is running; the CPU is at 100%, there is no swapping, and there is 10GB of free memory. Do you think this is coming from R or from igraph? Is there an R parameter or igraph parameter I can tune to get around this? Any help would be appreciated.

My next step will be to try subdividing into 8 partitions, then 16, until I can complete the run. But of course, each run on EC2 costs $10 or so! :-)

Thanks very much!

Dave

David Hunkins
im: davehunkins
415 336-8965

On Jul 19, 2008, at 2:18 AM, Gabor Csardi wrote:
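
Putting the pieces together, the steps described in the message above chain up roughly like this. This is only a minimal sketch, not the exact script run on EC2 (there, each element of 'parts' went into its own R instance); 'g' stands for the already-loaded 5M-vertex graph carrying the 'id' vertex attribute, and the output file names are placeholders:

library(igraph)

g2    <- trim(g)                  # drop clusters with fewer than 20 vertices
g3    <- remove_largest(g2)[[2]]  # second component = everything except the giant cluster
parts <- partition(g3)            # spread the remaining clusters over four graphs

for (i in seq_along(parts)) {
  tween(parts[[i]], paste("betweenness_part", i, ".csv", sep = ""))
}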