igraph-help
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[igraph] R help: Interpretation of PageRank scores for different setups(


From: danopiacek .
Subject: [igraph] R help: Interpretation of PageRank scores for different setups(weighted/mutligraph)
Date: Sat, 19 Nov 2016 11:59:01 +0100

Hello to everyone,

First of all, I would like to thank the authors of these amazing package. I use it extensively in my master thesis.
I am sorry for duplicate if there already exists same thread.

I have a huge network with couple of millions of edges and would like to calculate personalized PageRank. My network is attributed and I would like to use edge weights to calculate PageRank. I can either consider my network as directed or undirected, but in both cases my network is multigraph. I have encountered several 'strange' results when calculating weighted PageRank.

Here I will try to present it on simple networks without 'personalization'.

1a.) Example with directed symmetric network, with no multiple edges and no weights.
    A = matrix(c(0,1,1,1,0,0,1,0,0),byrow=T,nrow=3)
    g = graph.adjacency(A)

Then I compute PageRank with page_rank function and get the same result for both "PRPACK" and "ARPACK" algo. I can also compute via matrix inverse as follows:
   D = diag(1/apply(A,1,sum),nrow(A)); Beta = (1-0.85)*rep(1/nrow(A),nrow(A))
    (pr.in = Beta%*%solve(diag(1,nrow(A))-0.85*D%*%A))
And as expected I get the same PageRank score. In addition, if I consider network as undirected I get the same scores, as expected.

1b.) The same example with edge weights.

​    set.seed(1)
    E(g)$weight = round(runif(ecount(g)),1)
And again, using page_rank I get the same scores for both algorithms. Or I can get the same scores with weight matrix W as:
    W = matrix(c(0,0.3,0.4,0.6,0,0,0.9,0,0),byrow=T,nrow=3)
    D = diag(1/apply(W,1,sum),nrow(W)); Beta = (1-0.85)*rep(1/nrow(W),nrow(W))
    (pr.in = Beta%*%solve(diag(1,nrow(W))-0.85*D%*%W))
However, if I consider this network being undirected, I get different scores, namely:
    g_UN = as.undirected(g,'each') # do not collapse
    (prUn1 = page_rank(g,algo='PRPACK', directed = F)$vector)
    (prUn2 = page_rank(g,algo='ARPACK', directed = F)$vector)
    (prUn3 = page_rank(g_UN, algo='PRPACK')$vector)
    (prUn4 = page_rank(g_UN, algo='ARPACK')$vector)
Here scores given by 'ARPACK' are different to those given by 'PRPACK'. Moreover, with 'ARPACK' I get same scores for both g_UN(undirected network) and for directed=F. However 'PRPACK' gives different scores.
I can replicate results for 'ARPACK' with weight matrix, where I take sum/average of edge weights:
    W_UN = matrix(c(0,0.9,1.3,0.9,0,0,1.3,0,0),byrow=T,nrow=3)
    D = diag(1/apply(W_UN,1,sum),nrow(W_UN))
    (pr.in = Beta%*%solve(diag(1,nrow(W_UN))-0.85*D%*%W_UN))

2a.) Example with directed multigraph.

​    A2 = matrix(c(0,2,1,1,0,0,1,1,0),byrow=T,nrow=3)
    g2 = graph.adjacency(A2)
Without considering edge weights both algorithms give the same scores, which can be also found as
    D = diag(1/apply(A2,1,sum),nrow(A2))
    (pr.in = Beta%*%solve(diag(1,nrow(A2))-0.85*D%*%A2))

2b.) Same graph as undirected.
Here again both algorithms give the same score for both directed=F and as.undirected(g2,'each'), or:
    A2_UN = matrix(c(0,3,2,3,0,1,2,1,0),byrow=T,nrow=3)
    D = diag(1/apply(A2_UN,1,sum),nrow(A2_UN))
    (pr.in = Beta%*%solve(diag(1,nrow(A2_UN))-0.85*D%*%A2_UN))
gives the same scores.

2c.) Same graph with edge weights-directed.
    set.seed(1)
    E(g2)$weight = round(runif(ecount(g2)),1)
And again I get different scores
    page_rank(g2,algo='PRPACK')$vector
    page_rank(g2,algo='ARPACK')$vector
Where 'ARPACK' gives the same scores as with weight matrix W where edge weights for multiple 
edges(considering direction) are summed
    W2 = matrix(c(0,0.7,0.6,0.9,0,0,0.2,0.9,0),byrow=T,nrow=3)
    D = diag(1/apply(W2,1,sum),nrow(W2))
    (pr.in = Beta%*%solve(diag(1,nrow(W2))-0.85*D%*%W2))

2d.) Same graph with edge weights-undirected.
    g2_UN = as.undirected(g2,'each') # do not collapse
Again, 'PRPACK' and 'ARPACK' give different scores. Moreover, 'PRPACK' differs for 
    (prUn1 = page_rank(g2,algo='PRPACK',directed = F)$vector)
    (prUN2 = page_rank(g2_UN,algo='PRPACK')$vector)
, while 'ARPACK' gives same scores for both. Or with weight matrix W I can replicate 'ARPACK' by summing edge weights:
    W2_UN = matrix(c(0,1.6,0.8,1.6,0,0.9,0.8,0.9,0),nrow=3,byrow = T)
    D = diag(1/apply(W2_UN,1,sum),nrow(W2_UN))
    (pr.in = Beta%*%solve(diag(1,nrow(W2_UN))-0.85*D%*%W2_UN)) 

I have examined .c files (centrality.c, arpack.c) and I know that 'ARPACK' treats directed=F and as.undirected() same. 

So the main questions are, how does 'PRPACK' treat these two options and moreover why 'PRPACK' gives different results than 'ARPACK'/weight matrix.

Thank you very much.
Daniel Piacek

  


reply via email to

[Prev in Thread] Current Thread [Next in Thread]