I have a graph with about 100k nodes and I'm running personalized PageRank from a set of about 100-1000 nodes. I got really weird results with the default implementation, PRPACK, so I started comparing the different implementations.
i pr node_index indegree
0 0.0234442644132 289 185
1 0.0196580845557 3865 205
2 0.0117400740699 166 248
3 0.0091531577429 5289 38
4 0.00904311638885 288 82
5 0.00836828243054 5324 39
6 0.00824148456229 350 43
7 0.00751007382417 1257 62
8 0.00749031863684 1293 52
9 0.00734012549007 231 72
Power (slow)
i pr node_index indegree
0 0.152992115688 356 40
1 0.152981241191 6256 3
2 0.13547184379 6332 2
3 0.124488680058 2736 1
4 0.118762538775 1163 2
5 0.115458144486 1569 1
6 0.115458144486 1551 1
7 0.0127839057389 166 248
8 0.0109754101098 1097 18
9 0.0109437441275 2360 6
So the two PageRank distributions are very different. It also seems weird that nodes with indegree zero would get such a high PageRank (damping=0.85, the default). It looks as if the iteration stops before it has converged.
If this is just a matter of an "eps" tolerance in the optimization routine, how can I control it? I found nothing on how to use the ARPACK options, and it also seems that the package has defaulted to PRPACK since 0.7. What can I do to get results that I can trust?
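As a sanity check that does not depend on any library, a minimal power-iteration sketch of personalized PageRank with an explicit tolerance could look like the following. It is plain Python, not igraph code; the function name, the `tol` and `max_iter` parameters, and the toy graph are all made up for illustration. It also assumes that both the restart mass and the mass of dangling nodes go back uniformly to the seed nodes, which may differ from what a given library does. Under that assumption, a node with indegree zero that is not a seed ends up with a score of exactly zero.

```python
def personalized_pagerank(n, edges, seeds, damping=0.85, tol=1e-12, max_iter=10000):
    """Power iteration for personalized PageRank on a directed graph.

    n: number of nodes, labelled 0..n-1
    edges: iterable of (source, target) pairs
    seeds: nodes the random walk restarts from (uniformly)
    Returns (scores, iterations_used).
    """
    out_neighbors = [[] for _ in range(n)]
    for source, target in edges:
        out_neighbors[source].append(target)

    # Restart distribution: uniform over the seed nodes, zero elsewhere.
    seed_set = set(seeds)
    reset = [1.0 / len(seed_set) if v in seed_set else 0.0 for v in range(n)]

    scores = reset[:]
    iteration = 0
    for iteration in range(1, max_iter + 1):
        new_scores = [0.0] * n
        dangling_mass = 0.0
        for v in range(n):
            if out_neighbors[v]:
                share = scores[v] / len(out_neighbors[v])
                for w in out_neighbors[v]:
                    new_scores[w] += share
            else:
                # Assumption: mass stuck on nodes without out-edges
                # is sent back to the seeds.
                dangling_mass += scores[v]
        for v in range(n):
            new_scores[v] = (damping * (new_scores[v] + dangling_mass * reset[v])
                             + (1.0 - damping) * reset[v])

        # Stop only when the L1 change between iterations drops below tol.
        change = sum(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if change < tol:
            break
    return scores, iteration


# Toy graph: a 3-cycle 0 -> 1 -> 2 -> 0, plus node 3 with indegree zero.
toy_edges = [(0, 1), (1, 2), (2, 0), (3, 0)]
toy_scores, toy_iterations = personalized_pagerank(4, toy_edges, seeds=[0])
for node, score in enumerate(toy_scores):
    print(node, score)
print("iterations:", toy_iterations)
```

Running the same loop with a loose and a tight `tol` shows how much the ranking still moves, which is the kind of comparison that would tell a premature stop apart from a real difference between implementations.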
Thanks,
Tim