igraph-help
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [igraph] personalized pagerank computation issue


From: Yalcin, Omer Faruk
Subject: Re: [igraph] personalized pagerank computation issue
Date: Wed, 29 Jan 2020 04:23:25 +0000

Thank you very much for these, they provide great insight into what went into the decision making when they were developed.

When I have the time, I'll look into which of the approaches seem to be more favored in the literature (in terms of the vector to use when teleporting from a dangling node).

Thanks a lot for the help,
Best,
Omer



From: Tamas Nepusz <address@hidden>
Sent: Tuesday, January 28, 2020 9:34 AM
To: Yalcin, Omer Faruk <address@hidden>
Cc: Help for igraph users <address@hidden>
Subject: Re: [igraph] personalized pagerank computation issue
 
Hi Omer,

Your comment rang a bell -- I remembered that this issue has already popped up back in 2014 when we first switched from ARPACK to PRPACK; see the following issue in the issue tracker:


Memories of mine about PRPACK were much fresher by then as I wrote the following:
  • with the ARPACK implementation, the random walker teleports according to the distribution of the reset vector when it encounters a dangling node
  • with the PRPACK implementation, the random walker teleports according to a uniform distribution by default.
After reading the thread, it also became apparent that PRPACK is doing some trickery with sink nodes; the following is from @dgleich, one of the original authors of PRPACK:

"For the case that you are jumping to nodes that have no outgoing edges, what happens is you add new edges according to the teleportation set/reset set or the special set "u"."

So, all in all, the case is probably that your implementation matches the ARPACK implementation of PRPACK from igraph, and that's why you are seeing identical results (both your code and igraph's ARPACK implementation formulates PageRank as an eigenvector problem and solves that). In case of PRPACK, PageRank is not an eigenvector problem any more; again, quoting @dgleich:

"PRPACK does decompose the graph into SCCs, but the primary advantage is that it frames the PageRank problem as a linear system instead of the eigenvalue problem. This has tremendous numerical advantages."

There was also a PR quite a while ago that attempted to introduce the possibility of specifying the personalization vector and the reset vector separately, but it did not get merged in the end:


Another issue where this question popped up -- maybe it also provides more insight:


Best,
T.


On Mon, 27 Jan 2020 at 23:17, Yalcin, Omer Faruk <address@hidden> wrote:
Thank you very much for tracking the code! Unfortunately, that doesn't work either. I am also fairly certain that allowing the node to stay at the same spot would give that node an unwarranted boost in pagerank, so it is probably undesirable.

I do have an interesting result though; when I use "arpack" instead of the default "prpack", I get the exact same results as my custom written function. In other words, when there are nodes that have no outgoing edges, "prpack" and "arpack" do the computation differently.

My problem seems to be solved (as long as there is no reason why "arpack" is wrong) but this difference between the two algorithms might be of interest to you.

Thank you very much.
Omer

From: Tamas Nepusz <address@hidden>
Sent: Monday, January 27, 2020 4:15 PM
To: Yalcin, Omer Faruk <address@hidden>
Cc: Help for igraph users <address@hidden>
Subject: Re: [igraph] personalized pagerank computation issue
 

That being said, after your question, I set the probability of navigating to other nodes from a node that has no outbound links to the personalization vector. That doesn't reproduce the igraph result either.
There's also a third option: if there are no outbound nodes, stay at the same node with probability equal to 1-damping, _or_ navigate to a randomly picked node accoding to the persionalization vector with probability equal to damping. Sorry for not being too precise here; the thing is that igraph is using an external library (PRPACK) to calculate personalized PageRank scores, and I only managed to track the code to a point where I am convinced that we are passing down two  vectors to PRPACK; one is a uniform vector, and the other one is the personalization vector submitted by the user. Based on this, I would assume that PRPACK uses the personalization vector when the random walk is reset, and the uniform vector for a random teleport (after all, why would PRPACK need two vectors if it used the personalization vector for both cases?), but I did not manage to track it down further because PRPACK contains at least six different solvers, optimized for different use-cases, and I did not manage to figure out which one it would use in your particular case. But I'm pretty sure that the discrepancy between your results and ours is due to some corner case in the handling of sink nodes.

T.

reply via email to

[Prev in Thread] Current Thread [Next in Thread]