[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [igraph] all_pairs shortest path
From: |
Tamás Nepusz |
Subject: |
Re: [igraph] all_pairs shortest path |
Date: |
Thu, 20 Dec 2012 17:07:45 +0100 |
> G = igraph.Graph(directed = True)
>
> 1) Adding edges:
> G.add_edge([(1,2)])
> G.add_edge([(1,2)])
> if I add twice the same edges, my digraph keeps two occurence of the same
> edges in G.get_edgelist: [(1, 2), (1, 2)]
Why is that a problem? If you add the edge twice, of course you are going to
see it multiple times in the graph.
> How can i remove duplicates?
Use the simplify() method of the graph, this will remove multiple and/or loop
edges.
> 2) Is there a way to compute once for all all shortest paths between all
> pairs?
There is -- calculate them for each source vertex and then concatenate the
resulting lists. :)
Okay, so the full story is as follows:
- If you need the _lengths_ of the shortest paths for each pair of vertices,
you can use G.shortest_paths(), which returns a matrix. In your case, it's
gonna be a 10K by 10K matrix, which is pretty big, but theoretically it's
manageable (unless you want to print the resulting matrix from IPython because
IPython will spend a lot of time with formatting the matrix). To be honest, I
have never found a use-case where the full matrix was useful for me -- I
usually calculate the shortest path from one vertex to all the others, and then
move on to the next vertex, one by one.
- If you need _one_ shortest path between all pairs of vertices, use
G.get_shortest_paths() and concatenate the resulting lists. Again, this is
going to be a huge list if you have 10K vertices:
all_paths = sum((g.get_shortest_paths(i) for i in xrange(g.vcount())), [])
all_paths will then give you exactly one possible path for each pair, so you
can simply find the path leading from vertex i to vertex j in entry (i*N+j) of
the list, where N is the number of vertices.
- If you need _all_ shortest paths between all pairs of vertices, use
G.get_all_shortest_paths(), similarly to the previous step. The only catch is
that it is harder to find all the paths between vertex i and j in this huge
list because one particular pair might occupy multiple positions in the list.
> 3) why is G.get_all_shortest_paths(1) does not work on the graph joined
> (graph.edges)?
My guess is that it works but it takes a looooooooong time because there are
too many shortest paths and get_all_shortest_paths() aims to return all of
them. For instance, there are 1116 shortest paths between vertex 1 and vertex
46 in your graph. This is not a problem on its own, but consider another vertex
X for which vertex 46 lies on the shortest path from vertex 1 to X -- this
means that vertex X will also be reachable from at least 1116 shortest paths
(since you go first from vertex 1 to vertex 46 on any of the 1116 paths, then
from vertex 46 to X). This can easily result in combinatorial explosion.
> Other remark maybe bug:
> http://stackoverflow.com/questions/13974279/igraph-why-is-add-edge-function-so-slow-ompared-to-add-edges
This is not a bug; it works as intended. The reason is that igraph uses an
indexed edge list as its data structure in the C layer. The index makes it
possible to query the neighbors of a specific vertex in constant time. This is
good if your graph rarely changes, but it becomes a burden when the
modification operations are far more frequent than the queries, since whenever
you add or remove an edge, you have to update the index. So, every call
toadd_edge will make igraph reindex its internal data structures. The upside is
that if you have to rebuild the index anyway, you can just as well add many
edges using add_edges with approximately the same cost. So, in your first code
example, you rebuild the index 30000 times, while in the second example, you
rebuild the index only once.
Best,
Tamas