[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [igraph] Python - Creating new vertices and edges
From: |
Tamás Nepusz |
Subject: |
Re: [igraph] Python - Creating new vertices and edges |
Date: |
Thu, 27 Feb 2014 10:51:47 +0100 |
Hi,
> I was wondering what's the rationale behind not returning a reference object
> to newly created vertices and edges in Python (or at least an index)?
References to newly created vertices and edges are not returned because there
is actually no new vertex/edge object created when you do the addition. The
core of igraph is written in C, so adding a vertex involves increasing a
variable that stores the number of vertices, plus extending a few index
vectors; similarly, adding an edge involves extending some vectors and
recreating some indices. The Vertex and Edge objects you see in Python are
actually dead simple proxies that store a reference to the graph they are a
part of, plus the index of the vertex or edge that they represent. Creating
such a proxy object just for the sake of returning it is an overhead,
especially considering that they will be thrown away immediately.
You are right that we could at least return the index of the newly created edge
or vertex; we do not do this because igraph simply allocates the next available
vertex or edge ID. So, if you have n vertices and add one more, the newly
created vertex will have ID=n (and the same for edges).
> If I am reading a large graph file (special-format, let's assume), checking
> if a vertex (or an edge between two vertices) has been already created is
> really problematic.
How about using g.are_connected(source, target)? This method automatically
accepts vertex names and converts them to vertex IDs.
> I am assuming that "find" takes O(n) time where n is number of nodes. So this
> is an O(n*n) time operation.
True, but most methods accept vertex names in place of vertex IDs, and they use
an internal dictionary that maps vertex names back to vertex IDs to do the
conversion. So, for instance, using g.vs.find(name=“whatever”).degree() is
O(n), but using g.degree(“whatever”) is O(1). This is because the find() method
cannot assume that the VertexSeq it is called on represents the entire vertex
set (it might represent a subset or a permutation of it) so we cannot use a
plain dictionary lookup.
For what it’s worth, my goal is to allow the usage of vertex names in place of
vertex IDs for every igraph method that accepts vertices or lists of vertices.
Some of the methods have slipped through the cracks, so for instance
g.get_eid(“A”, “B”) does not work yet (even though g.are_connected(“A”, “B”)
does). If you find a method that should accept vertex names but does not accept
it yet, please file a bug report at https://github.com/igraph/igraph and I’ll
fix it.
By the way, if you are constructing a large graph with repeated calls to
add_vertex() and add_edge(), then you will probably encounter another
bottleneck as your graph grows. igraph’s data structures are optimized for
static graphs (i.e. graphs where vertex and edge mutations are far less
frequent than queries). Every time you add an edge, some internal data
structures have to be rebuilt from scratch, which effectively means that it is
almost as fast to add, say, thousands of edges at the same time as the addition
of a single edge. In most cases, it is way faster to create the edge list of
your graph in advance and then add all the edges at once.
Alternatively, you may take a look at the Graph.DictList() constructor, which
takes two iterables, one for the vertices and one for the edges. Both iterables
should yield dictionaries; keys of the dictionaries are converted to
vertex/edge attribute names and the corresponding values are converted to
vertex/edge attribute values. The “source” and “target” keys in the
dictionaries corresponding to edges are automatically used to look up the
source and target vertex efficiently, and Graph.DictList() is also capable of
“batching” edge additions in larger chunks to alleviate the overhead of
repeated calls to add_edges().
All the best,
T.