igraph-help
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [igraph] Memory-Mapped Graphs


From: Tamas Nepusz
Subject: Re: [igraph] Memory-Mapped Graphs
Date: Tue, 24 Feb 2015 16:27:45 +0100
User-agent: Mutt/1.5.23 (2014-03-12)

> I would like to store that graph in a persistent format, and it seems like
> igraph using a memory-mapped file (mmap) would do the trick. Has anyone
> attempted that in the past?
I don't think so, and actually, I'm not convinced about the advantages of this
approach -- the thing is that igraph would have to load the entire graph into
memory at some point anyway, so you cannot avoid reading through the whole
file.

I assume that you are using the edge list format (which is the fastest to load)
and I did some profiling on my machine with a graph the size of yours (but
generated with igraph_erdos_renyi_game). On my machine, it took ~25 seconds to
load the entire graph, and about 38.7% of that time was spent in
igraph_create() which converts an edge list (presented as a list of numbers)
into an igraph_t object. This is the part that cannot be avoided, no matter
what format you use to store your graph on the disk.

The part where we can probably save some time is the calls to fscanf(), getc(),
ungetc() and feof(), which accounted for 31.9% + 12.3% + 6.6% + 5.8% of the 
time,
respectively. These calls are necessary due to how the nodes are stored in the
file: they are written as strings, which have to be parsed. You could probably
save a lot of time by using a binary graph format and then creating an
appropriate reader and writer for that. I am thinking about something like
this:

first, use 4 bytes to store the number of vertices and maybe one byte to store
whether the graph is directed or undirected
then, for each vertex,
    store the number of neighbors in 4 bytes
        and then store the indices of the actual neighbors on 4 bytes each

This has the advantage that there is no need to call fscanf(), getc(), ungetc()
and feof() -- you can simply read chunks of four bytes as needed and there is
no need to check for EOF repeatedly because you "know" in advance how many
vertices there will be. It also uses a more compact representation because
n edges originating from the same vertex use only (n+1)*4 bytes, so there will
be less data to read from the disk. Also, due to the way how the edge list is
stored in this case, the edge list will be sorted by source vertices by
default, and this might also save some more time in the call to igraph_create()
later on.

Actually, igraph already contains a reader which is very similar to this; it is
called igraph_read_graph_graphdb(). The only difference is that it handles
undirected graphs only and it stores vertex indices on two bytes so it
scales up to 65535 vertices only. It is probably easy to adapt the source code
of igraph_read_graph_graphdb() to make it use four bytes per vertex and then
you are essentially ready. (And maybe it would be a nice addition to the core
library as well, although it's probably Gabor's call in the end and not mine).

All the best,
-- 
T.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]