Re: [igraph] Memory-Mapped Graphs

igraph-help

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [igraph] Memory-Mapped Graphs

From:	Tamas Nepusz
Subject:	Re: [igraph] Memory-Mapped Graphs
Date:	Wed, 25 Feb 2015 22:22:38 +0100
User-agent:	Mutt/1.5.23 (2014-03-12)

> Thanks, Tamas. I'll add a little more color to what I'm doing.
Thanks for the clarification, it is a bit clearer now.

> The advantage I see of memory mapping is that if we can get to the point
> where a particular igraph_t (and all of the data it references) is backed
> by a memory mapped file, then loading it doesn't involve parsing at all.
Getting all the memory that igraph_t refers to either directly or indirectly
is a bit complicated, especially when igraph is embedded in a higher level
language. Let me explain it. igraph_t itself is basically six growable vectors
(to store the edge list and the first and second level indexes of the edge
list) plus a pointer to another chunk of memory that is used by the attribute
handler to store the graph/vertex/edge attributes. Attribute handling is
decoupled from the "core" igraph library to allow different igraph interfaces
(R, Python, Ruby etc) to bring their own implementations of the attribute
handler -- this is how we ensure that the Python interface can store arbitrary
Python objects as attributes while the R interface can store arbitrary
R objects. Mapping the six vectors to a file is probably doable with a bit of
hacking -- however, mapping the memory that is used for storing the attributes
is probably impossible without hacking the host language itself. For instance,
in the Python interface, the pointer in igraph_t points to an array containing
three Python dicts; one for the graph attributes, one for the vertex attributes
and one for the edge attributes. Each of these dicts in turn probably refer to
other Python objects (the keys and values of the dicts), but these cannot be
memory-mapped easily because the Python interpreter is responsible for managing
the memory that they occupy.

So, *if* you don't use graph, vertex and edge attributes at all, then *maybe*
it is doable, although I'm not sure about the amount of hacking involved.
Theoretically, igraph_t is an opaque data type for all parts of the igraph
library except a single file: type_indexededgelist.c. The idea is that you
could implement a, say, type_mmapped_edgelist.c, replace type_indexededgelist.c
with it, recompile igraph and then you get the same library with a different
storage method for graphs. The truth is that no one ever really tried it (as
far as I know).

> Also, I don't think it's the case that "you have to load the entire graph
> into memory at some point". If using a memory mapped file, I think you'd be
> able to rely on the os to only have the subset of pages resident that you
> actually need to run the particular graph operations.
True -- I was sort of assuming that the analysis you would like to perform
would need the entire graph sooner or later anyway, so you would be just
spreading out the cost of loading the graph across a longer time interval.

T.

[Prev in Thread]

Current Thread

[Next in Thread]

[igraph] Memory-Mapped Graphs, Calen Pennington, 2015/02/24
- Re: [igraph] Memory-Mapped Graphs, Tamas Nepusz, 2015/02/24
- Re: [igraph] Memory-Mapped Graphs, Calen Pennington, 2015/02/24
  - Re: [igraph] Memory-Mapped Graphs, Tamas Nepusz <=
- Re: [igraph] Memory-Mapped Graphs, Calen Pennington, 2015/02/25

Prev by Date: Re: [igraph] graph from an adjacency matrix
Next by Date: Re: [igraph] Memory-Mapped Graphs
Previous by thread: Re: [igraph] Memory-Mapped Graphs
Next by thread: Re: [igraph] Memory-Mapped Graphs
Index(es):
- Date
- Thread