igraph-help
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[igraph] How to add a vertex using the id, High memory usage when using


From: Muller, L.Y.L.
Subject: [igraph] How to add a vertex using the id, High memory usage when using igraph_vector_t?
Date: Thu, 5 Mar 2009 23:43:55 +0100

Hi,

in order to analyze a large data set, I'm trying to load the graph into igraph.
However, this fails due the following two issues that are described below.

Question 1:
===========
Is it possible to add a node (vertex) by id?

Example:
--------
If I have the following dataset (node ids): {0, 1, 3, 4, 5}

Using this igraph function: igraph_add_vertices(&igraph_graph, 5, 0);
Would result in a dataset like this {0, 1, 2, 3, 4}.

When adding edges to the graph that are connected to node 5, igraph will return an error or crash because node id 5 is not in the vertex list.
(Also, id 2 shouldn't be part of the list).

How can should I solve this problem in igraph?


Question 2:
===========
For some reason the igraph_vector_t is using more memory than a std::vector. When attempting to load the graph described below, igraph seems to run out of memory. Is there some (size) limit in igraph_vector_t and which method should I use to load a large graph?

Example:
--------
Some bioinformatics related graph:
Number of nodes: 1.515.271
Number of edges: 41.860.740

The dataset (490 MB) is stored in binary format and the application uses a custom file reader (for performance reasons).

Method A. Loading data into std::vector (STL)
---------------------------------------------
Code example C++
----------------

// My Obscure Binary Graph Format
struct s_NODE {
        int node;
        int edgeListSize;
};

struct s_EDGE {
        int n1;
        int n2;
        float property1;
};

int main()
{
        ...
        std::vector <int> node_list;    // The node list contains <int> values (node id)
        std::vector <s_EDGE> edge_list; // The edge list contains <s_EDGE> values (the entire edge struct)

// file reader start (nodes.dat)
        // read one entry and store it in "s_NODE pData". Push the node id (.node) on the vector (node_list)
        node_list.push_back(pData.node);        // while loop
// file reader end

// file reader start (edges.dat)
        // read one entry and store it in "s_EDGE pData2". Push the entire struct on the vector (edge_list).
        edge_list.push_back(pData2);    // while loop
// file reader end     
        ...
       
        return 0;
}

Method A result
---------------
It takes a +/- 30 seconds to read/load the entire dataset.
Once it is loaded, the application uses +/- 510 MB of memory.
No problems here.


Method B. Loading data into igraph_t and igraph_vector_t
--------------------------------------------------------
Using std::vectors to store the graph is very convient. However since I want to analyse the data with igraph, it is required to use igraph data structures instead of std::vectors.

Code example C++
----------------

// Globals
igraph_t igraph_graph;
igraph_vector_t edge_list;

int main()
{
        // Init our data structures
        igraph_empty(&igraph_graph, 0, true);
        igraph_vector_init(&edge_list, 0);
       
// file reader start (nodes.dat)
// read entries and count the total number of nodes (would be better to use the exact id... see Question 1)
        ++nodes_counter;
// file reader end
        igraph_add_vertices(&igraph_graph, nodes_counter, 0);   // in our case nodes_counter = 1515271
       
       
// file reader start (edges.dat)       
        // read one edge entry and push the .n1 and .n2 node ids (int) in the edge_list vector
        // while loop (start)
        igraph_vector_push_back(&edge_list, pData2.n1);        
        igraph_vector_push_back(&edge_list, pData2.n2);
        // while loop (end)            
// file reader end             

        // all edges should now be in the edge_list,
        // in our case we have 41860740 edges
        // every edge is stored in two entries
        // total items in edge_list = 83721480
       
        // next assign all edges to the graph  
        igraph_add_edges(&igraph_graph, &edge_list, 0);
       
        // Release our data structures
        igraph_vector_destroy(&edge_list);
        igraph_destroy(&igraph_graph);
       
        return 0;
}

Method B result
---------------
Adding 1.515.271 nodes to the igraph_graph works without any issues.

The trouble starts when trying to load the edge data. Only a part of the edges could be added to the edge_list.
It will fail after adding 33.554.432 edges (actually 67.108.864 entries) to the edge_list.

Output:
-------
Loading nodes... Done! (1515271 nodes)
Loading edges... Error at d:\igraph\0.5-main-rev1294\src\vector.pmt:408
:cannot reserve space for vector, Out of memory

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.


The memory usage just before the application crashes is +/- 1,6 GB.
When comparing igraph_vector_t container with the std::vector, it seems like it uses more memory.

Is there a way to avoid this memory limit/high memory usage?

Kind regards,
- Laurence

------------------------------------------
Laurence Muller (M.Sc.)

Website/Blog/Portfolio:
1. http://www.multigesture.net/



reply via email to

[Prev in Thread] Current Thread [Next in Thread]