igraph-help

Re: [igraph] How to read in a large graph (and output a sparse matrix)


From: Raphael C
Subject: Re: [igraph] How to read in a large graph (and output a sparse matrix)
Date: Mon, 1 Aug 2016 21:02:51 +0100

It does now seem to work once I transform the node ids as we
discussed. igraph still takes 4GB of RAM for only 62.5 million edges
and 31.25 million vertices but at least that fits.
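
A minimal sketch of the kind of relabeling step mentioned above, not the exact script used in this thread: it maps each node id in an edge-list file to a consecutive integer in order of first appearance, so the result has ids in the range Read_Edgelist() expects. The name relabel_edges is made up for illustration.

```python
import io


def relabel_edges(lines, out):
    """Rewrite 'u v' lines so that ids become 0..|V|-1.

    New ids are assigned in order of first appearance.
    Returns the mapping from original id string to new integer id.
    """
    ids = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        # setdefault hands out the next free integer the first time an id is seen
        a = ids.setdefault(parts[0], len(ids))
        b = ids.setdefault(parts[1], len(ids))
        out.write("%d %d\n" % (a, b))
    return ids


# Tiny in-memory demonstration
demo_out = io.StringIO()
relabel_edges(["287111206 357850135\n", "357850135 000000042\n"], demo_out)
print(demo_out.getvalue())
```

For a file on disk, the same function can be given two open file objects (one for reading, one for writing). Note that the dictionary itself needs a fair amount of memory for about 31 million distinct ids.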

Is there a function to write a sparse adjacency matrix of a graph to a
file? I see  "write_adjacency" but the docs don't indicate it gives a
sparse matrix.

I just need to write the adjacency matrix in any format that scipy can
read.  Clearly it has to be sparse otherwise the file would be vast.

Raphael
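
One sparse text format that scipy can read is Matrix Market coordinate format (via scipy.io.mmread). The following is a minimal sketch, assuming the edges are available as 0-based (source, target) pairs; write_matrix_market is a hypothetical helper written for this example, not an igraph function.

```python
import io


def write_matrix_market(edges, n, m, out):
    """Write m edges over n vertices as an n x n sparse pattern matrix.

    edges: iterable of 0-based (source, target) pairs
    out:   text file object opened for writing
    """
    # Banner line of the Matrix Market format (it starts with two % signs)
    out.write("%%MatrixMarket matrix coordinate pattern general\n")
    # Size line: rows, columns, number of stored entries
    out.write("%d %d %d\n" % (n, n, m))
    written = 0
    for u, v in edges:
        # Matrix Market indices are 1-based
        out.write("%d %d\n" % (u + 1, v + 1))
        written += 1
    if written != m:
        raise ValueError("expected %d edges, wrote %d" % (m, written))


# Tiny in-memory demonstration
buf = io.StringIO()
write_matrix_market([(0, 1), (1, 2)], 3, 2, buf)
print(buf.getvalue())

# With an igraph Graph g, something along these lines may work
# (assumption: not tested at the scale discussed in this thread):
#   with open("adj.mtx", "w") as fh:
#       write_matrix_market((e.tuple for e in g.es), g.vcount(), g.ecount(), fh)
```

Each edge is written once, as a single (row, column) entry, so for an undirected graph the matrix would still need to be symmetrized after loading. Multi-edges show up as repeated entries.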

> On 1 August 2016 at 14:57, Tamas Nepusz <address@hidden> wrote:
>> Yes, it's probably the best if you do the relabeling externally. Let
>> us know if it still doesn't work after using Read_Edgelist() with a
>> relabeled file.
>> T.
>>
>>
>> On Mon, Aug 1, 2016 at 2:37 PM, Raphael C <address@hidden> wrote:
>>> Thank you for the quick reply. My system is certainly 64 bit. The
>>> problem seems to be just the amount of RAM that
>>>
>>> g = Graph.Read_Ncol('edges.txt')
>>>
>>> uses.
>>>
>>> Here is some code to produce a fake edge list that reproduces my problem.
>>>
>>> import random
>>>
>>> # Number of edges and vertices
>>> m = 62500000
>>> n = m // 2  # integer division so that n stays an int
>>>
>>> for i in range(m):
>>>     # Pick two random node ids and zero-pad them to 9 digits
>>>     fromnode = str(random.randint(0, n - 1)).zfill(9)
>>>     tonode = str(random.randint(0, n - 1)).zfill(9)
>>>     print(fromnode, tonode)
>>>
>>> If I produce a file edges.txt using this code and  then run
>>>
>>> from igraph import Graph
>>> g = Graph.Read_Ncol('edges.txt')
>>>
>>> it runs out of RAM.
>>>
>>> To get a better picture of the RAM usage I ran the same test with m =
>>> 20000000 (that is about one third of the edges and vertices).
>>>
>>> /usr/bin/time -v python ./test.py
>>>
>>> shows
>>>
>>> Maximum resident set size (kbytes): 3172988
>>>
>>> With m = 30000000 I see Maximum resident set size (kbytes): 4750440
>>>
>>> Maybe one solution is to relabel the nodes myself externally so I can
>>> avoid the overhead of Read_Ncol()?
>>>
>>> Raphael
>>>
>>>
>>>
>>>
>>> On 1 August 2016 at 10:23, Tamas Nepusz <address@hidden> wrote:
>>>> Hello,
>>>>
>>>> Read_Edgelist() won't work because that assumes that the vertex IDs
>>>> are in the range [0; |V|-1] so it would create lots of isolated
>>>> vertices if your vertex ID range has "gaps" in it. Read_Ncol() is the
>>>> way to go, but it has an additional space penalty as it has to
>>>> maintain a mapping from the numeric IDs in the file to the range [0;
>>>> |V|-1].
>>>>
>>>> igraph requires 32 bytes per edge and 16 bytes per vertex to store the
>>>> graph itself, plus additional data structures to store the vertex/edge
>>>> attributes. Therefore, a graph of your size would require ~2.5 GB of
>>>> memory plus the attributes. 8 GB of RAM should therefore be enough --
>>>> however, note that Python might not be able to utilize all that
>>>> memory. In particular, 32-bit Python on Windows is limited to 2 or 3
>>>> GBs of memory (see
>>>> https://msdn.microsoft.com/en-us/library/aa366778(v=vs.85).aspx#memory_limits
>>>> ). If you happen to use a 32-bit Python on a 64-bit machine, you will
>>>> need to install a 64-bit Python with a corresponding igraph package
>>>> that is also built for 64-bit, and then try again.
>>>>
>>>> Best,
>>>> T.
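
The estimate in the message above can be checked with a few lines of arithmetic. This is only a sketch using the per-edge and per-vertex figures quoted there; the function name is illustrative, and storage for vertex/edge attributes is not included.

```python
BYTES_PER_EDGE = 32    # figure quoted in the message above
BYTES_PER_VERTEX = 16  # figure quoted in the message above


def graph_bytes(n_vertices, n_edges):
    """Rough size of the graph structure itself, without attributes."""
    return n_edges * BYTES_PER_EDGE + n_vertices * BYTES_PER_VERTEX


edges = 62500000
vertices = edges // 2  # "about half that many vertices"
total = graph_bytes(vertices, edges)
print("%.2f GB" % (total / 1e9))  # prints "2.50 GB"
```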
>>>>
>>>>
>>>> On Mon, Aug 1, 2016 at 9:52 AM, Raphael C <address@hidden> wrote:
>>>>> I have 8GB of RAM and I have a simple edge list text file of size
>>>>> 1.2GB. It has 62500000 edges and about half that many vertices. Each
>>>>> line looks like
>>>>>
>>>>>      287111206 357850135
>>>>>
>>>>> I would like to read in the graph and output a sparse adjacency
>>>>> matrix. I am failing on all counts.  I have tried
>>>>>
>>>>>
>>>>> g = Graph.Read_Edgelist('edges.txt')
>>>>>
>>>>> but this fails immediately with
>>>>>
>>>>> MemoryError: Error at vector.pmt:439: cannot reserve space for vector,
>>>>> Out of memory
>>>>>
>>>>> This seems unrelated to the size of the graph and is just a function
>>>>> of the node ids being large.
>>>>>
>>>>> So instead I tried
>>>>>
>>>>> g = Graph.Read_Ncol('edges.txt')
>>>>>
>>>>> This eats up all the RAM in my PC forcing me to kill the code.
>>>>>
>>>>> In fact I tested g = Graph.Read_Ncol('edges.txt') with the first 1/5 of
>>>>> the edges and have the same memory problem.
>>>>>
>>>>> Each node id is a 32 bit integer so the graph should fit easily in 8GB of 
>>>>> RAM.
>>>>>
>>>>> What can I do?
>>>>>
>>>>> Thanks very much for any help.
>>>>> Raphael
>>>>>
>>>>> _______________________________________________
>>>>> igraph-help mailing list
>>>>> address@hidden
>>>>> https://lists.nongnu.org/mailman/listinfo/igraph-help


