igraph-help

Re: [igraph] Specifying multiple output nodes from 1 input node


From: Dan Suthers
Subject: Re: [igraph] Specifying multiple output nodes from 1 input node
Date: Wed, 8 Apr 2020 16:28:08 -1000

Use tidyverse packages for data manipulation. They are excellent at this sort of thing.

I had a similar problem. I used readr::read_delim to read a .csv of Twint's representation of Twitter data into the tibble 'tweets'. Each tweet mentions several users in the tweets$mentions field, in the same format as yours but stored as a string, for example "['repadamschiff', 'realdonaldtrump']"

I used stringr::str_extract_all to turn this string into a list, and then tidyr::unnest_longer to turn the single row into one row per value of that list:

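library(dplyr)    # mutate, %>%
library(stringr)  # str_extract_all, boundary
library(tidyr)    # unnest_longer, drop_na

mention_edges <-
  tweets %>%
  # Extract lists of mentioned users from the string representation.
  # Inside mutate() the column is referenced by name, not as tweets$mentions.
  mutate(mentioned_user = str_extract_all(mentions, boundary("word"))) %>%
  # Unnest each mention into its own row.
  unnest_longer(mentioned_user) %>%
  # Drop tweets that don't mention anyone.
  drop_na(mentioned_user)
  # %>% ... continues with other processing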

It is done in memory, but I have been able to run this on a fairly large data set.
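
For a data frame shaped like yours, the same idea might be sketched as below. This is only a sketch under assumptions, not tested on your real data: it assumes the list columns arrive as strings, that the columns are named edge_weights and id_attr (underscores instead of hyphens), and that each row has as many weights as output nodes. The toy tibble 'nodes_raw' and the 'token' pattern are made up for illustration.

```r
library(dplyr)
library(stringr)
library(tidyr)
library(igraph)

# Toy data mirroring the first rows of the question (hypothetical column names).
nodes_raw <- tibble(
  input_node   = c("11347-5", "116228-0", "39644-0"),
  output_nodes = c("['64837-1', '116228-0']", "['14328-3']", "['116228-0']"),
  edge_weights = c("[0.01001617, 0.01778383]", "[0.3505]", "[0.10184362]"),
  id_attr      = c(82249852, 82283186, 82273700),
  attribute    = c(372856, 372892, 372878)
)

# A token is any run of characters that is not a bracket, quote, comma or space.
# boundary("word") is avoided here because the node ids contain hyphens.
token <- "[^\\[\\]', ]+"

edges <- nodes_raw %>%
  mutate(
    output_node = str_extract_all(output_nodes, token),
    weight      = lapply(str_extract_all(edge_weights, token), as.numeric)
  ) %>%
  # Unnest both list columns in parallel: one row per (output node, weight) pair.
  unnest(c(output_node, weight)) %>%
  # graph_from_data_frame() reads the first two columns as the edge list
  # and treats the remaining columns as edge attributes.
  select(input_node, output_node, weight, id_attr, attribute)

g <- graph_from_data_frame(edges, directed = TRUE)
```

The row-level attributes are repeated on every edge that came from the same row, so they are still available afterwards as E(g)$id_attr and E(g)$attribute.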

-- Dan

On 4/8/20 2:41 AM, Siddhartha R Dalal wrote:
I have many large dataframes of the following structure, with 1 input node in each row and multiple output nodes and edge weights.
  input_node  output_nodes             edge-weights              id-attr   attribute
1 11347-5     ['64837-1', '116228-0']  [0.01001617, 0.01778383]  82249852  372856
2 116228-0    ['14328-3']              [0.3505]                  82283186  372892
3 39644-0     ['116228-0']             [0.10184362]              82273700  372878
4 116228-0    ['116228-0']             [0.21326264]              82278451  372887
5 116228-0    ['64827-1', '116228-0']  [0.02947139, 0.08275262]  82249816  372855

For example, rows 1 and 5 have 1 input node, 2 output nodes, the corresponding 2 edge weights (they are numbers), and a few attributes; rows 2 through 4 have 1 input node and 1 output node each.
How do I read this dataframe into igraph to make a graph while retaining the attributes? Typically igraph asks for the first 2 columns of the dataframe to be the input and output nodes. This is a large dataframe in which the number of output nodes could be large in some rows.
I can imagine doing this with a "for" loop and regex, but that would be too slow and the new dataframe would require more memory. I would appreciate any suggestions.
Thank you. Sid

-- 
Dan Suthers 

Professor and Graduate Program Chair
Dept. of Information and Computer Sciences 
University of Hawaii at Manoa 
1680 East West Road, POST 309, Honolulu, HI 96822 
(808) 956-3890 office
Personal: http://www2.hawaii.edu/~suthers/
Lab: http://lilt.ics.hawaii.edu/
Department: http://www.ics.hawaii.edu/
