Hi all,
I'm a first-timer with igraph and am using it in R. I'm sorry for
putting two messages into one, but the trouble I had reading in a
large graph is what led me to try subsetting a whole graph, so the
two problems are kind of smashed together. I'll try to explain.
Sorry, this is long.
I'm building a network from what is essentially an 'ncol' formatted
file. It's the protein interaction network for yeast from the STRING
db (http://string.embl.de). Here are the first few lines:
Q0032 YNL182C 433
Q0045 Q0060 186
Q0045 Q0085 716
Q0045 Q0105 997
Q0045 Q0110 882
Q0045 Q0115 898
Q0045 Q0120 928
Q0045 Q0130 201
Q0045 Q0140 222
Q0045 Q0250 999
...
=========
Problem 1
=========
I'd actually like to filter the graph I'm creating from the file on
the edge score listed on the third column above, but I'm not having
an easy time of doing this correctly. Earlier today I stumbled on
what I thought would be a simple way to do this:
# threshold and directed are defined elsewhere
graf <- read.graph(filepath, format='ncol', directed=directed)
good.edges <- E(graf)[weight >= threshold]
good.graf <- graph(edges, directed=directed)
The problem is that I'm losing the name attributes from my vertices,
and `good.graf` is now just a list of vertex IDs. Since I'm not yet
comfortable with the igraph API, I'm having a hard time figuring out
the best way to keep the correct names associated with the new IDs
that are in `good.graf`.
I'm thinking this is done often, so I was wondering if anyone could
offer any good suggestions.
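A minimal sketch of one way this kind of filtering might be done
without rebuilding the graph, so the vertex names are never dropped.
It assumes a recent igraph release where `graph_from_data_frame` and
`delete_edges` exist (older releases spell them `graph.data.frame` and
`delete.edges`); the toy rows and the threshold of 500 are made up for
illustration only.

```r
library(igraph)

# Toy edge list in the same three-column shape as the STRING file
links <- data.frame(
  from   = c("Q0032", "Q0045", "Q0045", "Q0045"),
  to     = c("YNL182C", "Q0060", "Q0085", "Q0105"),
  weight = c(433, 186, 716, 997),
  stringsAsFactors = FALSE
)
threshold <- 500  # hypothetical cutoff, not from the original post

graf <- graph_from_data_frame(links, directed = FALSE)

# Remove the low-scoring edges from the existing graph instead of
# building a new one; vertex attributes such as `name` stay attached.
low.edges <- which(E(graf)$weight < threshold)
good.graf <- delete_edges(graf, low.edges)

V(good.graf)$name    # all original names are still present
E(good.graf)$weight  # only weights at or above the threshold remain
```

Vertices that lose all their edges stay in the graph as isolated
nodes; they could be dropped afterwards with something like
`delete_vertices(good.graf, which(degree(good.graf) == 0))`.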
=========
Problem 2
=========
Prior to trying it this way, I was taking inspiration from the
technique in the "creating graphs" section here:
http://igraph.sourceforge.net/igraphbook/igraphbook-creating.html#id2569368
I'm subsetting my `links` data.frame to contain only the edges with
scores at or above the threshold, so that I can use the subsetted
data.frame to build the graph straight away. However, there is some
weirdness happening: the first edge at the top of my `links`
data.frame isn't being registered, but its weight is, so my weights
are all off by one. Have a look:
links <- read.delim(pathToFile, sep=" ", header=FALSE,
stringsAsFactors=FALSE, col.names=c('from', 'to', 'weight'))
## I won't filter for easy comparison with data above
## links <- subset(links, weight >= threshold)
vNames <- unique(c(links$from, links$to))
ids <- seq_along(vNames) - 1
names(ids) <- vNames
edges <- matrix(c(ids[links$from], ids[links$to]), nc=2)
g <- add.vertices(graph.empty(directed=FALSE), length(ids),
name=vNames)
g <- add.edges(g, t(edges), weight=links$weight)
Below, on the left, are the edges in the graph. I've added an extra
column in [] to show the scores/weights that are associated with each
edge (I'll show the R output of the scores afterwards so you can see
they are the same). On the right (## commented) are the first 10
lines from the input file for comparison:
# modified head(E(g), 10) call (modifications in trailing []):
R> head(E(g), 10) ## Top 10 lines from file
[1] Q0045 -> Q0060 [433] ## Q0032 YNL182C 433
[2] Q0045 -> Q0085 [186] ## Q0045 Q0060 186
[3] Q0045 -> Q0105 [716] ## Q0045 Q0085 716
[4] Q0045 -> Q0110 [997] ## Q0045 Q0105 997
[5] Q0045 -> Q0115 [882] ## Q0045 Q0110 882
[6] Q0045 -> Q0120 [898] ## Q0045 Q0115 898
[7] Q0045 -> Q0130 [928] ## Q0045 Q0120 928
[8] Q0045 -> Q0140 [201] ## Q0045 Q0130 201
[9] Q0045 -> Q0250 [222] ## Q0045 Q0140 222
[10] Q0045 -> Q0275 [999] ## Q0045 Q0250 999
# Here is the call to get the weights with its output, just to verify
R> head(E(g)$weight, 10)
[1] 433 186 716 997 882 898 928 201 222 999
See how my first edge in the igraph is really the second edge in my
input file, BUT its weight is the one from the first edge of my
input file? I'm stumped.
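For comparison, here is a minimal sketch of building the graph
straight from the data.frame and checking that edges and weights line
up row by row. It is not a diagnosis of the shift above. It assumes a
recent igraph release where `graph_from_data_frame` and `as_edgelist`
are available (older releases call them `graph.data.frame` and
`get.edgelist`); `key` is a hypothetical helper, and the toy rows are
the first four lines of the file shown earlier.

```r
library(igraph)

links <- data.frame(
  from   = c("Q0032", "Q0045", "Q0045", "Q0045"),
  to     = c("YNL182C", "Q0060", "Q0085", "Q0105"),
  weight = c(433, 186, 716, 997),
  stringsAsFactors = FALSE
)

# The first two columns become the edge endpoints (matched by name);
# any remaining columns become edge attributes, here `weight`.
g <- graph_from_data_frame(links, directed = FALSE)

# Pull the endpoints back out by vertex name, one row per edge
el <- as_edgelist(g, names = TRUE)

# Hypothetical helper: an order-independent label for an undirected
# edge, so "A B" and "B A" compare as equal.
key <- function(a, b) paste(pmin(a, b), pmax(a, b))

edges.match   <- all(key(el[, 1], el[, 2]) == key(links$from, links$to))
weights.match <- all(E(g)$weight == links$weight)
edges.match && weights.match
```

Letting igraph do the name-to-ID mapping avoids keeping a separate
`ids` vector in sync with the vertex numbering by hand.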
I tried my code on a smaller test file, and it works fine. The real
data file has 623,530 edges. In the original data file, each edge is
repeated once to represent the undirected nature of protein
interaction networks, e.g. there is:
A B <weight>
... somewhere down the line ...
B A <weight>
My small test file did the same thing, and it worked fine. Also the
problem still persists whether I specify my igraph to be directed or
undirected prior to constructing.
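As an aside on the A B / B A duplication, a minimal sketch of how the
repeated pairs might be collapsed into single undirected edges. It
assumes an igraph release where `simplify` accepts `edge.attr.comb`,
and that both copies of a pair carry the same weight, so keeping the
first one loses nothing; the toy data is made up.

```r
library(igraph)

# Each undirected interaction listed twice, once per direction,
# plus one pair (A, C) that appears only once
links <- data.frame(
  from   = c("A", "B", "A"),
  to     = c("B", "A", "C"),
  weight = c(900, 900, 450),
  stringsAsFactors = FALSE
)

g <- graph_from_data_frame(links, directed = FALSE)

# Merge parallel edges between the same two vertices, keeping the
# weight of the first copy rather than summing the two.
g.simple <- simplify(g, remove.multiple = TRUE, remove.loops = FALSE,
                     edge.attr.comb = "first")

ecount(g)         # counts both copies of the A/B edge
ecount(g.simple)  # counts each vertex pair once
```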
I feel like I'm doing something wrong, since I'm imagining this
behavior would have been spotted by now if there's a bug, but I
can't see where I'm making a mistake.
Any help with either issue would be greatly appreciated.
Thanks,
-steve