[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: algorithm for reducing rows/columns
From: |
Mike Miller |
Subject: |
Re: algorithm for reducing rows/columns |
Date: |
Mon, 20 Dec 2010 13:00:51 -0600 (CST) |
User-agent: |
Alpine 2.00 (DEB 1167 2008-08-23) |
On Mon, 20 Dec 2010, Kim Hansen wrote:
I just had some fun implementing the greedy approach, it is pasted in
below.
You need to select max after each delete, if you just do a single sort
the result will be bad. I tried on the simple example from run.m below,
and the one-sort gave a clique of 9 nodes when greedy_clique() gave a
clique of 272 nodes.
Yep, that works better than what I was doing. In my example I started
with 7278 subjects and reduced that to 4664 based on known relationships.
There were still some relatives in there, so I made the 4664 x 4664 matrix
with 1 for every pair with a probability of .2 or better of sharing one
allele identical by descent (basically closer relatives than first cousins
got the 1). When I used my sorting scheme, I identified a clique of 4593
non-relatives, so I'd remove 71 people because of their genetic relations
to the others. Your method found a clique of 4628, so that I only had to
drop 36 people instead of 71. Very nice. If I leave all 7278 subjects in
the relationship matrix, greedy_clique finds a clique of 4634, which is 6
better than when I "helped" by dropping known relatives in advance.
You've helped me a lot, Kim. Thanks!
Would you like to publish a short, one-page paper about this? We would
use my data and include your code. It would go well in Bioinformatics.
I could write about the genetics and reasons why we want to do this kind
of thing, and you could write a bit about the clique problem,
NP-completeness, the greedy method, or whatever. I would also explain
some efficient methods for computing the relationship matrix in the first
place.
Mike