Re: [Bammds-users] Dataset problem

On Tue, May 26, 2015 at 5:53 PM, Daniel Fernandes <address@hidden> wrote:

Hi,

I am trying to use bammds on some MiSeq data but I am getting an error regarding headers.
I was able to run the program without any crash using your small "HDGP.Reference.Subsample.txt" by editing my bam file's header to be in the order 1-22,X,Y,M.

However I would like to use the complete dataset, but it is not working.
My sample was aligned to hg19 from UCSC, and here is the header:

@HD    VN:1.3    SO:coordinate
@SQ    SN:1    LN:249250621
@SQ    SN:2    LN:243199373
@SQ    SN:3    LN:198022430
@SQ    SN:4    LN:191154276
@SQ    SN:5    LN:180915260
@SQ    SN:6    LN:171115067
@SQ    SN:7    LN:159138663
@SQ    SN:8    LN:146364022
@SQ    SN:9    LN:141213431
@SQ    SN:10    LN:135534747
@SQ    SN:11    LN:135006516
@SQ    SN:12    LN:133851895
@SQ    SN:13    LN:115169878
@SQ    SN:14    LN:107349540
@SQ    SN:15    LN:102531392
@SQ    SN:16    LN:90354753
@SQ    SN:17    LN:81195210
@SQ    SN:18    LN:78077248
@SQ    SN:19    LN:59128983
@SQ    SN:20    LN:63025520
@SQ    SN:21    LN:48129895
@SQ    SN:22    LN:51304566
@SQ    SN:X    LN:155270560
@SQ    SN:Y    LN:59373566
@SQ    SN:M    LN:16569
@RG    ID:PetExp+Dab    SM:D1    PL:ILLUMINA

It worked using your subsampled dataset file (although giving only less than 50 matches and being a total outlier). However, using the original HGDP dataset (I used in ped format), I was given this error after about 20 minutes:

bammds: Error: The mpileup command does not work. Are the chromosomes named the same way in all files (e.g. NN and not chrNN)?
Are the chromosomes ordered the same way in all the files?
bammds: Error: BAM header of AUG.test.bam:
@HD VN:1.3 SO:coordinate
@SQ SN:1 LN:249250621
@SQ SN:2 LN:243199373
@SQ SN:3 LN:198022430
@SQ SN:4 LN:191154276
@SQ SN:5 LN:180915260
@SQ SN:6 LN:171115067
@SQ SN:7 LN:159138663
@SQ SN:8 LN:146364022
@SQ SN:9 LN:141213431
@SQ SN:10 LN:135534747
@SQ SN:11 LN:135006516
@SQ SN:12 LN:133851895
@SQ SN:13 LN:115169878
@SQ SN:14 LN:107349540
@SQ SN:15 LN:102531392
@SQ SN:16 LN:90354753
@SQ SN:17 LN:81195210
@SQ SN:18 LN:78077248
@SQ SN:19 LN:59128983
@SQ SN:20 LN:63025520
@SQ SN:21 LN:48129895
@SQ SN:22 LN:51304566
@SQ SN:X LN:155270560
@SQ SN:Y LN:59373566
@SQ SN:M LN:16569
@RG    ID:PetExp+Dab    SM:D1    PL:ILLUMINA
bammds: Error: Header of panel:
@HD VN:1.3 SO:coordinate
@SQ SN:1 LN:249250621
@SQ SN:2 LN:243199373
@SQ SN:3 LN:198022430
@SQ SN:4 LN:191154276
@SQ SN:5 LN:180915260
@SQ SN:6 LN:171115067
@SQ SN:7 LN:159138663
@SQ SN:8 LN:146364022
@SQ SN:9 LN:141213431
@SQ SN:10 LN:135534747
@SQ SN:11 LN:135006516
@SQ SN:12 LN:133851895
@SQ SN:13 LN:115169878
@SQ SN:14 LN:107349540
@SQ SN:15 LN:102531392
@SQ SN:16 LN:90354753
@SQ SN:17 LN:81195210
@SQ SN:18 LN:78077248
@SQ SN:19 LN:59128983
@SQ SN:20 LN:63025520
@SQ SN:21 LN:48129895
@SQ SN:22 LN:51304566
@SQ SN:X LN:155270560
@SQ SN:Y LN:59373566
@SQ SN:M LN:16569
@RG    ID:PetExp+Dab    SM:D1    PL:ILLUMINA

Looks like I am getting the input read as the dataset? This was the command:
bammds AUG.test.bam /home/daniel/NGSdata/Datasets/hgdp/HGDP952.bed

I have noticed that the map file for this dataset seems to have the chromossomes in a different order:
23    rs473491    0    154553040
23    rs644138    0    154580775
23    rs557132    0    154582606
24    rs2058276    0    2728456
24    rs1865680    0    6928118
24    rs2032597    0    13357186
24    rs2032590    0    13529007
24    rs2032624    0    13535818
24    rs3848982    0    20176596
24    rs2032612    0    20325879
24    rs2032621    0    20332126
24    rs2032617    0    20355649
24    rs2032652    0    20376701
25    rs4933045    0    214201
25    rs2738388    0    237771

Being 22,Y,X,M. Can that be the problem, since my files have 22,X,Y,M? I'm trying now with this order in my bam, but it is taking its time.

I tried to use the same file with the dataset files that can be downloaded directly from http://www.hagsc.org/hgdp/files.html and got also an error:

Argument "MitoA16164G" isn't numeric in multiplication (*) at /usr/local/bin/bammds line 16392.
Argument "MitoA1738G" isn't numeric in multiplication (*) at /usr/local/bin/bammds line 16392.
bammds: Error: Your reference panel must be sorted by chromosome and position. It is not at chr MitoA1738G, pos 1738.
You may be able to fix it by doing something like:
cat reference.txt | head -n 2 > ref.out
cat reference.txt | tail -n +3 | sort -k1,1n -k3,3n >> ref.out
mv ref.out reference.txt

Thank you and regards,
Daniel Fernandes
School of Archaeology PhD Student
University College Dublin
Funded by the Irish Research Council
http://www.ucd.ie/archaeology/research/phd/daniel_fernandes/

PS: your server to download your dataset is not working (wget http://dna.ku.dk/~sapfo/HGDP_hg19_one_allele.txt). Access forbidden.

José Víctor Moreno Mayar
Centre for GeoGenetics
University of Copenhagen

From:	José Víctor Moreno Mayar
Subject:	Re: [Bammds-users] Dataset problem
Date:	Thu, 11 Jun 2015 14:33:48 +0200