bammds-users
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Bammds-users] Dataset problem


From: José Víctor Moreno Mayar
Subject: Re: [Bammds-users] Dataset problem
Date: Thu, 11 Jun 2015 14:33:48 +0200

Hello Daniel,

Sorry for the late reply!

The quick solution would be to run with the HGDP file that we distribute. Can you try to download from here?

http://dna.ku.dk/~jmoreno/HGDP_hg19_one_allele.txt

With regards to your panel, I guess it is the weird chr names that are killing bammds. My suggestion to fix your panel would be that you extract only autosomes and create a new plink file. 

for i in $(seq 1 22)
do
echo 'grep -P "^'$i'\t" /home/daniel/NGSdata/Datasets/hgdp/HGDP952.bim | cut -f 2' | sh >> rsToInclude
done

plink --noweb --bfile /home/daniel/NGSdata/Datasets/hgdp/HGDP952 --make-bed --out HGDP952_auto --extract rsToInclude

Try running with that new panel, and let me know if it worked. 

Hope it helps!

Cheers, 

Víctor


On Tue, May 26, 2015 at 5:53 PM, Daniel Fernandes <address@hidden> wrote:
Hi,

I am trying to use bammds on some MiSeq data but I am getting an error regarding headers.
I was able to run the program without any crash using your small "HDGP.Reference.Subsample.txt" by editing my bam file's header to be in the order 1-22,X,Y,M.

However I would like to use the complete dataset, but it is not working.
My sample was aligned to hg19 from UCSC, and here is the header:

@HD    VN:1.3    SO:coordinate
@SQ    SN:1    LN:249250621
@SQ    SN:2    LN:243199373
@SQ    SN:3    LN:198022430
@SQ    SN:4    LN:191154276
@SQ    SN:5    LN:180915260
@SQ    SN:6    LN:171115067
@SQ    SN:7    LN:159138663
@SQ    SN:8    LN:146364022
@SQ    SN:9    LN:141213431
@SQ    SN:10    LN:135534747
@SQ    SN:11    LN:135006516
@SQ    SN:12    LN:133851895
@SQ    SN:13    LN:115169878
@SQ    SN:14    LN:107349540
@SQ    SN:15    LN:102531392
@SQ    SN:16    LN:90354753
@SQ    SN:17    LN:81195210
@SQ    SN:18    LN:78077248
@SQ    SN:19    LN:59128983
@SQ    SN:20    LN:63025520
@SQ    SN:21    LN:48129895
@SQ    SN:22    LN:51304566
@SQ    SN:X    LN:155270560
@SQ    SN:Y    LN:59373566
@SQ    SN:M    LN:16569
@RG    ID:PetExp+Dab    SM:D1    PL:ILLUMINA

It worked using your subsampled dataset file (although giving only less than 50 matches and being a total outlier). However, using the original HGDP dataset (I used in ped format), I was given this error after about 20 minutes:

bammds: Error: The mpileup command does not work. Are the chromosomes named the same way in all files (e.g. NN and not chrNN)?
Are the chromosomes ordered the same way in all the files?
bammds: Error: BAM header of AUG.test.bam:
@HD VN:1.3 SO:coordinate
@SQ SN:1 LN:249250621
@SQ SN:2 LN:243199373
@SQ SN:3 LN:198022430
@SQ SN:4 LN:191154276
@SQ SN:5 LN:180915260
@SQ SN:6 LN:171115067
@SQ SN:7 LN:159138663
@SQ SN:8 LN:146364022
@SQ SN:9 LN:141213431
@SQ SN:10 LN:135534747
@SQ SN:11 LN:135006516
@SQ SN:12 LN:133851895
@SQ SN:13 LN:115169878
@SQ SN:14 LN:107349540
@SQ SN:15 LN:102531392
@SQ SN:16 LN:90354753
@SQ SN:17 LN:81195210
@SQ SN:18 LN:78077248
@SQ SN:19 LN:59128983
@SQ SN:20 LN:63025520
@SQ SN:21 LN:48129895
@SQ SN:22 LN:51304566
@SQ SN:X LN:155270560
@SQ SN:Y LN:59373566
@SQ SN:M LN:16569
@RG    ID:PetExp+Dab    SM:D1    PL:ILLUMINA
bammds: Error: Header of panel:
@HD VN:1.3 SO:coordinate
@SQ SN:1 LN:249250621
@SQ SN:2 LN:243199373
@SQ SN:3 LN:198022430
@SQ SN:4 LN:191154276
@SQ SN:5 LN:180915260
@SQ SN:6 LN:171115067
@SQ SN:7 LN:159138663
@SQ SN:8 LN:146364022
@SQ SN:9 LN:141213431
@SQ SN:10 LN:135534747
@SQ SN:11 LN:135006516
@SQ SN:12 LN:133851895
@SQ SN:13 LN:115169878
@SQ SN:14 LN:107349540
@SQ SN:15 LN:102531392
@SQ SN:16 LN:90354753
@SQ SN:17 LN:81195210
@SQ SN:18 LN:78077248
@SQ SN:19 LN:59128983
@SQ SN:20 LN:63025520
@SQ SN:21 LN:48129895
@SQ SN:22 LN:51304566
@SQ SN:X LN:155270560
@SQ SN:Y LN:59373566
@SQ SN:M LN:16569
@RG    ID:PetExp+Dab    SM:D1    PL:ILLUMINA

Looks like I am getting the input read as the dataset? This was the command:
bammds AUG.test.bam /home/daniel/NGSdata/Datasets/hgdp/HGDP952.bed

I have noticed that the map file for this dataset seems to have the chromossomes in a different order:
23    rs473491    0    154553040
23    rs644138    0    154580775
23    rs557132    0    154582606
24    rs2058276    0    2728456
24    rs1865680    0    6928118
24    rs2032597    0    13357186
24    rs2032590    0    13529007
24    rs2032624    0    13535818
24    rs3848982    0    20176596
24    rs2032612    0    20325879
24    rs2032621    0    20332126
24    rs2032617    0    20355649
24    rs2032652    0    20376701
25    rs4933045    0    214201
25    rs2738388    0    237771

Being 22,Y,X,M. Can that be the problem, since my files have 22,X,Y,M? I'm trying now with this order in my bam, but it is taking its time.

I tried to use the same file with the dataset files that can be downloaded directly from http://www.hagsc.org/hgdp/files.html and got also an error:

Argument "MitoA16164G" isn't numeric in multiplication (*) at /usr/local/bin/bammds line 16392.
Argument "MitoA1738G" isn't numeric in multiplication (*) at /usr/local/bin/bammds line 16392.
bammds: Error: Your reference panel must be sorted by chromosome and position. It is not at chr MitoA1738G, pos 1738.
You may be able to fix it by doing something like:
  cat reference.txt | head -n 2 > ref.out
  cat reference.txt | tail -n +3 | sort -k1,1n -k3,3n >> ref.out
  mv ref.out reference.txt

Thank you and regards,
Daniel Fernandes
School of Archaeology PhD Student
University College Dublin
Funded by the Irish Research Council
http://www.ucd.ie/archaeology/research/phd/daniel_fernandes/

PS: your server to download your dataset is not working (wget http://dna.ku.dk/~sapfo/HGDP_hg19_one_allele.txt). Access forbidden.



--
José Víctor Moreno Mayar
Centre for GeoGenetics
University of Copenhagen

reply via email to

[Prev in Thread] Current Thread [Next in Thread]