TransWikia.com

Unable to open .bam file in C++ using SeqAn due to 'seqan::UnknownExtensionError'

Bioinformatics Asked on August 10, 2021

I am trying to open .bam files in C++ to extract reads occurring at specific scaffolds and loci. I essentially want to call "samtools view sample.bam -o sample.sam scaffold:pos-pos" from C++. I have tried system("samtools view sample.bam -o sample.sam scaffold:pos-pos") and can explain the issues if you want to know, but for the sake of conciseness I will leave the details out of this post. So, I am using SeqAn. When I run my script, I get the following error:

terminate called after throwing an instance of 'seqan::UnknownExtensionError'
  what():  Unknown file extension of copyPolly_SRR6511930.bam: iostream error

I am running my program on my institution’s supercomputer clusters (Texas A&M Terra cluster), to which I do not have sudo access. I have loaded the following modules:

GCC/9.3.0
SeqAn/2.4.0-GCCcore-9.3.0
zlib/1.2.11-GCCcore-9.3.0

Here is my minimal script:

#include <seqan/bam_io.h>
#include <zlib.h>

using namespace seqan;

int main(){
    CharString bamFileName = "copyPolly_SRR6511930.bam";
// Open input file.
    BamFileIn bamFileIn(toCString(bamFileName));
    if (!open(bamFileIn, toCString(bamFileName))){
        std::cerr << "ERROR: Could not open " << bamFileName << std::endl;
        return 1;
    }

    unsigned numUnmappedReads = 0;

    try{
        // Read header.
        BamHeader header;
        readHeader(header, bamFileIn);

        // Read records.
        BamAlignmentRecord record;
        while (!atEnd(bamFileIn)){
            readRecord(record, bamFileIn);
            if (hasFlagUnmapped(record))
                numUnmappedReads += 1;
        }
    }
    catch (Exception const & e)
    {
        std::cout << "ERROR: " << e.what() << std::endl;
        return 1;
    }
    return 0;
}

I have tried the following solutions:

-Adding #define SEQAN_HAS_ZLIB (1) to my script’s header

-Adding -D SEQAN_HAS_ZLIB to my compilation such that it reads g++ script.cpp -D SEQAN_HAS_ZLIB

-Adding -lz to my compilation such that it reads g++ script.cpp -lz

I am not dead-set on using SeqAn for this project, so if anyone knows of another C++ library which extracts specific reads from .bam files, please share.

Update: I’ve never used HTSlib and it seems to have a fairly steep learning curve, so if anyone has an answer for the SeqAn error, that would still be very much appreciated.

3 Answers

The helpdesk for my institution's supercomputers showed me the following flags:

g++ seqan_test.cpp -DSEQAN_HAS_ZLIB -lz -lpthread

This fixed the error. I'm now running into different errors and am suspicious that I am getting them because I am using SeqAn2 rather than SeqAn3. I'm going to try HTSlib.

Correct answer by annabelperry on August 10, 2021

Htslib is the official library to access alignment files. It supports more formats. The typical way to read through a SAM/BAM/CRAM:

#include "htslib/sam.h"
int main(int argc, char *argv[])
{
  samFile *fp = sam_open(argv[1], "r");
  bam_hdr_t *h = sam_hdr_read(fp);
  bam1_t *b = bam_init1();
  while (sam_read1(fp, h, b) >= 0) {
    if (b->core.tid < 0) continue;
    int end_pos = bam_endpos(b);
    printf("%st%dt%dn", h->target_name[b->core.tid], b->core.pos, end_pos);
  }
  bam_destroy1(b);
  bam_hdr_destroy(h);
  sam_close(fp);
}

PS: the above started as a comment for "relevant but minor or transient information to" the original question. I didn't consider htslib a direct answer to the question, but because this had been changed to an answer, I added some details anyway. By the way, seqan2 has largely been discontinued. Use seqan3.

Answered by gringer on August 10, 2021

I'm not quite sure why assigning a string to a variable would lead to a complaint about file extensions. From the SeqAn BAM tutorial, it looks like you need to use the BamFileIn object for reading BAM/SAM files:

#include <seqan/bam_io.h>
using namespace seqan;

int main(){
    CharString bamFileName = "copyPolly_SRR6511930.bam";
    // Open input file, BamFileIn can read SAM and BAM files.
    BamFileIn bamFileIn;
    if (!open(bamFileIn, toCString(bamFileName))) {
        std::cerr << "ERROR: Could not open " << bamFileName << std::endl;
        return 1;
    }
    return 0;
}

Edit: after seeing the updated code... I've got nothing, sorry. It all seems like reasonable code that shouldn't be generating errors.

Answered by gringer on August 10, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP