How to demultiplex a mix of single-indexed and dual-indexed samples

Bioinformatics Asked by AmadeusDrZaius on April 27, 2021

The problem

If I have a sample sheet that contains both single-indexed and dual-indexed samples, I can split it up into two sample sheets and then run bcl2fastq on each one. However, when doing this, large Undetermined fastq files are generated. E.g., when processing the single-indexed samples, all the dual-indexed samples go to Undetermined. And when processing the dual-indexed samples, the single-indexed samples go to Undetermined.

Additionally, because the two groups are processed separately, if any single index A is also part of a dual index A+B, then an A+B read may be mistaken for an A read when processing the single-indexed samples. It would therefore seem that they need to be processed simultaneously to avoid this mis-assignment.
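To make the mis-assignment concern concrete, here is a minimal Python sketch of the assignment logic I would expect from a joint pass. It is not how bcl2fastq or Picard work internally; the sample names, index sequences, and the `assign_read` helper are all hypothetical, and only exact matches are considered.

```python
from typing import Dict, Optional, Tuple

# Hypothetical sample table: name -> (index1, index2).
# index2 is None for single-indexed samples.
SAMPLES: Dict[str, Tuple[str, Optional[str]]] = {
    "single_A": ("ACGTACGT", None),
    "dual_AB": ("ACGTACGT", "TTGCAAGC"),
    "dual_CD": ("GGATCCAA", "CATGCATG"),
}


def assign_read(
    index1: str,
    index2: str,
    samples: Dict[str, Tuple[str, Optional[str]]],
) -> Optional[str]:
    """Return the sample name for one read's index pair, or None (Undetermined)."""
    # Pass 1: a read matching both components of a dual index goes to that sample.
    for name, (i1, i2) in samples.items():
        if i2 is not None and i1 == index1 and i2 == index2:
            return name
    # Pass 2: only if no dual-indexed sample matched, fall back to single indexes.
    for name, (i1, i2) in samples.items():
        if i2 is None and i1 == index1:
            return name
    return None


if __name__ == "__main__":
    print(assign_read("ACGTACGT", "TTGCAAGC", SAMPLES))  # matches the dual index
    print(assign_read("ACGTACGT", "AAAAAAAA", SAMPLES))  # falls back to the single index
    print(assign_read("GGATCCAA", "AAAAAAAA", SAMPLES))  # no match at all
```

Checking dual indexes first keeps an exact A+B read out of the A sample. It does not help when the second index read of an A+B library contains a sequencing error, since that read would still fall through to A.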

The question

Given a sample sheet and directories of BCL files, how can such a set of sequencing data be demultiplexed correctly, either using bcl2fastq or the Picard tools?

To put it another way, I want to demultiplex a single sequencing run that contains both single-indexed and dual-indexed samples. It can be assumed that the indexes are sufficiently distinct such that any sample’s index configuration is different from any other. But assuming that the different index configurations are not segregated to particular lanes of the sequencer, the question is how to demultiplex the files correctly such that both the single-indexed and dual-indexed samples are recognized.
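The distinctness assumption can be checked before demultiplexing. The following is a rough Python sketch, not part of either tool: the `find_conflicts` helper and the example sample table are hypothetical, and two configurations are treated as conflicting only when their first indexes are identical and either one of them is single-indexed or their second indexes are identical too.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

IndexConfig = Tuple[str, Optional[str]]  # (index1, index2 or None)


def find_conflicts(samples: Dict[str, IndexConfig]) -> List[Tuple[str, str]]:
    """List pairs of samples whose index configurations cannot be told apart."""
    conflicts = []
    for (name_a, (a1, a2)), (name_b, (b1, b2)) in combinations(samples.items(), 2):
        if a1 != b1:
            continue  # different first index: always distinguishable
        # Same first index: ambiguous if either sample has no second index,
        # or if the second indexes are also the same.
        if a2 is None or b2 is None or a2 == b2:
            conflicts.append((name_a, name_b))
    return conflicts


if __name__ == "__main__":
    # Hypothetical sample table for illustration only.
    example = {
        "s1": ("ACGTACGT", None),
        "s2": ("ACGTACGT", "TTGCAAGC"),
        "s3": ("GGATCCAA", "CATGCATG"),
    }
    print(find_conflicts(example))
```

An empty result means the sheet satisfies the assumption under exact matching. Allowing barcode mismatches would need a stricter, distance-based check.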

Attempts at a solution

Using bcl2fastq directly

If a sample sheet of a mix of indexes is given to bcl2fastq (v2.17) directly, it produces the error

ERROR: bcl2fastq::common::Exception: Success (0): .../bcl2fastq2/src/cxx/lib/layout/BarcodeCollisionDetector.cpp(127): 
Throw in function void bcl2fastq::layout::BarcodeCollisionDetector::validateNewBarcodeSizesAgainstExisting(const std::vector<long unsigned int>&) const
Dynamic exception type: boost::exception_detail::clone_impl<bcl2fastq::layout::BarcodeCollisionError>
std::exception::what: Barcodes have an unequal number of components.

Barcodes have an unequal number of components.

Using Picard

It seems that it should be possible using the Picard tools, but I have not found a way to set up the inputs to ExtractIlluminaBarcodes and IlluminaBasecallsToFastq that processes this configuration correctly.

The syntax for supporting multiple index configurations is not entirely clear. When I use various combinations of N and ‘*’ in the multiplex_params.tsv and barcodes.txt files required by the Picard tools, a large Undetermined file is still produced and the actual sample fastq files are tiny, indicating that the reads are not being assigned correctly.

Using multiple calls to bcl2fastq

As indicated in the discussion above, this suffers from the problem that all the dual-indexed samples go to the “Undetermined” file when processing the single-indexed samples, and vice versa. It also creates two output directories of reports and stats, which must be merged.

Padding empty indexes with N

By padding, I mean converting ACGTACGT to ACGTACGT+NNNNNNNN so that the single-index samples in the sample sheet are “dual” as well. This strategy seems as though it would work, except that bcl2fastq has a bug: it treats “N” literally instead of as a wildcard. See the release notes for details.
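For clarity, the padding transformation I have in mind looks like the Python sketch below. The `pad_single_index` helper and the default second-index length of 8 are my own assumptions for illustration. Given the bug described above, this only shows how the padded entries are built; it is not a working fix.

```python
from typing import Optional


def pad_single_index(
    index1: str,
    index2: Optional[str] = None,
    index2_length: int = 8,
    pad_char: str = "N",
) -> str:
    """Return 'index1+index2', filling a missing second index with pad_char."""
    if not index2:
        # Single-indexed sample: invent a second index made only of pad_char.
        index2 = pad_char * index2_length
    return f"{index1}+{index2}"


if __name__ == "__main__":
    print(pad_single_index("ACGTACGT"))              # single index gets padded
    print(pad_single_index("ACGTACGT", "TTGCAAGC"))  # dual index is left unchanged
```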

One Answer

Have you just tried giving bcl2fastq one sample sheet with a mix of single and dual indices? I don't think what you are trying to do is a problem.

“if any single index A is part of a dual index A+B,”

Well, it might be a problem if you did that. That was poor planning.

Answered by swbarnes2 on April 27, 2021
