
creating a tab delimited file

Bioinformatics Asked by Edwardo on May 29, 2021

I am working on a project that uses a FASTA file. I write my script in nano on the command line and run it with Python, also from the command line.

I would like my script to produce a tab-delimited file with three columns: the first column should contain the sequence name, the second the sequence length, and the third the sequence itself.

I have written the following so far in nano:

from Bio import SeqIO
import sys
for hello_fasta in SeqIO.parse(sys.argv[1], "fasta"):

  list = hello_fasta.split("\t")

  print hello_fasta.description
  print (len(hello_fasta.seq))

For example, I would like the output to look like the line below, with the columns in this order: Gene name ; Gene length ; Gene seq

H0192X 26 FORUWOHRPPTRWFAWWEAKJNFWEJ

2 Answers

You can build a list, using insert() to place each element at a specific position, and then unpack the list with * in the print() call. Alternatively, you can use join().

from Bio import SeqIO
import sys

for hello_fasta in SeqIO.parse(sys.argv[1], "fasta"):
  sequences = []
  sequences.insert(0, hello_fasta.description)
  sequences.insert(1, len(hello_fasta.seq))
  sequences.insert(2, hello_fasta.seq)
  # option 1
  print(*sequences, sep='\t')
  # option 2
  print('\t'.join(map(str, sequences)))
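
If you would rather write to a file than print, the standard-library csv module can produce the same three columns. The following is only a minimal sketch: write_tsv is a hypothetical helper name, and it assumes you pass it plain (name, sequence) pairs, which with Biopython could be built from record.description and str(record.seq). The single record here is the example from the question, hard-coded so the snippet runs without a FASTA file.

```python
import csv
import io


def write_tsv(records, handle):
    """Write (name, sequence) pairs as name<TAB>length<TAB>sequence rows.

    `write_tsv` is a hypothetical helper, not part of Biopython.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for name, seq in records:
        writer.writerow([name, len(seq), seq])


# Assumption: with Biopython the pairs could be produced like this:
#   ((r.description, str(r.seq)) for r in SeqIO.parse(sys.argv[1], "fasta"))
records = [("H0192X", "FORUWOHRPPTRWFAWWEAKJNFWEJ")]

# An in-memory buffer stands in for open("out.tsv", "w", newline="")
buffer = io.StringIO()
write_tsv(records, buffer)
tsv_text = buffer.getvalue()
print(tsv_text, end="")
```

One thing to keep in mind: csv.writer will quote a field that itself contains a tab, so unusual FASTA descriptions should still come out as a single column.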

Answered by zorbax on May 29, 2021

Here's a solution using pandas, in case you want to save the output as a TSV file:

from Bio import SeqIO
import pandas as pd
from io import StringIO

example = """
>seq0
FQTWEEFSRAAEKLYLADPMKVRVVLKYRHVDGNLCIKVTDDLVCLVYRTDQAQDVKKIEKF
>seq1
KYRTWEEFTRAAEKLYQADPMKVRVVLKYRHCDGNLCIKVTDDVVCLLYRTDQAQDVKKIEKFHSQLMRLMELKVTDNKECLKFKTDQAQEAKKMEKLNNIFFTLM
>seq2
EEYQTWEEFARAAEKLYLTDPMKVRVVLKYRHCDGNLCMKVTDDAVCLQYKTDQAQDVKKVEKLHGK
>seq3
MYQVWEEFSRAVEKLYLTDPMKVRVVLKYRHCDGNLCIKVTDNSVCLQYKTDQAQDVK
>seq4
EEFSRAVEKLYLTDPMKVRVVLKYRHCDGNLCIKVTDNSVVSYEMRLFGVQKDNFALEHSLL
>seq5
SWEEFAKAAEVLYLEDPMKCRMCTKYRHVDHKLVVKLTDNHTVLKYVTDMAQDVKKIEKLTTLLMR
>seq6
FTNWEEFAKAAERLHSANPEKCRFVTKYNHTKGELVLKLTDDVVCLQYSTNQLQDVKKLEKLSSTLLRSI
>seq7
SWEEFVERSVQLFRGDPNATRYVMKYRHCEGKLVLKVTDDRECLKFKTDQAQDAKKMEKLNNIFF
>seq8
SWDEFVDRSVQLFRADPESTRYVMKYRHCDGKLVLKVTDNKECLKFKTDQAQEAKKMEKLNNIFFTLM
>seq9
KNWEDFEIAAENMYMANPQNCRYTMKYVHSKGHILLKMSDNVKCVQYRAENMPDLKK
>seq10
FDSWDEFVSKSVELFRNHPDTTRYVVKYRHCEGKLVLKVTDNHECLKFKTDQAQDAKKMEK
"""

# This example just happens to be a string, just load your
# fasta file using the method you're already using
example_records = SeqIO.parse( StringIO(example), 'fasta')

# Dictionary to hold the data you eventually want in the tsv
data = {"Gene name" : list(),
        "Gene length" : list(),
        "Gene seq" : list()}

# Append each record's name, length and sequence to the data dictionary
for record in example_records:
    data['Gene name'].append(record.description)
    data['Gene length'].append(len(record.seq))
    data['Gene seq'].append(str(record.seq))

# Convert your data into a pandas DataFrame and save as a tsv
gene_df = pd.DataFrame(data)
gene_df.to_csv("gene_info.tsv", sep='\t', index=False)

This results in a tsv that looks like this:

$ head gene_info.tsv
Gene name       Gene length     Gene seq
seq0    62      FQTWEEFSRAAEKLYLADPMKVRVVLKYRHVDGNLCIKVTDDLVCLVYRTDQAQDVKKIEKF
seq1    106     KYRTWEEFTRAAEKLYQADPMKVRVVLKYRHCDGNLCIKVTDDVVCLLYRTDQAQDVKKIEKFHSQLMRLMELKVTDNKECLKFKTDQAQEAKKMEKLNNIFFTLM
seq2    67      EEYQTWEEFARAAEKLYLTDPMKVRVVLKYRHCDGNLCMKVTDDAVCLQYKTDQAQDVKKVEKLHGK
seq3    58      MYQVWEEFSRAVEKLYLTDPMKVRVVLKYRHCDGNLCIKVTDNSVCLQYKTDQAQDVK
seq4    62      EEFSRAVEKLYLTDPMKVRVVLKYRHCDGNLCIKVTDNSVVSYEMRLFGVQKDNFALEHSLL
seq5    66      SWEEFAKAAEVLYLEDPMKCRMCTKYRHVDHKLVVKLTDNHTVLKYVTDMAQDVKKIEKLTTLLMR
seq6    70      FTNWEEFAKAAERLHSANPEKCRFVTKYNHTKGELVLKLTDDVVCLQYSTNQLQDVKKLEKLSSTLLRSI
seq7    65      SWEEFVERSVQLFRGDPNATRYVMKYRHCEGKLVLKVTDDRECLKFKTDQAQDAKKMEKLNNIFF
seq8    68      SWDEFVDRSVQLFRADPESTRYVMKYRHCDGKLVLKVTDNKECLKFKTDQAQEAKKMEKLNNIFFTLM
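
If you want to double-check a file like this, you could read it back and compare each stated length with the actual sequence length. This is just a sketch using the standard-library csv module: find_length_mismatches is a hypothetical helper name, the column headers are the ones shown above, and the row named "bad" is deliberately wrong so that the check has something to report.

```python
import csv
import io


def find_length_mismatches(handle):
    """Return names of rows whose 'Gene length' differs from len('Gene seq').

    Hypothetical helper; assumes a header row with the three column names
    used in the tsv above.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        row["Gene name"]
        for row in reader
        if int(row["Gene length"]) != len(row["Gene seq"])
    ]


# In-memory stand-in for open("gene_info.tsv", newline="")
sample = (
    "Gene name\tGene length\tGene seq\n"
    "seq0\t62\tFQTWEEFSRAAEKLYLADPMKVRVVLKYRHVDGNLCIKVTDDLVCLVYRTDQAQDVKKIEKF\n"
    "seq3\t58\tMYQVWEEFSRAVEKLYLTDPMKVRVVLKYRHCDGNLCIKVTDNSVCLQYKTDQAQDVK\n"
    "bad\t5\tACGT\n"  # deliberately inconsistent row for illustration
)

mismatches = find_length_mismatches(io.StringIO(sample))
print(mismatches)
```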

Hopefully this helps!

Answered by Robert Link on May 29, 2021
