TransWikia.com

How to get % similarity between strains and mutation files

Bioinformatics Asked on January 22, 2021

I’m very new to Python and am having some difficulty getting the hang of some of the more complicated things.

I have multiple files which look like so:

hCoV-19/Singapore/4/2020|EPI_ISL_410535|2020-02-03

hCoV-19/USA/WA13-UW9/2020|EPI_ISL_413601|2020-03-02

hCoV-19/USA/WA-UW142/2020|EPI_ISL_416680|2020-03-11

Please note that the lines above all belong to a single file.

I want to extract the EPI_ISL_000000 identifier from each line for an easy comparison among files.

Could someone please advise on:

  1. A programme to extract this data into new files (there are many lines in each file, 1000+)

  2. A programme to then give a % comparison between two or more files, comparing all lines in one file against all lines in a second (or further) file

One Answer

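# Reduce each header to its six-digit EPI number, e.g.
# 'hCoV-19/Singapore/4/2020|EPI_ISL_410535|2020-02-03' -> '410535'
left_lineagelist = [x.split('_')[-1].split('|')[0]
                    for x in left_lineagelist]
# A set makes membership tests against the second file fast
right_lineagelist = set(x.split('_')[-1].split('|')[0]
                        for x in right_lineagelist)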

This extracts the six-digit EPI number from each line, provided the sequences have been removed from the file beforehand so that only the header lines remain, for example:

# 'lines' holds the lines of the FASTA file, e.g. from open(path)
for line in lines:
    if line.startswith('>'):
        print(line[1:].rstrip())
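
The snippets above cover the extraction but not the % comparison asked for in the question. One possible way to put the two together is sketched below. The function names `extract_epi_ids` and `percent_shared` are made up for this illustration, `file_a` is the three sample lines from the question, and `file_b` is a hypothetical second file that reuses two of them. It assumes the accession is always the second `|`-separated field, and it keeps the full `EPI_ISL_...` string rather than only the six digits. "Similarity" is taken here to mean the share of IDs in the first file that also occur in the second, which is only one of several reasonable definitions.

```python
def extract_epi_ids(lines):
    """Return the set of EPI_ISL accessions found in header lines."""
    ids = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Drop the leading '>' if the header still carries it.
        header = line[1:] if line.startswith('>') else line
        fields = header.split('|')
        # Assumption: the accession is the second '|'-separated field.
        # Sequence lines contain no '|' and are skipped here.
        if len(fields) >= 2 and fields[1].startswith('EPI_ISL_'):
            ids.add(fields[1])
    return ids


def percent_shared(left_ids, right_ids):
    """Percentage of IDs in left_ids that also occur in right_ids."""
    if not left_ids:
        return 0.0
    return 100.0 * len(left_ids & right_ids) / len(left_ids)


# Sample lines from the question.
file_a = [
    "hCoV-19/Singapore/4/2020|EPI_ISL_410535|2020-02-03",
    "hCoV-19/USA/WA13-UW9/2020|EPI_ISL_413601|2020-03-02",
    "hCoV-19/USA/WA-UW142/2020|EPI_ISL_416680|2020-03-11",
]
# Hypothetical second file, with '>' still present on the headers.
file_b = [
    ">hCoV-19/USA/WA13-UW9/2020|EPI_ISL_413601|2020-03-02",
    ">hCoV-19/Singapore/4/2020|EPI_ISL_410535|2020-02-03",
]

ids_a = extract_epi_ids(file_a)
ids_b = extract_epi_ids(file_b)
print(f"{percent_shared(ids_a, ids_b):.1f}% of file A's IDs are also in file B")
```

For real files, the lists can be replaced by open file handles, e.g. `with open(path) as fh: ids = extract_epi_ids(fh)`. Note that the result depends on which file is used as the denominator: here two of the three IDs in `file_a` appear in `file_b`, while every ID in `file_b` appears in `file_a`.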

Correct answer by Theo Jones on January 22, 2021
