TransWikia.com

Join two csv files by matching columns, join command

Unix & Linux Asked by cdxun on December 20, 2020

I have two .csv files that I need to match based on column 1.

The two file structures look like this.

FILE1

gopAga1_00004004-RA,1122.825534,    -2.497919969,   0.411529843

gopAga1_00010932-RA,440.485381, 1.769511316,    0.312853434 

gopAga1_00007012-RA, 13.37565185,   -1.973108929,   0.380227982

etc...

FILE2

gopAga1_00004004-RA,    ENSACAP00000013845

gopAga1_00009937-RA,    ENSACAP00000000905

gopAga1_00010932-RA,    ENSACAP00000003279

gopAga1_00000875-RA,    ENSACAP00000000296

gopAga1_00010837-RA,    ENSACAP00000011919

gopAga1_00007012-RA,    ENSACAP00000012682

gopAga1_00017831-RA,    ENSACAP00000016147

gopAga1_00005588-RA,    ENSACAP00000011117

etc..

This is my current command that I am running using join:

This is formatted from what I have also read on the following threads here

join -1 1 -2 1 -t , -a 1 -e "NA" -o "2.2,1.1,1.2,1.3" <(sort -k 1 healthy_vs_unhealthy_de.csv) <(sort RBH.csv) > output.txt

However, every time I run this prompt it only writes the first row to output.

Anyone know why my code is running like this and not actually merging the two files based on the GOP ID?

One Answer

we should specify delimiter as comma for sort

# join -1 1 -2 1 -t , -a 1 -e "NA" -o "2.2,1.1,1.2,1.3" <(sort -t',' -k 1 healthy_vs_unhealthy_de.csv) <(sort -t',' RBH.csv)
ENSACAP00000013845,gopAga1_00004004-RA,1122.825534,    -2.497919969
ENSACAP00000012682,gopAga1_00007012-RA, 13.37565185,   -1.973108929
ENSACAP00000003279,gopAga1_00010932-RA,440.485381, 1.769511316

Answered by Siva on December 20, 2020

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP