
I want to compare and match two files and print them into one file

Unix & Linux Asked by SKG on September 14, 2020

I have two files, file1 and file2.

file1:

r11_abc_gkhsa 1.0 1.5 1.9
r11_bcd_gkhsa 1.0 1.5 1.7
r11_acd_gkhsa 1.3 1.6 1.5
r11_xyz_gkhsa 1.0 1.5 1.9

file2:

sd1_bcd_gkhsa 1.8 1.5 1.9
ab1_abc_gkhsa 1.6 1.4 1.5
sfs_xyz_gkhsa 1.4 1.6 1.4
sd1_acd_gkhsa 1.2 1.3 1.5
sfs_ryb_gkhsa 1.5 1.2 1.7

I want to match the middle part of each name in file1 (abc, bcd, acd, and xyz) against file2. Whenever a line matches a line in file2, I want to print both on one line, followed by any file2 lines that have no match, as follows:

Output:

r11_abc_gkhsa 1.0 1.5 1.9     ab1_abc_gkhsa 1.6 1.4 1.5
r11_bcd_gkhsa 1.0 1.5 1.7     sd1_bcd_gkhsa 1.8 1.5 1.9
r11_acd_gkhsa 1.3 1.6 1.5     sd1_acd_gkhsa 1.2 1.3 1.5
r11_xyz_gkhsa 1.0 1.5 1.9     sfs_xyz_gkhsa 1.4 1.6 1.4
sfs_ryb_gkhsa 1.5 1.2 1.7

I can use Perl or sed. Can someone give me some ideas on how to approach this?
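One way to start: the part that has to match is the middle _-separated field of the first column (abc, bcd, ...). A minimal sketch, assuming bash (for process substitution) and that every line begins with three _-separated parts; the helper name key is hypothetical. It writes the sample data to a temporary directory, extracts the key from each line with sed, and uses comm to list the keys the two files share:

```shell
#!/usr/bin/env bash
# Sketch: find which keys (the middle "_"-separated field) occur in both files.
# Assumes each line begins with three "_"-separated parts, e.g. r11_abc_gkhsa.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'r11_abc_gkhsa 1.0 1.5 1.9' 'r11_bcd_gkhsa 1.0 1.5 1.7' \
    'r11_acd_gkhsa 1.3 1.6 1.5' 'r11_xyz_gkhsa 1.0 1.5 1.9' > file1
printf '%s\n' 'sd1_bcd_gkhsa 1.8 1.5 1.9' 'ab1_abc_gkhsa 1.6 1.4 1.5' \
    'sfs_xyz_gkhsa 1.4 1.6 1.4' 'sd1_acd_gkhsa 1.2 1.3 1.5' \
    'sfs_ryb_gkhsa 1.5 1.2 1.7' > file2

# Hypothetical helper: print only the key of each line,
# e.g. "r11_abc_gkhsa 1.0 1.5 1.9" -> "abc"
key() { sed 's/^[^_]*_\([^_]*\)_.*/\1/' "$1"; }

# comm -12 keeps only lines common to both inputs; comm needs sorted input
comm -12 <(key file1 | sort) <(key file2 | sort)
```

With the sample data this should list abc, acd, bcd and xyz; ryb occurs only in file2, which is why sfs_ryb_gkhsa ends up on a line of its own.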

2 Answers

If you want to use plain bash associative arrays (bash 4.0 or later):

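# Read in the data from both files, grouping lines by their key (abc, bcd, ...)
unset arr1
declare -A arr1
while read -r -u3 line; do
    i=${line%_*}          # r11_abc_gkhsa 1.0 1.5 1.9 -> r11_abc
    i=${i#*_}             # r11_abc -> abc
    arr1[$i]+=" $line"    # append the line to the entry for its key
done 3< <(cat file1 file2)
# Output the array by iterating through the keys, dropping the leading space
for k in "${!arr1[@]}"; do
    echo "${arr1[$k]# }"
done | sort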

Output:

r11_abc_gkhsa 1.0 1.5 1.9 ab1_abc_gkhsa 1.6 1.4 1.5
r11_acd_gkhsa 1.3 1.6 1.5 sd1_acd_gkhsa 1.2 1.3 1.5
r11_bcd_gkhsa 1.0 1.5 1.7 sd1_bcd_gkhsa 1.8 1.5 1.9
r11_xyz_gkhsa 1.0 1.5 1.9 sfs_xyz_gkhsa 1.4 1.6 1.4
sfs_ryb_gkhsa 1.5 1.2 1.7
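The key extraction in the loop relies on two parameter expansions. A small sketch, using the first line of file1 as a hypothetical input, that prints each intermediate value:

```shell
#!/usr/bin/env bash
# Sketch: how the two parameter expansions reduce a line to its key.
line='r11_abc_gkhsa 1.0 1.5 1.9'
i=${line%_*}   # remove the shortest suffix matching "_*" (from the last "_")
echo "$i"      # prints r11_abc
i=${i#*_}      # remove the shortest prefix matching "*_" (up to the first "_")
echo "$i"      # prints abc
```

This assumes the key is always the second of three _-separated parts. A first column such as a_b_c_d would yield the key b_c, because ${line%_*} only removes the text after the last _.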

Answered by jai_s on September 14, 2020

Using join, sort, and sed:

join -j 2 -t_ -a 1 -a 2 -o 1.1,1.2,1.3,1.9999,2.1,2.2,2.3 \
     <(sort -t_ -k2,2 file1) <(sort -t_ -k2,2 file2) |
     sed 's/__/  /g;s/^ *//g' | sort

  1. Sort file1 and file2 on field #2, feeding them to join with bash's process substitution, then...
  2. Using _ as the field separator, join the two sorted files on field #2, and also print on its own any line from either file that has no match (-a 1 -a 2). The nonexistent field 1.9999 always prints as empty, so each joined pair is separated by an extra _ (giving __), which simplifies step #3.
  3. Clean up the output with sed: replace each __ with two spaces, then strip the leading spaces left on unmatched lines from file2.
  4. Sort the results.

Output:

r11_abc_gkhsa 1.0 1.5 1.9  ab1_abc_gkhsa 1.6 1.4 1.5
r11_acd_gkhsa 1.3 1.6 1.5  sd1_acd_gkhsa 1.2 1.3 1.5
r11_bcd_gkhsa 1.0 1.5 1.7  sd1_bcd_gkhsa 1.8 1.5 1.9
r11_xyz_gkhsa 1.0 1.5 1.9  sfs_xyz_gkhsa 1.4 1.6 1.4
sfs_ryb_gkhsa 1.5 1.2 1.7
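To see why step #2 adds the field 1.9999, here is a sketch that runs the same join options on two tiny hypothetical files (a and b, already sorted on field 2) and prints the output before and after the sed cleanup. It assumes a join that, like GNU coreutils join, prints a missing -o field as an empty string:

```shell
#!/usr/bin/env bash
# Sketch: join's raw output versus the cleaned-up output, on two tiny files.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'r11_abc_gkhsa 1.0 1.5 1.9' > a
printf '%s\n' 'ab1_abc_gkhsa 1.6 1.4 1.5' 'sfs_ryb_gkhsa 1.5 1.2 1.7' > b

# Raw: 1.9999 is always empty, so "__" separates the two halves of a pair,
# and the unmatched line from b starts with empty file-1 fields ("____").
join -j 2 -t_ -a 1 -a 2 -o 1.1,1.2,1.3,1.9999,2.1,2.2,2.3 a b

echo '---'

# Cleaned: "__" becomes two spaces, then leading spaces are stripped.
join -j 2 -t_ -a 1 -a 2 -o 1.1,1.2,1.3,1.9999,2.1,2.2,2.3 a b |
    sed 's/__/  /g;s/^ *//g'
```

The raw output should be r11_abc_gkhsa 1.0 1.5 1.9__ab1_abc_gkhsa 1.6 1.4 1.5 followed by ____sfs_ryb_gkhsa 1.5 1.2 1.7; after sed, the unmatched line is just sfs_ryb_gkhsa 1.5 1.2 1.7.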

Answered by agc on September 14, 2020
