TransWikia.com

How to get rows with similar values in two different columns using command line?

Bioinformatics Asked by MudithMMBc on November 28, 2020

I have the following example data from a large file.

cont100 1128 1125
cont1005 3642 3642
cont1006 103 19
cont1037 3146 3146
cont104 895 890
cont1056 934 934
cont1059 1750 1750
cont1072 2577 2577
cont1078 43 42
cont1081 727 2

I need to get only the rows with similar values in column 2 and column 3. The output should be like this.

cont1005 3642 3642
cont1037 3146 3146
cont1056 934 934
cont1059 1750 1750
cont1072 2577 2577

I also need the inverse of that, i.e. the rows whose values in the two columns are not similar. The output should be like this.

cont100 1128 1125
cont1006 103 19
cont104 895 890
cont1078 43 42
cont1081 727 2

2 Answers

I think this is easy to do with awk.

For your example, with your data in file:

% awk '$2 != $3 {print $0}' file
# outputs:
cont100 1128 1125
cont1006 103 19
cont104 895 890
cont1078 43 42
cont1081 727 2

% awk '$2 == $3 {print $0}' file
# outputs:
cont1005 3642 3642
cont1037 3146 3146
cont1056 934 934
cont1059 1750 1750
cont1072 2577 2577
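
Since the question mentions a large file, it may be convenient to write both subsets in a single pass instead of reading the file twice. A minimal sketch, assuming the hypothetical output names same.txt and diff.txt (they are not from the question):

```shell
# Recreate the example data from the question in a file named "file".
cat > file <<'EOF'
cont100 1128 1125
cont1005 3642 3642
cont1006 103 19
cont1037 3146 3146
cont104 895 890
cont1056 934 934
cont1059 1750 1750
cont1072 2577 2577
cont1078 43 42
cont1081 727 2
EOF

# Route each line to one of two files depending on whether columns 2 and 3 match.
# The parentheses around the file name expression keep this portable across awk versions.
awk '{ print > ($2 == $3 ? "same.txt" : "diff.txt") }' file
```

After this runs, same.txt should hold the rows with equal values and diff.txt the remaining rows, in their original order.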

Note that where you say "similar" I am interpreting that to mean "exactly the same", because that is the case in your example.
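
If "similar" were instead meant as "within some tolerance", the same awk pattern could be extended. This is only a sketch under that assumption: the threshold variable tol and its value of 5 are hypothetical and not part of the question.

```shell
# Recreate the example data from the question in a file named "file".
cat > file <<'EOF'
cont100 1128 1125
cont1005 3642 3642
cont1006 103 19
cont1037 3146 3146
cont104 895 890
cont1056 934 934
cont1059 1750 1750
cont1072 2577 2577
cont1078 43 42
cont1081 727 2
EOF

# tol is a hypothetical threshold passed in with -v.
near=$(awk -v tol=5 '
    { d = $2 - $3; if (d < 0) d = -d }   # absolute difference of columns 2 and 3
    d <= tol                             # no action given, so matching lines are printed
' file)

printf '%s\n' "$near"
```

With tol=0 this reduces to the exact comparison used above.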

Correct answer by Maximilian Press on November 28, 2020

The Perl solution is similar to the awk solution from MaximilianPress:

Print lines with identical values in columns 2 and 3:

perl -lane 'print if $F[1] == $F[2];' in.txt

# Prints:
cont1005 3642 3642
cont1037 3146 3146
cont1056 934 934
cont1059 1750 1750
cont1072 2577 2577

Print lines with different values in columns 2 and 3:

perl -lane 'print if $F[1] != $F[2];' in.txt

# Prints:
cont100 1128 1125
cont1006 103 19
cont104 895 890
cont1078 43 42
cont1081 727 2

The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array @F on whitespace or on the regex specified in -F option. Arrays in Perl are 0-indexed.

SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches

Answered by Timur Shtatland on November 28, 2020
