TransWikia.com

grep -v: How to exclude only the first (or last) N lines that match?

Unix & Linux Asked on November 23, 2021

Sometimes there are a few really annoying lines in otherwise tabular data like

column name | other column name
-------------------------------

I generally remove such garbage lines with grep -v on a reasonably unique string, but if that string also happens to appear in the data, real rows get removed as well, which is a serious problem.

Is there a way to limit the number of lines that grep -v can remove (say to 1)? For bonus points, is there a way to count the number of lines from the end without resorting to <some command> | tac | grep -v <some stuff> | tac ?

5 Answers

Another possible solution is to use bash's own built-ins:

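count=1   # number of matching lines to drop
found=0   # matching lines dropped so far
# IFS= and -r keep leading whitespace and backslashes in each line intact
while IFS= read -r line
do
   if [[ $line == *"myPattern"* ]] && [ "$found" -lt "$count" ]
   then
      # Still under the limit: drop this line and count it
      found=$((found+1))
   else
      # printf is safe even for lines that look like echo options
      printf '%s\n' "$line"
   fi
done < execute-commons-fileupload.sh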

By setting count, you control how many occurrences of your pattern are removed.

Personally, I find this easier to extend, since you can add further conditions to the if statement (though this may just reflect my limited knowledge of sed).
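As an illustration of adding another condition, here is a minimal sketch that only drops matches while still inside the first few lines of input, so a stray match deep in the data survives. The function name drop_early_matches and its three parameters are hypothetical, and it uses case instead of [[ ]] so that it also runs in plain sh.

```shell
# Hypothetical helper: drop at most COUNT lines containing TEXT,
# but only while still within the first MAXLINE lines of input.
# Usage: ... | drop_early_matches TEXT COUNT MAXLINE
drop_early_matches() (
    text=$1
    count=$2
    maxline=$3
    found=0
    lineno=0
    while IFS= read -r line || [ -n "$line" ]; do
        lineno=$((lineno + 1))
        case $line in
            *"$text"*)
                # Extra condition: the match must be near the top of the input
                if [ "$found" -lt "$count" ] && [ "$lineno" -le "$maxline" ]; then
                    found=$((found + 1))
                    continue
                fi
                ;;
        esac
        printf '%s\n' "$line"
    done
)

# The separator on line 2 is dropped, the one on line 4 is kept.
printf '%s\n' 'h1' '---' 'a' '---' 'b' | drop_early_matches '---' 5 2
```

The function body runs in a subshell, so its variables do not leak into the calling shell.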

Answered by David Georg Reichelt on November 23, 2021

To do this you might have to use awk.

The simple way I know is this:

cat file | awk '{ $1=""; print}'

You can skip multiple columns too:

cat file | awk '{ $1=$2=$3=""; print}'

If you want to skip the last column and you're not sure how many columns you will have:

cat file | awk '{ $NF=""; print}'

Tested on Ubuntu 16.04 (GNU bash, version 4.3.48)

Best.

Answered by Peycho Dimitrov on November 23, 2021

You could use awk to ignore the first n lines that match (e.g. assuming you wanted to remove only the 1st and 2nd match from the file):

n=2
awk -v c="$n" '/PATTERN/ && i++ < c {next};1' infile
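To try this on the header example from the question, here is a minimal sketch that wraps the same awk condition in a function reading standard input. The name drop_first_matches is hypothetical, and the pattern is passed in as an awk variable (an extended regular expression) rather than written literally between slashes.

```shell
# Hypothetical helper: delete only the first N lines matching an ERE.
# Usage: ... | drop_first_matches N 'ERE'
drop_first_matches() {
    awk -v c="$1" -v pat="$2" '$0 ~ pat && i++ < c { next } 1'
}

# Only the first separator line is removed; the second one is treated as data.
printf '%s\n' 'column name | other column name' '-----' 'a | b' '-----' 'c | d' |
    drop_first_matches 1 '^-+$'
```

Because the counter i is only incremented when the line matches, non-matching lines never use up the allowance.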

To ignore the last n lines that match:

awk -v c=${lasttoprint} '!(/PATTERN/ && NR > c)' infile

where ${lasttoprint} is the line number of the (n+1)th-to-last match in your file. There are various ways to get that line number (e.g. print only the line number of each match with tools like sed or awk, then use tail | head to extract it). Here's one way with GNU awk:

n=2
lasttoprint=$(gawk -v c=$((n+1)) '/PATTERN/{x[NR]};
END{asorti(x,z,"@ind_num_desc");{print z[c]}}' infile)
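The tail | head route mentioned above can be sketched without gawk. This is only an illustration under some assumptions: the function name drop_last_matches is hypothetical, the input must be a regular file because it is read more than once, and the pattern has to mean the same thing to grep (basic regex) and to awk (extended regex).

```shell
# Hypothetical helper: delete the last N lines matching PATTERN in FILE.
# Usage: drop_last_matches N PATTERN FILE
drop_last_matches() (
    n=$1
    pattern=$2
    file=$3
    # grep -c prints 0 but exits non-zero when nothing matches
    total=$(grep -c -e "$pattern" "$file" || true)
    if [ "$total" -gt "$n" ]; then
        # Line number of the (n+1)th-to-last match: the last match to keep
        lasttoprint=$(grep -n -e "$pattern" "$file" |
            tail -n "$((n + 1))" | head -n 1 | cut -d: -f1)
    else
        # N or fewer matches in total: every match gets removed
        lasttoprint=0
    fi
    awk -v c="$lasttoprint" -v pat="$pattern" '!($0 ~ pat && NR > c)' "$file"
)
```

For example, drop_last_matches 2 PATTERN infile keeps every match except the final two.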

Answered by don_crissti on November 23, 2021

Perhaps reduce the chances of filtering out your data by using a more accurate grep command. For example:

grep -v -F -x 'str1'

For lines that are exactly str1. Or maybe:

grep -v '^str1.*str2$'

For lines that start with 'str1' and end with 'str2'.
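Applied to the separator line from the question, a minimal sketch could look like this. The function name strip_exact_line is hypothetical; the -e is there because the string starts with a dash and would otherwise be parsed as an option.

```shell
# Hypothetical helper: remove only lines that are exactly equal to the given string.
# Usage: ... | strip_exact_line 'STRING'
strip_exact_line() {
    grep -v -F -x -e "$1"
}

# The pure separator line is removed; the data row that merely contains
# the same run of dashes is kept.
printf '%s\n' 'name | qty' '----------' 'x ---------- y | 2' |
    strip_exact_line '----------'
```

This does not limit how many lines are removed, it only makes an accidental match in the data much less likely.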

Answered by ifb on November 23, 2021

sed provides a simpler way:

... | sed '/some stuff/ {N; s/^.*\n//; :p; N; $q; bp}' | ...

This deletes only the first occurrence.

If you want more:

sed '1 {h; s/.*/iiii/; x}; /some stuff/ {x; s/^i//; x; td; b; :d; d}'

where the number of i characters is the number of occurrences to delete (at least one, not zero).

Multi-line Explanation

sed '1 {
    # Save first line in hold buffer, put `i`s to main buffer, swap buffers
    h
    s/^.*$/iiii/
    x
}

# For lines matching the regexp we are looking for
/some stuff/ {
    # Remove one `i` from hold buffer
    x
    s/i//
    x
    # If successful, there was `i`. Jump to `:d`, delete line
    td
    # If not, process next line (print others).
    b
    :d
    d
}'
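To watch the counter at work, here is a small sketch that builds the string of i characters from a number and runs the same script on sample input. It assumes GNU sed and a pattern without slashes; the function name delete_first_matches_sed is hypothetical.

```shell
# Hypothetical helper: delete the first N (>= 1) lines matching a basic regex.
# Usage: ... | delete_first_matches_sed N 'REGEX'
delete_first_matches_sed() (
    # Build N copies of "i": pad an empty string to width N, then map spaces to i
    counter=$(printf "%${1}s" '' | tr ' ' 'i')
    sed "1 {h; s/.*/$counter/; x}; /$2/ {x; s/^i//; x; td; b; :d; d}"
)

# With a counter of "ii", the first two matches are deleted and the third is printed.
printf '%s\n' 'keep 1' 'some stuff A' 'keep 2' 'some stuff B' 'some stuff C' |
    delete_first_matches_sed 2 'some stuff'
```

On the third match the hold buffer is already empty, so s/^i// fails, td does not branch, and the line is printed as usual.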

In addition

This variant is probably faster, because it reads all the remaining lines and prints them in one go:

sed '1 {h; s/.*/ii/; x}; /a/ {x; s/i//; x; td; :print_all; N; $q; bprint_all; :d; d}'

As a result

You can put this code into your .bashrc (or your shell's config file, if you use a different shell):

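dtrash() {
    if [ $# -eq 0 ]
    then
        # No arguments: pass the input through unchanged
        cat
    elif [ $# -eq 1 ]
    then
        # One argument: delete the first line matching $1
        # (\$q is escaped so the shell does not expand it inside double quotes)
        sed "/$1/ {N; s/^.*\n//; :p; N; \$q; bp}"
    else
        # Two arguments: delete the first $1 lines matching $2
        local count="" i
        for i in $(seq "$1")
        do
            count="${count}i"
        done
        sed "1 {h; s/.*/$count/; x}; /$2/ {x; s/i//; x; td; :print_all; N; \$q; bprint_all; :d; d}"
    fi
}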

And use it this way:

# Remove first occurrence
cat file | dtrash 'stuff' 
# Remove four occurrences
cat file | dtrash 4 'stuff'
# Don't modify
cat file | dtrash

Answered by ValeriyKr on November 23, 2021
