
Find a matching pattern and delete the first occurrence

Asked by NEHA CHOUDHARY on Stack Overflow, December 8, 2020

I have a file1:

NR2SKRD12BWP210H6P51CNODSVT-(.A1(n7),.A2(),.ZN(n8)); |2
MUX2D2BWP210H6P51CNODSVT-(.I0(n8),.I1(),.S(),.Z(n9)); |4
CKLHQD16BWP210H6P51CNODSVT-(.CPN(#),.E(1),.TE(n9),.Q(n10)); |5
LHCSNQD1BWP210H6P51CNODSVT-(.CDN(n10),.D(),.E(1),.SDN(),.Q(n11)); |6
OAI21D8BWP210H6P51CNODSVT-(.A1(n11),.A2(),.B(),.ZN(n12)); |9
DCCKND16BWP210H6P51CNODSVT-(.I(n12),.ZN(n13)); |10
INVSKFD14BWP210H6P51CNODSVT-(.I(n13),.ZN(n14)); |11
NR2SKRD12BWP210H6P51CNODSVT-(.A1(n7),.A2(n1),.ZN(n8)); |2
MUX2D2BWP210H6P51CNODSVT-(.I0(n8),.I1(n2),.S(),.Z(n9)); |4

I need to find matching lines in file1, i.e. lines whose 1st field matches. Fields are separated by - and, if a match is found, delete the first matching line.

I want the output to be:

CKLHQD16BWP210H6P51CNODSVT-(.CPN(#),.E(1),.TE(n9),.Q(n10)); |5
LHCSNQD1BWP210H6P51CNODSVT-(.CDN(n10),.D(),.E(1),.SDN(),.Q(n11)); |6
OAI21D8BWP210H6P51CNODSVT-(.A1(n11),.A2(),.B(),.ZN(n12)); |9
DCCKND16BWP210H6P51CNODSVT-(.I(n12),.ZN(n13)); |10
INVSKFD14BWP210H6P51CNODSVT-(.I(n13),.ZN(n14)); |11
NR2SKRD12BWP210H6P51CNODSVT-(.A1(n7),.A2(n1),.ZN(n8)); |2
MUX2D2BWP210H6P51CNODSVT-(.I0(n8),.I1(n2),.S(),.Z(n9)); |4

Here NR2SKRD12BWP210H6P51CNODSVT and MUX2D2BWP210H6P51CNODSVT have the same $1, so delete the first matching line of each.

I tried this code:

awk -F'-' 'FNR==NR{a[$1];next} !(($1) in a)' file1

But this code finds matches and deletes lines between two files. How can I find matches and delete within a single file?
*Delete the first matching line only; keep the second, third, fourth, etc. repeats.

4 Answers

To delete the first duplicate:

awk -F- 'NR==FNR {++a[$1]; next} a[$1]==1; {a[$1]=1}' file file

Read the same file twice: count $1 occurrences on the first read, then decide what to do with each line based on the count on the second. A line prints either because its key is unique (a[$1]==1) or because the first occurrence of its duplicated key has already been skipped; the trailing {a[$1]=1} resets the count to 1 once a line has been processed, so all later repeats pass the test.
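
If the input cannot be read twice (e.g. it arrives on a pipe), a single-pass sketch that buffers the whole file in memory should behave the same way:

awk -F- '
{ line[NR] = $0; key[NR] = $1; count[$1]++ }   # buffer each line and count its key
END {
  for (i = 1; i <= NR; i++) {
    k = key[i]
    # drop the first occurrence of any key that appears more than once
    if (count[k] > 1 && !seen[k]) { seen[k] = 1; continue }
    print line[i]
  }
}' file1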

Answered by rowboat on December 8, 2020

another awk

$ awk -F- 'NR==FNR{a[$1]++; next} !(--a[$1])' file{,}

CKLHQD16BWP210H6P51CNODSVT-(.CPN(#),.E(1),.TE(n9),.Q(n10)); |5
LHCSNQD1BWP210H6P51CNODSVT-(.CDN(n10),.D(),.E(1),.SDN(),.Q(n11)); |6
OAI21D8BWP210H6P51CNODSVT-(.A1(n11),.A2(),.B(),.ZN(n12)); |9
DCCKND16BWP210H6P51CNODSVT-(.I(n12),.ZN(n13)); |10
INVSKFD14BWP210H6P51CNODSVT-(.I(n13),.ZN(n14)); |11
NR2SKRD12BWP210H6P51CNODSVT-(.A1(n7),.A2(n1),.ZN(n8)); |2
MUX2D2BWP210H6P51CNODSVT-(.I0(n8),.I1(n2),.S(),.Z(n9)); |4

Double-scan the file: the first round counts the occurrences of each key, and the second round decrements each count and prints a line only when its count reaches zero, i.e. only the last occurrence of each key.
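
Note that for a key occurring three or more times, !(--a[$1]) keeps only the final occurrence. To drop only the first occurrence and keep every later repeat, a small variant of the same two-pass idea should do (a sketch along the lines above):

$ awk -F- 'NR==FNR{a[$1]++; next} a[$1]==1 || b[$1]++' file{,}

Here a[$1]==1 prints lines with unique keys; for a duplicated key, b[$1]++ is 0 (false) on its first occurrence and non-zero (true) on every later one.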

Answered by karakfa on December 8, 2020

This might work for you (GNU sed):

sed -E 'H;x;s/^(\n[^-]*-)[^\n]*(.*\1)/\2/;x;$!d;x;s/.//' file

Append a copy of the current line to the hold space.

If the current line's key matches the key of the first line held, remove that first line from the hold space.

At the end of the file, swap in the hold space, remove the leading newline that was introduced when appending the copies, and print the result.
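
To watch the pattern and hold spaces evolve while debugging, GNU sed 4.6 and later accepts a --debug flag that dumps both spaces as each command executes:

sed --debug -E 'H;x;s/^(\n[^-]*-)[^\n]*(.*\1)/\2/;x;$!d;x;s/.//' file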

Answered by potong on December 8, 2020

Could you please try the following, written and tested in GNU awk with the shown samples.

awk '
BEGIN{ FS="-" }
FNR==NR{
  arr[$1]++
  next
}
arr[$1]>1 && ++arrAgain[$1]==1{ next }
1
' Input_file Input_file

Explanation: a detailed, line-by-line breakdown of the above.

awk '                             ##Start the awk program here.
BEGIN{ FS="-" }                   ##Set the field separator to a dash.
FNR==NR{                          ##This condition is TRUE while the first copy of Input_file is read.
  arr[$1]++                       ##Create array arr indexed by the 1st field and increment its count on each occurrence.
  next                            ##Skip the remaining statements for this line.
}
arr[$1]>1 && ++arrAgain[$1]==1{   ##If the 1st field occurs more than once and this is its first occurrence on the second read, skip the line.
  next                            ##Skip the remaining statements for this line.
}
1                                 ##1 prints the current line.
' Input_file Input_file           ##Pass Input_file twice.
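
To save the result back into the same file, the usual redirect-and-rename pattern can be used (tmpfile is a placeholder name):

awk 'BEGIN{FS="-"} FNR==NR{arr[$1]++; next} arr[$1]>1 && ++arrAgain[$1]==1{next} 1' Input_file Input_file > tmpfile && mv tmpfile Input_file

Writing to a separate temporary file is safe here because both reads of Input_file finish before mv replaces it.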

Answered by RavinderSingh13 on December 8, 2020
