TransWikia.com

Find Total Number of Repetitions of numbers in a file

Unix & Linux Asked on December 6, 2021

I have a file containing strings of the form Global=x, where x is a number, in between lines of text. I want to count how many distinct values of ‘x’ extracted from "Global=x" are repeated, i.e. appear more than once. I don’t want the number of occurrences of each ‘x’ printed.

For example, if the input file is like

Global=33333
Global=33333
Global=33334
Global=33335
Global=33336
Global=33337
Global=33337
Global=33337

the output should be 2, as two numbers ‘33333’ and ‘33337’ are repeated (it does not matter how many times they are repeated).

I tried

grep -Po '(Global)=\K\d+' file.dat | sort | uniq -c

but I get the frequency of occurrence of each number, which I don’t need:

2 33333
1 33334
1 33335
1 33336
3 33337

Any help will be appreciated; grep, awk and sed solutions are acceptable.

3 Answers

Using any awk in any shell on every UNIX box:

$ awk -F'=' '++cnt[$2] == 2{ dups++ } END{print dups+0}' file
2

If you do need to check for Global then:

$ awk -F'=' '($1 == "Global") && (++cnt[$2] == 2){ dups++ } END{print dups+0}' file
2

The +0 in the END is to ensure you get numeric output (0 instead of a null string) even if there are no dups in the input.
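To see the effect of the +0, here is a small sketch that runs the same command with and without it. The three-line input is made up for illustration and contains no repeated numbers; the variable names are arbitrary.

```shell
# Hypothetical input with no repeated numbers.
input=$(printf 'Global=%s\n' 1 2 3)

# With +0: the unset variable dups is coerced to the number 0.
with_plus=$(printf '%s\n' "$input" | awk -F'=' '++cnt[$2] == 2{ dups++ } END{print dups+0}')

# Without +0: the unset variable dups prints as an empty string.
without_plus=$(printf '%s\n' "$input" | awk -F'=' '++cnt[$2] == 2{ dups++ } END{print dups}')

printf 'with +0: "%s"\n' "$with_plus"
printf 'without +0: "%s"\n' "$without_plus"
```

The second command prints only a blank line, which is awkward if the result is fed into a numeric comparison in a script.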

Answered by Ed Morton on December 6, 2021

You could change uniq -c to uniq -d:

$ grep -Po '(Global)=\K\d+' file.dat | sort | uniq -d
33333
33337

-d prints only duplicated lines. A further pipe to wc -l could count those lines. Also note that both -P & -o options to grep are non-standard, so will not be available in every version of grep.
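For a grep without -P and -o, one possible sketch of the full counting pipeline swaps in sed for the extraction step and appends wc -l. This is a substitute for the grep call, not part of the answer above, and it assumes at most one Global=x per line. The temporary file simply recreates the sample input from the question.

```shell
# Recreate the question's sample input in a temporary file.
tmpfile=$(mktemp)
printf 'Global=%s\n' 33333 33333 33334 33335 33336 33337 33337 33337 > "$tmpfile"

# sed -n ... p prints only lines where the substitution matched,
# leaving just the digits that follow "Global=".
# uniq -d keeps one line per repeated value; wc -l counts those lines.
# tr strips the padding that some wc implementations add.
dup_count=$(sed -n 's/.*Global=\([0-9][0-9]*\).*/\1/p' "$tmpfile" |
    sort | uniq -d | wc -l | tr -d ' ')

echo "$dup_count"

rm -f "$tmpfile"
```

On the sample input this prints 2, matching the output the question asks for.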

Answered by guest on December 6, 2021

To get a list of the numbers that are repeated using a single awk process, without the extra grep, sort and uniq processes:

$ awk -F= '$1=="Global"{c[$2]++} END{for (num in c) if(c[num]>1)print num}' file.dat
33333
33337

The above code uses = as a field separator. If the first field is Global, then we keep track in associative array c of the number of times that the second field, $2, has appeared in the file.

After the file has been read completely, we look through array c and print all numbers which had a count larger than 1.

Shorter version

As proposed by glenn jackman in the comments, we could simply print the number on its second appearance:

$ awk -F= '++c[$2] == 2 {print $2}' file.dat
33333
33337
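Note that the shorter version does not test for Global, so it counts the second field of every line. Since the question mentions other lines of text in the file, here is a sketch comparing it with a guarded variant that adds the same $1 == "Global" test used in the longer command. The mixed input below is invented for illustration.

```shell
# Hypothetical input mixing plain text, another key, and Global lines.
input=$(printf '%s\n' 'some text' 'key=5' 'Global=5' 'more text' 'Global=7' 'Global=7')

# Unguarded: every line is counted, so "key=5" followed by "Global=5"
# and the two text lines (both with an empty $2) are reported as repeats.
unguarded=$(printf '%s\n' "$input" | awk -F= '++c[$2] == 2 {print $2}')

# Guarded: only lines whose first field is exactly "Global" are counted.
guarded=$(printf '%s\n' "$input" | awk -F= '$1 == "Global" && ++c[$2] == 2 {print $2}')

printf 'unguarded:\n%s\n' "$unguarded"
printf 'guarded:\n%s\n' "$guarded"
```

The unguarded command reports 5, an empty line and 7, while the guarded one reports only 7.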

Answered by John1024 on December 6, 2021
