compare available list with chosen ids in shell script

Question

I have a log file of events in which a user chooses an ID, and I want to parse it to create a summary.
For every event, the file lists the IDs available to the user, one per line, each line starting with the keyword Available. These are followed by the ID the user ultimately chose, on a line starting with the keyword Chosen. Below is an example:
Available for user:75=1654 at Time=5504.09 
Chosen by user:75=1655

Available for user:10=1300 at Time=550.09
Available for user:10=1301 at Time=550.09
Available for user:10=1303 at Time=550.09
Chosen by user:10=1301

In the example, user 75 chose ID 1655 although only 1654 was shown as available at that time. In another event, user 10 could choose from IDs 1300, 1301 and 1303, and selected 1301.
Note that each user may have more than one such event logged; some users have hundreds of choice events.
The question is: how can I count these events and print a summary from the log file? For every user, I want to print how often that user chose an ID that was available to them and how often they chose one that was not.
The output would ideally look like this:
User 1 chose 103 ids from list and 23 not from list
User 2 chose 31 ids from list and 6 not from list
...

I have tried grep, but I could not find a way to store the list of available IDs for each event and compare it with the chosen one. Any solution with grep, awk or sed is welcome.

Ed Morton · Answer

This might be what you're looking for. Without the expected output it's a guess, but hopefully you can massage it to suit:
$ cat tst.awk
BEGIN { FS="[:= ]" }
$1 == "Chosen" {
    if ( $5 in avail ) {
        availCnt++
        str = ""
    }
    else {
        notAvailCnt++
        str = " not"
    }
    userCnt++
    printf "User %s chose %s which was%s available\n", $4, $5, str
    delete avail
    next
}
NF { avail[$5] }
END {
    printf "%d users chose available %d times and not available %d times\n", userCnt, availCnt, notAvailCnt
}

$ awk -f tst.awk file
User 75 chose 1655 which was not available
User 10 chose 1301 which was available
2 users chose available 1 times and not available 1 times

The above will work using any awk in any shell on every UNIX box.
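The script above reports totals across all events. Since the question asks for one line per user, here is a minimal sketch of a per-user variant of the same idea. It assumes, as in the sample log, that each event's Available lines come directly before its Chosen line; the function name summarize_choices is hypothetical.

```shell
# Sketch only: summarize_choices is a hypothetical name. It reads the log
# from the files given as arguments (or from stdin) and prints one summary
# line per user, in the order users first appear in a Chosen line.
# Assumes each event's Available lines come directly before its Chosen line.
summarize_choices() {
    awk '
    BEGIN { FS = "[:= ]+" }    # split on colons, equals signs and spaces

    # "Available for user:10=1300 at Time=550.09": $4 is the user, $5 the ID
    $1 == "Available" { avail[$5]; next }

    # "Chosen by user:10=1301": $4 is the user, $5 the chosen ID
    $1 == "Chosen" {
        user = $4
        if (!(user in seen)) { seen[user]; order[++n] = user }
        if ($5 in avail) { inList[user]++ } else { notInList[user]++ }
        split("", avail)       # portable way to empty the list for the next event
    }

    END {
        for (i = 1; i <= n; i++) {
            u = order[i]
            printf "User %s chose %d ids from list and %d not from list\n", u, inList[u], notInList[u]
        }
    }' "$@"
}
```

Running summarize_choices file on the sample log prints "User 75 chose 0 ids from list and 1 not from list" followed by "User 10 chose 1 ids from list and 0 not from list". The separate order array is there because the iteration order of for (k in arr) is unspecified in awk, so it keeps the output in first-seen order.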

seshoumara · Answer

Try this sed script:
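# Available line: keep only the ID after the first "=" and append it to the hold space
/^Avail/{
    s/[^=]*=([^ ]*).*/\1/;H
}
# Chosen line: build "ID is ab in ,list," and switch "ab" to "pre" if the ID is in the list
/^Chosen/{
    s/.*=//;G;h;x;y/\n/,/
    s/,/ is ab in /;s/$/,/
    /^(.*) is.*,\1,/s/ab/pre/
    s/(ab|pre) in ,(.*),/\1sent in \2/
    # print the result, then empty the hold space for the next event
    p;s/.*//;x
}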

Output: custom format (wasn't specified by OP)
sesh@pc:~/unix$ sed -nrf script.sed input.txt
1655 is absent in 1654
1301 is present in 1300,1301,1303

Counting in sed is very difficult, but you can pipe the output through grep -c absent, for example, to count all events where the user chose an ID that was not among the available IDs.
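To get both counts from one run of the sed script, its output can be captured once and counted twice with grep -c. This is a minimal sketch assuming the "N is absent in ..." and "N is present in ..." output lines shown above; count_sed_results is a hypothetical helper name.

```shell
# Sketch only: count_sed_results is a hypothetical helper. It reads the
# output of the sed script above on stdin and counts each kind of line.
count_sed_results() {
    out=$(cat)
    # grep -c prints 0 but exits with status 1 when nothing matches;
    # "|| true" keeps that from aborting scripts run with set -e.
    absent=$(printf '%s\n' "$out" | grep -c ' is absent in ' || true)
    present=$(printf '%s\n' "$out" | grep -c ' is present in ' || true)
    printf 'present %s, absent %s\n' "$present" "$absent"
}
```

For the sample input, sed -nrf script.sed input.txt | count_sed_results prints "present 1, absent 1". These are totals across all users, because the sed output does not carry the user ID.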
