TransWikia.com

How to validate a CSV file that should have exactly two fields per line

Unix & Linux Asked on December 12, 2021

I have a simple CSV file in which every line should contain exactly two non-empty fields, as follows.

This is an example of a valid CSV file:

$ more file.csv    
why_we_need_help,log_low=53687091200
whats_is_going_on,log_high=1073741824
this_is_caryze,log_low=53687091200
let_me_know_what_to_do,log_high=1073741824
look_on_the_room,log_low=53687091200
.
.
.

The goal is to check whether every line of the CSV file contains exactly two non-empty fields.

I started with the following awk command to check whether the file has only two fields:

awk 'BEGIN{FS=OFS=","} NF!=2{print "not enough fields" }' file.csv

But it does not print “not enough fields” for the example below, which is wrong:

why_we_need_help,log_low=53687091200
whats_is_going_on,log_high=1073741824
this_is_caryze,log_low=53687091200
let_me_know_what_to_do,log_high=1073741824
look_on_the_room,
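The check above stays silent because awk splits look_on_the_room, into two fields, the second of which is empty, so NF is still 2. A quick way to confirm this:

```shell
#!/bin/sh
# A trailing comma still splits into two fields; the second one is empty.
printf 'look_on_the_room,\n' | awk -F',' '{ print NF, "[" $2 "]" }'
# prints: 2 []
```

So NF!=2 alone cannot catch an empty field; each field's emptiness has to be tested separately.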

Examples of other invalid CSV files:

why_we_need_help,,log_low=53687091200
whats_is_going_on,log_high=1073741824
this_is_caryze,log_low=53687091200,
let_me_know_what_to_do,log_high=1073741824
look_on_the_room,log_low=53687091200,

or

why_we_need_help log_low=53687091200
whats_is_going_on,log_high=1073741824
this_is_caryze,log_low=53687091200
let_me_know_what_to_do,log_high=1073741824
look_on_the_room,,

5 Answers

Input file:

$ cat op.txt
why_we_need_help,log_low=53687091200
whats_is_going_on,log_high=1073741824
this_is_caryze,log_low=53687091200
let_me_know_what_to_do,log_high=1073741824
look_on_the_room,log_low=53687091200
look_on_the_room, 
   ,ajay 

Awk command:

awk -F',' 'NF == 2' op.txt | sed 's/,/ /g' | sed -n '/[[:space:]]\{2,\}/!p' | awk '{gsub(" ", ","); print}'

Output:

why_we_need_help,log_low=53687091200
whats_is_going_on,log_high=1073741824
this_is_caryze,log_low=53687091200
let_me_know_what_to_do,log_high=1073741824
look_on_the_room,log_low=53687091200

Python

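#!/usr/bin/env python3
import re

# After commas become spaces, two or more adjacent spaces mean an empty field
pattern = re.compile(r' {2,}')
with open('op.txt') as f:
    for line in f:
        q = line.replace(',', ' ')
        if not pattern.search(q):
            # Turn the spaces back into commas and drop the trailing newline
            print(q.replace(' ', ',').strip())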

Output:

why_we_need_help,log_low=53687091200
whats_is_going_on,log_high=1073741824
this_is_caryze,log_low=53687091200
let_me_know_what_to_do,log_high=1073741824
look_on_the_room,log_low=53687091200

Answered by Praveen Kumar BS on December 12, 2021

Here's a small function that uses grep. Its exit code will be 0 when no lines are invalid and will be 1 if at least 1 line is invalid (in which case, the first invalid line is printed and processing is aborted - no further lines are checked).

The regexp matches, from the beginning of the line, one or more characters that aren't a comma, followed by exactly one comma, followed by one or more characters that aren't a comma, and then the end of the line. grep -v reports the lines that do not match it.

lines_are_valid() {
  grep -E -m1 -v '^[^,]+,[^,]+$' && return 1 || return 0
}

How to use it:

cat myFile | lines_are_valid

More examples:

echo 'this_is_caryze,log_low=53687091200
let_me_know_what_to_do,log_high=1073741824
look_on_the_room,' \
| lines_are_valid \
  && echo "All lines OK" \
  || echo "Invalid line found, see above"

look_on_the_room,

Invalid line found, see above

echo 'this_is_caryze,log_low=53687091200
let_me_know_what_to_do,log_high=1073741824
look_on_the_room,aaa' \
| lines_are_valid \
  && echo "All lines OK" \
  || echo "Invalid line found, see above"

All lines OK

echo 'this_is_caryze,log_low=53687091200
let_me_know_what_to_do,log_high=1073741824
look_on_the_room,,
also wrong,' \
| lines_are_valid \
  && echo "All lines OK" \
  || echo "Invalid line found, see above"

look_on_the_room,,

Invalid line found, see above

echo 'this_is_caryze,log_low=53687091200
let_me_know_what_to_do,log_high=1073741824
look_on_the_room,,asdfasdf
also wrong,' \
| lines_are_valid \
  && echo "All lines OK" \
  || echo "Invalid line found, see above"

look_on_the_room,,asdfasdf

Invalid line found, see above

In case you want to show all invalid lines:

show_all_invalid_lines() {
  grep -E -v '^[^,]+,[^,]+$' && return 1 || return 0
}
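If you also want to know where the invalid lines are, grep's -n option prefixes each reported line with its line number. A minimal sketch; the function name show_invalid_lines_numbered is hypothetical and not part of the answer:

```shell
#!/bin/sh
# Hypothetical variant of show_all_invalid_lines: -n prefixes each
# reported line with its line number, e.g. "2:look_on_the_room,".
show_invalid_lines_numbered() {
  grep -n -E -v '^[^,]+,[^,]+$' && return 1 || return 0
}

printf '%s\n' 'this_is_caryze,log_low=53687091200' 'look_on_the_room,' 'a,,b' \
  | show_invalid_lines_numbered \
  || echo "Invalid lines found, see above"
```

This should print 2:look_on_the_room, and 3:a,,b before the message, and the return codes stay the same as in show_all_invalid_lines.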

Answered by Elifarley on December 12, 2021

I'm not sure but I THINK what you're looking for is:

awk -F',' 'NF!=2 || /^,|,$/{print "bad:", NR | "cat>&2"; exit 1}' file

which could be improved to report the specific error(s) on the line:

awk -F',' '
    NF<2 { err="too few fields" }
    NF>2 { err="too many fields" }
    /^,|,$/ { err=(err == "" ? "" : err " and ") "empty fields" }
    err != "" { print err, "at line", NR | "cat>&2"; exit 1 }
' file

or if you want all errors on all lines found at once:

awk -F',' '
    NF<2 { err="too few fields" }
    NF>2 { err="too many fields" }
    /^,|,$/ { err=(err == "" ? "" : err " and ") "empty fields" }
    err != "" { print err, "at line", NR | "cat>&2"; err=""; f=1 }
    END { exit f }
' file
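To see what the last version reports on real input, here it is wrapped in a shell function (the name report_csv_errors is hypothetical) and run on the question's second invalid sample. With no file argument, awk reads standard input:

```shell
#!/bin/sh
# Hypothetical wrapper; the awk program is the "all errors" version above.
report_csv_errors() {
    awk -F',' '
        NF<2 { err="too few fields" }
        NF>2 { err="too many fields" }
        /^,|,$/ { err=(err == "" ? "" : err " and ") "empty fields" }
        err != "" { print err, "at line", NR | "cat>&2"; err=""; f=1 }
        END { exit f }
    ' "$@"
}

# The question's second invalid sample.
# Expected on stderr:
#   too many fields at line 1
#   too many fields and empty fields at line 3
#   too many fields and empty fields at line 5
# followed by "exit status: 1" on stdout.
printf '%s\n' \
    'why_we_need_help,,log_low=53687091200' \
    'whats_is_going_on,log_high=1073741824' \
    'this_is_caryze,log_low=53687091200,' \
    'let_me_know_what_to_do,log_high=1073741824' \
    'look_on_the_room,log_low=53687091200,' |
    report_csv_errors || echo "exit status: $?"
```

Line 1 is reported only as "too many fields": the empty-field test looks at the start and end of the line, and a doubled comma in the middle already shows up as an extra field.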

Answered by Ed Morton on December 12, 2021

Another awk option is

awk 'BEGIN{FS=OFS=","}NF!=2||$1==""||$2==""{print "Not enough fields";exit 5}' file.csv

It explicitly checks whether either of the two fields is empty. If so, it prints the message and immediately exits with error code 5 (this number is arbitrary; choose whichever you like).
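The non-zero exit status is what makes this usable inside a script. A minimal sketch of branching on it; check_csv is a hypothetical wrapper name around the answer's command:

```shell
#!/bin/sh
# Hypothetical wrapper around the answer's awk command: 0 = valid, 5 = invalid.
check_csv() {
    awk 'BEGIN{FS=OFS=","} NF!=2||$1==""||$2==""{print "Not enough fields"; exit 5}' "$@"
}

# Branch on the exit status rather than on the printed message.
if printf '%s\n' 'this_is_caryze,log_low=53687091200' 'look_on_the_room,' | check_csv; then
    echo "valid"
else
    echo "invalid (status $?)"
fi
```

Here the second line has an empty second field, so the script should print "Not enough fields" followed by "invalid (status 5)".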

Answered by Quasímodo on December 12, 2021

Try this:

awk 'BEGIN{FS=OFS=","} f{next} NF!=2||!length($1)||!length($2){f=1} END{if (f) {print "File contains malformed lines"; exit 1}}' file.csv

It will set a flag f whenever a line doesn't contain two comma-separated fields or either of the two required fields is empty. At the end, it prints a message if the flag was set while parsing the file, and exits with error code 1 (as per your request).

The first rule skips processing of every line once the flag is set, to speed things up: since you only want to know whether there is any malformed line, once one has been found the file is known to be malformed and the rest of it doesn't need to be checked. (awk still reads the remaining lines; it just does no further work on them.)
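A variant that stops reading the input entirely at the first malformed line, instead of passing over the remaining lines one by one, can call exit in the main rule; in awk, exit in a main rule still runs the END block, where the final status is set. A minimal sketch; the function name has_malformed_lines is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: stop at the first malformed line.
# "exit" in a main rule jumps to END; "exit 1" in END sets the status.
has_malformed_lines() {
    awk 'BEGIN{FS=","}
         NF!=2 || !length($1) || !length($2) { f=1; exit }
         END { if (f) { print "File contains malformed lines"; exit 1 } }' "$@"
}

printf '%s\n' 'this_is_caryze,log_low=53687091200' 'look_on_the_room,' 'a,b' \
    | has_malformed_lines || echo "exit status: $?"
```

For large files this avoids reading past the first bad line, at the cost of not being able to count malformed lines.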

In case you want to know how many lines were malformed, this small change would print it:

awk 'BEGIN{FS=OFS=","} NF!=2||!length($1)||!length($2){f++} END{if (f) {printf("File contains %d malformed line(s)\n",f); exit 1}}' file.csv

Answered by AdminBee on December 12, 2021
