Fill missing fields with values from the line below

Question

I have a semicolon-separated data file in which only a few lines hold the "complete" dataset. Each complete line sits at the end of the block of lines to which it applies. I want to copy the data from each complete line into the incomplete lines above it, using a shell script (or a similar command-line tool).
For example, let's say the file I have contains the following data:
86540701
86951202
86262402
86509002
86770802
86459902
86301002
86485102
86556002;Vivo Y11;1630000;NULL;;;
86447404
86161405
86388604
86106105
86426405;Xiaomi Redmi 8A Pro (Redmi 8A Dual);1465000;4;;;

I want to be able to find the complete lines and substitute that data into the incomplete lines above, like so:
86540701;Vivo Y11;1630000;NULL;;;
86951202;Vivo Y11;1630000;NULL;;;
86262402;Vivo Y11;1630000;NULL;;;
86509002;Vivo Y11;1630000;NULL;;;
86770802;Vivo Y11;1630000;NULL;;;
86459902;Vivo Y11;1630000;NULL;;;
86301002;Vivo Y11;1630000;NULL;;;
86485102;Vivo Y11;1630000;NULL;;;
86556002;Vivo Y11;1630000;NULL;;;
86447404;Xiaomi Redmi 8A Pro (Redmi 8A Dual);1465000;4;;;
86161405;Xiaomi Redmi 8A Pro (Redmi 8A Dual);1465000;4;;;
86388604;Xiaomi Redmi 8A Pro (Redmi 8A Dual);1465000;4;;;
86106105;Xiaomi Redmi 8A Pro (Redmi 8A Dual);1465000;4;;;
86426405;Xiaomi Redmi 8A Pro (Redmi 8A Dual);1465000;4;;;

thanasisp · Accepted Answer

For this task we can use tac to process the file in reverse order:
tac file | awk -F';' 'NF > 1 {p = substr($0,index($0,FS))} {print $1 p}' | tac

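This way we don't need to store any lines; each one is printed as soon as it is read.
Whenever NF > 1, we save the substring from the first FS to the end of the line in p, and that saved part is appended to the first field of every line that follows in the reversed stream.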
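
The one-liner above can be unpacked into a commented sketch. The function name fill_from_below and the sample rows are made up for illustration, and tac is assumed to be available (it ships with GNU coreutils):

```shell
# Hypothetical helper: reads the data on stdin, writes the filled rows to stdout.
fill_from_below() {
    tac |
    awk -F';' '
        # Complete line: remember everything from the first ";" onward.
        NF > 1 { tail = substr($0, index($0, FS)) }
        # Every line: print its first field followed by the last remembered tail.
        { print $1 tail }
    ' |
    tac
}

# Small demo with invented rows.
printf '%s\n' '111' '222;Vivo Y11;1630000;NULL;;;' | fill_from_below
# 111;Vivo Y11;1630000;NULL;;;
# 222;Vivo Y11;1630000;NULL;;;
```

Incomplete lines that come after the last complete line have no tail to borrow and pass through unchanged.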

AdminBee · Answer

Another awk-based solution using a double-pass approach (requires GNU awk for the gensub() function):
awk -F';' 'FNR==NR{if (NF>1) data[++i]=gensub(/^[^;]+/,"","1");next}
           {if (NF==1) $0=$0 data[j+1]; else j++;} 1' input.csv input.csv

This will scan the file twice. The first time, it creates an array of "data parts" of those lines that contain more than one field. The second time, it substitutes the data part where it is missing, and increases the array counter every time it encounters a "complete" line so that the next data part is substituted for the following lines.
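
If gensub() is not available, the same two-pass idea might be written with substr() and index(), which every POSIX awk provides. This is only a sketch: the function name fill_two_pass and the counter names are illustrative, and it assumes the input contains no blank lines.

```shell
# Hypothetical helper: takes the input file as $1 (it is read twice).
fill_two_pass() {
    awk -F';' '
        # Pass 1: collect the data part of each complete line, in order.
        FNR == NR {
            if (NF > 1) data[++n] = substr($0, index($0, FS))
            next
        }
        # Pass 2: an incomplete line borrows the next unused data part.
        NF == 1 { print $0 data[used + 1]; next }
        # A complete line is printed as-is and advances the counter.
        { used++; print }
    ' "$1" "$1"
}

# Usage: fill_two_pass input.csv
```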

guest_7 · Answer

Using GNU sed with extended regex mode (-E) enabled:

Lines are accumulated in the hold space until a line containing a semicolon arrives.
When that line arrives, the accumulated lines are moved into the pattern space and merged: the data part of the last line (the semicolon line) is appended to the first line, which is then printed and stripped off. This repeats until the pattern space is exhausted.

$ sed -Ee '/\n/ba
    H;/;/!d;z;x;D;:a
    s/\n(.*\n)?[^;]+(;.*)/\2&/
    P;/\n.*\n/D;s/.*\n//
' file

$ perl -lne '$, = ";";
    push(@A,$_),next if !/;/;
    my $a = s/.*?;//r;
    print $_, $a for splice @A;
    print;
' file

αғsнιη · Answer

Using sed:
sed -E '
    /;/!{ :a N;/;/!{ s/\n/-/;ta; }; };
    /;/ { s/\n/-/; };
    :c s/([^-]*)-([^;]*)(;.*)$/\1\3\n\2\3/; tc' infile
