TransWikia.com

Fill missing fields with values from the line below

Unix & Linux Asked by Joni on December 10, 2020

I have a semicolon-separated data file in which only a few lines contain the "complete" dataset. Each complete line sits at the end of the block of lines to which its data applies. I want to copy the data from each complete row to the incomplete rows above it, using a shell script (or a similar command-line tool).

For example, let’s say the file I have contains the following data:

86540701
86951202
86262402
86509002
86770802
86459902
86301002
86485102
86556002;Vivo Y11;1630000;NULL;;;
86447404
86161405
86388604
86106105
86426405;Xiaomi Redmi 8A Pro (Redmi 8A Dual);1465000;4;;;

I want to be able to find the complete lines and substitute that data into the incomplete lines above, like so:

86540701;Vivo Y11;1630000;NULL;;;
86951202;Vivo Y11;1630000;NULL;;;
86262402;Vivo Y11;1630000;NULL;;;
86509002;Vivo Y11;1630000;NULL;;;
86770802;Vivo Y11;1630000;NULL;;;
86459902;Vivo Y11;1630000;NULL;;;
86301002;Vivo Y11;1630000;NULL;;;
86485102;Vivo Y11;1630000;NULL;;;
86556002;Vivo Y11;1630000;NULL;;;
86447404;Xiaomi Redmi 8A Pro (Redmi 8A Dual);1465000;4;;;
86161405;Xiaomi Redmi 8A Pro (Redmi 8A Dual);1465000;4;;;
86388604;Xiaomi Redmi 8A Pro (Redmi 8A Dual);1465000;4;;;
86106105;Xiaomi Redmi 8A Pro (Redmi 8A Dual);1465000;4;;;
86426405;Xiaomi Redmi 8A Pro (Redmi 8A Dual);1465000;4;;;

4 Answers

This is a task where we can use tac to parse the file in reversed order:

tac file | awk -F';' 'NF > 1 {p = substr($0,index($0,FS))} {print $1 p}' | tac

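Because the file is read bottom-up, each complete line is seen before the lines it applies to. So we don't need to store any lines; each one is printed right after it is read.

When NF > 1, we store the substring from the first FS to the end of the line in p, and it is appended to every line printed from then on until the next complete line replaces it.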
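
As a rough illustration of the pipeline above, here is a sketch that wraps it in a shell function and runs it on a small sample. The function name fill_up and the sample data are made up for this example, and it assumes tac (GNU coreutils) is installed.

```shell
#!/bin/sh
# fill_up: hypothetical wrapper name, not part of the original answer.
# Reads semicolon-separated lines on stdin and copies the trailing fields
# of each complete line onto the incomplete lines above it.
fill_up() {
    tac |
    awk -F';' '
        # Complete line: remember everything from the first ";" onward.
        NF > 1 { p = substr($0, index($0, FS)) }
        # Every line: print its first field followed by the remembered tail.
        { print $1 p }
    ' |
    tac
}

# Made-up sample input: two blocks, each ending in a complete line.
printf '%s\n' '111' '222;Phone A;100;;' '333' '444;Phone B;200;;' | fill_up
```

Incomplete lines that come after the last complete line are printed unchanged, because p is still empty when they are read in reversed order.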

Correct answer by thanasisp on December 10, 2020

Another awk-based solution using a double-pass approach (requires GNU awk for the gensub() function, which nawk does not provide):

awk -F';' 'FNR==NR{if (NF>1) data[++i]=gensub(/^[^;]+/,"","1");next}
           {if (NF==1) $0=$0 data[j+1]; else j++;} 1' input.csv input.csv

This will scan the file twice. The first time, it creates an array of "data parts" of those lines that contain more than one field. The second time, it substitutes the data part where it is missing, and increases the array counter every time it encounters a "complete" line so that the next data part is substituted for the following lines.
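
If GNU awk is not available, the same two-pass idea can probably be expressed with substr() and index() instead of gensub(). The following is only a sketch under that assumption; the function name fill_two_pass and the sample file are hypothetical.

```shell
#!/bin/sh
# fill_two_pass: hypothetical helper name. It takes a file name because the
# two-pass approach reads the input twice and so cannot work on a pipe.
fill_two_pass() {
    awk -F';' '
        # First pass (FNR == NR holds only while reading the first argument):
        # collect the data part of each complete line, in order.
        FNR == NR {
            if (NF > 1) {
                data[++i] = substr($0, index($0, FS))
            }
            next
        }
        # Second pass: an incomplete line gets the data part of the next
        # complete line; each complete line advances the counter j.
        {
            if (NF > 1) {
                j++
            } else if (NF == 1) {
                $0 = $0 data[j + 1]
            }
            print
        }
    ' "$1" "$1"
}

# Made-up sample file for demonstration.
tmp=$(mktemp)
printf '%s\n' '111' '222;Phone A;100;;' '333' '444;Phone B;200;;' > "$tmp"
fill_two_pass "$tmp"
rm -f "$tmp"
```

The if/else form matters here: assigning to $0 re-splits the record, so a separate NF > 1 rule placed after the assignment would also fire for lines that were just filled in.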

Answered by AdminBee on December 10, 2020

Using GNU sed with extended regex mode (-E) turned on:

  • Store records in the hold space until a line containing a semicolon arrives; that is when the action begins.
  • When we encounter the semicolon line, the merging process starts: the data portion of the last line in the pattern space (the semicolon line) is appended to the first line in the pattern space, and that first line is printed and stripped off. This continues until the pattern space is exhausted.

$ sed -Ee '/\n/ba
    H;/;/!d;z;x;D;:a
    s/\n(.*\n)?[^;]+(;.*)/\2&/
    P;/\n.*\n/D;s/.*\n//
' file

$ perl -lne '$, = ";";
    push(@A,$_),next if !/;/;
    my $a = s/.*?;//r;
    print $_, $a for splice @A;
    print;
' file
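
The Perl one-liner buffers incomplete lines and flushes them once a complete line arrives. For comparison, the same buffering idea might look roughly like this in plain awk; fill_buffered is a made-up name and the sample input is only illustrative.

```shell
#!/bin/sh
# fill_buffered: hypothetical helper name. Single pass, no tac needed.
fill_buffered() {
    awk -F';' '
        # Incomplete line: queue it until a complete line shows up.
        NF < 2 { buf[++n] = $0; next }
        # Complete line: append its data part to every queued line,
        # print them in order, then print the complete line itself.
        {
            tail = substr($0, index($0, FS))
            for (k = 1; k <= n; k++) print buf[k] tail
            n = 0
            print
        }
        # Queued lines with no complete line after them: print unchanged.
        END { for (k = 1; k <= n; k++) print buf[k] }
    '
}

# Made-up sample input with one trailing incomplete line.
printf '%s\n' '111' '222;Phone A;100;;' '333' | fill_buffered
```

One difference from the Perl version: the END rule here prints any incomplete lines left over at the end of the input, whereas the Perl one-liner never prints them.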

Answered by guest_7 on December 10, 2020

Using sed:

sed -E '
    /;/!{ :a N;/;/!{ s/\n/-/;ta; }; };
    /;/ { s/\n/-/; };
    :c s/([^-]*)-([^;]*)(;.*)$/\1\3\n\2\3/; tc' infile

Answered by αғsнιη on December 10, 2020
