TransWikia.com

Sorting by date with format dd-Mmm-yyyy from specific file

Unix & Linux Asked by JugaruGabi on January 29, 2021

So I am having the following file:

    Certificate Name,Expiry Date,Type,Certificate file path
    CN=server10.de.go.com:app, 16-Jan-2021, physical, /path/to/file/
    CN=sv999-c.int.de.aaas.internet.go.com:testclient, 31-Oct-2020, vm, /path/to/file/
    CN=server26.de.go.com:app, 06-Feb-2021, physical, /path/to/file/

How can I sort the file to show the earliest date first and the latest last?

Tried with the command below, but it doesn’t do the trick (it sorts only the days part from the date and that is it):

sort -t "," -k 2 file

Also, I need to make sure that the header is not being captured by the sorting command.

A little bit of help here, please?

3 Answers

I like to use msort for its flexibility. Here is a snippet of a demonstration script:

# Utility functions: print-as-echo, print-line-with-visual-space.
pe() { for _i;do printf "%s" "$_i";done; printf "n"; }
pl() { pe;pe "-----" ;pe "$*"; }

pl " Input data file $FILE:"
head $FILE

pl " Modified ( ', ' -> ',') input file t1:"
sed 's/, /,/g' $FILE > t1
head t1

pl " Results, msort:"
{
head -n1
msort -q -l -j -d "," -n 2,2 --comparison-type date --date-format "d-m-y"
} < t1

producing:

-----
 Input data file data1:
Certificate Name,Expiry Date,Type,Certificate file path
CN=server10.de.go.com:app, 16-Jan-2021, physical, /path/to/file/
CN=sv999-c.int.de.aaas.internet.go.com:testclient, 31-Oct-2020, vm, /path/to/file/
CN=server26.de.go.com:app, 06-Feb-2021, physical, /path/to/file/

-----
 Modified ( ', ' -> ',') input file t1:
Certificate Name,Expiry Date,Type,Certificate file path
CN=server10.de.go.com:app,16-Jan-2021,physical,/path/to/file/
CN=sv999-c.int.de.aaas.internet.go.com:testclient,31-Oct-2020,vm,/path/to/file/
CN=server26.de.go.com:app,06-Feb-2021,physical,/path/to/file/

-----
 Results, msort:
Certificate Name,Expiry Date,Type,Certificate file path
CN=sv999-c.int.de.aaas.internet.go.com:testclient,31-Oct-2020,vm,/path/to/file/
CN=server10.de.go.com:app,16-Jan-2021,physical,/path/to/file/
CN=server26.de.go.com:app,06-Feb-2021,physical,/path/to/file/

Run on:

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 5.7.0-3-amd64, x86_64
Distribution        : Debian GNU/Linux bullseye/sid 
bash GNU bash 5.0.18

Part of the flexibility comes from a good design of features, for example from the man page:

   -f,--date-format <date format>
          Permutation of ymd with separators, e.g. y-m-d for international
          date format, m/d/y for American date format, or a permutation of
          yd with separators, e.g. y-d, for day-of-year dates.  All  three
          components  may  be  numbers in any available number system. The
          month field may also be a month name, determined by the same de-
          vices as independent month name fields.

The msort code is available in many repositories, here is some additional information:

msort   sort records in complex ways (man)
Path    : /usr/bin/msort
Version : 8.53
Type    : ELF 64-bit LSB shared object, x86-64, version 1 ( ...)
Help    : probably available with -h,--help
Repo    : Debian GNU/Linux bullseye/sid 
Home    : http://www.billposer.org/Software/msort.html (pm)

Best wises ... cheers, drl

Answered by drl on January 29, 2021

Perl can do date parsing and sorting:

perl -MTime::Piece -F, -ane '
    next if $. == 1;
    push @lines, [$_, Time::Piece->strptime($F[1], " %d-%b-%Y")->epoch];
    END {
        @sorted = sort {$a->[1] <=> $b->[1]} @lines;
        print $sorted[$_]->[0] for (0,-1);
    }
' file
    CN=sv999-c.int.de.aaas.internet.go.com:testclient, 31-Oct-2020, vm, /path/to/file/
    CN=server26.de.go.com:app, 06-Feb-2021, physical, /path/to/file/

Answered by glenn jackman on January 29, 2021

With GNU sort or compatible¹, you can use the M sorting flag for month names:

{
  head -n1
  LC_ALL=C sort -t, -k2.9n -k2.5,2.7M -k2n
} < file

Or to ignore the spacing after the , (in case it's not always exactly one SPC character):

{
  head -n1
  LC_ALL=C sort -t, -k2.8bn -k2.4b,2.6bM -k2n
} < file

(here using the C locale to expect English month names as opposed to those in the language of the user).

Here, we avoid passing the header to sort, but it wouldn't be a problem if we did as the first sort key (year) for that header would evaluate to 0 and would be sorted first. Using head like that works for regular file (or other seekable files) as input, it wouldn't work for pipes (with GNU sed, you could replace head -n1 with sed -u q).

POSIXly, you could do something like:

awk -F',[[:space:]]*' -v months=JanFebMarAprMayJunJulAugSepOctNovDec '
  NR == 1 {print; next}
  {
    split($2, f, "-")
    printf "%04d%02d%02dt%sn", f[3], index(months, f[2]), 
      f[1], $0 | "sort | cut -f2-"
  }' < file

Where we prepend each line (except the first which is printed first unmodified) with 2020xx15, where xx is the position of the month abbreviation inside that months variable (so values 01, 04, 07, 10... not the month numbers, but something that grows with the month number), which are then sorted lexically and removed with cut.

If your awk is mawk (which doesn't support POSIX character classes), you can replace [[:space:]] with [ t] (add other whitespace character to the list as needed).


¹ GNU sort is found on GNU systems and most BSDs, though some BSDs have since replaced it with their own with a compatible API. busybox sort now also supports that M flag (though at least with 1.30.1, it seems buggy when combined with key specifications that include offsets; I find that I need to remove the key end part for it to work here).

Answered by Stéphane Chazelas on January 29, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP