TransWikia.com

Sort bash output by a number extracted from a specific column

Unix & Linux Asked on December 17, 2021

I am new to bash and I am struggling with the following task:
The output from UGE looks like this:

mgenkin@bamdev2:~/projects/BrainFlowUtilities/BrainFlowSimulations$ qstat

job-ID     prior   name       user         state submit/start at     queue                          jclass                         slots ja-task-ID 
------------------------------------------------------------------------------------------------------------------------------------------------
  5247 0.51599 Genkin_Fit mgenkin      r     07/21/2020 16:40:21 comp.q@bam11                                                     16 1
  5247 0.51599 Genkin_Fit mgenkin      r     07/21/2020 16:40:21 comp.q@bam17                                                     16 2
  5247 0.51599 Genkin_Fit mgenkin      r     07/21/2020 16:40:21 comp.q@bam13                                                     16 3
  5247 0.51599 Genkin_Fit mgenkin      r     07/21/2020 16:40:21 comp.q@bam05                                                     16 4
  5247 0.51599 Genkin_Fit mgenkin      r     07/21/2020 16:40:21 comp.q@bam08                                                     16 5
  5247 0.51599 Genkin_Fit mgenkin      r     07/21/2020 16:40:21 comp.q@bam15                                                     16 6
  5247 0.51599 Genkin_Fit mgenkin      r     07/21/2020 16:40:21 comp.q@bam21                                                     16 7
  5247 0.51599 Genkin_Fit mgenkin      r     07/21/2020 16:40:21 comp.q@bam27                                                     16 8
  5247 0.51599 Genkin_Fit mgenkin      r     07/21/2020 16:40:21 comp.q@bam12                                                     16 9
  5247 0.51599 Genkin_Fit mgenkin      r     07/21/2020 16:40:21 comp.q@bam22                                                     16 10
 30995 0.50020 Genkin_Fit mgenkin      r     07/23/2020 13:12:08 comp.q@bam22                                                      1 1
 30995 0.50020 Genkin_Fit mgenkin      r     07/23/2020 13:12:08 comp.q@bam22                                                      1 2
 30995 0.50020 Genkin_Fit mgenkin      r     07/23/2020 13:12:08 comp.q@bam27                                                      1 3
 30995 0.50020 Genkin_Fit mgenkin      r     07/23/2020 13:12:08 comp.q@bam22                                                      1 4
 30995 0.50020 Genkin_Fit mgenkin      r     07/23/2020 13:12:08 comp.q@bam20                                                      1 5
 30995 0.50020 Genkin_Fit mgenkin      r     07/23/2020 13:12:08 comp.q@bam27                                                      1 6
 30995 0.50020 Genkin_Fit mgenkin      r     07/23/2020 13:12:08 comp.q@bam22                                                      1 7
 30995 0.50020 Genkin_Fit mgenkin      r     07/23/2020 13:12:08 comp.q@bam20                                                      1 8
 30995 0.50020 Genkin_Fit mgenkin      r     07/23/2020 13:12:08 comp.q@bam27                                                      1 9
 30995 0.50020 Genkin_Fit mgenkin      r     07/23/2020 13:12:08 comp.q@bam17                                                      1 10
 30995 0.50020 Genkin_Fit mgenkin      r     07/23/2020 13:12:08 comp.q@bam15                                                      1 11
 30995 0.50020 Genkin_Fit mgenkin      r     07/23/2020 13:12:08 comp.q@bam01                                                      1 12
 30995 0.50020 Genkin_Fit mgenkin      r     07/23/2020 13:12:08 comp.q@bam20                                                      1 13
 30995 0.50020 Genkin_Fit mgenkin      r     07/23/2020 13:12:08 comp.q@bam27                                                      1 14
 30995 0.50020 Genkin_Fit mgenkin      r     07/23/2020 13:12:08 comp.q@bam17                                                      1 15
 30995 0.50020 Genkin_Fit mgenkin      r     07/23/2020 13:12:08 comp.q@bam15                                                      1 16
 30995 0.50020 Genkin_Fit mgenkin      r     07/23/2020 13:12:08 comp.q@bam01                                                      1 17
 30995 0.50020 Genkin_Fit mgenkin      r     07/23/2020 13:12:08 comp.q@bam20                                                      1 18
 30995 0.50020 Genkin_Fit mgenkin      r     07/23/2020 13:12:08 comp.q@bam27                                                      1 19
 30995 0.50020 Genkin_Fit mgenkin      r     07/23/2020 13:12:08 comp.q@bam23                                                      1 20
 30995 0.50020 Genkin_Fit mgenkin      r     07/23/2020 13:12:08 comp.q@bam17                                                      1 21
 30995 0.50020 Genkin_Fit mgenkin      r     07/23/2020 13:12:08 comp.q@bam15                                                      1 22
 30995 0.50020 Genkin_Fit mgenkin      r     07/23/2020 13:12:08 comp.q@bam01                                                      1 23
 30995 0.50020 Genkin_Fit mgenkin      r     07/23/2020 13:12:08 comp.q@bam20                                                      1 24
 30995 0.50020 Genkin_Fit mgenkin      r     07/23/2020 13:12:08 comp.q@bam27                                                      1 25
 30995 0.50020 Genkin_Fit mgenkin      r     07/23/2020 13:12:08 comp.q@bam23                                                      1 26
 30995 0.50020 Genkin_Fit mgenkin      r     07/23/2020 13:12:08 comp.q@bam26                                                      1 27
 30995 0.50020 Genkin_Fit mgenkin      r     07/23/2020 13:12:08 comp.q@bam17                                                      1 28
 30995 0.50020 Genkin_Fit mgenkin      r     07/23/2020 13:12:08 comp.q@bam15                                                      1 29
 30995 0.50020 Genkin_Fit mgenkin      r     07/23/2020 13:12:08 comp.q@bam03                                                      1 30
 30995 0.50020 Genkin_Fit mgenkin      r     07/23/2020 13:12:08 comp.q@bam01                                                      1 31
 30995 0.50020 Genkin_Fit mgenkin      r     07/23/2020 13:12:08 comp.q@bam28                                                      1 32
 30995 0.50020 Genkin_Fit mgenkin      r     07/23/2020 13:12:08 comp.q@bam20                                                      1 33
 30995 0.50020 Genkin_Fit mgenkin      r     07/23/2020 13:12:08 comp.q@bam27                                                      1 34
 30995 0.50020 Genkin_Fit mgenkin      r     07/23/2020 13:12:08 comp.q@bam23                                                      1 35
 30995 0.50020 Genkin_Fit mgenkin      r     07/23/2020 13:12:08 comp.q@bam26                                                      1 36
 30995 0.50020 Genkin_Fit mgenkin      r     07/23/2020 13:12:08 comp.q@bam17                                                      1 37
 30995 0.50020 Genkin_Fit mgenkin      r     07/23/2020 13:12:08 comp.q@bam15                                                      1 38
 30995 0.50020 Genkin_Fit mgenkin      r     07/23/2020 13:12:08 comp.q@bam03                                                      1 39
 30995 0.50020 Genkin_Fit mgenkin      r     07/23/2020 13:12:08 comp.q@bam01                                                      1 40

I want to sort the output by the number in the queue column, so that the row with comp.q@bam01 comes first, and so on.

I don't understand how to do it with grep/awk.

One Answer

To sort numerically by column 8 (the "queue" column), considering only its numeric portion, and assuming that the leading text is always "comp.q@bam" (10 characters) with exactly one space before the field, you could use:

qstat | head -n 2
qstat | sed 1,2d | sort -k8.12,8.14n

qstat is called twice: the first call prints the two header lines (so that they are not sorted with the data), and the second call drops those two lines and sorts the remaining data. The sort is "keyed" (-k) on field 8, starting at character position 12 and ending at position 14, with a numeric comparison (n). The key definition counts the leading space before the "queue" field as position 1, so the actual numbers (in your sample) start at position 12. If your output could have longer numeric fields, adjust the end of the range (8.14) upwards.
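To see how the character offsets in the key behave without access to a cluster, here is a small sketch that feeds three made-up rows, shaped like the qstat data above, to the same sort invocation. The sample_rows helper and its data are illustrative assumptions (the spacing after the queue column is shortened), not real qstat output.

```shell
# Hypothetical stand-in for qstat data rows (header lines already removed).
sample_rows() {
  printf '%s\n' \
    '  5247 0.51599 Genkin_Fit mgenkin      r     07/21/2020 16:40:21 comp.q@bam17      16 2' \
    '  5247 0.51599 Genkin_Fit mgenkin      r     07/21/2020 16:40:21 comp.q@bam05      16 4' \
    ' 30995 0.50020 Genkin_Fit mgenkin      r     07/23/2020 13:12:08 comp.q@bam11       1 1'
}

# For sort, field 8 is " comp.q@bamNN": the separating blank is character 1,
# "comp.q@bam" fills characters 2-11, and the digits start at character 12.
sample_rows | sort -k8.12,8.14n
```

The rows should come out in the order bam05, bam11, bam17, regardless of the job-ID in the first column.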

If you want to keep this around for reuse, you could create a function:

sortqstat() {
  qstat | head -n 2
  qstat | sed 1,2d | sort -k8.12,8.14n
}

Since the numeric portion appears to be zero-padded, you could try a simpler variation -- just tell sort to compare field 8 as plain text:

qstat | sort -k8,8

This simplistic sort moves the headers around; to keep them separate, use separate calls like above. In this simplification, rows with the same queue-name prefix sort together, and numerically within that prefix, so the result is not strictly numeric overall if there are different queue names.
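If the queue names might differ in prefix or length, one alternative is to extract the trailing digits of column 8 with awk, sort on that extracted number, and then strip it off again. This "decorate, sort, undecorate" approach is a different technique from the sort keys above, sketched here under assumptions: the function name sort_by_queue_number and the queue name long.q@bam02 are made up for illustration, and -s (stable sort) is an extension found in GNU and BSD sort rather than a POSIX option.

```shell
# Sort rows on stdin by the trailing digits of column 8, whatever precedes them.
# Swapped-in technique (decorate, sort, undecorate), not the sort-key method above.
sort_by_queue_number() {
  tab=$(printf '\t')
  awk '{
    num = $8
    sub(/^.*[^0-9]/, "", num)   # strip everything up to the last non-digit
    print num "\t" $0           # decorate: numeric key, tab, original line
  }' |
    sort -s -t "$tab" -k1,1n |  # numeric sort on the key; -s keeps ties in input order
    cut -f2-                    # undecorate: drop the key and its tab
}

# Demo on made-up rows with two different (hypothetical) queue-name prefixes.
printf '%s\n' \
  ' 30995 0.50020 Genkin_Fit mgenkin r 07/23/2020 13:12:08 comp.q@bam10 1 1' \
  ' 30995 0.50020 Genkin_Fit mgenkin r 07/23/2020 13:12:08 long.q@bam02 1 2' \
  ' 30995 0.50020 Genkin_Fit mgenkin r 07/23/2020 13:12:08 comp.q@bam01 1 3' |
  sort_by_queue_number
```

With real data this would be used on the rows only, for example qstat | sed 1,2d | sort_by_queue_number, with the headers printed separately as before.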


glenn jackman commented with an improvement -- a way to parse the output with only one call to qstat; I've modified their idea slightly, to this:

qstat | { IFS= read -r header1; 
          IFS= read -r header2; 
          printf "%s\n" "$header1" "$header2";
          sort -k8,8; }

This pipes the output of qstat into a command group (enclosed in braces {}); the group reads the first two lines into the variables header1 and header2, then prints those header lines. Since those lines have now been consumed from the input, the subsequent sort command has only the data left to sort. I found it more obvious to explicitly read the two header lines, but you could do a simple "read and print" twice, or use a loop.
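The loop variant mentioned above might look roughly like the following sketch. The function name sort_keep_header and its calling convention (header line count first, then any sort options) are assumptions made for this illustration.

```shell
# Hypothetical helper: copy the first N lines of stdin through unchanged,
# then hand the remaining lines to sort with whatever options were given.
sort_keep_header() {
  n=$1
  shift
  while [ "$n" -gt 0 ] && IFS= read -r line; do
    printf '%s\n' "$line"
    n=$((n - 1))
  done
  sort "$@"
}

# Demo on made-up input with two header lines.
printf '%s\n' 'name queue' '----------' 'job3 q03' 'job1 q01' 'job2 q02' |
  sort_keep_header 2 -k2,2
```

For the case in the question, the call would be qstat | sort_keep_header 2 -k8,8, or with the numeric key -k8.12,8.14n in place of -k8,8.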

Answered by Jeff Schaller on December 17, 2021
