TransWikia.com

How to get nth result from find?

Unix & Linux Asked by ddlfmbqrc on December 4, 2021

I have access to a distributed computing/server farm with a job scheduler (Slurm) that gives each parallel job an integer ID from 1 to n. I know the value of n; in the example below, n = 10.

I am using find -maxdepth 1 -name '2019 - *' to find the list of file names I want to pass to my program as an argument.

Sample file names:

2019 - Alphabet
2019 - Foo Bar
2019 - Reddit
2019 - StackExchange

The order does not matter. Each matching file should be used exactly once.

This is an example of a "template" script I can use:

#!/bin/bash

# in this case, from i = 1 to i = 10
#SBATCH --array=1-10

# pseudocode begins
    # it is given that filename_array has 10 unique elements
    filename_array="$(find -maxdepth 1 -name '2019 - *')"

    # SLURM_ARRAY_TASK_ID is the value of i, from i = 1 to i = 10
    filename=filename_array[$SLURM_ARRAY_TASK_ID]
# pseudocode ends

./a.out "$filename"

This is more or less what it does (but with each process running on a different computer in parallel):

./a.out "./2019 - Alphabet" &
./a.out "./2019 - Foo Bar" &
./a.out "./2019 - Reddit" &
./a.out "./2019 - StackExchange" &

How can I write a bash script that would run the template script exactly once for each of the file names given by find -maxdepth 1 -name '2019 - *'?

2 Answers

Can you use $SLURM_JOB_NODELIST?

In that case GNU Parallel seems like an obvious solution:

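# assumption: --slf wants a file of host names, so expand Slurm's compressed node list first
scontrol show hostnames "$SLURM_JOB_NODELIST" > nodefile
find . -maxdepth 1 -name '2019 - *' |
  parallel --slf nodefile --wd . ./a.out {}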

Answered by Ole Tange on December 4, 2021

Probably using find is a mistake, particularly as you are only interested in files in the current directory. You can just use a shell glob pattern.

#!/bin/sh

for f in '2019 - '*
do
    [ -f "$f" ] && ./a.out "$f" &
done

The -f test is there for portability: in plain sh, a pattern that matches nothing is left as the literal string, so without the test the loop would run once with a nonexistent name. If you are using bash, you could use shopt -s nullglob to make a non-matching pattern expand to nothing rather than itself, so that the loop runs zero times rather than once when there are no matching files. However, the portable test is still worthwhile, because it also handles cases such as directory names that match the pattern.
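To make the nullglob behaviour concrete, here is a minimal bash sketch that counts loop iterations for a pattern that matches nothing, first with the default settings and then with nullglob enabled. The scratch directory created by mktemp and the counter variable names are only for illustration.

```bash
#!/bin/bash
# Sketch: compare loop counts for a non-matching glob with and without nullglob.
# The empty scratch directory from mktemp is only there for the demonstration.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

count_default=0
for f in '2019 - '*; do
    # Default behaviour: the unmatched pattern is kept as a literal string,
    # so the body runs once with a name that does not exist.
    count_default=$((count_default + 1))
done

shopt -s nullglob
count_nullglob=0
for f in '2019 - '*; do
    # With nullglob the unmatched pattern expands to nothing,
    # so the body is never entered.
    count_nullglob=$((count_nullglob + 1))
done
shopt -u nullglob

echo "default: $count_default, nullglob: $count_nullglob"
```

In an empty directory this reports one iteration for the default case and none with nullglob, which is why the portable loop needs the [ -f "$f" ] guard.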

Apparently what is required is a "template script", but I have only a limited idea of what this means.

Perhaps

#!/bin/bash
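# Slurm directive: run this script as 10 array tasks with IDs 1 to 10
#SBATCH --array=1-10

# glob results are sorted; bash arrays are 0-based while task IDs start at 1
filename_array=( '2019 - '* )
filename=${filename_array[$SLURM_ARRAY_TASK_ID-1]}
./a.out "$filename"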

is what is wanted?
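The index arithmetic can be tried locally without Slurm. The following is a minimal sketch, assuming the sample file names from the question: it creates them in a scratch directory, falls back to a task ID of 2 when SLURM_ARRAY_TASK_ID is unset (that fallback is my assumption for a dry run), and rejects IDs that have no corresponding file.

```bash
#!/bin/bash
# Sketch: map a 1-based Slurm array task ID to a 0-based bash array index.
# The scratch directory and the fallback task ID are for a local dry run only.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch '2019 - Alphabet' '2019 - Foo Bar' '2019 - Reddit' '2019 - StackExchange'

filename_array=( '2019 - '* )        # pathname expansion returns sorted names
task_id=${SLURM_ARRAY_TASK_ID:-2}    # assumed default when not run under Slurm

# Guard against an --array range that is larger than the number of files.
if (( task_id < 1 || task_id > ${#filename_array[@]} )); then
    echo "task $task_id has no matching file" >&2
    exit 1
fi

filename=${filename_array[task_id - 1]}
echo "task $task_id -> $filename"
```

With the fallback ID of 2 this selects 2019 - Foo Bar, the second name in sorted order. Every task sees the same sorted list, so each file is picked by exactly one task as long as the directory contents do not change while the array job is running.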

Edit: another requirement change, this time to support regular expressions for the patterns.

#!/bin/bash
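# Slurm directive: run this script as 10 array tasks with IDs 1 to 10
#SBATCH --array=1-10

# NUL-delimited names survive spaces and newlines; readarray -d needs bash 4.4+
readarray -d '' filename_array < <( find . -maxdepth 1 -regex '.*2019 -.*' -print0 | sort -z )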
filename=${filename_array[$SLURM_ARRAY_TASK_ID-1]}
./a.out "$filename"
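The scripts above hard-code --array=1-10, which only works while there are exactly 10 matching files. One possible way around that, sketched below, is to count the files at submission time and pass the range on the sbatch command line, which as far as I know takes precedence over the #SBATCH line in the script. The name template.sh is hypothetical, the count uses the glob form of the pattern, and the command is printed rather than executed so the sketch can be tried without a cluster.

```bash
#!/bin/bash
# Sketch: build the sbatch command so the array range matches the file count.
# template.sh is a hypothetical name for the template script; the command is
# only printed here, not run.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch '2019 - Alphabet' '2019 - Foo Bar' '2019 - Reddit'

shopt -s nullglob                    # an unmatched pattern yields an empty array
files=( '2019 - '* )
shopt -u nullglob
n=${#files[@]}

if (( n == 0 )); then
    echo "no matching files, nothing to submit" >&2
else
    submit_cmd=( sbatch --array="1-$n" template.sh )
    printf '%s\n' "${submit_cmd[*]}"
fi
```

The count has to be taken with the same pattern, and in the same directory, that the template script uses; otherwise some task IDs would have no file, or some files no task.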

Answered by icarus on December 4, 2021
