TransWikia.com

run a script in multiple folders in parallel

Unix & Linux Asked by user233520 on December 11, 2020

I have several sub-directories within one high-level directory. Each sub-directory has several files and a for loop shell script. The same for loop script is present in each sub-directory. I want to go into each sub-directory and run the for loop script in parallel in several terminals.
I tried this, but it seems to run them serially (one after another), whereas I want to run all of them in parallel.

find dir_* -type f -execdir sh for_loop.sh {} \;

5 Answers

Assuming this does the right thing - only in serial:

find dir_* -type f -execdir sh for_loop.sh {} \;

Then you should be able to replace that with:

find dir_* -type f | parallel 'cd {//} && sh for_loop.sh {}'

To run it in multiple terminals GNU Parallel supports tmux to run each command in its own tmux pane:

find dir_* -type f | parallel --tmuxpane 'cd {//} && sh for_loop.sh {}'

It defaults to one job per CPU core. In your case you might want to run one more job than you have cores:

find dir_* -type f | parallel -j+1 --tmuxpane 'cd {//} && sh for_loop.sh {}'

GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to.

If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU:

[Figure: Simple scheduling]

GNU Parallel instead spawns a new process when one finishes, keeping the CPUs active and thus saving time:

[Figure: GNU Parallel scheduling]
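To make the difference between the two scheduling approaches concrete, here is a small arithmetic sketch. It is only a simplified simulation, not how GNU Parallel is implemented: the job durations and the two workers (a and b) are made-up values, and "dynamic" here simply means that each job goes to whichever worker becomes free first.

```shell
#!/bin/sh
# Hypothetical job durations in seconds: four long jobs, then four short ones.
durations="4 4 4 4 1 1 1 1"

# Static split: the first four jobs go to worker a, the last four to worker b.
a=0; b=0; i=0
for t in $durations; do
    i=$((i + 1))
    if [ "$i" -le 4 ]; then a=$((a + t)); else b=$((b + t)); fi
done
static=$(( a > b ? a : b ))    # total time is set by the busier worker

# Dynamic: each job goes to the worker that is free first.
a=0; b=0
for t in $durations; do
    if [ "$a" -le "$b" ]; then a=$((a + t)); else b=$((b + t)); fi
done
dynamic=$(( a > b ? a : b ))

echo "static split: $static s, dynamic: $dynamic s"
```

With these made-up durations the static split takes 16 seconds, because one worker gets all the long jobs, while the dynamic assignment takes 10.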

Installation

For security reasons you should install GNU Parallel with your package manager, but if GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:

$ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || 
   fetch -o - http://pi.dk/3 ) > install.sh
$ sha1sum install.sh | grep 67bd7bc7dc20aff99eb8f1266574dadb
12345678 67bd7bc7 dc20aff9 9eb8f126 6574dadb
$ md5sum install.sh | grep b7a15cdbb07fb6e11b0338577bc1780f
b7a15cdb b07fb6e1 1b033857 7bc1780f
$ sha512sum install.sh | grep 186000b62b66969d7506ca4f885e0c80e02a22444
6f25960b d4b90cf6 ba5b76de c1acdf39 f3d24249 72930394 a4164351 93a7668d
21ff9839 6f920be5 186000b6 2b66969d 7506ca4f 885e0c80 e02a2244 40e8a43f
$ bash install.sh

For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README

Learn more

See more examples: http://www.gnu.org/software/parallel/man.html

Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html

Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel

Answered by Ole Tange on December 11, 2020

Probably the perfect tool for this is GNU Parallel:

parallel ::: dir_*/for_loop.sh

GNU Parallel not only runs each job in parallel, but also it demultiplexes their output so they won't interfere with each other.

From its man page:

GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. A job can also be a command that reads from a pipe. GNU parallel can then split the input into blocks and pipe a block into each command in parallel.

If you use xargs and tee today you will find GNU parallel very easy to use as GNU parallel is written to have the same options as xargs. If you write loops in shell, you will find GNU parallel may be able to replace most of the loops and make them run faster by running several jobs in parallel.

GNU parallel makes sure output from the commands is the same output as you would get had you run the commands sequentially. This makes it possible to use output from GNU parallel as input for other programs.

Answered by dr_ on December 11, 2020

You should be passing on find's output to xargs, running in parallel mode:

find dir_*/ -type f -name for_loop.sh -print0 | xargs -0 -r -n 1 -P 3 -t sh

Here find searches recursively under the directories whose names begin with dir_ for files named for_loop.sh and passes them to xargs, which runs sh on one file at a time (-n 1) with no more than 3 processes running at any given time (-P 3). The -t option prints each command before it runs, and -r skips running anything if find returns no files.

find prints the filenames separated by null characters (-print0) and xargs splits its input on nulls (-0), so names containing spaces or newlines are handled safely.
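The command above starts every script from the top-level directory. If the script has to run from inside its own sub-directory, as the question asks, one possible variation is to let xargs start a small sh -c wrapper that changes directory first. This is a sketch only: the directory names, the demo script contents, and the run_in_own_dir helper are all made up for illustration.

```shell
#!/bin/sh
# Demo tree (hypothetical names): three sub-directories, each with a for_loop.sh.
top=$(mktemp -d)
for d in dir_a dir_b "dir_c with space"; do
    mkdir -p "$top/$d"
    # The demo script records the directory it was started from.
    printf '%s\n' 'pwd > ran_here.txt' > "$top/$d/for_loop.sh"
done

# run_in_own_dir TOP [MAX_JOBS] (hypothetical helper): run every
# dir_*/for_loop.sh under TOP from inside its own directory,
# with at most MAX_JOBS running at once (default 3).
run_in_own_dir() {
    find "$1"/dir_* -type f -name for_loop.sh -print0 |
        xargs -0 -r -n 1 -P "${2:-3}" \
            sh -c 'cd "$(dirname "$1")" && sh ./for_loop.sh' sh
}

run_in_own_dir "$top" 3
```

The trailing sh after the quoted command only fills $0 of the wrapper, so the filename that xargs appends arrives as $1.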

Answered by user218374 on December 11, 2020

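You can do this from your top-level directory:

for D in `find . -mindepth 1 -maxdepth 1 -type d`
do
     "$D"/<yourScriptName>.sh &
done

The "&" runs each script in the background, so the loop does not wait for one script to finish before starting the next. The -mindepth 1 keeps find from returning the top-level directory (.) itself.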
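Starting jobs with & returns immediately, so the calling script usually also needs wait to know when they are done and whether any of them failed. A minimal sketch, assuming made-up directory names and demo scripts (the one in dir_2 fails on purpose):

```shell
#!/bin/sh
# Demo tree (hypothetical names); the script in dir_2 exits with status 1.
top=$(mktemp -d)
for n in 1 2 3; do
    mkdir "$top/dir_$n"
    printf '%s\n' "exit $((n == 2))" > "$top/dir_$n/for_loop.sh"
done

pids=""
for d in "$top"/*/; do
    sh "${d}for_loop.sh" &     # start the script in the background
    pids="$pids $!"            # remember its process ID
done

failed=0
for pid in $pids; do
    # wait PID returns the exit status of that background job
    wait "$pid" || failed=$((failed + 1))
done
echo "jobs failed: $failed"
```

A plain wait with no arguments would also block until all background jobs finish, but it does not report their individual exit statuses.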

Answered by M4rty on December 11, 2020

find won't do that for you.

Create a script that locates your for_loop.sh scripts and executes them, like so:

#!/bin/bash

for theScript in $(find dir_* -name for_loop.sh); do
  "$theScript" &
done

If the script has to be run inside the sub-directory, cd into it first, for example with cd "$(dirname "$theScript")" && . "./$(basename "$theScript")".

My examples are not tested in detail and not error-tolerant ...

Edit 1:

As Sato Katsura commented correctly, the script above breaks if there are spaces in the directory name.

So I changed the loop to read:

#!/bin/bash
find dir_* -name for_loop.sh | while IFS= read -r theScript; do
  "$theScript" &
done
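One caveat with the pipeline form, as far as I know: in bash the while loop at the end of a pipeline normally runs in a subshell, so a wait placed after the loop would not see those background jobs. A possible alternative is a plain glob loop, which also copes with spaces and lets each job cd into its own directory inside a subshell. The directory names and demo script below are made up for illustration.

```shell
#!/bin/sh
# Demo tree (hypothetical names), including a directory name with a space.
top=$(mktemp -d)
for d in "dir_one" "dir_two words"; do
    mkdir "$top/$d"
    # The demo script records the name of the directory it runs in.
    printf '%s\n' 'basename "$PWD" > where.txt' > "$top/$d/for_loop.sh"
done

for script in "$top"/dir_*/for_loop.sh; do
    [ -f "$script" ] || continue      # skip if the glob matched nothing
    (
        # The subshell keeps the cd from affecting the parent loop.
        cd "$(dirname "$script")" || exit 1
        sh ./for_loop.sh
    ) &
done
wait                                   # block until every job has finished
cat "$top"/dir_*/where.txt
```

Because the loop is not part of a pipeline, the background jobs belong to the script itself and the final wait applies to them.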

Answered by ChristophS on December 11, 2020
