Dynamically append text to filenames in Bash

Super User Asked on December 28, 2020

I have the following for loop to individually sort all text files inside of a folder (i.e. producing a sorted output file for each).

for file in *.txt; 
do
   printf 'Processing %s\n' "$file"
   LC_ALL=C sort -u "$file" > "./${file}_sorted"  
done

This is almost perfect, except that it currently outputs files in the format of:

originalfile.txt_sorted

…whereas I would like it to output files in the format of:

originalfile_sorted.txt 

This is because the ${file} variable contains the filename including the extension. I’m running Cygwin on top of Windows. I’m not sure how this would behave in a true Linux environment, but in Windows, this shifting of the extension renders the file inaccessible by Windows Explorer.

How can I separate the filename from the extension so that I can add the _sorted suffix in between the two, allowing me to easily differentiate the original and sorted versions of the files while still keeping Windows’ file extensions intact?

I’ve been looking at what might be possible solutions, but to me these seem geared toward more complicated problems. More importantly, with my current bash knowledge, they go way over my head, so I’m holding out hope that there’s a simpler solution which applies to my humble for loop, or else that someone can explain how to apply those solutions to my situation.

One Answer

These solutions you link to are in fact quite good. Some of the answers lack explanation, though, so let's sort it out and maybe add a few more.

This line of yours

for file in *.txt

indicates the extension is known beforehand (note: POSIX-compliant environments are case sensitive, so *.txt won't match FOO.TXT). In that case

basename -s .txt "$file"

should return the name without the extension (basename also removes the directory path, so /directory/path/filename becomes filename; in your case it doesn't matter because $file doesn't contain such a path). To use the tool in your code, you need command substitution, which in general looks like this: $(some_command). Command substitution takes the output of some_command, treats it as a string and places it where $(…) is. Your particular redirection will be

… > "./$(basename -s .txt "$file")_sorted.txt"
#      ^^^^^^^^^^^^^^^^^^^^^^^^^^^ the output of basename will replace this

Nested quotes are OK here because Bash is smart enough to know the quotes within $(…) are paired together.
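
To see the command substitution at work outside the loop, here is a minimal sketch. The function name sorted_name_basename is made up for illustration and is not part of the original question. Note that -s is an extension offered by the GNU and BSD versions of basename; the POSIX spelling would be basename "$file" .txt.

```shell
# sorted_name_basename: hypothetical helper, named only for this sketch.
# It prints the output path that the redirection above would write to.
sorted_name_basename() {
    # Nested double quotes: the inner pair belongs to $(…), the outer to printf's argument.
    printf './%s_sorted.txt\n' "$(basename -s .txt "$1")"
}

sorted_name_basename 'originalfile.txt'            # ./originalfile_sorted.txt
sorted_name_basename '/directory/path/notes.txt'   # ./notes_sorted.txt
```

The second call shows basename dropping the directory part as well, which is harmless for names produced by *.txt but worth knowing about.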

This can be improved. Note basename is a separate executable, not a shell builtin (in Bash run type basename and compare it to type cd). Spawning any extra process is costly: it takes resources and time. Spawning one in every iteration of a loop usually performs poorly. Therefore you should use whatever the shell offers you to avoid extra processes. In this case the solution is:

… > "./${file%.txt}_sorted.txt"

The syntax is explained below for a more general case.
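
Put together, the loop from the question could look roughly like the sketch below. The wrapper function sort_txt_files and its directory argument are assumptions made for this example, as is the check that skips files already ending in _sorted.txt (without it, a second run would also pick up the outputs of the first run, since they match *.txt too).

```shell
# sort_txt_files: hypothetical wrapper around the loop from the question.
# Usage: sort_txt_files [directory]   (defaults to the current directory)
sort_txt_files() {
    dir=${1:-.}
    for file in "$dir"/*.txt; do
        [ -e "$file" ] || continue        # the glob matched nothing
        case $file in
            *_sorted.txt) continue ;;     # assumption: skip outputs of an earlier run
        esac
        printf 'Processing %s\n' "$file"
        # ${file%.txt} strips the known extension without spawning basename
        LC_ALL=C sort -u "$file" > "${file%.txt}_sorted.txt"
    done
}
```

Calling sort_txt_files with no argument processes the current directory, as the original loop does.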


In case you don't know the extension:

… > "./${file%.*}_sorted.${file##*.}"

The syntax explained:

  • ${file#*.} is $file, but with the shortest string matching *. removed from the front;
  • ${file##*.} is $file, but with the longest string matching *. removed from the front; use it to get just the extension;
  • ${file%.*} is $file, but with the shortest string matching .* removed from the end; use it to get everything but the extension;
  • ${file%%.*} is $file, but with the longest string matching .* removed from the end.

Pattern matching is glob-like, not regex. This means * is a wildcard for zero or more characters, ? is a wildcard for exactly one character (we don't need ? in your case though). When you invoke ls *.txt or for file in *.txt; you're using the same pattern matching mechanism. A pattern without wildcards is allowed. We have already used ${file%.txt} where .txt is the pattern.
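
The ? wildcard is not needed for the task at hand, but a small sketch may make the glob behaviour more concrete. The helper name strip_ext3 is invented for this illustration; it removes an extension only when it is exactly three characters long.

```shell
# strip_ext3: hypothetical helper, not part of the original answer.
# The pattern .??? means: a literal dot followed by exactly three characters.
strip_ext3() {
    printf '%s\n' "${1%.???}"
}

strip_ext3 'notes.txt'    # notes
strip_ext3 'a.b.tar'      # a.b
strip_ext3 'photo.jpeg'   # photo.jpeg (four-character extension, so no match)
```

When the pattern does not match, the value is left unchanged rather than causing an error.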

Example:

$ file=name.name2.name3.ext
$ echo "${file#*.}"
name2.name3.ext
$ echo "${file##*.}"
ext
$ echo "${file%.*}"
name.name2.name3
$ echo "${file%%.*}"
name

But beware:

$ file=extensionless
$ echo "${file#*.}"
extensionless
$ echo "${file##*.}"
extensionless
$ echo "${file%.*}"
extensionless
$ echo "${file%%.*}"
extensionless
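
To see why this matters for the redirection, here is a minimal sketch of what the unknown-extension form produces. The function name naive_sorted_name is made up for the example.

```shell
# naive_sorted_name: hypothetical helper that builds the output name
# exactly like "${file%.*}_sorted.${file##*.}" does.
naive_sorted_name() {
    file=$1
    printf '%s\n' "${file%.*}_sorted.${file##*.}"
}

naive_sorted_name 'name.name2.name3.ext'   # name.name2.name3_sorted.ext
naive_sorted_name 'extensionless'          # extensionless_sorted.extensionless
```

With no dot in the name, both expansions return the whole string, so the name ends up doubled.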

For this reason the following contraption might be useful (but it's not, explanation below):

${file#${file%.*}}

It works by identifying everything but the extension (${file%.*}), then removing that from the front of the whole string. The results are like this:

$ file=name.name2.name3.ext
$ echo "${file#${file%.*}}"
.ext
$ file=extensionless
$ echo "${file#${file%.*}}"

$   # empty output above

Note the . is included this time. You might get unexpected results if $file contained a literal * or ?; Windows (where extensions matter) doesn't allow these characters in filenames anyway, so you may not care. However, […], if present, is interpreted as a bracket expression and may break the solution! Curly braces, as far as I can tell, are not pattern characters in Bash, so {…} should be harmless here.

Your "improved" redirection would be:

… > "./${file%.*}_sorted${file#${file%.*}}"

It should support filenames with or without an extension, albeit not ones containing square brackets, unfortunately. Quite a shame. To fix it you need to double quote the inner variable.

Really improved redirection:

… > "./${file%.*}_sorted${file#"${file%.*}"}"

Double quoting makes ${file%.*} not act as a pattern! Bash is smart enough to tell inner and outer quotes apart because the inner ones are embedded in the outer ${…} syntax. I think this is the right way.
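
As a rough check of the quoted form, the sketch below splits the name in two steps and tries the awkward cases discussed above. The function name sorted_name and the intermediate variables stem and ext are introduced only for this illustration.

```shell
# sorted_name: hypothetical helper that builds the output name.
sorted_name() {
    file=$1
    stem=${file%.*}         # everything but the last extension
    ext=${file#"$stem"}     # quoted, so $stem is matched literally, not as a pattern
    printf '%s\n' "${stem}_sorted${ext}"
}

sorted_name 'name.name2.name3.ext'   # name.name2.name3_sorted.ext
sorted_name 'extensionless'          # extensionless_sorted
sorted_name 'data[1].txt'            # data[1]_sorted.txt
```

Inside the loop the two expansions can stay inline, as in the redirection above; the function form is only there to make the cases easy to try.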

Another (imperfect) solution, let's analyze it for educational reasons:

${file/./_sorted.}

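It replaces the first . with _sorted. (the trailing dot is part of the replacement). It will work fine if you have exactly one dot in $file; with no dot nothing is replaced, and with several the suffix lands after the first name component. There is a similar syntax, ${file//./_sorted.}, that replaces all dots. As far as I know there's no variant to replace the last dot only.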
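
A short sketch of both substitution forms on a name with two dots. The variable names first_only and all_dots are chosen for this example, and the ${var/pattern/replacement} syntax is a Bash feature rather than POSIX sh.

```shell
# Bash-specific: ${var/pattern/replacement} is not available in plain POSIX sh.
file='name.name2.ext'

first_only=${file/./_sorted.}    # replaces only the first dot
all_dots=${file//./_sorted.}     # replaces every dot

printf '%s\n' "$first_only"      # name_sorted.name2.ext
printf '%s\n' "$all_dots"        # name_sorted.name2_sorted.ext
```

Neither result puts the suffix right before the final extension, which is why this approach only suits names with a single dot.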

Correct answer by Kamil Maciorowski on December 28, 2020
