TransWikia.com

Find filenames that contain number

Unix & Linux Asked by user979974 on November 21, 2021

I would like to find file names that contain a number and list only those whose number falls within a given range. For example, in my directory I have:

Ion_001_rawlib.bam
Ion_002_rawlib.bam
Ion_003_rawlib.bam
Ion_004_rawlib.bam
Ion_005_rawlib.bam
...
Ion_020_rawlib.bam


and I want to list only the Ion file names from 003 to 005. I tried something like this:

find -name '*Ion_*[3-5]*rawlib.bam'

but it doesn't produce the expected result. Is there a way to do this?
Thanks.

One Answer

With the zsh shell, you can do:

print -rC1 Ion_<3-5>_rawlib.bam

Where <x-y> is a glob operator that matches textual decimal representations of non-negative integers within the given range (from x to y, inclusive). Leading zeros are allowed, so 003 matches <3-5>.

Recursively:

print -rC1 -- **/Ion_<3-5>_rawlib.bam

(add the (D) glob qualifier if you also want to look for those files in hidden folders, or (N) if you don't want it to be an error when no file matches).

With find implementations that support a -regex predicate, you can do:

LC_ALL=C find . -regex '.*/Ion_0*[345]_rawlib\.bam'

(this matches file paths made of 0 or more (*) bytes (. matches any byte with LC_ALL=C), followed by /Ion_, followed by 0 or more (*) 0s, followed by one of the characters 3, 4 or 5, followed by _rawlib.bam).

Here, it's relatively easy for a 3..5 range, but it would become much more painful for ranges like 78..123, for instance. You'd also run into compatibility issues, as the few find implementations that support -regex use different regexp syntaxes there.
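To give an idea of how quickly this grows, here is a sketch of an extended regexp for the 78..123 range. It is checked with grep -E rather than find so that it doesn't depend on any particular find implementation, and the in_range_78_123 function name is made up for this example:

```shell
# Hypothetical helper: succeeds if the argument is a decimal
# representation (leading zeros allowed) of a number in 78..123.
in_range_78_123() {
  printf '%s\n' "$1" |
    LC_ALL=C grep -qE '^0*(7[89]|[89][0-9]|1[01][0-9]|12[0-3])$'
}

# Brute-force check: print every number from 0 to 200 the regexp accepts.
n=0
while [ "$n" -le 200 ]; do
  if in_range_78_123 "$n"; then printf '%s ' "$n"; fi
  n=$((n + 1))
done
echo
```

The range has to be split by hand into 78-79, 80-99, 100-119 and 120-123. With BSD find's -E or GNU find's -regextype posix-extended, that alternation would take the place of [345] in the -regex above.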

Standard find only supports -name and -path for matching on file names, and both use basic shell wildcards rather than regular expressions. Wildcards have no equivalent of the regexp * operator (0 or more of the preceding atom): the wildcard * is the equivalent of regexp .* (0 or more characters). So Ion_*[3-5]_rawlib.bam would also match Ion_9994_rawlib.bam, for instance, as * matches 999.
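The difference between the two * operators can be seen without touching the file system, as a shell case statement uses the same wildcard syntax as find's -name. A minimal sketch, where glob_match and regex_match are hypothetical helper names:

```shell
# Hypothetical helpers, for illustration only.

# glob_match NAME: succeeds if NAME matches the shell wildcard pattern,
# where * means "any sequence of characters".
glob_match() {
  case $1 in
    Ion_*[3-5]_rawlib.bam) return 0 ;;
    *) return 1 ;;
  esac
}

# regex_match NAME: succeeds if NAME matches the anchored regexp,
# where 0* means "zero or more 0 characters".
regex_match() {
  printf '%s\n' "$1" | LC_ALL=C grep -qE '^Ion_0*[3-5]_rawlib\.bam$'
}

for name in Ion_004_rawlib.bam Ion_9994_rawlib.bam; do
  if glob_match "$name"; then g=yes; else g=no; fi
  if regex_match "$name"; then r=yes; else r=no; fi
  printf '%s wildcard=%s regexp=%s\n' "$name" "$g" "$r"
done
```

Both patterns accept Ion_004_rawlib.bam, but only the wildcard accepts Ion_9994_rawlib.bam.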

In this simple case however, you could do it using several patterns and negation such as:

LC_ALL=C find . -name 'Ion_*[345]_rawlib.bam' \
              ! -name 'Ion_*[!0]*?_rawlib.bam'

Non-recursively:

LC_ALL=C find . ! -name . -prune \
    -name 'Ion_*[345]_rawlib.bam' \
  ! -name 'Ion_*[!0]*?_rawlib.bam'

To find files that contain the decimal representation of an integer from x to y anywhere in the name, you need a pattern that matches that range (like zsh's <x-y>), but you also need to make sure the match is not surrounded by other digits. For instance, foo305.txt does contain 3, 05 and 5, all of which match <3-5>, even though 305 itself is outside the range.

In zsh, that would be:

print -rC1 -- (|*[^0-9])<3-5>(|[^0-9]*)

That is, <3-5> (which matches 3, 03, 003...) preceded by either nothing or a string ending in a non-digit, and followed by either nothing or a string starting with a non-digit.
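The "not surrounded by other digits" condition can also be tried out on plain strings with grep -E, independently of zsh or find. This is only a sketch on a made-up list of names, and names_with_3_to_5 is a hypothetical helper:

```shell
# Hypothetical helper: reads names on stdin (one per line) and prints
# those containing a number in 3..5 (leading zeros allowed) that is not
# part of a longer sequence of digits.
names_with_3_to_5() {
  LC_ALL=C grep -E '(^|[^0-9])0*[3-5]([^0-9]|$)'
}

printf '%s\n' foo305.txt foo3.txt a05b 5 x13y |
  names_with_3_to_5
```

Here foo3.txt, a05b and 5 are printed, while foo305.txt and x13y are not, as their 3 and 5 are adjacent to other digits.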

With BSD find:

LC_ALL=C find -E . -regex '.*/([^/]*[^0-9])?0*[3-5]([^0-9/][^/]*)?'

With GNU find, use the same regexp, but replace -E . with . -regextype posix-extended.

With busybox find (though it depends on how it was compiled):

busybox find . -regex '.*/\([^/]*[^0-9]\)\?0*[3-5]\([^0-9/][^/]*\)\?'

Another approach is to use find to report the list of files, but use more advanced languages like perl to filter that list:

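find . -print0 | perl -l -0ne '
  # match the basename (trailing run of non-slash characters) into $&
  if (m{[^/]*\z}) {
    # loop over every sequence of decimal digits in the basename
    for $n ($& =~ /\d+/g) {
      if ($n >= 3 && $n <= 5) {
        print;
        next LINE;
      }
    }
  }'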

Here, perl extracts all the sequences of decimal digits from the basename of each file and outputs the file path if at least one of those sequences represents a number in the 3..5 range.
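For the simpler Ion_NNN_rawlib.bam layout from the question, the same "extract the digits, then compare numerically" idea could also be sketched in plain POSIX sh, non-recursively. The list_ion_range function and its arguments are made up for this example:

```shell
# Hypothetical helper: list_ion_range DIR MIN MAX
# Prints the DIR/Ion_<digits>_rawlib.bam files whose number is in MIN..MAX.
list_ion_range() {
  for f in "$1"/Ion_*_rawlib.bam; do
    [ -e "$f" ] || continue        # an unmatched glob is left as-is
    n=${f##*/Ion_}                 # drop everything up to "Ion_"
    n=${n%_rawlib.bam}             # drop the suffix
    case $n in
      '' | *[!0-9]*) continue ;;   # skip anything that is not all digits
    esac
    # strip leading zeros so the comparison is unambiguously decimal
    while [ "${n#0}" != "$n" ] && [ "${#n}" -gt 1 ]; do
      n=${n#0}
    done
    if [ "$n" -ge "$2" ] && [ "$n" -le "$3" ]; then
      printf '%s\n' "$f"
    fi
  done
}
```

It would be called as list_ion_range . 3 5. Unlike the perl version, it outputs one path per line, so it assumes file names without newline characters.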

Answered by Stéphane Chazelas on November 21, 2021
