TransWikia.com

How can I extract the single character before the number in a string?

Unix & Linux Asked by Nas Ahmet on December 26, 2021

I have a list that contains the names of hosts available in our company.

For example :

  • gswast03
  • gkjbossp1
  • frdwop04

The last character before the number (t, p or q) represents the environment of the host. All strings end with a number.

  • t for test
  • p for prod
  • q for qas

I need a regular expression to obtain the character that immediately precedes the number at the end of the host name. (I need a solution for a single string, not for a whole list in a file.)

For example :

gswast03

In this string, I just want to extract the character t.

Thank you in advance.

4 Answers

This will work using any sed in any shell on every UNIX box:

$ sed 's/.*\([^0-9]\).*/\1/' file
t
p
q

The above was run against this input file:

$ cat file
gswast03
gkjbossp1
frdwoq04
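
Since the question asks about a single string rather than a file, the same substitution could presumably be fed from standard input. A minimal sketch, assuming a helper function named last_nondigit (a made-up name, not part of the answer above):

```shell
# Hypothetical helper: print the last non-digit character of one host name.
last_nondigit() {
  # Pass the single string to sed on stdin instead of naming a file.
  printf '%s\n' "$1" | sed 's/.*\([^0-9]\).*/\1/'
}

last_nondigit gswast03   # prints: t
```

Because the pattern is not anchored to trailing digits, a name with no digits at the end (for example abc) would yield its last character, c, so this sketch relies on the stated guarantee that every host name ends with a number.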

Answered by Ed Morton on December 26, 2021

With plain bash

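hosts=(gswast03 gkjbossp1 frdwop04)   # sample host names from the question
shopt -s extglob                      # enable the +(...) extended glob
for host in "${hosts[@]}"; do
  tmp=${host%%+([[:digit:]])}   # strip the trailing digits
  echo "$host => ${tmp: -1}"    # extract the last character
done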
gswast03 => t
gkjbossp1 => p
frdwop04 => p

Or with regex matching:

for host in "${hosts[@]}"; do
  if [[ $host =~ ([^[:digit:]])[[:digit:]]+$ ]]; then
    echo "$host => ${BASH_REMATCH[1]}"
  fi
done
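
Where bash and extglob are not available, a similar result can probably be had with standard POSIX parameter expansion alone. The following is only a sketch for a single string; the function name env_char and its variable names are invented for illustration:

```shell
# Hypothetical helper: print the character just before the trailing digits.
# Returns non-zero if the name has no trailing digits or is digits only.
env_char() {
  host=$1
  digits=${host##*[!0-9]}        # what remains after the last non-digit
  [ -n "$digits" ] || return 1   # name does not end in a digit
  stem=${host%"$digits"}         # name with the trailing digits removed
  [ -n "$stem" ] || return 1     # name consisted of digits only
  head=${stem%?}                 # stem without its last character
  last=${stem#"$head"}           # the last character of the stem
  printf '%s\n' "$last"
}

env_char gswast03   # prints: t
```

Unlike the loop versions above, this works on one string at a time, which is what the question asked for.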

Answered by glenn jackman on December 26, 2021

Since you've tagged the question with grep and sed, I assume that the list of strings is given as one item per line of some text input.

Then:

sed -n 's/^.*\([^[:digit:]]\)[[:digit:]]\{1,\}$/\1/p' < that-input

or (assuming GNU grep or compatible built with perl-like regexp support):

grep -Po '\D(?=\d+$)' < that-input

would output the non-digit character that precedes the trailing digits, for lines that end in a non-digit followed by one or more digits.

Both use regexps to do the matching but sed uses basic regular expressions while grep -P uses perl-like regular expressions.
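
To illustrate the filtering effect of -n together with the p flag, here is a small sketch using the basic regular expression form. The input lines and the function name extract are made up for the example:

```shell
# Hypothetical wrapper around the BRE substitution; reads lines from stdin.
extract() {
  # -n suppresses default output; the p flag prints only lines where the
  # substitution succeeded, so lines without trailing digits are dropped.
  sed -n 's/^.*\([^[:digit:]]\)[[:digit:]]\{1,\}$/\1/p'
}

# The middle line has no trailing digits and produces no output.
printf '%s\n' gswast03 nodigits gkjbossp1 | extract
# prints:
# t
# p
```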

Some sed implementations support -P as well, but not the most common ones. Several support -E for extended regular expressions which is yet another dialect of regular expressions. With those:

sed -E -n 's/^.*([^[:digit:]])[[:digit:]]+$/\1/p' < that-input

Or you could just use perl itself:

perl -lne 'print $1 if /(\D)\d+$/' < that-input

(beware perl works at byte-level by default instead of character level, see the -C option to tell it to interpret the input as UTF-8 characters, or -Mopen=locale to decode/encode input/output as per the locale's encoding like grep/sed typically do).

or pcregrep, the sample grep implementation that comes with libpcre (the library used by GNU grep -P):

pcregrep -o1 '(\D)\d+$' < that-input

Answered by Stéphane Chazelas on December 26, 2021

.*([pqt])\d+$

Matches any characters, followed by a p, q or t and one or more digits. The match group is the single letter you're interested in.
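
As a rough sketch of how this pattern might be put to use, the captured letter can be mapped to the environment names listed in the question. This assumes sed with a basic regular expression, where [0-9] stands in for \d (which POSIX sed does not support); the function name env_name is hypothetical:

```shell
# Hypothetical helper: print the environment name for one host name.
# Returns non-zero when the letter before the digits is not t, p or q.
env_name() {
  c=$(printf '%s\n' "$1" | sed -n 's/.*\([pqt]\)[0-9][0-9]*$/\1/p')
  case $c in
    t) echo test ;;
    p) echo prod ;;
    q) echo qas ;;
    *) return 1 ;;   # no match: unknown or missing environment letter
  esac
}

env_name gswast03   # prints: test
```

Restricting the class to [pqt] means a host name with any other letter before the digits is rejected rather than silently passed through.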

Answered by eleventyone on December 26, 2021
