TransWikia.com

How to Split and Extract Characters from A String Using One Expression in Regex

Stack Overflow Asked by Lucas H. Xu on December 18, 2021

Example:

aaa.bbbb.ccc4.ddd1.eee.fff
1112.2223.333.4445.555.6661.7773.8881.999

How can I return ddd and 777 using one expression? They are always the first 3 characters of the third-to-last string between dots.

I know how to do this in two expressions:

`[^.]+\.[^.]+\.[^.]+$`
`^\w{3}`

Is there a way to combine them, so that the second expression is applied to the result of the first expression rather than to the original string?
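For reference, the two-step approach described above can be sketched in Python as follows. This is only an illustration of the question's starting point; the function name `two_step_extract` is made up for the example.

```python
import re

def two_step_extract(s):
    # Step 1: isolate the last three dot-separated segments.
    tail = re.search(r"[^.]+\.[^.]+\.[^.]+$", s)
    if tail is None:
        return None
    # Step 2: take the first three word characters of that tail.
    head = re.match(r"^\w{3}", tail.group(0))
    return head.group(0) if head else None

print(two_step_extract("aaa.bbbb.ccc4.ddd1.eee.fff"))
print(two_step_extract("1112.2223.333.4445.555.6661.7773.8881.999"))
```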

3 Answers

Here is another option:

(?=(\.[^.]*){3}$)\.(.{3})

Where you'd match:

  • (?= - Positive lookahead.
    • (\.[^.]*){3} - 1st capture group: match a literal dot, then anything but a dot zero or more times. Repeat the capture group three times.
    • $) - End-of-string anchor and close of the lookahead.
  • \. - A literal dot.
  • (.{3}) - 2nd capture group: capture the first three characters after the dot.

Extract from the 2nd capture group. Or, if you want, you could use a non-capture group and extract from the 1st capture group: (?=(?:\.[^.]*){3}$)\.(.{3})
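A minimal Python sketch of the non-capture-group variant, assuming the input has at least one segment before the target (the pattern requires a leading dot). The helper name `extract_with_lookahead` is hypothetical.

```python
import re

# The lookahead checks that exactly three ".segment" runs remain before
# the end of the string; the dot is then consumed and group 1 captures
# the next three characters.
LOOKAHEAD_THEN_CAPTURE = re.compile(r"(?=(?:\.[^.]*){3}$)\.(.{3})")

def extract_with_lookahead(s):
    m = LOOKAHEAD_THEN_CAPTURE.search(s)
    return m.group(1) if m else None

print(extract_with_lookahead("aaa.bbbb.ccc4.ddd1.eee.fff"))
print(extract_with_lookahead("1112.2223.333.4445.555.6661.7773.8881.999"))
```

Because `.{3}` accepts any character, a target segment shorter than three characters lets the capture run across a dot (for example, `x.a.bb.cc` yields `a.b`); using `[^.]{3}` instead would reject such input.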

Answered by JvdV on December 18, 2021

You could match the regular expression

(?<=\.).{3}(?=[^.]*(?:\.[^.]*){2}$)


The regex engine performs the following operations.

(?<=\.)        : positive lookbehind asserts previous
                 char was '.'
.{3}           : match 3 chars
(?=            : begin positive lookahead
  [^.]*        : match 0+ chars other than '.'
  (?:\.[^.]*)  : match '.' then 0+ chars other than
                 '.' in a non-capture group
  {2}          : execute non-capture group twice
  $            : assert end of string
)              : end positive lookahead
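As a rough illustration in Python, the lookbehind form returns the three characters as the whole match, so no capture group is needed. The helper name `extract_whole_match` is an assumption for this sketch; Python's `re` supports this lookbehind because it is fixed-width.

```python
import re

# Lookbehind requires a dot just before the match; the lookahead requires
# the rest of this segment plus exactly two more dot-prefixed segments.
WHOLE_MATCH = re.compile(r"(?<=\.).{3}(?=[^.]*(?:\.[^.]*){2}$)")

def extract_whole_match(s):
    m = WHOLE_MATCH.search(s)
    return m.group(0) if m else None

print(extract_whole_match("aaa.bbbb.ccc4.ddd1.eee.fff"))
print(extract_whole_match("1112.2223.333.4445.555.6661.7773.8881.999"))
```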

Another way would be to use the regular expression

(?=\.(.{3})[^.]*(?:\.[^.]*){2}$)

capturing the desired 3-character string in capture group 1.


(?=            : begin positive lookahead
  \.           : match '.'
  (.{3})       : match 3 chars in capture group 1
  [^.]*        : match 0+ chars other than '.'
  (?:\.[^.]*)  : match '.' then 0+ chars other than
                 '.' in a non-capture group
  {2}          : execute non-capture group twice
  $            : assert end of string
)              : end positive lookahead

If the match succeeds, the match itself is an empty string (located just before the dot that precedes the target segment), but it is the contents of capture group 1 that are of interest.
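A small Python sketch of the lookahead-only form shows both the zero-width match and the captured value. The helper name `capture_inside_lookahead` is hypothetical.

```python
import re

# The whole pattern is a lookahead, so the match consumes nothing;
# the wanted characters are available only through group 1.
LOOKAHEAD_ONLY = re.compile(r"(?=\.(.{3})[^.]*(?:\.[^.]*){2}$)")

def capture_inside_lookahead(s):
    m = LOOKAHEAD_ONLY.search(s)
    if m is None:
        return None
    # Return the (empty) overall match, where it starts, and the captured text.
    return m.group(0), m.start(), m.group(1)

print(capture_inside_lookahead("aaa.bbbb.ccc4.ddd1.eee.fff"))
```

For the first sample string, the start index reported is that of the dot before `ddd1`.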

Answered by Cary Swoveland on December 18, 2021

You could match a dot, capture 3 characters in a capturing group, and then match any char except a dot 0+ times, up to the next dot.

Then match the last 2 parts and assert the end of the string.

\.([^.]{3})[^.]*\.[^.]+\.[^.]+$


If there is nothing preceding the target part (the string has only three parts), you could either match a dot or assert the start of the string.

(?:^|\.)([^.]{3})[^.]*\.[^.]+\.[^.]+$
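A brief Python sketch of the start-or-dot variant, assuming group 1 is what the caller wants. The helper name `extract_anchored` is made up for the example.

```python
import re

# Either the start of the string or a dot may precede the target segment,
# so inputs with exactly three segments are handled too.
START_OR_DOT = re.compile(r"(?:^|\.)([^.]{3})[^.]*\.[^.]+\.[^.]+$")

def extract_anchored(s):
    m = START_OR_DOT.search(s)
    return m.group(1) if m else None

print(extract_anchored("aaa.bbbb.ccc4.ddd1.eee.fff"))
print(extract_anchored("ddd1.eee.fff"))
```

Since the capture uses `[^.]{3}` rather than `.{3}`, a target segment shorter than three characters produces no match instead of a capture that spans a dot.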


Note that [^.] can also match a space or a newline. Use \S to match a non-whitespace char, or [^\s.] to exclude both whitespace and dots.
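The whitespace caveat can be checked with a short Python sketch. The stricter pattern below, which swaps every `[^.]` for `[^\s.]`, is an assumption made for illustration and is not taken from the answer itself.

```python
import re

LOOSE = re.compile(r"(?:^|\.)([^.]{3})[^.]*\.[^.]+\.[^.]+$")
# Hypothetical stricter variant: segments may contain neither dots nor whitespace.
STRICT = re.compile(r"(?:^|\.)([^\s.]{3})[^\s.]*\.[^\s.]+\.[^\s.]+$")

def capture(pattern, s):
    m = pattern.search(s)
    return m.group(1) if m else None

sample = "aaa.d d1.eee.fff"  # the target segment contains a space
print(repr(capture(LOOSE, sample)))
print(repr(capture(STRICT, sample)))
```

With the loose pattern the space ends up inside the captured text, while the stricter one finds no match for this input.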

Answered by The fourth bird on December 18, 2021
