
regex to match question sentences in long text

Stack Overflow Asked by Duke Wellington on December 29, 2020

I have a long text in the form of a string.

This text includes a lot of questions that are at the same time the headers of sections.

These headers always start with a number, a dot and a whitespace character, and they end with a question mark. I am trying to extract these strings.

This is what I’ve got so far: longString.match(/\d\.\s+[a-zA-Z]+\s\\?/g).
Sure enough, this doesn’t work.

One Answer

In your example you use [a-zA-Z]+, but you might extend that to matching 1 or more word characters using \w+

The part at the end of the pattern, \s\\?, matches a required whitespace char followed by an optional backslash, not a question mark.
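To see why the attempt falls short, here is a minimal sketch. It assumes the pattern in the question was written with the escapes shown below, and the sample text is made up for illustration.

```javascript
// Hypothetical sample text; the header wording is invented for this example.
const longString =
  "1. What is a regex? It is a pattern.\n2. Why is it greedy? Because of defaults.";

// The attempt from the question (assumed escapes): [a-zA-Z]+ covers one word
// only, and \\? is an optional literal backslash, so no "?" is ever required.
const attempt = /\d\.\s+[a-zA-Z]+\s\\?/g;

const found = longString.match(attempt);
console.log(found);
// → ["1. What ", "2. Why "]
```

Each match stops after the first word and the whitespace that follows it, so the rest of the header is never captured.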


To match multiple words, you can optionally repeat the pattern to match a word preceded by 1 or more whitespace characters.

One option is to use

\d\.\s+\w+(?:\s+\w+)*\s*\?

Explanation

  • \d Match a single digit (for 1 or more digits use \d+)
  • \.\s+\w+ Match a . and 1+ whitespace chars and 1+ word chars
  • (?:\s+\w+)* Optionally repeat 1+ whitespace chars and 1+ word chars
  • \s*\? Match 0+ whitespace chars and a question mark.
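Applied in JavaScript, the pattern explained above could be used like this. This is only a sketch: the sample text and the names headerPattern and headers are assumptions, not part of the question.

```javascript
// Hypothetical sample text with two question headers and two body lines.
const longString = [
  "1. What is a regular expression?",
  "It is a pattern that describes a set of strings.",
  "2. How do I test one?",
  "Paste it into a tester and try sample input.",
].join("\n");

// digit, dot, whitespace, one or more words, optional whitespace, "?"
const headerPattern = /\d\.\s+\w+(?:\s+\w+)*\s*\?/g;

// With the g flag, match() returns all matches, or null when there are none.
const headers = longString.match(headerPattern) || [];
console.log(headers);
// → ["1. What is a regular expression?", "2. How do I test one?"]
```

The fallback to an empty array avoids a null check when the text contains no headers.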


A broader match is to allow, after the digit, dot and whitespace, 1 or more of any char except a question mark or a whitespace char, so that words may also contain punctuation:

\d\.\s+[^\s?]+(?:\s+[^\s?]+)*\?
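The difference between the two patterns shows up when a header contains characters other than word characters, such as an apostrophe or a hyphen. A small comparison, with invented sample text and variable names:

```javascript
// Hypothetical sample: the first header contains an apostrophe and a hyphen.
const longString = "1. What's a look-ahead?\nBody text.\n2. Is it greedy?";

// Word characters only: the first header fails at the apostrophe.
const wordsOnly = /\d\.\s+\w+(?:\s+\w+)*\s*\?/g;

// Any char except whitespace or "?": both headers match.
const broader = /\d\.\s+[^\s?]+(?:\s+[^\s?]+)*\?/g;

const strictMatches = longString.match(wordsOnly) || [];
const broadMatches = longString.match(broader) || [];

console.log(strictMatches); // → ["2. Is it greedy?"]
console.log(broadMatches);  // → ["1. What's a look-ahead?", "2. Is it greedy?"]
```

Note that the broader pattern has no \s* before the question mark, so a header with whitespace directly before the ? would need that added.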


Answered by The fourth bird on December 29, 2020
