TransWikia.com

Python regex extract strings between matching strings, including matching strings

Stack Overflow: asked by sundar_ima on August 22, 2020

I am trying to extract the text between two strings, including the anchor strings themselves. For this question, the file content is simplified into a variable like this:

variable = ('70026 TTBB 70128 70026 00020 01006 '
            '11925 04300 22919 03903 33911 00114 '
            '44880 02233 55834 00227 66806 02056 '
            '77788 00647 88771 00661 41414 /////='
            'PPBB 70128 70026 90001 02512 01510 '
            '03013 90234 05012 04022 04521 90567 '
            '04533 04025 03023 9089/ 02526 02525 '
            '91246 02022 01521 9535/ 08510 04006='
            'TTAA 70121 70026 99020 01006 02512 '
            '00171 00301 03014 92793 04300 05014 '
            '85472 00627 04029 70025 03947 02027 '
            '31313 42408 81101  03026=')

What I would like to get are the strings running from TT to = (including these anchors), with all matches saved in a list. The expected output is:

['TTBB 70128 ... 88771 41414 /////=', 'TTAA 70121 ... 42408 81101  03026=']

What I tried:

import re
print(re.findall(r'TT(.*?)=', variable))

This gives me something close to what I want:

['BB 70128 ... 88771 41414 /////', 'AA 70121 ... 42408 81101  03026']

As you can see, the anchor strings TT and = are not included. How do I tell re to include them in the result?
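A quick check on a short, made-up string (hypothetical sample, not the original data) shows how the number of capturing groups changes what `re.findall` returns:

```python
import re

# Hypothetical sample: two "TT...=" chunks separated by a chunk without "TT".
sample = "TTBB 1 2=PPBB 3 4=TTAA 5 6="

# One capturing group: findall returns only the text the group captured.
print(re.findall(r"TT(.*?)=", sample))  # ['BB 1 2', 'AA 5 6']

# No groups at all: findall returns the whole match, anchors included.
print(re.findall(r"TT.*?=", sample))    # ['TTBB 1 2=', 'TTAA 5 6=']
```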

One Answer

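If I understood correctly, you need to put TT and = inside the group too. When a pattern contains exactly one capturing group, re.findall returns only that group's text, so anything outside the parentheses is matched but left out of the result: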

print(re.findall(r'(TT.*?=)', variable))
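The sketch below runs the answer's pattern on the question's data and compares it with the group-free form `r'TT.*?='`, which should give the same list. It also shows `re.finditer`, which exposes both the full match (`group(0)`) and the inner capture (`group(1)`) if you need the body without the anchors. The last part is an assumption about the real file: if a message contains line breaks, `.` does not match a newline by default, so `re.DOTALL` would be needed.

```python
import re

variable = ('70026 TTBB 70128 70026 00020 01006 '
            '11925 04300 22919 03903 33911 00114 '
            '44880 02233 55834 00227 66806 02056 '
            '77788 00647 88771 00661 41414 /////='
            'PPBB 70128 70026 90001 02512 01510 '
            '03013 90234 05012 04022 04521 90567 '
            '04533 04025 03023 9089/ 02526 02525 '
            '91246 02022 01521 9535/ 08510 04006='
            'TTAA 70121 70026 99020 01006 02512 '
            '00171 00301 03014 92793 04300 05014 '
            '85472 00627 04029 70025 03947 02027 '
            '31313 42408 81101  03026=')

# Whole pattern in one group (the answer) vs. no group: same list either way.
grouped = re.findall(r'(TT.*?=)', variable)
ungrouped = re.findall(r'TT.*?=', variable)
print(grouped == ungrouped)  # True
print(len(grouped))          # 2

# finditer keeps the full match and the anchor-free body side by side.
for m in re.finditer(r'TT(.*?)=', variable):
    print(m.group(0)[:10], '|', m.group(1)[:8])

# Assumption: the real file may have newlines inside a message.
# Without DOTALL, '.' stops at '\n', so nothing matches here.
with_newline = 'TTBB 70128\n41414 /////='
print(re.findall(r'TT.*?=', with_newline))             # []
print(re.findall(r'TT.*?=', with_newline, re.DOTALL))  # ['TTBB 70128\n41414 /////=']
```

The non-greedy `.*?` matters here: a greedy `.*` would run from the first TT to the last = in the string and return a single match.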

Correct answer by TJR on August 22, 2020
