TransWikia.com

How to skip an optional alphanumeric segment in the middle of a string with a Python regex

Stack Overflow Asked by shaikh akbar on December 27, 2021


I want to ignore "CC1009" and check only the remaining parts:
"ABC", "Tx", "XYZ", "20200506" (the date), followed by anything (.*).
These four values are mandatory.

In short, given this text:
‘ABCTxXYZCC100920200506050003.xml’ = ‘ABCTxXYZ CC1009 20200506050003.xml’
I want my expression to skip whatever appears (here CC1009)
after XYZ and before 20200506 (the date).

The value CC1009 is dynamic: it is present in some file names and absent in others, and its length is not fixed.
Please help me accordingly.

I tried the code below, but it is not working (Python regex):

import re

file_name = 'ABCTxXYZCC100920200506050003.xml'
# Split example: 'ABC  Tx  XYZ  CC1009  20200506  050003.xml'
 
RegexPattern = re.compile(r'^(ABC|CDE)+(Tx|Fm)+(XYZ)+([a-zA-Z0-9]*)+([0-9]{4})+(0[1-9]|1[012])+(0[1-9]|[12][0-9]|3[01])+(.+)$')
pattern_check = RegexPattern.match(file_name)

if pattern_check:
    print('Match')
else:
    print('No Match')

One Answer

You could remove all the + quantifiers after the capturing groups. As CC1009 is dynamic, you can keep the character class and make it non-greedy, [a-zA-Z0-9]*?, to prevent it from matching too many digits.
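
To illustrate why the trailing + quantifiers are worth removing, here is a minimal sketch. The shortened patterns and the file name `odd_name` are made up for this illustration and are not from the question; the point is only that a + after a group lets the group repeat, and the group then keeps just its last repetition.

```python
import re

# Hypothetical shortened patterns, used only to show what the trailing + does.
with_plus = re.compile(r'^(ABC|CDE)+(Tx|Fm)+')
without_plus = re.compile(r'^(ABC|CDE)(Tx|Fm)')

# Made-up file name in which the first two parts are repeated.
odd_name = 'ABCCDETxFmXYZ20200506050003.xml'

m = with_plus.match(odd_name)
print(m.group(0))   # ABCCDETxFm  (the repeated parts are accepted)
print(m.groups())   # ('CDE', 'Fm')  (each group keeps only its last repetition)

# Without the +, exactly one prefix and one type are allowed, so this is rejected.
print(without_plus.match(odd_name))  # None
```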

If you want the date as a single group, you can wrap it in one capturing group and use non-capturing groups (?:...) inside it for the month and day parts.

^(ABC|CDE)(Tx|Fm)(XYZ)([a-zA-Z0-9]*?)([0-9]{4}(?:0[1-9]|1[012])(?:0[1-9]|[12][0-9]|3[01]))(.+)$
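
The effect of the non-greedy quantifier can be seen by comparing it with a greedy version of the same pattern. This is only a sketch: the names `lazy` and `greedy` are chosen for this illustration. With the greedy quantifier, the dynamic part swallows digits that belong to the date, and the regex then finds a different (wrong) date further to the right.

```python
import re

file_name = 'ABCTxXYZCC100920200506050003.xml'

date = r'([0-9]{4}(?:0[1-9]|1[012])(?:0[1-9]|[12][0-9]|3[01]))'

# Same pattern twice; only the quantifier on the dynamic part differs.
lazy = re.compile(r'^(ABC|CDE)(Tx|Fm)(XYZ)([a-zA-Z0-9]*?)' + date + r'(.+)$')
greedy = re.compile(r'^(ABC|CDE)(Tx|Fm)(XYZ)([a-zA-Z0-9]*)' + date + r'(.+)$')

print(lazy.match(file_name).groups())
# ('ABC', 'Tx', 'XYZ', 'CC1009', '20200506', '050003.xml')

print(greedy.match(file_name).groups())
# ('ABC', 'Tx', 'XYZ', 'CC100920', '20050605', '0003.xml')
```

Note that the non-greedy version picks the first position where a valid-looking date starts, so a dynamic part that itself contains eight digits resembling a date could still be split in the wrong place.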


Example code

import re

file_name = 'ABCTxXYZCC100920200506050003.xml'
RegexPattern = re.compile(r'^(ABC|CDE)(Tx|Fm)(XYZ)([a-zA-Z0-9]*?)([0-9]{4}(?:0[1-9]|1[012])(?:0[1-9]|[12][0-9]|3[01]))(.+)$')
pattern_check = RegexPattern.match(file_name)

if pattern_check:
    print('Match')
else:
    print('No Match')
    
print(RegexPattern.findall(file_name))
print(RegexPattern.findall("ABCTxXYZ20200506050003.xml"))

Output

Match
[('ABC', 'Tx', 'XYZ', 'CC1009', '20200506', '050003.xml')]
[('ABC', 'Tx', 'XYZ', '', '20200506', '050003.xml')]
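
If the parts are needed by name rather than by position, one possible variation is to use named groups and to check the date with datetime, since the regex alone would accept an impossible date such as 20200231. This is a sketch under assumptions: the group names (prefix, kind, tag, extra, date, rest) and the helper `parse_file_name` are hypothetical and do not appear in the question.

```python
import re
from datetime import datetime

# Same pattern as above, with hypothetical group names added for readability.
FILE_RE = re.compile(
    r'^(?P<prefix>ABC|CDE)(?P<kind>Tx|Fm)(?P<tag>XYZ)'
    r'(?P<extra>[a-zA-Z0-9]*?)'
    r'(?P<date>[0-9]{4}(?:0[1-9]|1[012])(?:0[1-9]|[12][0-9]|3[01]))'
    r'(?P<rest>.+)$'
)

def parse_file_name(name):
    """Return a dict of the named parts, or None if the name is not valid."""
    m = FILE_RE.match(name)
    if m is None:
        return None
    parts = m.groupdict()
    try:
        # strptime rejects calendar-invalid dates that the regex lets through.
        parts['date'] = datetime.strptime(parts['date'], '%Y%m%d').date()
    except ValueError:
        return None
    return parts

print(parse_file_name('ABCTxXYZCC100920200506050003.xml'))
print(parse_file_name('ABCTxXYZ20200506050003.xml'))
print(parse_file_name('ABCTxXYZ20200231050003.xml'))  # None (31 February)
```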

Answered by The fourth bird on December 27, 2021
