AnswerBun.com

stripping tabs, newlines, and spaces from string output, but leave one space so that words are not connected

Stack Overflow Asked by AndrewLittle1 on January 3, 2022

I have a list_3, with one element, a string:

[['\n\n\n Headquarters or Regional Office\n\n\n\n\n\t\t\t\t\t\t\t\t\tMain Headquarters\t\t\t\t\t\t\t\n\n', '\n\n\n Founders\n\n\n\n\n\t\t\t\t\t\t\t\t\tThomas Lon Van\t\t\t\t\t\t\t\n\n', '\n\n\n Founder Diversity\n\n\n\n\n\t\t\t\t\t\t\t\t\tN/A\t\t\t\t\t\t\t\n\n', '\n\n\n Year Founded\n\n\n\n\n\t\t\t\t\t\t\t\t\t2016\t\t\t\t\t\t\t\n\n', '\n\n\n # of Employees\n\n\n\n\n\t\t\t\t\t\t\t\t\t1-10\t\t\t\t\t\t\t\n\n', '\n\n\n Seeking Funding?\n\n\n\n\n\t\t\t\t\t\t\t\t\tNo \t\t\t\t\t\t\t\n\n', '\n\n\n Funding Phase\n\n\n\n\n\t\t\t\t\t\t\t\t\tN/A\t\t\t\t\t\t\t\n\n'], ['\n\n\n Headquarters or Regional Office\n\n\n\n\n\t\t\t\t\t\t\t\t\tMain Headquarters\t\t\t\t\t\t\t\n\n', '\n\n\n Founders\n\n\n\n\n\t\t\t\t\t\t\t\t\tMacKenzie T Stout,\t\t\t\t\t\t\t\n\n', '\n\n\n Founder Diversity\n\n\n\n\n\t\t\t\t\t\t\t\t\tN/A\t\t\t\t\t\t\t\n\n', '\n\n\n Year Founded\n\n\n\n\n\t\t\t\t\t\t\t\t\t2020\t\t\t\t\t\t\t\n\n', '\n\n\n # of Employees\n\n\n\n\n\t\t\t\t\t\t\t\t\t1-10\t\t\t\t\t\t\t\n\n', '\n\n\n Seeking Funding?\n\n\n\n\n\t\t\t\t\t\t\t\t\tYes\t\t\t\t\t\t\t\n\n', '\n\n\n Funding Phase\n\n\n\n\n\t\t\t\t\t\t\t\t\tPre-Seed\t\t\t\t\t\t\t\n\n']]

I want to use regex to strip \n, \t, and \r from the output and return the text in an easy-to-read format.

This is what I have tried:

list_33 = []
for i in list_3:
     string = ''.join(list_3)
     list_33.append(re.sub(r'\s+','', string))
print(list_33)

output:

['HeadquartersorRegionalOfficeMainHeadquarters', 'FoundersThomasLonVan', 'FounderDiversityN/A', 'YearFounded2016', '#ofEmployees1-10', 'SeekingFunding?No', 'FundingPhaseN/A']

This is almost what I need, but I would like there to be one space between each word and a colon after the first text block of each string in list_3, i.e.:

['Headquarters or Regional Office: Main Headquarters', 'Founders: Thomas Lon Van', 'Founder Diversity: N/A', 'Year Founded: 2016', '# of Employees: 1-10', 'Seeking Funding?: No', 'Funding Phase: N/A']

Any ideas of how I can incorporate both regex functions into one?

Thanks

P.S. I know that I don't need a for loop for a list with just one element, but in the future the list will have more elements; I am trying to generalize the code structure using just one input for now.

One Answer

You can iterate over each string in the list and use re.sub to replace each run of two or more whitespace characters with ': ', then strip the leftover ':' and space characters from both ends:

>>> import re
>>> lst = ['\n\n\n Headquarters or Regional Office\n\n\n\n\n\t\t\t\t\t\t\t\t\tMain Headquarters\t\t\t\t\t\t\t\n\n', '\n\n\n Founders\n\n\n\n\n\t\t\t\t\t\t\t\t\tThomas Lon Van\t\t\t\t\t\t\t\n\n', '\n\n\n Founder Diversity\n\n\n\n\n\t\t\t\t\t\t\t\t\tN/A\t\t\t\t\t\t\t\n\n', '\n\n\n Year Founded\n\n\n\n\n\t\t\t\t\t\t\t\t\t2016\t\t\t\t\t\t\t\n\n', '\n\n\n # of Employees\n\n\n\n\n\t\t\t\t\t\t\t\t\t1-10\t\t\t\t\t\t\t\n\n', '\n\n\n Seeking Funding?\n\n\n\n\n\t\t\t\t\t\t\t\t\tNo \t\t\t\t\t\t\t\n\n', '\n\n\n Funding Phase\n\n\n\n\n\t\t\t\t\t\t\t\t\tN/A\t\t\t\t\t\t\t\n\n']
>>> [re.sub(r'\s\s+', ': ', word).strip(': ') for word in lst]
['Headquarters or Regional Office: Main Headquarters', 'Founders: Thomas Lon Van', 'Founder Diversity: N/A', 'Year Founded: 2016', '# of Employees: 1-10', 'Seeking Funding?: No', 'Funding Phase: N/A']
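
The question shows list_3 as a list of lists, whereas the session above works on a flat list of strings. A minimal sketch of how the same substitution might be applied to the nested case is given here. The helper names clean_field and clean_records are hypothetical, the sample data is a shortened subset of the question's input, and \s{2,} is simply another way to match runs of two or more whitespace characters.

```python
import re

def clean_field(raw):
    """Turn each run of 2+ whitespace characters into ': ' and trim the ends."""
    # Single spaces between words are left alone; only longer runs are replaced.
    collapsed = re.sub(r'\s{2,}', ': ', raw)
    # strip(': ') removes any leading/trailing ':' and ' ' characters.
    return collapsed.strip(': ')

def clean_records(records):
    """Apply clean_field to every string in a list of lists (hypothetical helper)."""
    return [[clean_field(field) for field in record] for record in records]

# Shortened sample in the same shape as the question's list_3.
list_3 = [
    ['\n\n\n Founders\n\n\n\n\n\t\t\t\t\t\t\t\t\tThomas Lon Van\t\t\t\t\t\t\t\n\n',
     '\n\n\n Seeking Funding?\n\n\n\n\n\t\t\t\t\t\t\t\t\tNo \t\t\t\t\t\t\t\n\n'],
    ['\n\n\n Founders\n\n\n\n\n\t\t\t\t\t\t\t\t\tMacKenzie T Stout,\t\t\t\t\t\t\t\n\n',
     '\n\n\n Funding Phase\n\n\n\n\n\t\t\t\t\t\t\t\t\tPre-Seed\t\t\t\t\t\t\t\n\n'],
]

cleaned = clean_records(list_3)
print(cleaned)
```

Note that strip(': ') removes characters, not a substring, so a value that legitimately begins or ends with a colon would lose it. If that matters for your data, trimming with plain strip() before the substitution may be a safer variant.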

Answered by Prem Anand on January 3, 2022
