TransWikia.com

regex with bs4 is splitting the results

Stack Overflow Asked by Edison on February 8, 2021

My regex returns the match split across several list items, so as a quick fix I index into the result.

Code

import re

my_url = 'https://www.zoopla.co.uk/for-sale/property/b23/?page_size=100&q=B23&radius=0&results_sort=newest_listings&search_source=refine'

# page_soup is the BeautifulSoup object parsed from my_url (fetching and parsing not shown)
house_listings = page_soup.find_all("div", {"class": "listing-results-right clearfix"})

listings = house_listings[3]  # item 3 for prototyping

house_type = re.findall(r'(?:(?!.for).)*', listings.h2.a.text)

print(house_type)
# ['4 bed detached house', '', 'for sale', '']

Fix

house_type = re.findall(r'(?:(?!.for).)*', listings.h2.a.text)[0]  # keep only the first match
print(house_type)
# 4 bed detached house
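
The extra entries appear because re.findall returns every non-overlapping match, and this pattern can also match the empty string. A minimal sketch that reproduces the behaviour without the scrape, assuming the link text is '4 bed detached house for sale' (inferred from the printed output above, not taken from the page):

```python
import re

# Assumed sample text, inferred from the printed output in the question.
title = "4 bed detached house for sale"
pattern = r'(?:(?!.for).)*'

# findall keeps going after the first match: it records an empty match at the
# space before "for" (where the lookahead fails), then "for sale", then
# another empty match at the end of the string.
matches = re.findall(pattern, title)
print(matches)

# re.match returns only the first match anchored at the start,
# so no indexing into a list is needed.
first = re.match(pattern, title).group(0)
print(first)
```

This only explains the split; it does not yet drop the leading '4 bed '.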

But beyond that, I need a new regex for better matching.

Desired Match
Start from the word after 'bed' (excluding the space that follows 'bed') and ignore the "for sale" portion.
Example results: detached house, terrace house, semi-detached house, flat, maisonette.

Source
https://www.zoopla.co.uk/for-sale/property/b23/?page_size=100&q=B23&radius=0&results_sort=newest_listings&search_source=refine

One Answer

This should be all you need:

(?<=bed ).*(?= for)
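
A minimal sketch of how this pattern might be applied with re.search. The helper name house_type_from and the sample titles other than the first are hypothetical, chosen for illustration; real listings may be worded differently:

```python
import re

# Lookbehind: the match must be preceded by "bed ".
# Lookahead: the match must be followed by " for".
HOUSE_TYPE = re.compile(r'(?<=bed ).*(?= for)')

def house_type_from(title):
    """Return the text between 'bed ' and ' for', or None if the title does not fit."""
    m = HOUSE_TYPE.search(title)
    return m.group(0) if m else None

# Hypothetical sample titles in the same shape as the one in the question.
titles = [
    "4 bed detached house for sale",
    "3 bed semi-detached house for sale",
    "2 bed flat for sale",
    "Land for sale",  # no "bed " in the text, so no match
]
results = [house_type_from(t) for t in titles]
print(results)
```

re.search is used rather than re.findall because only one match per title is wanted, and the None branch covers titles that contain no 'bed ' at all.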


Answered by jdaz on February 8, 2021
