TransWikia.com

Code was working but now doesn't work all of a sudden - Dataframe issues

Stack Overflow Asked by user7041266 on January 9, 2021

I am not very proficient at Python, but my aim was to extract data from my share dealing website so I can analyse it further down the line. The code below worked for me once, and now I get an error about arrays not being the same length, even though as far as I can tell they are. It ran fine before and I have not modified the code at all, but now it suddenly fails.

Code and error below:

import requests
import pandas as pd
from bs4 import BeautifulSoup as bs

pd.set_option('display.max_rows', None)


r = requests.get("https://www.somewebsite.co.uk/shares/shares-search-results/a")
soup = bs(r.content, features = "lxml")
a = soup.find('ul', {'class':'list-unstyled list-indent'}).find_all("strong")

r3 = requests.get("https://www.somewebsite.co.uk/shares/shares-search-results/b")
soup = bs(r3.content, features = "lxml")
b = soup.find('ul', {'class':'list-unstyled list-indent'}).find_all("strong")

r5 = requests.get("https://www.somewebsite.co.uk/shares/shares-search-results/c")
soup = bs(r5.content, features = "lxml")
c  = soup.find('ul', {'class':'list-unstyled list-indent'}).find_all("strong")


header  = a + b + c

r1 = requests.get("https://www.somewebsite.co.uk/shares/shares-search-results/a")
soup = bs(r1.content, features = "lxml")
links = [a['href'] for a in soup.find('ul', {'class' : 'list-unstyled list-indent'}).find_all("a", href=True)]

r4 = requests.get("https://www.somewebsite.co.uk/shares/shares-search-results/b")
soup = bs(r4.content, features = "lxml")
links1 = [a['href'] for a in soup.find('ul', {'class' : 'list-unstyled list-indent'}).find_all("a", href=True)]

r6 = requests.get("https://www.somewebsite.co.uk/shares/shares-search-results/c")
soup = bs(r6.content, features = "lxml")
links2 = [a['href'] for a in soup.find('ul', {'class' : 'list-unstyled list-indent'}).find_all("a", href=True)]

column2 = links + links1 + links2

com_list = []
for b in header[0:]:
    result = b.text.strip()
    com_list.append(result)

com_com = pd.DataFrame({'COMPANY': com_list, 'LINKS': column2})
print(com_com)

The error I get:

Traceback (most recent call last):
  File "hls.py", line 42, in <module>
    com_com = pd.DataFrame({'COMPANY': com_list, 'LINKS': column2})
  File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 392, in __init__
    mgr = init_dict(data, index, columns, dtype=dtype)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/internals/construction.py", line 212, in init_dict
    return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/internals/construction.py", line 51, in arrays_to_mgr
    index = extract_index(arrays)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/internals/construction.py", line 317, in extract_index
    raise ValueError('arrays must all be same length')
ValueError: arrays must all be same length

One Answer

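You have a different number of elements in com_list and column2. When you build a DataFrame from a dict of lists, every list must have the same length. Since your code has not changed, the most likely explanation is that the page content has: for example, an entry that has a "strong" tag but no link, or a link without a "strong" tag, would make one list longer than the other.

Check whether this prints True: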

len(com_list) == len(column2)
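
If that comparison is False, it can help to see which entries are left over. The following is a minimal sketch using only the standard library; the helper name describe_mismatch and the sample values are made up for illustration and stand in for the scraped com_list and column2.

```python
from itertools import zip_longest


def describe_mismatch(com_list, column2):
    """Return both lengths and the rows where one side has no partner."""
    missing = object()  # sentinel, so genuine None values are not mistaken for padding
    unpaired = [
        (i, None if name is missing else name, None if link is missing else link)
        for i, (name, link) in enumerate(
            zip_longest(com_list, column2, fillvalue=missing)
        )
        if name is missing or link is missing
    ]
    return len(com_list), len(column2), unpaired


# Hypothetical sample data standing in for the scraped results
com_list = ["Company A", "Company B"]
column2 = ["/shares/a", "/shares/b", "/shares/extra"]

n_names, n_links, unpaired = describe_mismatch(com_list, column2)
print(n_names, n_links)  # 2 3
print(unpaired)          # [(2, None, '/shares/extra')]
```

Note that zip_longest only pads at the end, so this shows how many entries are surplus, not where the two lists first went out of step. A more robust fix is probably to read the company name and the link from the same list item in a single pass, so the two lists cannot drift apart.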

Correct answer by HimanshuGahlot on January 9, 2021
