
Splitting a list (?)

Stack Overflow Asked by SuperAnnuated on December 27, 2020

I’ve been searching for a while, and I think I may have built this block wrong, but I’m hoping there is a simple solution. I need to break apart a list, and every solution I could think of has failed (my knowledge is limited). My code looks for specific words within the text and pulls out the section that text is in; I also add the name of the file the text was found in. However, all of this goes into the same list!


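import os

# directory, pattern (a compiled regex) and list (initially empty) are defined earlier
for filename in os.scandir(directory):
    if filename.path.endswith(".txt"):
        # the with block closes the file once it has been read
        with open(filename, encoding='utf-8') as f:
            for line in f:
                if pattern.search(line) is not None:
                    # store (file name, matching line) without the trailing newline
                    list.append((filename.name, line.rstrip('\n')))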

When this prints, it looks like:

[('AEE_0000018654_10Q_20200331_Item1A_excerpt.txt', 'In 2019, Ameren Missouri entered into a build-transfer agreement to acquire, after construction, an up-to 300-megawatt wind generation facility. In 2018, Ameren Missouri entered into a build-transfer agreement to acquire, after construction, an up-to 400-megawatt wind generation facility. Unless relevant regulations are modified by the IRS or applicable legislation is enacted by Congress to include an extension of the December 31, 2020 in-service date criteria, if any portion of these facilities is completed '), ('AEE_0000018654_10Q_20200331_Item2_excerpt.txt', 'an up-to 400-megawatt wind generation facility. These two agreements are subject to customary contract terms and conditions. The two build-transfer acquisitions collectively represent $1.2 billion of capital expenditures and would support Ameren Missouri’s compliance with the Missouri renewable energy standard. Ameren Missouri and the developers continue to monitor the impact to each project schedule. To date, neither developer has reported to Ameren Missouri that the projects will not be completed in 2020. Ameren Missouri expects the up-to 400-megawatt project to be placed in-service by the end of 2020. However, at this time, due to manufacturing, shipping, and other supply chain issues, and based on Ameren Missouri’s discussions with the developer, Ameren Missouri expects that a portion of the up-to 300-megawatt project, representing approximately $100 million of investment, could be placed in-service in the first quarter of 2021.')]

So, is there a way I can split this up so that the file names are in a separate list? I would like to use:

import pandas
df = pandas.DataFrame(data={"col1": filename, "col2": list})
df.to_csv("./SECParse.csv", sep=',',index=False)

but so far I am unable to break up this list I’ve created.

Any help?

One Answer

Since you already have a list of tuples in the form (filename, text), you don’t need to split it first. I think you can just call

import pandas as pd
pd.DataFrame(ls, columns=['filename', 'text'])

where ls is the list you generated from your for loop.

Output should look like this:

    filename                                        text
0   AEE_0000018654_10Q_20200331_Item1A_excerpt.txt  In 2019, Ameren Missouri entered into a build-...
1   AEE_0000018654_10Q_20200331_Item2_excerpt.txt   an up-to 400-megawatt wind generation facility...
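
If two separate lists are still wanted, as the question literally asks, the pairs can be unzipped with zip(*ls). The sketch below uses only the standard library. The sample data, the names filenames, texts and csv_text, and the in-memory buffer are placeholders made up for illustration, not part of the original code.

```python
import csv
import io

# Hypothetical stand-in for the list built by the loop: (filename, text) pairs.
ls = [
    ("file_a.txt", "first matching line"),
    ("file_b.txt", "second matching line, with a comma"),
]

# zip(*ls) transposes the pairs into one tuple per column.
# An empty list is handled separately because unpacking would fail on it.
if ls:
    filenames, texts = map(list, zip(*ls))
else:
    filenames, texts = [], []

# The same rows can be written as CSV without pandas; an in-memory
# buffer is used here instead of a file on disk.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["filename", "text"])  # header row
writer.writerows(ls)                   # one row per (filename, text) pair
csv_text = buffer.getvalue()
```

With the lists split this way, the dict-style call from the question should also work if filenames and texts are passed as the two columns, since both lists have the same length.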

Answered by Jeff on December 27, 2020
