TransWikia.com

Converting or formatting a nested list with month names into a new list in Python

Stack Overflow Asked by Muhtadi on December 22, 2020

I have a nested list like this:

data = [[[], 'October'],
 [[], 'October'],
 [[], 'October'],
 [['covid-19'], 'October'],
 [['covid-19'], 'October'],
 [[], 'October'],
 [['covid-19'], 'October'],
 [[], 'October'],
 [['tiktok', 'tenaga kesehatan'], 'October'],
 [[], 'October'],
 [['covid-19'], 'October'],
 [['kanker'], 'October'],
 [['covid-19'], 'October'],
 [[], 'October'],
 [[], 'October'],
 [['covid-19'], 'October'],
 [[], 'October'],
 [['jantung'], 'October'],
 [['covid-19'], 'October'],
 [[], 'October'],
 [[], 'October'],
 [[], 'October'],
 [[], 'October'],
 [[], 'October'],
 [[], 'October'],
 [[], 'October'],
 [[], 'October'],
 [[], 'October'],
 [['covid-19'], 'October'],
 [['covid-19'], 'October'],
 [['covid-19'], 'October'],
 [[], 'October'],
 [[], 'October'],
 [[], 'October'],
 [[], 'October'],
 [['covid-19'], 'October'],
 [[], 'October'],
 [['jantung'], 'October'],
 [['covid-19'], 'October'],
 [['covid-19'], 'October'],
 [['covid-19'], 'October'],
 [['covid-19'], 'October'],
 [['covid-19'], 'October'],
 [['covid-19', 'covid-19'], 'October'],
 [['covid-19'], 'October'],
 [[], 'September'],
 [['covid-19'], 'September'],
 [['covid-19'], 'September'],
 [[], 'September'],
 [[], 'September'],
 [['covid-19', 'covid-19'], 'September'],
 [['jantung'], 'September'],
 [['jantung'], 'September'],
 [['covid-19'], 'September'],
 [['covid-19'], 'September'],
 [['covid-19'], 'September'],
 [[], 'September'],
 [['covid-19'], 'September'],
 [[], 'September'],
 [['covid-19'], 'September'],
 [[], 'September'],
 [['covid-19'], 'September'],
 [['covid-19'], 'September'],
 [[], 'September'],
 [['covid-19'], 'September'],
 [[], 'September'],
 [['covid-19'], 'September'],
 [['covid-19'], 'September'],
 [[], 'September'],
 [[], 'September'],
 [['covid-19'], 'September'],
 [[], 'September'],
 [[], 'August'],
 [[], 'August'],
 [[], 'August'],
 [['covid-19'], 'August'],
 [[], 'August'],
 [[], 'August'],
 [['covid-19'], 'August'],
 [['jantung'], 'August'],
 [['covid-19'], 'August'],
 [['covid-19'], 'August'],
 [[], 'August'],
 [['covid-19'], 'August'],
 [['covid-19'], 'August'],
 [['covid-19'], 'August'],
 [['covid-19'], 'August'],
 [[], 'August'],
 [['covid-19'], 'August'],
 [[], 'August'],
 [['covid-19'], 'August'],
 [['covid-19'], 'August'],
 [[], 'August'],
 [['covid-19'], 'August'],
 [['covid-19'], 'August'],
 [[], 'August'],
 [['covid-19'], 'August'],
 [['covid-19', 'covid-19'], 'August'],
 [['covid-19'], 'August'],
 [['covid-19'], 'July']]

I want to count all the tokens ('covid-19', 'jantung', etc.) by month name, so I can get the token frequency per month.

Here's my expected output:

result = [
    ['covid-19',0,0,0,0,0,0,1,19,17,21,0,0],
    ['tiktok',0,0,0,0,0,0,0,0,0,1,0,0],
    ['jantung',0,0,0,0,0,0,0,1,2,2,0,0],
    ['kanker',0,0,0,0,0,0,0,0,0,1,0,0],
    ['tenaga kesehatan',0,0,0,0,0,0,0,0,0,1,0,0],   
]

Note that the numbers after each token (e.g. '0,0,0,0,0,0,1,19,17,21,0,0') are that token's counts for each month, in order from January to December. Please suggest a way to convert the nested list into the result list.
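In the layout described above, position 0 of each row holds the token name and positions 1 to 12 hold the counts from January to December. Below is a minimal sketch of that mapping. It assumes English month names, which is what the standard library's `calendar` module produces under Python's default locale. `MONTH_POS` and `empty_row` are hypothetical names used only for illustration:

```python
import calendar

# calendar.month_name[0] is an empty string, so enumerating it gives
# January -> 1, ..., December -> 12, which lines up with row positions 1-12.
MONTH_POS = {name: i for i, name in enumerate(calendar.month_name) if name}


def empty_row(token):
    """Hypothetical helper: a result row with the token name and 12 zero counts."""
    return [token] + [0] * 12


row = empty_row("covid-19")
row[MONTH_POS["July"]] += 1
print(row)  # ['covid-19', 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
```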

Any ideas?

5 Answers

Here we go with a possible solution:

import calendar

# data is the nested list from the question

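final = []  # one row per token: [token, jan, feb, ..., dec]
for el in data:
    if len(el[0]) > 0:  # skip entries with no tokens
        for key in el[0]:
            # create a zero-filled row the first time a token is seen
            if key not in [sub[0] for sub in final]:
                final.append([key] + [0]*12)
            # find the token's row and increment that month's column;
            # calendar.month_abbr[0] is '', so 'Jan' -> 1, ..., 'Dec' -> 12
            for sub in final:
                if sub[0] == key:
                    sub[list(calendar.month_abbr).index(el[-1][:3])] += 1

print(final)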

The output will be:

[['covid-19', 0, 0, 0, 0, 0, 0, 1, 17, 15, 19, 0, 0], ['tiktok', 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0], ['tenaga kesehatan', 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0], ['kanker', 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0], ['jantung', 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 0, 0]]

NOTE: As someone mentioned, it might be a good idea to use a different data structure to store the result. A dictionary keyed by token would be more convenient and would avoid scanning the whole final list for every token, so the solution would run in linear time.
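Following that note, here is a sketch of the dictionary-based variant. Counts are kept in a dict keyed by token, so each token's row is found by a direct lookup instead of a scan, and the list-of-lists format is produced at the end. It assumes English month names and uses a short hypothetical sample instead of the full data list:

```python
import calendar

# Hypothetical short sample in the same shape as the question's data.
data = [
    [[], 'October'],
    [['covid-19'], 'October'],
    [['tiktok', 'tenaga kesehatan'], 'October'],
    [['covid-19', 'covid-19'], 'September'],
    [['jantung'], 'August'],
    [['covid-19'], 'July'],
]

# Month name -> row position 1..12 (calendar.month_name[0] is '').
month_pos = {name: i for i, name in enumerate(calendar.month_name) if name}

rows = {}  # token -> [token, jan, feb, ..., dec]
for tokens, month in data:
    for token in tokens:
        # setdefault creates the zero-filled row the first time a token is seen
        row = rows.setdefault(token, [token] + [0] * 12)
        row[month_pos[month]] += 1

# dicts preserve insertion order, so rows come out in first-seen order
final = list(rows.values())
print(final)
```

Because Python dicts keep insertion order, the rows appear in the same first-seen order as in the answer above.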

Correct answer by lorenzozane on December 22, 2020

Perhaps you could solve the same problem with functional programming.

from functional import seq 

NONE_KEY = 'NONE'
MONTHS = {
    'January': 1,
    'February': 2,
    'March': 3,
    'April': 4,
    'May': 5,
    'June': 6,
    'July': 7,
    'August': 8,
    'September': 9,
    'October': 10,
    'November': 11,
    'December': 12
}

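# Split an entry into (token, month) pairs. An entry with no tokens becomes a
# single (NONE_KEY, month) pair, so the result also has a 'NONE' row that
# counts the empty entries per month.
def reGroupByFirstItem(d):
    if (len(d[0]) > 0):
        return seq(d[0]).map(lambda key: (key, d[1])).to_list()
    else:
        return [(NONE_KEY, d[1])]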

def hasKey(l, key):
    return seq(l).filter(lambda x: x[0] == key).len() > 0

def getIndexByKey(ll, key):
    for i in range(len(ll)):
        if ll[i][0] == key:
            return i    

def initList(key):
    l = [0 for x in range(12)]
    l.insert(0, key)
    return l 

def updateList(l, month):
    l[ MONTHS[month] ] += 1
    return l

def updateByKey(ll, key, val):
    i = getIndexByKey(ll, key)
    ll[i] = updateList(ll[i], val)
    return ll

def initListWithValue(key, val):
    l = initList(key)
    return updateList(l, val)

def createNewList(nextItem, current):
    key = nextItem[0]
    val = nextItem[1]
    
    if hasKey(current, key):
        current = updateByKey(current, key, val)
    else:
        current.append(initListWithValue(key, val))
    
    return current

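# flatten every entry into (token, month) pairs, then fold them into rows;
# the parentheses let the method chain span several lines
result = (
    seq(data)
    .map(reGroupByFirstItem)
    .flatten()
    .fold_right([], createNewList)
)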

print(result)


You need to install PyFunctional first:

pip install pyfunctional 

Full documentation: https://docs.pyfunctional.pedro.ai/en/latest/index.html
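If you would rather avoid the extra dependency, the same map, flatten, fold pipeline can be sketched with the standard library's `map`, `itertools.chain.from_iterable` and `functools.reduce`. This swaps in standard-library tools and does not use PyFunctional's API. Unlike the answer above, it drops empty token lists instead of counting them under a `NONE` key. The sample data and the helper names `to_pairs` and `add_pair` are hypothetical:

```python
import calendar
from functools import reduce
from itertools import chain

# Hypothetical sample in the same shape as the question's data.
data = [
    [[], 'October'],
    [['covid-19'], 'October'],
    [['kanker'], 'October'],
    [['covid-19', 'covid-19'], 'September'],
    [['covid-19'], 'July'],
]

# Month name -> row position 1..12 (calendar.month_name[0] is '').
MONTHS = {name: i for i, name in enumerate(calendar.month_name) if name}


def to_pairs(entry):
    """Map step: one (token, month) pair per token; empty lists give no pairs."""
    tokens, month = entry
    return [(token, month) for token in tokens]


def add_pair(acc, pair):
    """Fold step: return a new dict with the pair's monthly count incremented."""
    token, month = pair
    pos = MONTHS[month]
    row = acc.get(token, [token] + [0] * 12)
    new_row = row[:pos] + [row[pos] + 1] + row[pos + 1:]
    return {**acc, token: new_row}


# map -> flatten -> fold
pairs = chain.from_iterable(map(to_pairs, data))
result = list(reduce(add_pair, pairs, {}).values())
print(result)
```

`reduce` is a left fold, so rows come out in first-seen order. With `fold_right`, the data is consumed starting from the last entry.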

Answered by Joel Chu on December 22, 2020

While others have written really good answers, I feel that solving this with pandas is more maintainable, if somewhat more verbose. Plus, pandas objects are really easy to work with.

First the imports:

import pandas as pd
import calendar
from pprint import pprint

Here's the main body of code:

df = pd.DataFrame(data, columns=["lists", "month"])
# every distinct token becomes a count column, initialised to 0
names = list(set([y for x in df["lists"] for y in x]))
df[names] = 0


def func(row):
    # add 1 to a token's column for each occurrence in this row's list
    for n in names:
        for k in row["lists"]:
            if k == n:
                row[n] += 1
    return row


df = df.apply(func, axis=1)
df.drop(["lists"], inplace=True, axis=1)

# sum the counts per month, then transpose so that tokens become rows
new_df = df.groupby(by="month").sum().T.reset_index()
new_df.columns.name = None  # purely cosmetic: removes the "month" label left by groupby

months = list(calendar.month_name)[1:]  # list of months; index 0 is an empty string
new_df[[m for m in months if m not in new_df.columns]] = 0  # create columns for unseen months
new_df = new_df[["index"] + months]  # put the month columns in calendar order
print(new_df)
pprint(new_df.values.tolist())

The output will be:

              index  January  February  ...  October  November  December
0            kanker        0         0  ...        1         0         0
1          covid-19        0         0  ...       19         0         0
2           jantung        0         0  ...        2         0         0
3            tiktok        0         0  ...        1         0         0
4  tenaga kesehatan        0         0  ...        1         0         0

[5 rows x 13 columns]


[['kanker', 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
 ['covid-19', 0, 0, 0, 0, 0, 0, 1, 17, 15, 19, 0, 0],
 ['jantung', 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 0, 0],
 ['tiktok', 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
 ['tenaga kesehatan', 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]]

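Running it again can give a different row order, because names is built from a set, and the iteration order of a set of strings can change between runs: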

              index  January  February  ...  October  November  December
0  tenaga kesehatan        0         0  ...        1         0         0
1          covid-19        0         0  ...       19         0         0
2            kanker        0         0  ...        1         0         0
3           jantung        0         0  ...        2         0         0
4            tiktok        0         0  ...        1         0         0

[5 rows x 13 columns]


[['tenaga kesehatan', 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
 ['covid-19', 0, 0, 0, 0, 0, 0, 1, 17, 15, 19, 0, 0],
 ['kanker', 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
 ['jantung', 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 0, 0],
 ['tiktok', 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]]
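To get the same row order on every run, the token names can be collected deterministically before they are turned into columns. The sketch below shows two options. `dict.fromkeys` keeps tokens in first-seen order, and `sorted` gives alphabetical order. The sample data is a hypothetical subset:

```python
# Hypothetical subset of the question's data.
data = [
    [[], 'October'],
    [['covid-19'], 'October'],
    [['tiktok', 'tenaga kesehatan'], 'October'],
    [['kanker'], 'October'],
    [['jantung'], 'October'],
    [['covid-19'], 'September'],
]

# dict.fromkeys keeps only the first occurrence of each key, in insertion
# order, so this de-duplicates while preserving first-seen order.
names = list(dict.fromkeys(tok for tokens, _ in data for tok in tokens))
print(names)  # ['covid-19', 'tiktok', 'tenaga kesehatan', 'kanker', 'jantung']

# Alternative: alphabetical order, which is also stable across runs.
names_sorted = sorted({tok for tokens, _ in data for tok in tokens})
print(names_sorted)  # ['covid-19', 'jantung', 'kanker', 'tenaga kesehatan', 'tiktok']
```

Either list can be used in place of `list(set(...))` when building the `names` columns.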


Answered by Farhood ET on December 22, 2020

You really shouldn't be storing different kinds of data in the same list like that. How about something that looks like this?

{'covid-19': [0, 0, 0, 0, 0, 0, 1, 17, 15, 19, 0, 0],
 'jantung': [0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 0, 0],
 'kanker': [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
 'tenaga kesehatan': [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
 'tiktok': [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]}

And here's a code snippet that builds this dict:

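import datetime
from collections import defaultdict

# token -> 12 monthly counts (index 0 = January); new tokens start at all zeros
result = defaultdict(lambda: [0]*12)
for i in data:
    if i[0]:
        for j in i[0]:
            # strptime with "%B" turns an English month name into a month number 1-12
            result[j][datetime.datetime.strptime(i[1], "%B").month - 1] += 1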
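To convert that dict back to the list-of-lists shape from the question, add each token as the first element of its count list. Below is a self-contained sketch. The sample data is a hypothetical subset, and the `%B` parsing assumes English month names, as under Python's default locale:

```python
import datetime
from collections import defaultdict

# Hypothetical subset of the question's data.
data = [
    [[], 'October'],
    [['covid-19'], 'October'],
    [['jantung'], 'September'],
    [['covid-19'], 'July'],
]

# token -> 12 monthly counts, index 0 = January
result = defaultdict(lambda: [0] * 12)
for tokens, month in data:
    for token in tokens:
        result[token][datetime.datetime.strptime(month, "%B").month - 1] += 1

# Put the token name first to get rows like ['covid-19', 0, ..., 0]
rows = [[token] + counts for token, counts in result.items()]
print(rows)
```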

Answered by AntiMatterDynamite on December 22, 2020

I suggest changing the nested list into a dictionary like this:

{
  "October":{
     "covid-19":8,
     "jantung":5
  },
  "November":{...},
  ...
}

or like this

{
  "covid-19":{
     "October":8,
     "November":5
  },
  "jantung":{...},
  ...
}
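Either nested layout can be built in one pass with `collections.Counter`, as the sketch below shows. The sample data is a hypothetical subset, so its counts do not correspond to the numbers in the snippets above:

```python
from collections import Counter, defaultdict

# Hypothetical subset of the question's data.
data = [
    [[], 'October'],
    [['covid-19'], 'October'],
    [['jantung'], 'October'],
    [['covid-19', 'covid-19'], 'September'],
]

by_month = defaultdict(Counter)  # month -> {token: count}
by_token = defaultdict(Counter)  # token -> {month: count}
for tokens, month in data:
    by_month[month].update(tokens)  # Counter.update adds 1 per occurrence
    for token in tokens:
        by_token[token][month] += 1

print(dict(by_month))
print(dict(by_token))
```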

Answered by Aqil Fiqran on December 22, 2020
