TransWikia.com

Pandas compare items in list in one column with single value in another column

Stack Overflow Asked on December 13, 2021

Consider this two-column df. I would like to create an apply function that compares each item in the "other_yrs" column list with the single integer in the "cur" column and counts the items in the list that are less than or equal to the value in the "cur" column. I cannot figure out how to get pandas to do this with apply. I am using apply functions for other purposes and they are working well. Any ideas would be much appreciated.

    cur other_yrs
1   11  [11, 11]
2   12  [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0]
4   16  [15, 85]
5   17  [17, 17, 16]
6   13  [8, 8]

Below is the function I used to extract the values into the "other_yrs" column. I am thinking I can just insert into this function some way of comparing each successive value in the list with the "cur" column value and keep count. I really only need to store the count of how many of the list items are <= the value in the "cur" column.

def col_check(col_string):
    cs_yr_lst = []
    count = 0
    if len(col_string) < 1:  #avoids col values of 0 meaning no other cases.
        pass
    else:
        case_lst = col_string.split(", ")  #splits the string of cases into a list
        for i in case_lst:
            cs_yr = int(i[3:5])  #gets the case year from each individual case number
            cs_yr_lst.append(cs_yr)  #stores those integers in a list and then into a new column using apply
    return cs_yr_lst

The expected output would be this:

    cur other_yrs                                   count
1   11  [11, 11]                                    2
2   12  [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0]  11
4   16  [15, 85]                                    1
5   17  [17, 17, 16]                                3
6   13  [8, 8]                                      2
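
For reference, a minimal sketch of the row-wise apply the question asks about. It assumes "other_yrs" already holds Python lists of integers; the helper name count_le is hypothetical, and the frame is rebuilt here only so the snippet runs on its own.

```python
import pandas as pd

# Rebuild the example frame from the question (index labels 1, 2, 4, 5, 6).
df = pd.DataFrame(
    {
        "cur": [11, 12, 16, 17, 13],
        "other_yrs": [
            [11, 11],
            [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0],
            [15, 85],
            [17, 17, 16],
            [8, 8],
        ],
    },
    index=[1, 2, 4, 5, 6],
)


def count_le(row):
    # Hypothetical helper: count list items that are <= the row's "cur" value.
    return sum(yr <= row["cur"] for yr in row["other_yrs"])


# axis=1 passes each row to the function as a Series.
df["count"] = df.apply(count_le, axis=1)
print(df)
```

Row-wise apply is easy to read but calls a Python function once per row, so it is likely to be slower on large frames than a vectorized approach.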

3 Answers

If both DataFrames hold millions of records and each element of the first column has to be compared against all elements of the second column, the following code might be helpful.

for element in Dataframe1.Column1:
    # rows of Dataframe2 whose Column2 equals the current element
    matches = Dataframe2[Dataframe2.Column2.isin([element])]

The snippet above selects, one element at a time, the rows of Dataframe2 where that element of Dataframe1.Column1 is found in Dataframe2.Column2.

Answered by Adeel Afzal on December 13, 2021

Use zip inside a list comprehension to pair the values of the cur and other_yrs columns row by row, then apply np.sum to the boolean mask (NumPy imported as np):

df['count'] = [np.sum(np.array(b) <= a) for a, b in zip(df['cur'], df['other_yrs'])]

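Another idea: expand the lists into a temporary DataFrame (shorter lists are padded with NaN, which never satisfies <=), compare each row with cur using le, and sum across the columns: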

df['count'] = pd.DataFrame(df['other_yrs'].tolist(), index=df.index).le(df['cur'], axis=0).sum(1)

Result:

   cur                                   other_yrs  count
1   11                                    [11, 11]      2
2   12  [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0]     11
4   16                                    [15, 85]      1
5   17                                [17, 17, 16]      3
6   13                                      [8, 8]      2
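
A self-contained sketch that runs both variants on the question's data and checks that they agree. The frame construction and the column names count_zip and count_wide are assumptions added for illustration only.

```python
import numpy as np
import pandas as pd

# Example data from the question.
df = pd.DataFrame(
    {
        "cur": [11, 12, 16, 17, 13],
        "other_yrs": [
            [11, 11],
            [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0],
            [15, 85],
            [17, 17, 16],
            [8, 8],
        ],
    },
    index=[1, 2, 4, 5, 6],
)

# Variant 1: pair each "cur" value with its list and sum the boolean mask.
df["count_zip"] = [
    int(np.sum(np.array(b) <= a)) for a, b in zip(df["cur"], df["other_yrs"])
]

# Variant 2: spread the lists over columns (short lists are padded with NaN),
# compare row-wise against "cur", and count the True values per row.
wide = pd.DataFrame(df["other_yrs"].tolist(), index=df.index)
df["count_wide"] = wide.le(df["cur"], axis=0).sum(axis=1)

print(df[["cur", "count_zip", "count_wide"]])
```

The wide variant allocates a frame as wide as the longest list, so the zip variant may be the safer choice when a few lists are much longer than the rest.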

Answered by Shubham Sharma on December 13, 2021

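You can explode the lists, compare each exploded value with cur, then group on level=0 of the index and sum:

u = df.explode('other_yrs')
# Series.sum(level=0) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent
df['Count'] = u['cur'].ge(u['other_yrs']).groupby(level=0).sum().astype(int)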

print(df)
    cur                                   other_yrs  Count
1   11                                    [11, 11]      2
2   12  [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0]     11
4   16                                    [15, 85]      1
5   17                                [17, 17, 16]      3
6   13                                      [8, 8]      2
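
A runnable sketch of the explode approach, with one extra row (index 7, an empty list) added as an assumption to show how that edge case behaves. It groups with groupby(level=0).sum() so that it also runs on pandas 2.0 and later, and converts the exploded column with pd.to_numeric as a precaution, since explode leaves it as object dtype.

```python
import pandas as pd

# Question data plus one hypothetical row whose list is empty.
df = pd.DataFrame(
    {
        "cur": [11, 12, 16, 17, 13, 14],
        "other_yrs": [
            [11, 11],
            [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0],
            [15, 85],
            [17, 17, 16],
            [8, 8],
            [],
        ],
    },
    index=[1, 2, 4, 5, 6, 7],
)

# One row per list item; the original index label is repeated for each item.
# An empty list becomes a single NaN row.
u = df.explode("other_yrs")
yrs = pd.to_numeric(u["other_yrs"])

# cur >= item is the same test as item <= cur; NaN compares as False.
mask = u["cur"].ge(yrs)

# Collapse back to one value per original row by summing over the index labels.
df["Count"] = mask.groupby(level=0).sum().astype(int)
print(df)
```

Because the result is assigned by index label, this relies on the frame having a unique index, as it does in the question.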

Answered by anky on December 13, 2021
