TransWikia.com

Element-wise random choice of a Series of lists (without a loop)

Stack Overflow Asked on December 22, 2021

I want to randomly select an element from each list in a Series of lists.

import pandas as pd
import numpy as np

l=[['a','b','c'],['d','e','f'],['g','h','i'],['j','k','l'],['m','n','o']]
s = pd.Series(l)

So s is:

0    [a, b, c]
1    [d, e, f]
2    [g, h, i]
3    [j, k, l]
4    [m, n, o]
dtype: object

I know I can do the following:

s = pd.Series([np.random.choice(i) for i in s])

Which does work:

0    a
1    e
2    h
3    j
4    m
dtype: object

But I am wondering if there is a non-loop approach to do this?

For instance, assuming each list is the same size, you could make an array of random indices, one per list, to pick an element from each:

i = np.random.randint(3, size=len(l))
#array([2, 2, 0, 1, 0])

But s[i] doesn't work, because it indexes the rows of s rather than indexing into each list:

2    [g, h, i]
2    [g, h, i]
0    [a, b, c]
1    [d, e, f]
0    [a, b, c]
dtype: object

My motivation is to have something that works on a large number of lists, hence the wish to avoid a loop. But if my list comprehension is the most reasonable approach, or there is no built-in pandas/NumPy function for this, please tell me.

2 Answers

I can only think of this approach. It assumes every list has the same length (3 here), and performance may be a concern:

np.array(s.tolist())[np.arange(len(s)), np.random.randint(3, size=len(s))]
array(['c', 'e', 'i', 'k', 'n'], dtype='<U1')
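A slightly more general sketch of the same idea, still assuming all lists share one length: it reads the width from the array instead of hard-coding 3 and uses NumPy's Generator API. The helper name pick_one_per_row and the seed are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd


def pick_one_per_row(s, rng=None):
    """Pick one random element from each equal-length list in s (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    arr = np.array(s.tolist())  # shape (n_rows, n_cols) when the lists have equal length
    if arr.ndim != 2:
        raise ValueError("all lists must have the same length")
    n_rows, n_cols = arr.shape
    cols = rng.integers(n_cols, size=n_rows)  # one random column index per row
    # Integer array indexing pairs row i with column cols[i]
    return pd.Series(arr[np.arange(n_rows), cols], index=s.index)


s = pd.Series([['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i'],
               ['j', 'k', 'l'], ['m', 'n', 'o']])
picked = pick_one_per_row(s, rng=np.random.default_rng(0))
```

Passing a seeded generator makes the draw repeatable; leaving rng as None gives a fresh draw on each call.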

Some timings:

%timeit s.explode().sample(frac=1, random_state=1) 
5.05 ms ± 294 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit pd.Series([np.random.choice(i) for i in s])
23.1 ms ± 184 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit np.array(s.tolist())[np.arange(len(s)), np.random.randint(3, size=len(s))]
1.63 ms ± 50.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
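The numbers above come from the answerer's machine, and the explode line times only the explode and shuffle steps, not the groupby(...).head(1) step. A sketch for re-running the comparison with the standard timeit module follows; the Series size (2,000 rows) and the repeat counts are arbitrary assumptions, so absolute figures will differ.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed benchmark input: 2,000 lists of length 3 (size chosen arbitrarily)
n = 2_000
s = pd.Series(np.arange(3 * n).reshape(n, 3).tolist())

candidates = {
    "list comprehension": lambda: pd.Series([np.random.choice(i) for i in s]),
    "explode + shuffle + head": lambda: (s.explode()
                                          .sample(frac=1)
                                          .groupby(level=0).head(1)),
    "numpy indexing": lambda: np.array(s.tolist())[
        np.arange(len(s)), np.random.randint(3, size=len(s))],
}

results = {}
for name, fn in candidates.items():
    # Best of 3 repeats of 5 calls each, reported per call in milliseconds
    best = min(timeit.repeat(fn, number=5, repeat=3)) / 5
    results[name] = best * 1e3
    print(f"{name:26s} {results[name]:8.2f} ms")
```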

Answered by BENY on December 22, 2021

You could try explode: shuffle the exploded series with sample(frac=1), then keep the first row of each original index group. This doesn't even require that the lists have the same length.

(s.explode()
   .sample(frac=1, random_state=1)  # random_state added for repeatability, drop if needed
   .groupby(level=0).head(1)
)

Output:

1    d
2    h
0    c
3    k
4    n
dtype: object
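To illustrate the point about unequal lengths, here is a sketch on a ragged Series. It also tries groupby(...).sample(n=1), which to my knowledge is available from pandas 1.1 onward, as a possible alternative to shuffle-then-head. The data and variable names are made up for the example.

```python
import pandas as pd

# Lists of different lengths (made-up data)
ragged = pd.Series([['a', 'b', 'c'], ['d'], ['e', 'f']])

# One row per element; the index repeats the original row label
exploded = ragged.explode()

# Shuffle, then keep the first row seen for each original label
first_after_shuffle = (exploded
                       .sample(frac=1, random_state=1)
                       .groupby(level=0).head(1)
                       .sort_index())  # restore the original row order

# Alternative: draw one row per group directly (assumes pandas >= 1.1)
one_per_group = (exploded
                 .groupby(level=0)
                 .sample(n=1, random_state=1)
                 .sort_index())
```

Note that explode turns an empty list into NaN, so a row holding an empty list would yield NaN with either variant.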

Answered by Quang Hoang on December 22, 2021
