TransWikia.com

Random selection of a row from a pandas DataFrame with weights

Stack Overflow Asked by Mehdi Zare on November 7, 2021

I’m trying to randomly select a row from a pandas DataFrame based on provided weights. I tried to use the .sample() method with these parameters, but can’t get the syntax working:

import pandas as pd

df = pd.DataFrame({
    'label': [1, 0, 1, -1],
    'ind': [2, 3, 6, 8],
})

df.sample(n=1, weights=[0.5, 0.4, 0.1], axis=0)  # raises ValueError: 3 weights for 4 rows

The labels are 1, 0 and -1, and I want to assign a different weight to each label for the random selection.
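For reference, here is a minimal sketch of why the call above fails, assuming a recent pandas version. `DataFrame.sample` expects one weight per row along the sampled axis (4 here), not one per label (3), so it raises a `ValueError`. Looking up each row's weight from its label gives a weight list of the right length. Note that this alone does not account for a label that occurs in several rows. The name `label_weights` is illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'label': [1, 0, 1, -1],
    'ind': [2, 3, 6, 8],
})

# Three weights for four rows: pandas rejects the length mismatch.
try:
    df.sample(n=1, weights=[0.5, 0.4, 0.1], axis=0)
    length_error = False
except ValueError:
    length_error = True

# One weight per row, looked up from that row's label (illustrative mapping).
label_weights = {1: 0.5, 0: 0.4, -1: 0.1}
row_weights = df['label'].map(label_weights)

print(length_error)
print(row_weights.tolist())
row = df.sample(n=1, weights=row_weights, random_state=0)
```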

3 Answers

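You should scale the weights so that the sampling matches the expected distribution. Because pandas weights rows rather than labels, a label that appears in more rows would otherwise be picked more often. Dividing each label's target probability by that label's share of the rows corrects for this:

# Target probability for each label
weights = {-1: 0.1, 0: 0.4, 1: 0.5}

# Divide by each label's share of the rows so frequent labels are not over-sampled
scaled_weights = pd.Series(weights) / df.label.value_counts(normalize=True)

df.sample(n=1, weights=df.label.map(scaled_weights))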

To test the distribution with 10,000 samples:

(df.sample(n=10000, replace=True, random_state=1,
           weights=df.label.map(scaled_weights))
   .label.value_counts(normalize=True)
)

Output:

 1    0.5060
 0    0.3979
-1    0.0961
Name: label, dtype: float64
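The simulated frequencies above approximate the target 0.5 / 0.4 / 0.1. As a sketch, the exact selection probability of each label can also be checked without sampling. `sample` normalizes the row weights to sum to 1, so summing the normalized weights within each label gives that label's probability. This reuses the DataFrame and weights from this answer; `row_weights` and `probs` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'label': [1, 0, 1, -1], 'ind': [2, 3, 6, 8]})
weights = {-1: 0.1, 0: 0.4, 1: 0.5}

# Target probability divided by each label's share of the rows
scaled_weights = pd.Series(weights) / df.label.value_counts(normalize=True)
row_weights = df.label.map(scaled_weights)

# sample() normalizes the weights to sum to 1; summing them per label
# gives the exact probability of drawing a row with that label.
probs = (row_weights / row_weights.sum()).groupby(df.label).sum()
print(probs)
```

For this DataFrame the scaled row weights are 1.0, 1.6, 1.0 and 0.4, which normalize to 0.25, 0.4, 0.25 and 0.1, so labels 1, 0 and -1 come out at exactly 0.5, 0.4 and 0.1.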

Answered by Quang Hoang on November 7, 2021

For each row, divide the desired weight by the number of rows with that label in the DataFrame:

weights = (df['label'].replace({1: 0.5, 0: 0.4, -1: 0.1})
           / df.groupby('label')['label'].transform('count'))

df.sample(n=1, weights=weights, axis=0)
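A quick sketch of what this produces for the example DataFrame: label 1 appears in two rows, so each of those rows gets 0.5 / 2 = 0.25, while labels 0 and -1 keep 0.4 and 0.1, and the weights already sum to 1. This is the same idea as the first answer (target weight divided by label frequency), using counts instead of proportions; the two differ only by a constant factor that `sample` normalizes away. As an aside, when sampling rows, `DataFrame.sample` also accepts a column name for `weights`; the column name `w` below is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'label': [1, 0, 1, -1], 'ind': [2, 3, 6, 8]})

# Desired weight per label, divided by how many rows carry that label
weights = (df['label'].replace({1: 0.5, 0: 0.4, -1: 0.1})
           / df.groupby('label')['label'].transform('count'))

print(weights.tolist())  # [0.25, 0.4, 0.25, 0.1]

# Alternative: store the weights in a column ('w' is an illustrative name)
# and pass that column's name to sample().
row = df.assign(w=weights).sample(n=1, weights='w', random_state=0)
```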

Answered by Chris Schmitz on November 7, 2021

You can try the following code. It assigns the desired weights from a dictionary to the rows of df (assuming the weights were given in this order). If you want the weights to depend on the number of rows per label, you can replace the lambda with a more complex function.

# Look up each row's weight from its label (KeyError if a label is missing from the dict)
w = df['label'].apply(lambda x: {-1: 0.5, 0: 0.4, 1: 0.1}[x])
df.sample(n=1, weights=w, axis=0)
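One caveat worth checking: because label 1 occurs in two rows, a plain per-label lookup gives label 1 twice its dictionary weight in total. The sketch below compares that with a hypothetical `per_row_weight` helper that also divides by the label's row count, one possible version of the "more complex function" mentioned above. Both `per_row_weight` and `label_probs` are illustrative names, not pandas API.

```python
import pandas as pd

df = pd.DataFrame({'label': [1, 0, 1, -1], 'ind': [2, 3, 6, 8]})
label_weights = {-1: 0.5, 0: 0.4, 1: 0.1}
counts = df['label'].value_counts()  # number of rows per label

def label_probs(w):
    """Probability of drawing each label once sample() normalizes w."""
    return (w / w.sum()).groupby(df['label']).sum()

# Plain lookup: label 1's weight is counted once per matching row.
w_plain = df['label'].apply(lambda x: label_weights[x])

# Hypothetical helper: also divide by the number of rows with that label.
def per_row_weight(x):
    return label_weights[x] / counts[x]

w_scaled = df['label'].apply(per_row_weight)

print(label_probs(w_plain))
print(label_probs(w_scaled))
```

With this dictionary the plain lookup gives label 1 a probability of 0.2 / 1.1 (about 0.18) instead of 0.1, while the count-adjusted weights give exactly 0.5, 0.4 and 0.1 for labels -1, 0 and 1.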

Answered by RunTheGauntlet on November 7, 2021

© 2024 TransWikia.com. All rights reserved.