How to create bins in Python

Stack Overflow Asked by Commander on December 5, 2021

I am trying to create data bins in Python that produce the following output.

binsize = 5
data = [0.4, 1.7, 10.7, 8.0, 3.2, 6.7, 11.4, 10.4]

Format: (bin_lower_bound - bin_higher_bound), as a tuple: num_frequency
0.4 - 5.4:  3
5.4 - 10.4: 2
10.4 - 15.4: 3

I tried a for loop that uses the lowest value in data as the lower bound of the first bin and then creates a new bin every binsize until the maximum value is reached, but had no luck.
Ideally the result would be a dictionary, and I'm trying to achieve this without NumPy.

bins: {
0.4 – 5.4: 3
5.4 – 10.4: 2
10.4 – 15.4: 3
}

Any help would be appreciated.

2 Answers

The approach below should be quite efficient and doesn't use any imports (as requested). Note that the bins are created on the fly from the data, so a bin with no contents will not show up in the result. If you would rather see a 0 for an empty bin, make a quick pass from the min to the max first and seed every bin with zero.

binsize = 5
data = [0.4, 1.7, 10.7, 8.0, 3.2, 6.7, 11.4, 10.4]
min_val = min(data)  # needed to anchor the first bin
bins = {}
for value in data:
    bin_num = int((value - min_val) // binsize) # integer division to find bin
    bins[bin_num] = bins.get(bin_num, 0) + 1

# pretty up the labels...optional
bins2 = { (round(k*binsize+min_val,1), round((k+1)*binsize+min_val,1)) : 
            bins[k] for k in bins }

# or with string-based labels
bins3 = { f'{round(k*binsize+min_val,1)} - {round((k+1)*binsize+min_val,1)}' : 
            bins[k] for k in bins}    

print(bins2)
# {(0.4, 5.4): 3, (10.4, 15.4): 3, (5.4, 10.4): 2}
print(bins3)
# {'0.4 - 5.4': 3, '10.4 - 15.4': 3, '5.4 - 10.4': 2}
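
The zero-seeding mentioned above could look roughly like the sketch below. The function name count_bins and the sample data with a gap are made up for illustration and are not part of the original answer; the binning rule is the same floor division anchored at the minimum value.

```python
def count_bins(data, binsize):
    """Count values per bin, including empty bins (hypothetical helper)."""
    min_val = min(data)
    # index of the last bin that actually holds a value
    last_bin = int((max(data) - min_val) // binsize)
    # seed every bin index with zero so gaps show up in the result
    counts = {k: 0 for k in range(last_bin + 1)}
    for value in data:
        counts[int((value - min_val) // binsize)] += 1
    # label each bin with a (lower, upper) tuple
    return {
        (round(k * binsize + min_val, 1), round((k + 1) * binsize + min_val, 1)): n
        for k, n in counts.items()
    }

# illustrative data with nothing between 5.4 and 10.4
print(count_bins([0.4, 1.7, 3.2, 11.4, 12.0], 5))
# {(0.4, 5.4): 3, (5.4, 10.4): 0, (10.4, 15.4): 2}
```

Because the bin index comes from floating-point arithmetic, a value lying extremely close to a bin edge may land on either side of it; for the data in the question this does not come up.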

Answered by AirSquid on December 5, 2021

This version builds the bin labels first, then walks the sorted data and collects the values that fall into each bin.

from collections import defaultdict

data = [0.4, 1.7, 10.7, 8.0, 3.2, 6.7, 11.4, 10.4]
data.sort()  # the loop below relies on ascending order

binsize = 5
minval = min(data)
maxval = max(data)

def create_bins(minval, maxval):
    # build "lower - upper" labels until the last bin covers maxval
    bins = []
    while minval <= maxval:
        bins.append(f"{minval} - {minval + binsize}")
        minval += binsize
    return bins

bins = create_bins(minval, maxval)

bins_with_values = defaultdict(list)

i = 0
for val in data:
    # advance to the bin whose upper bound exceeds val (may skip empty bins)
    while val >= float(bins[i].split()[2]):
        i += 1
    bins_with_values[bins[i]].append(val)

print(bins_with_values)

Output:

defaultdict(<class 'list'>, {'0.4 - 5.4': [0.4, 1.7, 3.2], '5.4 - 10.4': [6.7, 8.0], '10.4 - 15.4': [10.4, 10.7, 11.4]})
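
Since the question asks for a frequency per bin rather than the values themselves, the lists can be reduced to counts afterwards. A minimal sketch, using a plain dict copied from the output above so it runs on its own (the name frequencies is just illustrative):

```python
# values per bin, as printed by the code above
bins_with_values = {
    '0.4 - 5.4': [0.4, 1.7, 3.2],
    '5.4 - 10.4': [6.7, 8.0],
    '10.4 - 15.4': [10.4, 10.7, 11.4],
}

# replace each list of values with its length to get a count per bin
frequencies = {label: len(values) for label, values in bins_with_values.items()}

print(frequencies)
# {'0.4 - 5.4': 3, '5.4 - 10.4': 2, '10.4 - 15.4': 3}
```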

Answered by Kakarot_7 on December 5, 2021
