TransWikia.com

Can you create a data frame of a dictionary?

Stack Overflow Asked by Greg Sullivan on December 16, 2021

I am trying to understand a block of code. Is it possible to have a data frame of a dictionary?

def plot_dists(num_samples, mu=0, sigma=1):

  norm_samples = numpy.random.normal(
      loc=mu, scale=sigma, size=num_samples)
  poisson_samples = numpy.random.poisson(
      lam=sigma**2, size=num_samples)  
  
  dists = pandas.DataFrame({
      'norm': norm_samples,
      'poisson': poisson_samples,
  })

  min_x = dists.min().min()
  max_x = dists.max().max()
  bw = (max_x - min_x) / 60
  pyplot.hist(dists.norm, width=bw, bins=60,
              label='N(%.1f, %.1f)' % (mu, sigma), alpha=.5, normed=True)
  pyplot.hist(dists.poisson, width=bw, bins=60,
              label='Poisson(%.1f)' % sigma, alpha=.5, normed=True)
  pyplot.legend()
  
plot_dists(100000)

The following block is throwing me off:

  dists = pandas.DataFrame({
      'norm': norm_samples,
      'poisson': poisson_samples,
  })

Is this a data frame of a dictionary? Everything I am reading online tells me how to convert a dictionary to a data frame or a data frame to a dictionary. I am not sure whether this is a data frame with a dictionary in it, or how that works. If anyone can help me understand the code a little better, it would be much appreciated. Thanks in advance.

One Answer

A pandas DataFrame is a way to represent tabular data. According to the documentation, the first parameter of the class constructor (data) accepts an ndarray, Iterable, dict, or DataFrame.

[https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html]
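As a rough illustration of those input types, here is a minimal sketch that builds the same small table from a dict, from a 2-D ndarray, and from another DataFrame. The column names 'a' and 'b' and the numbers are made up for the example and are not from your code.

```python
import numpy
import pandas

# Made-up values: three rows, two columns.
values = numpy.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

# From a dict: keys become column names, values become the column data.
from_dict = pandas.DataFrame({'a': values[:, 0], 'b': values[:, 1]})

# From a 2-D ndarray: column names are supplied separately.
from_array = pandas.DataFrame(values, columns=['a', 'b'])

# From an existing DataFrame: gives a DataFrame with the same contents.
from_frame = pandas.DataFrame(from_dict)

for frame in (from_dict, from_array, from_frame):
    print(list(frame.columns), frame.shape)
```

In all three cases the result is an ordinary DataFrame; the dict is only one of several ways to hand the data to the constructor.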

So to create a data frame you can pass a dictionary as the data parameter. For this example the result will look like this (first row only):

|   | norm  | poisson |
|---|-------|---------|
| 0 | 0.455 | 2       |

Notice that the dictionary keys (norm and poisson) become the column names, and each key's array becomes that column's values.
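To check this yourself, here is a small sketch using five samples instead of 100000. The names rng and samples and the seed value are my own choices for the example, and I am assuming a NumPy version recent enough to have numpy.random.default_rng.

```python
import numpy
import pandas

# Seed picked arbitrarily so the run is repeatable (an assumption, not from the question).
rng = numpy.random.default_rng(seed=0)

# A plain dict mapping a name to an array of samples.
samples = {
    'norm': rng.normal(loc=0, scale=1, size=5),
    'poisson': rng.poisson(lam=1, size=5),
}

# The dict is only the input; the constructor returns a DataFrame.
dists = pandas.DataFrame(samples)

print(type(samples).__name__)  # type of the input
print(type(dists).__name__)    # type of the result
print(list(dists.columns))     # column names taken from the dict keys
print(dists.shape)             # (rows, columns)

# dists['norm'] and dists.norm are two ways to read the same column.
print(dists['norm'].equals(dists.norm))
```

So the result is not a data frame with a dictionary inside it. The dictionary is just a convenient way to name each column while handing the data over.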

I reproduced your code using Google Colab:


import matplotlib.pyplot as pyplot
import numpy
import pandas


def plot_dists(num_samples, mu=0, sigma=1):
    norm_samples = numpy.random.normal(
        loc=mu, scale=sigma, size=num_samples)
    poisson_samples = numpy.random.poisson(
        lam=sigma**2, size=num_samples)

    # Build a plain dict first: each key will become a column name.
    dist = {'norm': norm_samples,
            'poisson': poisson_samples}

    # Passing the dict to the constructor produces the DataFrame.
    dists = pandas.DataFrame(dist)

    # Overall minimum and maximum across both columns.
    min_x = dists.min().min()
    max_x = dists.max().max()
    bw = (max_x - min_x) / 60

    # normed is deprecated, I think; use density instead.
    pyplot.hist(dists.norm, width=bw, bins=60,
                label='N(%.1f, %.1f)' % (mu, sigma), alpha=.5, density=True)

    pyplot.hist(dists.poisson, width=bw, bins=60,
                label='Poisson(%.1f)' % sigma, alpha=.5, density=True)
    pyplot.legend()

    # Return the DataFrame for debugging and visualization.
    return dists


dists = plot_dists(100000)
dists.tail()

Answered by Adel Bennaceur on December 16, 2021
