TransWikia.com

Ordering a materials science dataset (property names, property scalars, formulas)

Data Science Asked by James Arten on January 24, 2021

I’m dealing with a materials science dataset and I’m in the following situation.

I have data organized like this:

Chemical_Formula      Property_name            Property_Scalar

    He                Electrical conduc.          1
    NO_2              Resistance                  50
    CuO3              Hardness
    ...               ...                        ...
    CuO3              Fluorescence                300
    He                Toxicity                    39
    NO2               Hardness                    80
    ...               ...                         ...

As you can see, it is really messy: the same chemical formula appears more than once throughout the dataset, each time paired with a different property. My question is: how can I easily split the dataset into smaller ones, so that every formula is matched with its descriptors in ORDER? I really need help on this… thank you. (I used fictional names and values, just to explain my problem.)

I’m on Jupyter Notebook and I’m using Pandas.

Edit, to make my question clearer:

My goal is to plot some histograms of, for example, number of materials vs. conductivity at different temperatures (100K, 200K, 300K). So I need both the conductivity and the temperature for each material, so that they are clearly comparable. For example, I guess a more convenient layout would be:

Chemical formula     Conductivity      Temperature

      He                 5                  10K
      NO_2               7                  59K
      CuO_3              10                 300K
      ...                ...                ...
      He                 14                 100K
      NO_2               5                  70K
      ...                ...                ...

One Answer

Given that your DataFrame is:

import pandas as pd

df2 = pd.DataFrame({
    "Chemical_Formula":["He", "NO_2", "CuO3", "CuO3", "He", "NO2"],
    "Property_name":["Electrical conduc.", "Resistance", "Hardness", "Fluorescence", "Toxicity", "Hardness"],
    "Property_Scalar":[1, 50, 10, 300, 39, 80]
})

  Chemical_Formula       Property_name  Property_Scalar
0               He  Electrical conduc.                1
1             NO_2          Resistance               50
2             CuO3            Hardness               10
3             CuO3        Fluorescence              300
4               He            Toxicity               39
5              NO2            Hardness               80

You can use pivot to "unmelt" it into a wide format:

df3 = df2.pivot(index="Chemical_Formula", columns="Property_name")

(all columns sit under the top-level label 'Property_Scalar')

Chemical_Formula  Electrical conduc.  Fluorescence  Hardness  Resistance  Toxicity
CuO3                             nan           300        10         nan       nan
He                                 1           nan       nan         nan        39
NO2                              nan           nan        80         nan       nan
NO_2                             nan           nan       nan          50       nan
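
As a small variation on the call above, passing values="Property_Scalar" should give flat column names instead of the ('Property_Scalar', ...) tuples. The sketch below also shows pivot_table, on the assumption that the real dataset may contain the same formula/property pair more than once (for example, one measurement per temperature), in which case pivot raises an error. The names wide and wide_agg and the choice of aggfunc="mean" are only illustrative.

```python
import pandas as pd

df2 = pd.DataFrame({
    "Chemical_Formula": ["He", "NO_2", "CuO3", "CuO3", "He", "NO2"],
    "Property_name": ["Electrical conduc.", "Resistance", "Hardness",
                      "Fluorescence", "Toxicity", "Hardness"],
    "Property_Scalar": [1, 50, 10, 300, 39, 80],
})

# Naming the value column keeps the resulting column labels flat
# (plain property names rather than (Property_Scalar, name) tuples).
wide = df2.pivot(index="Chemical_Formula", columns="Property_name",
                 values="Property_Scalar")

# pivot() raises ValueError if a (formula, property) pair occurs twice;
# pivot_table() aggregates duplicates instead. Taking the mean is an
# arbitrary choice made for this sketch.
wide_agg = df2.pivot_table(index="Chemical_Formula", columns="Property_name",
                           values="Property_Scalar", aggfunc="mean")

print(wide)
```

Note that "NO_2" and "NO2" stay separate rows because the strings differ; if they denote the same material, the formulas would need to be normalized before pivoting.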

From there you can drop the columns you don't need and plot the rest.
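
A minimal sketch of that last step, using the toy data from above: it keeps a single property column, drops the materials that have no value for it, and draws a histogram of the number of materials per bin. "Hardness" and bins=5 are arbitrary picks for illustration, and the plot relies on matplotlib being installed.

```python
import pandas as pd

df2 = pd.DataFrame({
    "Chemical_Formula": ["He", "NO_2", "CuO3", "CuO3", "He", "NO2"],
    "Property_name": ["Electrical conduc.", "Resistance", "Hardness",
                      "Fluorescence", "Toxicity", "Hardness"],
    "Property_Scalar": [1, 50, 10, 300, 39, 80],
})

wide = df2.pivot(index="Chemical_Formula", columns="Property_name",
                 values="Property_Scalar")

# Keep only the property of interest and drop materials that lack it.
hardness = wide["Hardness"].dropna()

# Histogram of materials per hardness bin; pandas delegates to matplotlib.
# In a Jupyter notebook the figure is rendered automatically.
ax = hardness.plot.hist(bins=5)
ax.set_xlabel("Hardness")
ax.set_ylabel("Number of materials")
```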

Answered by lytseeker on January 24, 2021
