TransWikia.com

How to extract CountryData[] values without units for use in numeric analysis

Mathematica Asked on December 5, 2020

I have the following code, which extracts selected variables and arranges them in a table. I would like to use these data in statistical analysis such as Fit[...], but I cannot, because each value comes back with its unit attached (as a Quantity) instead of as a plain number.

How can I extract the raw numbers without their units and export them as an XLS file for use in linear and nonlinear regression analysis?

countLst={"Argentina", "Australia", "Austria", "Belgium", 
  "Bulgaria","Brazil", "Brunei Darussalam", "Canada",  
  "Switzerland", "Chile", "China", "Colombia", "Costa Rica", 
  "Cyprus", "Czech Republic","Germany", "Denmark", "Spain", 
  "Estonia", "Finland", "France", "United Kingdom", "Greece", 
  "Hong Kong", "Croatia", "Hungary", "Indonesia", "India", 
  "Ireland", "Iceland", "Israel", "Italy", "Japan", "Kazakhstan",
  "Cambodia", "South Korea", "Lithuania", "Luxembourg", "Latvia", 
  "Morocco", "Mexico", "Malta", "Malaysia", "Netherlands", 
  "Norway", "New Zealand", "Peru", "Philippines", "Poland", 
  "Portugal", "Romania", "Russian Federation", "Saudi Arabia", 
  "Singapore", "Slovak Republic", "Slovenia", "Sweden", 
  "Thailand", "Tunisia", "Turkey", "Taiwan", "United States", 
  "Vietnam", "South Africa"
 };

Text[Grid[
  Prepend[{CountryData[#, "Name"],
  CountryData[#,"PopulationGrowth"],
  CountryData[#, "GDP"],
  CountryData[#, "TotalFertilityRate"], 
  CountryData[#, "GrossInvestment"], 
  CountryData[#, "InternetUsers"], 
  CountryData[#, "InventoryChange"], 
  CountryData[#, "MedianAge"], 
  CountryData[#, "TradeValueAdded"], 
  CountryData[#, "UnemploymentFraction"]} & /@ countLst, {"", 
  "pop. growth", "GDP", "fertility", "grossInv", "internet", 
  "inventory", "medianAge", "tradeVA", "unempl."}], Frame -> All, 
  Background -> {None, {LightBlue, {LightYellow}}}]
 ]
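For context, CountryData returns most numeric properties as Quantity objects (a magnitude together with a unit), which is why they cannot be passed directly to Fit. The minimal sketch below uses hand-written Quantity values (illustrative numbers, not real country data) to show how QuantityMagnitude recovers the bare numbers:

```mathematica
(* hand-made Quantity values standing in for CountryData output;
   the numbers are illustrative only *)
growth = Quantity[1.2, "Percent"];
age = Quantity[38.5, "Years"];

(* Head shows these are Quantity expressions, not plain numbers *)
Head[growth]

(* QuantityMagnitude drops the unit and returns the bare number *)
QuantityMagnitude /@ {growth, age}
```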

2 Answers

A streamlined way to construct the desired rectangular table using Outer:

countryList = (SeedRandom[777]; RandomSample[countLst, 20]);

propList = {"Name", "PopulationGrowth", "GDP", "TotalFertilityRate", 
  "GrossInvestment", "MedianAge"}; 

propLabels = {"country", "pop. growth", "GDP", "fertility", "grossInv", "medianAge"};

table = Prepend[propLabels] @ Select[FreeQ[_Missing]] @ ReplaceAll[Quantity -> (# &)] @
     Outer[CountryData, countryList, propList];

Grid @ table

[Image: the resulting grid]
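The two tricks in this answer can be checked offline on made-up values. Outer applies a function to every pair drawn from two lists, giving one row per country, and replacing the head Quantity by the pure function # & keeps only the first argument, i.e. the magnitude. The sketch below uses a hypothetical stand-in, lookup, in place of CountryData; its values are invented for illustration:

```mathematica
ClearAll[lookup];
(* hypothetical offline stand-in for CountryData; all values are made up *)
lookup["A", "PopulationGrowth"] = Quantity[1.1, "Percent"];
lookup["A", "MedianAge"] = Quantity[31.2, "Years"];
lookup["B", "PopulationGrowth"] = Missing["NotAvailable"];
lookup["B", "MedianAge"] = Quantity[40.1, "Years"];

(* Outer builds the rectangular country-by-property array *)
raw = Outer[lookup, {"A", "B"}, {"PopulationGrowth", "MedianAge"}];

(* Quantity[m, u] becomes (# &)[m, u], which evaluates to m;
   rows containing any Missing entry are then dropped *)
clean = Select[FreeQ[_Missing]] @ ReplaceAll[Quantity -> (# &)] @ raw
```

With real data, replacing lookup by CountryData gives the table expression used in the answer.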

You might also consider Dataset:

ds = Dataset @ Select[FreeQ[_Missing]] @ AssociationThread[countryList, 
    Map[AssociationThread[propList,
      Function[p, ReplaceAll[Quantity -> (# &)] @ 
       CountryData[#, p]] /@ propList] &] @ countryList]

[Image: the resulting Dataset]
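A Dataset built this way can be queried by column and turned back into a plain numeric matrix for Fit or Export. A minimal sketch on a made-up nested association of the same shape (country -> property -> value); the country keys and numbers are invented:

```mathematica
(* made-up nested association: country -> property -> value *)
rows = <|
   "A" -> <|"PopulationGrowth" -> 1.1, "MedianAge" -> 31.2|>,
   "B" -> <|"PopulationGrowth" -> Missing["NotAvailable"], "MedianAge" -> 40.1|>,
   "C" -> <|"PopulationGrowth" -> 0.4, "MedianAge" -> 42.0|>|>;

(* keep only countries with no Missing value, then wrap in a Dataset *)
ds = Dataset @ Select[FreeQ[_Missing]] @ rows;

(* query one column, keyed by country *)
ds[All, "MedianAge"]

(* pull out {x, y} pairs as a plain numeric matrix *)
pairs = Values[Values /@ Normal[ds[All, {"PopulationGrowth", "MedianAge"}]]]
```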

Correct answer by kglr on December 5, 2020

Following @C.E.'s suggestion, I retrieved the selected variables without units by wrapping each value in QuantityMagnitude:

Text[Grid[
  Prepend[
   data = {CountryData[#, "Name"],
       QuantityMagnitude@CountryData[#, "PopulationGrowth"],
       QuantityMagnitude@CountryData[#, "GDP"],
       QuantityMagnitude@CountryData[#, "TotalFertilityRate"],
       QuantityMagnitude@CountryData[#, "GrossInvestment"],
       QuantityMagnitude@CountryData[#, "InternetUsers"],
       QuantityMagnitude@CountryData[#, "InventoryChange"],
       QuantityMagnitude@CountryData[#, "MedianAge"],
       QuantityMagnitude@CountryData[#, "TradeValueAdded"],
       QuantityMagnitude@CountryData[#, "UnemploymentFraction"]} & /@ countLst,
   {"", "pop. growth", "GDP", "fertility", "grossInv",
    "internet", "inventory", "medianAge", "tradeVA", "unempl."}],
  Frame -> All, Background -> {None, {LightBlue, {LightYellow}}}]]
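The QuantityMagnitude@CountryData[...] calls differ only in the property name, so each row can also be built by mapping over a property list. A sketch, again with a hypothetical offline stand-in (lookup) in place of CountryData and made-up values:

```mathematica
ClearAll[lookup, row];
props = {"PopulationGrowth", "MedianAge"};

(* hypothetical offline stand-in for CountryData; values are made up *)
lookup["A", "PopulationGrowth"] = Quantity[1.1, "Percent"];
lookup["A", "MedianAge"] = Quantity[31.2, "Years"];
lookup["B", "PopulationGrowth"] = Quantity[0.4, "Percent"];
lookup["B", "MedianAge"] = Quantity[42.0, "Years"];

(* one row per country: the name followed by the unit-free magnitudes *)
row[c_] := Prepend[QuantityMagnitude[lookup[c, #]] & /@ props, c];

data = row /@ {"A", "B"}
```

With real data the row function would read Prepend[QuantityMagnitude[CountryData[c, #]] & /@ props, CountryData[c, "Name"]], and adding a column only means extending props.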

Then I deleted the rows containing Missing values from data:

(*Export "data" to XLS format after deleting "Missing" observations*)

dataClean = Select[data, FreeQ[#, _Missing] &];  (* thanks to @kglr *)
(* prepend the target directory to the file name; text inside a string is not a comment *)
Export["dataClean.xls", dataClean];
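The filtering and export steps can be tried on made-up rows before running them on the real data. This sketch also prepends a header row so the spreadsheet columns are labelled, and reads the file back to check it; the temporary file name is an arbitrary choice:

```mathematica
(* made-up rows in the same layout as data: name, then numeric columns *)
rows = {{"A", 1.1, 31.2}, {"B", Missing["NotAvailable"], 40.1}, {"C", 0.4, 42.0}};

(* drop every row that contains a Missing value anywhere *)
rowsClean = Select[rows, FreeQ[#, _Missing] &];

(* put column labels on top so the spreadsheet is self-describing *)
header = {"country", "pop. growth", "medianAge"};
file = FileNameJoin[{$TemporaryDirectory, "dataCleanSketch.xls"}];
Export[file, Prepend[rowsClean, header]];

(* importing an .xls file returns a list of sheets; take the first *)
sheet = First[Import[file]]
```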

Then I ran two regressions (linear and quadratic) and plotted the original data points together with the fitted curves:

(*Use a subset of "data" to run "Fit[...]" over "ListPlot[...]"*)

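ClearAll[data, line, parabola];
(* keep only countries for which both values are available, since Fit needs numbers *)
data = Select[
   {QuantityMagnitude@CountryData[#, "PopulationGrowth"],
      QuantityMagnitude@CountryData[#, "GDP"]} & /@ countLst,
   FreeQ[#, _Missing] &];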
line = Fit[data, {1, x}, x];
parabola = Fit[data, {1, x, x^2}, x];
Show[
 ListPlot[data, GridLines -> Automatic, ImageSize -> Large, PlotLegends -> "Expressions"], Plot[{line, parabola}, {x, 0, 15}]
 ]
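To see what the two Fit calls return, here is a minimal sketch on made-up {x, y} pairs that lie exactly on y = x^2, plus one pair with a Missing value. The least-squares line through these points is -1 + 3 x, while the quadratic fit recovers x^2:

```mathematica
ClearAll[x];
(* made-up pairs on y = x^2, plus one pair with a Missing value *)
data = {{0., 0.}, {1., 1.}, {2., 4.}, {3., 9.}, {4., Missing["NotAvailable"]}};

(* Fit needs purely numeric pairs, so drop any pair containing Missing *)
dataClean = Select[data, FreeQ[#, _Missing] &];

line = Fit[dataClean, {1, x}, x];           (* least-squares straight line *)
parabola = Fit[dataClean, {1, x, x^2}, x];  (* least-squares quadratic *)

{line, parabola}
```

Show[ListPlot[dataClean], Plot[{line, parabola}, {x, 0, 3}]] then overlays both curves on the points, as in the answer.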

In summary, here is what this code does:

  1. Extract numeric data from CountryData[] without units
  2. Delete observations with Missing values from the extracted data set
  3. Export the clean data set to an external XLS file
  4. Run two types of regression (linear and quadratic) using the clean data
  5. Plot the original data points together with the fitted regression functions

Here is what I produced:

[Image: data points with the fitted line and parabola]

Answered by Tugrul Temel on December 5, 2020
