TransWikia.com

Importing Excel File as Dataset

Mathematica Asked by indiffer on January 23, 2021

This might be a basic question but I couldn’t get it to work.

I have a dataset in Excel that I import to mathematica using

dat = Import["etcetcetc.xlsx", "Dataset", "HeaderLines" -> 1]

However I can’t get it to calculate length or extract slices from the dataset as Mathematica sees the whole dataset as one value and thus length is 1 with Length[dat].

What am I doing wrong?

3 Answers

Like the previous answer mentions, excel imports as a list Datasets corresponding to each sheet. There's a couple approaches you can take. If you have multiple sheets of interest, anything you do will need to be mapped.

data = Import[..];
Length /@ data

To get Rows/Cols count, do something like

Dimensions /@ data

I wouldn't be surprised if upcoming versions had elements to get the number of sheets/dimensions of each sheet in a more efficient way as well....

However, in this case you only have one sheet of interest, so to import that sheet directly as a Dataset rather than list of Datasets, do this:

Import["etcetcetc.xlsx", {"Dataset", 1}, "HeaderLines" -> 1]

And you can work with your flat Dataset how you were expecting originally.

If you had multiple sheets and wanted to combine them into a dataset, the other answer was sort of on the right path, but you'd still need to separate the datasets so there's a clear barrier between sheets, something like:

Dataset@AssociationThread[Import[.., "Sheets"], Import[.., "Dataset", "HeaderLines" -> 1]]

You can easily trim off the totals by changing the "Dataset" to {"Dataset", All, ;; -3}

Correct answer by GenericAccountName on January 23, 2021

On Importing an Excel file, you get a list of tabs, each of which is equivalent to a CSV file in its own right. You can then Map each tab to a Dataset.

toDataset[tabs_] := (
    With[{h = First@#, d = Rest@#},
        Map[Association@*(MapThread[Rule, {h, #}] &)][d]
    ] & /* Dataset
)[tabs];

xlsxFile = (
    Import[#, "XLSX"] & /*
    MapIndexed[(("tab" <> ToString[First[#2]]) -> toDataset[#1]) &] /*
    Association
)["path/to/xlsx/file"];

xlsxFile["tab1"] // Head (* Dataset *)
xlsxFile["tab1"] // Length (* 1234 *)

Answered by Shredderroy on January 23, 2021

Using the Energy Information Agency's (EIA) 860 report that covers all generation assets in the US I may have came up with a simpler example. You can download the same data from EIA 860 Database.

wbLocation = "C:UsersxxxxxxDocumentseia86020193_1_Generator_Y2019.xlsx";

sheetL = Import[wbLocation, "Sheets"]

{"Operable", "Proposed", "Retired and Canceled"}

You can nest the list of sheets within the "Dataset" command. The hiccup comes from needing to convert the initial datasets into a list of associations to be able to put the use the AssociationThread function appropriately. Then you can apply the Dataset function and get the desired results.

Dataset[
  AssociationThread[sheetL, 
    Normal[
      Import[wbLocation, 
         {"Dataset", sheetL},"SkipLines"->1,"HeaderLines" -> 1]]]]

Answered by Andy Krock on January 23, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP