TransWikia.com

One by one making a histogram out of each column of a dataframe

Stack Overflow Asked by dnv89 on November 15, 2021

I have a table that looks like this. (In reality, there are over a million rows and 70-something columns)

ind1 ind2 ind3 ... indn
0.1 0.2 0.3 0.4 0.5
1.0 0.9 0.8 0.7 0.6
1.0 1.0 1.0 1.0 1.0
0.9 0.9 0.9 0.9 0.9

I want an automated procedure to create a histogram for every column in the table. This is the code I have, but it doesn’t work:

for (i in 1:10){
  plN <- ggplot(cdf, aes(x=colnames(cdf)[i])) + geom_histogram(binwidth = 0.01)
  plot(plN)
}

How do I extract the column name for the ith column of the dataframe, so that I can use it in place of x=…?

(Answered in Duck’s comment below. Thanks for the help!)

2 Answers

I also like writing single functions and then using a loop to call the function.

I'll create your dataset with some random numbers.

library(tidyverse)
set.seed(123)
cdf <- data.frame(
  A = rnorm(1000, -3, 1),
  B = rnorm(1000, 1, 1),
  C = rnorm(1000, 5, 10),
  D = rnorm(1000, -3, 2)
)

Then write a function to graph a given column of a given dataset. I also include an argument that controls whether to print the plot.

# Build a histogram for one column of mydata, selected by position
myhist_function <- function(mydata, mycolumn, printit = FALSE) {
  mycolname <- colnames(mydata)[mycolumn]
  # .data[[name]] looks the column up by its name inside aes()
  plN <- ggplot(mydata, aes(x = .data[[mycolname]])) +
    geom_histogram(binwidth = 0.1) +
    xlab(mycolname)
  if (printit) {
    print(plN)
  }
  return(plN)
}

Calling the function once looks like this:

myhist_function(mydata = cdf, mycolumn = 2, printit = T)

[Figure: histogram produced by the function call above]

Then I can create an empty list, loop over whatever columns I want for any given dataset, and populate the list with ggplot2 graph objects.

mygraphs <- list()

columns_toplot <- names(cdf)
for (i in seq_along(columns_toplot)) {
  mygraphs[[i]] <-
    myhist_function(mydata = cdf, mycolumn = i, printit = F)
}
names(mygraphs) <- columns_toplot

You can view any single graph with mygraphs[[i]], or use the ggarrange() function from the ggpubr package to lay several of them out in a grid:

library(ggpubr)
ggarrange(mygraphs[[1]],
          mygraphs[[2]],
          mygraphs[[3]],
          mygraphs[[4]],
          ncol = 2,
          nrow = 2)

[Figure: ggarrange example, the four histograms arranged in a 2 x 2 grid]
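With 70-something columns, listing every element by hand does not scale. As a minimal sketch, the whole list can be handed to ggarrange() through its plotlist argument. The data frame is a made-up stand-in for the one above, and the name arranged is only illustrative.

```r
library(ggplot2)
library(ggpubr)

# Made-up stand-in for the data frame used above
set.seed(123)
cdf <- data.frame(
  A = rnorm(1000, -3, 1),
  B = rnorm(1000, 1, 1),
  C = rnorm(1000, 5, 10),
  D = rnorm(1000, -3, 2)
)

# One histogram per column, collected in a named list
mygraphs <- lapply(names(cdf), function(nm) {
  ggplot(cdf, aes(x = .data[[nm]])) +
    geom_histogram(binwidth = 0.1) +
    xlab(nm)
})
names(mygraphs) <- names(cdf)

# Pass the whole list instead of naming each element
arranged <- ggarrange(plotlist = mygraphs, ncol = 2, nrow = 2)
arranged
```

If the list holds more plots than fit in the ncol x nrow grid, ggarrange() should, as far as I understand the ggpubr documentation, return a list with one arranged page per element rather than a single plot.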

Answered by akaDrHouse on November 15, 2021

We can store the output in a list:

library(ggplot2)

plN <- vector('list', ncol(cdf))
for (i in seq_along(cdf)) {
  # !! rlang::sym() turns the column name (a string) into a column reference
  plN[[i]] <- ggplot(cdf, aes(x = !!rlang::sym(names(cdf)[i]))) +
    geom_histogram(binwidth = 0.01)
}

plN[[1]]
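The attempt in the question, aes(x = colnames(cdf)[i]), maps the column name as a constant string instead of the column's values, which is why it fails. As an alternative to !! rlang::sym(), here is a minimal sketch that uses the .data pronoun, which ggplot2 supports inside aes(). The data frame is a made-up stand-in with a few uniform columns, and plots is a hypothetical name.

```r
library(ggplot2)

# Made-up stand-in for the real table (values between 0 and 1)
set.seed(123)
cdf <- data.frame(ind1 = runif(200), ind2 = runif(200), ind3 = runif(200))

# Build one histogram per column; .data[[nm]] looks the column up by name
plots <- lapply(names(cdf), function(nm) {
  ggplot(cdf, aes(x = .data[[nm]])) +
    geom_histogram(binwidth = 0.01) +
    xlab(nm)  # label the axis with the column name
})
names(plots) <- names(cdf)

# Retrieve a plot by column name
plots[["ind2"]]
```

Naming the list elements after the columns means a plot can be retrieved by name as well as by position.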

Answered by akrun on November 15, 2021
