TransWikia.com

Error in random forest analysis

Bioinformatics Asked on March 7, 2021

I am struggling to run a random forest analysis and would be thankful for any help with the code.

I have root, soil, and leaf samples from two regions (bau and mau), collected in two seasons (Wet and Dry).

I would like to run a random forest analysis at the genus or family level to identify the taxa that contribute to the differences between samples, for example among root samples by region as well as by season.

Here is my code, but I am getting an error.

    library(randomForest)
    library(knitr)
    
    #### RANDOM FOREST ANALYSIS #####
     #### Prepare data ####
     #Load OTU table
     OTU_table=t(read.table("asv.table.txt", row.names=1,sep="\t", header=T, blank.lines.skip=F, check.names=F))
     table(apply(OTU_table,1,sum)) #verify rarefaction
    
     #Load metadata
     Meta=read.table("metadata.txt", header=T, row.names=1, stringsAsFactors=F, na.strings="NA",check.names=FALSE)
     Meta$sampleid=rownames(Meta)
    
     #Load taxonomy
     taxo=read.table("taxonomy.txt", row.names=1, sep="\t", header=F ,stringsAsFactors=F,quote="")
     rownames(taxo)=paste("a.",row.names(taxo),sep="")
    
    
     #### Run models ####
     #1. Root only
     # 1.1. both region
     # 1.2. bau only
     # 1.3. mau only
     #2. Soil only
    # 2.1. both region
     # 2.2. bau only
     # 2.3. mau only
     #Params RF
     NTREE=1000 # Number of Trees
     NbVar=1000 # Number of variables tested at each split
    
     #### Root ONLY 1-3 ####
    
     # 1. BOTH Region
    
     #Subset of data
     RootSamples=as.character(Meta[Meta$Compartment=="Root","sampleid"])
     Root_OTU_table=OTU_table[RootSamples,]
    
     #Model with microbiome based on Season, region
     whole_root_pred=data.frame(Season=Meta[RootSamples,"Season"],Region=Meta[RootSamples,"Region"],a=Root_OTU_table)
    
     head(whole_root_pred)
     #      Season Region a.d2ec9f3b77975c0f457e4b7413b217ff
     #      a.3147790f0d5a78316fb9dd64f53b9473 a.97aecc1f35cc1f50db507ad71dd22367
     #      a.bfad6370d28182cc6304844e9bec7fb6 a.5fa2a987221a1d9ca416148570c18086


     RF_model_Root_all=randomForest(y=?,sampsize=c(143,143),strata=?,x=whole_root_pred,importance = T,proximity = T,ntree = NTREE,mtry = NbVar)
    
     print(RF_model_Root_all)
     #plot summary using the 5% most important OTUs ERROR ON LAST LINE
     imp=data.frame(importance(RF_model_Root_all))
     imp$genus=as.character(taxo[rownames(imp),"Genus"])
     Best=imp[imp$MeanDecreaseAccuracy>quantile(x = imp$MeanDecreaseAccuracy,.95),]
     bymedian <- with(Best, reorder(genus, -MeanDecreaseAccuracy, median))
    
     pdf(width = 20,height = 10,file=paste(pathforplots,"Variable_Importance_Root_BothRegion_raref.pdf",sep=""))
     par(mar=c(15,5,1,1))
     boxplot(Best$MeanDecreaseAccuracy ~ bymedian, data = Best,
     xlab = "", ylab = "Variable Importance",
     main = paste("Root in Both Countries; Error Rate=",round(RF_model_Root_all$err.rate[NTREE,"OOB"],3),sep=""), varwidth = TRUE,
     col = "lightgray",las=2)
    
    
     dev.off()

Many thanks

One Answer

This question will be a little hard to answer without more information.

For example, we would need to see your dataset (whole_root_pred) to work out why Stunting_Root is NULL.
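A quick way to check whether a name such as Stunting_Root actually exists, either as a data frame column or as a standalone variable, is sketched below. The small data frame `df` is a made-up stand-in for whole_root_pred, not the asker's data.

```r
# Hypothetical stand-in for whole_root_pred (illustration only)
df <- data.frame(Season = c("Wet", "Dry"),
                 Region = c("bau", "mau"))

# Accessing a column that does not exist with `$` silently returns NULL
missing_col <- df$Stunting_Root
is.null(missing_col)

# Safer checks: is the name a column of the data frame?
has_column <- "Stunting_Root" %in% names(df)
has_column

# Is there a standalone variable with that name in the workspace?
exists("Stunting_Root")

# str() gives a compact overview that is easy to paste into a question
str(df)
```

Posting the output of `str()` for the real table would make it much easier to see what randomForest is being given.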

  1. You might need to define Stunting_Root as a variable. It is currently not clear whether it is, for example, a column of your data frame or simply undefined. A column that does not exist comes back as NULL when you access it with `$`, which would explain your problem. randomForest might also not know to look for strata inside your data frame. Is it in your data frame?
  2. Also, I might be missing something, but why are you passing ? as a response? As far as I know, ? is R's help operator rather than a value, so y=? will not parse.
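To make both points concrete, here is a minimal sketch of how `y`, `strata`, and `sampsize` could be filled in for a classification by season. It runs on simulated counts, so the names `otu`, `meta`, `season`, and `n_per_class` are assumptions made for the illustration and are not taken from the question. It also assumes the goal is to classify samples by season, and it passes only the taxon counts as `x` so that the response is not also used as a predictor.

```r
library(randomForest)

set.seed(42)  # make the simulated example reproducible

# Simulated stand-ins for the real tables (hypothetical data)
n_samples <- 40
n_taxa <- 25
otu <- matrix(rpois(n_samples * n_taxa, lambda = 20), nrow = n_samples,
              dimnames = list(paste0("s", seq_len(n_samples)),
                              paste0("taxon", seq_len(n_taxa))))
meta <- data.frame(Season = rep(c("Wet", "Dry"), times = c(25, 15)),
                   row.names = rownames(otu))

# Make the first taxon more abundant in wet-season samples so there is signal
wet <- meta$Season == "Wet"
otu[wet, "taxon1"] <- otu[wet, "taxon1"] + 30L

# The response must be a factor for randomForest to run a classification
season <- factor(meta$Season)

# Draw the same number of samples from each class for every tree;
# no entry of sampsize may exceed the size of its class
n_per_class <- min(table(season))

rf <- randomForest(x = otu, y = season,
                   strata = season,
                   sampsize = rep(n_per_class, nlevels(season)),
                   ntree = 500, importance = TRUE, proximity = TRUE)

print(rf)

# Taxa ranked by permutation importance
imp <- importance(rf)
head(imp[order(imp[, "MeanDecreaseAccuracy"], decreasing = TRUE), ], 5)
```

With the real data, `season` would presumably be `factor(Meta[RootSamples, "Season"])` and `otu` would be `Root_OTU_table`. A separate model with Region as the response would cover the regional comparison.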

Answered by Maximilian Press on March 7, 2021
