Cross Validated Asked by Sarmes on January 5, 2022

I have a data set generated by expensive computational model evaluations: 10000 samples in 40 dimensions in total. It is composed of several subsets, originating partly from random runs, Latin hypercube DOE, radial design DOE, and linear parameter studies; a large part is history data generated by several optimization runs using genetic algorithms.

My thought was that a large part of the function evaluations generated during the genetic algorithm runs could somehow be added to the set of random and Latin hypercube samples, in order to have a larger sample set for a variance-based sensitivity analysis.

I came up with two ideas, but I am an engineer, not a mathematician:

1) Using the covariance matrix of the total sample matrix, filter out samples until the off-diagonal terms are smaller than some threshold, to avoid correlations.

2) The other idea was to apply some sort of minimum-distance filter to avoid areas with tightly clustered samples.
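To make the two ideas concrete, here is a minimal Python sketch of what I have in mind. The synthetic data, the helper names `max_offdiag_corr` and `min_distance_filter`, and the threshold `d_min=0.2` are all made up for illustration; the inputs are assumed to be scaled to a common range so that Euclidean distances are meaningful.

```python
import numpy as np

def max_offdiag_corr(X):
    """Largest absolute off-diagonal entry of the sample correlation matrix."""
    C = np.corrcoef(X, rowvar=False)
    np.fill_diagonal(C, 0.0)
    return float(np.abs(C).max())

def min_distance_filter(X, d_min):
    """Greedy thinning: keep a sample only if it is at least d_min away
    (Euclidean) from every sample kept so far. Returns kept row indices."""
    kept = [0]
    for i in range(1, len(X)):
        dists = np.linalg.norm(X[kept] - X[i], axis=1)
        if dists.min() >= d_min:
            kept.append(i)
    return np.array(kept)

rng = np.random.default_rng(0)
# Hypothetical stand-in data: a uniform background plus a tight cluster,
# mimicking optimizer history concentrated near an optimum.
background = rng.uniform(0.0, 1.0, size=(300, 5))
cluster = 0.5 + 0.01 * rng.standard_normal((300, 5))
X = np.vstack([background, cluster])

idx = min_distance_filter(X, d_min=0.2)
X_thin = X[idx]
print(len(X), "->", len(X_thin), "samples kept")
print("max |corr| before:", max_offdiag_corr(X))
print("max |corr| after: ", max_offdiag_corr(X_thin))
```

The greedy filter depends on the order of the rows, and its cost grows roughly quadratically with the number of kept samples, so this is only a starting point for 10000 samples in 40 dimensions.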

Would that be sufficient? Are there any tests for randomness that I could use?

The problem is that I don't know the right terminology, so maybe ready-to-run methods exist for such problems, but I don't know how to find them because I don't know their names.

I am thankful for any helpful suggestions.

Have you thought about orthogonalizing the entire data matrix with PCA? You could replace the columns of $\mathbf{X}$ with the uncorrelated principal components (eigenvectors normalized to their $\sqrt{\lambda_m}$).
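As a rough sketch of that step, assuming NumPy and a small synthetic matrix standing in for your data (the variable names and the data are illustrative, not from your problem): center the columns, take the eigendecomposition of the sample covariance, project onto the eigenvectors, and divide each component by $\sqrt{\lambda_m}$.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical correlated data standing in for the sample matrix X.
A = rng.standard_normal((4, 4))
X = rng.standard_normal((500, 4)) @ A

Xc = X - X.mean(axis=0)                 # center each column
cov = np.cov(Xc, rowvar=False)          # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)  # eigendecomposition of a symmetric matrix
scores = Xc @ eigvecs                   # principal component scores (uncorrelated)
Z = scores / np.sqrt(eigvals)           # scale each component to unit variance

# The covariance of Z should be the identity up to floating-point error.
print(np.round(np.cov(Z, rowvar=False), 6))
```

Keep in mind that uncorrelated is not the same as independent, and that each column of `Z` is a linear mixture of the original inputs, so any sensitivity measure computed on `Z` refers to the components rather than to the original parameters.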

It sounds like you don't have any grouping categorical variables among the 40 variables. In that case, the only thing you are left with is measuring the association between variables. Indeed, if you are trying to make linear and non-linear assessments for sensitivity analysis and variance explanation, then break up the data using a "divide and conquer" approach, solving a large problem by solving smaller problems. A mixture of variables generated from DOE, LHS, and genetic algorithms sounds quite complex, but as long as you pose questions one at a time, and then do the associated analysis to answer each one, you can work through your analytic goals.

By the way, there are no variance explanation approaches that allow you to pull out non-linear and linear components using the same model, unless you code what you are doing using non-linear regression and linear regression. There are packages that allow you to fit data based on equations, so maybe look at those (IGOR, EGRET, AMFIT (Poisson), MATLAB, etc.).

Lastly, be careful of the "so what?" question: after you have done all of your model checking, a reader could ask why you did all of this on simulated data.

Answered by user32398 on January 5, 2022
