Multivariate linear regression

Mathematica Asked on September 2, 2021

Say I have a 4-dimensional data set with two independent variables x1, x2, and two dependent variables y1, y2, i.e. each row is {x1, x2, y1, y2}:

data = RandomReal[{0, 1}, {100, 4}];

How do I feed it to LinearModelFit to fit

$$ \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} $$

?

3 Answers

LinearModelFit doesn't do multivariate regression as far as I'm aware. You can use my repository function BayesianLinearRegression instead. The first example in the "Scope" section shows you how. You can provide the data in the format

data[[All, {1, 2}]] -> data[[All, {3, 4}]]

or

#[[{1, 2}]] -> #[[{3, 4}]]& /@ data

For example:

fitData = ResourceFunction["BayesianLinearRegression"][
  data[[All, {1, 2}]] -> data[[All, {3, 4}]],
  {1, x1, x2}, (* basis functions *)
  {x1, x2} (* independent variables *)
];

You can find the best-fit expression with:

Mean[fitData["Posterior", "PredictiveDistribution"]]

The output should provide you with all the details about the uncertainties of the predictions and the regression coefficients. My blog post has more background information, if you need it.
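For instance, to look at the fitted coefficients themselves, you can query the posterior directly (the "RegressionCoefficientDistribution" key is taken from the function's documentation, so check it against your version):

coefDist = fitData["Posterior", "RegressionCoefficientDistribution"];
(* key name assumed from the BayesianLinearRegression documentation *)
Mean[coefDist] (* point estimates of the a_ij and c_i *)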

Correct answer by Sjoerd Smit on September 2, 2021

You can build the regression yourself with NMinimize by minimizing the total of the squared residuals, i.e. the squared distances between the predicted and observed points.

First let's build some synthetic noisy data:

(* create some noisy data that follows a linear model *)
n = 1000;
datax = RandomReal[{-1, 1}, {n, 2}];
testmtx = {{3, 4}, {1/2, 1/6}};
testoffset = {3/2, 5/7};
fn[{x1_, x2_}] := testmtx.{x1, x2} + testoffset
noise = RandomVariate[NormalDistribution[0, 1/10], {n, 2}];
datay = (fn /@ datax) + noise;

(* this is the noisy 4d data *)
data = MapThread[Join, {datax, datay}];

ListPlot[{datax, datay}, PlotRange -> {{-4, 4}, {-4, 4}}, 
 AspectRatio -> 1, PlotStyle -> PointSize[Small]]

The ideal fit is:

$$ \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} 3 & 4 \\ 1/2 & 1/6 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} 3/2 \\ 5/7 \end{pmatrix} $$

... but pretend we don't know that and only work with the data from this point on. The ListPlot above shows the $x_1,x_2$ values (blue) against the noisy $y_1,y_2$ values (orange).

Then construct a residual function and an objective which is to minimize the total residuals:

matrix = {{a1, a2}, {a3, a4}};
offset = {c1, c2};
sqresidual[{x1_, x2_, y1_, y2_}, mtx_, c_] := 
 SquaredEuclideanDistance[c + mtx.{x1, x2}, {y1, y2}]
objective = Total[sqresidual[#, matrix, offset] & /@ data];

... and finally use NMinimize:

NMinimize[objective, {a1, a2, a3, a4, c1, c2}]
(* result: {19.8142, {a1 -> 2.99722, a2 -> 4.00609, a3 -> 0.498218, 
  a4 -> 0.165467, c1 -> 1.49577, c2 -> 0.7118}} *)

The result is pretty close to ideal!
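To see how close, you can capture the solution rules and substitute them back into the symbolic parameters, comparing against the testmtx and testoffset defined above:

(* substitute the fitted rules back into the symbolic matrix and offset *)
{min, sol} = NMinimize[objective, {a1, a2, a3, a4, c1, c2}];
matrix /. sol (* compare with testmtx = {{3, 4}, {1/2, 1/6}} *)
offset /. sol (* compare with testoffset = {3/2, 5/7} *)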

Answered by flinty on September 2, 2021

For simple applications of regression, there is no need to fit multiple dependent variables simultaneously. The results are the same as a regression of each dependent variable separately. There are caveats if you are doing further analysis, but if you just want the basic regression results, you can do separate fits.
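For instance, with the data layout from the question, two separate LinearModelFit calls (one per dependent variable) recover the same coefficients as a joint fit would; a minimal sketch:

(* fit y1 and y2 independently; each call gets rows of the form {x1, x2, y} *)
fit1 = LinearModelFit[data[[All, {1, 2, 3}]], {x1, x2}, {x1, x2}];
fit2 = LinearModelFit[data[[All, {1, 2, 4}]], {x1, x2}, {x1, x2}];
(* each gives {c_i, a_i1, a_i2}: the intercept and one row of the matrix *)
{fit1["BestFitParameters"], fit2["BestFitParameters"]}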

Answered by nanoman on September 2, 2021
