# Quantifying the universal approximation theorem

Cross Validated Asked by ofow on January 3, 2022

Let $$m \geq 1$$ be an integer and $$F \in \mathbb{R}[x_1, \dots, x_m]$$ be a polynomial. I want to approximate $$F$$ on the unit hypercube $$[0, 1]^m$$ by a (possibly multilayer) feedforward neural network. The activation function is $$\tanh$$ for all the connections.

Let $$\varepsilon > 0$$ be a real number. If I want the approximation to deviate from $$F$$ by less than $$\varepsilon$$ in the $$L^2$$ norm on $$[0, 1]^m$$, what is the smallest possible number of non-zero weights?

It is kind of stupid to approximate a function that is known to be polynomial by a neural network, but I just wanted to get more quantitative insight into the universal approximation theorem (and polynomials seem to be the most accessible class of functions).
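
To make the two quantities in the question concrete, here is a minimal Python sketch that counts the non-zero parameters of a small $$\tanh$$ network and estimates its $$L^2$$ error on $$[0, 1]^m$$ by Monte Carlo. Everything in it is an assumption chosen for illustration: the target $$F(x) = x_1^2$$ with $$m = 1$$, the helper names (`tanh_network`, `count_nonzero`, `l2_error`, `make_square_net`), the choice to count biases as weights, and the network itself, which comes from a second-difference Taylor expansion of $$\tanh$$ around a point `b0`. It does not claim to be the minimal network; it only shows one network whose error can be measured.

```python
import math
import random


def tanh_network(x, W, b, v, c):
    """One-hidden-layer network: c + sum_j v[j] * tanh(W[j] . x + b[j])."""
    return c + sum(
        vj * math.tanh(sum(wji * xi for wji, xi in zip(wj, x)) + bj)
        for wj, bj, vj in zip(W, b, v)
    )


def count_nonzero(W, b, v, c):
    """Number of non-zero parameters (biases are counted as weights here)."""
    flat = [w for row in W for w in row] + list(b) + list(v) + [c]
    return sum(1 for t in flat if t != 0.0)


def l2_error(F, net, m, n_samples=20000, seed=0):
    """Monte Carlo estimate of the L^2 norm of F - net on [0, 1]^m."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = [rng.random() for _ in range(m)]
        total += (F(x) - net(x)) ** 2
    return math.sqrt(total / n_samples)


def make_square_net(h, b0=0.5):
    """Two hidden neurons approximating x^2 via a second difference.

    Taylor expansion gives
        tanh(b0 + h x) + tanh(b0 - h x) - 2 tanh(b0) = h^2 x^2 tanh''(b0) + O(h^4),
    so dividing by h^2 tanh''(b0) leaves x^2 plus an O(h^2) error.
    """
    t = math.tanh(b0)
    d2 = -2.0 * t * (1.0 - t * t)   # second derivative of tanh at b0
    a = 1.0 / (h * h * d2)          # output weight, grows like 1 / h^2
    W = [[h], [-h]]
    b = [b0, b0]
    v = [a, a]
    c = -2.0 * a * t
    return W, b, v, c


def F(x):
    # Illustrative target polynomial, m = 1.
    return x[0] ** 2


for h in (0.2, 0.1, 0.05):
    W, b, v, c = make_square_net(h)
    err = l2_error(F, lambda x: tanh_network(x, W, b, v, c), m=1)
    print(f"h={h}: non-zero parameters={count_nonzero(W, b, v, c)}, L2 error ~ {err:.2e}")
```

In this sketch the number of non-zero parameters stays the same as `h` shrinks while the estimated error decreases, at the price of output weights that grow like $$1/h^2$$. So whether the magnitude of the weights is bounded seems to matter for how the count should depend on $$\varepsilon$$.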
