How can I randomly sample the space of consistent neural networks for given data?

Data Science Asked by Jack M on December 2, 2020

Suppose I have a dataset $$X$$ and target labels $$Y$$. For a fixed neural network architecture, how can I randomly and uniformly sample from the space of all possible assignments of weights such that the neural network maps $$X$$ to $$Y$$?

It's probably hard to get exactly a uniform distribution on the weights. One heuristic approximation is to repeat the following many times:

Randomly choose initial weights. Train the neural network until you get 100% accuracy on the training set. Save the resulting neural network.

Each neural network is a sample of weights that are consistent with the training set. Are they uniformly distributed among the set of all such weights? That seems unlikely. But they might give an approximation to such a sample.
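The heuristic loop can be sketched in plain Python. To keep the sketch self-contained and guaranteed to terminate, it assumes the simplest possible "architecture": a single threshold unit trained with the perceptron rule on a linearly separable toy dataset (AND). All function names are hypothetical. For a deep network you would swap in your own training routine, and, as noted above, the resulting samples are not uniform.

```python
import random


def predict(w, x):
    # w = [bias, w_1, ..., w_n]; a single threshold unit with output in {0, 1}
    s = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1 if s > 0 else 0


def accuracy(w, X, Y):
    return sum(predict(w, x) == y for x, y in zip(X, Y)) / len(X)


def train_until_consistent(X, Y, rng, max_epochs=1000):
    # Randomly choose initial weights.
    w = [rng.gauss(0.0, 1.0) for _ in range(len(X[0]) + 1)]
    # Train (perceptron updates) until the training set is classified perfectly.
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(X, Y):
            err = y - predict(w, x)  # -1, 0 or +1
            if err:
                mistakes += 1
                w[0] += err
                for i, xi in enumerate(x):
                    w[i + 1] += err * xi
        if mistakes == 0:
            return w  # 100% training accuracy
    return None  # the heuristic failed for this initialization


def sample_consistent_weights(X, Y, n_samples, seed=0, max_attempts=1000):
    rng = random.Random(seed)
    samples = []
    for _ in range(max_attempts):
        if len(samples) == n_samples:
            break
        w = train_until_consistent(X, Y, rng)
        if w is not None:
            samples.append(w)  # save the consistent network
    return samples


# Toy data: the AND function, which is linearly separable.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
Y = [0, 0, 0, 1]
samples = sample_consistent_weights(X, Y, n_samples=5)
for w in samples:
    print([round(v, 3) for v in w], accuracy(w, X, Y))
```

Each run starts from a different random initialization and stops at a different point of the consistent set; which points are likely is decided by the initialization distribution and the update rule, not by a uniform measure.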

This might fail if training never reaches 100% accuracy. However, research has demonstrated empirically that if you choose a deep neural network architecture with sufficient capacity and train for long enough, neural nets can memorize the training set and achieve 100% accuracy on it [1]. So, if it fails, I'd recommend increasing the size of the network and trying again. Of course, there are no guarantees: it can still fail. It's a heuristic.
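The "grow the network and retry" advice can be wrapped around any training routine. This is a minimal sketch that assumes a hypothetical `train_fn(hidden_size, seed)` that you supply, returning weights once training reaches 100% accuracy and `None` otherwise. The `fake_train` stub only simulates that behaviour.

```python
def sample_with_growing_network(train_fn, sizes, seed=0):
    # Try increasingly large networks; return the first (size, weights) that fits the data.
    for hidden_size in sizes:
        weights = train_fn(hidden_size, seed)
        if weights is not None:
            return hidden_size, weights
    return None  # no guarantees: every size may still fail


def fake_train(hidden_size, seed):
    # Hypothetical stand-in for real training: pretend only networks with
    # at least 16 hidden units manage to memorize the training set.
    if hidden_size >= 16:
        return {"hidden_size": hidden_size, "seed": seed}
    return None


print(sample_with_growing_network(fake_train, sizes=[4, 8, 16, 32]))
# (16, {'hidden_size': 16, 'seed': 0})
```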

[1] Understanding deep learning requires rethinking generalization. Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals. arXiv:1611.03530

Answered by D.W. on December 2, 2020

Let me formalize your question before discussing it.

If I understand correctly, you ask for the following:

For $$X \subset \mathbb{R}^{n}$$ and $$Y \subset \mathbb{R}^m$$, let $$f: X \rightarrow Y$$ be a map.

Let $$w \in \mathbb{R}^q$$ be weights. We consider a neural network $$g: \mathbb{R}^{n} \times \mathbb{R}^q \rightarrow \mathbb{R}^m$$, and let $$g^{(w)}: \mathbb{R}^{n} \rightarrow \mathbb{R}^{m},\ x \mapsto g(x,w)$$ be the neural network parametrized by $$w$$.

Now you want to sample from the set $$\underline{W}(f,g,X) := \{ w \in \mathbb{R}^{q} \mid f = (g^{(w)})_{\mid X} \}$$.
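When $$X$$ is finite, checking whether a given $$w$$ lies in $$\underline{W}(f,g,X)$$ is easy, even though the set itself is hard to describe. This small sketch assumes a hypothetical architecture (2 inputs, 2 ReLU hidden units, 1 linear output, so $$q = 9$$) and XOR as $$f$$. The weight vector `w_star` is one hand-picked solution.

```python
def relu(z):
    return max(0.0, z)


def g(x, w):
    # Hypothetical architecture: 2 inputs -> 2 ReLU units -> 1 linear output, q = 9.
    # Layout of w: [W1 (row-major, 2x2), b1 (2), W2 (2), b2 (1)].
    W1, b1, W2, b2 = w[0:4], w[4:6], w[6:8], w[8]
    h = [relu(W1[2 * j] * x[0] + W1[2 * j + 1] * x[1] + b1[j]) for j in range(2)]
    return W2[0] * h[0] + W2[1] * h[1] + b2


def in_W(f, g, w, X, tol=0.0):
    # w is in W(f, g, X) iff g(., w) restricted to X equals f; tol=0 is the exact definition.
    return all(abs(f(x) - g(x, w)) <= tol for x in X)


def xor(x):
    return x[0] ^ x[1]


X = [(0, 0), (0, 1), (1, 0), (1, 1)]
w_star = [1, 1, 1, 1, 0, -1, 1, -2, 0]
print(in_W(xor, g, w_star, X))   # True
print(in_W(xor, g, [0] * 9, X))  # False
```

A test like this verifies a candidate $$w$$, but it says nothing about how to draw candidates uniformly from the set.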

However, I think constructing $$\underline{W}(f,g,X)$$ is very difficult in general.

The following question arises:

Do you already have some $$w \in \underline{W}(f,g,X)$$?

If not, note that $$\underline{W}(f,g,X) = \emptyset$$ is possible! (It's easy to construct an example.)
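Here is one such example. It assumes the simplest architecture, a single threshold unit with $$n = 2$$, $$m = 1$$, $$q = 3$$, and takes $$f$$ to be the XOR function on $$X = \{0,1\}^2$$. Writing out the four constraints shows that they are contradictory:

```latex
g(x, w) = \mathbb{1}\left[ w_1 x_1 + w_2 x_2 + w_3 > 0 \right], \qquad
X = \{0,1\}^2, \qquad f(x) = x_1 \oplus x_2 .

w \in \underline{W}(f,g,X) \text{ would require }
\begin{aligned}
f(0,0) = 0 &\;\Rightarrow\; w_3 \le 0, \\
f(1,0) = 1 &\;\Rightarrow\; w_1 + w_3 > 0, \\
f(0,1) = 1 &\;\Rightarrow\; w_2 + w_3 > 0, \\
f(1,1) = 0 &\;\Rightarrow\; w_1 + w_2 + w_3 \le 0.
\end{aligned}

\text{Adding the middle two lines: } w_1 + w_2 + w_3 > -w_3 \ge 0,
\text{ contradicting the last line, so } \underline{W}(f,g,X) = \emptyset .
```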

Note also that all known universal approximation theorems place some requirements on $$f$$, and only state that $$f$$ can be approximated by some neural network. For a fixed architecture, however, it might be that there is no $$w \in \mathbb{R}^q$$ with $$f = (g^{(w)})_{\mid X}$$, and even that $$f$$ cannot be approximated by $$(g^{(w)})_{\mid X}$$ (e.g. in terms of the uniform norm).

If you have some $$w \in \underline{W}(f,g,X)$$, there are certain trivial permutations that give further elements of the set (e.g. permuting the nodes of a fully-connected layer, or some channels). Apart from that, I am not aware of a full description of $$\underline{W}(f,g,X)$$. Without further details or constraints, I think there is no general answer at the moment.
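The permutation symmetry can be checked numerically. Reorder the hidden units of a fully-connected layer consistently: the rows of the incoming weight matrix, the entries of the bias, and the matching outgoing weights. The network's function does not change, so a permuted copy of any $$w \in \underline{W}(f,g,X)$$ is again in the set. The sketch assumes a hypothetical one-hidden-layer tanh network with random weights.

```python
import math
import random


def mlp(x, W1, b1, W2, b2):
    # One hidden tanh layer and a scalar output; W1 holds one row per hidden unit.
    h = [math.tanh(sum(wij * xj for wij, xj in zip(row, x)) + bj) for row, bj in zip(W1, b1)]
    # math.fsum is correctly rounded, so the sum does not depend on the order of hidden units.
    return math.fsum([w2 * hj for w2, hj in zip(W2, h)]) + b2


def permute_hidden(W1, b1, W2, perm):
    # Apply the same reordering to incoming weights, biases and outgoing weights.
    return [W1[p] for p in perm], [b1[p] for p in perm], [W2[p] for p in perm]


rng = random.Random(0)
n_in, n_hidden = 3, 4
W1 = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [rng.gauss(0, 1) for _ in range(n_hidden)]
W2 = [rng.gauss(0, 1) for _ in range(n_hidden)]
b2 = rng.gauss(0, 1)

perm = [2, 0, 3, 1]
pW1, pb1, pW2 = permute_hidden(W1, b1, W2, perm)
X = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(5)]
same = all(mlp(x, W1, b1, W2, b2) == mlp(x, pW1, pb1, pW2, b2) for x in X)
print(same, pW1 != W1)  # True True
```

With $$k$$ distinct hidden units, one layer alone yields $$k!$$ weight vectors that compute the same function. This is one reason the consistent set typically contains many points even when it is non-empty.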

I hope this helps!

Answered by Graph4Me Consultant on December 2, 2020
