
Barren plateaus in quantum neural network training landscapes

Quantum Computing Asked by asdf on December 24, 2020

Here the authors argue that efforts to build a scalable quantum neural network from a set of parameterized gates are doomed to fail for large numbers of qubits. The reason given is that, by Levy's Lemma, the gradient of a function on a high-dimensional space is almost zero everywhere.
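Levy's Lemma is a concentration-of-measure result: a Lipschitz function of a uniformly random point on a high-dimensional sphere is, with overwhelming probability, very close to its mean. As a toy illustration only (not the paper's construction, which concerns random parameterized circuits), the sketch below samples Haar-random $n$-qubit states and computes $\langle Z\rangle$ on one qubit. Its exact variance is $1/(2^n+1)$, so the values pile up near zero as $n$ grows; the paper's claim is that, for sufficiently random circuits, gradient components concentrate in a similar way. The helper names are hypothetical.

```python
import random
import statistics


def haar_random_state(dim, rng):
    """Sample a Haar-random pure state as a normalized complex Gaussian vector."""
    amps = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(dim)]
    norm = sum(abs(a) ** 2 for a in amps) ** 0.5
    return [a / norm for a in amps]


def expect_z_first_qubit(state):
    """<Z> on the most significant qubit: +1 for the first half of the basis, -1 for the second."""
    half = len(state) // 2
    return sum(abs(a) ** 2 * (1.0 if i < half else -1.0) for i, a in enumerate(state))


def sample_variance(n_qubits, n_samples=1000, seed=0):
    """Empirical variance of <Z> on one qubit over Haar-random n-qubit states."""
    rng = random.Random(seed)
    dim = 2 ** n_qubits
    values = [expect_z_first_qubit(haar_random_state(dim, rng)) for _ in range(n_samples)]
    return statistics.pvariance(values)


if __name__ == "__main__":
    for n in (1, 4, 8):
        # Empirical variance next to the exact Haar value 1 / (2**n + 1).
        print(n, round(sample_variance(n), 5), round(1 / (2 ** n + 1), 5))
```

The sampled variance shrinks roughly by half with each added qubit, which is the kind of flatness the paper attributes to Levy's Lemma.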

I was wondering whether this argument also applies to other hybrid quantum-classical optimization methods, such as VQE (Variational Quantum Eigensolver) or QAOA (Quantum Approximate Optimization Algorithm).

What do you think?

One Answer

First: the paper cites [37] for Levy's Lemma, but you will find no mention of "Levy's Lemma" in [37]. There it is called "Levy's Inequality"; it is called Levy's Lemma in this reference, which the paper you mention does not cite.

Second: there is an easy proof that this claim is false for VQE. In quantum chemistry we optimize the parameters of a wavefunction ansatz $|\Psi(\vec{p})\rangle$ in order to get the lowest (i.e. most accurate) energy. The energy is evaluated by:

$$ E_{\vec{p}} = \frac{\left\langle \Psi(\vec{p})\right|H\left|\Psi(\vec{p})\right\rangle}{\left\langle\Psi(\vec{p}) \right|\left.\Psi(\vec{p}) \right\rangle}. $$
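As a minimal pure-Python sketch of this formula, assuming $H$ is given as a small dense Hermitian matrix (a list of rows) and the state as a list of amplitudes, the energy can be evaluated directly. The function name `energy` and the 2×2 matrix are hypothetical. Because of the denominator, the ansatz does not need to be normalized.

```python
def energy(H, psi):
    """E = <psi|H|psi> / <psi|psi> for a Hermitian H (list of rows) and a possibly unnormalized psi."""
    n = len(psi)
    h_psi = [sum(H[i][j] * psi[j] for j in range(n)) for i in range(n)]
    numerator = sum(psi[i].conjugate() * h_psi[i] for i in range(n))
    denominator = sum(abs(psi[i]) ** 2 for i in range(n))
    # For Hermitian H the numerator is real up to rounding, so keep only the real part.
    return (numerator / denominator).real


if __name__ == "__main__":
    H = [[1.0, 0.5], [0.5, -1.0]]   # hypothetical 2x2 "Hamiltonian"
    print(energy(H, [1.0, 1.0]))     # 0.5
    print(energy(H, [3.0, 3.0]))     # 0.5: rescaling psi does not change E
```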

VQE just means we use a quantum computer to evaluate this energy, and a classical computer to choose how to improve the parameters in $\vec{p}$ so that the energy will be lower in the next quantum iteration.
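The division of labour can be sketched as a loop, under loudly stated assumptions: a hypothetical one-parameter ansatz $\cos\theta|0\rangle + \sin\theta|1\rangle$, a hypothetical 2×2 matrix in place of a real Hamiltonian, and a function `measured_energy` that on real hardware would be estimated from measurements but here is simply computed classically. The classical optimizer is plain gradient descent with a finite-difference gradient, one choice among many.

```python
import math


def ansatz(theta):
    """Hypothetical one-parameter ansatz |Psi(theta)> = cos(theta)|0> + sin(theta)|1> (already normalized)."""
    return [math.cos(theta), math.sin(theta)]


def measured_energy(H, theta):
    """Stand-in for the quantum step: on hardware this would be estimated from measurements.
    Here <Psi|H|Psi> is computed classically for a real symmetric 2x2 H."""
    psi = ansatz(theta)
    return sum(psi[i] * H[i][j] * psi[j] for i in range(2) for j in range(2))


def vqe(H, theta=0.1, lr=0.2, steps=200, eps=1e-6):
    """Classical step: update theta by gradient descent using a central finite-difference gradient."""
    for _ in range(steps):
        grad = (measured_energy(H, theta + eps) - measured_energy(H, theta - eps)) / (2 * eps)
        theta -= lr * grad
    return theta, measured_energy(H, theta)


if __name__ == "__main__":
    H = [[1.0, 0.5], [0.5, -1.0]]   # hypothetical 2x2 "Hamiltonian"
    theta, e = vqe(H)
    # Compare with the exact lowest eigenvalue, -sqrt(1.25) for this matrix.
    print(theta, e, -math.sqrt(1.25))
```

Any other classical optimizer could replace the gradient-descent loop; finite differences only keep the sketch free of dependencies.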

So whether or not "the gradient will be 0 almost everywhere when the number of parameters in $\vec{p}$ is large" does not depend at all on whether we use VQE (on a quantum computer) or run a standard quantum chemistry program (like Gaussian) on a classical computer. Quantum chemists typically optimize the above energy variationally with up to $10^{10}$ parameters in $\vec{p}$, and the only reason we don't go beyond that is that we run out of RAM, not that the energy landscape starts to become flat. In this paper you can see, at the end of the abstract, that they calculated the energy for a wavefunction with about $10^{12}$ parameters, where the parameters are coefficients of Slater determinants. It is generally known that the energy landscape is not so flat (as it would be if the gradient were 0 almost everywhere), even with a trillion parameters or more.
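For a CI wavefunction $|\Psi(\vec{p})\rangle = \sum_i p_i |D_i\rangle$ over orthonormal Slater determinants $|D_i\rangle$, assuming real coefficients and a real symmetric matrix $\mathbf{H}$, the energy is the Rayleigh quotient $E = \vec{p}^{\,T}\mathbf{H}\vec{p}/\vec{p}^{\,T}\vec{p}$, and the quotient rule gives $\partial E/\partial p_i = 2[(\mathbf{H}\vec{p})_i - E\,p_i]/(\vec{p}^{\,T}\vec{p})$. The gradient therefore vanishes only where $\vec{p}$ is an eigenvector of $\mathbf{H}$, and for normalized $\vec{p}$ its norm equals $2\sqrt{\langle H^2\rangle - E^2}$, twice the energy standard deviation of the current state. A minimal check of both statements follows; the 3×3 matrix is a hypothetical stand-in, not a real molecular Hamiltonian, and the helper names are illustrative.

```python
import math


def matvec(H, p):
    """Matrix-vector product H p for a dense matrix stored as a list of rows."""
    return [sum(H[i][j] * p[j] for j in range(len(p))) for i in range(len(H))]


def energy(H, p):
    """Rayleigh quotient E = p^T H p / p^T p for a real symmetric H."""
    hp = matvec(H, p)
    return sum(pi * hpi for pi, hpi in zip(p, hp)) / sum(pi * pi for pi in p)


def analytic_gradient(H, p):
    """dE/dp_i = 2 [(H p)_i - E p_i] / (p^T p)."""
    e = energy(H, p)
    norm2 = sum(pi * pi for pi in p)
    return [2.0 * (hpi - e * pi) / norm2 for pi, hpi in zip(p, matvec(H, p))]


def numerical_gradient(H, p, eps=1e-6):
    """Central finite differences, one coefficient at a time."""
    grad = []
    for i in range(len(p)):
        plus, minus = list(p), list(p)
        plus[i] += eps
        minus[i] -= eps
        grad.append((energy(H, plus) - energy(H, minus)) / (2 * eps))
    return grad


def energy_std(H, p):
    """sqrt(<H^2> - E^2) for a normalized real p; <H^2> = ||H p||^2 because H is symmetric."""
    h2 = sum(x * x for x in matvec(H, p))
    return math.sqrt(max(0.0, h2 - energy(H, p) ** 2))


if __name__ == "__main__":
    # Hypothetical symmetric matrix standing in for H in a determinant basis.
    H = [[-1.0, 0.2, 0.1], [0.2, 0.5, 0.3], [0.1, 0.3, 1.5]]
    raw = [0.8, 0.5, 0.3]
    p = [x / math.sqrt(sum(y * y for y in raw)) for x in raw]
    g = analytic_gradient(H, p)
    # Largest discrepancy between the analytic and finite-difference gradients.
    print(max(abs(a - b) for a, b in zip(g, numerical_gradient(H, p))))
    # Gradient norm next to twice the energy standard deviation; they agree up to rounding.
    print(math.sqrt(sum(x * x for x in g)), 2 * energy_std(H, p))
```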

Conclusion: whether Levy's Lemma applies depends on the particular energy landscape, which depends on both $H$ and the ansatz $|\Psi(\vec{p})\rangle$. For their particular implementation of QNNs, the authors found an application of Levy's Lemma to be appropriate. For VQE, we have a counter-example to the claim that Levy's Lemma "always" applies: the case where $H$ is a molecular Hamiltonian and $|\Psi\rangle$ is a CI wavefunction.

Correct answer by user1271772 on December 24, 2020
