Where is the proof that replacing missing lagged values with zero in Arellano-Bond like estimators is a valid approach?

Question

If I use lagged values $Y_{t-k}$, $k \geq 2$, as instruments to estimate an $AR(1)$ model (to account for the fact that the lagged value is endogenous),
missing values become a problem. For example, if I decide that $Y_{t-3}$ is a valid instrument for $Y_{t-1}$, it is of course available only for individuals with at least $4$ observations, and in general only observations $Y_t$, $t \geq 4$, could be used.
In the following two documents:
(pdf)
(pdf)
I've found that, at least when the panel is unbalanced, missing values should be replaced by zero.
Here:
https://www.jstor.org/stable/pdf/1913103.pdf?refreqid=excelsior%3Aaff11b8d9e04796448ebfbf42d6d7132
I read:
“recall that our procedure involves dropping the equations for the first m + 2 time periods. When the parameters are nonstationary this procedure involves no loss in efficiency. Although the equations that are dropped may be correlated with the remaining equations, there are no cross equation restrictions, and they are underidentified. When the parameters are stationary, dropping the first m + 2 periods may involve some loss in efficiency. Because there are cross-equation restrictions, efficiency can be improved by adding back t = m + 2 and t = m + 1 period equations, both of which have observable lags. Also, if there is no heteroskedasticity (across time or individuals) in the innovation variance for yit and xit, then all of the parameters for the joint Yit and process can be estimated without the earliest cross-section moments, so that it may be possible to further improve efficiency by using these moments. Cross-section moment based estimation of moving average (but not autoregressive) time series models in panel data has been considered by MaCurdy (1981a)”.
Thus, it is clear to me from the above that using $Y_{t-3}$ will avoid a loss of efficiency, but not that it should be set to $0$ when missing.
In some cases, replacement with $0$ seems quite natural to me: for example, if the variable refers to a policy program that was not yet implemented, or to service use by children who had not been born yet (when there is no service or program, or I am too young to be entitled to it, or have not even been born, I cannot use it). But when we are simply talking about variables that were not observed before a given time, replacing them with $0$ in the model $Y_{t-1}=\alpha+\beta Y_{t-3}+\epsilon_{t-1}$ leads to $\hat{Y}_{t-1}=\hat{\alpha}$, which is completely non-informative about the individual. How can this not bias estimates toward $0$? It seems analogous to the situation where, in the model $Y_{t}=\alpha+\beta Y_{t-1}+\epsilon_t$, instead of starting from $t=2$, we set $Y_0=0$.
++++++ EDIT 24 JULY 2020 ++++++
On second thought, I guess that the reason why estimates don't get biased is that both in $Y_{t-1}=\alpha_0+\beta_0 Y_{t-3}+\epsilon_{t-1}$ and in $Y_{t-1}=\alpha_1+\beta_1 Y_{t-3}+\epsilon^*_{t-1}$, the value of $\beta$ is irrelevant for the fitted values of the outcomes. This, however, leads me to think that, despite not introducing bias, substituting missing values of $Y_{t-3}$ with $0$ may affect other parameters and the variance explained, but not the estimate of the autoregressive parameter itself.
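One way to make the "no bias" intuition concrete is to look at the moment condition rather than at a first-stage regression with an intercept. This is only a sketch, not a proof taken from the references above. The availability indicator $d_{it}$ (equal to $1$ when $Y_{i,t-3}$ is observed and $0$ otherwise) and the zero-filled instrument $\tilde{Z}_{it}$ are notation introduced here for illustration, and $\Delta\epsilon_{it}$ denotes the first-differenced error as used in Arellano-Bond type estimators:

$$
\tilde{Z}_{it} = d_{it}\,Y_{i,t-3},
\qquad
\frac{1}{N}\sum_{i}\sum_{t}\tilde{Z}_{it}\,\Delta\epsilon_{it}
= \frac{1}{N}\sum_{i}\sum_{t} d_{it}\,Y_{i,t-3}\,\Delta\epsilon_{it}.
$$

Under the assumption that availability depends only on the time index (and not on the errors), the zero-filled terms add exactly $0$ to the sample moment, so the condition $E[\tilde{Z}_{it}\,\Delta\epsilon_{it}]=0$ holds whenever $E[Y_{i,t-3}\,\Delta\epsilon_{it}]=0$ holds on the observed cells. The zero acts as "this moment is switched off for this row", not as an imputed value of $Y_{i,t-3}$.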
++++++ EDIT 14 SEPTEMBER 2020 ++++++
Now I have a different understanding: estimates are not affected in the first-stage equations. Nevertheless, more observations will be used for the final regression, thus increasing efficiency of the estimates. While I've found this online, I still haven't found a clear explanation of that in the literature.
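The mechanical part of this claim can be checked on simulated data. The sketch below is an illustration under stated assumptions, not the procedure from the cited papers: the data-generating process, the parameter values, and the helper `iv` are all hypothetical, the equation is estimated in first differences without an intercept, and a single instrument column $Y_{t-3}$ is used. In that setting the zero-filled rows add nothing to either cross-product, so the estimate matches the one obtained by dropping those rows.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, beta = 500, 6, 0.5  # hypothetical panel size and AR(1) coefficient

# Simulate Y_it = mu_i + beta * Y_{i,t-1} + eps_it (individual effect mu_i)
mu = rng.normal(size=N)
Y = np.zeros((N, T))
Y[:, 0] = mu / (1 - beta) + rng.normal(size=N)
for t in range(1, T):
    Y[:, t] = mu + beta * Y[:, t - 1] + rng.normal(size=N)

# First-differenced equation: dY_t = beta * dY_{t-1} + d(eps_t), t = 2..T-1 (0-based)
dy, dx, z, observed = [], [], [], []
for t in range(2, T):
    dy.append(Y[:, t] - Y[:, t - 1])
    dx.append(Y[:, t - 1] - Y[:, t - 2])
    if t - 3 >= 0:
        z.append(Y[:, t - 3])              # instrument is available
        observed.append(np.ones(N, dtype=bool))
    else:
        z.append(np.zeros(N))              # instrument missing: fill with zero
        observed.append(np.zeros(N, dtype=bool))
dy, dx, z, observed = map(np.concatenate, (dy, dx, z, observed))


def iv(z, x, y):
    """Just-identified IV estimate without intercept: (z'y) / (z'x)."""
    return (z @ y) / (z @ x)


beta_zero_fill = iv(z, dx, dy)                               # all rows, zeros kept
beta_dropped = iv(z[observed], dx[observed], dy[observed])   # rows with missing z removed

print(beta_zero_fill, beta_dropped)
```

With only one instrument column, zero-filling therefore changes nothing. Any efficiency gain has to come from rows where this column is zero but other instrument columns (for example $Y_{t-2}$ in a GMM-style instrument matrix) are observed, since those rows would otherwise be discarded entirely.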

