Best approximation of sum of unit vectors by a smaller subset

Mathematics Asked by g g on November 2, 2021

Let $v_1,\ldots,v_N$ be linearly independent unit vectors in $\mathbb{R}^N$ and denote their scaled sum by $s_N = \frac{1}{N}\sum_{k=1}^N v_k.$ I would like to find a small subset of size $n$ among those vectors such that their scaled sum approximates $s_N$ well. In other words, find

$$ J = \underset{J\in\mathscr{J}}{\operatorname{argmin}} \biggl\lVert s_N - \frac{1}{n}\sum_{k=1}^n v_{J_k}\biggr\rVert $$

where $J$ runs over the set $\mathscr{J}$ of all subsets of $\{1,\ldots,N\}$ of size $n$ and $\lVert \cdot \rVert$ is the Euclidean norm.

The set of vectors can be considered an i.i.d. sample drawn uniformly from the sphere. And, of course, in my case $N$ and $n$ are too large ($N$ will be of the order of 10,000 or 100,000 and $n$ one or two orders of magnitude smaller) to just try all subsets. So I am looking for something more clever.

My approach so far

I tried

  • Repeated random subsampling, i.e. drawing many subsets of size $n$ in an i.i.d. fashion, computing the approximation error for each and retaining the best.
  • A greedy approach, starting with a single vector and enlarging the set by one vector at a time; at each step the added vector is the one that gives the best approximation for the enlarged set (see the sketch after this list).
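A minimal numpy sketch of the greedy step, assuming the vectors are stored as the rows of an array `V` (the array and function names are mine, not fixed notation):

```python
import numpy as np

def greedy_subset(V, n):
    """Greedy heuristic: V is an (N, d) array whose rows are the unit vectors.
    At each step, add the vector whose inclusion gives the best approximation
    of the scaled full sum s_N; return the list of chosen indices."""
    N = V.shape[0]
    s_N = V.mean(axis=0)                 # target: (1/N) * sum of all vectors
    chosen = []
    partial_sum = np.zeros(V.shape[1])
    for step in range(1, n + 1):
        best_idx, best_err = None, np.inf
        for j in range(N):
            if j in chosen:
                continue
            # approximation error if vector j were added at this step
            err = np.linalg.norm(s_N - (partial_sum + V[j]) / step)
            if err < best_err:
                best_idx, best_err = j, err
        chosen.append(best_idx)
        partial_sum += V[best_idx]
    return chosen
```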

Questions

  • Is this a known problem with a proper name?
  • Is it hard (as in NP-hard for example) or are clever solutions known?
  • Are there better heuristic approaches?
  • Are there theoretical results/performance guarantees for the two heuristics I used?

Note: I edited the question to include scaling. Some of the answers/comments refer to the older version where vectors were not scaled.

2 Answers

As suggested by @BenGrossmann, you can use integer linear programming to minimize the 1-norm. Explicitly, let binary decision variable $x_j$ indicate whether $j \in J$. The problem is to minimize $\sum_{i=1}^N (z_i^+ + z_i^-)$ subject to the linear constraints
\begin{align}
(s_N)_i - \frac{1}{n}\sum_{j=1}^N (v_j)_i x_j &= z_i^+ - z_i^- &&\text{for $i \in \{1,\dots,N\}$} \\
\sum_{j=1}^N x_j &= n \\
z_i^+ &\ge 0 &&\text{for $i \in \{1,\dots,N\}$} \\
z_i^- &\ge 0 &&\text{for $i \in \{1,\dots,N\}$}
\end{align}

This might provide a good approximation for your 2-norm objective or a good starting solution for an improvement heuristic.
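For concreteness, here is a sketch of this 1-norm model in Python using PuLP as the (assumed) MILP modelling layer; any ILP solver would do, the names are mine, and a model of this size is only practical for modest $N$:

```python
import numpy as np
import pulp  # assumed MILP modelling library; any ILP solver could be substituted

def solve_l1_milp(V, n):
    """Build and solve the 1-norm MILP above.
    V is an (N, d) array whose rows are the unit vectors (here d = N);
    returns the chosen index set J."""
    N, d = V.shape
    s = V.mean(axis=0)
    prob = pulp.LpProblem("subset_l1", pulp.LpMinimize)
    x = [pulp.LpVariable(f"x_{j}", cat="Binary") for j in range(N)]
    zp = [pulp.LpVariable(f"zp_{i}", lowBound=0) for i in range(d)]
    zm = [pulp.LpVariable(f"zm_{i}", lowBound=0) for i in range(d)]
    prob += pulp.lpSum(zp) + pulp.lpSum(zm)           # minimize sum of |residuals|
    for i in range(d):                                # residual-splitting constraints
        prob += (float(s[i])
                 - (1.0 / n) * pulp.lpSum(float(V[j, i]) * x[j] for j in range(N))
                 == zp[i] - zm[i])
    prob += pulp.lpSum(x) == n                        # cardinality constraint
    prob.solve()
    return [j for j in range(N) if x[j].value() > 0.5]
```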


For the 2-norm, the problem is to minimize $\sum_{i=1}^N \left((s_N)_i - \frac{1}{n}\sum_{j=1}^N (v_j)_i x_j\right)^2$ subject to the linear constraint $$ \sum_{j=1}^N x_j = n \tag1 $$

Because $x_i$ is binary, we have $x_i^2 = x_i$. For $i < j$, you can linearize each product $x_i x_j$ as described here.
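For reference (the linked description is not reproduced in this copy), a standard linearization introduces a continuous variable $y_{i,j}$ to stand for $x_i x_j$ and enforces $$ y_{i,j} \le x_i, \qquad y_{i,j} \le x_j, \qquad y_{i,j} \ge x_i + x_j - 1, \qquad y_{i,j} \ge 0, $$ which force $y_{i,j} = x_i x_j$ whenever $x_i$ and $x_j$ are binary.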

You can also strengthen the formulation by multiplying both sides of the cardinality constraint $(1)$ by $x_i$, yielding $$\sum_{j=1}^{i-1} x_j x_i + x_i^2 + \sum_{j=i+1}^N x_i x_j = n x_i,$$ and then linearizing this quadratic constraint by using the products from the objective linearization: $$\sum_{j=1}^{i-1} y_{j,i} + \sum_{j=i+1}^N y_{i,j} = (n - 1) x_i$$


Given a feasible solution, a simple improvement heuristic for either norm is to replace one vector in $J$ with one vector not in $J$ if it improves the objective value.
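A small sketch of that swap heuristic for the 2-norm (again with numpy and my own naming; it accepts the first improving swap and stops when no swap improves):

```python
import numpy as np

def swap_improve(V, J):
    """One-swap local search for the 2-norm objective.
    V: (N, d) array of unit vectors (rows); J: iterable of n chosen indices."""
    N = V.shape[0]
    J = set(J)
    n = len(J)
    s_N = V.mean(axis=0)
    cur = V[list(J)].sum(axis=0)                  # running (unscaled) sum over J
    best = np.linalg.norm(s_N - cur / n)
    improved = True
    while improved:
        improved = False
        for out in list(J):
            for inn in range(N):
                if inn in J:
                    continue
                trial = cur - V[out] + V[inn]     # swap `out` for `inn`
                err = np.linalg.norm(s_N - trial / n)
                if err < best:                    # accept the first improving swap
                    J.remove(out)
                    J.add(inn)
                    cur, best, improved = trial, err, True
                    break
            if improved:
                break
    return sorted(J), best
```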

Answered by RobPratt on November 2, 2021

Let $A$ denote the matrix whose columns are $v_1,\dots,v_N$. Then your problem is that of minimizing $\|s_N - Ax\|$ subject to the constraint that $x$ has $0,1$ entries and $\|x\| \leq \sqrt{n}$.

Removing the constraint that $x$ has $0,1$ entries leaves us with a much easier problem to deal with. I suspect that its solution will yield a useful heuristic.

If $A = U \Sigma V^T$ is an SVD and we make the substitutions $b = U^T s_N$ and $y = V^T x$, we are left with the simplified problem $$ \min \|\Sigma y - b\| \quad \text{s.t.} \quad \|y\| \leq \sqrt{n}. $$ This can be handled with Lagrange multipliers. The squared objective and constraint functions have the forms $$ f(y) = \|\Sigma y - b\|^2 \implies \nabla f = 2 [\Sigma^2 y - \Sigma b], \qquad g(y) = \|y\|^2 \implies \nabla g = 2y. $$ So, we have $$ \nabla f = \lambda \nabla g \implies \Sigma^2 y - \Sigma b = \lambda y \implies (\Sigma^2 - \lambda I)y = \Sigma b \implies y = (\Sigma^2 - \lambda I)^{-1}\Sigma b. $$ Note: at a minimizer the multiplier satisfies $\lambda \le 0$, and the singular values of $A$ are nonzero because the $v_k$ are linearly independent, so $\Sigma^2 - \lambda I$ is invertible. If the unconstrained minimizer $\Sigma^{-1} b$ already satisfies the constraint, we can take $\lambda = 0$; otherwise the constraint is active and plugging in yields $$ \|(\Sigma^2 - \lambda I)^{-1}\Sigma b\|^2 = n, $$ a one-dimensional equation in $\lambda$ that can be solved numerically, e.g. by bisection.

I'm not sure if this can be written in terms that remove the SVD. For what it's worth, though, we have $$ (\Sigma^2 - \lambda I)^{-1}\Sigma = V^T[(A^TA - \lambda I)^{-1}\sqrt{A^TA}]V. $$
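A numpy sketch of this relaxation (writing $\mu = -\lambda \ge 0$ so that $\Sigma^2 + \mu I$ is positive definite); the final step of taking the $n$ largest coordinates of $x$ as the subset is my own guess at a rounding heuristic, not something the answer specifies:

```python
import numpy as np

def relaxed_subset(A, n):
    """Solve min ||s_N - A x|| s.t. ||x|| <= sqrt(n) (binary constraint dropped),
    where the columns of A are the unit vectors, then rank the coordinates of x.
    Uses mu = -lambda >= 0, so Sigma^2 + mu*I is always invertible."""
    N = A.shape[1]
    s_N = A.mean(axis=1)
    U, sig, Vt = np.linalg.svd(A)
    b = U.T @ s_N
    y = b / sig                            # unconstrained minimizer of ||Sigma y - b||
    if np.linalg.norm(y) > np.sqrt(n):
        # constraint active: find mu with ||(Sigma^2 + mu I)^{-1} Sigma b|| = sqrt(n);
        # the norm decreases monotonically in mu, so bisection works
        lo, hi = 0.0, 1.0
        while np.linalg.norm(sig * b / (sig**2 + hi)) > np.sqrt(n):
            hi *= 2.0
        for _ in range(100):
            mu = 0.5 * (lo + hi)
            if np.linalg.norm(sig * b / (sig**2 + mu)) > np.sqrt(n):
                lo = mu
            else:
                hi = mu
        y = sig * b / (sig**2 + hi)
    x = Vt.T @ y                           # back-substitute x = V y
    return np.argsort(-x)[:n]              # heuristic: n largest coordinates
```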

Answered by Ben Grossmann on November 2, 2021
