Now I can (slightly better) do it with 2N-3 perm-vectors (each with nonnegative weights) plus some multiple of the all-1 vector. Regard V as pre-sorted wlog (since otherwise, sort, express as sum, and apply the unsort permutation to everything). Now consider the "gaps" V2-V1, V3-V2, etc. in sorted order by increasing length. As the first summand use the vector (0,1,2,3...,N-1)*mingap. Now we have reduced to an (N-1) dimensional problem provided that we use (0,1,2,3,...,k; k+1,...,N-1)+(k,k-1,...,1,0; k+1,...N-1), permuted into appropriate order to handle the mingap in the vector after k prior mingaps have been squelched. So thus the N-1 gaps are handled via a weighted sum with 2*(N-2)+1=2N-3 weighted perm-vectors as summands. Finally we are left with a gapless vector, which therefore is some multiple of the all-1s vector. The entire algorithm runtime is O(N) after two initial pre-sortings, one for the vector entries, and the second for the gaps. However, it is clear on linear algebra and convex-containment grounds in N-1 dimensions that at most N perms are required, each with a nonnegative weight (in addition to some multiple of the all-1 vector). Therefore the above algorithm certainly is suboptimal by almost a factor of 2 as far as the number of summands is concerned. The problem of minimizing the L1 norm of the weights in the linear combination is a linear program with N!+2 variables and we may (if desired) demand all weights nonnegative in this linear program. But this does not make it obvious how to solve this LP efficiently -- can it be done in polynomial(N) time? Minimizing the L2 norm is doable via linear algebra, but again it is not obvious whether and how it can be done efficiently.