Re: [math-fun] Disjoint subset sums problem by Victor Miller, solved CORRECTED.
Sorry -- I think due to a mouse typo I posted the wrong file (the one with arithmetic errors), not my later corrected file giving the solution. Here is the corrected solution.
Say that a finite set of integers, A, has *property P* if there are two non-empty disjoint subsets S and T of A such that sum(S) = sum(T) (where sum(X) is the sum of the elements of X).
1) Show that every 10-element subset of {1,...,100} has property P (this is fairly easy).
2) Is it true that every 9-element subset of {1,...,100} has property P?
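To make the definition concrete, here is a minimal brute-force checker for property P (my own sketch, not part of the original post):

```python
from itertools import combinations

def has_property_P(A):
    """Brute-force test of property P for a set of positive integers.

    It suffices to find any two DISTINCT subsets with equal sums:
    removing the shared elements from both preserves the sum equality,
    and since all elements are positive, neither subset can contain the
    other, so both remainders stay non-empty and disjoint.
    """
    seen = set()
    for r in range(1, len(A) + 1):
        for sub in combinations(sorted(A), r):
            s = sum(sub)
            if s in seen:
                return True  # two distinct subsets share a sum
            seen.add(s)
    return False
```

For instance, {1, 2, 3} has property P ({3} and {1, 2}), while the binary set {1, 2, 4, 8} has all 15 subset sums distinct and so does not.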
--Suppose we have an M-element subset of {1,2,...,N}. Let us try to prove that if M is large enough, then the subset must have "Miller's property P" (while trying not to barf at the horrible name). The number of nonempty subsets of the M-set is 2^M - 1. Each subset sum lies in the range [1, ..., N*M]. Hence two subset sums must be equal if 2^M - 1 > N*M. This sum-equality implies a sum-equality for DISJOINT subsets: just remove the shared summands from both sums (neither subset can contain the other, since all summands are positive, so both remainders stay non-empty). When N=100 and M=10 we have 2^10 - 1 = 1023 > 10*100 = 1000, so Miller's question (1) is solved: every 10-element subset has property P.

----

It would be possible to solve Miller's problem 2 by computer: there are binomial(100, 9) = 1.9*10^12 subsets. One can use negation symmetry to halve that, and shift-symmetry for all subsets whose min and max elements do not span the full range, to get a large reduction... so it should be feasible to solve by brute force.

----

To solve his problem 2 without a computer, we need a somewhat stronger bound. Consider only the nonempty subsets of the M-set with L <= cardinality <= J. If Sum[K=L..J] of binomial(M, K) > N*J - L + 1 then two such subsets must have equal sums, and it is not possible for one to be contained in the other (since all summands are positive), hence by removing shared elements we get two equal disjoint subset sums. But... this bound is not strong enough to solve Miller's problem 2 with N=100 and M=9 >= J > L > 0, although it would have been if N=77, using J=6.

To do better, we need to use the fact that the subset sums are nonuniformly distributed within the permitted range [L, ..., N*J]. Consider a random J-subset of our M-set: its mean sum is J*Q (where Q is the mean value of an M-set element) and its variance is < (min(J, M-J)/4)*(N-1)^2, where I am here cleverly using the fact that we sample from the M-set WITHOUT replacement; that is why the min is there. (The variance of a subset sum is the same as the variance of the complement-subset sum.)
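The pigeonhole step above is constructive. A sketch (my own illustration, with an arbitrarily chosen 10-element subset) that finds two disjoint equal-sum subsets by hashing subset sums:

```python
from itertools import combinations

def disjoint_equal_sum_pair(A):
    """Find two non-empty DISJOINT subsets of A with equal sums, or None.

    Hashes subset sums; at the first collision, strips the shared
    elements from both subsets.  Both remainders are non-empty, since
    two distinct sets of positive integers with equal sums cannot be
    nested.  For a 10-element subset of {1,...,100}, the pigeonhole
    count 2^10 - 1 = 1023 > 1000 guarantees a collision exists.
    """
    seen = {}  # subset sum -> the subset (a tuple) that achieved it
    for r in range(1, len(A) + 1):
        for sub in combinations(sorted(A), r):
            s = sum(sub)
            if s in seen:
                shared = set(sub) & set(seen[s])
                return set(sub) - shared, set(seen[s]) - shared
            seen[s] = sub
    return None

# Any 10-element subset of {1,...,100} must yield such a pair:
S, T = disjoint_equal_sum_pair({3, 9, 14, 21, 26, 35, 42, 59, 63, 76})
```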
We actually can do even a tiny bit better than that: for J=4-element subsets, the variance bound 4*(N-1)^2/4 = (N-1)^2 can be reduced to ((N-1)^2 + (N-3)^2)/2, the maximum variance occurring when the 4 elements are N, N-1, 1, and 2. The probability that we are within k standard deviations of the mean is >= 1 - 1/k^2 (http://en.wikipedia.org/wiki/Chebyshev's_inequality). This means we can, by choosing k optimally, "amplify" the effectiveness of the above naive bound. For example, with M=9 there are 2^9 - 1 = 511 nonempty subsets. Of those 511 subset sums, at least 511*(1 - 1/1.746^2) = 343.377 of them, i.e. at least 344 in view of integrality, lie within 1.746 standard deviations, i.e. less than +-1.746*(N-1.5) = 1.746*98.5 = 171.98 off the mean, i.e. within +-171 in view of integrality. Since 344 > 171 + 171 + 1 = 343, the theorem is (just barely!) proven -- even 9-element subsets must have property P, so the answer to question (2) is yes!

-----

I have a sneaking suspicion Miller had a modular argument in mind. I also suspect that by using tail bounds better than Chebyshev we could do a bit better still. You might want to consider those possibilities... by milking them all, what is the best bound one can get?
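The Chebyshev arithmetic above can be checked mechanically. A sketch (my own; the constant k = 1.746 and the conservative standard-deviation bound 98.5 are taken from the text, as is its +-171 integrality rounding):

```python
import math

M, N = 9, 100
k = 1.746
sigma_bound = 98.5          # conservative std-dev bound, N - 1.5
subsets = 2**M - 1          # 511 nonempty subsets of the 9-set

# Chebyshev: at least a (1 - 1/k^2) fraction lie within k sigma of the mean.
within = math.ceil(subsets * (1 - 1 / k**2))

# Those (integer) sums deviate by less than 1.746 * 98.5 = 171.98,
# i.e. by at most 171, so they occupy at most 171 + 171 + 1 slots.
radius = math.floor(k * sigma_bound)
pigeonholes = 2 * radius + 1

# More subsets than slots forces two equal subset sums (just barely).
assert within > pigeonholes
```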
Warren Smith