Re: [math-fun] Disjoint subset sums problem by Victor Miller, solved CORRECTED.
Sorry -- I think due to a mouse typo I posted the wrong file (the one with arithmetic errors), not my later corrected file giving the solution. Here is the corrected solution.
Say that a finite set of integers, A, has *property P* if there are two non-empty disjoint subsets S and T of A such that sum(S) = sum(T) (where sum(X) is the sum of the elements of X).
1) Show that every 10-element subset of {1,...,100} has property P (this is fairly easy).
2) Is it true that every 9-element subset of {1,...,100} has property P?
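To make the definition concrete, here is a minimal brute-force checker for property P (my own sketch, not part of the original post):

```python
from itertools import combinations

def has_property_P(A):
    """Brute-force test of property P for a set of positive integers.

    It suffices to find any two DISTINCT subsets with equal sums:
    removing the shared elements from both preserves the sum equality,
    and since all elements are positive, neither subset can contain the
    other, so both remainders stay non-empty and disjoint.
    """
    seen = set()
    for r in range(1, len(A) + 1):
        for sub in combinations(sorted(A), r):
            s = sum(sub)
            if s in seen:
                return True  # two distinct subsets share a sum
            seen.add(s)
    return False
```

For instance, {1, 2, 3} has property P ({3} and {1, 2}), while the binary set {1, 2, 4, 8} has all 15 subset sums distinct and so does not.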
--Suppose we have an M-element subset of {1,2,...,N}. Let us try to prove that if M is large enough, then the subset must have "Miller's property P" (while trying not to barf at the horrible name). The number of nonempty subsets of the M-set is 2^M - 1. Each subset sum lies in the range [1, ..., N*M]. Hence two subset sums must be equal if 2^M - 1 > N*M. This sum-equality implies a sum-equality for DISJOINT subsets: just remove the shared summands from both sums (neither subset can contain the other, since all summands are positive, so both remainders stay non-empty). When N=100 and M=10 we have 2^10 - 1 = 1023 > 10*100 = 1000, so Miller's question (1) is solved: every 10-element subset has property P.

----

It would be possible to solve Miller's problem 2 by computer: there are binomial(100, 9) = 1.9*10^12 subsets. One can use negation symmetry to halve that, and shift-symmetry for all subsets whose min and max elements do not span the full range, to get a large reduction... so it should be feasible to solve by brute force.

----

To solve his problem 2 without a computer, we need a somewhat stronger bound. Consider only the nonempty subsets of the M-set with L <= cardinality <= J. If Sum[K=L..J] of binomial(M, K) > N*J - L + 1 then two such subsets must have equal sums, and it is not possible for one to be contained in the other (since all summands are positive), hence by removing shared elements we get two equal disjoint subset sums. But... this bound is not strong enough to solve Miller's problem 2 with N=100 and M=9 >= J > L > 0, although it would have been if N=77, using J=6.

To do better, we need to use the fact that the subset sums are nonuniformly distributed within the permitted range [L, ..., N*J]. Consider a random J-subset of our M-set: its mean sum is J*Q (where Q is the mean value of an M-set element) and its variance is < (min(J, M-J)/4)*(N-1)^2, where I am here cleverly using the fact that we sample from the M-set WITHOUT replacement; that is why the min is there. (The variance of a subset sum is the same as the variance of the complement-subset sum.)
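The pigeonhole step above is constructive. A sketch (my own illustration, with an arbitrarily chosen 10-element subset) that finds two disjoint equal-sum subsets by hashing subset sums:

```python
from itertools import combinations

def disjoint_equal_sum_pair(A):
    """Find two non-empty DISJOINT subsets of A with equal sums, or None.

    Hashes subset sums; at the first collision, strips the shared
    elements from both subsets.  Both remainders are non-empty, since
    two distinct sets of positive integers with equal sums cannot be
    nested.  For a 10-element subset of {1,...,100}, the pigeonhole
    count 2^10 - 1 = 1023 > 1000 guarantees a collision exists.
    """
    seen = {}  # subset sum -> the subset (a tuple) that achieved it
    for r in range(1, len(A) + 1):
        for sub in combinations(sorted(A), r):
            s = sum(sub)
            if s in seen:
                shared = set(sub) & set(seen[s])
                return set(sub) - shared, set(seen[s]) - shared
            seen[s] = sub
    return None

# Any 10-element subset of {1,...,100} must yield such a pair:
S, T = disjoint_equal_sum_pair({3, 9, 14, 21, 26, 35, 42, 59, 63, 76})
```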
We actually can do even a tiny bit better than that: for J=4-element subsets, the variance bound 4*(N-1)^2/4 = (N-1)^2 can be reduced to ((N-1)^2 + (N-3)^2)/2, the maximum variance occurring when the 4 elements are N, N-1, 1, and 2. The probability that we are within k standard deviations of the mean is >= 1 - 1/k^2 (http://en.wikipedia.org/wiki/Chebyshev's_inequality). This means we can, by choosing k optimally, "amplify" the effectiveness of the above naive bound. For example, with M=9 there are 2^9 - 1 = 511 nonempty subsets. Of those 511 subset sums, at least 511*(1 - 1/1.746^2) = 343.377 of them, i.e. at least 344 in view of integrality, lie within 1.746 standard deviations, i.e. less than +-1.746*(N-1.5) = 1.746*98.5 = 171.98 off the mean, i.e. within +-171 in view of integrality. Since 344 > 171 + 171 + 1 = 343, the theorem is (just barely!) proven -- even 9-element subsets must have property P, so the answer to question (2) is yes!

-----

I have a sneaking suspicion Miller had a modular argument in mind. I also suspect that by using tail bounds better than Chebyshev we could do a bit better still. You might want to consider those possibilities... by milking them all, what is the best bound one can get?
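The Chebyshev arithmetic above can be checked mechanically. A sketch (my own; the constant k = 1.746 and the conservative standard-deviation bound 98.5 are taken from the text, as is its +-171 integrality rounding):

```python
import math

M, N = 9, 100
k = 1.746
sigma_bound = 98.5          # conservative std-dev bound, N - 1.5
subsets = 2**M - 1          # 511 nonempty subsets of the 9-set

# Chebyshev: at least a (1 - 1/k^2) fraction lie within k sigma of the mean.
within = math.ceil(subsets * (1 - 1 / k**2))

# Those (integer) sums deviate by less than 1.746 * 98.5 = 171.98,
# i.e. by at most 171, so they occupy at most 171 + 171 + 1 slots.
radius = math.floor(k * sigma_bound)
pigeonholes = 2 * radius + 1

# More subsets than slots forces two equal subset sums (just barely).
assert within > pigeonholes
```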
Warren Smith