## Closest Two-Sum To Zero

### June 23, 2015

Before we write a solution, here is a function that returns a random list of *k* integers with absolute value less than *n*; it uses `range`

and `randint`

from the Standard Prelude.

`(define (rand-list k n)`

(map (lambda (r) (randint (- 1 n) n)) (range k)))

The brute-force solution compares each integer to each other integer and keeps track of the smallest pair:

`(define (brute-force xs)`

(let ((min-sum (apply max xs)) (min-pair (list)))

(do ((ys xs (cdr ys))) ((null? ys) min-pair)

(do ((zs (cdr ys) (cdr zs))) ((null? zs))

(let ((s (abs (+ (car ys) (car zs)))))

(when (< s min-sum)

(set! min-sum s)

(set! min-pair (list (car ys) (car zs)))))))))

That takes time O(*k*^{2}), and is useful for testing. A better algorithm sorts by the absolute value, then scans:

`(define (sort-and-scan xs)`

(define (lt? x y) (< (abs x) (abs y)))

(let loop ((xs (sort lt? xs))

(min-sum (apply max xs))

(min-pair (list)))

(if (null? (cdr xs)) min-pair

(let ((s (abs (+ (car xs) (cadr xs)))))

(if (< s min-sum)

(loop (cdr xs) s (list (car xs) (cadr xs)))

(loop (cdr xs) min-sum min-pair))))))

That takes time O(*k* log *k*). Here is an example:

`> (let ((xs (rand-list 7 100)))`

(display xs) (newline)

(display (brute-force xs)) (newline)

(display (sort-and-scan xs)) (newline))

(45 -29 -96 -7 -17 72 -60)

(72 -60)

(-60 72)

You can run the program at http://ideone.com/s2iFaQ.

I have a question for the mathematicians who read this: What is the expected minimum sum for a given k and n? For instance, given n = 1000 and k = 17, what is the expected minimum sum? In ten trials I had minimum sums 12, 0, 7, 6, 16, 1, 10, 7, 8 and 8, for an average of 7.5. Is that normal?

I think the expected minimum sum is zero, since you’re picking k independent uniformly from [-n, n].

I’m too under-caffeinated to properly work out the full distribution, though. As for “is that normal?”, I

think you may be falling prey to small sample size; with only 10 trials, it’s hard to get an idea of what

the distribution looks like.

Here’s an implementation in vanilla Python:

And for conducting a large number of experiments, we turn to the excellent numpy and pandas

libraries:

In one run, I got a mean of 0.03161 with a standard deviation of 10.806607.

Thinking a bit more about the distribution, it makes sense that the expected min sum would be zero.

A sum of 2n is only possible if two different entries are both plus or minus n, but there are n different

ways to get a sum of zero: plus and minus x for any x in {0, …, n}. This distribution is also symmetric

about the middle.

@Graham: Does k matter? If k is very small, there won’t be very many ways to make 0. Is there some minimum k required to guarantee an expected min sum of 0?

My intuition is that k doesn’t matter for the expected minimum sum to be zero; it should always

be zero. The variance probably plays a big role, though; if it’s relatively the same size as k, the

expected min sum may not be all that much more likely than other values. If I get a bit, I’ll take a

stab at the full conditional distribution.

If you look at the average of the absolute sums you get something close to 7.45. If you take the average of the sum, you come close to zero, of course. I have googled a little to find papers on problems like this, but did not find any yet.

My solution – sort the list and scan from each end:

The problem of finding the expected value of the minimum sum, feels like a “birthday paradox” type problem. In particular, the variation in which you have a room of m men and w women, and want to know the probability that a man and a woman share a birthday. E.g., consider men, women to be the random numbers zero, respectively, and their absolute value to be the birthday. See: https://en.wikipedia.org/wiki/Birthday_problem#Generalization_to_multiple_types

WordPress ate the less than and greater than symbols. I was trying to say let the random integers less than zero represent the men, and the random integers greater than zero represent women. Then the problem maps fairly well onto the case in the Wikipedia article. Not sure how to handle zero, maybe alternate so first zero is male, second is female, etc.

@praxis – In your code min-sum is the absolute value of the minimum sum, so your min-sum’s are all positive. I think that explains why your expected min-sum wasn’t what you expected.

In @Graham’s experiment, he used the actual minimum sum, which can be positive or negative. Therefor his expected value is 0.

This IPython notebook shows has more info: http://nbviewer.ipython.org/gist/anonymous/4abb2d282e72ba68cb5f

Solution – generate array; compare each item with every item after it, and compare to the closest pair of integers.

I think this problem is close to the problem to find the smallest distance between the points from an array. A nice explanation of the formula for that problem is given in MathOverflow.

We have m cards (in our case 2000) with k marked cards (17 in our case). If we deal them out in the following way we get a result where the cards have at least a given distance d. We will assume that cards dealt from the bottom are safe (not marked). Start to deal from the top. When a marked ard is dealt we deal d “safe” cards from the bottom. When all the cards are dealt we have the desired result, if the (k – 1) d cards at the bottom were indeed safe cards (not marked). The probabillity that no marked cards are in the bottom (k-1)d cards is the probability that we can deal cards with a minimum distance of d. From this you get a minimum distance value of L / (k^2-1), which is exact, if you let go m to infinity – L is the length of the interval (2000 in our case).

For our problem the situation is slightly different, but we can use the same argumentation. Our deal starts from positions 0, -1, +1, -2, +2, ….-2000, +2000. When we hit a marked card and the position is positive, we deal d cards on the negative side (we fill positions -pos-1, …-pos-d). When we hit a marked card and the position is negative, we deal d+1 cards on the positive side (we fill positions -pos, -pos-1, …,-pos-d). So on average we fill d+1/2 positions i.so. d positions for the minimum distance case. After that we can safely fill d cards at the same side (as the marked card) from the top. Tthere are 2 complications, which are not handled yet. The first is, that if the frist marked card is very early more safe cards have to be used, as a very small positive or a few very small negative numbers can result in a low sum too. The second complication is, that if after a marked card within d cards another marked card follows, the number of safe cards that have to be used the fill positions on the other side is less.

All in all, it is to be expected that the minimum distance problem is close to the closest to zero sum problem . As we have to deal slightly more “safe” cards for our problem, is is expected that the expectation value will be higher. As you can see from the table (m=2000) for all values of k the simulation value for our problem is in between the values for the minimum distance for k-1 and k. (all simulations were done with floats i.s.o. ints; the differences are negligable)

I think that the easiest way to solve the problem directly to iterate over all the elements

function closest(input) {

var v1, v2, sum, result;

for (var idx1 = input.length; idx1 >= 0; idx1–) {

for (var idx2 = input.length; idx2 >= 0; idx2–) {

if (idx1 != idx2) {

v1 = input[idx1];

v2 = input[idx2];

sum = v1 + v2;

if (sum >= 0 && (!result || result[2] > sum)) {

result = [v1, v2, sum];

}

}

}

}

return result;

}

var input = [45, -29, -96, -7, -17, 72, -60];

console.log(closest(input));

A solution in (Racket) Scheme.