Motviation

A Sorted Tuple Missing an Element Has a Change Point

Suppose that

a \in {[1 \dots n]}^{n - 1}

and that

set (a) = [1 \dots n] ∖ {x}

for some

x \in [1 \dots n]

, and that

a

is sorted in ascending order. Then there exists some

i \in [1 \dots n]

such that

\forall j \in [n]

j \leq i

then

a [j] = j

and if

i < j

then

a [j] = j + 1

The above proposition says in plain english that if we have a sorted tuple that is missing one element, then there is some index $i$ such that the index equals the value at that position up to or equal to index $i$ and after that the value is the index plus one. Note that the above talks about numbers $[1 \dots n]$ , in our problem we use numbers from the set ${0, \dots, m}$ , the situation is analogous.

The idea is that we need to find the "change point", it can be seen that if $a [j] = j$ then the change point can only be to the right, otherwise it must be to the left, this in conjuction with the fact that $a$ is sorted allows us to employ binary search to find the specific element. We've proven in the past that this algorithm is correct, therefore it will correctly identify the change point.

Pseudocode

        def find_missing_rec(arr, start_idx, end_idx):

            if start_idx == end_idx:
                return arr[start_idx] - 1

            mid_idx = (start_idx + end_idx) / 2

            if arr(mid_idx) == mid_idx:
                find_missing_rec(arr, mid_point + 1, end_idx)
            else:
                find_missing_rec(arr, start_idx, mid_idx)

        def find_missing(arr):
            sort(arr)
            find_missing_rec(arr, 0, len(arr) - 1)

Runtime

Note that find_missing_rec operates the same way that binary search does, note that binary seach has $O (\log_{2} n)$ runtime. But we also must sort the array before passing it to find_missing_rec this makes the runtime $O (n \log_{2} n)$

Motivation and Corretness

Equal Number of Ones and Zeros

Suppose that

X := b a s e_r e p r ([0 \dots 2^{l} - 1], 2) \subseteq B

where we are taking the image of the base representation function. Then we claim that for any

i \in [1 \dots l]

we have

| {x \in X : x [i] = 0} | = | {x \in X : x [i] = 1} | = 2^{l - 1}

With the above claim in hand, it's not hard to see that since $B$ is missing one element then it's size is $2^{l} - 1$ , thus by fixing an index $i$ and then partitioning by those that have $0$ at position $i$ and those that have a $1$ (let us denote the former by $0_{i}$ and the latter with $1_{i}$ , then there will be an imbalance. By that we mean that one partition have $\pm 1$ the size of the other.

This imbalance is key, suppose $0_{i}$ has one less element than $1_{i}$ , then we observe that the missing element must have a $0$ at position $i$ otherwise it would have increased the size of $0_{i}$ which would have removed the imbalance (which is a contradiction).

Therefore by repeating this process for each index $i$ we can obtain what the missing value's bit is at index $i$ . Note that whenever we finish finding the value at index $i$ , we no longer have to use index $i$ ever again. Given $0_{i}$ , if we construct the set $0_{i}^{'}$ by simply extracting the $0$ at position $i$ from each binary number, and then doing the same but extracting the $1$ 's in $1_{i}$ to obtain $1_{i}^{'}$ , then $0_{i}^{'} = 1_{i}^{'}$ , this is true if we consider generating binary numbers by taking the previous binary numbers and extending them by appending a $1$ or a $0$ at a specific position to produce to produce binary numbers of one greater length.

The upshot from the above is that we can simply select one of $0_{i}$ , $1_{i}$ when we continue our search. With this reduction we can see that the size of our search cuts down by half each time.

We could simply conclude that by the master theorem that our runtime is in $O (m)$ , but it more visually pleasing to see a direct calculation. Note that in the first iteration we must go through each binary number in the list, which there are $m$ of, thus we take $m$ steps, now we throw out half of our space, and do the same, thus this takes $\frac{m}{2}$ iterations, we do this exactly $\log_{2} m = l$ times, which yields the finite summation of $\sum_{i = 0}^{l} \frac{m}{2^{i}} = m \sum_{i = 0}^{l} \frac{1}{2^{i}}$

Note that the summation being multiplied by $m$ is bounded by $2$ $\sum_{i = 0}^{l} \frac{1}{2^{i}} = \frac{\frac{1}{2^{l + 1}} - 1}{\frac{1}{2} - 1} = \frac{1 - \frac{1}{2^{l + 1}}}{\frac{1}{2}} = 2 - \frac{1}{2^{l}} \leq 2$

Therefore we can conlcude that our runtime is $O (2 m) = O (m)$

Pseudocode

        def find_missing_binary_number(binary_numbers: List[String], initial_size, search_idx: int, partial_solution: String):

            length_of_individual_binary_numbers = log_2(initial_size)

            if search_idx == length_of_individual_binary_numbers - 1: // we have a full number now.
                return partial_solution

            num_zeros = 0
            num_ones = 0
            for i in len(binary_numbers):
                if bit_search(i, search_idx) == 1:
                    num_ones += 1
                else:
                    num_zeros += 1

            assert(num_zeros != num_ones) // because there is a single missing one

            missing_a_zero =  num_zeros < num_ones

            if missing_a_zero:
                partial_solution += "0"
            else:
                partial_solution += "1"

            binary_numbers_to_search = []
            if missing_a_zero:
                binary_numbers_to_search = filter(lambda b: b[search_idx] == 0, binary_numbers)
            else:
                binary_numbers_to_search = filter(lambda b: b[search_idx] == 1, binary_numbers)

            return find_missing_binary_number(binary_numbers_to_search, search_idx + 1, partial_solution)

The returned number has same string length, but is not equal to any binary number in the input list therefore it is the missing one.