The above proposition says in plain english that if we have a sorted tuple that is missing one element, then there is some index such that the index equals the value at that position up to or equal to index and after that the value is the index plus one. Note that the above talks about numbers , in our problem we use numbers from the set , the situation is analogous.
The idea is that we need to find the "change point", it can be seen that if then the change point can only be to the right, otherwise it must be to the left, this in conjuction with the fact that is sorted allows us to employ binary search to find the specific element. We've proven in the past that this algorithm is correct, therefore it will correctly identify the change point.
def find_missing_rec(arr, start_idx, end_idx): if start_idx == end_idx: return arr[start_idx] - 1 mid_idx = (start_idx + end_idx) / 2 if arr(mid_idx) == mid_idx: find_missing_rec(arr, mid_point + 1, end_idx) else: find_missing_rec(arr, start_idx, mid_idx) def find_missing(arr): sort(arr) find_missing_rec(arr, 0, len(arr) - 1)
Note that find_missing_rec
operates the same way that binary search does, note that binary seach has runtime. But we also must sort the array before passing it to find_missing_rec
this makes the runtime
With the above claim in hand, it's not hard to see that since is missing one element then it's size is , thus by fixing an index and then partitioning by those that have at position and those that have a (let us denote the former by and the latter with , then there will be an imbalance. By that we mean that one partition have the size of the other.
This imbalance is key, suppose has one less element than , then we observe that the missing element must have a at position otherwise it would have increased the size of which would have removed the imbalance (which is a contradiction).
Therefore by repeating this process for each index we can obtain what the missing value's bit is at index . Note that whenever we finish finding the value at index , we no longer have to use index ever again. Given , if we construct the set by simply extracting the at position from each binary number, and then doing the same but extracting the 's in to obtain , then , this is true if we consider generating binary numbers by taking the previous binary numbers and extending them by appending a or a at a specific position to produce to produce binary numbers of one greater length.
The upshot from the above is that we can simply select one of , when we continue our search. With this reduction we can see that the size of our search cuts down by half each time.
We could simply conclude that by the master theorem that our runtime is in , but it more visually pleasing to see a direct calculation. Note that in the first iteration we must go through each binary number in the list, which there are of, thus we take steps, now we throw out half of our space, and do the same, thus this takes iterations, we do this exactly times, which yields the finite summation of
Note that the summation being multiplied by is bounded by
Therefore we can conlcude that our runtime is
def find_missing_binary_number(binary_numbers: List[String], initial_size, search_idx: int, partial_solution: String): length_of_individual_binary_numbers = log_2(initial_size) if search_idx == length_of_individual_binary_numbers - 1: // we have a full number now. return partial_solution num_zeros = 0 num_ones = 0 for i in len(binary_numbers): if bit_search(i, search_idx) == 1: num_ones += 1 else: num_zeros += 1 assert(num_zeros != num_ones) // because there is a single missing one missing_a_zero = num_zeros < num_ones if missing_a_zero: partial_solution += "0" else: partial_solution += "1" binary_numbers_to_search = [] if missing_a_zero: binary_numbers_to_search = filter(lambda b: b[search_idx] == 0, binary_numbers) else: binary_numbers_to_search = filter(lambda b: b[search_idx] == 1, binary_numbers) return find_missing_binary_number(binary_numbers_to_search, search_idx + 1, partial_solution)
The returned number has same string length, but is not equal to any binary number in the input list therefore it is the missing one.