TransWikia.com

The fastest way to find one unique value when all other values are the same

Code Review Asked by Jacques on January 4, 2022

I have completed this question and I am wondering what is the fastest way to solve it.

The question is "There is an array with some numbers. All numbers are equal except for one. Try to find it!"

Example:

find_uniq([ 1, 1, 1, 2, 1, 1 ]) == 2
find_uniq([ 0, 0, 0.55, 0, 0 ]) == 0.55

I came up with the solution:

from collections import Counter

def find_uniq(arr):
    nums = list(Counter(arr).items())
    data = [i for i in nums if i[1] == 1]
    return data[0][0]

I decided on using Counter because I felt comfortable using it but when looking at others answers some use sets and others use counter as well.

I am wondering is my code sufficient and which method to solving this question would lead to the fastest time complexity?

9 Answers

Why do n comparisons when you only need ~n/2? We can compare every pair of elements until we find a non-matching pair, then "short-circuit" and return whichever element is unique.

def find_uniq(arr):
    common = arr[0]
    if common not in arr[1:3]:
        return common
    for a, b in zip(arr[1::2], arr[2::2]):
        if a != b:
            if a == common:
                return b
            else:
                return a
    return arr[-1]

A further improvement would be to use iter to avoid copies of arr being made in the zip statement.

def find_uniq(arr):
    iterator = iter(arr)
    common = next(iterator)
    if common not in arr[1:3]:
        return common
    for a, b in zip(iterator, iterator):
        if a != b:
            if a == common:
                return b
            else:
                return a
    return arr[-1]

Answered by Kyle G on January 4, 2022

First, check that there are, at least, 3 elements otherwise this is undefined!

Personally, I would check the first and second elements:

  1. If different: one of them is the one you are looking for. Compare with third element.
  2. If equal: iterate over all elements until you find it.

This seems to be the most optimal solution:

from collections.abc import Iterable

def unique_different(iterable):
    assert isinstance(iterable, Iterable)
    assert len(iterable) > 2
    if iterable[0] != iterable[1]:
        return iterable[0] if iterable[1] == iterable[2] else iterable[1]
    else
        for element in iterable[2:]:
            if element != iterable[1]:
                return element
```

Answered by Oussama Ennafii on January 4, 2022

This is my first time posting here, so please let me know if there are any conventions I'm missing.

Here is my solution, which doesn't need to traverse the entire array except by using the built-in sum() function:

def find_uniq(listToSearch):
    if len(listToSearch) < 3:
        return 'Cannot have one unique value unless there are at least three values.'
    
    #out of three values, minimum of two must be the same
    if listToSearch[0] == listToSearch[1]:
        commonValue = listToSearch[0]
    elif listToSearch[0] == listToSearch[2]:
        commonValue = listToSearch[0]
    elif listToSearch[1] == listToSearch[2]:
        commonValue = listToSearch[1]
    else:
        return 'Array has more than one unique value'
    
    numberOfCommonItems = len(listToSearch) - 1;
    uniqueValue = sum(listToSearch) - numberOfCommonItems * commonValue
    return uniqueValue

These are the test cases I've tried:

find_uniq([ 1, 1, 1, 2, 1, 1 ])
find_uniq([ 0, 0, 0.55, 0, 0 ])
find_uniq([ 0, 0, -0.55, 0, 0 ])
find_uniq[ 1, 1.0, 1, 2, 1, 1 ])

And these are the outputs:

2
0.55
-0.55
2.0

This solution is O(n) as it only has to perform one extra addition per extra element of the array. Besides that, assuming the data format is valid, there are a maximum of four if statements, one multiplication operation and one subtraction operation.

Answered by Mehmet on January 4, 2022

Benchmarks!

Benchmarks for lists with a thousand or a million elements, with the unique element in the middle of the array to reflect the "typical"/"average" case. The results are times, so lower=faster.

n=1000
0.90 find_uniq_Jacques
1.18 find_uniq_tinstaafl_1
0.59 find_uniq_tinstaafl_2
0.88 find_uniq_GZ0_1
0.14 find_uniq_GZ0_2
0.88 find_uniq_Peilonrayz
0.22 find_uniq_RootTwo
0.26 find_uniq_HeapOverflow_1
0.28 find_uniq_HeapOverflow_2
0.26 find_uniq_HeapOverflow_3
0.09 find_uniq_HeapOverFlow_Codewars
0.06 find_uniq_HeapOverflow_GZ0
0.57 unique_different_ethiy
0.28 find_uniq_KyleG_1
0.25 find_uniq_KyleG_2

n=1000000
0.94 find_uniq_Jacques
1.36 find_uniq_tinstaafl_1
0.68 find_uniq_tinstaafl_2
0.99 find_uniq_GZ0_1
0.19 find_uniq_GZ0_2
0.98 find_uniq_Peilonrayz
0.19 find_uniq_RootTwo
0.23 find_uniq_HeapOverflow_1
0.26 find_uniq_HeapOverflow_2
0.25 find_uniq_HeapOverflow_3
0.09 find_uniq_HeapOverFlow_Codewars
0.04 find_uniq_HeapOverflow_GZ0
0.57 unique_different_ethiy
0.28 find_uniq_KyleG_1
0.22 find_uniq_KyleG_2

Done with Python 3.8.1 32 bit on Windows 10 64 bit.

Benchmark code:

from timeit import timeit
from collections import Counter
from itertools import groupby

solutions = []
def register(solution):
    solutions.append(solution)
    return solution

@register
def find_uniq_Jacques(arr):
    nums = list(Counter(arr).items())
    data = [i for i in nums if i[1] == 1]
    return data[0][0]

@register
def find_uniq_tinstaafl_1(arr):
    for i in range(len(arr)-1):
        if arr[i] != arr[i+1]:
            if i == 0 and arr[i] != arr[i + 2]:
                return arr[i]
            return arr[i + 1]

@register
def find_uniq_tinstaafl_2(arr):
    for i in range(0,len(arr) - 1, 2):
        if arr[i] != arr[i+1]:
            if i == 0:
                if arr[i] != arr[i + 2]:
                    return arr[i]
                return arr[i + 1]
            else:
                if arr[i] != arr[i-1]:
                    return arr[i]
                return arr[i + 1]
    return arr[-1]

@register
def find_uniq_GZ0_1(arr):
    return next(k for k, freq in Counter(arr).items() if freq == 1)

@register
def find_uniq_GZ0_2(arr):
    group_iter = groupby(arr)
    k1, g1 = next(group_iter)
    c1 = len(list(g1))
    k2, g2 = next(group_iter)
    if c1 > 1:
       # Group g1 has more than one element
       return k2
    try:
       # Group g2 has more than one element
       next(g2)
       next(g2)
       return k1
    except StopIteration:
       # Both g1 and g2 has one element
       return k2 if next(group_iter)[0] == k1 else k1

@register
def find_uniq_Peilonrayz(arr):
    return Counter(arr).most_common()[-1][0]

@register
def find_uniq_RootTwo(arr):
    a, b = set(arr)
    return a if arr[:3].count(a) < 2 else b

@register
def find_uniq_HeapOverflow_1(arr):
    a = arr[0]
    if a not in arr[1:3]:
        return a
    for b in arr:
        if b != a:
            return b

@register
def find_uniq_HeapOverflow_2(arr):
    dupe = sorted(arr[:3])[1]
    for x in arr:
        if x != dupe:
            return x

@register
def find_uniq_HeapOverflow_3(arr):
    a = arr[0]
    for b in arr:
        if b != a:
            return b if a in arr[1:3] else a

@register
def find_uniq_HeapOverFlow_Codewars(arr):
    arr.sort()
    return arr[-(arr[0] == arr[1])]

@register
def find_uniq_HeapOverflow_GZ0(arr):
    group_iter = groupby(arr)
    k1, _ = next(group_iter)
    k2, g2 = next(group_iter)
    next(g2)
    return k1 if k2 in g2 else k2

@register
def unique_different_ethiy(iterable):
    # assert isinstance(iterable, Iterable)
    # assert len(iterable) > 2
    if iterable[0] != iterable[1]:
        return iterable[0] if iterable[1] == iterable[2] else iterable[1]
    else:
        for element in iterable[2:]:
            if element != iterable[1]:
                return element

@register
def find_uniq_KyleG_1(arr):
    common = arr[0]
    if common not in arr[1:3]:
        return common
    for a, b in zip(arr[1::2], arr[2::2]):
        if a != b:
            if a == common:
                return b
            else:
                return a
    return arr[-1]

@register
def find_uniq_KyleG_2(arr):
    iterator = iter(arr)
    common = next(iterator)
    if common not in arr[1:3]:
        return common
    for a, b in zip(iterator, iterator):
        if a != b:
            if a == common:
                return b
            else:
                return a
    return arr[-1]

# Run the benchmarks
for e in 3, 6:
    n = 10**e
    number = 10**(7 - e)  # fewer number of runs for larger n
    print(f'{n=}')
    arr = [0] * n
    arr[n // 2] = 1

    # Repeat round-robin to reduce effects of CPU speed changes etc
    timeses = [[] for _ in solutions]
    for i in range(20):
        for solution, times in zip(solutions, timeses):
            arrs = iter([arr[:] for _ in range(number)])
            t = timeit(lambda: solution(next(arrs)), number=number)
            times.append(t)
        print(i, end=' ')
    print()
    for solution, times in zip(solutions, timeses):
        print('%.2f' % min(times), solution.__name__)
    print()

Answered by Kelly Bundy on January 4, 2022

Another one going only as far as necessary, with O(1) to check whether the first value is the outlier and otherwise simple O(n) to search for the outlier.

def find_uniq(arr):
    a = arr[0]
    if a not in arr[1:3]:
        return a
    for b in arr:
        if b != a:
            return b

Slight variation, getting the duplicate value from the first three and then searching the non-dupe:

def find_uniq(arr):
    dupe = sorted(arr[:3])[1]
    for x in arr:
        if x != dupe:
            return x

Another variation, finding a difference pair first:

def find_uniq(arr):
    a = arr[0]
    for b in arr:
        if b != a:
            return b if a in arr[1:3] else a

Optimized version of this, also O(n) because, you know, Timsort:

def find_uniq(arr):
    arr.sort()
    return arr[-(arr[0] == arr[1])]

Optimized version of GZ0's groupby solution, faster and taking only O(1) space:

def find_uniq(arr):
    group_iter = groupby(arr)
    k1, _ = next(group_iter)
    k2, g2 = next(group_iter)
    next(g2)
    return k1 if k2 in g2 else k2

Answered by Kelly Bundy on January 4, 2022

No matter how the array is traversed, the distinguished element can occur at the end of the traversal. Therefore, it is necessary to go through the entire array in the worst case and there does not exist an algorithm that can have a better worst-case time complexity than $n$. However, in practise, the actual runtime of your implementation can be improved, as well as the average-case time complexity.

Firstly, your solution converts the key-value pairs of Counter(arr) into a list. Assuming the input is well-formed, this conversion is unnecessary since it is sufficient to return the first key that has a corresponding frequency value of 1. The improved implementation is as follows:

def find_uniq(arr):
    return next(k for k, freq in Counter(arr).items() if freq == 1)

Secondly, creating a Counter requires going through the entire input array. In most cases, this can be avoided by returning the distinguished element once it is found, as mentioned in the previous answer. This approach improves the average-case time complexity by a constant factor of 2. Note that if the time complexity is described using the $O(cdot)$ and $Theta(cdot)$ notations there is no difference, since these notations only characterize the asymptotic order of growth of runtime given the input size. More explanations can be found here.

A Python-specific efficient implementation of this improved approach is to use the itertools.groupby function, as shown in the following. It avoids an explicit for-loop in Python, which is typically slower than an implicit-loop-based implementation, such as Counter(arr).

from itertools import groupby

def find_uniq(arr):
    group_iter = groupby(arr)
    k1, g1 = next(group_iter)
    c1 = len(list(g1))
    k2, g2 = next(group_iter)
    if c1 > 1:
       # Group g1 has more than one element
       return k2
    try:
       # Group g2 has more than one element
       next(g2)
       next(g2)
       return k1
    except StopIteration:
       # Both g1 and g2 has one element
       return k2 if next(group_iter)[0] == k1 else k1

Update: @HeapOverflow provides an improved version of this implementation in his answer.

Answered by GZ0 on January 4, 2022

One of things about the solutions presented so far, is they all require iterating over all the elements at least once.

Using an iterative approach allows you to short circuit the loop when the unique item is found. something like this would work:

def find_uniq(arr):
    for i in range(len(arr)-1):
        if arr[i] != arr[i+1]:
            if i == 0 and arr[i] != arr[i + 2]:
                return arr[i]
            return arr[i + 1]]

Did some thinking and came up with an optimization which improves the time considerably:

def find_uniq(arr):
    for i in range(0,len(arr) - 1, 2):
        if arr[i] != arr[i+1]:
            if i == 0:
                if arr[i] != arr[i + 2]:
                    return arr[i]
                return arr[i + 1]
            else:
                if arr[i] != arr[i-1]:
                    return arr[i]
                return arr[i + 1]
    return arr[-1] 

The complexity of this in the worst case is O(n) the lengthy of the array - 1.

Answered by user33306 on January 4, 2022

A Counter is basically a "multiset". The question doesn't ask for a count of the numbers, so counting them may be extra overhead. Here's an possible set implementation:

def find_uniq(arr):
    a, b = set(arr)
    return a if arr[:3].count(a) < 2 else b

Both implementations pass through the list once, so they are O(n) time complexity. Your list comprehension, my .count(a), and @Peilonrays' .most_common() are insignificant for large n.

Answered by RootTwo on January 4, 2022

You can use .most_common to remove the need for the list comprehension. This makes the code easier to read. You will still need to use [0] as it will return a tuple of the key and the value.

def find_uniq(arr):
    return Counter(arr).most_common()[-1][0]

Answered by Peilonrayz on January 4, 2022

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP