TransWikia.com

Checking if an integer is a palindrome using either a string or a dict

Code Review Asked on October 27, 2021

I have solved the palindrome question on LeetCode today, but I wasn’t happy with the original solution – I used strings, it worked slowly and used a lot of memory.

So I rewrote the answer using a dictionary approach instead knowing that looking up a value in a dictionary has lower time complexity than searching through a list, but the results were strange – my latest answer is much better in terms of memory use (better than 94.65% of other submissions), but it became significantly slower. As you can see in the image below, my last submission is much slower than the first, but memory use has improved a lot:

enter image description here

My original solution was this:

class Solution:
    def isPalindrome(self, x: int) -> bool:        
        s = str(x)
        
        if len(s) == 1:
            palindrome = True
            
        i = int(len(s)/2)
        
        for p in range(i):
            if s[p] != s[-(p+1)]:
                return False
            else:
                palindrome = True
        
        return palindrome 

And the updated version is:

class Solution:
    def isPalindrome(self, x: int) -> bool:        
        s = {i:j for i,j in enumerate(str(x))} 
        i = (max(s)+1)//2
    
        if max(s)+1 == 1:
            return True
        
        for p in range(i):
            if s[p] != s[max(s)-p]:
                return False
            else:
                palindrome = True
        
        return palindrome

Can anyone explain why the latest version is so much slower? I thought it should be faster…

What should I change to improve the speed?

EDIT:
New version based on @hjpotter comment. Now I only have 1 call to max() in the entire script and yet it’s even worse – 168ms and 13.9 Mb. I don’t understand what’s happening here:

class Solution:
    def isPalindrome(self, x: int) -> bool:
        s = {i:j for i,j in enumerate(str(x))} 
        m = max(s)
        i = (m+1)//2
    
        if m+1 == 1:
            return True
        
        for p in range(i):
            if s[p] != s[m-p]:
                return False
            else:
                palindrome = True
        
        return palindrome

6 Answers

Lol you believe LeetCode's timing is proper? You're so cute.

If you actually want to see speed differences between those solutions, with LeetCode's test cases and on LeetCode, I suggest running the solution 100 times on each input:

class Solution:
    
    def isPalindrome(self, x: int) -> bool:        
        for _ in range(100):
            result = self.real_isPalindrome(x)
        return result
    
    def real_isPalindrome(self, x: int) -> bool:        
        # Your real solution here

And then you should still submit several times and take the average or so, as times for the exact same code can differ a lot depending on server load.

I did that with your three solutions and the times were about 1200 ms, 4300 ms, and 2600 ms. Since that did 100 times the normal work, the actual times of the solutions were about 12 ms, 43 ms, and 26 ms. The difference between that and the time shown for a regular submission of a solution is the judge overhead, which LeetCode brilliantly includes in the time.

I also submitted your first solution normally a few times, got 60 ms to 72 ms. So if 100 runs of your solution take 1200 ms and one run takes 66 ms, then you have:

JudgeOverhead + 100 * Solution = 1200 ms
JudgeOverhead + Solution = 66 ms

Subtract the second equation from the first:
=> 99 * Solution = 1134 ms

Divide by 99:
=> Solution = 11.45 ms

And:
=> JudgeOverhead = 66 ms - 11.45 ms = 54.55 ms

Oh and the ordinary

    def isPalindrome(self, x: int) -> bool:
        s = str(x)
        return s == s[::-1]

solution got about 420 ms with the 100-times trick. Subtract 55 ms judge overhead and divide by 100 and you're at 3.65 ms for the solution. So in that case, the judge overhead absolutely dwarfs the actual solution time, and together with the time variance, the shown time for a normal single-run submission is absolutely meaningless.

Answered by Kelly Bundy on October 27, 2021

You're mixing up searching and accessing. If you're looking for the value for a index/key, that's accessing. If you're looking for a particular value, without knowing its index/key, that's searching. Accessing is O(1) for both lists and dictionaries[1][2]. While they are the same time complexity, using a dictionary can take longer, due to the overhead of hashing, and the fact that a hash table takes up more space than list. Under the hood, a dictionary is an array with the indices being the hash values of the keys, and the entries being the values for the indices that are the hag value of a key, and null for indices that aren't. When you access a value in a dictionary, it takes the key you give it, hashes it, then looks at the list for the entry there.

[1] Accessing is treated as being constant time, but that is somewhat of a simplification. Larger data sets do have longer access times. For one thing, if the data become large enough, it'll have to be moved out of active memory, so strictly speaking the time complexity is a non-constant function of the size of the list.

Answered by Acccumulation on October 27, 2021

Apart from performance, there are several other problems in your implementation which the other answers did not address.

  1. This should not be a class. It should be a function, because it being a class offers no advantage whatsoever and it makes its usage more complex.
  2. The variable palindrome is unnecessary and confuses the reader: this variable is always going to be True. So remove it.
  3. Don’t cast the result of a division to int. Use integer division instead.
  4. The if len(s) == 1 check is redundant: its logic is already subsumed in the subsequent code.
  5. Use descriptive variable names: s for an arbitrary string is fine but i to denote the middle index of a sequence is cryptic.

An improved implementation (without changing the algorithm) would look as follows:

def is_palindrome(n: int) -> bool:
    s = str(n)
    mid = len(s) // 2
    for i in range(mid):
        if s[i] != s[-i - 1]:
             return False
    return True

Or, using a generator expression:

def is_palindrome(n: int) -> bool:
    s = str(n)
    mid = len(s) // 2
    return all(s[i] == s[-i - 1] for i in range(mid))

… and of course this whole loop could also be replaced with return s[:mid] == s[: -mid - 1 : -1] because Python gives us powerful subsetting.


I also want to address a misconception in your question:

looking up a value in a dictionary has lower time complexity than searching through a list,

That’s only true if the dictionary has the same size as the sequence (more or less …), and if looking up the key is constant. Neither is true in your case. But what’s more, you still need to construct the list/dictionary, and you completely omit the complexity of these operations.

In fact, the first thing your dictionary does is also construct the sequence — namely, by invoking str(x). It then goes on to traverse the whole list! Because that’s what i:j for i,j in enumerate(str(x)) does.

So before even starting your algorithm proper, it has already performed more work than your other implementation which, after all, only needs to traverse half the string. Even if you stopped at this point and returned, the second implementation would already be slower than the first, and you haven’t even finished constructing the dictionary yet (constructing the dictionary from key: value mappings also takes time).

As for the actual algorithm, you’re still traversing half the string, so once again you are doing the same work as your initial algorithm. This means you have the same asymptotic complexity. But even that part of the second algorithm is slower, because a dictionary is a much more complex data structure than a plain sequence, and each key access of the dictionary (s[p]) is substantially slower than the same access on a string or list would be.

Answered by Konrad Rudolph on October 27, 2021

Dictionary key access is slower than array index access. Both is in O(1), but the constant time is higher for the dictionary access.

Is there actually a difference?

I'm not sure how leetcode runs the code, so let's make a proper speed analysis:

enter image description here

generated by

# Core Library modules
import operator
import random
import timeit

# Third party modules
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns


def is_palindrome_str(x: int) -> bool:
    s = str(x)

    if len(s) == 1:
        palindrome = True

    i = int(len(s) / 2)

    for p in range(i):
        if s[p] != s[-(p + 1)]:
            return False
        else:
            palindrome = True

    return palindrome


def is_palindrome_dict(x: int) -> bool:
    s = {i: j for i, j in enumerate(str(x))}
    i = (max(s) + 1) // 2

    if max(s) + 1 == 1:
        return True

    for p in range(i):
        if s[p] != s[max(s) - p]:
            return False
        else:
            palindrome = True

    return palindrome


def generate_palindrome(nb_chars):
    chars = "abcdefghijklmnopqrstuvwxyz"
    prefix = "".join([random.choice(chars) for _ in range(nb_chars)])
    return prefix + prefix[::-1]


def main():
    text = generate_palindrome(nb_chars=1000)
    functions = [
        (is_palindrome_str, "is_palindrome_str"),
        (is_palindrome_dict, "is_palindrome_dict"),
    ]
    functions = functions[::-1]
    duration_list = {}
    for func, name in functions:
        durations = timeit.repeat(lambda: func(text), repeat=1000, number=3)
        duration_list[name] = durations
        print(
            "{func:<20}: "
            "min: {min:0.3f}s, mean: {mean:0.3f}s, max: {max:0.3f}s".format(
                func=name,
                min=min(durations),
                mean=np.mean(durations),
                max=max(durations),
            )
        )
        create_boxplot(duration_list)


def create_boxplot(duration_list):
    plt.figure(num=None, figsize=(8, 4), dpi=300, facecolor="w", edgecolor="k")
    sns.set(style="whitegrid")
    sorted_keys, sorted_vals = zip(
        *sorted(duration_list.items(), key=operator.itemgetter(1))
    )
    flierprops = dict(markerfacecolor="0.75", markersize=1, linestyle="none")
    ax = sns.boxplot(data=sorted_vals, width=0.3, orient="h", flierprops=flierprops,)
    ax.set(xlabel="Time in s", ylabel="")
    plt.yticks(plt.yticks()[0], sorted_keys)
    plt.tight_layout()
    plt.savefig("output.png")


if __name__ == "__main__":
    main()

There clearly is a difference. And that is for a palindrome. My initial thought was that the generation of the dict is overhead. If you have a long string which clearly is no palindrome (e.g. already clear from the first / last char), then the first approach is obviously faster.

Answered by Martin Thoma on October 27, 2021

So I rewrote the answer using a dictionary approach instead knowing that looking up a value in a dictionary has lower time complexity than searching through a list

In this case a dictionary is slower than a list-like indexing. A string / list provides direct access to its elements based on indices, while a dictionary -- which is normally implemented as a hash table -- needs more computation such as computing the hash values of keys.

for p in range(i):
    if s[p] != s[max(s)-p]:
        return False
    else:
        palindrome = True

Here, max is called repeatedly -- and unnecessarily -- in the loop. There are two problems:

  1. The value max(s) is unchanged so it should be computed only once outside the loop and the result can be stored and accessed inside the loop.
  2. max(s) requires comparing all keys of s. In this case, using len(s) - 1 yields the same value and it is faster.

There are some Python-specific improvements that could speed up the program. For example, the following implementations should be faster in most cases because they replace explicit for loops in Python with implicit ones, which are implemented in lower-level, interpreter-dependent languages (such as C) and therefore are much faster:

Solution 1:

class Solution:
    def isPalindrome(self, x: int) -> bool:
        s = str(x)
        return s == s[::-1]

Solution 2:

class Solution:
    def isPalindrome(self, x: int) -> bool:
        s = str(x)
        mid = len(s) // 2
        return s[:mid] == s[:-mid-1:-1]

Answered by GZ0 on October 27, 2021

Time complexity of Orignal Solution u made : O(i)--only using a for loop. Time complexity of your new solution Containing enum: as you are using enum it takes time O(n) and then u are running a for loop O(i),which you can see in your run time.some overhead from enum can be seen. Note:there comes time with using function (everything is not ultra fast or optimal).

Answered by Mayank Joshi on October 27, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP