TransWikia.com

Mergesort in python

Code Review Asked by Nishil Patel on October 28, 2020

My mergesort implementation currently takes a very long time, for a list with 1 million elements it takes over 130 seconds to get the list sorted.

  • Could someone kindly have a look at my code and suggest what could I do to improve it?

  • Is there anything particular in my code that is taking significantly long?

Code

def splitlist(L): #splits the list to a list of individual listed elements (e.g. [1,2] to [[1],[2]])
    envelop = lambda x: [x]
    return(list(map(envelop,L)))


def merge(L_1,L_2): #merges two (already sorted) lists to one sorted list
    N = []
    while len(L_1) > 0 and len(L_2) > 0:
        if L_1[0] > L_2[0]:
            N += [L_2.pop(0)]
        else:
            N += [L_1.pop(0)]
    if len(L_1) == 0:
        N += L_2
    else:
        N += L_1
        
    return(N)


#performs one round of pairwise merges (e.g. [[2],[1],[4],[3]] to [[1,2],[3,4]]), or [[5,10],[1,8],[2,3]] to [[1,2,3,5,8,10]])   
def mergelist(L): 
    N = []
    if len(L) % 2 == 0:
        for i in range(0,len(L)//2):
            N += [merge(L[2*i],L[2*i + 1])]
    else:
        for i in range(0,len(L)//2 - 1):
            N += [merge(L[2*i],L[2*i + 1])]
        N += [merge(merge(L[-3],L[-2]),L[-1])]
    
    return(N)

def mergesort(L): #recursively performs mergelist until there is only 1 sorted list remaining
    L = splitlist(L)
    while len(L) > 1:
        L = mergelist(L)
    return(L[0])

Here is my code for generating the million elements:

rlist = random.sample(range(0,2000000),1000000)

One Answer

The pop(0) takes linear time. Do that differently, in O(1) time. The standard way uses index variables. See this question's answers for some more pythonic ways. Or you could merge from right to left, using pop(), and then in the end reverse() the result.

One way to do the latter:

def merge(L1, L2):
    """Merges two (already sorted) lists to one sorted list."""
    N = []
    while L1 and L2:
        L = L1 if L1[-1] > L2[-1] else L2
        N.append(L.pop())
    N.reverse()
    N[:0] = L1 or L2
    return N

Other changes I did and which you can apply in the other parts of your code as well:

  • Removed the underscores from the variables, I think it reads better. I kept them upper case because for L, that's what PEP 8 says. And then I kept N for consistency. Usually I'd use result or maybe merged. Don't know why you chose N. If you have a meaningful word that starts with "n", then I suggest using that.
  • Space between the function parameters.
  • Proper docstring format instead of comment.
  • Replaced len(L_1) > 0 with the normal L1 non-emptiness check.
  • Replaced N += [x] with the normal N.append(x).

Just another way, replacing that one "long" line to determine L with a clearer but slower way:

def merge(L1, L2):
    """Merges two (already sorted) lists to one sorted list."""
    N = []
    def last(L):
        return L[-1]
    while L1 and L2:
        L = max(L2, L1, key=last)
        N.append(L.pop())
    N.reverse()
    N[:0] = L1 or L2
    return N

For some fun, two list comprehension hacks:

def merge(L1, L2):
    """Merges two (already sorted) lists to one sorted list."""
    def end(L):
        return L[-1:]
    return [max(L2, L1, key=end).pop() for _ in L1 + L2][::-1]
def merge(L, R):
    """Merges two (already sorted) lists to one sorted list."""
    return [(R, L)[L[-1:] > R[-1:]].pop() for _ in L + R][::-1]

And I don't want to leave without a much faster way:

def merge(L1, L2):
    """Merges two (already sorted) lists to one sorted list."""
    return sorted(L1 + L2)

That's O(n) because of Timsort. And fast O(n) because of C code. If you think using the mighty sorted inside a mergesort defeats the purpose of writing the mergesort in the first place: Even that can be meaningful, if you're not just doing mergesort. At least once I wrote a mergesort with embedded counting of something, and I indeed used sorted just for the merging. Because that made my solution faster/shorter/simpler.

Even more efficient (both space and time):

def merge(L1, L2):
    """Merges two (already sorted) lists to one sorted list."""
    L1 += L2
    L1.sort()
    return L1

(If L2 can be longer than L1, it might be advantageous to insert L1 into L2 instead.)

Correct answer by superb rain on October 28, 2020

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP