# Mergesort in python

Code Review Asked by Nishil Patel on October 28, 2020

My mergesort implementation currently takes a very long time, for a list with 1 million elements it takes over 130 seconds to get the list sorted.

• Could someone kindly have a look at my code and suggest what could I do to improve it?

• Is there anything particular in my code that is taking significantly long?

### Code

def splitlist(L): #splits the list to a list of individual listed elements (e.g. [1,2] to [[1],[2]])
envelop = lambda x: [x]
return(list(map(envelop,L)))

def merge(L_1,L_2): #merges two (already sorted) lists to one sorted list
N = []
while len(L_1) > 0 and len(L_2) > 0:
if L_1[0] > L_2[0]:
N += [L_2.pop(0)]
else:
N += [L_1.pop(0)]
if len(L_1) == 0:
N += L_2
else:
N += L_1

return(N)

#performs one round of pairwise merges (e.g. [[2],[1],[4],[3]] to [[1,2],[3,4]]), or [[5,10],[1,8],[2,3]] to [[1,2,3,5,8,10]])
def mergelist(L):
N = []
if len(L) % 2 == 0:
for i in range(0,len(L)//2):
N += [merge(L[2*i],L[2*i + 1])]
else:
for i in range(0,len(L)//2 - 1):
N += [merge(L[2*i],L[2*i + 1])]
N += [merge(merge(L[-3],L[-2]),L[-1])]

return(N)

def mergesort(L): #recursively performs mergelist until there is only 1 sorted list remaining
L = splitlist(L)
while len(L) > 1:
L = mergelist(L)
return(L[0])


Here is my code for generating the million elements:

rlist = random.sample(range(0,2000000),1000000)


The pop(0) takes linear time. Do that differently, in O(1) time. The standard way uses index variables. See this question's answers for some more pythonic ways. Or you could merge from right to left, using pop(), and then in the end reverse() the result.

One way to do the latter:

def merge(L1, L2):
"""Merges two (already sorted) lists to one sorted list."""
N = []
while L1 and L2:
L = L1 if L1[-1] > L2[-1] else L2
N.append(L.pop())
N.reverse()
N[:0] = L1 or L2
return N


Other changes I did and which you can apply in the other parts of your code as well:

• Removed the underscores from the variables, I think it reads better. I kept them upper case because for L, that's what PEP 8 says. And then I kept N for consistency. Usually I'd use result or maybe merged. Don't know why you chose N. If you have a meaningful word that starts with "n", then I suggest using that.
• Space between the function parameters.
• Proper docstring format instead of comment.
• Replaced len(L_1) > 0 with the normal L1 non-emptiness check.
• Replaced N += [x] with the normal N.append(x).

Just another way, replacing that one "long" line to determine L with a clearer but slower way:

def merge(L1, L2):
"""Merges two (already sorted) lists to one sorted list."""
N = []
def last(L):
return L[-1]
while L1 and L2:
L = max(L2, L1, key=last)
N.append(L.pop())
N.reverse()
N[:0] = L1 or L2
return N


For some fun, two list comprehension hacks:

def merge(L1, L2):
"""Merges two (already sorted) lists to one sorted list."""
def end(L):
return L[-1:]
return [max(L2, L1, key=end).pop() for _ in L1 + L2][::-1]

def merge(L, R):
"""Merges two (already sorted) lists to one sorted list."""
return [(R, L)[L[-1:] > R[-1:]].pop() for _ in L + R][::-1]


And I don't want to leave without a much faster way:

def merge(L1, L2):
"""Merges two (already sorted) lists to one sorted list."""
return sorted(L1 + L2)


That's O(n) because of Timsort. And fast O(n) because of C code. If you think using the mighty sorted inside a mergesort defeats the purpose of writing the mergesort in the first place: Even that can be meaningful, if you're not just doing mergesort. At least once I wrote a mergesort with embedded counting of something, and I indeed used sorted just for the merging. Because that made my solution faster/shorter/simpler.

Even more efficient (both space and time):

def merge(L1, L2):
"""Merges two (already sorted) lists to one sorted list."""
L1 += L2
L1.sort()
return L1


(If L2 can be longer than L1, it might be advantageous to insert L1 into L2 instead.)

Correct answer by superb rain on October 28, 2020

## Related Questions

### CSS Alphabetizer

2  Asked on December 2, 2021

### Hacking Minigame

1  Asked on December 2, 2021 by aguywhocodes

### config files for different development environments (Production, Dev, Testing)

1  Asked on November 30, 2021 by moshevi

### Finding the position in a triangle for the given challenge

2  Asked on November 28, 2021

### Parse a simple key=value config file in C

6  Asked on November 28, 2021 by sedmaister

### Powershell script to verify Linux-generated md5sum file

1  Asked on November 28, 2021

### Battleship game in C

1  Asked on November 25, 2021 by mrplow

### QTableView implementation

1  Asked on November 25, 2021 by 19172281

### A Level Statistics Calculator/Helper For A Casio Fx-CG50 Calculator’s MicroPython

1  Asked on November 25, 2021 by user227128

### upload multiple images and resize

1  Asked on November 25, 2021 by retqio

### RFC 1951 “Inflate” (de-compression)

0  Asked on November 25, 2021 by george-barwood

### Generalised NxN Sudoku solver using heap

1  Asked on November 25, 2021 by srt1104

### Determine the height of a Tetris game’s board after a sequence of moves

3  Asked on November 25, 2021 by meowlicious

### Created a paperfold like effect

0  Asked on November 25, 2021 by nico-shultz

### (Improved) get-release npm module

1  Asked on November 25, 2021

### Define cd so that it changes a variable (as it does already with PWD, OLDPWD, …)

0  Asked on November 21, 2021

### Simple Flask REST API to a MongoDB

0  Asked on November 21, 2021 by amos-baron

### Hit the target number

0  Asked on November 19, 2021 by mixopteryx

### Pizza Order System

1  Asked on November 19, 2021 by chris-jaurigue