TransWikia.com

How does python implement the indexing of strings?

Stack Overflow Asked on December 9, 2021

The Data Structures and Algorithms written by Goodrich says that the python array is to store a group of related variables one after another in a continuous area of ​​the computer memory, so the index can be accessed directly by calculating the address.For example,if the memory address of the first element of the array is 2146, and each element occupies two bytes of memory, then the memory address of the sixth element is 2146+2*5=2156, so the computer can directly access address 2156 to get the sixth elements.

But I tried to verify it,only to found that the results didn’t accord with the theory.

str1 = "example"
for i in range(1,6):
    print(id(str1[i])-id(str1[i-1]))

The output is as follows

-336384
471680
-492352
313664
178944

Why does this happen, if the memory address is not continuous, how does python get its memory address through index and then access the element?

2 Answers

The closest thing to a "Python array" I know of is a numpy.array, and indeed:


In [1]: import numpy as np                                                                                                                                                                       

In [2]: a = np.array([12, 4, 120, 24, 3, 0, 13, 13], dtype='int8')                                                                                                                               

In [3]: asint64 = a.view('int64')[0]                                                                                                                                                             

In [4]: for i in range(8): 
   ...:     print(asint64 % 2**(8*(i+1)) // 2**(8*(i))) 
   ...:                                                                                                                                                                                          
12
4
120
24
3
0
13
13


What is happening here is that you are first building an array of 8 numbers using each 8 bits; when you later ask numpy to consider them as a single 64 bit number, you get that it is composed by the 8 bit representation of each of the 8 numbers, juxtaposed. So indeed the original 8 integers were adiacent in memory.

In general, asking Python "tell me what is at this arbitrary memory position", or "tell me where exactly in memory is this item of a string or array" is... slightly less straigthforward.

EDIT: ... it is slightly less straightforward, but at least it alleviates any suspect that a.view is doing strange things, so here we are loooking at the exact positions in memory of sub-arrays of our array:

In [5]: for i in range(8): 
   ...:     print(a[i:].__array_interface__['data'][0]) 
   ...:                                                                                                                                                                                          
45993728
45993729
45993730
45993731
45993732
45993733
45993734
45993735

(as long as you trust .__array_interface__['data'][0] not do do strange things!)

Answered by Pietro Battiston on December 9, 2021

As far as I understand, Python (or at least CPython) implements lists as dynamic arrays.

It might be worth checking out this article for a quick intro to the concept of dynamic arrays if you are unfamiliar.

However, the way that these dynamic arrays are implemented in Python is different from a "standard implementation" of the data structure - see below quote from the Wikipedia page on dynamic arrays.

... in languages like Python or Java that enforce reference semantics, the dynamic array generally will not store the actual data, but rather it will store references to the data that resides in other areas of memory. In this case, accessing items in the array sequentially will actually involve accessing multiple non-contiguous areas of memory, so the many advantages of the cache-friendliness of this data structure are lost. - Wikipedia

I have been unable to find any supporting articles or documentation around this at first glance, however I would think that strings are implemented in a similar way to how lists are within Python - with some major modifications but similar underlying logic in parts.

Strings being implemented in such a way would neatly explain why you are seeing sub-strings (Python does not have an explicit char class, a char variable is simply a string with length 1 as far as Python is concerned) within a string at non-contiguous locations within memory at runtime.

Answered by JPI93 on December 9, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP