Arrays in Python Explained!

If you are getting into programming or learning about Python, this article is for you, arrays are one of the most used data structures. How does Python represent arrays ? How can we manipulate arrays in Python to solve different issues ? This is what we are going to find out in this article.

How Do Computers Store Information?

As you know the most primary form of information in a computer is composed of bits of information. 8 bits of these information are then grouped to form what we call a byte.

Because computers can store very large chunks of these bytes, they need a new mechanism that enables them to quickly read and write bytes, this is where memory addresses come into play, just like your personal home address differentiates where you live from your neighbor, the computer uses memory addresses to differentiate where different pieces of data and information live.

It’s also worth mentioning that the computer takes O(1) access time to reach any memory address, this means accessing byte #5489637 is the same as accessing byte #64.

What Are Arrays?

A common practice in programming languages is to use variables to refer to an address in a memory for example identifier X can refer to byte #32.

In case we want to store multiple memory addresses creating multiple identifiers is not efficient and scalable. We would prefer more if we had one name for the group of identifiers and use numbers to access specific identifiers. Such representation is an array, in other words its a representation of a group of variables in a contiguous portion of the computer’s memory.

Strings in Python for example are stored this way, let’s say you declare a new variable with the text ‘example’ Python will allocate:

2 byte (memory needed to store a Unicode character) * 7 (number of characters in the word example) = 14 bytes of contiguous memory to store this variable.

Referential Arrays

Python uses another type of arrays which are referential arrays, to understand them let’s take a look at this example, let’s say you own a hotel and you have 50 rooms, you need a way to store the name of who is staying in which room so a representation like the following:

to do this Python must respect the condition of contiguous memory space to allocate. But how does it do this when the names are of different lengths (different lengths mean different space requirements) ?

One solution would be finding the length of the biggest string and allocating enough memory space for all but this will not be memory efficient as some space will be wasted and what if a new string comes along which is now the biggest string ?

To solve this Python creates an array of contiguous references to multiple objects, so now even though the size of each individual object may vary the memory used to store it’s memory address is the same. In this way Python guarantees constant access time to different elements of the array. We refer to this type of array as referential array.

To illustrate this take a look at this image:

Lists and Tuples in Python use this type of array to store data.

Note: As referential arrays point to references, be careful when changing reference values as you can also modify values in Lists by mistake. Also when creating copies, make sure you know the difference between Shallow and Deepcopy.

Compact Arrays

Previously we talked about Strings and how we store them in an array of contiguous memory, in fact we call this array a Compact Array.

An advantage of using compact arrays over referential ones is no extra overhead from storing memory references, we directly store the primary data. Another advantage worth mentioning is that in case of using a list which is a referential array, Python has no control on where your data is stored, even though the list maintains references to where the data is. Because of how hardware works, it’s more efficient to have your data close to each other, it might be used for same computation.

Despite these advantages, for Lists and tuples Python uses referential arrays as they are convenient to use. However if you insist on using a compact array, Python provides you with a module called array that allows you to do this:

Here you noticed that we passed i as a first argument to array. Because to create a compact array we need to specify the type for the proper memory allocation. For other data types check this table:

If you want to learn more about array module click here.

One downside to the array module is that it doesn’t support user defined data types. Compact arrays with such data structure can be defined using the module ctypes.

Dynamic Arrays

When creating a compact array, we specify the precise size of that array so that the computer knows what memory to allocate. Because it allocates neighboring memory area to other arrays.

In practice we can’t always know the size of our arrays, they might expand over time. As a remedy to this problem, Python provides as abstraction called Dynamic Array.

This abstraction uses a list instance that maintains an underlying array which has capacity more than the length of the actual list. This extra capacity helps us to add more and more data to the list.

When we eventually exhaust the extra capacity, Dynamic Array requests a new larger array and copies all the elements to the new array and the system reclaims the space from the old array.

Python List is based on this abstraction but Tuples and Strings are immutable (cannot be changed) therefore thus are not based on this.

It’s also worth mentioning that the dynamic array not only expands in size but it can also shrink if we delete elements from the list.

Lists & Tuples

So from the analysis we did in the above section, we can divide Lists & Tuples methods to two types of methods:

Unchanging methods: methods that exist in both tuples and lists and do not change their content. Below is a table of these methods and their time complexity:

Mutating methods: methods that exist only in lists as they change the underlying data and structure of the list. Below is a table of these methods and their time complexity:

We didn’t discuss here what each method does (It’s intuitive and easy)but you can find out more about Python lists and tuples.

Takeaways

Referential arrays store references in their array memory and then access the actual values through the references.
Compact arrays store the actual data in a contiguous memory.
Dynamic arrays are arrays that automatically expand and shrink depending on number of elements in them.
Lists are based on referential and dynamic arrays.
Tuples are based on referential arrays.
String are based on compact arrays.

I hope you enjoyed this article, if you have any question or if I made a mistake feel free to comment down below.

LinkedIn: https://www.linkedin.com/in/mohamed-aziz-belaweid/

GitHub: https://github.com/azayz

Search This Blog

Tech with Aziz