Data Structures in Python.

When we code in Python usually there are multiple data structures such as list, tuple, set, and dict that we mostly use. But besides these four, Python has other data structures that each of them like those four has some strengths and weaknesses based on their implementation and structure that we can use, and it’s better to know them because sometimes it might help and have a better performance and features based on your problem than list, tuple, set, and dict.

Data structures in Python overview image

Notes:

  • Container Sequences are sequences that can hold different types and can be nested.
  • Flat Sequences are sequences that only can hold one simple type.

I’ll try to explain the above diagram from left to right. Also, I’m not going through some data structures that you are already familiar with them.

collections.deque

A List in Python is a dynamic array that can contain multiple data types and, you can easily use a Python list as a stack using .append() and .pop() methods and have a good performance. But, when it comes to .insert() (add an item to the beginning of a list) or .pop() from the beginning of a list, it’s not efficient because all other items should be shifted by one. So, it’s recommended to use collections.deque for a First-in, First-out queue.

from collections import deque

queue = deque([1, 2, 3]) # queue = [1, 2, 3]
queue.append(4) # queue = [1, 2, 3, 4]
queue.popleft() # queue = [2, 3, 4]

array.array

Python’s array type is a sequence type that only can hold basic types like characters, integers, and float, and it’s very efficient when you hold a large number of items in it. array has other methods like .frombytes and .tofile besides .append, .pop, and .extend that you have in list.

from array import array
numbers = array('l', [1, 2, 3, 4, 5]) # creating an array of 5 [type] items

See Python documentation: array.

set and frozenset

set is an unordered data structure in Python that can hold non-duplicate hashable items. Hashable are types or classes that implemented the __hash__ method.

from collections import abc
# Test hashable
issubclass(int, abc.Hashable) # True
issubclass(list, abc.Hashable) # False

# define set using set() and {}
cities = set(['Berlin', 'Amsterdam', 'London', 'Amsterdam']) # {'Amsterdam', 'Berlin', 'London'}

cities = {'Berlin', 'Amsterdam', 'London', 'Amsterdam'{ # {'Amsterdam', 'Berlin', 'London'}

cities.add('London') # {'Amsterdam', 'Berlin', 'London'}
cities.add('Rome') # {'Amsterdam', 'Berlin', 'Rome', 'London'}

Also, there is an immutable version of set called frozenset that you can use.

cities = frozenset(['Berlin', 'Amsterdam', 'London', 'Amsterdam']) # {'Amsterdam', 'Berlin', 'London'}

See Python documentation: set and fronzenset.

collections.OrderedDict

After the release of Python 3.6 dict is ordered based on the insertion of items and you don’t need collections.OrderedDict unless you want your code to be compatible with Python < 3.6!

collections.Counter

Counter is a subclass of dict for hashable items in Python that you can use to count the number of items using it.

from collections import Counter

repeat_count = Counter(['Berlin', 'Amsterdam', 'London', 'Amsterdam'])
# Counter({'Amsterdam': 2, 'Berlin': 1, 'London': 1})

See Python documentation: Counter.

collections.ChainMap

By using ChainMap you can group multiple mappings together and create an updatable view of them.

from collections import ChainMap

base = {'python': '3.10', 'year': '2021'}
new_update = {'python': '3.11', 'year': '2022', 'author': 'Guido van Rossum'}

python = ChainMap(base, new_update)
# ChainMap({'python': '3.10', 'year': '2021'}, {'python': '3.11', 'year': '2022', 'author': 'Guido van Rossum'})

list(python) # ['python', 'year', 'author']

You can think of using ** to join multiple dict together instead of using ChainMap but keep in mind that ChainMap incorporates the underlying mappings by reference so if you update your dict, your ChainMap view is getting updated too.

a = {**base, **new_update} # {'python': '3.11', 'year': '2022', 'author': 'Guido van Rossum'}

new_update['python'] = '3.12'

a # {'python': '3.11', 'year': '2022', 'author': 'Guido van Rossum'}

python # ChainMap({'python': '3.10', 'year': '2021'}, {'python': '3.12', 'year': '2022', 'author': 'Guido van Rossum'})

See Python documentation: ChainMap.

Different kinds of Queue

In the queue library of Python, you can find multiple implementations of queue that I give you a brief explanation of them here.

Queue

An implementation of a FIFO(first-in, first-out) queue in Python. Queue takes a maxsize argument which is the size of the queue and if the number of items in the queue reaches this number, your queue is blocked until you consume items from it.

from queue import Queue

q = Queue(1)
q.put('first')
q.put_nowait('second') # does not wait until get an item and raise an error if queue is full, which in this case queue is full and raise error
q.get() # 'first'

LifoQueue

LifoQueue is a subclass of Queue that implements a LIFO(last-in, first-out) queue.

PriorityQueue

PriorityQueue is a subclass of Queue that returns items in the queue based on a Priority of smallest to largest.

from queue import PriorityQueue

q = PriorityQueue(3)
q.put(30)
q.put(14)
q.put(21)
q.get() # 14
q.get() # 21
q.get() # 30

SimpleQueue

SimpleQueue is an implementation of the FIFO(first-in, first-out) queue that is unbounded and added in Python 3.7. Note that Python uses collections.deque under the hood in SimpleQueue.

See Python documentation: queue.

comments powered by Disqus