Not many people have heard of interning in Python. You might be one of them. If so, you are not alone.

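Interning is a technique Python uses to save memory. It does this by letting every variable that uses the same common value, numeric or string, share a reference to a single object instead of holding its own copy. The details described here apply to CPython, the reference implementation.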

Let’s dig a bit deeper.

Variables

Before we start digging into it, let’s recap what a variable is.

A variable is a pointer to a space in memory where some object is stored. For instance, if you assign an integer literal to a variable, the variable simply points to the place in memory where that integer is stored.

When you write x = 5, it doesn’t actually mean that x is literally 5. What happens is that x holds a reference to the object 5. In CPython, id(x) returns the address where that object is stored.

If you write the following code,

x, y = 5, 500
# hex(id(x)) - returns the memory address where the variable is pointing to in hexadecimal
print("The id for x is: {}".format(hex(id(x))))
print("The id for y is: {}".format(hex(id(y))))

In my case it printed,

The id for x is: 0x106d7d060
The id for y is: 0x105897f90

You will notice that both variables hold a pointer to where the values are stored, not the values themselves.
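
As a small sketch of what "pointing to the same object" means in practice, the snippet below binds two names to one list and a third name to a copy. The list and the names x, y, and z are only illustrative. The is operator checks whether two names refer to the same object, while == compares their values.

```python
x = [1, 2, 3]
y = x          # y refers to the very same list object as x
z = list(x)    # z refers to a new list with equal contents

print(x is y)  # True: one object, two names
print(x is z)  # False: equal contents, different objects
print(x == z)  # True: the values match

y.append(4)    # changing the object through y...
print(x)       # [1, 2, 3, 4] ...is visible through x as well
```
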
So now that we know that variables are pointers to memory addresses, we can take a closer look at interning and how Python handles it.

Integer Interning

Python automatically keeps the most common integers in memory. In CPython, these are the integers in the range [-5, 256]. Whenever you assign a value within this range to a variable, Python points the variable to that pre-allocated object instead of creating a new one.

If you have this code,

a = 5
b = 5
c = a
d = b
print("The id for a is: {}".format(hex(id(a))))
print("The id for b is: {}".format(hex(id(b))))
print("The id for c is: {}".format(hex(id(c))))
print("The id for d is: {}".format(hex(id(d))))

The output will be this one,

The id for a is: 0x105e13060
The id for b is: 0x105e13060
The id for c is: 0x105e13060
The id for d is: 0x105e13060

All four variables point to the same address.

Here’s what happens if you change the value of one of those variables,

b = 340
print("The id for b is: {}".format(hex(id(b))))

In my case it printed,

The id for b is: 0x105525fd0

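Since 340 is outside the pre-allocated range, Python allocates space in memory to store the integer 340. The variable b now holds the address of that spot in memory.

If we had another variable with the same value 340, it would not necessarily point to the same address. Integers outside the [-5, 256] range are not interned, so each one created at runtime gets its own object. Equal literals that appear in the same piece of compiled code may still end up sharing one object, because the compiler merges identical constants, but that is an implementation detail you should not rely on.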
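
The sketch below probes the integer cache, under the assumption that it runs on CPython. The helper same_object is hypothetical. It builds each integer twice at runtime with int(str(value)) so that the compiler cannot merge the two values into one constant. Values inside [-5, 256] are expected to report True and values far outside it False, but this is an implementation detail, so compare integer values with == in real code.

```python
def same_object(value):
    """Return True if two runtime-created ints with this value share one object."""
    a = int(str(value))
    b = int(str(value))
    return a is b

# Print whether each value is served from the pre-allocated cache
for value in (-5, 5, 256, 1_000_000, -1_000_000):
    print(value, same_object(value))
```
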

String Interning

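As with integers, Python optimizes repeated strings. When the same string literal appears again, Python can assign to the new variable the address of the previously created string. CPython does this automatically for string literals, most reliably for ones that look like identifiers (letters, digits, and underscores). Strings built at runtime are not interned unless you call sys.intern() on them.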

a = 'hello_world'
b = 'hello_world'
c = 'Hello_world'
d = 'hello world this is a long long string that python will intern to make the app work faster'
e = 'hello world this is a long long string that python will intern to make the app work faster'

print("The id for a is: {}".format(hex(id(a))))
print("The id for b is: {}".format(hex(id(b))))
print("The id for c is: {}".format(hex(id(c))))
print("The id for d is: {}".format(hex(id(d))))
print("The id for e is: {}".format(hex(id(e))))

Note that the string assigned to c is different from the string assigned to a and b: it starts with a capital H.

This is the result,

The id for a is: 0x107702930
The id for b is: 0x107702930
The id for c is: 0x1077029f0
The id for d is: 0x10764f930
The id for e is: 0x10764f930

When the string literals match completely, as a and b or d and e do here, the memory address is the same.
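
Strings built at runtime behave differently from literals like the ones above. The sketch below, which assumes CPython, joins the same words twice to get two equal strings and then passes them through sys.intern(), the standard-library function that returns the interned copy of a string. The variable names are illustrative.

```python
import sys

words = ["hello", "world"]
a = " ".join(words)  # built at runtime
b = " ".join(words)  # equal to a, but created separately

print(a == b)  # True: same characters
print(a is b)  # typically False on CPython: two separate objects

a_interned = sys.intern(a)
b_interned = sys.intern(b)
print(a_interned is b_interned)  # True: both names refer to the interned copy
```
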

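An advantage specifically with strings is that, when two strings are both interned, you can compare them with the keyword is instead of the == operator. The difference is that == compares the strings character by character and returns True only if every character of the first string matches the corresponding character of the second, whereas is compares only the memory addresses, which is much faster. This shortcut is only safe when you know both strings are interned, for example because they were passed through sys.intern(). Two equal strings that are not interned can be separate objects, and is will then return False.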

Maybe in small texts you wouldn’t even notice a difference, but if you had to work with a really large text it would be more obvious.

import time

def compare_strings():
    d = 'hello world this is a long long string that python will intern to make the code work faster'*10000
    e = 'hello world this is a long long string that python will intern to make the code work faster'*10000
    for i in range(100000):
        if d == e:
            pass

start = time.perf_counter()
compare_strings()
end = time.perf_counter()
print("Elapsed time {}".format(end-start))

The code above compares two strings using ==. It took my computer 4.73 seconds. To make the comparison go faster, I replaced == with the is keyword,

import time

def compare_strings():
    # d and e are built at runtime by *, so CPython does not intern them automatically
    d = 'hello world this is a long long string'*10000
    e = 'hello world this is a long long string'*10000
    for i in range(100000):
        if d is e:
            pass

start = time.perf_counter()
compare_strings()
end = time.perf_counter()
print("Elapsed time {}".format(end-start))

The elapsed time on my computer was just 0.004 seconds, so an identity check is far cheaper than a character-by-character comparison of really long strings. There is a catch, though: because d and e are built at runtime, they are two separate objects, and d is e evaluates to False here even though the strings are equal. To get a correct answer from is, both strings have to be interned explicitly with sys.intern() first.
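
For the identity check to give the right answer on strings built at runtime, both strings need to be interned first. The sketch below is one way to do that with sys.intern() and timeit from the standard library. The helper build and the repeat counts are assumptions for illustration, and the timings will vary by machine.

```python
import sys
import timeit

def build(n=10000):
    # Multiplying at runtime creates a new string object on every call
    return 'hello world this is a long long string' * n

raw_d, raw_e = build(), build()              # equal, but separate objects
d, e = sys.intern(raw_d), sys.intern(raw_e)  # both now refer to one interned object

print(d is e)  # True

# Time a full comparison of separate strings against an identity check
eq_time = timeit.timeit(lambda: raw_d == raw_e, number=1000)
is_time = timeit.timeit(lambda: d is e, number=1000)
print("== on separate strings: {:.6f} s".format(eq_time))
print("is on interned strings: {:.6f} s".format(is_time))
```

On CPython, == also returns immediately when both operands are the same object, so d == e on the interned strings should be just as quick while staying correct even if one of them was never interned.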

Finally…

Interning is a way to optimize a program by reusing objects for common integer and string values. String interning in particular is helpful because, once your strings are interned, you can speed up comparisons a lot just by changing a simple statement.

An application of string interning would be natural language processing, where you have to make a lot of comparisons between words and phrases. These tend to be long strings, and the amount of optimization can be significant.

If you are interested in Python optimizations, you could check out my article about Python Optimizations: Peephole.