Python Optimizations: Intering

Python Optimizations: Intering

Not many people have heard of Intering in Python. You might be one them. If so, you are not alone.

Intering is the way Python optimizes the memory used in Python. It does this by sharing the reference to common literals, numeric or string, used by more than one variable.

Let’s dig a bit deeper.

Variables

Before we start digging into it, let’s recap what a variable is.

A variable is a pointer to a space in memory where some object is stored. For instance, if you assign literal integer to a variable, it will simply point to the space in memory where that integer is saved.

When you declare a variable x = 5 , it doesn’t actually mean that x is literally 5 . What happens is that the value of x is the address of where the number 5 is stored.

If you write the following code,

In my case it printed,

You will notice that both variables hold a pointer where the values are stored, not the values themselves.
So now that we know that variables are pointers to memory addresses, we can take a deeper look about Intering and how Python handles it.

Integers Intering

Python automatically saves the most common integers in memory. Those integers are between [-5, 256]. Whenever you declare a variable within this range, Python will point to that pre-allocated space in memory.

If you have this code,

The output will be this one,

All four variable point to the same address.

Here’s what happens if you change the value of one of those variables,

Python will allocate space in memory to store the integer 340. The variable, b, now holds the address to that spot in memory.

If we had another variable with the same value 340 then Python would make that new variable point to the same address to reuse that space in memory.

String Intering

As with integers, Python optimizes repeated strings. When a string is repeated, it will assign to the new variable the address of the previously created string.

Note that the string assigned to c is different than the string assigned to a and b. The string assigned to c starts with a capital H.

This is the result,

If the strings match completely, the memory address will be the same.

An advantage specifically with strings is that when comparing two strings instead of comparing them with the == operator, you can use the keyword is. The difference relies in that the == will compare character to character. If every character of the first string matches the characters of the second string then it will return true. But using the is keyword is will compare the memory address, which is much faster.

Maybe in small texts you wouldn’t even notice a difference, but if you had to work with a really large text it would more obvious.

In the previous code we are comparing two strings using ==. It took my computer 4.73 seconds. Knowing that the strings are the same and knowing that Python would intern them, I used the is keyword to make the comparison go faster,

The elapsed time on my computer was just 0.004 seconds, so it does make a huge difference when it comes to really long strings.

Finally…

Intering is a way to optimize a program by reusing integer and string literals. Specially string intering is helpful because you can optimize your code a lot just by changing a simple statement.

An application of string intering would be for natural language processing, where you have to make a lot of evaluations of words and phrases. These tend to be long strings and the amount of optimization significant.

If you are interested on Python optimizations you could check out my article about Python Optimizations: Peephole.

Focus Mode

Contact Request

Close

We will call you right away. All information is kept private