Analyze the performance characteristics of String Concatenation (`+=`) vs. `str.join()` in Python. Why does building a large string inside a loop using `+=` result in $O(n^2)$ performance?

Python interview question for Advanced practice.

Answer

Strings in Python are immutable. This means they cannot be modified in place. The Concatenation Problem ($O(n^2)$): When you execute s += chunk, Python cannot just append the new data to the existing memory block of s. Instead, it must: 1. Allocate a new block of memory large enough for the old string + the new chunk. 2. Copy the entire contents of the old string into the new block. 3. Copy the chunk into the new block. In a loop of $N$ iterations, the first iteration copies length 1, the second copies length 2, ..., up to length $N$. The sum of these copy operations is $1+2+...+N = N(N+1)/2$, which is quadratic complexity. The Join Solution ($O(n)$): "".join(chunks) is optimized. It performs two passes: 1. It scans the list of chunks to calculate the total size of the final string. 2. It allocates the exact memory needed once. 3. It copies all chunks into the buffer sequentially. This ensures every byte is copied exactly once, resulting in linear complexity.

Explanation

CPython has a specific optimization that sometimes makes s += t fast if the variable s is not referenced elsewhere, but relying on this is an anti-pattern.

Related Questions