SIMD

SIMD stands for Single Instruction, Multiple Data.

With SIMD, the CPU applies the same instruction to multiple pieces of data in parallel, which is the hardware basis of vectorized operations. Here is a quick breakdown.

Say we have two arrays:

A = [1, 2, 3, 4]
B = [5, 6, 7, 8]

With SIMD, the CPU can perform several of these additions with a single instruction. For example, it can add all 4 pairs in one go if the vector register holds 4 integers.
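
As a minimal sketch of the example above, assuming an x86 CPU with SSE2 and a GCC- or Clang-style compiler that defines `__SSE2__` (the helper name `add4` is hypothetical, and a scalar fallback covers other targets):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics: 128-bit integer vectors */
#endif

/* Adds four 32-bit integers pairwise: out[i] = a[i] + b[i] for i in 0..3.
   add4 is a hypothetical helper name used only for this illustration. */
void add4(const int32_t *a, const int32_t *b, int32_t *out) {
    assert(a != NULL && b != NULL && out != NULL);
#if defined(__SSE2__)
    __m128i va = _mm_loadu_si128((const __m128i *)a);  /* load 4 ints (unaligned is fine) */
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i vc = _mm_add_epi32(va, vb);                 /* 4 additions, 1 instruction */
    _mm_storeu_si128((__m128i *)out, vc);               /* write 4 results back */
#else
    /* Fallback for targets without SSE2: one addition per iteration. */
    for (int i = 0; i < 4; ++i) {
        out[i] = a[i] + b[i];
    }
#endif
}
```

Called with A and B from above, `out` ends up holding [6, 8, 10, 12].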

SIMD requires hardware support: the CPU provides dedicated vector registers and instructions that perform the parallel operations.

Modern compilers can detect simple patterns such as the loop below and apply SIMD vectorization automatically when it is safe to do so:

for (int i = 0; i < N; ++i) {
    C[i] = A[i] + B[i];
}
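
To make the "when safe" part concrete, here is the same loop as a complete function. This is a sketch, not a guarantee: the name `add_arrays` is hypothetical, and whether the loop is actually vectorized depends on the compiler, its version, and the optimization flags (for example `-O3` on GCC or Clang). The `restrict` qualifiers promise that the arrays do not overlap, which is one thing the compiler would otherwise have to prove or check at run time.

```c
#include <assert.h>
#include <stddef.h>

/* add_arrays is a hypothetical name. restrict promises that c does not
   overlap a or b, removing one obstacle to auto-vectorization. */
void add_arrays(const int *restrict a, const int *restrict b,
                int *restrict c, size_t n) {
    assert(n == 0 || (a != NULL && b != NULL && c != NULL));
    for (size_t i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];  /* iterations are independent of each other */
    }
}
```

Each iteration reads and writes only index `i`, so the compiler is free to process several consecutive indices with one vector instruction.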

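SIMD registers can be 64, 128, 256, or 512 bits wide, depending on the instruction set. Each register is split into lanes whose width matches the element type: with single-precision floats each lane is 32 bits, so a 256-bit register holds 8 of them.

For example, using AVX intrinsics from <immintrin.h> on 256-bit registers: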
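
The number of lanes is just the register width divided by the element width. A tiny sketch of that arithmetic (the helper name `lanes_per_register` is hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/* Lanes = register width in bits / element width in bits.
   lanes_per_register is a hypothetical helper for illustration. */
size_t lanes_per_register(size_t register_bits, size_t element_bytes) {
    assert(element_bytes > 0);
    return register_bits / (element_bytes * 8);
}
```

For instance, `lanes_per_register(256, 4)` gives 8 lanes for 4-byte floats, while `lanes_per_register(128, 4)` gives 4.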

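__m256 a = _mm256_loadu_ps(A);    // load 8 floats from A (unaligned load)
__m256 b = _mm256_loadu_ps(B);    // load 8 floats from B
__m256 c = _mm256_add_ps(a, b);   // add all 8 lanes with one instruction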

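This loads 8 floats from each of A and B into 256-bit registers, one float per 32-bit lane, and adds all 8 lanes in parallel. A and B must therefore each hold at least 8 floats.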
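
The three intrinsics above handle exactly 8 floats. Real arrays are usually longer and not a multiple of 8, so a common pattern is a vector loop followed by a scalar tail. The following is a sketch under assumptions: `add_floats` is a hypothetical name, and the AVX path is only compiled when the compiler defines `__AVX__` (for example with `-mavx` on GCC or Clang); otherwise the scalar loop does all the work.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__AVX__)
#include <immintrin.h>  /* AVX intrinsics: 256-bit float vectors */
#endif

/* add_floats is a hypothetical name: c[i] = a[i] + b[i] for i in 0..n-1. */
void add_floats(const float *a, const float *b, float *c, size_t n) {
    assert(n == 0 || (a != NULL && b != NULL && c != NULL));
    size_t i = 0;
#if defined(__AVX__)
    /* Main loop: 8 floats (one 256-bit register) per iteration. */
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(c + i, _mm256_add_ps(va, vb));
    }
#endif
    /* Scalar tail: the leftover elements (or all of them without AVX). */
    for (; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
}
```

The unaligned load and store variants are used so the sketch works for any array address; aligned variants exist but require 32-byte-aligned pointers.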

Real-World Use

Libraries like Eigen, OpenCV, TensorFlow, or PyTorch use SIMD under the hood for fast math.
GPUs use a related model at a larger scale: SIMT (Single Instruction, Multiple Threads).

Computer Architecture

Ishaan Sehgal
