CPU optimizations

Sep. 30, 2024

The greater the number of computations, the slower a program execution becomes. Moreover, the relation between the number of computations and the execution time in some problems is not linear, but exponential.

As a case study, I was playing on implementing a naive algorithm for matrix multiplication. Its execution time grows as a cubic function to the number of elements. The best alternative to approach this problem is using a more efficient algorithm (Nadis, 2024). Besides that, you can implement optimizations to take advantage of your current infrastructure.

Regarding optimizations, I implemented these strategies based on CPU features:

The below table shows the results in seconds. The environment was an Intel i7 11 Gen. machine with 16 GB RAM and Linux kernel 6.10. The simple case is with no optimizations. In all the instances, times grow exponentially; but the simple case is the worst scenario. On the other hand, when the matrix size is higher, combining threads and vector computations produces the best execution times.

Matrix size and execution time

Optimization is a broad area. This experiment simply proved that in some cases it can be a worthwhile effort.

Source code: https://codeberg.org/jailop/coding-examples/src/branch/main/c++-low-latency/matrix_mul

References:

Jaime López
Centereach, NY

Sections:

Profiles: