CPU optimizations
Sep. 30, 2024
The greater the number of computations, the slower a program execution becomes. Moreover, the relation between the number of computations and the execution time in some problems is not linear, but exponential.
As a case study, I was playing on implementing a naive algorithm for matrix multiplication. Its execution time grows as a cubic function to the number of elements. The best alternative to approach this problem is using a more efficient algorithm (Nadis, 2024). Besides that, you can implement optimizations to take advantage of your current infrastructure.
Regarding optimizations, I implemented these strategies based on CPU features:
- Using threads to divide the total of computations among multiple branches (Rinku & Rani, 2017)
- Using CPU vector instructions to parallelize the same operation on a collection of values (Hassan & Hemeida & Mahmoud, 2016)
- A combination of both
The below table shows the results in seconds. The environment was an Intel i7 11 Gen. machine with 16 GB RAM and Linux kernel 6.10. The simple case is with no optimizations. In all the instances, times grow exponentially; but the simple case is the worst scenario. On the other hand, when the matrix size is higher, combining threads and vector computations produces the best execution times.
Optimization is a broad area. This experiment simply proved that in some cases it can be a worthwhile effort.
Source code: https://codeberg.org/jailop/coding-examples/src/branch/main/c++-low-latency/matrix_mul
References:
- Hassan, S. & Hemeida, A. & Mahmoud, M. (2016). Performance Evaluation of Matrix-Matrix Multiplications Using Intel's Advanced Vector Extensions (AVX). Microprocessors and Microsystems, 47(8), pp. 369-374. https://doi.org/10.1016/j.micpro.2016.10.002
- Nadis, S. (2024). New Breakthrough Brings Matrix Multiplication Closer to Ideal. Quanta Magazine. https://www.quantamagazine.org/new-breakthrough-brings-matrix-multiplication-closer-to-ideal-20240307/
- Rinku, D. & Rani, M. (2017). Analysis of multi-threading time metric on single and multi-core CPUs with Matrix multiplication. 3rd. International Conference on Advances in Electrical, Electronics, Information, Communication, and Bio-Informatics (AEEICB17). https://ieeexplore.ieee.org/abstract/document/7972402