CPUs vs. GPUs

We compare programs executed on GPUs with those executed on CPUs, and then specify which types of programs are well suited to run on GPUs.

CPUs are optimized for latency: they try to complete each instruction as fast as possible. GPUs, on the other hand, are optimized for throughput: they try to complete as many instructions as possible in a given interval. GPUs have thousands of simple cores and are well suited for massively parallel applications, i.e., applications that can be divided into many simple independent problems, such as matrix multiplication.
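To see why matrix multiplication fits this model, note that each output element depends only on one row of the first matrix and one column of the second, so all elements can be computed independently. The sketch below (plain Python, looping sequentially where a GPU would launch one thread per element) illustrates that independence:

```python
# Sketch: each C[i][j] is an independent unit of work.
# On a GPU, every (i, j) pair would be handled by its own thread;
# here we simply loop over the pairs sequentially.

def matmul_element(A, B, i, j):
    # One independent unit of work: the dot product of row i of A
    # with column j of B.
    return sum(A[i][k] * B[k][j] for k in range(len(B)))

def matmul(A, B):
    n, m = len(A), len(B[0])
    return [[matmul_element(A, B, i, j) for j in range(m)] for i in range(n)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))  # [[19, 22], [43, 50]]
```

Because no element's computation reads another element's result, the problem divides cleanly across thousands of cores.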

https://upload.wikimedia.org/wikipedia/commons/c/c6/Cpu-gpu.svg

GPU programming languages

The two most commonly used programming languages for GPU programming are CUDA and OpenCL. We give a brief comparison between the two below.

CUDA

runs only on Nvidia GPUs

has extensive libraries like cuBLAS, cuDNN, etc.

OpenCL

runs on CPU, GPU, and even FPGAs

has good libraries, but they are not as extensive as CUDA's

CPU / GPU communication

The roofline model states that the performance of a processing unit can be bound either by memory bandwidth or by computational power. To reach maximum performance, we should avoid letting programs become memory-bound. This problem is more serious for programs running on GPUs: in deep learning applications, loading data onto the GPU can become the performance bottleneck. Common advice for mitigating this problem includes overlapping data transfers with computation and keeping data resident on the GPU across iterations.

https://upload.wikimedia.org/wikipedia/commons/5/5a/Example_of_a_naive_Roofline_model.svg
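The memory-vs-compute bound distinction can be made concrete with arithmetic intensity (FLOPs per byte moved). The sketch below estimates it for an n x n matrix multiply and compares it against a device's ridge point; the hardware numbers (10 TFLOP/s, 500 GB/s) are assumptions for illustration, not any particular GPU:

```python
# Sketch, under idealized assumptions: estimate the arithmetic intensity
# of an n x n fp32 matrix multiply and classify it against a hypothetical
# device's ridge point from the roofline model.

def matmul_intensity(n, bytes_per_elem=4):
    flops = 2 * n**3                         # one multiply + one add per term
    bytes_moved = 3 * n**2 * bytes_per_elem  # read A, read B, write C (ideal)
    return flops / bytes_moved

# Hypothetical device: 10 TFLOP/s peak compute, 500 GB/s memory bandwidth.
peak_flops = 10e12
peak_bw = 500e9
ridge = peak_flops / peak_bw  # intensity needed to become compute-bound

for n in (64, 256, 1024):
    ai = matmul_intensity(n)
    bound = "compute-bound" if ai >= ridge else "memory-bound"
    print(f"n={n}: {ai:.1f} FLOP/byte -> {bound}")
```

Note that intensity grows linearly with n (it equals n/6 here), which is why small matrix multiplies tend to be memory-bound while large ones saturate the compute units.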

Deep learning frameworks

PyTorch and TensorFlow are widely used frameworks for deep learning.

Deep learning framework advantages

PyTorch