We are going to compare programs executed on GPUs and CPUs, and finally identify what types of programs are best suited to run on GPUs.
CPUs are optimized for latency: they try to complete each individual instruction as fast as possible. GPUs, on the other hand, are optimized for throughput: they try to complete as many instructions as possible per unit of time. GPUs have thousands of simple cores and are well suited for massively parallel applications, that is, applications that can be divided into many simple, independent subproblems, such as matrix multiplication. A minimal kernel for this kind of workload is sketched below.
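To make this concrete, here is a minimal CUDA sketch of a naive matrix multiplication kernel, where each thread independently computes one element of the output. The names (`matmul`, `A`, `B`, `C`) and the square, row-major layout are illustrative assumptions, not a fixed API:

```cuda
// Naive matrix multiplication: C = A * B, all matrices n x n, row-major.
// Each thread computes one output element independently, which is why
// this workload maps so well onto thousands of simple GPU cores.
__global__ void matmul(const float *A, const float *B, float *C, int n) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n) {
        float sum = 0.0f;
        for (int k = 0; k < n; ++k)
            sum += A[row * n + k] * B[k * n + col];
        C[row * n + col] = sum;
    }
}
```

A launch such as `matmul<<<dim3((n + 15) / 16, (n + 15) / 16), dim3(16, 16)>>>(dA, dB, dC, n);` (with `dA`, `dB`, `dC` assumed to be device pointers) assigns every output element its own thread.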
The two most commonly used platforms for GPU programming are CUDA and OpenCL. We give a brief comparison between the two below.
CUDA:
- runs only on Nvidia GPUs
- has extensive libraries, such as cuBLAS and cuDNN

OpenCL:
- runs on CPUs, GPUs, and even FPGAs
- has good libraries, but they are not as extensive as CUDA's
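As an illustration of the library point, here is a hedged sketch of computing the same matrix product with cuBLAS instead of a hand-written kernel. The helper name `gemm_rowmajor` is hypothetical, and error handling is omitted; only the `cublasSgemm` call itself is real API:

```cuda
#include <cublas_v2.h>
#include <cuda_runtime.h>

// Multiply two n x n row-major matrices already resident on the device:
// dC = dA * dB, delegating the heavy lifting to cuBLAS.
void gemm_rowmajor(cublasHandle_t handle,
                   const float *dA, const float *dB, float *dC, int n) {
    const float alpha = 1.0f, beta = 0.0f;
    // cuBLAS uses column-major storage; swapping the operand order
    // yields the row-major product C = A * B.
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                n, n, n,
                &alpha, dB, n, dA, n,
                &beta, dC, n);
}
```

The handle would be created once with `cublasCreate` and reused across calls, and the program linked with `-lcublas`.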
The roofline model states that the performance of a processing unit is bound either by its memory bandwidth or by its peak computational throughput. To reach maximum performance, we should keep programs from becoming memory-bound. This problem is more serious on GPUs, whose compute throughput is enormous relative to their memory bandwidth, so kernels need high arithmetic intensity to stay compute-bound. In deep learning applications in particular, loading data onto the GPU can become the performance bottleneck. There is some advice for mitigating this kind of problem, one piece of which is sketched below.
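In symbols, the roofline model bounds attainable performance $P$ (in FLOP/s) by the peak compute throughput $P_{\max}$ and the product of peak memory bandwidth $B$ with the program's arithmetic intensity $I$ (FLOPs performed per byte moved):

```latex
P = \min\left(P_{\max},\; B \times I\right)
```

One common CUDA-side mitigation for the data-loading bottleneck is to use pinned (page-locked) host memory together with asynchronous copies on a stream, so transfers can overlap with other work. Below is a minimal sketch under illustrative assumptions (the `scale` kernel and buffer sizes are made up for the example):

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Illustrative kernel: scales a buffer in place.
__global__ void scale(float *x, int n, float a) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *h;                    // pinned host buffer
    cudaMallocHost(&h, bytes);   // page-locked memory enables true async copies
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    float *d;
    cudaMalloc(&d, bytes);

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Copy, compute, and copy back on the same stream; meanwhile the host
    // is free to do other work, e.g. preparing the next batch of data.
    cudaMemcpyAsync(d, h, bytes, cudaMemcpyHostToDevice, stream);
    scale<<<(n + 255) / 256, 256, 0, stream>>>(d, n, 2.0f);
    cudaMemcpyAsync(h, d, bytes, cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);

    printf("h[0] = %f\n", h[0]);  // expect 2.0

    cudaStreamDestroy(stream);
    cudaFree(d);
    cudaFreeHost(h);
    return 0;
}
```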
PyTorch and TensorFlow are widely used frameworks for deep learning; both provide data-pipeline utilities (PyTorch's DataLoader and TensorFlow's tf.data) that prefetch batches asynchronously to keep the GPU fed.