We are going to compare programs executed on GPUs and CPUs, and finally identify what types of programs are best suited to run on GPUs.
CPUs are optimized for latency: they try to complete each individual instruction as fast as possible. GPUs, on the other hand, are optimized for throughput: they try to complete as many instructions as possible per unit of time. GPUs have thousands of simple cores and are well suited for massively parallel applications, that is, applications that can be divided into many simple, independent subproblems, such as matrix multiplication. A minimal kernel for this kind of workload is sketched below.
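To make this concrete, here is a minimal CUDA sketch of a naive matrix multiplication kernel, where each thread independently computes one element of the output. The names (`matmul`, `A`, `B`, `C`) and the square, row-major layout are illustrative assumptions, not a fixed API:

```cuda
// Naive matrix multiplication: C = A * B, all matrices n x n, row-major.
// Each thread computes one output element independently, which is why
// this workload maps so well onto thousands of simple GPU cores.
__global__ void matmul(const float *A, const float *B, float *C, int n) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n) {
        float sum = 0.0f;
        for (int k = 0; k < n; ++k)
            sum += A[row * n + k] * B[k * n + col];
        C[row * n + col] = sum;
    }
}
```

A launch such as `matmul<<<dim3((n + 15) / 16, (n + 15) / 16), dim3(16, 16)>>>(dA, dB, dC, n);` (with `dA`, `dB`, `dC` assumed to be device pointers) assigns every output element its own thread.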
The two most commonly used platforms for GPU programming are CUDA and OpenCL. We give a brief comparison between the two below.
CUDA:
- runs only on Nvidia GPUs
- has extensive libraries, such as cuBLAS and cuDNN

OpenCL:
- runs on CPUs, GPUs, and even FPGAs
- has good libraries, but they are not as extensive as CUDA's
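As an illustration of the library point, here is a hedged sketch of computing the same matrix product with cuBLAS instead of a hand-written kernel. The helper name `gemm_rowmajor` is hypothetical, and error handling is omitted; only the `cublasSgemm` call itself is real API:

```cuda
#include <cublas_v2.h>
#include <cuda_runtime.h>

// Multiply two n x n row-major matrices already resident on the device:
// dC = dA * dB, delegating the heavy lifting to cuBLAS.
void gemm_rowmajor(cublasHandle_t handle,
                   const float *dA, const float *dB, float *dC, int n) {
    const float alpha = 1.0f, beta = 0.0f;
    // cuBLAS uses column-major storage; swapping the operand order
    // yields the row-major product C = A * B.
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                n, n, n,
                &alpha, dB, n, dA, n,
                &beta, dC, n);
}
```

The handle would be created once with `cublasCreate` and reused across calls, and the program linked with `-lcublas`.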
The roofline model states that the performance of a processing unit is bound either by its memory bandwidth or by its peak computational throughput. To reach maximum performance, we should keep programs from becoming memory-bound. This problem is more serious on GPUs, whose compute throughput is enormous relative to their memory bandwidth, so kernels need high arithmetic intensity to stay compute-bound. In deep learning applications in particular, loading data onto the GPU can become the performance bottleneck. There is some advice for mitigating this kind of problem, one piece of which is sketched below.
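In symbols, the roofline model bounds attainable performance $P$ (in FLOP/s) by the peak compute throughput $P_{\max}$ and the product of peak memory bandwidth $B$ with the program's arithmetic intensity $I$ (FLOPs performed per byte moved):

```latex
P = \min\left(P_{\max},\; B \times I\right)
```

One common CUDA-side mitigation for the data-loading bottleneck is to use pinned (page-locked) host memory together with asynchronous copies on a stream, so transfers can overlap with other work. Below is a minimal sketch under illustrative assumptions (the `scale` kernel and buffer sizes are made up for the example):

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Illustrative kernel: scales a buffer in place.
__global__ void scale(float *x, int n, float a) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *h;                    // pinned host buffer
    cudaMallocHost(&h, bytes);   // page-locked memory enables true async copies
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    float *d;
    cudaMalloc(&d, bytes);

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Copy, compute, and copy back on the same stream; meanwhile the host
    // is free to do other work, e.g. preparing the next batch of data.
    cudaMemcpyAsync(d, h, bytes, cudaMemcpyHostToDevice, stream);
    scale<<<(n + 255) / 256, 256, 0, stream>>>(d, n, 2.0f);
    cudaMemcpyAsync(h, d, bytes, cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);

    printf("h[0] = %f\n", h[0]);  // expect 2.0

    cudaStreamDestroy(stream);
    cudaFree(d);
    cudaFreeHost(h);
    return 0;
}
```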
PyTorch and TensorFlow are widely used frameworks for deep learning; both provide data-pipeline utilities (PyTorch's DataLoader and TensorFlow's tf.data) that prefetch batches asynchronously to keep the GPU fed.