Hello, my name is Alberto Villarreal.
In this short video, I want to give you
an introduction to a new feature in the Intel Xeon Scalable
Processors that is designed to accelerate deep learning use
cases.
Deep learning has gained significant attention
in the industry by achieving state-of-the-art results
in image classification, speech recognition,
language translation, object detection,
and other applications.
Second-generation Intel Xeon Scalable Processors
deliver increased performance for deep learning applications,
from the cloud to edge devices, while using
the same hardware for many other types of workloads.
This is because of new features in these processors
such as Intel Advanced Vector Extensions 512, or Intel
AVX-512, which is a set of instructions
that can accelerate performance for demanding
computational tasks.
Intel AVX-512 now includes Intel AVX-512 Deep Learning Boost,
which has new instructions that accelerate deep learning
inference workloads such as image classification, object
detection, and others.
Let's see how this new technology works.
Research has shown that both deep learning
training and inference can be performed
with lower numerical precision using
16-bit multipliers for training and 8-bit or smaller
multipliers for inference, with minimal to no loss in accuracy.
The previous generation of Intel Xeon Scalable Processors
enabled lower precision for inference
using the Intel AVX-512 instruction set.
These instructions enable lower-precision multiplies
with higher-precision accumulates.
As shown in this figure, multiplying two 8-bit values
and accumulating the result into 32 bits
requires three instructions, with the accumulation
in int32 format.
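As a rough illustration of that three-instruction sequence, here is a
minimal sketch using AVX-512 intrinsics. It assumes the usual
VPMADDUBSW, VPMADDWD, and VPADDD sequence for an unsigned-8-bit by
signed-8-bit dot product; the function name and compile flags are my
own, not from the video.

#include <immintrin.h>

/* Minimal sketch: dot product of unsigned 8-bit values with signed 8-bit
   values, accumulated into 32-bit lanes using the pre-Deep-Learning-Boost
   three-instruction sequence. Compile with, for example,
   g++ -O2 -mavx512f -mavx512bw. */
__m512i dot_u8s8_legacy(__m512i acc, __m512i a_u8, __m512i b_s8)
{
    /* VPMADDUBSW: multiply u8 by s8 and add adjacent pairs into int16. */
    __m512i p16 = _mm512_maddubs_epi16(a_u8, b_s8);
    /* VPMADDWD: multiply the int16 results by 1 and add pairs into int32. */
    __m512i p32 = _mm512_madd_epi16(p16, _mm512_set1_epi16(1));
    /* VPADDD: add the partial sums into the 32-bit accumulator. */
    return _mm512_add_epi32(acc, p32);
}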
The new generation of Intel Xeon Scalable Processors
now includes Intel AVX-512 Deep Learning Boost,
which enables 8-bit multiplies with 32-bit accumulates
in a single instruction.
The three instructions used in the previous generation
are now fused into the new instruction.
This allows for significantly higher performance
with lower memory requirements.
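For comparison, the same dot product written against the fused
instruction, VPDPBUSD, exposed through the AVX-512 VNNI intrinsics,
collapses into a single call; again, the function name and flags
below are my own sketch.

#include <immintrin.h>

/* Minimal sketch: the same u8-by-s8 dot product using the fused
   Intel Deep Learning Boost instruction VPDPBUSD.
   Compile with, for example, g++ -O2 -mavx512f -mavx512vnni. */
__m512i dot_u8s8_vnni(__m512i acc, __m512i a_u8, __m512i b_s8)
{
    /* VPDPBUSD: multiply u8 by s8, sum groups of four products,
       and accumulate into the 32-bit lanes of acc -- one instruction. */
    return _mm512_dpbusd_epi32(acc, a_u8, b_s8);
}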
We can use this new functionality in several ways.
First, let me show you how to take advantage of Intel
AVX-512 Deep Learning Boost via functionality available
in the Intel Math Kernel Library for Deep Neural Networks,
or Intel MKL-DNN.
Intel MKL-DNN is an open-source performance library
for deep learning applications intended
for acceleration of deep learning frameworks
on Intel architecture.
It contains vectorized and threaded building blocks
that you can use to implement deep neural networks.
This is a good way to make use of the deep learning
primitives that are already optimized
to run on Intel processors.
You can simply use any of the deep learning frameworks
or libraries.
Many are listed here with more coming soon.
They use Intel MKL-DNN to benefit
from the performance gains offered by Intel Deep Learning
Boost.
You can also link your application to Intel MKL-DNN
via C or C++ APIs.
This way, you can take advantage of deep learning primitives
and performance-critical functions
that are already optimized to use Intel Deep Learning Boost.
This allows you to develop your own optimized software products
or to optimize existing ones.
For example, let us suppose we want to use the C++ API
in Intel MKL-DNN to implement a convolution with a rectified
linear unit from the AlexNet topology using lower-precision
primitives.
This diagram shows the flow of operations and data
for this example.
Notice that we start by performing a quantization
step to get low-precision representations
of the data, weights, and biases for the convolution layer.
Then we perform the convolution operation using lower precision,
and at the end, the output of the computation
is dequantized from 8-bit integers
back into the original floating-point format.
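To make the quantization and dequantization steps in the diagram
concrete, here is a small scalar sketch of the arithmetic involved.
The scale factors, function names, and clamping choices are my own
illustration, assuming the common convention of unsigned 8-bit
activations and signed 8-bit weights; the actual example uses Intel
MKL-DNN primitives to do this work.

#include <algorithm>
#include <cmath>
#include <cstdint>

/* Quantize a floating-point activation into an unsigned 8-bit integer,
   given a scale factor Qa derived from the dynamic range of the data
   (for example, Qa = 255 / max_activation). */
uint8_t quantize_u8(float x, float Qa)
{
    float q = std::round(x * Qa);
    return static_cast<uint8_t>(std::min(255.0f, std::max(0.0f, q)));
}

/* Quantize a floating-point weight into a signed 8-bit integer,
   given a scale factor Qw (for example, Qw = 127 / max_abs_weight). */
int8_t quantize_s8(float w, float Qw)
{
    float q = std::round(w * Qw);
    return static_cast<int8_t>(std::min(127.0f, std::max(-127.0f, q)));
}

/* Dequantize the 32-bit integer result of the convolution back into the
   original floating-point format by undoing both scale factors. */
float dequantize(int32_t acc, float Qa, float Qw)
{
    return static_cast<float>(acc) / (Qa * Qw);
}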
The source code for this example can
be found in the Intel MKL-DNN repository.
You can go to the main page in the repository
and click on the SimpleNet example, where
you can find an introduction to 8-bit integer computations,
including the quantization process, which converts a given
input into a lower-precision format.
On this page, you will find a walkthrough of the source code
that implements the convolution operation in this example,
showing the different steps involved in the implementation.
You can use this code sample as a basis
to create your own network and take advantage of the new Intel
AVX-512 Deep Learning Boost functionality.
The complete source code for this example, as well as
other examples, tutorials, and installation directions
for Intel MKL-DNN can be downloaded
from the GitHub repository listed in the links section.
The code samples that I just showed
illustrate how you can use the new Intel AVX-512 Deep Learning
Boost feature to accelerate your applications.
Of course, you can also take advantage of these new features
by using frameworks and libraries that have already
been optimized for Intel AVX-512 Deep Learning Boost.
I hope this information was useful for you.
Remember to check out the links provided for resources
that you can use to make your artificial intelligence
applications run faster.
Thanks for watching.