Subtitles

  • Hello, my name is Alberto Villarreal.

  • In this short video, I want to give you

  • an introduction to a new feature in the Intel Xeon Scalable

  • Processors that is designed to accelerate deep learning use

  • cases.

  • Deep learning has gained significant attention

  • in the industry by achieving state-of-the-art results

  • in image classification, speech recognition,

  • language translation, object detection,

  • and other applications.

  • Second-generation Intel Xeon Scalable Processors

  • deliver increased performance for deep learning applications,

  • from cloud to edge devices, while using

  • the same hardware for many other types of workloads.

  • This is because of new features in these processors

  • such as Intel Advanced Vector Extensions 512 or Intel

  • AVX-512, which is a set of instructions

  • that can accelerate performance for demanding computational

  • tasks.

  • Intel AVX-512 now includes Intel AVX-512 Deep Learning Boost,

  • which has new instructions that accelerate deep learning

  • inference workloads such as image classification, object

  • detection, and others.

  • Let's see how this new technology works.

  • Research has shown that both deep learning

  • training and inference can be performed

  • with lower numerical precision using

  • 16-bit multipliers for training and 8-bit or lower-precision

  • multipliers for inference with minimal to no loss in accuracy.

  • The previous generation of Intel Xeon Scalable Processors

  • enabled lower precision for inference

  • using the Intel AVX-512 instruction set.

  • These instructions enable lower-precision multiplies

  • with higher-precision accumulates.

  • As shown in this figure, multiplying two 8-bit values

  • and accumulating the result into 32 bits

  • requires three instructions with the accumulation

  • in Int32 format.

  • The new generation of Intel Xeon Scalable Processors

  • now includes Intel AVX-512 Deep Learning Boost,

  • which enables 8-bit multiplies with 32-bit accumulates

  • in a single instruction.

  • The three instructions used in the previous generation

  • are now fused into the new instruction.

  • This allows for significantly more performance

  • with lower memory requirements.
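For reference, here is a minimal scalar sketch (not from the video) of what one 32-bit lane computes in each case. The three-instruction sequence on earlier processors corresponds to VPMADDUBSW, VPMADDWD, and VPADDD, while Intel AVX-512 Deep Learning Boost fuses the same work into a single VPDPBUSD instruction; the function and variable names below are illustrative only.

```cpp
#include <cstdint>
#include <iostream>

// Legacy path, modeled per 32-bit lane (three instructions):
//   VPMADDUBSW: u8 x s8 products, adjacent pairs added with s16 saturation
//   VPMADDWD (by 1): s16 pairs added into s32
//   VPADDD: add into the s32 accumulator
int32_t dot4_legacy(const uint8_t a[4], const int8_t b[4], int32_t acc) {
    auto sat16 = [](int32_t v) {
        if (v > 32767) return int32_t(32767);
        if (v < -32768) return int32_t(-32768);
        return v;
    };
    int32_t p0 = sat16(int32_t(a[0]) * b[0] + int32_t(a[1]) * b[1]); // VPMADDUBSW
    int32_t p1 = sat16(int32_t(a[2]) * b[2] + int32_t(a[3]) * b[3]);
    int32_t sum = p0 + p1;                                           // VPMADDWD by 1
    return acc + sum;                                                // VPADDD
}

// Intel DL Boost path, modeled per 32-bit lane: one fused instruction
// (VPDPBUSD) with no intermediate 16-bit saturation.
int32_t dot4_vnni(const uint8_t a[4], const int8_t b[4], int32_t acc) {
    for (int i = 0; i < 4; ++i)
        acc += int32_t(a[i]) * int32_t(b[i]);
    return acc;
}

int main() {
    uint8_t a[4] = {200, 10, 3, 50};
    int8_t  b[4] = {-5, 7, 100, -2};
    std::cout << dot4_legacy(a, b, 0) << " " << dot4_vnni(a, b, 0) << "\n";
}
```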

  • We can use this new functionality in several ways.

  • First, let me show you how to take advantage of the Intel

  • AVX-512 Deep Learning Boost via functionality available

  • in the Intel Math Kernel Library for Deep Neural Networks

  • or Intel MKL-DNN.

  • Intel MKL-DNN is an open-source performance library

  • for deep learning applications intended

  • for acceleration of deep learning frameworks

  • on Intel architecture.

  • It contains vectorized and threaded building blocks

  • that you can use to implement deep neural networks.

  • This is a good way to make use of the deep learning

  • primitives that are already optimized

  • to run on Intel processors.

  • You can simply use any of the deep learning frameworks

  • or libraries.

  • Many are listed here with more coming soon.

  • They use Intel MKL-DNN to benefit

  • from the performance gains offered by Intel Deep Learning

  • Boost.

  • You can also link your application to Intel MKL-DNN

  • via C or C++ APIs.

  • This way, you can take advantage of deep learning primitives

  • and performance-critical functions

  • that are already optimized to use Intel Deep Learning Boost.

  • This allows you to develop your own optimized software products

  • or to optimize existing ones.
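As a rough illustration of that path, the sketch below creates an 8-bit convolution primitive with the Intel MKL-DNN C++ API. It is a simplified skeleton, not the video's example: the tensor shapes are made up, the type and function names follow the mkldnn.hpp header of the 1.x releases (older releases spell some of them differently), and real code would also fill the memory objects with quantized data and attach scales and a ReLU post-op as the repository example does.

```cpp
#include <mkldnn.hpp>
#include <unordered_map>

using namespace mkldnn;

int main() {
    // CPU engine and execution stream.
    engine eng(engine::kind::cpu, 0);
    stream strm(eng);

    // Illustrative shapes only: 1x1x4x4 input, 3x3 kernel, no padding.
    memory::dims src_dims     = {1, 1, 4, 4};   // N x C x H x W
    memory::dims weights_dims = {1, 1, 3, 3};   // O x I x KH x KW
    memory::dims bias_dims    = {1};
    memory::dims dst_dims     = {1, 1, 2, 2};
    memory::dims strides      = {1, 1};
    memory::dims padding      = {0, 0};

    // 8-bit activations and weights, 32-bit accumulation/output.
    memory::desc src_md(src_dims, memory::data_type::u8, memory::format_tag::nhwc);
    memory::desc weights_md(weights_dims, memory::data_type::s8, memory::format_tag::oihw);
    memory::desc bias_md(bias_dims, memory::data_type::s32, memory::format_tag::x);
    memory::desc dst_md(dst_dims, memory::data_type::s32, memory::format_tag::nhwc);

    memory src_mem(src_md, eng), weights_mem(weights_md, eng),
           bias_mem(bias_md, eng), dst_mem(dst_md, eng);
    // Real code would write quantized values into these buffers here.

    // Forward-inference convolution descriptor, primitive descriptor, primitive.
    convolution_forward::desc conv_d(prop_kind::forward_inference,
            algorithm::convolution_direct, src_md, weights_md, bias_md,
            dst_md, strides, padding, padding);
    convolution_forward::primitive_desc conv_pd(conv_d, eng);
    convolution_forward conv(conv_pd);

    // Execute; on processors with Intel DL Boost the library dispatches
    // VNNI-based kernels for this int8 primitive.
    conv.execute(strm, {{MKLDNN_ARG_SRC, src_mem},
                        {MKLDNN_ARG_WEIGHTS, weights_mem},
                        {MKLDNN_ARG_BIAS, bias_mem},
                        {MKLDNN_ARG_DST, dst_mem}});
    strm.wait();
    return 0;
}
```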

  • For example, let us suppose we want to use the C++ API

  • in Intel MKL-DNN to implement a convolution with a rectified

  • linear unit from the AlexNet topology using lower-precision

  • primitives.

  • This diagram shows the flow of operations and data

  • for this example.

  • Notice that we start by performing a quantization

  • step to get low-precision representations

  • of data, weights, and biases for the convolution layer.

  • Then we perform the convolution operation using lower precision,

  • and at the end, the output of the computation

  • is dequantized from 8-bit integers

  • into the original floating-point format.
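The plain C++ sketch below (not the repository code) shows the bookkeeping behind those quantization and dequantization steps for a single per-tensor scale factor; the 255 / max|x| choice mirrors the per-tensor scheme described in the MKL-DNN 8-bit documentation, and the values are made up. The repository example additionally scales weights and biases and folds those scales into the convolution primitive.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <iostream>
#include <vector>

int main() {
    // Non-negative activations (e.g. after a previous ReLU), so u8 is enough.
    std::vector<float> x = {0.02f, 1.7f, 0.9f, 3.4f};

    // Per-tensor quantization factor: scale = 255 / max|x|.
    float max_abs = 0.0f;
    for (float v : x) max_abs = std::max(max_abs, std::fabs(v));
    float scale = 255.0f / max_abs;

    // Quantization: float32 -> uint8 (clamp is a safety guard).
    std::vector<uint8_t> q(x.size());
    for (size_t i = 0; i < x.size(); ++i)
        q[i] = static_cast<uint8_t>(std::lround(std::min(255.0f, scale * x[i])));

    // ... the int8 convolution + ReLU would run here, accumulating in int32 ...

    // Dequantization: back to the original floating-point range.
    for (size_t i = 0; i < q.size(); ++i)
        std::cout << x[i] << " -> " << int(q[i]) << " -> " << q[i] / scale << "\n";
}
```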

  • The source code for this example can

  • be found in the Intel MKL-DNN repository.

  • You can go to the main page in the repository

  • and click on the SimpleNet example, where

  • you can find an introduction to 8-bit integer computations,

  • including the quantization process, which converts a given

  • input into a lower-precision format.

  • On this page, you will find a walkthrough of the source code

  • that implements the convolution operation in this example,

  • showing the different steps involved in the implementation.

  • You can use this code sample as a basis

  • to create your own network and take advantage of the new Intel

  • AVX-512 Deep Learning Boost functionality.

  • The complete source code for this example, as well as

  • other examples, tutorials, and installation directions

  • for Intel MKL-DNN can be downloaded

  • from the GitHub repository listed in the links section.

  • The code samples that I just showed

  • illustrate how you can use the new Intel AVX-512 Deep Learning

  • Boost feature to accelerate your applications.

  • Of course, you can also take advantage of these new features

  • by using frameworks and libraries that have already

  • been optimized for Intel AVX-512 Deep Learning Boost.

  • I hope this information was useful for you.

  • Remember to check out the links provided for resources

  • that you can use to make your artificial intelligence

  • applications run faster.

  • Thanks for watching.
