  • [MUSIC PLAYING]

  • WEI LIN: I'm a senior director of PAI, the Platform of Artificial Intelligence at Alibaba.

  • Today I will give you a brief introduction to our work.

  • This is an overview of our computation platform.

  • We have a global storage system with heterogeneous resources.

  • On top of that, we have unified resource management to support all the different types of computation frameworks, including PAI.

  • Here's a snapshot of the PAI user interface.

  • People can drag and drop components and build a workflow very easily.

  • The system runs on millions of CPU cores and thousands of GPU cores.

  • A single training job can scale up to a thousand workers, with billions of features and parameters.

  • We also offer a public service on our cloud for external users.

  • Here are the key ideas behind our system design.

  • We want to provide easy-to-use building blocks for AI application creators.

  • We also cover the full life cycle for those developers, to give them a one-stop programming experience.

  • At the core is our engine, which provides high performance, low cost, good flexibility, and extensibility.

  • Since this is the TensorFlow Dev Summit, I will talk more about our work on the deep learning engine.

  • Our ultimate goal is to let developers focus on modeling their neural networks.

  • The system, PAI, then helps them run their models easily, efficiently, and at scale.

  • How do we achieve that?

  • We leverage TensorFlow deeply, because TensorFlow has a very flexible and extensible system design.

  • And we did a lot of in-depth optimization, which is listed on the right.

  • Inside Alibaba, the recommendation system has billions of features and requires a thousand workers for training.

  • We had to enhance TensorFlow, especially its runtime and distributed training mechanism, to better leverage the sparsity of the data.

  • We also improved the communication protocol by introducing a multi-layer ring all-reduce built on top of the hierarchical network topology.

  • We also support different communication protocols, like RDMA and NCCL.

  • To fully utilize new GPU architectures like [INAUDIBLE], we enhanced TensorFlow so that it can automatically run the model with mixed precision.

  • In initial results, this gave us a threefold improvement in our real scenarios.

  • As for easy programming, we worked together with the community to improve auto-parallelism.

  • We introduced TAO, which is based on the XLA framework and performs many optimizations, including cost-based graph splitting, graph optimization, kernel fusion, and full-stage code generation.

  • On the inference side, we introduced the PAI-Blade tools, which have three levels of optimization.

  • Here are some results.

  • We can see that on those public models, we are on par with TensorRT, or slightly better.

  • Recently, graph neural networks have gained a lot of attention inside Alibaba.

  • In our real scenarios, we face a more challenging [INAUDIBLE] graph, which has four properties: it is large-scale, heterogeneous, attributed, and dynamic.

  • We had to enhance TensorFlow to solve those challenges.

  • We also developed a general CNN inference engine on FPGA and integrated this engine with TensorFlow.

  • We have deployed this solution in our CityBrain project in China.

  • PAI also provides many SDKs that help developers accelerate their research, including reinforcement learning and transfer learning packages, and packages for computer vision and natural language processing.

  • They are all built on top of TensorFlow.

  • So far, we have already contributed a lot back, probably more than 50 PRs.

  • I came here to build more connections and to share more of our work with the community.

  • Thank you.

  • [MUSIC PLAYING]
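The talk mentions a multi-layer ring all-reduce built on the hierarchical network topology. The Python simulation here is only a rough illustration. It is not PAI's implementation. It assumes a two-level topology:

1. The GPUs inside each node first sum their gradients into one leader buffer.
2. Only the leaders run a classic ring all-reduce (reduce-scatter, then all-gather) across the slower inter-node network.
3. Each leader copies the result back to its local GPUs.

The function names and the node/GPU layout are hypothetical. Real systems run these steps as collectives (for example with NCCL over NVLink, PCIe, or RDMA) rather than over Python lists.

```python
from typing import Callable, List

Vector = List[float]  # one flat gradient buffer (hypothetical layout)


def split_chunks(length: int, n: int) -> List[range]:
    """Split indices [0, length) into n contiguous, nearly equal ranges."""
    base, extra = divmod(length, n)
    ranges: List[range] = []
    start = 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def ring_allreduce(buffers: List[Vector]) -> List[Vector]:
    """Simulated ring all-reduce (sum): reduce-scatter, then all-gather."""
    n = len(buffers)
    data = [list(b) for b in buffers]  # copy so callers' buffers are untouched
    if n == 1:
        return data
    chunks = split_chunks(len(data[0]), n)

    def ring_step(chunk_of: Callable[[int], int], accumulate: bool) -> None:
        # Every rank i sends one chunk to rank (i + 1) % n "simultaneously":
        # snapshot all payloads first, then deliver them.
        sends = [(i, chunk_of(i)) for i in range(n)]
        payloads = [[data[i][k] for k in chunks[c]] for i, c in sends]
        for (i, c), payload in zip(sends, payloads):
            dst = (i + 1) % n
            for k, v in zip(chunks[c], payload):
                data[dst][k] = data[dst][k] + v if accumulate else v

    # Reduce-scatter: after n - 1 steps, rank i holds the full sum of chunk (i + 1) % n.
    for step in range(n - 1):
        ring_step(lambda i: (i - step) % n, accumulate=True)
    # All-gather: pass each fully reduced chunk the rest of the way around the ring.
    for step in range(n - 1):
        ring_step(lambda i: (i + 1 - step) % n, accumulate=False)
    return data


def hierarchical_allreduce(nodes: List[List[Vector]]) -> List[List[Vector]]:
    """Two-level all-reduce: intra-node reduce, inter-node ring, intra-node broadcast."""
    # Level 1: each node sums its local GPU buffers into a single leader buffer.
    leaders = [[sum(vals) for vals in zip(*gpus)] for gpus in nodes]
    # Level 2: only one leader per node joins the ring over the inter-node network.
    reduced = ring_allreduce(leaders)
    # Level 3: each leader broadcasts the global sum back to every local GPU.
    return [[list(reduced[idx]) for _ in gpus] for idx, gpus in enumerate(nodes)]


if __name__ == "__main__":
    # Two hypothetical nodes with two GPUs each, each GPU holding a 4-element gradient.
    nodes = [
        [[1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0]],
        [[0.0, 1.0, 0.0, 1.0], [2.0, 2.0, 2.0, 2.0]],
    ]
    result = hierarchical_allreduce(nodes)
    print(result[0][0])  # [4.0, 6.0, 6.0, 8.0]
```

In this sketch only one buffer per node crosses the inter-node network. The ring therefore has as many participants as there are nodes, not GPUs, which shortens it on the slowest links.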
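The talk also says TensorFlow's runtime and distributed training were enhanced to better exploit data sparsity, without detailing how. One common idea is shown here as an assumption, not as a description of PAI's mechanism. A mini-batch touches only a handful of rows of a huge embedding table, so workers exchange and apply gradients for those rows only instead of for the whole dense table. The names `aggregate_sparse` and `apply_sparse_sgd`, and the dictionary representation of a sparse gradient, are hypothetical.

```python
from typing import Dict, List

# Hypothetical sparse gradient: embedding row id -> gradient for that row only.
SparseGrad = Dict[int, List[float]]


def aggregate_sparse(worker_grads: List[SparseGrad]) -> SparseGrad:
    """Sum per-row gradients from all workers, touching only rows that appear."""
    total: SparseGrad = {}
    for grads in worker_grads:
        for row, g in grads.items():
            if row in total:
                total[row] = [a + b for a, b in zip(total[row], g)]
            else:
                total[row] = list(g)  # copy so worker inputs are not mutated
    return total


def apply_sparse_sgd(table: List[List[float]], grads: SparseGrad, lr: float) -> None:
    """In-place SGD step that updates only the embedding rows with gradients."""
    for row, g in grads.items():
        table[row] = [w - lr * gi for w, gi in zip(table[row], g)]


if __name__ == "__main__":
    dim, num_rows = 4, 10_000
    table = [[0.0] * dim for _ in range(num_rows)]
    # Two hypothetical workers; each mini-batch touches only a few feature ids.
    w0 = {3: [1.0] * dim, 42: [0.5] * dim}
    w1 = {42: [0.5] * dim, 9_999: [2.0] * dim}
    merged = aggregate_sparse([w0, w1])
    apply_sparse_sgd(table, merged, lr=0.1)
    print(sorted(merged))  # [3, 42, 9999]
    print(table[42])       # [-0.1, -0.1, -0.1, -0.1]
```

With this representation, the data exchanged per step grows with the number of distinct feature ids in the batch rather than with the table size. That matters when a table has billions of rows.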