MATT FROST: Welcome to the first session on WebM and the New VP9 Open Video Codec. We figured that there's no better way to add a little excitement to a presentation than to change it at the last minute, and so what we've spent this morning doing is encoding some VP9 video and H.264 video and putting together a side by side demonstration just to give you a taste of what we're working on. So what you're going to see is a video. The video is going to be the same on either side. It's going to be VP9, the new codec on the left, H.264 on the right. And for H.264, we used the x264 open video encoder, which is commonly regarded as the best encoder out there. We used the highest possible settings. So we've done everything we can to favor H.264 here. All of this is at the same data rate, so both of the videos are going to be at the same data rate. And the bit rate varies. In some cases, we're using 500K. In other cases, we've dropped the bit rate down to bit rates that are actually banned by certain UN conventions for the compression of HD video. And so with that, I think that's everything, Ronald? RONALD BULTJE: Yes. So like Matt said, what you're looking at here is shots that we just took this morning. We've encoded those in just a couple of hours and basically, what you're looking at here, on the left, VP9 and on the right, H.264, is what an amazing job we can actually do at video compression if we're using the very latest technologies. MATT FROST: So you can see the blockiness on the right. On some of this, it's a lot more evident than others, and especially evident, if you want afterwards to come up and take a look at this running on the screen, we can freeze frames. But you see there on the right especially, all this blockiness and how much it clears up as it moves into VP9 territory. RONALD BULTJE: And a point here really is that for high definition video, H.264 can do a reasonable job, but we can do a lot better than that. 
And so having said that, let's actually get started on the presentation. MATT FROST: So the way that we're going to handle this presentation is I'm going to do a quick introduction on why we care about open video, both why does Google-- which has historically been involved with developing applications around video-- has gotten down deeply into actually helping work on these next generation compression technologies. After we talk about that and why, in general, improving video compression is good for everybody, I'm going to turn it over to Ronald for really the meat of this presentation, which will be to show you some more demonstrations, to talk a little bit about how we measure video quality, talk about some of the techniques that we're exploiting to really make this dramatic improvement in compression. And then finally, after you've seen this, and I hope that you've started to get a little excited about what this technology can do for you, we'll go and talk about the last stages, how we're going to wrap up this project and how we're going to get these tools into your hands as quickly as possible. So to start off with, just taking a quick look at how Google got into video. Video at Google started in the same way that so many big projects at Google start, as an experiment. And we launched these efforts with just a single full time engineer and a number of engineers working 20% of their time on video, really focusing on video-related data. And then over the last 10 years, obviously, video at Google has exploded, not only with YouTube but with Google Talk, Hangouts, lots of applications where you wouldn't necessarily think of video as playing a core role, like Chromoting, which is Chrome Remote Desktopping. But if you look at the really motivating factors for getting into video compression, there are a couple that are really of note. One, of course, is the acquisition of YouTube. 
And with the acquisition of YouTube, we all of a sudden started to focus very heavily on both improving the experience for users, improving video quality, but also on the costs associated with all aspects of running a service like YouTube. There are costs associated with ingest, transcode of video formats, storage of multiple different formats, and then distribution of the video, both to caches and to the edge, and ultimately to users. The second was the move from HTML4 to HTML5, which came at the same time, pretty much, as our launch of Chrome. And of course, in HTML4, although to the user, it appeared that video could be supported in a browser, in fact, video was supported through runtimes and plug-ins. With HTML5, video becomes a native part of the browser. And so with the move towards HTML5, we see it filtering through the addition of the video tag in Chrome and the launch of HTML5 video for YouTube. So these are the two factors-- the focus on quality and reducing cost with YouTube, the need to build a high quality codec into Chrome and other browsers for the video tag-- that sparked the acquisition in 2010 of On2 Technologies, the company that I came from and many members of the WebM team came from, and the launch of the WebM project. The WebM project is an effort to develop a high quality, open alternative for web video. We're very focused on web video, not on video for Blu-ray discs, not on video for cable television, but on solving the problems that we find in web video. In addition, we're very focused on having an open standard because we believe that the web has evolved as quickly as it has because it is based on open technologies. And clearly, multimedia communication has become such a core part of how we communicate on the web that we need open technologies that are rapidly evolving to allow us to keep pace and to make sure that we can develop the next generation of killer video applications. We wanted something simple as well. 
So we used the VP8 Open Codec, the Vorbis Open Audio Codec, which was a long existing open audio codec, and then the Matroska File Wrapper. With the launch of VP9 in a matter of months, we're going to be adding the VP9 Video Codec as well as the brand new Opus Audio Codec, which is another open audio codec, very performant and high quality. So since our launch, obviously, web video has continued to grow. And if we just look at what we know very well, which is YouTube, YouTube has grown to be a global scale video platform capable of serving video across the globe to these myriad connected video enabled devices that we're all using. It supports a billion monthly users, and those users are looking at video four billion times a day for a total of six billion plus hours of video viewed monthly. Just to think about that number, that is about an hour of video for every person on the planet consumed on YouTube each month. And on the creation side, we're seeing exactly the same trends. 72 hours of video is uploaded per minute, and that video is increasingly becoming HD video. So if you look at the graph on the right, blue is 360p standard definition video, which is slowly declining, but quickly being matched by uploads of HD video. And the key here of great importance is that HD video is obviously more complex. There's more data for a given HD video than there is for-- unless, of course, you're encoding it in VP9-- than there is for a standard resolution video. In addition, I think we can all agree that the better the video is, the higher the resolution, the more watchable it is. And then finally, the other trend that's driving both creation and consumption is the increase in mobile devices and the move towards 4G networks. So even this morning, there was an article when I woke up and was checking my email saying that YouTube video accounts for 25% of all downstream web traffic in Europe. And I think BitTorrent accounted for 13%. 
So there alone, between just two web video services, we're looking at close to 40% of all web data in Europe being video related data. And that accords with what we see from the latest Cisco forecasts, for instance, which is that consumer web video is going to be close to 90% of all consumer data on the web within the next three years. So it's remarkably encouraging to see the growth in video, but it also represents a real challenge. Of course, the good news is that we have a technology that is up to this challenge, and that is VP9. With next generation video codecs, with codecs as good as VP9, we can effectively significantly increase the size of the internet and we can significantly increase the speed of the internet. So obviously, if you're taking VP9-- which, as Ronald will say, halves the bit rate you need for the very best H.264 to deliver a given quality video-- you're going to be able to speed up the download of a download-and-play video, and you're going to be able to speed up, obviously, the buffering of these videos. So we have the tools to effectively dramatically increase the size of the internet. But of course in doing that, in improving the video experience, in improving the ability to upload video quickly, we're going to just create the conditions for even more consumption of video. And so it's not going to be enough for us to rest on our laurels with VP9. We're going to have to turn to VP10 and keep on doing it, keep on pushing the boundaries of what we're capable of with video compression. So with that, I'm going to turn it over to Ronald to show you some really remarkable demonstrations of this new technology. RONALD BULTJE: Thank you. So to get started, I just briefly want to say some words about video quality. So how do we measure quality? Well, the most typical way to measure quality is to just look at it, because at the end of the day, the only thing that we care about is that the video that you're looking at looks great to your eyes. 
But that's, of course, not all there is to it because as we're developing a new video codec, we cannot spend our whole day just watching YouTube videos over and over and over again. That would be fun, though. So in addition to visually analyzing and inspecting video, we're also using metrics. The most popular metric in the field for measuring video quality is called PSNR. It stands for Peak Signal-to-Noise Ratio. And the graph that you're looking at here on the left is a typical representation of PSNR on the vertical axis and video bit rate on the horizontal axis to give you some sort of a feeling of how those two relate. So the obvious thing to note here is that as you increase the bit rate, the video quality, as measured by this metric, increases. So at the end of the day what that means is that it doesn't really matter what codec you use, as long as you have infinite bandwidth, you can accomplish any quality. However, our goal is to make it easier and faster and simpler to stream video. So how does PSNR actually compare to visual quality? So for that, there's a sample clip. So what you're looking at here is a very high quality shot of the New York skyline. I believe that this is the Empire State Building. And this clip has a lot of detailed textures all across. So what we've done here is that we've encoded it at various bit rates, and then every couple of seconds, we're dropping the bit rate and the metric quality of the video will slowly decrease. So this is 45 dB, and what you're seeing slowly at 30 dB is that some of the detail, or actually a lot of the detail, in the backgrounds of the buildings just completely disappears. And that was the case at 35 dB already also. As you go to 25 dB, you can see-- we can go really low in quality, but you do not want to watch this. Here's a different scene. Same thing, we start with the original 45 dB. 40 dB looks pretty good. 35 dB starts having a lot of artifacts, and then 30 and 25 are essentially unwatchable. 
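For readers who want to see what the PSNR metric mentioned above computes, here is a minimal sketch. It assumes 8-bit samples (a peak value of 255) stored as flat lists; the function name `psnr` and the sample values are made up for illustration and do not come from the talk or from any particular tool.

```python
import math


def psnr(reference, distorted, peak=255):
    """Return the PSNR in dB between two equally sized lists of sample values."""
    if not reference or len(reference) != len(distorted):
        raise ValueError("frames must be non-empty and the same size")
    # Mean squared error between the original and the decoded samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames: no noise at all
    return 10 * math.log10(peak ** 2 / mse)


# Hypothetical 8-sample "frames": the original and a slightly distorted decode.
original = [52, 55, 61, 66, 70, 61, 64, 73]
decoded = [54, 55, 60, 66, 69, 62, 64, 70]
print(f"PSNR: {psnr(original, decoded):.2f} dB")
```

Because the scale is logarithmic, each drop of about 3 dB corresponds to roughly a doubling of the mean squared error, which is one way to read the 45, 40, 35, 30 and 25 dB steps shown in the clips.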
So what does that mean for video quality? Well, the typical target quality for high definition video on the internet lies around 40 dB. You were just looking at the video, and at 40 dB it looked really quite good. So if you go to YouTube and you try to stream a 720p video, that's actually about the quality that you will get. In terms of bit rate, what you should expect to get is a couple of megabits a second. For this particular clip, that's one to two megabits a second, but that's very source material dependent. So what we've done, then, is we have taken, I think, about 1,000 CC-licensed YouTube uploads, just randomly selected from whatever users give us, and we've then taken out particular material that we're not really interested in, such as stills or video clips that contain garbage video content. And then we were left with, I think, about 700 CC-licensed YouTube uploads, and we've encoded those at various bit rates-- so at various quality settings-- with our VP9 Video Codec or with H.264 using the x264 encoder at the very best settings that we are aware of. Then for each of these clips, we've taken the left half of the resulting VP9 compressed file and the right half of the 264 one and we've stitched those back together, and then you essentially get what you're looking at here. So left here is VP9, right is 264, and those are at about the same bit rate. You will see graphs here on the left and on the right, and those are actually the effective bit rate for this particular video clip. And as you can see, it starts being about equal. Now, you saw it just jumping up, and that's because we're gradually increasing the bit rate to allow the 264 encoder to catch up in quality. And as you can see, it slowly, slowly starts looking a little bit better. And at this point, I would say that it looks about equal on the left and on the right. 
But if you look at the bit rate graphs, you can basically see that we're spending about two and a half times the bit rate on a 264 file versus the VP9 file. So those are the compression savings that you can get if you do same quality encodings but you use VP9 instead of 264. So what you're looking at here is a comparative graph for the clip that you were just looking at. The blue line is the 264 encoded version and the red line is the VP9 encoded version. And as I said in the beginning, vertical axis is PSNR as a metric of quality, and the horizontal axis is bit rate. So the way that you compare these is that you can pick any point from the red line-- or from the blue line, for that matter-- and then you can do two things. Either you can draw a vertical line and find the matching point on a blue line that matches the points on the red line that you're looking for and look at what the difference in quality is. But what we usually do is we do it the other way around. So we're drawing a horizontal line for the point on the red graph, and we're finding the point that matches the horizontal line on the blue. And what you're looking at here is that for the point that we were just looking at, that is, a quality metric point of about 37.1 dB, the VP9 version takes an average of 328 kilobits a second to reach that quality, and for H.264, you need to go up to essentially 800 kilobits a second to get exactly the same quality. So what that means is, again, the metrics tell us you can get a two and a half times lower bit rate and effectively get the same quality by using VP9 instead of 264. If you look to the higher end of the graph, you will see that the differences in quality for the same bit rates might go slightly down, but that's basically just because at the higher end, there's a diminishing returns for bit rate. 
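The "horizontal line" comparison described above can be sketched in a few lines: pick a target quality, find the bit rate at which each codec's curve reaches it, and take the ratio. The curve points below are hypothetical, chosen only to resemble the shape of the graphs in the talk, and the helper name `bitrate_at_quality` is an assumption. Linear interpolation between measured points is a simplification; real comparisons often interpolate on a logarithmic bit rate axis.

```python
def bitrate_at_quality(curve, target_db):
    """Interpolate the bit rate (kbps) at which a rate-quality curve reaches target_db.

    curve is a list of (bitrate_kbps, psnr_db) points sorted by increasing bit rate,
    with PSNR assumed to rise as the bit rate rises.
    """
    for (r0, q0), (r1, q1) in zip(curve, curve[1:]):
        if q0 <= target_db <= q1:
            if q1 == q0:
                return r0
            t = (target_db - q0) / (q1 - q0)  # position between the two points
            return r0 + t * (r1 - r0)
    raise ValueError("target quality lies outside the measured curve")


# Hypothetical measurements, not the data shown in the presentation.
vp9_curve = [(100, 33.0), (200, 35.5), (400, 38.0), (800, 40.5)]
h264_curve = [(200, 32.5), (400, 34.5), (800, 37.0), (1600, 39.5)]

target = 37.0
vp9_kbps = bitrate_at_quality(vp9_curve, target)
h264_kbps = bitrate_at_quality(h264_curve, target)
print(f"VP9: {vp9_kbps:.0f} kbps, H.264: {h264_kbps:.0f} kbps, "
      f"ratio: {h264_kbps / vp9_kbps:.2f}x")
```

With the figures quoted in the talk for the first clip, 328 kilobits a second for VP9 against about 800 for H.264 at 37.1 dB, the same ratio works out to roughly 2.4x.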
So if you look at the high ends of both of those graphs and you do the horizontal line comparison, so what is the different bit rate that accomplishes the same quality? You will see that it about comes down to 2x over the whole graph. So let's look at a different video because I could just be cheating you with this one video and we could have optimized our codec for this one video. So what you're looking at here is, again, the same thing, VP9 on the left, 264 on the right, live bit rate graphs and we start at the same bit rate. Then as we do that, we're slowly increasing the bit rate for the 264 portion of the video so that it can actually catch up in quality. And what you're looking at is that on the right, the floor is pulsing a lot. You can actually see, if you focus on the pants of the little boy here or on the plastic box, that it's very noisy. But eventually, it catches up in quality. Guess what happened to the bit rate? It's almost 3x for this particular video. So here is the [INAUDIBLE] graph for the material that we were just looking at. The red line is VP9, the blue line is H.264. And if we do the same quality different bit rate comparison at the point that we were just looking at, which is about 38.6 dB, for VP9, you arrive at about 200 kilobits a second, and for H.264, you need to interpolate between two points because we don't have an exact match, and it ends up being around 550 kilobits a second. So almost 3x more bit rate to accomplish the same quality, and you can use VP9 to save this. So we've done this over many, many clips. I told you we had about 700 clips that we tested this on at various bit rates and various quality settings, and overall, you can save 50% bandwidth by encoding your videos in VP9 instead of H.264 at the very best settings that we are aware of. So how did we do this? So let's look a little bit at the techniques that we're using to actually get to this kind of compression efficiency. 
So a typical video sequence consists of a series of video frames, and then each of these video frames consists of square blocks. So for current generation video codecs, like H.264, these blocks have a maximum size of 16 by 16 pixels. We've blown this up a lot. We have currently gone up to 64 by 64 pixels for each block, and then at that point, we introduce a partitioning step. And in this partitioning step, we allow you to do a vertical or horizontal partitioning, a four-way split, or no partitioning at all, resulting in different size sub-blocks. If you do a four-way split and you have four 32 by 32 blocks, then for each of these blocks, you go through the same process again of horizontal, vertical split, four-way split, or no split at all. If you do the four-way split, you get down to 16 by 16 pixels, do the same thing again to get to eight by eight, and eventually four by four pixels. So what this partitioning step allows you to do is to break up the video in such a way that it's optimized for your particular content. Stuff that has a very stable motion field can use very large blocks, whereas for video content where things are moving all across all the time, you can go to very small video blocks. So what do we do after that? So after this partitioning step, we're usually doing motion vector coding, and basically what that does is that you pick a reference frame, and you pick a motion vector, and then the block of that particular size that you selected in your partitioning step will be coded using a motion vector pointing into one of the previously coded reference frames. These reference frames in VP8 were usually frames that had previously been encoded, and were therefore temporally before the current frame. 
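The recursive partitioning step described above can be sketched as follows. This is a toy model, not libvpx code: the `partition` function, the mode names, and the `toy_choose` decision rule are all hypothetical. A real encoder would pick the mode for each block by comparing the rate and distortion of the alternatives rather than by block position.

```python
def partition(x, y, size, choose, min_size=4):
    """Return the leaf blocks (x, y, width, height) of one square block.

    choose(x, y, size) returns "none", "horizontal", "vertical" or "split".
    Only a four-way split recurses into the resulting sub-blocks.
    """
    mode = choose(x, y, size) if size > min_size else "none"
    half = size // 2
    if mode == "none":
        return [(x, y, size, size)]
    if mode == "horizontal":  # two blocks stacked on top of each other
        return [(x, y, size, half), (x, y + half, size, half)]
    if mode == "vertical":  # two blocks side by side
        return [(x, y, half, size), (x + half, y, half, size)]
    if mode == "split":
        blocks = []
        for dy in (0, half):
            for dx in (0, half):
                blocks += partition(x + dx, y + dy, half, choose, min_size)
        return blocks
    raise ValueError(f"unknown partition mode: {mode}")


def toy_choose(x, y, size):
    """Hypothetical rule: pretend all the detail sits in the top-left corner."""
    if (x, y) != (0, 0):
        return "none"  # stable area: keep the block large
    return "split" if size > 8 else "horizontal"


leaves = partition(0, 0, 64, toy_choose)
for block in leaves:
    print(block)
```

In this toy run the quiet areas stay as 32 by 32 and 16 by 16 blocks while the busy corner is cut down to 8 by 4, which mirrors the idea that stable content uses large blocks and fast-moving content uses small ones.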
What we've added in VP9 is that we have multi-level alt reference frames, and what that allows you to do is encode the video sequence in any frame order, and then you can use any future frame as a reference frame for a frame that you decide to encode after that. So for this series of frames on the left, this is six frames. I could, for example, choose to first encode frame one, then frame six, and then frame three using both a future as well as a past reference. And then, now that I have encoded three, I can encode one and two really efficiently because they have a very proximate future and past reference. After I've encoded two and three, I go to five, which has four and six as close neighbors. And so that allows for very temporally close reference frames to be used as a predictor of contents in the current block. So once you have a motion vector, you can use subpixel filtering, and subpixel filtering allows you to basically pick a point in between two full pixels and this point in between is then interpolated using a subpixel interpolation filter. In VP8, we had only a single subpixel interpolation filter. Most codecs use just a single subpixel interpolation filter. We've actually added three in VP9, and those are optimized for different types of material. We have a sharp subpixel interpolation filter, which is really great for material where there's a very sharp edge somewhere in the middle. For example, that city clip that we were looking at in the beginning, if you're thinking of a block that happens to be somewhere on the border between the sky and a building, we consider that a sharp edge, and so using an optimized filter for sharp edges actually maintains a lot of that detail. On the other hand, sometimes there's very sharp edges but those are not consistent across video frames across different temporal points in the sequence that you're looking at. 
At that point, this will cause a very high frequency residual artifact, and so for those, we've added a low pass filter. And what the low pass filter does is that it basically removes sharp edges, and it does exactly the opposite of a sharp filter. Lastly, we have a regular filter, which is similar to the one that VP8 had. After this prediction step, you have predicted block contents and you have the actual block that you're trying to get as close as possible to, and then the difference between these two is the residual signal that you're going to encode. So in current generation video codecs, we usually use four by four or eight by eight cosine based transforms called DCTs to encode this residual signal. What we've added in VP9 is much higher resolution DCT transforms all the way up to 32 by 32 pixels, and in addition to using the DCT, we've also added an asymmetric sine based transform called ADST. And the sine based transform is optimized for a signal that has a near zero value at the edge of the predicted region, whereas the cosine is optimized for a residual signal that has a zero signal in the middle of the predicted signal. So those are optimized for different conditions, and together, they give good gains when used properly. Basically, the take home message from all of this is that we've added big resolution increments to our video codecs, and what that leads to is a codec that is highly, highly optimized for high definition video coding. But at the same time, because it is very configurable, it still performs really well at low resolution content, for example, SIF-based 320 by 240 video as well. So I'll hand it back to Matt now, who will take over. MATT FROST: Thanks, Ronald. So I just want to give you a quick recap of what we've discussed and sort of the highlights of this technology, and then to tell you about the last steps that we're going through to get VP9 in your hands. 
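To make the residual and transform step from Ronald's explanation concrete, here is a minimal sketch that subtracts a predicted block from the actual block and applies a textbook floating-point two-dimensional DCT-II to the result. It is an illustration only: the function names and the 4 by 4 sample values are hypothetical, and real codecs, VP9 included, use fast integer approximations rather than this direct formula.

```python
import math


def dct_1d(values):
    """Orthonormal DCT-II of a list of numbers."""
    n = len(values)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        total = sum(v * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, v in enumerate(values))
        out.append(scale * total)
    return out


def dct_2d(block):
    """Separable 2D DCT: transform every row, then every column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]  # transpose back to row order


def residual(actual, predicted):
    """Per-pixel difference between the actual block and its prediction."""
    return [[a - p for a, p in zip(ra, rp)] for ra, rp in zip(actual, predicted)]


# Hypothetical 4x4 block whose prediction is uniformly 2 levels too dark.
actual = [[60, 62, 64, 66],
          [61, 63, 65, 67],
          [62, 64, 66, 68],
          [63, 65, 67, 69]]
predicted = [[value - 2 for value in row] for row in actual]

coefficients = dct_2d(residual(actual, predicted))
for row in coefficients:
    print([round(c, 3) for c in row])
```

When the prediction is off by a constant, all of the residual energy lands in the single top-left (DC) coefficient and the other fifteen are zero up to rounding, which is why a good prediction followed by a transform leaves so little to encode.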
As Ronald said, we're talking about technology here that is 50% better than literally everything that everybody else out there is using. And actually, we made a point to say we were using the very best encoder out there at the very best settings, settings which I really think you're not seeing very often in the real world because they're very difficult to use in a real world encoding environment. So I hope that there are a number of people in this audience now who are out there, either with existing products with video or products to which you're looking to add video, or just you're thinking about how you can use these tools to launch a new product and to come out with a start-up. This technology has not been used by anyone right now. YouTube is testing it and we'll talk about that in a little bit, but if you adopt VP9, as you can very quickly, you will have a tremendous advantage over anybody out there with their current offering based on 264 or even VP8. It's currently available in Chrome, and the libvpx library on the WebM project is out there for you to download, compile, and test. It's open source. You will have access to source code. The terms of the open source license are incredibly liberal so that you can take the code, improve it, optimize it, modify it, integrate it with your proprietary technology, and you're not going to have to give back a line of code to the project. You're not going to have to be concerned that you will inadvertently open source your own proprietary code. And then finally, it's royalty free. And obviously, this is something that was of great importance to us as we sought to open source a video technology for use in HTML5 and the video tag. 
We believe that the best is still to come in terms of video products on the web, and that in order to make sure that people are free to innovate and that start-ups are free to launch great new video products, we have to make sure that they're not writing $5 or $6 million checks a year to standards bodies. We're working very hard on putting this technology into your hands as soon as possible. We did a semi freeze of the bit stream just a couple of weeks ago, and at that time, we said that we were taking comments on the bit stream for 45 more days. Specifically, we're looking for comments from a lot of our hardware partners to some of the software techniques that we're using just to make sure that we're not doing anything that's incredibly difficult to implement in hardware. At the end of the 45 day period on June 17, we're going to be bit stream frozen, which means that after June 17, any VP9 encoder that you use is going to be compliant with any VP9 decoder, and that if you're encoding content with an encoder that's out after June 17, it's going to be able to play back in a decoder after the bit stream freeze. Obviously, getting VP9 in Chrome is very important to us. The beta VP9 which you've been seeing today is already in Chrome. If you download the latest development version of Chrome and enable the VP9 experiment, you'll be able to play back VP9 content immediately. As soon as we've frozen the bit stream as of June 17, we're going to roll it into the Dev Channel of Chrome as well with this final version of VP9, and then that's going to work through the beta channel and through the stable channel. And by the end of the summer, we are going to have VP9 in stable version of Chrome rolling out to the hundreds of millions of users. I think [INAUDIBLE] today said that there are 750 million users of Chrome right now. VP9 is going to be deployed on a massive scale by the end of summer. 
In terms of final development activities that we're going to be working on, after the bit stream is finalized in the middle of June, we're going to be focusing on optimizations both for performance and for platform. So what that means is we'll be working on making sure that the encoder is optimized for a production environment. Obviously, something that's very important to YouTube as YouTube moves to supporting VP9, that the decoder is sufficiently fast to play back on many of the PCs that are out there. We're also going to be working on platform optimizations that will be important to Android developers, for instance, and to people who want to support VP9 on embedded devices. These are ARM optimizations and optimizations for other DSPs. We have hardware designs coming out. For those of you who may work with semiconductor companies or are thinking about a technology like this for use in something like an action camera, these are hardware designs that get integrated into a larger design for a semiconductor and allow for a fully accelerated VP9 experience. Real time optimizations are obviously incredibly important for video conferencing, Skype style applications, and also for new applications that are coming out like screencasting and screen sharing. By the end of Q3, we should have real time optimizations which allow for a very good real time performance. Those optimizations should then allow VP9 to be integrated into the WebRTC project, which is a sister project to the WebM project and basically takes the entire real time communication stack and builds it into Chrome, and more broadly into HTML5 capable browsers. And so what this means is that when VP9 is integrated into WebRTC, you will have tools that are open source, free for implementation that used to, even four years ago, require license fees of hundreds of thousands of dollars. 
And you, with a few hundred lines of JavaScript, should be able to build the same sort of rich video conferencing style applications and screencasting applications that you're seeing with products like Hangouts. And finally, in the end of this year moving into Q1 2014, we're going to see, again, hardware designs for the encoder. So just to give you an idea of how usable these technologies are, we have a VP9 demonstration in YouTube. If you download the Development Version of Chrome and flip the VP9 tag, you can play back YouTube VP9 videos. And one thing this should drive home is this was a project that was done over the course of two weeks, that VP9 was built into YouTube. Obviously, we have very capable teams. Obviously we have people on the WebM team and people on the YouTube team who know a lot about these tools, but this demonstration is VP9 in the YouTube operating environment. There's nothing canned here. This is VP9 being encoded and transmitted in the same way that any other video is. So this, I hope, again, will give you guys pause to say, god, we could do this as well. We could come out very quickly with a VP9 based service that will be remarkably better than anything that's out there right now. So I just want to leave you with some thoughts about what I hope that you're thinking about coming away from this presentation. The WebM project is a true community-based open source project, and obviously, these sorts of projects thrive on contributions from the community. We are coming out of a period where we've been very intensively focused on algorithm development. Some of this work is certainly very complicated stuff that not every-- even incredibly seasoned-- software engineer can work on. But we're moving into a point where we're focusing on application development, we're focusing on optimization, we're focusing on bug fixes and patches, and that's the sort of thing that people in this room certainly can do. 
So we encourage you to contribute and we encourage you to advocate for use of these technologies. We build open source technologies, and yet simply because we build them, that doesn't mean that people adopt them. It takes work to get communities to focus on adopting these sorts of open technologies. So advocate within your project in your company, advocate within your company for use of open technologies, and advocate within the web community as a whole. We think that with VP9, we've shown the power of a rapidly developing, open technology, and we hope that people are as excited about this as we are and that you go out and help spread the word about this technology. But most important, we'd like you to use the technology. We're building this with a purpose, and that is for people to go out, take advantage of these dramatic steps forward that we've made with VP9. And so we hope you will go out, that you'll be charged up from this presentation, and that you'll immediately download the Development Version of Chrome and start playing around with this and start seeing what you can do with this tool that we've been building for you. So there are just a couple of other things I'd like to say. There are a couple of other presentations related to this project. There's a presentation on Demystifying Video Encoding, Encoding for WebM VP8-- and this is certainly relevant to VP9-- and then another on the WebRTC project. And again, if you're considering a video conferencing style application, screensharing, remote desktopping, this is something that you should be very interested in. Sorry. I shouldn't be using PowerPoint. So with that, we can open it up to questions. Can we switch to just the Developers Screen, guys? Do I do that? AUDIENCE: Hey there. VP8, VP9 on mobile, do you have any plans releasing for iOS and integrating with my iOS applications-- Native, Objective C, and stuff? Do you have any plans for that? MATT FROST: He's asking if VP8 is in iOS? 
AUDIENCE: VP9 on iOS running on top of Objective-C. RONALD BULTJE: So I think as for Android, it's obvious Android supports VP8 and Android will eventually support VP9 as well. For iOS-- MATT FROST: When I was talking about optimizations, platform optimizations, talking about VP9, that's the sort of work we're focusing on, ARM optimizations that should apply across all of these ARM SoCs that are prevalent in Android devices and iOS devices. There aren't hardware accelerators in iOS platforms right now. Obviously, that's something we'd like to change, but presently, if you're going to try to support VP8 in iOS, you're going to have to do it through software. AUDIENCE: Thank you. RONALD BULTJE: Yep? AUDIENCE: Bruce Lawson from Opera. I've been advocating WebM for a couple of years. One question. I expect your answer is yes. Is it your assumption that the agreement that you came to with MPEG LA about VP8 equally applies to VP9? MATT FROST: It does apply to VP9 in a slightly different way than it does with VP8. The agreement with MPEG LA and the 11 licensors with respect to VP9 covers techniques that are common with VP8. So obviously, we've added back some techniques we were using in earlier versions, we've added in some new techniques, so there are some techniques that aren't subject to the license in VP9. But yes, the core techniques which are used in VP8 are covered by the MPEG LA license, and there will be a VP9 license that will be available for developers and manufacturers to take advantage of. AUDIENCE: Super. Follow up question. About 18 months ago, the Chrome team announced they were going to drop H.264 being bundled in the browser, and that subsequently didn't happen. Can you comment further on whether Chrome will drop H.264 and concentrate only on VP9? MATT FROST: I can't really comment on plans going forward. What I can say is that having built H.264 in, it's very difficult to remove a technology. 
I think when you look at the difference between VP9 and H.264, there's not going to be any competition between the two. So I think with respect to VP9, H.264 is slightly less relevant because there was nothing-- we didn't have our finger on the scale for this presentation. And especially, we were hoping to drive home with that initial demonstration which we put together over the last few hours that we're not looking for the best videos. We're just out there recording stuff. So even if 264 remains in Chrome-- which I think is probably likely-- I don't think it's going to be relevant for a next gen codec because there's just such a difference in quality. AUDIENCE: Thanks for your answers. AUDIENCE: Hi there. I have a question about performance. Besides the obvious difference in royalty and licensing and all that, can you comment on VP9 versus HEVC, and do you hope to achieve the same performance or proof of [INAUDIBLE]? RONALD BULTJE: So the question is in terms of quality, how do VP9 and HEVC compare? AUDIENCE: Yeah, and bit rate performance, yeah. RONALD BULTJE: Right. So testing HEVC is difficult. I'll answer your question in a second. Testing HEVC is difficult because there's currently neither open source software nor commercial software available that can actually encode HEVC unless it's highly developmental in nature or it is the reference model. The problem with the alpha and beta versions that are currently on the market for commercial products is that we're not allowed to use them in comparative settings like we're doing. Their license doesn't allow us to do that. Then the problem with the reference model is it is a really good encoder, it gives good quality, but it is so enormously slow. It can do about 10 frames an hour for a high definition video. That's just not something that we can really use in YouTube. But yes, we've done those tests. In terms of quality, they're currently about equal. 
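To put the "about 10 frames an hour" figure for the HEVC reference model in perspective, here is a rough back-of-the-envelope sketch. The clip length and the 30 fps frame rate are assumptions chosen for illustration; only the frames-per-hour figure comes from the talk.

```python
def encode_hours(duration_s: float, fps: float, frames_per_hour: float) -> float:
    """Wall-clock hours needed to encode a clip at a given encoder throughput."""
    total_frames = duration_s * fps
    return total_frames / frames_per_hour


# Assumed example: a 60-second clip at 30 fps (both values are illustrative),
# encoded at roughly 10 frames per hour as quoted for the reference model.
hours = encode_hours(duration_s=60, fps=30, frames_per_hour=10)
print(f"{hours:.0f} hours (~{hours / 24:.1f} days)")  # 180 hours (~7.5 days)
```

Under these assumptions, a single one-minute clip would take about a week to encode, which is consistent with the remark that this is not usable for a service like YouTube.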
There's some videos where HEVC, the reference model, is actually about 10%, 20% better. There's also a couple of videos where VP9 is about 10%, 20% better. If you take the average over, for example, all of those CC-licensed YouTube clips that we looked at, it's about a 1% difference. I think that 1% is in favor of HEVC if you so wish, but 1% is so small that really, we don't think that plays a role. What does that mean going forward? Well, we're really more interested in commercial software that will be out there that actually encodes HEVC at reasonable speed settings. And like I said, there's currently nothing on the market but we're really interested in such products, so once they are on the market and we can use them, we certainly will. AUDIENCE: Follow-up question about the performance. Is there any reason not to expect this to scale up to 4K video or [INAUDIBLE]? RONALD BULTJE: We think that the current high definition trend is mostly going towards 720p and 1080p. So if you look at YouTube uploads, there is basically no 4K material there, so it's just really hard to find testing materials, and that's why we mostly use 720p and 1080p material. MATT FROST: But certainly when we designed the codec, we designed it with 4K in mind. There aren't any limitations which are going to prevent it from doing 4K. RONALD BULTJE: Right. You can use this all the way up to 16K video if that's what you were asking. MATT FROST: Sir? AUDIENCE: Yeah. Have you been talking to the WebRTC team, and do you know when they're going to integrate VP9 into their current products? MATT FROST: We talk with the WebRTC team regularly. As I said, we've got to finish our real time enhancements in order to actually have a codec that works well in a real time environment before we can expect it to be integrated into WebRTC. But I think we're looking at Q4 2013. AUDIENCE: Great, thanks. MATT FROST: We're in 2013, right? RONALD BULTJE: Yeah. AUDIENCE: Hi. 
I just wanted to talk about the rate of change in video codecs. I think maybe we can see like VP8, VP9, we're talking about an accelerating rate of change. And that's great, and I really wanted to applaud the efforts to getting this out in Chrome Dev quickly, or Chrome Stable quickly. I just wanted to ask about maybe some of your relationships with other software vendors that are going to be relevant, like we're talking Mozilla, IE, iOS was, I think, previously mentioned. As this kind of rate of innovation in codecs increases, how are we going to make sure that we can have as few transcode targets as possible? My company is working on a video product. We don't want to have eight different codecs. And if we can imagine, let's say, that Version 10 comes out relatively soon, sometime down the road. How can we make sure that devices stick with a relatively small subset of compatible decodings? MATT FROST: I guess I'm a little unsure of what you're asking. In terms of how we get support on devices as quickly as possible, or how we solve the transcoding problem? AUDIENCE: And just keeping the number of transcoded formats as small as possible. If IE only supports H.264, I have to have an H.264 encoding. So I was just wondering what kind of relationships you guys are working on to make sure that as many devices and platforms as possible can support something like VP9. MATT FROST: We're certainly working very hard on that, and as I said in the slide on next steps showing the timeline, our focus on having hardware designs out there as quickly as possible is an effort to try to make sure that there's hardware that supports VP9 more rapidly than hardware has ever been out to support a new format. We had a VP9 summit two weeks ago, which was largely attended by semiconductor companies. Actually, some other very encouraging companies were there with great interest in these new technologies. 
But we're working very hard with our hardware partners and with OEMs to make sure that this is supported as quickly as possible. I think internally, what we're looking at is probably relying on VP8 to the extent that we need hardware now and we don't have it in VP9. So I think what we've talked about is always falling back to an earlier version of an open technology that has very broad hardware support. But we're trying to think very creatively about things like transcoding and things that we can do to ensure backwards compatibility or enhancement layers. So part of the focus of this open development cycle and process that we have is to really try to think in very new ways about how we support new technologies while maintaining the benefits of hardware support or device support for older technologies. AUDIENCE: Excellent. Thank you. AUDIENCE: So a key point in any solution is going to be performance. Hardware acceleration really solves that, and that was one of the challenges with the adoption of VP8 in timing versus H.264, which has broad spectrum hardware acceleration. I understand the timing, the delays, and the efforts you guys are doing to really achieve that hardware accelerated support for VP9. But until then, what's the software performance in comparison to H.264, for either software versus software, or software versus hardware? RONALD BULTJE: So we've only done software-versus-software comparisons for that. Let me start with VP8 versus 264. Currently, VP8 decoding is about twice as fast as 264 decoding using fully optimized decoders. VP9 decoding is currently about twice as slow as VP8 decoding, and that basically means that it's exactly at the same speed as H.264 decoding. That's not what we're targeting as a final product. We haven't finished fully optimizing the decoder. Eventually, what we hope to get is about a 40% slowdown from VP8 decoding, and that will put it well ahead of the fastest 264 decoders that are out there in software. AUDIENCE: Great. Thank you. AUDIENCE: Hello. 
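The decoder speed figures in the answer above can be laid out as simple relative arithmetic. This is a sketch only: H.264 software decoding time is normalized to 1.0, and "a 40% slowdown from VP8" is interpreted here as taking 1.4 times as long as VP8, which is an assumption about what was meant.

```python
def relative_decode_times(h264_time: float = 1.0) -> dict:
    """Relative per-frame decode times, normalized to H.264 software decoding."""
    vp8 = h264_time / 2.0        # VP8 decodes about twice as fast as H.264
    vp9_current = vp8 * 2.0      # VP9 is currently about twice as slow as VP8
    vp9_target = vp8 * 1.4       # assumed reading of "about a 40% slowdown from VP8"
    return {
        "h264": h264_time,
        "vp8": vp8,
        "vp9_current": vp9_current,
        "vp9_target": vp9_target,
    }


times = relative_decode_times()
for name, t in times.items():
    print(f"{name:12s} {t:.2f}x the H.264 decode time")
```

On this reading, the target VP9 decoder would need about 0.7 times the decode time of H.264, which matches the claim that it would end up ahead of the fastest software 264 decoders.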
I was just wanting to get some background on the comparison between H.264 and VP9. For H.264, what were you using-- CBR, VBR, and what QP values? RONALD BULTJE: This is two-pass encoding at the target bit rate. So it's preset very slow. Since we're doing visual comparison, there is no tune set. It's passes one and two, and then just a target bit rate. We tend to choose target bit rates that are somewhere between 100 and 1,000 kilobits a second, and then we just pick the same point for the VP9 one as well to start with. AUDIENCE: So in both of the comparisons, you were trying to be very generic so you weren't tuning the encoder in any way to make it a better quality at that bit rate. You were just giving it two passes to try to figure it out. RONALD BULTJE: So you mean visual quality, or-- AUDIENCE: Yes. RONALD BULTJE: So we haven't tuned either one of them for any specific setting. For 264, the default is that it optimizes for visual experience, and so that's why we [INAUDIBLE]. So it's not optimized for SSIM or PSNR in the visual display that we did here. VP9 encoding does not have any such tunes, so we're not setting any tune, of course. AUDIENCE: So you just used the default settings of [INAUDIBLE]? RONALD BULTJE: We're using the default settings, and we've actually discussed this extensively with the 264 developers. They agree. They support this kind of testing methodology, and as far as I'm aware, they agree with it. They fully expect the kind of results that we're getting here. AUDIENCE: Right. OK, thanks. AUDIENCE: Hi. One more question about performance. I think you mentioned a little bit about the real time. So do you think in the future, you can manage to bring an application like a desktop application into the web? I mean like putting three, four windows in the same browser, high definition, things like that? RONALD BULTJE: In terms of decoding or encoding? AUDIENCE: Both. RONALD BULTJE: So for encoding, yes. 
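The x264 settings described earlier in this exchange (two-pass, the very slow preset, no tune, a target bit rate) might look roughly like the following when expressed as x264 command lines. This is a sketch, not the presenters' actual invocation: the file names and the stats file path are hypothetical, and the helper only builds and prints the commands rather than running them.

```python
import os
import shlex
from typing import List


def x264_two_pass_commands(source: str, output: str, bitrate_kbps: int,
                           stats: str = "x264_2pass.log") -> List[List[str]]:
    """Build x264 CLI argument lists for a two-pass encode at a target bit rate."""
    common = [
        "x264",
        "--preset", "veryslow",          # slowest standard preset
        "--bitrate", str(bitrate_kbps),  # target average bit rate in kbit/s
        "--stats", stats,                # first-pass statistics file
    ]
    # No --tune option is passed, matching "there is no tune set" in the talk.
    first = common + ["--pass", "1", "-o", os.devnull, source]
    second = common + ["--pass", "2", "-o", output, source]
    return [first, second]


# Hypothetical input and output names; 500 kbit/s is one of the rates mentioned.
for cmd in x264_two_pass_commands("clip.y4m", "clip_500k.264", 500):
    print(shlex.join(cmd))
```

The first pass discards its output and only writes statistics; the second pass reads them to distribute bits across the clip so the average lands on the target rate.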
So there will be real time settings for this codec eventually. For no codec will that get you exactly the types of bit rate quality ratios that you're seeing here. These are really using very slow settings, and that is by far not real time. But if you set the VP9 codec to real time settings, then yes, eventually it will encode in real time. It will be able to do four full desktops all at once, and it will be able to decode all of those also. You'll probably need a multicore machine for this, obviously, but it will be able to do it, yes. AUDIENCE: And you're using the graphics card and other things like that. You didn't mention about the hardware, OpenGL or-- RONALD BULTJE: It's pure software. There's no hardware involved. AUDIENCE: No using the hardware, the card hardware. RONALD BULTJE: We're not using GPU or anything like that at this point. AUDIENCE: Thank you. AUDIENCE: Hi. I just want to know, how does VP9, now or later, compare to VP8 and H.264 when we're talking about single-pass CBR, low bit rate, real time encoding? Little background is we are part of the screen sharing utility that currently uses VP8, and we've been successfully using it for a year, but the biggest gripe with VP8 is that it doesn't respect bit rate, especially on low bit rates, unless you enable frame dropping, which is unacceptable. So we have to do a bunch of hacks to actually produce quality and it doesn't behave like H.264 would in that situation. So how will VP9 address that problem, or is that even on the roadmap? RONALD BULTJE: So in general, desktop sharing and applications like this, also real time communications, yes, they're on the roadmap, and yes, they will all be supported. In terms of your specific problem, I guess the best thing to do is why don't you come and see us afterwards in the Chrome [INAUDIBLE], and we can actually look at that. AUDIENCE: OK, awesome. RONALD BULTJE: As for VP9, VP9 currently does not have a one pass mode. 
We've removed that to just speed up development, but it will eventually be re-added, and it will be as fast as the VP8 one but with a 50% reduction in bit rate. AUDIENCE: Do you have a timeline for that? Is it going to be this year, or next year? RONALD BULTJE: Like Matt said, that will happen-- MATT FROST: Late Q3. RONALD BULTJE: Q3 2013, around then. We're currently focusing on YouTube, and those kind of things will come after that. AUDIENCE: Awesome. Thank you. AUDIENCE: I have two questions, unrelated questions to that. What is the latency performance of VP8 compared to VP9 in terms of decoding and encoding? And the second question is, how does VP9 compare to H.265? RONALD BULTJE: So I think H.265, I addressed earlier. So do you want me to go into that further, or was that OK? AUDIENCE: More in terms of the real time performance. RONALD BULTJE: So in terms of real time performance, I think for both, that's really, really hard to say because there is no real time HEVC encoder and there is no real time VP9 encoder. So I can sort of guess, but this is something that the future will have to tell us. We will put a lot of effort into writing real time encoders or adapting our encoder to be real time capable because that is very important for us. MATT FROST: But in terms of raw latency, it should be faster than VP8. You can decode the first frame, right? RONALD BULTJE: I think it will be the same as VP8. So VP8 allows one frame in, one frame out, and VP9 will allow exactly that same frame control model. AUDIENCE: So you mentioned that you've asked hardware manufacturers for any concerns or comments. Have you gotten any yet? MATT FROST: Sorry. Are they considering supporting it? AUDIENCE: Well, in terms of the algorithms and how you would actually-- MATT FROST: They're working on it quickly. AUDIENCE: But there's no concerns or comments or anything yet? MATT FROST: No concerns. AUDIENCE: You said you opened up for comments. MATT FROST: No. We have received comments. 
We have a hardware team internally that took a first pass at comments. We've received a couple of comments additionally just saying, here's some stuff you're doing in software that doesn't implement well in hardware. I don't foresee a lot of additional comments from the hardware manufacturers. The other work that we're doing over the next 45 days is we had a bunch of experiments that we had to close out, and so we're doing some closing out as well and just finishing the code. Absent an act of God, this is bit stream final on June 17. RONALD BULTJE: So we have actually received comments from some hardware manufacturers, and we are actively addressing the ones that we're getting. AUDIENCE: OK, thanks. AUDIENCE: Hi. I might have missed this, but when did you say the ARM optimizations for VP9 are going to come out? MATT FROST: Actually starting now really, we're focusing on doing some optimizations by ourselves and with partners. So I would say that's going to be coming out second half of the year, and it'll probably be sort of incremental where you may get an initial pass of ARM optimizations and then some final optimization. It's obviously very important for us for Android to be able to get VP9 working as well as possible, and obviously, ARM is incredibly important for the Android ecosystem, so that's an area of significant focus. AUDIENCE: And in terms of real time encoding, so in order to blend into WebRTC, you're going to have to get that working. So is this going to coincide with the assimilation of VP9 into WebRTC? MATT FROST: It'll be real time optimizations, which I think we were sort of thinking about end of Q3, beginning of Q4, and then integration into WebRTC will follow on that. Obviously, the one thing I'd say, it's an open source project. If you guys think that you see an opportunity, you can go out and do the optimizations yourselves. There are contractors who can do it. 
So I encourage you guys to think about that, that you can take the code and you can start working on some of this stuff yourselves. Obviously, we'd love it if you'd contribute it back but we're not going to force you to. Yeah, I guess last question. AUDIENCE: This is a question about how VP9 relates to what the Android team talked about with Google proxy and the SPDY proxy. You alluded to transcoding real time for backwards compatible device support. Do you see Google doing the same thing they're going to do with images in this proxy and doing video transcoding to adapt this and use this for compression mode in the Google proxy? RONALD BULTJE: That's a really interesting application, and that's something that we'll have to look into in the future. It's not as easy as it sounds because video transcoding actually takes some time. So that would mean that you would actually have to wait a minute while the video is transcoding until you can visit that website, and that might not be quite what you're looking for. But it's an interesting application and we might look into that in the future. MATT FROST: I think that's it. I think we're out of time. Sorry, but we're happy to talk to you afterwards. [APPLAUSE]
Google I/O 2013 - WebM and the New VP9 Open Video Codec. Posted by 田立瑋 on 2013/09/19.