Meet the Packets: How audio travels into your browser
Sara Fecadu

KATIE: Hello. Welcome back. So, I keep forgetting to do this and I apologize, but the big announcement right now is that the swag is ready. But do not go get swag now, because we're about to have a really awesome talk by Sara Fecadu. I asked Sara for a fun fact, and her fun fact was that she bakes a mean cookie, which unfortunately we can't all indulge in. So, as a follow-up question, I asked what prompted her to write this talk about an audio API. And she said, well, I had spent a year building a checkout form and I just couldn't stand to look at it or think about it anymore and I had to do something different. Which I think is something that literally all of us can probably identify really strongly with. So, anyway, Sara is gonna come up and talk to us about the audio API. So, give it up for Sara. [ Applause ]

SARA: Hello. See if I can get my computer started here. Okay. Welcome to my talk, Meet the Packets. In case not everyone has realized, it's a play on Meet the Parents. I spent a lot of time working on that. [ Laughter ] Let's see here. One second. Gonna progress? No. Okay. We're gonna do it without the clicker. So, this will be interesting. As Katie said, my name... oh. My whole slide deck isn't progressing. Okay. One second. There we go. Okay.

Thank you for coming to my talk. As Katie said, my name is Sara Fecadu. I am from Seattle, Washington, and I don't have a ton of hobbies besides making cookies and listening to a lot of podcasts. By day I'm a software developer at Nordstrom. Nordstrom is a clothing retailer founded in 1901, and while people don't usually associate 100-year-old companies with tech, we have a thriving tech org working on innovative ways to get you what you need and feel your best. A year ago I was hired on to do a rewrite of Nordstrom.com's checkout. And as of last May, we have been taking 100% of customer orders.

Now, why am I talking about audio streaming? Katie may have taken my joke here, but the answer is: form fields. Our checkout UI has 22 form fields, and they come in different groupings for different reasons. Many of my waking moments over the past year have been spent thinking about these form fields, and I just wanted to do anything else. So, I was sitting on my couch one night, reading a book on packet analysis, like one does, and watching a YouTube video. And I thought to myself, how does that work? Like, on the packet level, how does audio and video streaming work?

To answer the larger question, I started small with: what is audio streaming? Audio streaming is the act of sending audio files over the network, and this talk will be about on-demand audio streaming. The major difference between on-demand streaming and live streaming is that with on-demand streaming we need all of the packets to get across the wire, whereas with live streaming, you may be more interested in keeping up with the event, and a certain amount of packet loss is acceptable. Over the past few months, I learned that audio streaming, even when limited to on demand, is as wide a subject as it is deep. I have picked three topics that exemplify what audio streaming is, why it's hard, and how to get started yourself. We will talk about audio streaming protocols, TCP congestion control, and client players. Audio streaming protocols give us a standard way to encode, segment, and ship your audio to the client.
TCP congestion control handles congestion on the TCP layer of the stack. It is relevant to on-demand audio streaming because we're shipping larger audio files, and we need every single packet to make its way to the client to play audio. A client player is any network-connected device with a play and pause button. So, this could be your phone, your TV, your laptop, et cetera. Client players not only allow us to play our audio, but when paired with modern audio streaming protocols, they hold a lot of decision-making power.

Audio streaming protocols are the heart of audio streaming, and today we'll talk about adaptive bitrate streaming and its benefits, and how to convert your own audio files to work with two popular audio streaming protocols. Before we get started, I wanted to go over some terms that will come up. A codec encodes data and uses compression techniques to get the highest quality for the smallest footprint. Encoding and transcoding convert audio from one type to another: encoding converts from analog to digital, while transcoding converts from one digital format to another. Bitrate is how many bits it takes to encode a second of audio, and this number usually refers to the quality of the audio file.

When I think of playing music on the Internet, I think of an HTML5 audio tag with a source attribute set to the path of my audio file. And this is a perfectly reasonable way to do it. You can request and receive a single file containing an entire song. This would be referred to as progressive streaming, and the major benefit here is you only have one file to deal with. But let's say, for instance, you have a user and they have a slow network connection and they can't download your one file. They're stuck. Adaptive bitrate streaming aims to solve this problem by encoding your audio in multiple bitrates and allowing the client player to decide which quality is best for the user to listen to your audio uninterrupted. This allows more users to access your audio, but it does add a layer of operational complexity, because now you've got a lot more moving parts.

The audio streaming protocols we'll talk about not only leverage adaptive bitrate streaming, but also use HTTP web servers. They do this by encoding the files, segmenting them, and placing them on a web server; then, once requested, the partial audio files are sent to the client one at a time. Here is the secret to our modern audio streaming protocols: it's more a series of downloads than it really is a stream. But we'll refer to it as streaming anyway.

The two most popular audio streaming protocols today are HTTP Live Streaming, or HLS, and Dynamic Adaptive Streaming over HTTP, or MPEG-DASH. HLS was created by Apple to support streaming to mobile devices, and it is the default on all macOS and Apple devices. MPEG-DASH was a direct alternative to HLS; it was created by the MPEG forum, who want to make MPEG-DASH the international standard for streaming. Let's look at them side by side. HLS takes MP3, AAC, AC-3, or EC-3 audio and encodes it into segmented transport stream files. Those segmented files are listed in a playlist. If you have multiple bitrate streams, each stream will be in a media playlist, and all of your media playlists will be listed in a master playlist.
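Since the slides aren't visible in a transcript, here is a minimal sketch of what that playlist hierarchy might look like for a two-bitrate stream. The file names, bandwidths, and segment durations are all invented for illustration:

```
#EXTM3U
# master.m3u8 - the master playlist, with one entry per bitrate stream
#EXT-X-STREAM-INF:BANDWIDTH=64000,CODECS="mp4a.40.2"
audio_64k.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=160000,CODECS="mp4a.40.2"
audio_160k.m3u8
```

```
#EXTM3U
# audio_64k.m3u8 - one media playlist, linking to its segmented .ts files
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:10.0,
audio_64k_00000.ts
#EXTINF:10.0,
audio_64k_00001.ts
#EXT-X-ENDLIST
```

The client downloads the master playlist first, picks a media playlist based on the throughput it's seeing, and then requests that stream's segments one at a time.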
With MPEG-DASH, the codec is agnostic; in theory you can convert any audio format into MPEG-DASH. The audio will be segmented into a fragmented MP4 file, and that will be described in an XML manifest file called a media presentation description.

Okay. We've talked about what files will be used and what they'll be segmented into, but how do you get them there? You've got this audio file; what tools allow you to convert it? Well, you've got options, but most of these options are paid options. Except for FFmpeg, which is an open source command line tool that, among other things, allows you to convert audio files to be used with HLS or MPEG-DASH. However, I found the learning curve for FFmpeg to be pretty steep, and a lot of the documentation for HLS and MPEG-DASH was for video streams. Instead I used Amazon Elastic Transcoder. It's an AWS offering that converts files of one type to another; in our case, we're taking an audio file and converting it to be used with HLS and MPEG-DASH. It's pretty much plug and play. You tell Amazon Elastic Transcoder what type of files you have and what type of files you want, and it outputs the stream for you. But even though it's easy to use, it's not a free service. So, if you were going to be converting a lot of files, it may be worth your time to learn more about an open source alternative like FFmpeg.

My workflow when working with Amazon Elastic Transcoder was to upload my audio file to an S3 bucket, AWS's object store. I told Amazon Elastic Transcoder where my audio file was and what settings I needed it to convert my audio files to, and Amazon Elastic Transcoder output my streams into that same S3 bucket. Then I downloaded them for us to explore.

This is the basic set of files you would get with an HLS stream. It kind of looks like a lot, but we're going to break it down into four groups. In the top left, the master playlist. In our case, we have two bitrate streams represented, and they will be linked out from the master playlist. Then in the top right you'll see those media playlists, which represent each bitrate stream. Those contain all of the links to our transport stream files, which are the fragmented audio files represented in both the bottom left and the bottom right. On the bottom right we have our 64K bitrate stream's segmented audio files. And in the bottom, oh, did I get that backwards? I'm not really good at right and left. But in the bottom section you'll have your fragmented audio files. We'll take a closer look at those so you can see what's really in them.

This is the entirety of the HLS master playlist. It contains information about the specific bitrate streams and links out to those media playlists that represent the streams themselves. Let's look at the 64K bitrate stream's media playlist. It has even more information about the stream, including caching information, the target duration of each segmented audio file, and, most importantly, links out to our transport streams. This is what one of those fragmented audio files looks like. And there's something a little interesting going on here. If you'll notice, it's color coded, and I kept trying to figure out why. But then I realized: a transport stream has the file extension .ts, and something else has the file extension .ts: TypeScript. My editor was highlighting it as TypeScript code. Ignore the colors; it's just a binary encoded file.

Now, our MPEG-DASH audio stream has fewer files and looks more manageable, but it's similar. We have our media presentation description, which is an XML manifest file that contains all of our information about the stream. Then below we have our two segmented audio files. All of the segments are encapsulated in a single file, but within that file there are segments. That's why there are fewer files in the MPEG-DASH audio stream than in the HLS stream.
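Again, since the slide itself isn't visible here, this is an illustrative, trimmed-down media presentation description; the file names, duration, and codec strings are made up:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- media presentation description (MPD): the XML manifest for the stream -->
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static"
     mediaPresentationDuration="PT3M20S"
     profiles="urn:mpeg:dash:profile:isoff-on-demand:2011">
  <Period>
    <!-- All of the bitrate streams are enclosed in an adaptation set... -->
    <AdaptationSet mimeType="audio/mp4" segmentAlignment="true">
      <!-- ...and each bitrate stream gets its own representation tag,
           with the URL to its fragmented MP4 file inside. -->
      <Representation id="audio-64k" bandwidth="64000" codecs="mp4a.40.2">
        <BaseURL>audio_64k.mp4</BaseURL>
      </Representation>
      <Representation id="audio-160k" bandwidth="160000" codecs="mp4a.40.2">
        <BaseURL>audio_160k.mp4</BaseURL>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>
```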
Let's look at the media presentation description. We see a lot of stuff here, but there are three important elements. Each bitrate stream is represented by a representation tag. All of the bitrate streams are enclosed in an adaptation set. And within the representation tag, we have the URL to our audio files. Taking a look at one of those audio files, we'll see it looks fairly similar to the segmented audio file we saw with HLS, minus the color coding, because it's a .mp4 rather than a .ts file; Visual Studio Code is not confused in this case.

Earlier we talked about progressive streaming, which is streaming an entire audio file in one go. We used an audio element and a source attribute with the path of our audio file. With MPEG-DASH and HLS, it's very similar, but instead of having the path to our audio file, we have the path to the master playlist for HLS, or the media presentation description for MPEG-DASH.

We're going to take a hard left here and talk about the second topic in my talk, which is TCP congestion control. TCP is a transport layer protocol, and it has mechanisms in both its sender and receiver, defined by the operating systems of each, to react to and hopefully avoid congestion when sending packets over the wire. Those mechanisms are called TCP congestion control. Today we'll talk about packet loss based congestion control and why it isn't so great; more specifically, the congestion window and duplicate acknowledgments in packet loss based congestion control. Before we get started, some more terms. Bandwidth is the rate at which data can be sent, and throughput is the rate at which data can be received. The congestion window is a TCP variable that defines the amount of data that can be sent before an acknowledgment is received by the sender.

Let's say you have a user who has requested your audio file from the server. Your audio packets travel down the network stack, across the physical layer, up through the data link layer and the network layer, and arrive at the transport layer. And unfortunately, there's congestion right before we reach our destination. Now, traffic congestion and network congestion have very similar beginnings: either too many cars or too many packets have entered the roadway and there's nowhere for them to go. With traffic, you have to wait it out. Luckily for us, TCP congestion control allows packets to keep flowing over the wire, even during congestion.

Before we get to the specifics of these TCP congestion control algorithms, let's talk about the TCP happy path. We're going to start with a single packet sent from the sender to the receiver, flowing through the receiver's buffer, being processed by the receiver, and having an acknowledgment packet sent back to the sender. We talked about the congestion window, the amount of data that can be sent before the sender receives an acknowledgment. Another way of thinking about the congestion window is as a sending rate. As the sender receives acknowledgments, the congestion window grows. And as the receiver's buffers fill and it drops all excess packets, the sender responds by shrinking the congestion window. A second way of thinking about the congestion window is as a bucket. As packet loss occurs, the bucket shrinks; and as acknowledgments are received by the sender, the bucket grows.

There's a slight oversight in the bucket explanation, in that the receiver has no way of telling the sender that it is dropping packets due to congestion. But one option the receiver does have is to send a duplicate acknowledgment. A duplicate acknowledgment is sent when the receiver gets out-of-order packets. Say the sender sends packets one, two, and three. For the purposes of our example, the receiver is not going to process them right away, so that when the sender sends packet four, the buffer is full and it has nowhere to go. Packet four is dropped due to congestion. The receiver then moves on to process packet one and sends an acknowledgment, then an acknowledgment for packet two, and one for packet three. However, when it looks at packet five, it says, I can't process you, because this would be an out-of-order packet. It drops packet five and sends back another acknowledgment for packet three. That duplicate acknowledgment tips the sender off that it needs to send packets four and five again.
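To make that walkthrough concrete, here's a toy JavaScript model of the receiver side. It's a sketch, not a real TCP stack; the packet numbers match the example above:

```javascript
// The receiver acknowledges the last in-order packet it processed. An
// out-of-order arrival is dropped and re-acknowledged, producing a duplicate.
function makeReceiver() {
  let lastInOrder = 0; // sequence number of the last in-order packet processed
  return {
    receive(seq) {
      if (seq === lastInOrder + 1) {
        lastInOrder = seq;
        return { ack: lastInOrder, duplicate: false };
      }
      // Out of order: drop the packet and re-acknowledge the last good one.
      return { ack: lastInOrder, duplicate: true };
    },
  };
}

// Packet four is dropped by the congested buffer, so packet five arrives
// out of order and triggers a duplicate acknowledgment for packet three.
const receiver = makeReceiver();
[1, 2, 3, 5].forEach((seq) => {
  console.log(`packet ${seq} ->`, receiver.receive(seq));
});
// packet 5 -> { ack: 3, duplicate: true }, which tips the sender off to
// resend packets four and five.
```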
So, a more truthful version of the bucket metaphor would be that the congestion window shrinks as duplicate acknowledgments are received by the sender, and the bucket grows as new acknowledgments are received by the sender.

The first TCP congestion control algorithms were written in the 1980s, and the most recent were written a couple of years ago. We will talk about TCP Reno and BBR. TCP Reno is the classic, and BBR was created by Google engineers a few years ago to address issues that they saw when using packet loss based algorithms. TCP Reno starts with a congestion window that's set at some value, increasing by some rate. As the sender receives acknowledgments, the congestion window grows by one. And when packets are dropped, it is divided by some rate; I have chosen two here, so it's divided by two. There's a small sketch of this behavior at the end of this section. The main issue with TCP Reno is that it assumes that small amounts of packet loss are congestion. And in a world where the sender doesn't know the state of the receiver's buffer, and the receiver is unable to tell the sender that it has room left to process packets, you have an Internet moving at a fraction of its capacity.

In 2016, BBR was created to help you get the most out of your Internet connection. It looks for the place where the sending rate is equal to bandwidth. In theory, you should be able to send packets to the receiver and have them move on to the application without any queuing. Some companies have reported positive outcomes when using BBR in their production systems. Firstly, it only has to be implemented on the sender's side, and it is in Linux operating systems with kernel 4.9 or higher. They found BBR increased bandwidth for their low-bandwidth users by 10 to 15%, and bandwidth for their median group by 5 to 7%. Additionally, users in Latin America and Asia saw further increases.

But is it a fair algorithm? Fairness, or using only your fair share of bandwidth, is the goal of every TCP congestion control algorithm. In experiments at Google and Spotify, they found that BBR was able to coexist with congestion control algorithms like TCP Reno or CUBIC. However, some researchers found that BBR's initial start algorithm pushed CUBIC senders back to where they couldn't reestablish their fair share of bandwidth. And this is an issue currently being looked at both inside and outside of Google.
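Here's the sketch promised above: a few lines of JavaScript that mimic the Reno behavior just described. The starting window and the event sequence are arbitrary, and real Reno has more machinery (slow start, timeouts) than this toy shows:

```javascript
// Grow the congestion window by one per acknowledgment; halve it on loss.
let cwnd = 1; // congestion window, measured in packets

function onAck() {
  cwnd += 1; // additive increase
}

function onLoss() {
  cwnd = Math.max(1, Math.floor(cwnd / 2)); // multiplicative decrease
}

// A few round trips with one loss event in the middle.
[onAck, onAck, onAck, onLoss, onAck].forEach((event) => {
  event();
  console.log(`congestion window: ${cwnd} packets`);
});
// Prints 2, 3, 4, 2, 3: the sawtooth Reno is known for, and the reason a
// single dropped packet halves your sending rate whether or not the
// network is actually congested.
```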
We've reached the final section of this talk. So far we've talked about how audio files are processed to be streamed, and issues that may occur as they travel to our devices. We'll wrap up by talking about the role of the client player and how to create your own audio streams. Now, I'm a pretty big fan of Spotify and I use it regularly. But have you ever looked at what's being sent back from the web server to create those audio streams? This should look pretty familiar to what we were looking at with our segmented audio files with HLS and MPEG-DASH.

But when I first saw these, I did not have this context. And I kept thinking, do I need to write some client-side JavaScript to get this to play on the Internet? Is there an NPM package I can use? Or is there something simple and obvious going on here that's going right over my head? Luckily for me, and hopefully everyone who writes JavaScript for the web, there is. Because HLS and MPEG-DASH handed over a lot of responsibility to the clients that process their streams. This not only includes picking the correct quality of audio to play, but it also includes allowing elements like the audio element to process segmented audio files without any modification. Most browsers do this by leveraging the Media Source Extensions API and the Encrypted Media Extensions API. Additionally, libraries like hls.js and dash.js are available to fill in where cross-browser support is low. As a side note, if you need to support iOS Safari, you need HLS. But with most other browsers, you have options.

So, it would have been really fun to reverse engineer Spotify's audio player, but I got tired of reading their minified code. So, I decided to make my own audio player. I started with a cassette that I found in a box of cassettes, and I chose it because it has the words "Map squad" written on it. I used my iPhone's voice memo application to record the audio, so the quality is so-so at best. But it works, and you can try it right now. But maybe wait until the end of the talk, because I want to show you how it's made.

The entire application is a single index.html file with an audio element in the body. When it's loaded into the browser, the immediately invoked function runs the init function. At the top, we define the audio variable, which is equal to our audio element. Next we see if the Media Source Extensions API is supported in our browser. If it is, we will assume we can use dash.js to enable MPEG-DASH, as we can in most browsers; we pass our audio element to the dash.js MediaPlayer, and when the player is initialized, our audio will be loaded with it. If the Media Source Extensions API is not available, we're going to assume we're on iOS Safari and we need to have an HLS stream. We do this by setting the source attribute of our audio element to the path to our master playlist. And this one file is all you need to stream audio to most browsers in 2019.
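The repo has the real file, but as a sketch, a single index.html like the one described might look like this. The manifest paths and the dash.js script URL are placeholders; point them at your own transcoder output and your own copy of the library:

```html
<!DOCTYPE html>
<html>
  <body>
    <audio id="audio" controls></audio>
    <!-- dash.js; this CDN path is a placeholder for wherever you host it -->
    <script src="https://cdn.dashjs.org/latest/dash.all.min.js"></script>
    <script>
      (function init() {
        var audio = document.getElementById('audio');
        if (window.MediaSource) {
          // Media Source Extensions are available, so let dash.js drive
          // the MPEG-DASH stream and pick the bitrate for us.
          var player = dashjs.MediaPlayer().create();
          player.initialize(audio, '/streams/playlist.mpd', false);
        } else {
          // No MSE (we'll assume iOS Safari): fall back to native HLS by
          // pointing the audio element straight at the master playlist.
          audio.src = '/streams/master.m3u8';
        }
      })();
    </script>
  </body>
</html>
```

Note that the branch is a feature check on Media Source Extensions rather than browser sniffing: MSE support is the thing that actually determines whether dash.js can work.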
If you want to try it out in the browser for yourself, or you want to create your own audio streams, please feel free to fork the repo. Thank you. [ Applause ]

KATIE: I'm sorry. I think that scared me more than it scared you. Thank you so much, Sara. Can you believe that is the first talk she has ever given at a conference? Yes. Amazing. All right. So, we have about a 15 minute break right now. So, go out and pick up your swag bags, and we'll see you back here at 3:00. Patricia Ruiz Realini is talking about the importance of your local library, which is pretty cool because I hang out at the library. We'll see you back here at 3:00. No, wait. 3:00, yeah.