
  • Meet the Packets: How audio travels into your browser

  • Sara Fecadu

  • KATIE: Hello. Welcome back. So, I keep forgetting

  • to do this and I apologize. But the big announcement right now is that the swag is ready. But do

  • not go get swag now because we're about to have a really awesome talk by Sara Fecadu.

  • I asked Sara for a fun fact, and her fun fact was that she bakes a mean cookie, which

  • unfortunately we can't all indulge in. So, as a follow-up question, I asked what prompted

  • her to write this talk about an audio API. And she said, well, I had spent a year building

  • a checkout form and I just couldn't stand to look at it or think about it anymore, and

  • I had to do something different. Which I think is something that literally all of us

  • can probably identify really strongly with. So, anyways, Sara is gonna come up and talk

  • to us about the audio API. So, give it up for Sara.

  • [ Applause ] SARA: Hello. See if I can get my computer

  • started here. Okay. Welcome to my talk, Meet the Packets. In case not everyone has realized,

  • it's a play on Meet the Parents. I spent a lot of time working on that.

  • [ Laughter ] Let's see here. One second. Gonna progress?

  • No. Okay. We're gonna do it without the clicker. So, this will be interesting. As Katie said,

  • my name... oh. My whole slide deck isn't progressing. Okay. One second. There we go. Okay. Thank

  • you for coming to my talk. As Katie said, my name is Sara Fecadu. I am from Seattle, Washington.

  • And I don't have a ton of hobbies besides making cookies and listening to a lot of podcasts.

  • And by day I'm a software developer at Nordstrom. And Nordstrom is a clothing retailer founded

  • in 1901. While people don't usually associate 100 year old companies with tech, we have

  • a thriving tech org working on innovative ways to get you what you need and help you feel your

  • best. And a year ago, I was hired on to do a rewrite of Nordstrom.com's checkout. And as

  • of last May, we have been taking 100% of customer orders. Now, why am I talking about audio

  • streaming? Katie may have taken my joke here, but the answer is: Form fields. Our checkout

  • UI has 22 form fields. And they come in different groupings for different reasons. But many

  • of my waking moments over the past year have been spent thinking about these form fields.

  • And I just wanted to do anything else. So, I was sitting on my couch one night reading

  • a book on packet analysis, like one does, and watching a YouTube video. And I thought

  • to myself, how does that work? Like, on the packet level, how does audio video streaming

  • work? So, to answer the larger question, I started small with: What is audio streaming?

  • And audio streaming is the act of sending audio files over the network. And this talk

  • will be about on demand audio streaming. Now, the major difference between on demand streaming

  • and live streaming is that with on demand streaming, we need all of the packets to get across the

  • wire. Whereas with live streaming, you may be more interested in keeping up with

  • the event and a certain amount of packet loss is acceptable. Over the past few months, I

  • learned that audio streaming, even when limited to on demand, is as wide a subject as it is

  • deep. I have picked three topics that exemplify what audio streaming is, why it's hard, and

  • how to get started yourself. We will talk about audio streaming protocols, TCP congestion

  • control and client players. Audio streaming protocols give us a standard for how to encode, segment,

  • and ship your audio to the client. TCP congestion control handles congestion on the transport layer

  • of the stack. And it is relevant with on demand audio streaming because we're shipping larger

  • audio files and we need every single packet to make its way to the client to play audio.

  • A client player is any network connected device with a play and pause button. So, this could

  • be your phone, your TV, your laptop, et cetera. And client players not only allow us to play

  • our audio, but when paired with modern audio streaming protocols, they hold a lot of decision

  • making power. Well, audio streaming protocols are the heart of audio streaming. And today

  • we'll talk about adaptive bitrate streaming and its benefits, and how to convert your

  • own audio files to work with two popular audio streaming protocols. Before we get started,

  • I wanted to go over some terms that will come up. A codec encodes data and uses compression

  • techniques to get the highest quality for the smallest footprint. Encoding and transcoding

  • both convert audio from one type to another: transcoding converts from one digital format to

  • another, while encoding converts analog audio into a digital file. Bitrate is how many bits

  • it takes to encode a second of audio, and this number usually refers to the quality of the audio file.
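
As a rough back-of-the-envelope sketch (not from the talk), here is what a 64 kbps bitrate means for a three-minute song:

    // Approximate size of a 3-minute song at a 64 kbps bitrate
    const bitsPerSecond = 64000;                   // the 64K stream used later in the demo
    const seconds = 3 * 60;                        // a 3-minute song
    const megabytes = (bitsPerSecond * seconds) / 8 / 1e6;
    console.log(megabytes.toFixed(2), "MB");       // "1.44 MB"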

  • When I think of playing music on the Internet, I think of an HTML5 audio tag with

  • a source attribute set to the path of my audio file. And this is a perfectly reasonable way

  • to do it. You can request and receive a single file containing an entire song. This would

  • be referred to as progressive streaming, and the major benefit here is you only have one file to deal with.
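
A minimal sketch of that progressive setup (the file path is hypothetical):

    <!-- Progressive streaming: one request, one whole file -->
    <audio controls src="/audio/my-song.mp3"></audio>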

  • But let's say, for instance, you have a user with a slow network connection and they can't download your one file. They're stuck. So, adaptive bitrate

  • streaming aims to solve this problem by encoding your audio in multiple bitrates and allowing

  • the client player to decide which quality is best for the user to listen to your audio

  • uninterrupted. This allows more users to access your audio. But it does add a layer of operational

  • complexity, because now you've got a lot more moving parts. The audio streaming

  • protocols we'll talk about not only leverage adaptive bitrate streaming, but also use HTTP

  • web servers. They do this by encoding the files, segmenting them, and placing them on

  • a web server; then, once requested, partial audio files are sent to the client one at

  • a time. Here is the secret to our modern audio streaming protocols: it's more a series

  • of downloads than it really is a stream. But we'll refer to it as streaming anyway. The

  • two most popular audio streaming protocols today are HTTP Live Streaming, or HLS, and

  • Dynamic Adaptive Streaming over HTTP, or MPEG-DASH. HLS was created by Apple to support streaming

  • to mobile devices, and it is the default on macOS and iOS devices. And MPEG-DASH was

  • a direct alternative to HLS. It was created by the MPEG forum, who wanted to make MPEG-DASH the

  • international streaming standard. Let's look at them side by side. HLS takes MP3, AAC, AC-3,

  • or EC-3 files and encodes them into segmented transport stream files. Those segmented files are listed in a

  • playlist. If you have multiple bitrate streams, each stream will be in a media playlist, and

  • all of your media playlists will be listed in a master playlist. MPEG-DASH is codec agnostic;

  • in theory, you can convert any file into MPEG-DASH. The audio will be packaged into fragmented MP4

  • files, which are described in an XML manifest file called a media presentation description.

  • Okay. We've talked about what files will be used and what they'll be segmented into, but

  • how do you get it there? You've got this audio file. What tools allow you to convert the

  • audio file? Well, you've got options. But most of these options are paid options, except

  • for FFmpeg, which is an open source command line tool that, among other things, allows you

  • to convert audio files to be used with HLS or MPEG-DASH. However, I found the learning curve for FFmpeg

  • to be pretty steep. And a lot of the documentation for HLS and MPEG-DASH was for video streams.
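
For reference, a single-bitrate HLS conversion with FFmpeg might look something like this; a sketch with hypothetical file names, not a command from the talk:

    # Encode to 64 kbps AAC and split into 10-second transport stream
    # segments, writing a VOD media playlist alongside them.
    ffmpeg -i input.mp3 \
      -c:a aac -b:a 64k \
      -hls_time 10 \
      -hls_playlist_type vod \
      -hls_segment_filename '64k_%05d.ts' \
      64k.m3u8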

  • Instead I used Amazon Elastic Transcoder. It's an AWS offering that converts files of

  • one type to another. In our case, we're taking an audio file and converting it to be used

  • with HLS and MPEG-DASH. It's pretty much plug and play. You tell Amazon Elastic Transcoder

  • what type of files you have and what type of files you want, and it outputs the stream

  • for you. And even though it's easy to use, it's not a free service. So, if you were going

  • to be converting a lot of files, it may be worth your time to learn more about an open

  • source alternative like FFmpeg. My workflow when working with Amazon Elastic Transcoder

  • was to upload my audio file to an S3 bucket, AWS's object store. I told Amazon Elastic Transcoder where my audio file

  • was and what settings I needed it to convert my audio files to. And Amazon Elastic Transcoder

  • output my streams into that same S3 bucket. And I downloaded them for us to explore.
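
A rough sketch of that job in the AWS SDK for JavaScript (the pipeline ID, preset ID, and S3 keys are placeholders, not values from the talk):

    const AWS = require("aws-sdk");
    const transcoder = new AWS.ElasticTranscoder({ region: "us-west-2" });

    transcoder.createJob({
      PipelineId: "1111111111111-abcde1",      // placeholder pipeline (maps S3 in/out buckets)
      Input: { Key: "uploads/map-squad.m4a" }, // the uploaded audio file
      Outputs: [{
        Key: "hls/64k",
        PresetId: "0000000000000-000001",      // placeholder: an HLS audio preset ID
        SegmentDuration: "10",
      }],
      Playlists: [{
        Name: "hls/master-playlist",
        Format: "HLSv3",
        OutputKeys: ["hls/64k"],
      }],
    }, (err, data) => {
      if (err) console.error(err);
      else console.log("transcoder job started:", data.Job.Id);
    });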

  • This is the basic set of files you would get with an HLS stream. And it kind of looks like a

  • lot. But we're going to break it down into four groups. In the top left, the master

  • playlist. In our case, we have two bitrate streams represented, and they will be linked out from

  • the master playlist. And then in the top right you'll see those media playlists, which

  • represent each bitrate stream. And those will contain all of our links to our transport stream files,

  • which are the fragmented audio files represented in both the bottom left and the bottom right.

  • On the bottom right we have our 64K bitrate stream segmented audio files. And in the bottom,

  • oh. Did I get that backwards? I'm not really good at right and left. But in the bottom

  • section you'll have your fragmented audio files. We'll take a closer look at those so

  • you can see really what's in them. This is the entirety of the HLS master playlist. It contains

  • information about the specific bitrate streams and links out to those media playlists that

  • represent the streams themselves.
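
Reconstructed as a sketch, a two-stream master playlist has this shape (bandwidths and file names are illustrative, not the talk's exact files):

    #EXTM3U
    #EXT-X-STREAM-INF:BANDWIDTH=64000,CODECS="mp4a.40.2"
    hls_64k.m3u8
    #EXT-X-STREAM-INF:BANDWIDTH=128000,CODECS="mp4a.40.2"
    hls_128k.m3u8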

  • Let's look at the 64K bitrate stream media playlist. It has even more information about the stream, including caching information, the target

  • duration of each segmented audio file, and most importantly, links out to our transport streams.
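
A sketch of the shape of that media playlist (durations and segment names are illustrative):

    #EXTM3U
    #EXT-X-VERSION:3
    #EXT-X-TARGETDURATION:10
    #EXT-X-MEDIA-SEQUENCE:0
    #EXTINF:10.0,
    hls_64k_00000.ts
    #EXTINF:10.0,
    hls_64k_00001.ts
    #EXT-X-ENDLIST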

  • This is what one of those fragmented audio files looks like. And there's something a little interesting going on here. If you'll notice, it's color coded, and I kept trying

  • to figure out why. But then I realized a transport stream has the file extension .ts. And something

  • else has the file extension .ts: TypeScript. So, ignore the colors. It's just a binary encoded

  • file. Now, our MPEG-DASH audio stream has fewer files and looks more manageable. But it's

  • similar. We have our media presentation description, which is an XML manifest file that contains

  • all of our information about the stream. Then below we have our two segmented audio files.

  • For each bitrate, all of the segments are encapsulated in a single file, but within that file there are still segments.

  • That's why there are fewer files in the MPEG-DASH audio stream than in the HLS audio

  • stream. Looking at the media presentation description, we see a lot of stuff here. But there are three important

  • elements. Each bitrate stream is represented by a representation tag. And then all bitrate

  • streams are enclosed in an adaptation set. Within the representation tag, we have

  • the URL to our audio files.
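
A sketch of that structure (the element names are MPEG-DASH's own; the ids, bandwidths, and file names are illustrative):

    <MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static">
      <Period>
        <!-- all bitrate streams are enclosed in one adaptation set -->
        <AdaptationSet mimeType="audio/mp4">
          <!-- one representation per bitrate stream, with its URL -->
          <Representation id="64k" bandwidth="64000" codecs="mp4a.40.2">
            <BaseURL>audio_64k.mp4</BaseURL>
          </Representation>
          <Representation id="128k" bandwidth="128000" codecs="mp4a.40.2">
            <BaseURL>audio_128k.mp4</BaseURL>
          </Representation>
        </AdaptationSet>
      </Period>
    </MPD>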

  • And taking a look at one of those audio files, we'll see it looks fairly similar to the segmented audio file we saw with HLS, minus the color coding, because

  • it's an .mp4 versus a .ts. Visual Studio is not confused in this case.

  • Earlier we talked about progressive streaming, which is streaming an entire audio file in

  • one go. We used an audio element and a source attribute with the path of our audio file.

  • With MPEG-DASH and HLS, it's very similar. But instead of having the path to our audio

  • file, we have the path to the master playlist for HLS, or the media presentation description

  • for MPEG-DASH.
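
So for HLS, where the browser supports it natively, only the src changes (a sketch with a hypothetical path); for MPEG-DASH, the manifest path is handed to a player library instead, as we'll see at the end of the talk:

    <!-- HLS: the source is now the master playlist, not an audio file -->
    <audio controls src="/streams/hls/master-playlist.m3u8"></audio>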

  • We're going to take a hard left here and talk about the second topic in my talk, which is TCP congestion control. TCP is a transport layer protocol

  • and it has mechanisms in both its sender and receiver which are defined by the operating

  • systems of each to react to and hopefully avoid congestion when sending packets over

  • the wire. And these are called TCP congestion control. Today we'll talk about packet-loss-based

  • congestion control and why it isn't so great. And more specifically, the congestion window and

  • duplicate acknowledgments in packet-loss-based congestion control. Before we get started,

  • some more terms. Bandwidth is the rate at which data can be sent. And throughput is

  • the rate at which data can be received. The congestion window is a TCP variable that defines

  • the amount of data that can be sent before an acknowledgment is received by the sender.

  • Let's say you have a user who has requested your audio file from the server. Your audio

  • packets travel down the network stack, across the physical layer, up through the data link layer

  • and the network layer, and arrive at the transport layer. And unfortunately, there's congestion

  • right before we reach our destination. Now, traffic congestion and network congestion

  • have very similar beginnings. Either too many cars or too many packets have entered the

  • roadway and there's nowhere for them to go. With traffic, you have to wait it out. Luckily

  • for us, TCP congestion control allows packets to keep flowing over the wire, even during congestion.

  • And before we get to the specifics of these TCP congestion control algorithms, let's talk

  • about the TCP happy path. We're going to start with a single packet sent from the sender

  • to the receiver, flowing through the receiver's buffer, being acknowledged by the receiver,

  • and having an acknowledgment packet sent back to the sender. We talked about the congestion

  • window, the amount of data that can be sent before the sender receives an acknowledgment. Another way of

  • thinking about the congestion window is as a sending rate. As the sender receives acknowledgments,

  • the congestion window grows. And as the receiver's buffers fill and they drop all excess packets,

  • the sender responds by shrinking the congestion window. A second way of thinking about the

  • congestion window is as a bucket. And as packet loss occurs, the bucket shrinks. And as acknowledgments

  • are received by the sender, the bucket grows. There's a slight oversight in the bucket explanation

  • in that the receiver has no way of telling the sender that it is dropping packets due

  • to congestion. But one option the receiver does have is to send a duplicate acknowledgment.

  • A duplicate acknowledgment is sent when packets arrive out of order. Say the sender sends packets one, two, and

  • three. For the purposes of our example, the receiver's not going to process them right

  • away. So, when the sender sends packet four, the buffer is full and it has nowhere to go. Packet

  • four is dropped due to congestion. The receiver then moves on to process packet one and sends an acknowledgment,

  • then one for packet two and one for packet three. However, when it looks at packet five, it says, I can't

  • process you, because this would be an out of order packet. It drops packet five and sends

  • back another acknowledgment for packet three. That duplicate tips the sender off that it needs to send packets four and five again.
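
A toy model of that receiver behavior, sketched in JavaScript (real TCP acknowledges byte ranges rather than packet numbers):

    // The receiver acknowledges the highest in-order packet it has seen.
    // An out-of-order arrival re-acknowledges that same number: a duplicate ACK.
    function makeReceiver() {
      let expected = 1;
      return function receive(seq) {
        if (seq === expected) {
          expected += 1;
          return seq;        // normal acknowledgment
        }
        return expected - 1; // duplicate acknowledgment
      };
    }

    const ackFor = makeReceiver();
    [1, 2, 3, 5].forEach((seq) => {
      console.log("packet", seq, "-> ack", ackFor(seq));
    });
    // Packet 4 was dropped, so packet 5 produces a duplicate ack for 3,
    // tipping the sender off to resend.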

  • So, a more truthful version of the bucket metaphor would be that the congestion window

  • shrinks as duplicate acknowledgments are received by the sender, and the bucket grows

  • as new acknowledgments are received by the sender. The first TCP congestion control algorithms

  • were written in the 1980s and the most recent were a couple years ago. We will talk about

  • TCP Reno and BBR. TCP Reno is the classic. And BBR was created by Google engineers a

  • few years ago to address issues that they saw when using packet-loss-based algorithms. TCP

  • Reno starts with a congestion window that's set at some value, excuse me, increasing by some rate.

  • As the sender receives acknowledgments, the congestion window grows by one. And when the sender detects packet loss,

  • the window is divided by some rate. I have chosen half. So, it's divided by two.
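
That grow-on-ack, halve-on-loss behavior is the classic additive-increase/multiplicative-decrease pattern; here is a toy sketch of it (not real kernel code):

    // Toy AIMD loop: grow the congestion window by one per acknowledgment,
    // halve it whenever packet loss is detected.
    let cwnd = 1; // congestion window, in packets
    for (const event of ["ack", "ack", "ack", "loss", "ack"]) {
      if (event === "ack") {
        cwnd += 1;                    // additive increase
      } else {
        cwnd = Math.max(1, cwnd / 2); // multiplicative decrease
      }
      console.log(event, "-> cwnd =", cwnd);
    }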

  • And the main issue with TCP Reno is that it assumes that small amounts of packet loss are congestion. And in a world where

  • the sender doesn't know the state of the receiver's buffer and the receiver is unable to tell

  • the sender that it has room left to process packets, you have an Internet moving at a

  • fraction of the capacity. In 2016, BBR was created to help you get the most out of your

  • Internet connection. It looks for the place where sending rate is equal to bandwidth.

  • In theory, you should be able to send to the receiver and move on to the application without

  • any queuing. Some companies have reported positive outcomes when using BBR in their

  • production systems. Firstly, it only has to be implemented on the sender's side, and it is available

  • in Linux operating systems with kernel 4.9 or higher. And they found BBR increased bandwidth

  • for their low-bandwidth users by 10 to 15%, and the bandwidth for their median group by 5 to 7%.

  • Additionally, users in Latin America and Asia saw additional increases. But is it a fair

  • algorithm? Fairness, or using only your fair share of bandwidth, is the goal of every TCP congestion control

  • algorithm. And in experiments at Google and Spotify, they found that BBR was able to

  • coexist with congestion control algorithms like TCP Reno or CUBIC. However, some researchers

  • found that BBR's initial startup algorithm pushed CUBIC senders back to where they couldn't

  • reestablish their fair share of bandwidth. And this is an issue currently being looked

  • at both inside and outside of Google. We've reached the final section in this talk. And

  • so far we've talked about how audio files are processed to be streamed and issues that

  • may occur as they travel to devices. We'll wrap up by talking about the role of the client

  • player and how to create your own audio streams. Now, I'm a pretty big fan of Spotify and I

  • use it regularly. But have you ever looked at what's being sent back from the web server

  • to create those audio streams? This should look pretty familiar to what we were looking

  • at with our segmented audio files with HLS and MPEG-DASH. But when I first saw these,

  • I did not have this context. And I kept thinking, do I need to write some client side JavaScript

  • to get this to play on the Internet? Is there an NPM package I can use? Or is there something

  • simple and obvious going on here that's going right over my head? And luckily for me, and

  • hopefully everyone who writes JavaScript for the web, there is. Because HLS and MPEG-DASH

  • handed over a lot of responsibility to the clients that process their streams. And this

  • not only includes picking the correct quality of audio to play, but it also includes allowing

  • elements like the audio element to process segmented audio files without any modification.

  • And most browsers do this by leveraging the Media Source Extensions API and the Encrypted

  • Media Extensions API. Additionally, libraries like hls.js and dash.js are available where

  • cross-browser support is low. As a side note, if you need to support iOS Safari, you need

  • HLS. But with most other browsers, you have options. So, it would have been really fun

  • to reverse engineer Spotify's audio player. But I got tired of reading their minified

  • code. So, I decided to make my own audio player. And I started with a cassette that I found

  • in a box of cassettes. And I chose it because it has the words "Map squad" written on it.

  • And I used my iPhone's voice memo application to record the audio so the quality is so so

  • at best. But it works. And you can try it right now. But maybe wait until the end of

  • the talk because I want to show you how it's made. The entire application is a single

  • index.html file with an audio element in the body. When loaded into the browser, the immediately

  • invoked function expression runs the init function. At the top, we define an audio variable that's

  • equal to our audio element. Next, we see if the Media Source Extensions API is supported

  • in our browser. If it is, we will assume we can use dash.js, which enables MPEG-DASH in most

  • browsers. We pass our element to the dash.js media player, and when the player is initialized, our audio

  • will be loaded with it. If the Media Source Extensions API is not available, we're going

  • to assume we're using iOS Safari and we need to have an HLS stream. We will do this by

  • setting the source attribute of our audio element to the path

  • to our master playlist. And that's it. This file is all you need to stream audio to most browsers in 2019.
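
A sketch of that file, reconstructed from the description above (the stream paths are hypothetical, and the dash.js build is loaded from its public CDN):

    <!DOCTYPE html>
    <html>
      <body>
        <audio id="audio" controls></audio>
        <script src="https://cdn.dashjs.org/latest/dash.all.min.js"></script>
        <script>
          (function init() {
            var audio = document.getElementById("audio");
            if (window.MediaSource) {
              // MSE is supported: let dash.js play the MPEG-DASH stream.
              var player = dashjs.MediaPlayer().create();
              player.initialize(audio, "streams/dash/manifest.mpd", false);
            } else {
              // No MSE (e.g. iOS Safari): fall back to native HLS support.
              audio.src = "streams/hls/master-playlist.m3u8";
            }
          })();
        </script>
      </body>
    </html>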

  • If you want to try it out in the browser for yourself, or you want to create your own audio streams, please feel free to fork the repo. Thank you.

  • [ Applause ] KATIE: I'm sorry. I think that scared me more

  • than it scared you. Thank you so much, Sara. Can you believe that is the first talk she

  • has ever given at a conference? Yes. Amazing. All right. So, we have about a 15 minute

  • break right now. So, go out and pick up your swag bags. And we'll see you back here at

  • 3:00. Patricia Ruiz Realini is talking about the importance of your local library. Which

  • is pretty cool because I hang out at the library. We'll see you back here at 3:00. No, wait.

  • 3:00, yeah.
