Sparks: Good afternoon. My name's Dave Sparks.
I'm on the Android team.
And I'm the technical lead for the multimedia framework.
I've been working on Android since October of 2007.
But actually, technically, I started before that,
because I worked on the MIDI engine that we're using.
So I kind of have a long, vested interest
in the project.
So today, we have kind of an ambitious title,
called "Mastering the Media Framework."
I think the reality is that if you believe that--
that we're going to do that in an hour,
it's probably pretty ambitious.
And if you do believe that,
I have a bridge just north of here
that you might be interested in.
But I think we actually will be able to cover
a few kind of interesting things.
In thinking about this topic,
I wanted to cover stuff that wasn't really available
in the SDK, so we're really going to del--
delve into the lower parts of the framework,
the infrastructure that basically
everything's built on.
Kind of explain some of the design philosophy.
So...
Oh, I guess I should have put that up first.
Here we go.
So on the agenda,
in the cutesy fashion of the thing,
we're talking about the architecture--
Frank Lloyd Android.
What's new in our Cupcake release,
which just came out recently.
And those of you who have the phone,
you're running that on your device today.
And then a few common problems that people run into
when they're writing applications for the framework.
And then there probably will be
a little bit of time left over at the end
for anybody who has questions.
So moving along,
we'll start with the architecture.
So when we first started designing the architecture,
we had some goals in mind.
One of the things was to make development
of applications that use media, rich media applications,
very easy to develop.
And so that was one of the key goals
that we wanted to accomplish in this.
And I think you'll see it as we look at the framework.
It's really simple to play audio,
to display a video, and things like that.
One of the key things, because this is
a multi-tasking operating system,
is we have-- you could potentially have
things happening in the background.
For example, you could have a music player
playing in the background.
We need the ability to share resources
among all these applications,
and so that's one of the key things,
was to design an architecture
that could easily share resources.
And the other thing is, you know,
paramount in Android is the security model.
And if you've looked over the security stuff--
I'm not sure we had a talk today on security.
But security is really important to us.
And so we needed a way to be able to sandbox
parts of the application that are--
that are particularly vulnerable,
and I think you'll see as we look at the--
the framework, that it's designed
to isolate parts of the system
that are particularly vulnerable to hacking.
And then, you know, providing a way
to add features in the future
that are backwards compatible.
So that's the-- the room for future growth.
So here's kind of a 30,000-foot view
of the way the media framework works.
So on the left side, you'll notice
that there is the application.
And the red line-- red dashed line there--
is denoting the process boundary.
So applications run in one process.
And the media server actually runs
in its own process that's actually booted up--
brought up during boot time.
And so the codecs
and the file parsers and the network stack
and everything that has to do with playing media
is actually sitting in a separate process.
And then underneath that are the hardware abstractions
for the audio and video paths.
So Surface Flinger is the abstraction for video
and graphics.
And Audio Flinger is the abstraction for audio.
So looking at a typical media function,
there's a lot of stuff--
because of this inner process communication that's going on,
there's a lot of things that are involved
in moving a call down the stack.
So I wanted to give you an idea--
for those of you who've looked at the source code,
it's sometimes hard to follow, you know, how is a call--
A question that comes up quite frequently
is how does a function call, like, you know,
prepare, make its way
all the way down to the framework
and into the-- the media engine?
So this is kind of a top-level view
of what a stack might look like.
At the very top is the Dalvik VM proxy.
So that's the Java object that you're actually talking to.
So, for example, for a media player,
there's a media player object.
If you look at the media player definition,
it's a pretty-- I mean,
there's not a lot of code in Java.
It's pretty simple.
And basically, it's a proxy for--
in this case, actually, the native proxy,
which it's underneath, and then eventually,
the actual implementation.
So from that, we go through JNI,
which is the Java Native Interface.
And that is just a little shim layer
that's static bindings
to an actual MediaPlayer object.
So when you create a MediaPlayer in Java,
what you're actually doing is making a call
through this JNI layer
to instantiate a C++ object.
That's actually the MediaPlayer.
And there's a reference to that that's held in the Java object.
And then some tricky stuff--
weak references to garbage collection
and stuff like that, which is a little bit too deep
for the talk today.
Like I said, you're not going to master the framework today,
but at least get an idea of what's there.
So in the native proxy,
this is actually a proxy object for the service.
So there is a little bit of code in the native code.
You know, a little bit of logic in the native code.
But primarily, most of the implementation
is actually sitting down in this media server process.
So the native proxy is actually the C++ object
that talks through this binder interface.
The reason we have a native proxy,
instead of going directly through JNI
like a lot of the other pieces of the framework do,
is that we wanted to be able to provide
access to native applications in the future
to use MediaPlayer objects.
So it makes it relatively easy,
because that's something you'd probably want to do
with games and things like that
that are kind of more natural to write in native code.
We wanted to provide the ability to do that.
So that's why the native proxy sits there
and then the Java layer just sits on top of that.
So the binder proxy and the binder native piece--
Binder is our abstraction for inter-process communication.
Binder, basically, what it does,
is it marshals objects across this process boundary
through a special kernel driver.
And through that, we can do things like move data,
move file descriptors that are duped across processes
so that they can be accessed by different processes.
And we can also do something which--we can share memory
between processes.
And this is a really efficient way of moving data
back and forth between the application
and the media server.
And this is used extensively
in Audio Flinger and Surface Flinger.
So the binder proxy is basically the marshalling code
on the applications side.
And the binder native code is the marshalling code
for the server side of the process.
And if you're looking at all the pieces
of the framework-- they start with
MediaPlayer.java, for example--
there's an android_media_MediaPlayer.cpp,
which is the JNI piece.
There's a mediaplayer.cpp,
which is the native proxy object.
Then there's an IMediaPlayer.cpp,
which is actually a-- a binder proxy
and the binder native code in one chunk.
So you actually see the marshalling code
for both pieces in that one file.
And one is called BpMediaPlayer.cpp--
or, sorry, the BpMediaPlayer object.
And a BnMediaPlayer object.
So when you're looking at that code,
you can see the piece that's on the native side--
the server side and the proxy.
And then the final piece of the puzzle
is the actual implementation itself.
So in the case of the media server--
sorry, the MediaPlayer-- there's a MediaPlayer service
which instantiates a MediaPlayer object
in the service that's, you know, proxied
in the application by this other MediaPlayer object.
That's basically-- each one of the calls
goes through this stack.
Now, because the stack is, you know, fairly lightweight
in terms of we don't make a lot of calls through it,
we can afford a little bit of overhead here.
So there's a bit of code that you go through
to get to this place, but once you've started playing,
and you'll see this later in the slides,
you don't have to do a lot of calls
to maintain the application playing.
So this is actually kind of a top-level diagram
of what the media server process looks like.
So I've got this media player service.
And it can instantiate a number of different players.
So on the left-hand side, you'll see, bottom,
we have OpenCORE, Vorbis, and MIDI.
And these are three different media player types.
So going from the simplest one, which is the Vorbis player--
Vorbis basically just plays Ogg Vorbis files,
which is a-- we'll get into the specifics
of the codec, but it's a psycho-acoustic codec
that's open sourced.
We use this for a lot of our internal sounds,
because it's very lightweight.
It's pretty efficient.
And so we use that for our ringtones
and for our application sounds.
The MIDI player, a little more complex.
But basically, it's just another instantiation
of a media player.
These all share a common interface,
so if you look at the MediaPlayer.java interface,
there's almost, you know, a one-for-one correspondence
between what you see there and what's actually happening
in the players themselves.
And then the final one is OpenCORE.
So anything that isn't an Ogg file or a MIDI file
is routed over to OpenCORE.
And OpenCORE is basically the-- the bulk of the framework.
It consists of all of the major codecs,
like, you know, MP3 and AAC and AMR
and the video codecs, H.263, MPEG-4 SP, and H.264 AVC.
So any file that's not specifically one of those two
ends up going to OpenCORE to be played.
Now, this provides some extensibility.
The media player service is smart enough
to sort of recognize these file types.
And we have a media scanner that runs at boot time--
that goes out, looks at the files,
figures out what they are.
And so we can actually, you know, replace or add
new player types by just instantiating
a new type of player.
In fact, there are some projects out there
where they've replaced OpenCORE with GStreamer
or other media frameworks.
And we're talking to some other--
some different types of player applications
that might have new codecs and new file types,
and that's one way of doing it.
The other way of doing it is you--
if you wanted to add a new file type,
you could actually implement it inside of OpenCORE.
And then on the right-hand side,
we have the media recorder service.
Prior to-- in the 1.0, 1.1 releases,
that was basically just an audio record path.
In Cupcake, we've added video recording.
So this is now integrated with a camera service.
And so the media recorder-- again, it's sort of a proxy.
There's a proxy, um--
it uses the same sort of type of thing,
where there's a media recorder-- media recorder object
in the Java layer.
And there's a media recorder service
that actually does the recording.
And for the actual authoring engine,
we're using OpenCORE.
And it has the-- the encoder side.
So we've talked about the decoders,
and the encoders would be H.263, H.264 AVC--
sorry, and MPEG-4 SP.
And then, the audio codecs.
So all those sit inside of OpenCORE.
And then the camera service both operates
in conjunction with the media recorder
and also independently for still images.
So if your application wants to take a still image,
you instantiate a camera object,
which again is just a proxy for this camera service.
The camera service takes care of handling preview for you,
so again, we wanted to limit the amount of traffic
between the application and the hardware.
So this actually provides a way for the preview frames
to go directly out to the display.
Your application doesn't have to worry about it,
it just happens.
And then in the case where the media recorder
is actually doing video record,
we take those frames into the OpenCORE
and it does the encoding there.
So kind of looking at what a media playback session
would look like.
The application provides three main pieces of data.
It's going to provide the source URI.
The "where is this file coming from."
It'll either come from a local file that's on the--
you know, on the SD card.
It could come from a resource
that's in the application, the .apk,
or it could come from a network stream.
And so the application provides that information.
It provides a surface that's basically,
at the application level, called a SurfaceView.
This, at the binder level, is an ISurface interface,
which is an abstraction for the--the view that you see.
And then it also provides the audio stream type,
so that the hardware knows where to route the audio.
So once those have been established,
the media server basically takes care of everything
from that point on.
So you--once you have called the prepare function
and the start function,
the frames--video frames, audio frames, whatever, are--
they're going to be decoded inside the media server process.
And they get output directly to either Audio Flinger
or Surface Flinger, depending on whether
it's an audio stream or a video stream.
And all the synchronization is handled for you automatically.
Again, it's a very low overhead.
There's no data that's flowing back up to the application
at this point--it's all happening inside the hardware.
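[A minimal sketch of the playback session just described; the file path, SurfaceHolder, and listener wiring are placeholders, not code from the talk:]

```java
import android.media.AudioManager;
import android.media.MediaPlayer;
import android.view.SurfaceHolder;

// Inside an Activity; 'holder' comes from the SurfaceView you laid out.
void playClip(SurfaceHolder holder) throws java.io.IOException {
    MediaPlayer player = new MediaPlayer();
    player.setDataSource("/sdcard/clip.3gp");              // local file, resource, or network URI
    player.setDisplay(holder);                              // the surface the decoded frames render to
    player.setAudioStreamType(AudioManager.STREAM_MUSIC);   // tells the framework how to route the audio
    player.setOnPreparedListener(new MediaPlayer.OnPreparedListener() {
        public void onPrepared(MediaPlayer mp) {
            mp.start();   // from here on, decode and A/V sync happen in the media server process
        }
    });
    player.prepareAsync();   // prepare() also works, but blocks for network sources
}
```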
One other reason for doing that
we mentioned earlier is that in the case--
in many cases, for example the G1 and the Sapphire,
the device that you guys got today--
those devices actually have hardware codecs.
And so we're able to take advantage
of a DSP that's in the device to accelerate.
In the case of, for example, H.264,
we can accelerate the decoded video in there
and offload some of that from the main processor.
And that frees the processor up to do other things,
either, you know, doing sync in the background,
or just all sorts of things that it might need--
you might need those cycles for.
So again, that's-- all that is happening
inside the media server process.
We don't want to give applications direct access
to the hardware, so it's another good reason
for putting this inside the media server process.
So in the media recorder side,
we have a similar sort of thing.
It's a little more complex.
The application can either,
in the case of--
it can actually create its own camera
and then pass that to the media server
or it can let the media server create a camera for it.
And then the frames from the camera go directly
into the encoders.
It again is going to provide a surface for the preview,
so as you're taking your video, the preview frames are going
directly to the-- to the display surface
so you can see what you're recording.
And then you can select an audio source.
Right now that's just the microphone input,
but in the future, it could be other sources.
You know, potentially you could be recording
from, you know, TV or some-- some other hardware device
that's on the device.
And then--so once you've established that,
the camera service will then start feeding frames
through the camera service up to the media server
and then they're pushed out to the Surface Flinger
and they're also pushed out into OpenCORE for encoding.
And then there's a file authoring piece
that actually takes the frames from audio and video,
boxes them together, and writes them out to a file.
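[A minimal sketch of the recording session just described; the output path and preview Surface are placeholders, and the app needs the CAMERA and RECORD_AUDIO permissions:]

```java
import android.media.MediaRecorder;
import android.view.Surface;

void recordClip(Surface previewSurface) throws java.io.IOException {
    MediaRecorder recorder = new MediaRecorder();
    recorder.setAudioSource(MediaRecorder.AudioSource.MIC);        // microphone is the only audio source today
    recorder.setVideoSource(MediaRecorder.VideoSource.CAMERA);     // frames come from the camera service
    recorder.setOutputFormat(MediaRecorder.OutputFormat.THREE_GPP);
    recorder.setAudioEncoder(MediaRecorder.AudioEncoder.AMR_NB);   // the software encoder discussed later
    recorder.setVideoEncoder(MediaRecorder.VideoEncoder.H263);
    recorder.setPreviewDisplay(previewSurface);                    // preview frames go straight to the display
    recorder.setOutputFile("/sdcard/capture.3gp");
    recorder.prepare();
    recorder.start();
    // ...later: recorder.stop(); recorder.release();
}
```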
So, get into a little more detail about the codecs.
We have a number of different--
we have three different video codecs.
So one of the questions that comes a lot--
comes up a lot from the forums
is what kind of codecs are available,
what should they be used for, and things like that.
So just kind of a little bit of history
about the different codecs.
So H.263 is a codec from-- I think it
came out about 1996, when it was standardized.
It was originally intended for video conferencing,
so it's really low bit-rate stuff.
You know, designed to go over an ISDN line
or something like that.
So it's actually worked out pretty well for mobile devices,
and a lot of mobile devices support H.263.
The encoder is pretty simple.
The decoder is pretty simple.
So it's a lightweight kind of codec for an embedded device.
It's part of the 3GPP standard.
So it's adopted by a number of different manufacturers.
And it's actually used by a number of existing
video sites-- of websites--
for their encode.
For example, YouTube-- if you go to, like,
the m.youtube.com,
typically you'll end up at an H.263 stream.
Because it's supported on most mobile devices.
So MPEG-4 SP was originally designed
as a replacement for MPEG-1 and MPEG-2.
MPEG-1, MPEG-2--fairly early standardized codecs.
They wanted to do something better.
Again, it has a very simple encoder model, similar to H.263.
There's just single frame references.
And there's some question about whether
it's actually a better codec or not than H.263,
even though they're--
they came out very close together.
It's missing the deblocking filter, so--
I didn't mention that before.
H.263 has a deblocking filter.
If you've ever looked at video,
it typically comes out in, like, 8x8 pixel blocks.
And you get kind of a blockiness.
So there's an in-loop deblocking filter in H.263,
which basically smooths some of those edges out.
The MPEG-4 SP, in its basic profile,
is missing that.
So it--the quality of MPEG-4,
some people don't think it's quite as good,
even though it came out at roughly the same time.
Then the final codec we support
is a fairly recent development.
I think it was 2003, or something like that.
The H.264 AVC codec came out.
Compression's much better.
It includes the ability
to have multiple reference frames,
although on our current platforms,
we don't actually support that.
But theoretically, you could get better compression
in the main-- what's called the main profile.
We support base profile.
It has this mandatory in-loop deblocking filter
that I mentioned before,
which gets rid of the blockiness in the frames.
One of the really nice things
is it has a number of different profiles.
And so different devices support different levels
of--of profiles.
It specifies things like frame sizes, bit rates,
the--the types of advanced features
that it has to support.
And there's a number of optional features in there.
And basically, each of those levels
and profiles defines what's in those codecs.
It's actually used in a pretty wide range of things.
Everything from digital cinema, now, HDTV broadcasts,
and we're starting to see it on mobile devices like the G1.
When you do a--if you're using the device itself today,
and you do a YouTube playback,
you're actually-- on Wi-Fi,
you're actually getting a H.264 stream,
which is why it's so much better quality.
On the downside, it's a lot more complex than H.263
because it has these advanced features in it.
So it takes a lot more CPU.
And in the case of the G1, for example,
that particular hardware,
some of the acceleration happens in the DSP,
but there's still some stuff that has to go
on the application processor.
On the audio side, MP3 is pretty--
everybody's pretty familiar with.
It uses what's called a psycho-acoustic model,
which is why we get better compression than a typical,
you know, straight compression algorithm.
So psycho-acoustic means you look for things in the--
that are hidden within the audio.
There are certain sounds
that are going to be masked by other sounds.
And so the psycho-acoustic model
will try to pick out those things,
get rid of them, and you get better--
much better compression there.
You get approximately 10:1 compression
over a straight linear PCM at 128kbits per second,
which is pretty reasonable, especially for a mobile device.
And then if you want to, you know, be a purist,
most people figure you get full sonic transparency
at about 192kbits per second.
So that's where most people won't be able to hear
the difference between the original
and the compressed version.
For a more advanced codec,
AAC came out sometime after MP3.
It's built on the same basic principles,
but it has much better compression ratios.
You get sonic transparency at roughly 128kbits per second.
So, you know, much, much better compression.
And another mark that people use
is 128kbits per second--
MP3 is roughly equivalent to 96kbits per second AAC.
We also find it's-- it's used, commonly used,
in MPEG-4 streams.
So if you have an MPEG-4 audio--video stream,
you're likely to find an AAC codec with it.
In the case of our high-quality YouTube streams,
they're typically a 96kbits per second AAC format.
And then finally, Ogg Vorbis, which I'd mentioned earlier,
we're using for a lot of our sounds.
Again, it's another psycho-acoustic model.
It's an open source codec,
so it doesn't have any patent,
you know, issues in terms of licensing--
whereas any of the other codecs, if you're selling a device,
you need to go, you know,
get the appropriate patent licenses.
Or I probably shouldn't say that,
because I'm not a lawyer,
but you should probably see your lawyer.
From our perspective, it's very low overhead.
It doesn't bring in all of the OpenCORE framework,
'cause it's just an audio codec.
So it uses-- it's very lightweight
in terms of the amount of memory it uses
and also the amount of code space
that it has to load in in order to play a file.
So that's why we use it for things like ringtones
and other things that need fairly low latency
and we know we're gonna use it a lot.
The other thing is that, unlike MP3--
MP3 doesn't have a native way of specifying a seamless loop.
For those of you who aren't audio guy--
audio experts, "seamless loop" basically means
you can play the whole thing as one seamless,
no clips, no pops loop to play over and over again.
A typical application for that would be a ringtone,
where you want it to continue playing
the same sound over and over again
without--without the pops and clicks.
MP3 doesn't have a way to specify that accurately enough
that you can actually do that without having some sort of gap.
There are people that have added things in the ID3 tags
to get around that, but there isn't
any standardized way to do it.
Ogg does it-- actually, both Ogg and AAC
have conventions for specifying a seamless loop.
So that's another reason why we use Ogg
is that we can get that nice seamless loop.
So if you're doing anything in a game application
where you want to get, you know, some sort of--
a typical thing would be like an ambient sound
that's playing over and over in the background.
You know, the factory sound or, you know,
some eerie swamp noises or whatever.
That's the way to do it is to use the Ogg file.
You'll get pretty good compression.
It's pretty low overhead for decoding it.
And you can get those loops that won't click.
And then finally, the last codecs
we're going to talk about in terms of audio
are the AMR codecs.
AMR is a speech codec,
so it doesn't get the full bandwidth.
If you ever try to encode one with music on it,
it will sound pretty crappy.
That's because it-- it wants to kind of focus in
on one central tone.
That's how it gets its high compression rate.
But at the same time, it throws away a lot of audio.
So it's typically used for voice.
And in fact, GSM basically is based
on AMR-type codecs.
The input,
for the AMR narrow band, is 8 kilohertz.
So going back to Nyquist, that basically means
your highest frequency you can represent
is just shy of 4 kilohertz.
And the output bit-rates are, you know,
anywhere from just under 5kbits per second up to 12.2.
AMR wide band is a little bit better quality.
It's got a 16 kilohertz input, and slightly higher bandwidths.
But again, it's a speech codec primarily,
and so you're not going to get great audio out of it.
We do use these, because in the package,
the OpenCORE package, the AMR narrow band codec
is the only audio encoder--
native audio encoder we have in software.
So if your hardware platform doesn't have an encoder,
that's kind of the fallback codec.
And in fact, if you use an audio recorder application
like MMS, and attach an audio clip,
this is the codec you're going to get.
If you do a video record today,
that's the codec you're going to get.
We're expecting that future hardware platforms
will provide, you know, native encoders for AAC.
It's a little too heavy to do AAC
on the application processor
while you're doing video record and everything else.
So we really need the acceleration
in order to do it.
AMR is specified in 3GPP streams.
So most phones that will decode an H.263
will also decode the AMR.
So it's a fairly compatible format.
If you look at the--the other phones that are out there
that support, you know, video playback,
they typically will support AMR as well.
So we've talked about codecs.
Both audio and video codecs.
The other piece of it, when you're doing a stream,
is what's the container format?
And so I'm going to talk a little bit about that.
So 3GPP is the stream that's defined
by the 3GPP organization.
These are phones that support that standard
and are going to support these types of files.
3GPP is actually an MPEG-4 file format.
But it's a very, very restricted set
of things that you can put into that file,
designed for compatibility with these embedded devices.
So you really want to use a H.263 video codec
for--for broad compatibility across a number of phones.
You probably want to use a low bit rate for the video,
typically like 192kbits per second.
And you also want to use the AMR narrow band codec.
For MPEG-4 streams, which we also support,
they're typically higher quality.
They typically are going to use
either an H.264 or a higher-- bigger size H.263 format.
Usually they use an AAC codec.
And then on our particular devices,
the G1 and the device that you just received today--
I'm not even sure what we're calling it--
I--
is capable of up to 500kbits per second
on the video side
and 96kbits per second on the audio side.
So a total of about 600kbits per second,
sustained.
If you do your encoding well,
you're going to actually get more than that out of it.
We've actually been able to do better
than 1 megabit per second, but you have to be--
have a really good encoder.
If it gets "burst-y," it will interfere
with the performance of the codec.
So one question that comes up a lot on the forums
is what container should I use
if I'm either authoring or if I'm doing video recording?
So for authoring for our Android device,
if you want the best quality--
the most bang for your bits, so to speak--
you want to use an MPEG-4 codec--
er, container file with an H.264 encoded stream.
It needs to be, for these devices today,
a baseline profile roughly, as I was saying before,
at 500kbits per second HVGA or smaller,
and AAC codec up to 96kbits per second.
That will get you a pretty high quality--
that's basically the screen resolution.
So it looks really good on-- on the display.
For other cases-- say
you're going to create content on an Android device,
so you have a video record application, for example.
And you want to be able to send that via MMS
or some other email or whatever to another phone,
you probably want to stick to a 3GPP format,
because not all phones will support an MPEG-4 stream,
particularly the advanced codecs.
So in that case we recommend...
I'm getting ahead of myself here.
So in that case we recommend using the QCIF format.
That's 192kbits per second.
Now, if you're creating content
on the Android device itself,
intended for another Android device,
we have an H.263 encoder.
We don't have an H.264 encoder,
so you're restricted to H.263.
And for the same reason I've discussed before,
we don't have an AAC encoder,
so you're going to use an AMR narrow band encoder,
at least on the current range of devices.
So those are kind of the critical things
in terms of inter-operability with other devices.
And then the other thing is-- a question that comes up a lot
is if I want to stream to an Android device,
what do I need to do to make that work?
The thing where most people fail on that
is the "moov" atom, which is the index of frames
that tells--basically tells the organization of the file,
needs to precede the data-- the movie data atom.
And...the...
Most applications will not do that naturally.
I mean, it's more-- it's easier for a programmer
to write something that builds that index afterwards.
So you have-- you typically have
to give it a specific-- you know,
turn something on,
depending on what the application is,
or if you're using FFmpeg,
you have to give it a command line option
that tells it to-- to put that atom
at the beginning instead of the end.
So...
For--we just recently came out with what we've been calling
the Cupcake release, or the 1.5 release.
That's the release that's on the phones
you just received today.
Some of the new features we added in the media framework.
We talked about video recording before.
We added an AudioTrack interface
and an AudioRecord interface in Java,
which allows direct access to raw audio.
And we added the JET interactive MIDI engine.
These are kind of the-- the highlights
in the media framework area.
So kind of digging into the specifics here...
AudioTrack-- we've had a lot of requests
for getting direct access to audio.
And...so what AudioTrack does is allow you
to write a raw stream from Java
directly to the Audio Flinger mixer engine.
Audio Flinger is a software mixer engine
that abstracts the hardware interface for you.
So it could actually-- it could mix multiple streams
from different applications.
To give you an example,
you could be listening to an MP3 file
while the phone rings.
And the ringtone will play
while the MP3 file is still playing.
Or a game could have multiple sound effects
that are all playing at the same time.
And the mixer engine takes care of that automatically for you.
You don't have to write a special mixer engine.
It's in-- built into the device.
Potentially could be hardware accelerated in the future.
And it also allows you to...
It does sample rate conversion for you.
So you can mix multiple streams at different sample rates.
You can modify the pitch and so on and so forth.
So what AudioTrack does, it gives you direct access
to that mixer engine.
So you can take a raw Java stream,
you know, 16-bit PCM samples, for example,
and you can-- you can send that out
to the mixer engine.
Have it do the sample rate conversion for you.
Do volume control for you.
It has anti-zipper volume filters--
so, if anybody's ever played with audio before,
if you change the volume,
it ramps the volume gradually
so you don't get the pops or clicks
or what we typically refer to as zipper noise.
And that's all done with...
Either you can do writes on a thread in Java,
or you can use the callback engine to fill the buffer.
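[A minimal sketch of streaming raw 16-bit PCM into the mixer through AudioTrack -- here just a generated 440 Hz tone, since the real samples would come from your application:]

```java
import android.media.AudioFormat;
import android.media.AudioManager;
import android.media.AudioTrack;

void playTone() {
    int sampleRate = 44100;
    int minBuf = AudioTrack.getMinBufferSize(sampleRate,
            AudioFormat.CHANNEL_CONFIGURATION_MONO, AudioFormat.ENCODING_PCM_16BIT);
    AudioTrack track = new AudioTrack(AudioManager.STREAM_MUSIC, sampleRate,
            AudioFormat.CHANNEL_CONFIGURATION_MONO, AudioFormat.ENCODING_PCM_16BIT,
            minBuf, AudioTrack.MODE_STREAM);       // MODE_STATIC is the single-copy mode mentioned below
    track.play();
    short[] pcm = new short[minBuf / 2];            // 16-bit samples
    double phase = 0, step = 2 * Math.PI * 440 / sampleRate;
    for (int block = 0; block < 100; block++) {     // a couple of seconds of a 440 Hz tone
        for (int i = 0; i < pcm.length; i++) {
            pcm[i] = (short) (Math.sin(phase) * 16000);
            phase += step;
        }
        track.write(pcm, 0, pcm.length);            // write() blocks until the mixer consumes the data
    }
    track.stop();
    track.release();
}
```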
Similarly, AudioRecord gives you direct access to the microphone.
So in the same sort of way,
you could pull up a stream from the microphone.
You specify the sample rate you want it in.
And, you know, with the combination
of the two of those,
you can now take a stream from the microphone,
do some processing on it, and now put it back out
via the AudioTrack interface,
to that mixer engine.
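[A minimal sketch of that capture-process-playback loop; process() is a hypothetical stand-in for your own analysis or effect, 'track' is an AudioTrack set up as in the previous sketch at the same sample rate, and the app needs the RECORD_AUDIO permission:]

```java
import android.media.AudioFormat;
import android.media.AudioRecord;
import android.media.AudioTrack;
import android.media.MediaRecorder;

void loopback(AudioTrack track, int blocks) {
    int rate = 8000;
    int minBuf = AudioRecord.getMinBufferSize(rate,
            AudioFormat.CHANNEL_CONFIGURATION_MONO, AudioFormat.ENCODING_PCM_16BIT);
    AudioRecord rec = new AudioRecord(MediaRecorder.AudioSource.MIC, rate,
            AudioFormat.CHANNEL_CONFIGURATION_MONO, AudioFormat.ENCODING_PCM_16BIT, minBuf);
    short[] pcm = new short[minBuf / 2];
    rec.startRecording();
    track.play();
    for (int i = 0; i < blocks; i++) {
        int n = rec.read(pcm, 0, pcm.length);   // pull a block from the microphone
        process(pcm, n);                        // hypothetical: your FFT, effect, etc.
        track.write(pcm, 0, n);                 // push it back out through the mixer
    }
    rec.stop();
    rec.release();
}
```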
And that mixer engine will go wherever audio is routed.
So, for example, a question that comes up
a lot is, well, what if they have a Bluetooth device?
Well, that's actually handled for you automatically.
There's nothing you have to do as an application programmer.
If there's a Bluetooth device paired that supports A2DP,
then that audio is going to go directly
to the...to the A2DP headset.
Your...whether it's a headset or even your car or whatever.
And then we've got this callback mechanism
so you can actually
just set up a buffer and just keep--
when you get a callback, you fill it.
You know, if you're doing a ping-pong buffer,
where you have half of it being filled
and the other half is actually being output to the device.
And there's also a static buffer mode
where you give it a-- for example,
a sound effect that you want to play
and it only does a single copy.
And then it just automatically mixes it,
so each time you trigger the sound,
it will mix it for you,
and you don't have to do additional memory copies.
So those are kind of the big highlights
in terms of the-- the audio pieces of it.
Then another new piece that's actually been in there
for a while, but we've finally implemented the Java support,
is the JET Interactive MIDI Engine.
So JET is--
it's based upon the EAS MIDI engine.
And what it does is allow you to pre-author some content
that is very interactive.
So what you do is you,
if you're an author, you're going to create content
in your favorite authoring tool--
a digital audio workstation tool.
It has a VST plugin, so that you can, you know,
basically write your-- your game audio in the tool
and hear it played back as it would be played on the device.
You can take and have multiple tracks
that are synchronized and mute them and unmute them
synchronous with the segment.
So basically, your piece is going to be divided up into
a bunch of little segments.
And just as an example,
I might have an A section, like the intro,
and maybe I have a verse and I have a chorus.
And I can interactively get those to play
one after another.
So, for example, if I have a game that, um--
it has kind of levels, I might start with
a certain background noise, and perhaps, you know,
my character's taking damage.
So I bring in some little element
that heightens the tension in the game
and this is all done seamlessly.
And it's very small content, because it's MIDI.
And then you can actually have little flourishes
that play in synchronization with it--
with the music that's going on.
So some--for example, let's say you, you know,
you take out an enemy.
There's a little trumpet sound or whatever.
A sound effect that's synchronized
with the rest of the-- the audio that's playing.
Now all this is done under-- under program control.
In addition to that, you also have the ability
to have callbacks that are synchronized.
So a good example would be a Guitar Hero type game
where you have music playing in the background.
What you really want to do is have the player
do something in synchronization with the rhythm of the sound.
So you can get a callback in your Java application
that tells you when a particular event occurred.
So you could create these tracks of--of events
that you can, you know, measure against--
did they hit before or after?
And we actually have a sample application
in the SDK that shows you how to do this.
It's a--I think a, like, two- or three-level game
that comes complete with graphics
and sound and everything to show you how to do it.
The code--the code itself is written in native code
that's sitting on top of the EAS engine,
so again, in keeping with our philosophy
of trying to minimize the--
the overhead from the application,
this is all happening in background.
You don't have to do anything to keep it going
other than keep feeding it segments.
So periodically, you're going to wake up and say,
"Oh, well, here's the next segment of audio to play,"
and then it will play automatically
for whatever the length of that segment is.
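[A rough sketch of driving JET from the JetPlayer API added in Cupcake; the .jet path and segment numbers are placeholders for whatever your authoring tool produced:]

```java
import android.media.JetPlayer;

void startJet() {
    JetPlayer jet = JetPlayer.getJetPlayer();
    jet.loadJetFile("/sdcard/game_music.jet");
    // Queue a couple of segments; muteFlags picks which tracks are audible.
    jet.queueJetSegment(0 /* intro */, 0, 0, 0, 0, (byte) 0);
    jet.queueJetSegment(1 /* verse */, 0, 0, 0, 0, (byte) 1);
    jet.play();
    // jet.setEventListener(...) delivers the synchronized event callbacks
    // described above (e.g. for a rhythm game); keep feeding segments as these drain.
}
```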
It's all open source.
Not only is the-- the code itself open source,
but the tools are open sourced,
including the VST plugin.
So if you are ambitious
and you want to do something interesting with it,
it's all sitting out there for you to play with.
I think it's out there now.
If not, it will be shortly.
And so those are the big highlights of the--
the MIDI-- the MIDI engine.
Oh, I forgot. One more thing.
The DLS support-- so one of the critiques
of general MIDI, or MIDI in general,
is the quality of the instruments.
And admittedly, what we ship with the device is pretty small.
We try to keep the code size down.
But what the DLS support does with JET
is allow you to load your own samples.
So you can either author them yourself
or you can go to a content provider
and author these things.
So if you want a high-quality piano
or you want, you know, a particular drum set,
you're going for a techno sound or whatever,
you can actually, you know,
put these things inside the game,
use them as a resource,
load them in and-- and your game will have
a unique flavor that you don't get
from the general MIDI set.
So...
I wanted to talk about a few common problems
that people run into.
Start with the first one here.
This one I see a lot.
And that is that the behavior of the volume control
in the application is-- is inconsistent.
So, volume control on Android devices
is an overloaded function.
And as you can see from here,
if you're in a call, what the volume control does
is adjust the volume that you're hearing
from the other end of the phone.
If you're not in a call, if it's ringing,
pressing the volume button mutes the--the ringer.
Oh, panic.
I'm in a, you know, middle of a presentation
and my phone goes off.
So that's how you mute it.
If we can detect that a media track is active,
then we'll adjust the volume of whatever is playing.
But otherwise, it adjusts the ringtone volume.
The issue here is that if your-- if your game is--
or your application is just sporadically making sounds,
like, you know, you just have little UI elements
or you play a sound effect periodically,
you can only adjust the volume of the application
during that short period that the sound is playing.
It's because we don't actually know
that you're going to make sound until that particular instant.
So if you want to make it work correctly,
there's an-- there's an API you need to call.
It's in--it's part of the activity package.
It's called setVolumeControlStream.
So you can see a little chunk of code here.
In your onCreate,
you're going to call this setVolumeControlStream
and tell it what kind of stream you're going to play.
In the case of most applications that are in the foreground,
that are playing audio,
you probably want STREAM_MUSIC,
which is kind of our generic placeholder
for, you know, audio that's in the foreground.
If you're a ringtone application--
you know, you're playing ringtones--
then you would select a different type.
But this basically tells the activity manager,
when you press the volume button,
if none of those previous things apply--
in other words, if we're not in a call,
if it's not ringing,
if none of these other things are happening--
then that's the default behavior of the volume control.
Without that, you're probably going to get
pretty inconsistent behavior and frustrated users.
That's probably the number one problem
I see with applications in the marketplace today
is they're not using that.
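[Roughly what that slide snippet looks like -- a minimal sketch; the activity class name is a placeholder:]

```java
import android.app.Activity;
import android.media.AudioManager;
import android.os.Bundle;

public class GameActivity extends Activity {
    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        // Tell the framework which stream the hardware volume keys should
        // control when nothing else (call, ringer, active playback) claims them.
        setVolumeControlStream(AudioManager.STREAM_MUSIC);
    }
}
```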
Another common one I see on the--in a--
on the forums is people saying,
"How do I--how do I play a file from my APK?
"I just want to have an audio file
that I ship with the-- with the package,"
and they get this wrong for whatever reason.
I think we have some code out there
from a long time ago that looks like this.
And so this doesn't work.
This is the correct way to do it.
So there's this AssetFileDescriptor.
I talked a little bit earlier about the binder object
and how we pass things through,
so we're going to pass the file descriptor,
which is a pointer to your resource,
through the binder to the...
I don't know how that period got in there.
It should be setDataSource.
So it's setDataSource, which takes a FileDescriptor,
a start offset, and a length,
and so what this will do is, using a resource ID,
it will find, you know, open it,
find the offset where that raw--
that resource starts.
And it will, you know, pass--
set those values so that we can tell
the media player where to find it,
and the media player will then play that
from that offset in the FileDescriptor.
I had another thought there.
Oh, yeah. So--yeah.
Raw resources, make sure that when you put your file in,
you're putting it in as a raw resource,
so it doesn't get compressed.
We don't compress things like MP3 files and so on.
They have to be in the raw directory.
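[A sketch of the correct pattern just described -- R.raw.ding is a hypothetical sound placed uncompressed in res/raw/, and this runs inside an Activity or other Context:]

```java
import android.content.res.AssetFileDescriptor;
import android.media.MediaPlayer;

void playRawResource() throws java.io.IOException {
    AssetFileDescriptor afd = getResources().openRawResourceFd(R.raw.ding);
    MediaPlayer player = new MediaPlayer();
    player.setDataSource(afd.getFileDescriptor(),
            afd.getStartOffset(), afd.getLength());   // offset + length locate the sound inside the .apk
    afd.close();
    player.prepare();
    player.start();
}
```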
Another common one I see on the forums
is people running out of MediaPlayers.
And this is kind of an absurd example,
but, you know, just to give you a point.
There is a limited amount of resources.
This is an embedded device.
A lot of people who are moving over from the desktop
don't realize that they're working with something
that's, you know, equivalent to a desktop system
from maybe ten years ago.
So don't do this.
If you're going to use MediaPlayers,
try to recycle them.
So our solution is, you know,
there are resources that are actually allocated
when you create a MediaPlayer.
It's allocating memory, it may be loading codecs.
It may--there may actually be a hardware codec
that's been instantiated that you're preventing
the rest of the system from using.
So whenever you're done with them,
make sure you release them.
So you're going to call release,
and set the MediaPlayer reference to null.
Or you can call reset and set-- do a new setDataSource,
which, you know, is basically just recycling your MediaPlayer.
And try to keep it to, you know, two or three maximum.
'Cause you are sharing with other applications, hopefully.
And so if you get a little piggy with your MediaPlayer resources,
somebody else can't get them.
And also, if you go into the background--
so, and you're in-- on pause,
you definitely want to release all of your MediaPlayers
so that other applications can get access to them.
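[A sketch of the recycle-and-release pattern described above; 'player' is assumed to be a MediaPlayer field on your activity:]

```java
import android.media.MediaPlayer;

void playNext(String path) throws java.io.IOException {
    player.reset();               // back to the idle state, but keeps the instance around for reuse
    player.setDataSource(path);
    player.prepare();
    player.start();
}

@Override
protected void onPause() {
    super.onPause();
    if (player != null) {
        player.release();         // give the codec and memory back to the system
        player = null;
    }
}
```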
Another big one that happens a lot
is the CPU... "My CPU is saturated."
And you look at the logs and you see this.
You know, CPU is-- is--
can't remember what the message is now.
But it's pretty clear that the CPU is unhappy.
And this is kind of the typical thing,
is that you're trying to play too many
different compressed streams at a time.
Codecs take a lot of CPU resources,
especially ones that are running on software.
So, you know, a typical, say, MP3 decode
of a high-quality MP3 might take 20% of the CPU.
You add up two or three of those things,
and you're talking about some serious CPU resources.
And then you wonder why your, you know, frame rate
on your game is pretty bad.
Well, that's why.
So we actually have a solution for this problem.
It's called SoundPool.
Now, SoundPool had some problems in the 1.0, 1.1 release.
We fixed those problems in Cupcake.
It's actually pretty useful.
So what it allows you to do is take resources
that are encoded in MP3 or AAC or Ogg Vorbis,
whatever your preferred audio format is.
It decodes them and loads them into memory
so they're ready to play,
and then uses the AudioTrack interface
to play them out through the mixer engine
just like we were talking about before.
And so you can get much lower overhead.
You know, somewhere on the order of about 5% per stream
as compared to these, you know, 20% or 30%.
Depending on what the audio codec is.
So it gives you the same sort of flexibility.
You can modify--in fact, it actually gives you
a little more flexibility, because you can set the rates.
It can-- will manage streams for you.
So if you want to limit the number of streams
that are playing, you tell it upfront,
"I want," let's say, "eight streams maximum."
If you exceed that, it will automatically,
based on the priority, you know, select the least priority,
get rid of that one, and start the new sound.
So it's kind of managing resources for you.
And then you can do things like pan in real time.
You can change the pitch.
So if you want to get a Doppler effect
or something like that, this is the way to do it.
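[A sketch of the SoundPool pattern described above; R.raw.laser is a hypothetical sound in res/raw/, and this sits inside an Activity:]

```java
import android.media.AudioManager;
import android.media.SoundPool;

SoundPool pool;
int laserId;

void loadSounds() {
    pool = new SoundPool(8, AudioManager.STREAM_MUSIC, 0);   // cap at 8 concurrent streams
    laserId = pool.load(this, R.raw.laser, 1);               // decoded into memory, ahead of time
}

void fireLaser() {   // later, e.g. on a touch event
    int stream = pool.play(laserId, 1.0f, 1.0f, /*priority*/ 1, /*loop*/ 0, /*rate*/ 1.0f);
    pool.setRate(stream, 1.2f);          // pitch it up, e.g. for a Doppler effect
    pool.setVolume(stream, 0.2f, 1.0f);  // pan by setting left/right volume independently
}
```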
So that's pretty much it.
We have about ten minutes left for questions,
if anybody wants to go up to a microphone.
[applause]
Thank you.
man: Hi, thank you. That was a great talk.
Is setting the stream to music,
so you can respond to the volume control--
do you have to do that every time you create a new activity,
or is it sticky for the life of the app?
Sparks: It's sticky--
you're going to call it in your onCreate function.
man: But in every single activity?
Sparks: Yeah, yeah. man: Okay.
man: Hi, my first question is that currently,
Android is using OpenCORE
for the multimedia framework.
And my question is, does Google have any plan
to support any other middleware,
such as GStreamer or anything else?
Sparks: Not at this time.
We don't have any plans to support anything else.
man: Okay.
What's the strategy of Google
for supporting other pioneers
providing this multimedia middleware?
Sparks: Well, so, because of the flexibility
of the MediaPlayer service, you could easily add
another code--another media framework engine in there
and replace OpenCORE.
man: Okay.
So my second question is that, um--
[coughs]
that currently--
Google, you mentioned implementing the MediaPlayer
and the recording service.
Is there any plan to support mobile TV and other things,
such as video conferencing, in the framework?
Sparks: We're--we're looking at video conferencing.
Digital TV is probably a little bit farther out.
We kind of need a platform to do the development on.
So we'll be working with partners.
Basically, if there's a partner that's interested
in something that isn't there,
we will--we can work with you on it.
man: Okay, thank you.
man: Does the media framework support RTSP control?
Sparks: Yes.
So RTSP support is not as good as we'd like it to be.
It's getting better with every release.
And we're expecting to make some more strides
in the next release after this.
But Cupcake is slightly better.
man: And that's specified by...
in the URL, by specifying the RTSP?
Sparks: Yeah. Right. man: Okay.
And you mentioned, like, 500 kilobits per second
being the maximum, or--
What if you tried to play something
that is larger than that?
Sparks: Well, the codec may fall behind.
What will typically happen is that you'll get a--
if you're using our MovieView, you'll get an error message
that says that it can't keep up.
man: Mm-hmm. So it will try, but it will--
It might fall behind. Sparks: Yeah.
man: Thank you.
man: My question is--
how much flexibility do we have
to control the camera service?
For example, can I control the frame rate,
and the color tunings, and et cetera?
Sparks: Yeah, some of that's going to depend on the--
on the device.
We're still kind of struggling
with some of the device-specific things,
but in the case of the camera,
there's a setParameters interface.
And there's access, depending on the device,
to some of those parameters.
The way you know that is, you do a setParameter.
Let's say you ask for a certain frame rate.
You--you do a getParameter.
You find out if it accepted your frame rate or not.
Because there's a number of parameters.
man: Yeah, but also, in-- for example, low light.
So not only do you want to slow the frame rate,
but also you want to increase the integration time.
Sparks: Right.
man: So sometimes,
even in low light,
you want to slow the frame rate
but still keep the normal integration time.
So do you have that kind of flexibility to control?
Sparks: Well, so that's going to depend
on whether the hardware supports it or not.
If the hardware supports it, then there should be
a parameter for that.
One of the things we've done is--
for hardware dev-- manufacturers
that have specific things that they want to support,
that aren't like, standard--
they can add a prefix to their parameter key value pairs.
So that will, you know-- it's unique to that device.
And we're certainly open to manufacturers suggesting,
you know, new-- new standard parameters.
And we're starting to adopt more of those.
So, for example, like, white balance is in there.
Scene modes, things like that are all part of it.
man: Okay. Sparks: Yeah.
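[For reference, a minimal sketch of the set-then-get pattern described in the answer above; the specific keys and values are illustrative and device-dependent:]

```java
import android.hardware.Camera;
import android.util.Log;

void tryFrameRate() {
    Camera camera = Camera.open();
    Camera.Parameters params = camera.getParameters();
    params.setPreviewFrameRate(15);           // ask for 15 fps
    params.set("whitebalance", "auto");       // generic key/value pairs; vendors may add prefixed keys
    camera.setParameters(params);

    // Read back what the driver actually accepted.
    int actual = camera.getParameters().getPreviewFrameRate();
    Log.d("CameraDemo", "frame rate in effect: " + actual);
    camera.release();
}
```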
man: I was wondering what kind of native code hooks
the audio framework has?
I'm working on an app that basically would involve,
like, actively doing a fast Fourier transform,
you know, on however many samples you can get at a time.
And so, it seems like for now--
or in the Java, for example,
it's mostly built toward recording audio and--
and doing things with that.
What sort of active control do you have over the device?
Sparks: So officially, we don't support
native API access to audio yet.
The reason for that is,
we, you know-- any API we publish,
we're going to have to live with for a long whi--
a long time.
We're still playing with APIs,
trying to, you know, get-- make them better.
And so the audio APIs
have changed a little bit in Cupcake.
They're going to change again in the next two releases.
At that point, we'll probably be ready
to start providing native access.
What you can do,
very shortly we'll have a native SDK,
which will give you access to libc and libm.
You can get access to the audio
from the Java-- official Java APIs,
do your processing in native code,
and then feed it back, and you'll be able to do that
without having to do memcopies.
man: And so basically, that would just be
accessing the buffer that the audio writes to.
And also, just a very tiny question about the buffer.
Does it--
does it loop back when you record the audio?
Or is it--does it record in, essentially, like, blocks?
Do you record an entire buffer once in a row,
or does it sort of go back to the start and then keep going?
Sparks: You can either have it cycle through a static buffer,
or you can just pass in new buffers each time,
depending on how you want to use it.
man: Okay. Thanks.
man: Let's say you have a game
where you want to generate a sound instantly
on a button press or a touch.
Sparks: "Instantly" is a relative term.
man: As instantly as you can get.
Would you recommend, then, the JET MIDI stuff,
or an Ogg, or what?
Sparks: You--you're probably going to get best results
with SoundPool,
because SoundPool's really aimed at that.
What SoundPool doesn't give you--
and we don't have an API for it,
we get a lot of requests for it,
so, you know, it's on my list of things to do--
is synchronization.
So if you're trying to do a rhythm game
where you--you want to be able to have very precise control
of--of, say, a drum track--
you--there isn't a way to do that today.
But if you're just trying to do--
man: Like gunfire kind of thing.
Sparks: Gunfire? SoundPool is perfect for that.
That's--that's what it was intended for.
man: Yeah, if I use the audio mixer,
can I control the volume
of the different sources differently?
Sparks: Yes. man: Okay.
Sparks: So, SoundPool has a volume control
for each of its channels that you--
basically, when you trigger a SoundPool sound,
you get an ID back.
And you can use that to control that sound.
If you're using the AudioTrack interface,
there's a volume control interface on it.
man: My question is,
for the testing side, how--
does Google have a plan to release a certain application
or testing program to verify MediaPlayer
and other media middleware like this?
Sparks: Right.
man: 3D and everything else?
Sparks: So we haven't announced
what we're doing there yet.
I can't talk about it.
But it's definitely something we're thinking about.
man: Okay.
Another question is about the concurrency
there for the mobile devices.
The resources are very limited.
So for example, the service you mentioned.
The memory is very limited.
So how do we handle any--
or maybe you have any experience--
handle the 3D surface
and also the multimedia surface
and put together a raw atom surface
or something like that?
Sparks: So when you say "3D," you're talking about--
man: Like OpenGL, because you do the overlay
and you use the overlay and you--
Sparks: Yeah, I'm-- I'm not that up on it.
I'm not a graphics guy.
I'm really an audio guy.
But I actually manage the team that does the 3D stuff.
So I'm kind of familiar with it.
There's definitely limited texture memory
that's available--that's probably the most critical thing
that we're running into-- but obviously,
you know, that--
we're going to figure out how to share that.
And so--
I don't have a good answer for you,
but we're aware of the problem.
man: Okay. Yeah.
Just one more question is do you have any plan
to move OpenGL 2.0 for the Android?
Sparks: Yes. If you--
man: Do you have a time frame?
Sparks: Yeah, if you're following
the master source tree right now,
you'll start to see changes come out for--
we're--we're marrying 2D and 3D space.
So the 2D framework will be running as an OpenGL context,
which will allow you, then, to, you know--
ES 2.0 context.
So you'll be able to share between the 3D app
and the 2D app.
Currently, if you have a 3D app,
it takes over the frame buffer
and nothing else can run.
You'll actually be able to run 3D
inside the 2D framework.
man: Okay, thank you.
man: I think this question is sort of related.
I was wondering how would you take, like, the--
the surface that you use to play back video
and use it as a texture, like in OpenGL?
Sparks: That's coming, yeah.
Yeah, that--so you actually would be able to map
that texture onto a 3D--
man: Is there any way you can do that today
with the current APIs?
Sparks: Nope.
Yeah, there's no access to the--
to the video after it leaves the media server.
man: And no time frame
as far as when there'll be
some type of communication as far as
how to go about doing that in your applications?
Sparks: Well, it's-- so it's in our--
what we call our Eclair release.
So that's master today.
man: Okay. Okay, thank you.
Sparks: I think-- are we out of time?
woman: [indistinct]
Sparks: Okay.
woman: Hi, do you have any performance metrics
as to what are the performance numbers
with the certain playback of audio and video to share,
or any memory footprints available
that we can look up, maybe?
Sparks: Not today.
It's actually part of some of the work we're doing
that somebody was asking about earlier.
That I can't talk about yet. But yeah.
There's definitely some-- some plans to do metrics
and to have baselines that you can depend on.
woman: And then the second question that I have
is that do you have any additional formats
that are lined up or are in the roadmap?
Like VC-1 and additional audio formats?
Sparks: No, not-- not officially, no.
woman: Okay.
woman: Hi, this is back to the SoundPool question.
Is it possible to calculate latency
or at least know, like,
when the song actually went to the sound card
so I could at least know when it actually did play--
if there's any sort of callback or anything?
Sparks: So you can get a playback complete callback
that tells you when it left the player engine.
There's some additional latency in the hardware
that we...we don't have complete visibility into,
but it's reported back
through the audio track interface,
theoretically, if it's done correctly.
So at the MediaPlayer level, no.
At the AudioTrack level, yes.
If that's...makes any sense.
woman: Okay, so I can at least get that,
even if I can't actually calculate latency
for every single call?
Sparks: Right, right.
woman: Okay. Thank you.
Sparks: Uh-huh.
man: Yeah, this is a question
about the samples processing.
You partially touched upon that.
But in your architecture diagram,
where do you think the sound processing effect
really has to be placed?
For example, it could be an equalizer
or different kind of audio post processing
that needs to be done.
Because in the current Cupcake version, 1.5,
I do not see a placeholder
or any implementation of that sort.
Sparks: So one of the things we're in the process of doing
is we're-- we're looking at OpenAL--
Have I got that right? OpenAL ES?
As the, um--possibly the-- an abstraction for that.
But it definitely is something you want to do
on an application-by-application basis.
For example, you don't want to have
effects running on, you know, a notification.
You wouldn't want the application
in the foreground forcing something
on some other application that's running in the background.
So that's kind of the direction we're headed with that.
man: What's the current recommendation?
How do you want developers to address this?
Sparks: Well, the-- since there isn't any way,
there's no recommendation.
I mean, if you were doing native code,
it's kind of up to you.
But our recommendation would be if you're, you know,
doing some special version of the code,
you would probably want to insert it
at the application level and not sitting
at the bottom of the Audio Flinger stack.
man: Okay, thanks.
woman: Is it better to get the system service once
and share it across activities in an application,
or let each activity fetch the service?
Sparks: I mean, there's a certain amount of overhead,
'cause it's a binder call to do it.
So if you know you're going to use it,
I would just keep it around.
I mean, it's just a-- a Java object reference.
So it's pretty cheap to hold around.
man: Is there any way to listen to music
on a mono Bluetooth?
Sparks: Ah, on a SCO?
Yeah, no. [chuckles]
The reason we haven't done that
is the audio quality is really pretty poor.
I mean, it's designed for-- for call audio.
So the experience isn't going to be very good.
Theoretically, you know, it's possible.
We just don't think it's a good idea.
[chuckling]
man: If you want to record for a long period of time,
you know, like a half-hour,
can you frequency scale the processor
or put it to sleep, or...
Sparks: It--well, that happens automatically.
I mean, it's-- it's actually going to sleep
and waking up all the time.
So it's just depending on what's--
man: But if you're doing, like, a raw 8k sample rate,
how big a buffer can you have, and then will it sleep in--
while that buffer's filling?
Sparks: So the--the size of those buffers
is defined in the media recorder service.
And I think they're...
I want to say they're like 2-- 2k at...
whatever the output rate is.
So they're pretty good size.
I mean, it's like a half a second of audio.
So the processor, theoretically,
would be asleep for quite some time.
man: So is that handled by the codec,
or is it handled by-- I mean, the DSP on a codec?
Or is it handled by--
Sparks: So the... the process
is going to wake up when there's audio available.
It's going to...
you know, route it over to the AMR encoder.
It's going to do its thing.
Spit out a bunch of bits that'll go to the file composer
to be written out.
And then theoretically,
it's gonna go back to sleep again.
man: No, I mean on the recorder.
If you're recording the audio.
If you're off the microphone.
Sparks: I'm sorry?
man: If you're recording raw audio off the microphone.
Sparks: Yeah.
Oh, oh, are you talking about using the AudioTrack
or AudioRecord interface?
man: The AudioRecord interface. ADPCM.
Sparks: Yeah, that's...
So it's pretty much the same thing.
I mean, if you define your buffer size large enough,
whatever that buffer size is, that's the buffer size
it's going to use at the lower level.
So it'll be asleep for that amount of time.
man: And the DSP will be the one filling the buffer?
Sparks: Yeah, yeah. The DSP fills the buffer.
man: All right, thanks.
man: One last question.
From a platform perspective,
would you be able to state a minimum requirement
on OpenGL performance?
Sparks: I'm not ready to say that today.
But...
at some point we'll--
we'll be able to tell you about that.
man: Okay, thanks. Sparks: Uh-huh.
Guess that's my time. Thanks, everyone.
[applause]