[MUSIC PLAYING] CHRIS KELLEY: Thank you so much for joining us. My name is Chris. I'm a designer and prototyper working on immersive prototyping at Google, and I'm joined by Ellie and Luca. And today, we're going to talk about exploring AR interaction. It's really awesome to be here. We explore immersive computing through rapid prototyping of AR and VR experiments. Often, that's focused on use case exploration or app ideas. We work fast, which means we fail fast, but that means that we learn fast. We spend a week or two on each prototyping sprint, and at the end of the sprint, we end with a functional prototype starting from a tightly scoped question. And then we put that prototype in people's hands and we see what we can learn. So this talk is going to be about takeaways we have from those AR explorations. But first, I want to set the table a little bit and talk about what we mean when we say augmented reality. When a lot of people think about AR, the first thing they think about is bringing virtual objects to users in the world. And it is that. That's part of it. We call this the out of AR. But AR also means more than that. It means being able to understand the world visually to bring information to users, and we call this understanding the in of AR. Many of the tools and techniques that were created for computer vision and machine learning perfectly complement tools like ARCore, which is Google's AR development platform. So when we explore AR, we build experiences that include one of these approaches or both. So this talk is going to be about three magic powers that we've found for AR. We think that these magic powers can help you build better AR experiences for your users. So we're going to talk about some prototypes that we've built and share our learnings with you in each of these three magic power areas during the talk. First, I'll talk to you about context-driven superpowers.
That's about how we can combine visual and physical understanding of the world to make magical AR experiences. Then Ellie will talk to you about shared augmentations. And this is really all about the different ways that we can connect people together in AR, and how we can empower them just by putting them together. And then Luca will cover expressive inputs. This is about how AR can help unlock authentic and natural understanding for our users. So let's start with context-driven superpowers. What this really means is using AR technologies to deeply understand the context of a device, and then build experiences that directly leverage that context. And there are two parts to an AR context. One is visual understanding, and the other is physical understanding. ARCore gives your phone the ability to understand and sense its environment physically. But through computer vision and machine learning, we can make sense of the world visually. And by combining these results, we get an authentic understanding of the scene, which is a natural building block of magical AR. So let's start with visual understanding. The prototyping community has done some awesome explorations here, and we've done a few of our own that we're excited to share. To start, we wondered if we could trigger custom experiences from visual signals in the world. Traditional apps today leverage all kinds of device signals to trigger experiences. GPS, the IMU, et cetera. So could we use visual input as a signal as well? We built a really basic implementation of this concept. This uses ARCore and the Google Cloud Vision API to detect any kind of snowman in the scene, which triggers a particle system that starts to snow. So through visual understanding, we were able to tailor an experience to specific cues in the environment for users. This enables adaptable and context-aware applications. So even though this example is a simple one, the concept can be extended so much further.
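The snowman trigger described above boils down to "start an effect when a vision model reports a target label with enough confidence." The following is a minimal sketch of that decision only, not the prototype's actual code. The `Label` class is a hypothetical stand-in for whatever label results a vision service returns (a description plus a confidence score), and the target word and the 0.7 threshold are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Label:
    """Hypothetical stand-in for one label returned by a vision service."""
    description: str
    score: float  # confidence in the range 0.0 to 1.0


def should_trigger(labels, target="snowman", min_score=0.7):
    """Return True if any label mentions the target with enough confidence."""
    return any(
        target in label.description.lower() and label.score >= min_score
        for label in labels
    )


# Example frame: the labels a detector might report for one camera image.
frame_labels = [Label("Snow", 0.93), Label("Snowman", 0.81), Label("Tree", 0.64)]
start_snowing = should_trigger(frame_labels)
```

In an app, `start_snowing` would gate the particle system; the same check could be reused for any other visual cue by changing `target`.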
For example, yesterday we announced the augmented images API for ARCore. So if you use this, you can make something like an experience that reacts relative to device movement around an image in the scene, or even from a known distance to an object in the world. If you think this concept is interesting, I highly recommend checking out the AR VR demo tent. They have some amazing augmented images demos there. The next thing we wanted to know is if we could bridge the gap between digital and physical, and, for example, bring some of the most delightful features of e-readers to physical books. The digital age has brought all kinds of improvements to some traditional human behaviors, and e-readers have brought lots of cool new things to reading. But if you're like me, sometimes you just miss the tactility of holding a great book in your hands. So we wanted to know if we could bridge that gap. In this prototype, users highlight a passage or word with their finger and they instantly get back a definition. This is a great example of a short-form-focused interaction that required no setup for users. It was an easy win only made possible by visual understanding. But as soon as we tried this prototype, there were two downfalls that became immediately apparent. The first is that it was really difficult to aim your finger at a small moving target on a phone, and maybe the page is moving as well, and you're trying to target this little word. That was really hard. And the second was that when you're highlighting a word, your finger is blocking the exact thing that you're trying to see. Now, these are easily solvable with a follow-up UX iteration, but they illustrate a larger lesson. And that's that with any kind of immersive computing, you really have to try it before you can judge it.
An interaction might sound great when you talk about it and it might even look good in a visual mock, but until you have it in your hand and you can feel it and try it, you're not going to know if it works or not. You really have to put it in a prototype so you can create your own facts. Another thing we think about a lot is, can we help people learn more effectively? Could we use AR to make learning better? There are many styles of learning, and if you combine these styles of learning, it often results in faster and higher-quality learning. In this prototype, we combined visual, aural, verbal, and kinesthetic learning to teach people how to make the perfect espresso. The videos explain-- I'm sorry. We placed videos around the espresso machine in the physical locations where each step occurs. So if you were learning how to use the grinder, the video for the grinder is right next to it. Now, for users to trigger that video, they move their phone to the area and then they can watch the lesson. That added physical component of the physical proximity of the video and the actual device made a huge difference in general understanding. In our studies, users who had never used an espresso machine before easily made an espresso after using this prototype. So for some kinds of learning, this can be really beneficial for users. Now, unfortunately for our prototype, one thing that we learned here was that it's actually really hard to hold your phone and make an espresso at the same time. So you need to be really mindful of the fact that your users might be splitting their physical resources between the phone and the world. And so as it applies to your use case, try building experiences that are really snackable and hands-free. Speaking of combining learning and superpowers together, we wondered if AR could help us learn from hidden information that's layered in the world all around us. This is a prototype that we built that's an immersive language learning app.
We showed translations roughly next to objects of interest and positioned these labels by taking a point cloud sample from around the object and putting the label sort of in the middle of the points. Users found this kind of immersive learning really fun, and we saw users freely exploring the world looking for other things to learn about. So we found that if you give people the freedom to roam and tools that are simple and flexible, the experiences that you build for them can create immense value. Next is physical understanding. This is AR's ability to extract and infer information and meaning from the world around you. When a device knows exactly where it is, not only in space, but also relative to other devices, we can start to do things that really feel like you have superpowers. For example, we can start to make interactions that are extremely physical, natural, and delightful. Humans have been physically interacting with each other for a really long time, but digital life has abstracted some of those interactions. We wondered if we could swing the pendulum back the other direction a little bit using AR. So in this prototype, much like a carnival milk bottle game, you fling a baseball out of the top of your phone and it hits milk bottles that are shown on other devices. You just point the ball where you want it to go, and it goes. We did this by putting multiple devices in a shared coordinate system, which you could do using the new Google Cloud Anchors API that we announced for ARCore yesterday. And one thing you'll notice here is that we aren't even showing users the pass-through camera. Now, we did that deliberately because we really wanted to stretch and see how far we could take this concept of physical interaction. And one thing we learned was that once people learned to do it, they found it really natural and actually had a lot of fun with it. But almost every user that tried it had to be not only told how to do it, but shown how to do it.
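The label placement mentioned for the language learning prototype ("putting the label sort of in the middle of the points") can be read as taking the centroid of the sampled point cloud. The sketch below assumes exactly that reading; the talk does not say which notion of "middle" the prototype used, and the function name and the plain `(x, y, z)` tuple format are hypothetical.

```python
def label_anchor(points):
    """Return the centroid of a list of (x, y, z) points, in the same units."""
    if not points:
        raise ValueError("need at least one point to place a label")
    count = len(points)
    # Average each coordinate independently.
    return tuple(sum(point[axis] for point in points) / count for axis in range(3))


# Example: three sampled points around an object, in meters.
sample = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 3.0, -3.0)]
anchor = label_anchor(sample)
```

A centroid is cheap and stable under small amounts of noise, but it can land inside the object; offsetting the label toward the camera would be a reasonable follow-up tweak.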
People actually had to flip this mental switch of the expectations they have for how a 2D smartphone interaction works. So you really need to be mindful of the context that people are bringing in and the mental models they have for 2D smartphone interactions. We also wanted to know if we could help someone visualize the future in a way that would let them make better decisions. Humans pay attention to the things that matter to us. And in a literal sense, the imagery that appears in our peripheral vision takes a lower cognitive priority than the things we're focused on. Would smartphone AR be any different? In this experiment, we overlaid the architectural mesh of the homeowner's remodel on top of the active construction project. The homeowner could visualize in context what the changes to their home were going to look like. Now, at the time that this prototype was created, we had to do actual manual alignment of this model on top of the house. You could do it today. If I rebuilt it, I would use the augmented images API that we announced yesterday. It would be much easier to put a fixed image in a location, the house, and sync them together. But even with that initial friction for the UX, the homeowner got tremendous value out of this. In fact, they went back to their architect after seeing this and changed the design of their new home because they found out that they weren't going to have enough space in the upstairs bathroom-- something they hadn't noticed in the plans before. So the lesson is that if you provide people high-quality, personally relevant content, you can create experiences that people will find really valuable and attention-grabbing. But when does modifying the real environment start to break down? You may be familiar with the uncanny valley. It's a concept that suggests when things that are really familiar to humans are almost right but just a little bit off, it makes us feel uneasy.
Subtle manipulations of the real environment in AR can sometimes feel similar. It can be difficult to get right. In this specific example, we tried removing things from the world. We created this AR invisibility cloak for the plant. What we did was we created a point cloud around the object, attached little cubes to the point cloud, applied a material to those points, and extracted the texture from the surrounding environment. That worked pretty well in uniform environments, but unfortunately, the world doesn't have too many of those. It's made up of dynamic lighting and subtle patterns, so this always ended up looking a little bit weird. Remember to be thoughtful about the way that you add or remove things from the environment. People are really perceptive, and so you need to strive to build experiences that align with their expectations, or at the very least, don't defy them. But is physical understanding always critical? All points in the section have their place, but, ultimately, you have to be guided by your critical user journeys. In this example, we wanted to build a viewer for this amazing 3D model by Damon [INAUDIBLE]. It was important that people could see the model in 3D and move around to discover the object. A challenge, though, was that the camera feed was creating a lot of visual noise and distraction. People were having a hard time appreciating the nuances of the model. We adopted concepts from filmmaking and guided users by using focus and depth of field, all of which were controlled by the user's motion. This resulted in people feeling encouraged to explore, and they really stopped getting distracted by the physical environment. So humans are already great at so many things. AR really allows us to leverage those existing capabilities to make interactions feel invisible. If we leverage visual and physical understanding together, we can build experiences that really give people superpowers.
With that, Ellie is going to talk to you about special opportunities we have in shared augmentations. ELLIE NATTINGER: Thanks, Chris. So I'm Ellie Nattinger. I'm a software engineer and prototyper on Google's VR and AR team. Chris has talked about the kinds of experiences you start to have when your devices can understand the world around you, and I'm going to talk about what happens when you can share those experiences with the people around you. We're interested not only in adding AR augmentations to your own reality, but also in sharing those augmentations. If you listened to the developer keynote yesterday, you know that shared AR experiences are a really big topic for us these days. For one thing, a shared reality lets people be immersed in the same experience. Think about a movie theater. Why do movie theaters exist? Everybody's watching a movie that they could probably watch at home on their television or their computer by themselves much more comfortably not having to go anywhere, but it feels qualitatively different to be in a space with other people sharing that experience. And beyond those kinds of shared passive experiences, having a shared reality lets you collaborate, lets you learn, lets you build and play together. We think you should be able to share your augmented realities with your friends, and your families, and your colleagues, so we've done a variety of explorations about how you build those kinds of shared realities in AR. First, there's kind of a technical question. How do you get people aligned in a shared AR space? There are a number of ways we've tried. If you don't need a lot of accuracy, you could just start your apps with all the devices in approximately the same location. You could use markers or augmented images so multiple users can all point their devices at one picture and get a common point of reference-- here's the zero, zero, zero of my virtual world.
And you can even use the new ARCore Cloud Anchors API that we just announced yesterday to localize multiple devices against the visual features of a particular space. In addition to the technical considerations, we've found three axes of experience that we think are really useful to consider when you're designing these kinds of shared augmented experiences. First of those is co-located versus remote. Are your users in the same physical space or different physical spaces? Second is, how much precision is required, or is it optional? Do you have to have everybody see the virtual bunny at exactly the same point in the world, or do you have a little bit of flexibility about that? And the third is whether your experience is synchronous or asynchronous. Is everybody participating in this augmented experience at exactly the same time, or at slightly different times? And we see these not as necessarily binary axes, but more of a continuum that you can consider when you're designing these multi-person AR experiences. So let's talk about some prototypes and apps that fall on different points of the spectrum and the lessons we've learned from them. To start with, we've found that when you've got a group that's interacting with the same content in the same space, you really need shared, precise, spatial registration. For example, let's say you're in a classroom. Imagine if a group of students who are doing a unit on the solar system could all look at and walk around the globe, or an asteroid field, or look at the sun. In Expeditions AR, one of Google's initial AR experiences, all the students can point their devices to a marker, they calibrate themselves against a shared location, they see the object in the same place, and then what this allows is for a teacher to be able to point out particular parts of the object. Oh, if you all come over and look at this side of the sun, you see a cut-out into its core. Over here on the Earth, you can see a hurricane. 
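The marker approach described above, where every device points at one picture and treats it as "the zero, zero, zero" of the virtual world, amounts to re-expressing each device's local coordinates relative to the marker. The following is a simplified sketch under stated assumptions: it works on the ground plane only (two coordinates plus a yaw angle), and it assumes each device already knows the marker's position and yaw in its own local frame. The function and parameter names are hypothetical and are not ARCore API calls.

```python
import math


def to_marker_frame(point, marker_pos, marker_yaw):
    """Convert a device-local ground-plane point into marker-relative coordinates.

    point, marker_pos: (x, z) in the device's local frame, in meters.
    marker_yaw: the marker's rotation in the device's local frame, in radians.
    """
    # Translate so the marker is the origin, then undo the marker's rotation.
    dx = point[0] - marker_pos[0]
    dz = point[1] - marker_pos[1]
    cos_a, sin_a = math.cos(-marker_yaw), math.sin(-marker_yaw)
    return (cos_a * dx - sin_a * dz, sin_a * dx + cos_a * dz)


# Two devices observe the same physical spot from different local frames.
seen_by_a = to_marker_frame((2.0, 2.0), marker_pos=(1.0, 2.0), marker_yaw=0.0)
seen_by_b = to_marker_frame((5.0, 6.0), marker_pos=(5.0, 5.0), marker_yaw=math.pi / 2)
```

Both devices end up with approximately the same marker-relative coordinates for that spot, which is what lets a teacher point at one side of a shared object and have every student look at the same place.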
Everybody starts to get a spatial understanding of the parts of the object and where they are in the world. So when does it matter that your shared space has a lot of precision? When you have multiple people who are all in the same physical space interacting with or looking at the exact same augmented objects at the same time. We were also curious-- how much can we take advantage of people's existing spatial awareness when you're working in high-precision shared spaces? We experimented with this in this multi-person construction application, where you've got multiple people who are all building onto a shared AR object in the same space. Adding blocks to each other, everybody being able to coordinate. And you want to be able to tell what part of the object someone is working on. Have your physical movement support that collaboration. Like, if Chris is over here and he's placing some green blocks in the real world, I'm not going to step in front of him and start putting yellow blocks there instead. We've got a natural sense of how to collaborate, how to arrange, how to coordinate ourselves in space. People already have that sense. So we can keep that in shared AR if we've got our virtual objects precisely lined up enough. We also found it helpful to notice that because you can see both the digital object and the other people through the pass-through camera, you are able to get a pretty good sense of what people were looking at as well as what they were interacting with. We've also wondered what it would feel like to have a shared AR experience for multiple people in the same space, but who aren't necessarily interacting with the same things. So think of this more like an AR LAN party. Where we're all in the same space, or maybe could be different spaces, we're seeing connected things, and we're having a shared experience.
So this prototype's a competitive quiz guessing game where you look at the map and you have to figure out where on the globe you think is represented and stick your pushpin in, get points depending on how close you are. We've got the state synced, so we know who's winning. But the location of where that globe is doesn't actually need to be synchronized. And maybe you don't want it to be synchronized because I don't want anybody to get a clue based on where I'm sticking my pushpin into the globe. It's fun to be together, even when we're not looking at exactly the same AR things. And do we always need our spaces to align exactly? Sometimes it's enough just to be in the same room. This prototype example's of an AR boat race. You blow on the microphone of your phone, and it creates the wind that propels your boat down the little AR track. By us being next to each other when we start the app and spawn the track, we get a shared physical experience even though our AR worlds might not perfectly align. We get to keep all the elements of the social game play-- talking to each other, our physical presence-- but we're not necessarily touching the same objects. Another super interesting area we've been playing with is how audio can be a way to include multiple people in a single device AR experience. If you think of the standard Magic Window device AR, it's a pretty personal experience. I'm looking at this thing through my phone. But now, imagine you can leave a sound in AR that has a 3D position like any other virtual thing, and now you start to be able to hear it, even if you're not necessarily looking at it. And other people can hear the sound from your device at the same time. So for an example, let's say you could leave little notes all over your space. Might look something like this. I'm a plant. I'm a plant. I'm a plant. I'm elephant. I'm elephant. I'm elephant. This is a chair. This is a chair. This is a chair. I'm a plant. I'm a plant. I'm elephant. I'm elephant. 
This is a chair. This is a chair. So notice, you don't have to be the one with a phone to get a sense of where these audio annotations start to live in physical space. Another question we've asked-- if you have a synchronous AR experience with multiple people who are in different places, what kind of representation do you need of the other person? So let's imagine you have maybe a shared AR photos app where multiple people can look at photos that are arranged in space. So I'm taking pictures in one location, I'm viewing them arranged around me in AR, and then I want to share my AR experience with Luca, who comes in and joins me from a remote location. What we found-- we needed a couple of things to make us feel like we were connected and sharing the same AR experience, even though we were in different places. We needed to have a voice connection so we could actually talk about the pictures, and we needed to know where the other person was looking. See which picture you're paying attention to when you're talking about it. But what was interesting is we didn't actually need to know where the other person was, as long as we had that shared frame of reference. We're all here, here's what I'm looking at, here's what Luca's looking at. We've also been curious about asymmetric experiences. What happens when users share the same space and the same augmentations, but they've got different roles in the experience? So for instance, in this prototype, Chris is using his phone as a controller to draw in space, but he's not actually seeing the AR annotations he's drawing. The other person sees the same AR content and uses their phone to take a video. They're playing different roles in the same experience. Kind of artist versus cinematographer. And we found there could be some challenges to asymmetric experiences if there's a lack of information about what the other person is experiencing. 
For instance, Chris can't tell what Luca's filming or see how his drawing looks from far away. So as we mentioned previously, these kinds of different combinations of space, and time, and precision are relevant for multi-person AR experiences, and they have different technical and experiential needs. If you have multiple people in the same space with the same augmentations at the same time, then you need a way of sharing. You need a way of common localization. That's why we created the new Cloud Anchors API. If you've got multiple people in the same space with different augmentations at the same time, the kind of AR LAN party model, you need some way to share data. And if you've got multiple people in different spaces interacting with the same augmentations at the same time, you need sharing and some kind of representation of that interaction. So shared AR experiences are a big area. We've explored some parts of the space. We'd love to see what you all come up with. So Chris has talked about examples where your device understands your surroundings and gives you special powers, I talked about examples where you've got multiple people who can collaborate and interact. Now Luca will talk about what happens when your devices have a better understanding of you and allow for more expressive inputs. Luca? LUCA PRASSO: Thank you, Ellie. My name is Luca Prasso, and I'm a prototyper and a technical artist working on the Google AR and VR team. So let's talk about the device that you carry with you every day and the ones that are all around you, and how they can provide the meaningful and authentic signals that we can use in our augmented experiences. So ARCore tracks the device motion as we move through the real world and provides some understanding of the environment. And these signals can be used to create powerful, and creative, and expressive tools, and offer new ways for us to interact with digital content. So the data represents who we are, what we know, and what we have.
And we were interested in understanding if the user can connect more deeply if the data is displayed around them in 3D, and through AR and physical exploration, they can look at this data. So we took a database of several thousand world cities, and we mapped it in an area that's as wide as a football field. We assign a dot to every city and we scale the dot based on the population of the city. And each country has a different color. So now you can walk through this data field. And as ARCore tracks the motion of the user, we play footsteps in sync. You take a step and you hear a step. And [INAUDIBLE] sound field surrounds the user and enhances the experience and the sense of exploration of this data forest. And flight paths are displayed up in the sky. And the pass-through camera is heavily tinted so that we can allow the user to focus on the data and then still give a sense of presence. And what happens is the user, as he walks through the physical space, starts mapping, and pairing, and creating this mental map between the data and the physical location. And starts understanding better, in this particular case, the relative distance between the places. And what we discovered is also that the gestures that are a part of our digital life every day, like a pinch to zoom, are now in AR something more traditional. It's actually moving closer to the digital object and inspecting it like we do with a real object. And pan and drag means taking a couple of steps to the right to look at the information. So physical exploration like this is very fascinating, but we need to take into account all the different users and provide alternative movement affordances. So in AR, a user can move everywhere, but what if he cannot or he doesn't want to move? What if he's sitting? So in this particular case, we allow the user to simply point the phone wherever they want to go, tap on the screen anywhere, and the application will move the point of view in that direction.
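The city data field described above needs two mappings: from a city's latitude and longitude to a position on the ground, and from its population to a dot size. Here is a minimal sketch of both, with every specific an assumption rather than the prototype's actual choice: the field is taken to be 100 m by 50 m, the projection is a plain equirectangular one, and the dot radius grows with the square root of population so that dot area tracks population.

```python
import math

# Assumed field size in meters (the talk only says "as wide as a football field").
FIELD_LENGTH_M = 100.0
FIELD_WIDTH_M = 50.0


def city_to_field(lat, lon):
    """Map latitude/longitude in degrees to (x, z) meters on the field."""
    x = (lon + 180.0) / 360.0 * FIELD_LENGTH_M  # west edge at x = 0
    z = (90.0 - lat) / 180.0 * FIELD_WIDTH_M    # north edge at z = 0
    return (x, z)


def dot_radius(population, max_population, max_radius_m=0.5):
    """Scale a dot so that its area is proportional to population."""
    return max_radius_m * math.sqrt(population / max_population)


# Example: a point on the equator at the prime meridian sits at the field center.
center = city_to_field(0.0, 0.0)
quarter_size_city = dot_radius(250_000, 1_000_000)
```

An equirectangular layout stretches distances near the poles, so a prototype that cares about relative distances between cities might prefer a different projection.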
At the same time, we still have to provide audio, haptics, and color effects to enhance the sense of physical space the user has to have while traveling. And so we found that this is a powerful mechanism to explore a certain type of data that makes sense in the 3D space and to allow the user to discover hidden patterns. But can we go beyond the pixels that you can find on your screen? We're fascinated by the spatial audio and a way to incorporate audio into an AR experience. So we combine ARCore and the Google Resonance SDK. And Resonance is this very powerful spatial audio engine that recently Google open-sourced. And you should check it out because it's great. And so now I can take audio sources and place them into the 3D locations, and animate them, and describe the properties of the walls, and the ceilings, and the floor, and all the obstacles. And now as the ARCore moves the point of view, it carries with it the digital ears, the Resonance used to render accurately the sounds in the scene. So what can we do with this? So we imagine, what if I can sit next to a performer during an acoustic concert, or a classical concert, or a jazz performance? What if I can be onstage with actors, and listen to their play, and be there? So we took two amazing actors, Chris and Ellie, and we asked them to record separately lines from Shakespeare. And we placed these audio sources a few feet apart and we surrounded the environment with an ambisonic sound field of a rain forest, of the raining. And then later on, we switched to a room with a lot of reverb into the walls. CHRIS KELLEY: Thou told'st me they were stolen unto this wood, and here I am, and wode within this wood, because I cannot meet my Hermia. Hence, get thee gone, and follow me no more. ELLIE NATTINGER: You draw me, you hard-hearted adamant, but yet you draw not iron, for my heart is true as steel. Leave you your power to draw, and I shall have no power to follow you. CHRIS KELLEY: Do I entice you? 
Do I speak to you fair? Or rather, do I not in plainest truth tell you, I do not, nor I cannot love you? LUCA PRASSO: So now the user can walk around, maybe with his eyes closed, a nice pair of headphones, and it's like being on stage with these actors. So we took this example and we extended it. We observed that we can build in real-time a 2D map of where the user has been so far with his phone as he's walking around. And so at any given time when the user hits a button, we can programmatically place audio recording in space where we know that the user can reach with the phone and with their ears. [MUSIC PLAYING] And suddenly, the user becomes the human mixer of this experience. And different instruments can populate your squares, and your rooms, and your schools. And this opens the door to an amazing amount of opportunities with AR audio-first experiments. So let's go back to visual understanding. Chris mentioned that the computer vision and machine learning can interpret the things that are around us, and this is also important to understand the body in turning into an expressive controller. So in real life, we are surrounded by a lot of sound sources for all of the places. And naturally, our body and our head moves to mix and focus on what we like and what we want to listen to. So can we take this intuition into the way we watch movies or play video games on a mobile device? So what we did, we took the phone camera signal, fed it to Google Mobile Vision. That gave us a head position and head orientation. And we fed it to Google Resonance SDK. And we said, OK, you're watching a scene in which actors are in a forest, and they're all around you, and it's raining. So now as I leave my phone far away from my head, I hear the forest. As I'm taking the phone closer to my face, I start hearing the actors playing. I warn you, this is an Oscar performance. [THUNDER RUMBLES] ELLIE NATTINGER: Our company here. CHRIS KELLEY: My man, according to the script. 
ELLIE NATTINGER: Here is the scroll of every man's name which is thought fit through all Athens to play in our interlude before the duke and the duchess on his [INAUDIBLE] LUCA PRASSO: So now what is interesting is that the tiny little motions that we can do when we're watching and we're playing this experience can be turned into subtle changes in the user experience that we can control. So we talked about how the changes in poses can become a trigger to drive interaction. In this Google Research app called [INAUDIBLE], we actually exploit the opposite-- the absence of motion. And when the user-- in this case, my kids-- stop posing, the app takes a picture. And so the simple mechanism that is triggered by computer vision creates the incredible, delightful opportunities that, apparently, my kids love. And Research is making incredible progress in looking at an RGB image and understanding where the body pose and skeleton is. And you should check out the Google Research blog post because their pose estimation research is amazing. So we took Ellie's video and we fed it to the machine learning algorithm. And we got back a bunch of 3D poses and segmentation masks of Ellie. And this opens the door to a wide variety of experiments with creative filters that we can apply to this. But what's more interesting for us is that it also allows us to understand better the intent and the context of the user. So we took this pose estimation technology and we added a digital character. Now it tries to mimic what the human character is doing. And this allows [INAUDIBLE] now to bring your family and friends-- in this case, my son, Noah-- into the scene so that he can act and create a nice video. But here also, like Ellie mentioned before, we should consider the situation, because this is an asymmetric experience. What you don't see here is how frustrated my son was after a few minutes because he couldn't see what was going on.
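The "absence of motion" trigger mentioned above can be sketched as a check that body keypoints barely move across a short window of frames. This is an illustrative sketch only, not the app's actual method: it assumes a pose estimator already supplies each frame as a list of `(x, y)` keypoints in normalized image coordinates, and the 0.02 movement threshold is a hypothetical value.

```python
import math


def is_holding_still(frames, max_move=0.02):
    """Return True if no keypoint moves more than max_move between any two
    consecutive frames. Each frame is a list of (x, y) keypoints."""
    if len(frames) < 2:
        return False  # not enough history to judge motion
    for previous, current in zip(frames, frames[1:]):
        for (x0, y0), (x1, y1) in zip(previous, current):
            if math.hypot(x1 - x0, y1 - y0) > max_move:
                return False
    return True


# Example: two keypoints tracked over three frames.
steady = [[(0.50, 0.20), (0.50, 0.60)],
          [(0.51, 0.20), (0.50, 0.61)],
          [(0.51, 0.21), (0.50, 0.61)]]
waving = [[(0.50, 0.20), (0.50, 0.60)],
          [(0.60, 0.20), (0.50, 0.60)],
          [(0.70, 0.20), (0.50, 0.60)]]
take_photo = is_holding_still(steady)
```

An app would call this on a sliding window of recent frames and take the picture the first time it returns True after a period of movement.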
I was the one having fun taking pictures and video of him, and he didn't see much. He could only hear the lion roaring. So we need to be extremely mindful as developers about this imbalance of delight. And so maybe I should have passed the image from the phone to a nearby TV so I can make my son a first-class citizen in this experience. So all this AR technology and the physical and the visual understanding are ingredients that allow us to unlock all kinds of new expressive input mechanisms. And we are still exploring. We're just at the beginning of this journey. But we are excited to hear what you think and what you want to come up with. So to summarize, we shared a bunch of ways in which we think about AR and various explorations that we have done. We talked about expanding our definition of AR. Putting content into the world, but also pulling information from the world. And these are all ingredients that we use to create these magical AR superpowers to enhance social interactions and to express yourself in this new digital medium. So we combined ARCore capabilities with different Google technologies, and this gives us the opportunity to explore all these new interaction models. And we encourage you, developers, to stretch your definition of AR. But we want to do this together. We're going to keep exploring, but we want to hear what tickled you, what tickled your curiosity. So we can't wait to see what you build next. Thank you very much for coming. [MUSIC PLAYING]
Exploring AR interaction (Google I/O '18). Posted by Tony Yu on 2019/01/02.