RAIF LEVINE: All right. Thanks everybody for coming today. I'm Raif Levine of the Android UI Toolkit Team. And it is my pleasure to introduce today Alex Crichton of Mozilla Research. And he is a member of the Rust core team. And he is here to tell us about Rust, one of the more exciting and interesting languages, I think, in the past few years.
ALEX CRICHTON: Thank you, Raif. So I'm going to give just a whirlwind tour of what Rust is, why you might feel like using it, and what are the unique aspects of it. So the first thing that I normally get if I ever talk about a programming language is why do we have yet another one? So if you take a look around, we have this whole landscape of programming languages today, and they all fill various niches. They solve lots of problems. But it turns out that you can organize these languages along a spectrum. And the spectrum is this trade-off between control and safety. So on one end of the spectrum, we have C and C++, which give us lots of control. We know exactly what's going to run on our machine. We have lots of control over memory layout. We don't have a lot of safety. We have segfaults, buffer overruns, very common bugs like that. Whereas on the other side, we have JavaScript. We have Ruby. We have Python. They're very safe languages, but you don't quite know what's going to happen at runtime. So in JavaScript, we have JITs behind the scenes, so you really don't know what's going to happen, because it'll change as the program is running. So what Rust is doing is completely stepping off this line. Rust is saying, we are not going to give you a trade-off between control and safety. But rather, we're going to give you both.
So Rust is a systems programming language which is kind of filling this niche that hasn't been filled by many of the languages today, where you get both the very low-level control of C and C++, along with the high-level safety and constructs that you would expect from Ruby, and JavaScript, and Python. So that might raise a question. We have all these languages. What kind of niches will benefit from this safety and this control? So, suppose you're building a web browser, for example, Servo. Servo is a project in Mozilla Research to write a parallel layout engine in Rust. So it's entirely written in Rust today. And it benefits from this control, this very high level of control. Because browsers are very competitive in performance, as we all very much well know. But at the same time, all major browsers today are written in C++, so they're not getting this great level of safety. They have a lot of buffer overruns. They have a lot of segfaults, memory vulnerabilities. But by writing in Rust, we're able to totally eliminate all these at compile time. And the other great thing about Servo is this parallelism aspect. If you try and retrofit parallelism onto, for example, Gecko, which is millions of lines of C++, it's just not going to end well. So by using a language which from the ground up will not allow this memory unsafety, we're able to do very ambitious things like parallelizing layout.
And on the other end of the spectrum, let's say you're not a C++ hacker. You're not a browser hacker. You're writing a Ruby gem. Skylight is a great example of this. It's a product of Tilde, where what they did is they have a component that runs inside of the customer's Rails apps and will just kind of monitor how long it takes to talk to the database, how long it takes for the HTTP request, general analytics and monitoring about that. But the key aspect here is that they have very tight resource constraints. They're a component running in their client's application.
So they can't use too much memory, and they can't take too long to run. So they were running into problems, and they decided to rewrite their gem in Rust. And Rust is great for this use case because with the low level of control that you get in C and C++, they were able to satisfy these very tight memory constraints, the very tight runtime constraints. But also, they were able to not compromise the safety that they get from Ruby. So this is an example where they would have written their gem in C or C++. But they were very hesitant to do so. They're a Ruby shop. They haven't done a lot of systems programming before. So it's kind of tough, kind of that first breach into systems programming. And this is where Rust really helps out.
So I want to talk a little bit about what I mean by control and what I mean by safety. So this is a small example in C++. Well, the first thing we're going to do is make a vector of strings on the stack. And then we're going to walk through and take a pointer into that. So the first thing that we'll realize is all of this is laid out inline, on the stack, and on the heap. So, for example, this vector is comprised of three separate fields, the data, length, and capacity, which are stored directly inline on the stack. There's no extra indirection here. And then on the heap itself, we have some strings which are themselves just wrapping a vector. So if we take a look at that, we'll see that it itself also has inline data. So this first element in the array on the heap is just another data, length, and capacity, which is pointing to more data for the string. So the key here is that there's not these extra layers of indirection. It's only explicitly when we go onto the heap that we're actually buying into this. And then the second part about control in C++ is you have these very lightweight references. So this reference into the vector, the first element of the vector, is just this raw pointer straight into memory.
There's no extra metadata tracking it. There's no extra fanciness going on here. It's just a value pointing straight into memory. A little dangerous, as we'll see in a second, but it's this high level of control. We know exactly what's going on. And then the final aspect of this is we have deterministic destruction. What this means is that this vector of strings, we know precisely when it's going to be deallocated. When this function returns is the exact moment at which this destructor will run. And it will destroy all the components of the vector itself. So this is where we have very fine-grained control over the lifetime of the resources that we have control of, either on the stack or within all the containers themselves. And what this mostly boils down to is something that we call zero-cost abstractions, where this basically means that at compile time, you can have this very nice interface, very easy to use. It's very fluent to use. But it all optimizes away to nothing. So once you push it through the compiler, it's basically a shim. And it'll go to exactly what you would have written if you did the very low-level operations yourself.
And on the other side of this, let's take a look at Java. So if we take our previous example of a vector of strings, then what's actually happening here is the vector on the stack is a pointer to some data, the length, and some capacity, which itself is a pointer to some more data. But in there, we have yet another pointer to the actual string value which has data, length, and capacity. We keep going with these extra layers of indirection. And this is something that's imposed on us by the Java language itself. There's no way that we can get around this, these extra layers of indirection. It's something that we just don't have control over, that you have to buy into right up front. Unlike in C++, where we can eliminate these extra layers and flatten it all down.
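As a rough illustration of the flat layout being described, here is a minimal sketch in Rust rather than C++. It relies on the standard library's documented guarantee that a `Vec` is a (pointer, capacity, length) triplet; the helper name `three_words` is made up for this example.

```rust
use std::mem::size_of;

/// Size in bytes of a three-word header (pointer, capacity, length).
/// Hypothetical helper, named only for this illustration.
fn three_words() -> usize {
    3 * size_of::<usize>()
}

fn main() {
    // The Vec handle is three machine words stored inline wherever the Vec
    // lives (here, on the stack); only the elements sit on the heap.
    println!("Vec<String>: {} bytes", size_of::<Vec<String>>());
    // A String wraps a Vec<u8>, so each element is again three inline words.
    println!("String:      {} bytes", size_of::<String>());
    assert_eq!(size_of::<Vec<String>>(), three_words());
    assert_eq!(size_of::<String>(), three_words());
}
```

The only pointer hops are the ones into the heap buffers themselves, which is the "no extra layers of indirection" point made above.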
And when I'm talking about zero-cost abstractions, it's not just memory layout. It's also static dispatch. It's the ability to know whether a function call is going to be statically resolved or dynamically resolved at runtime. This is a very powerful trade-off where you want to make sure you know what's going on. And the same idea happens with template expansion, which is generics in Java and C++, where what it boils down to is that if I have a vector of integers and a vector of strings, those should probably be optimized very, very differently. And it means that every time you instantiate those type parameters, you get very specialized copies of code. So it's as if you wrote the most specialized vector of integers for the vector of integers itself.
And so, let's take a look at the safety aspect. That's an example of what I mean by control. But the safety comes into play especially in C++. So this is a classical example of where something is going to go wrong. So the first thing that we do is we have our previous example with the vector of strings. And we take a pointer into the first element. But then we come along, and we try and mutate the vector. And those of you familiar with vectors in C++ will know that when you've exceeded the capacity of the vector, you probably have to reallocate it, copy some data, and then push the new data onto it. So let's say, in this case, we have to do that. We copy our data elsewhere, copy our first element that's in our vector, push on some new data. And then the key aspect is we deallocate the contents of the previous data pointer. And what this means is that this pointer, our element pointer, is now a dangling pointer into the freed memory. And that basically implies that when we come around here to print it out onto the standard output, we're going to either get garbage or a segfault. This is what C++ calls undefined behavior. This is the crux of what we're trying to avoid.
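For comparison, here is a hedged sketch of the same sequence of steps written in Rust. It is not the code from the slides; the function name `first_then_push` and its return value are invented for illustration. The commented-out line marks where the push would have to go to reproduce the C++ bug, and that placement is what the compiler refuses.

```rust
/// Returns the first element as it was before the push, plus the new length.
/// (Hypothetical helper, not taken from the talk.)
fn first_then_push(v: &mut Vec<String>, extra: &str) -> (String, usize) {
    let elem = &v[0]; // pointer into the vector's heap buffer
    // v.push(extra.to_string()); // rejected here (error E0502): `elem` is still used below
    let first = elem.clone(); // last use of `elem`, so the borrow ends here
    v.push(extra.to_string()); // fine now, even if the buffer is reallocated
    (first, v.len())
}

fn main() {
    let mut v = vec!["Hello".to_string()];
    let (first, len) = first_then_push(&mut v, "world");
    println!("first = {}, len = {}", first, len);
}
```

Moving the push after the last use of the pointer is the only ordering the compiler accepts, so the dangling read never makes it into a running program.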
This is where the unsafety stems from. So it's good to examine examples like this. But what we're really interested in is the fundamental ingredients of what's going wrong here. Because in this kind of example, you can probably pattern match and figure it out. It's very difficult to find this in a much larger program, many, many function calls deep. So the first thing we'll notice is there's some aliasing going on here. This data pointer of the vector and the element pointer on the stack are pointing to the same memory location. Aliasing is basically where you have two pointers pointing at the same location, but they don't know anything about one another. So then we come in and mix in mutation. What we're doing is we're mutating the data pointer of the vector. But we're not mutating the aliased reference of the element itself. So it's these two fundamental ingredients in combination, aliasing and mutation. It's OK to have either of them independently. But when we have them simultaneously is when these memory unsafety bugs start to come up. So I'll harp on this a lot more in the rest of this talk. And we'll see a little bit more in detail.
So you might be asking, what about garbage collection? This sounds like a problem that garbage collectors are tasked to solve, forbidding dangling pointers, forbidding these various references, rewriting things at runtime. But it turns out that garbage collectors don't come without a downside. Garbage collectors don't give you a lot of control. They don't have this low-level control of C and C++. You have these GC pauses. You have these variable runtimes when actually allocating data. It's very tough to reason about what's going to happen. And then an aspect of garbage collectors which is not really cited that often is they require a runtime. And this is a very strong aspect of C and C++ where, for example, if I write a library in C, I can run it basically anywhere.
Because there's no assumption about the host runtime that I'm running inside of. So with a garbage collector, it's very difficult to embed what you're writing into other languages. And we'll see how it's really beneficial to not have this runtime behind the scenes. And then the worst part about this is that it's insufficient to solve all the problems that we want to solve. Garbage collectors don't automatically get you away from iterator invalidation or from data races. These are these concurrency bugs that are very difficult to weed out. So it turns out that all of these problems are very closely interrelated. And we would love to solve them all at once. And garbage collection doesn't quite fit the bill.
So what Rust has is a system called ownership/borrowing. This is going to be the main focus for the rest of the talk. I'm going to talk a lot about ownership/borrowing and some of the great things we'll get out of it. But it suffices to say for now that ownership/borrowing doesn't need a runtime. It's totally static analysis [INAUDIBLE]. It's zero-cost at runtime. There's nothing going on under the hood when you're actually running code. It's totally memory safe. These concepts of ownership/borrowing are the crux on which all of the memory safety of Rust is built. And finally, using these, we're able to prevent all data races. Which means that in all the concurrent code you write in Rust, you will not have a data race. Which is an incredibly powerful thing to say if we're coming from, for example, C or C++, where it's very difficult to know whether you might have a data race or not. And we take a look back again at C++. C++ gets us one of these things. We don't need a runtime with C++. Garbage collection also gets us one of these things. It gets us memory safety. But the great thing about ownership/borrowing is that this abstraction, free at runtime, gets us all of these simultaneously.
So I want to give some credit to the Rust 1.0 community, in the sense that we would not be here at 1.0 without them. The Rust community is very large, very active, incredibly helpful, incredibly kind, incredibly welcoming. And you can see this across our forums, IRC, our subreddit, on GitHub, on issues and pull requests. Basically everyone is proud to be a part of this. And they were absolutely instrumental to coming across the finish line for 1.0, both in terms of giving feedback on our designs and proposing their own designs. I would highly encourage anyone who's interested in Rust to jump in. And don't feel afraid to ask stupid questions or ask any questions anywhere. Because they'll be answered almost instantly by someone who's super helpful.
All right, so I want to take the rest of this talk and talk a lot about what I mean by ownership, and what I mean by borrowing, and some of the great things that we're going to get out of this. And the first of these is ownership. So ownership basically means what the English term means. I own a resource. So in this case, let's say I own a book. I can then decide to send this book to someone else. And I can then decide to go away. And we know that this book, because it still has an owner, isn't deallocated or anything. It's totally independent of me, the previous owner, at this point. But as the owner of a book, I can also decide to go away without giving it to anyone. At which point, we know this book has not moved to anyone else, so we can totally destroy it. We can release all resources associated with this book. So I was talking about these fundamental ingredients of aliasing and mutation. And we'll see that with ownership, we're totally forbidding the aliasing aspect. We're still going to allow free mutation through these. But by forbidding aliasing, we're still going to prevent this whole class of memory unsafety.
So I'm going to walk you through an example of what's actually happening here at runtime, what's actually going on under the hood. So here's a small example where we're going to create a vector on the stack. And you'll notice that like C++, there's no extra layers of indirection here. These fields are stored directly on the stack, this data, length, and capacity. We're going to come along and push some data onto it, so push a 1, push a 2. And then we'll get to the point that we're going to call this function take. And the key part about this function is that it says that it's going to take ownership of the vector of integers. That's this Vec. It's taking that bare type. What it means is, it's taking ownership. It's consuming ownership. So on the give side of things, what's actually going to happen is we're going to create a shallow copy at runtime of this vector. Now we'll notice that there's some aliasing going on here, which is what ownership is preventing. So the key aspect is that we're going to forget this previous copy. This copy that give has, the shallow copy, is totally forgotten. And we no longer have access to it. So then as the function take runs, it now has ownership of the vector. It has its own shallow copy. And it can do whatever it wants. It can read from it. It can write to it. It can do basically [INAUDIBLE] however it likes. And then once we get to the end of the function take, we know that we have not passed ownership of the vector anywhere else. So this is the precise time at which we can deallocate it. We know that there's no one else who can have access to it. We're the sole owners of it. So as a result of it falling out of our scope, we can free all the resources associated with the vector. And then once we come back to the function give, the vector has been forgotten, so we don't have to worry about using a dangling reference. And so the compiler is the one that's enforcing these moves.
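The give/take walk-through above might look roughly like the following sketch. The function names `give` and `take` come from the talk; having them return the element count is an assumption added here so the result can be observed.

```rust
/// Takes ownership of the vector (note the bare `Vec<i32>` type).
/// Returning the length is an addition for this sketch.
fn take(vec: Vec<i32>) -> usize {
    vec.len()
} // `vec` falls out of scope here, so its heap buffer is freed

fn give() -> usize {
    let mut vec = Vec::new();
    vec.push(1);
    vec.push(2);
    let n = take(vec); // shallow copy of the three fields; ownership moves to `take`
    // vec.push(3);    // would not compile (error E0382): `vec` has been moved
    n
}

fn main() {
    println!("take saw {} elements", give());
}
```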
When we call the function take, we move the vector Vec into the function take. And if, for example, we tried to actually use the vector after we had moved it, say, for example, we try to push some [INAUDIBLE] onto it, the compiler rejects this, saying the vector has been moved. And so we are not allowed to have access to it. And this primarily prevents use after free. So this is this entire class of bugs. Because we're tracking the owner of a value, we know who can access it at any one point in time. Then we know that when the owner has gone out of scope, the resource has now been freed. And there are no variables that could possibly be referencing it. Because the sole owner has gone out of scope.
So this all sounds great. But it would be kind of clunky if we only had ownership. So we need the system which we call borrowing, which kind of helps us lend out a value for a small period of time. So I have ownership. I can pass it along. Someone can pass me back ownership, but it's not very ergonomic to do that. So borrowing is what comes into play where I, as the owner of a value, can then decide to lend it out for a period of time. But I'm still in control of the lifetime of the resource. You don't get to control it. You can access it, but I'm the one that's going to decide where to free it and when to free it. So we have two primary kinds of borrows in Rust, the first of which is a shared borrow. And a shared borrow means what it says, where I, as the owner of a resource, can send out multiple copies of my resource, just references to it, into [INAUDIBLE] functions or threads. And then what's actually happening here is that, of the aliasing and mutation that we're trying to prevent from occurring simultaneously, we're totally forbidding the mutation aspect. Because there's aliasing happening via shared borrows, we're not going to allow mutation through the shared borrows themselves.
And as you might guess, on the flip side, we have mutable borrows, where I, as the owner, can lend out a mutable borrow to someone else. They can then pass it along to someone else that they like. And then it will implicitly return back on up the stack once everyone starts returning. So what's going on here is, of these two ingredients of aliasing and mutation, the mutable borrows, unlike the shared borrows, are preventing aliasing but allowing mutation. So these are two different kinds of references to get these two aspects of either aliasing or mutation. But they're in totally separate classes. Because if they were to overlap, then we'd have them simultaneously. And that's where memory unsafety bugs come out.
So let's take a look back at our previous example with a shared reference going on here. So we have the same idea. We have a vector, pushing some data onto it. But then instead of passing ownership to this function use, we're going to pass a borrow of it. So that's what this ampersand sigil out in front means. This ampersand means that I'm taking a borrow of this vector. I'm not taking ownership. And that's the same idea on the caller side. I have this ampersand out in front to say, I'm loaning out the resource that I currently own. And then you're going to be able to access it for a small period of time. So at runtime, what's actually happening is we're just creating a raw pointer. That's all references are. And this vec pointer is just pointing directly onto the stack itself. And then the function use is going to use this raw pointer, read whatever it likes from it. And then once it's returned, this reference has gone out of scope. So we totally forget about the reference. So another thing to emphasize is this. If this function use tried to, for example, mutate the vector by pushing some more data onto it or modifying various elements of the array, those are all completely disallowed.
Because mutation through a shared reference is not allowed in this case for vectors of integers. So these two are forbidden by the compiler saying that you cannot mutate through a shared reference. And this isn't 100% true. There are some controlled circumstances in which we do allow mutation through shared references. I'll talk about a few more of them later on in the talk. But it suffices to say that if you see this &T, what it means is that you cannot mutate it. It's essentially immutable.
So let's take a look at some mutable references now. So the mutable references are denoted by this &mut tag. And because we have a mutable reference, the compiler is going to allow us to do things like push. So in this example, we're just going to iterate over one vector, push some data onto a new vector, and just create a separate copy of that. So I want to walk you through what's happening here at runtime or how this iteration actually happens. And the first thing that we're going to do is we're going to create this element pointer. Our iterator is going to give us a pointer into the vector. And that pointer is pointing directly into the first slot. This is kind of like the element pointer we saw earlier in C++. And then when we come around to actually push some data onto the new vector, we're just going to read that pointer, push some data, and then run to the next loop of the iteration. But the key idea here is that iteration is a zero-cost abstraction, as fast as you would write it in C and C++. This increment stuff, all it's doing is swinging this pointer down to the second slot and then just kind of forgetting about the first pointer. So some of you might have seen an example like this before. Now, you might be wondering, what if this from vector and this to vector are equal? What if we're trying to push onto what we're reading from at the same time?
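Here is a minimal sketch of the two kinds of borrow just described, using two distinct vectors. The talk's read-only function is called use, which is a reserved word in Rust, so this sketch substitutes a hypothetical `sum`; `push_all` and its parameter names `from` and `to` are likewise assumptions based on the description.

```rust
/// Shared borrow (`&`): the callee may read but not mutate.
fn sum(vec: &Vec<i32>) -> i32 {
    // vec.push(3); // would not compile (error E0596): `vec` is behind a `&` reference
    vec.iter().sum()
}

/// `from` is a shared borrow, `to` is a mutable borrow (`&mut`).
fn push_all(from: &Vec<i32>, to: &mut Vec<i32>) {
    for elem in from.iter() {
        // `elem` is a plain pointer into `from`'s buffer
        to.push(*elem);
    }
}

fn main() {
    let vec = vec![1, 2];
    let total = sum(&vec); // lend `vec` out; ownership stays with `main`
    let mut copy = Vec::new();
    push_all(&vec, &mut copy);
    println!("total = {}, copy = {:?}", total, copy);
}
```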
So if we walk through and see what's going on, let's say we have that reallocation happen, like we saw at the beginning with C++. So we have our vector of 1, 2, 3. We push the first element, reallocate. We now have 1, 2, 3, 1. But the key thing that's going to go wrong here is, as we go to the next loop of the iteration, we're going to update this element pointer. But now it's pointing into freed memory. This dangling pointer was just hanging out here. And this would be a problem if Rust were going to allow this. But if we actually try to write this down in Rust, we can see what's going on here. The first thing we'll do is we'll take out a shared reference. And the next thing that we'll do is try to take out this mutable reference. But what's actually going to happen is the compiler is forbidding both shared references and mutable references from happening at the same time. So the compiler has this notion of when a shared reference is active and when a mutable reference is active. And these can never overlap, because that would be allowing simultaneous mutation and aliasing. And this ends up giving us a very nice property of mutable references in that a mutable reference is the only way to access the data that it contains, which is a very strong guarantee to provide, that if I have a mutable reference, I am the only one that can either read or write this data at this point in time. And this also applies across many threads as well. So we'll see later how we can leverage these kinds of static guarantees that Rust gives us in various fashions.
So this might seem a little unwieldy if you're looking at, OK, I can either have a shared reference or a mutable reference. But I've got to make sure they somehow match up. But I don't want to have to contort code to make sure that it actually works out. So in this case, we're going to take a pointer into the vector, just &vec[i].
But the compiler, when we come down to this vec.push, needs to disallow it. Because this pointer could become invalid if we allow the push. So the compiler has this notion of the lifetime of references. So we know that the lifetime of this elem reference is the scope of this for loop. And we know that because the mutation happens in the lifetime of the shared reference, it's completely disallowed by the compiler. And it prevents these two from happening at the same time. But once we've gone outside the loop, the compiler knows that the shared reference has fallen out of scope. There's no active shared references, so we can allow mutation. And what this basically boils down to is that in code, basically, you can chunk up the time which a vector is alive for or any resource is alive for. And it can either be borrowed in a shared fashion or borrowed in a mutable fashion. And it can happen a bunch of times. They just can never overlap.
So those are the fundamental concepts of ownership/borrowing in Rust, this whole idea of I own a resource. I can pass it around to other threads. I can pass around ownership so that you get to control the lifetime. But I can also lend out a borrow, where you can either read it, or you can mutate it. But I'm still in control of the lifetime of the resource itself. So I want to give you a taste of how using these two concepts baked into the language, which are somewhat simple, we can build up concurrency abstractions. We can build up these great concurrency libraries. And one of the great things about Rust is that there's all these ways to tackle concurrency. There's message passing. There's shared memory. There's mutexes. There's all these different paradigms. But in Rust, these are all 100% built into libraries. None of these are actually found in the language itself. Because they're all leveraging ownership/borrowing to give you a safe interface at compile time.
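Going back for a moment to the &vec[i] loop and the rule that shared and mutable borrows never overlap, the idea can be sketched as below. The function name `sum_then_push` is invented for illustration. One caveat worth stating: the talk describes the borrow as lasting for the scope of the loop, while current compilers end it at the reference's last use, which here still falls inside the loop body.

```rust
/// Reads every element through a shared borrow, then mutates afterwards.
/// (Hypothetical function, not from the talk.)
fn sum_then_push(vec: &mut Vec<i32>) -> i32 {
    let mut total = 0;
    for i in 0..vec.len() {
        let elem = &vec[i]; // shared borrow begins
        // vec.push(*elem); // would not compile (error E0502): `elem` is used below
        total += *elem; // last use of `elem`
    }
    vec.push(total); // no shared borrow is active here, so mutation is allowed
    total
}

fn main() {
    let mut vec = vec![1, 2, 3];
    let total = sum_then_push(&mut vec);
    println!("total = {}, vec = {:?}", total, vec);
}
```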
And one of the other cool things we'll see about this is that you typically have these standard best practices whenever you're using these paradigms. And Rust is going to statically enforce that you must follow these best practices. You cannot break them. And I'll show you some examples as we go through. So the fundamental thing that we're trying to prevent is something called a data race. A data race is what happens when two threads, in an unsynchronized fashion, access the same memory, where at least one access is a write. And in terms of C++, this is called undefined behavior. A data race will lead to undefined behavior. And basically, what's happening here is because we use LLVM as a back end, LLVM assumes that a data race is undefined behavior. So the optimizer can do whatever it wants if it detects the code could have a data race. So we need to prevent this to prevent our optimization passes from going awry and doing all sorts of crazy stuff. If you take a look at the ingredients for a data race, like we did for the memory unsafety we saw earlier, there are three: aliasing, with more than one thread; mutation, where at least one access is a write; and being unsynchronized. It turns out two of these sound pretty familiar. It sounds like our previous system of ownership/borrowing is going to help us also forbid data races at the same time.
So I want to talk first about message passing, where this is going to leverage ownership in Rust, where I have ownership of a message. Another thread comes along. I can then send them a message. And then they can also decide to send me a message. And the key idea here is that we're passing ownership of messages between threads. So typically, whenever you're using message passing, you have this best practice that once you've sent a message, you no longer access what you just sent.
But you don't really have a static guarantee that you're not accessing what you just sent across to the other thread. So what Rust is going to do is, because of ownership, we can ensure that once we've put a message into a channel or across the threads, I no longer have access to it. So we're enforcing this isolation of threads at compile time, all with no overhead and at zero cost. So to run you through an example, we have two threads here. The parent thread takes the receiver end of a channel. And it's going to spawn a child thread, which is going to take the transmission end of this channel. And what we're going to end up with is these two pointers pointing at shared memory. The shared state of the channel itself is the queue of messages on the channel. And then as the child starts running, it's going to create a vector on its stack. It's going to push some data onto it, add some new data onto the vector. And then it's going to decide that it wants to send it along this channel. So like the moves we saw earlier, what's going to happen is this will create a shallow copy at runtime. And it'll transfer ownership of the value from the child onto the channel itself. And the key is that this ownership is not transferring to another thread, but rather, to the channel itself. So the owner of this data is now the channel and not the threads themselves. So we come back to the parent, which decides to receive a message, where we're going to take this shallow copy of the data from the channel and transfer ownership over to the parent. And now the parent can have access to it and do whatever it wants to it. Now, some key things to take away from this are that the child no longer has access to the vector. Because we have moved the data onto the channel, there's no way for the child to continue to modify it or to mutate it. It's passed ownership. It's relinquished ownership.
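The parent/child exchange described here could be sketched with the standard library's `std::sync::mpsc` channel as follows. Wrapping it in a function named `parent` that returns the received vector is an assumption made so the result can be inspected; the slides may be organized differently.

```rust
use std::sync::mpsc;
use std::thread;

/// Spawns a child that sends one vector, and returns what the parent receives.
/// (Hypothetical wrapper for this sketch.)
fn parent() -> Vec<i32> {
    let (tx, rx) = mpsc::channel();

    // The child takes the transmitting end of the channel.
    let child = thread::spawn(move || {
        let mut vec = Vec::new();
        vec.push(1);
        vec.push(2);
        tx.send(vec).unwrap(); // ownership moves from the child into the channel
        // vec.push(3);        // would not compile (error E0382): `vec` was moved
    });

    // Ownership moves from the channel to the parent; only the three-word
    // header is copied, the heap data stays where it is.
    let received = rx.recv().unwrap();
    child.join().unwrap();
    received
}

fn main() {
    println!("parent received {:?}", parent());
}
```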
Whereas on the parent side, we can be guaranteed that once we've received a message, we hold ownership. And there are no other outstanding aliases or references. We know that we are the only ones who can access this data at this time. And then the other thing that I want to emphasize is that this is all a shallow copy happening here at runtime. So the data inside this vector itself was never moved around. We never copied it to other locations. All we had to do was move this pointer to the same data around.
So the next paradigm that I want to talk about is shared read-only access. Typically, message passing is great for a lot of various algorithms and various ways to set up a system. But a lot of times, you don't want to have to send data around. You want to send it once, and then everyone has access to this large array, or large image, or whatever you want. And what Rust has for this is something that we call Arc. And Arc stands for Atomically Reference Counted. And what this means is that there's a reference count on top. And then we will modify the reference count atomically. And the actual memory representation for this pointer is we store the reference count inline with the rest of the data in the Arc itself. So the data fields of this vector are found at the end of the Arc pointer. But all we have to do is tack on this reference count. So this is the whole zero-cost aspect: the Arc isn't giving you any more layers of indirection than you're already asking for. But one of the key aspects about Arc is that when you create it, it takes ownership of the value. So this Arc consumed ownership of the vector when it was created, which means that it has a static guarantee that there are no aliases. There's no mutable references. There's no one else that could possibly access this data when it was created. So the Arc has complete control over how it gives you access to the internal data.
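A small sketch of shared read-only access through `Arc` follows. The function `shared_sum`, the two reader threads, and the sample data are all assumptions chosen for illustration rather than anything shown in the talk.

```rust
use std::sync::Arc;
use std::thread;

/// Two threads each read the same vector through an Arc and sum it.
/// (Hypothetical example function.)
fn shared_sum() -> i32 {
    // The Arc consumes ownership of the vector when it is created.
    let shared = Arc::new(vec![1, 2, 3]);

    let mut handles = Vec::new();
    for _ in 0..2 {
        // Cloning the Arc bumps the reference count atomically;
        // the vector's data is not copied.
        let local = Arc::clone(&shared);
        handles.push(thread::spawn(move || {
            // local.push(4); // would not compile (error E0596): Arc only gives out `&Vec<i32>`
            local.iter().sum::<i32>()
        }));
    }

    // Add up the per-thread results.
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    println!("combined sum = {}", shared_sum());
}
```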
So being a shared read-only state primitive, we're only going to allow shared references from this. So from an Arc, we can get a reference to a vector of integers. And this, in a concurrent context, is immutable. So we're totally safe from data races in this case. Because there's no mutation going on here. We're only allowing aliasing, but we're forbidding the mutation aspect. And so the key idea here is that we can use ownership to understand that we control access. And then using borrowing, we can control what kind of access we give you in the Arc. So we're not giving you mutable access. You can't mutate, which is kind of the best practice in shared state concurrency. But elsewhere it's not necessarily enforced at compile time. And you might accidentally have some mutation occasionally. But in Rust, it's totally forbidden at compile time. So the next thing that I want to talk about is locked mutable access. This is when you have some mutexes and use some extra synchronization to prevent threads from running simultaneously. So we'll take a look at a small example here where all we do is we take a mutex, we lock it, and push some more data onto it. And the first thing you'll notice is this type parameter. We have a mutex of a vector of integers. You don't see this a lot everywhere else. This is kind of following the paradigm of, you lock data, not code. So typically, you'll see mutexes around a certain block of code. And then you'll access some data inside of it. And it's this implicit assumption that you'd better access that data only inside those mutexes. Because if you access it outside, you might be causing some data races. So in Rust, this is a static guarantee that you're given. You are given a mutex which is explicitly protecting the data that it is containing. And then like Arc, it has complete control over the ownership of the data when it was created. So the mutex is only allowing you access once you've actually locked the mutex.
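The shared read-only access that Arc provides, as described above, might look roughly like the following sketch. The function name `sum_in_threads` and the sample data are assumptions chosen for illustration.

```rust
use std::sync::Arc;
use std::thread;

/// Shares one read-only vector among `n_threads` threads and returns the
/// sum each thread computed. (Hypothetical helper, for illustration only.)
fn sum_in_threads(n_threads: usize) -> Vec<i32> {
    // The Arc takes ownership of the vector when it is created.
    let data = Arc::new(vec![1, 2, 3]);

    let handles: Vec<_> = (0..n_threads)
        .map(|_| {
            // Cloning the Arc bumps the reference count atomically;
            // the vector's elements are not copied.
            let data = Arc::clone(&data);
            thread::spawn(move || {
                // Only shared access to the Vec<i32> is available here.
                // data.push(4); // would not compile: cannot borrow as mutable
                data.iter().sum::<i32>()
            })
        })
        .collect();

    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Every thread reads the same allocation, and the commented-out `push` shows the kind of mutation the compiler should reject.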
So in this case, when we lock the mutex itself, we're getting a sentinel. We're getting kind of a placeholder value which serves as a smart pointer, in a sense, to the data that's inside the mutex. So through this, we can mutate it. We can read it. So in this case, we're going to push some data onto it. But the key idea here is that we can only access the data through this guard. And we can only get this guard if we've locked the mutex. So this is where we are protecting this data. And you can only ever access it if you acquire the mutex, preventing the data races that would happen if you accessed it outside. And the other great thing about these mutexes in Rust is that we know exactly when this lock is going to be released. So, because we have deterministic destruction like in C++, when this guard goes out of scope, it's going to automatically unlock the mutex that it was associated with. So another key aspect here is that if I try and borrow the data from this guard, if I try and take a borrow out, the Rust lifetime system, like I was talking about earlier with these scopes, can prevent us from having that reference outlive the guard itself. So we can make sure that all references, even shared references which you'll pull out from this guard, are only ever accessible while the mutex itself is locked. So we can use all these ownership and borrowing guarantees that Rust gives us to make sure that because the data is only ever accessed under the lock, we've totally prevented data races here. And the last thing that Rust will give us is these extra tools to check whether types are sendable across threads, for example. This is one example of a trait that Rust gives. So what this function is saying is that I'm a function which can transfer any type which is sendable to other threads. And then an example of this is Arc, which, like we saw earlier, is indeed sendable to another thread. But its sibling, Rc, which stands for reference counted, is not sendable to other threads.
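The "lock the mutex, push some data" example described above is not reproduced in the transcript, so the following is only a sketch of what it might look like with `std::sync::Mutex`. The function name `push_locked` is an assumption.

```rust
use std::sync::Mutex;

/// Locks the mutex, pushes `value` onto the protected vector, and returns
/// the new length. (Hypothetical helper, for illustration only.)
fn push_locked(shared: &Mutex<Vec<i32>>, value: i32) -> usize {
    // `lock` returns a guard; the vector is reachable only through it.
    let mut guard = shared.lock().unwrap();
    guard.push(value);
    let len = guard.len();
    // A reference borrowed from `guard` cannot be returned from this
    // function, because it must not outlive the guard.
    len
    // `guard` goes out of scope here, which unlocks the mutex.
}
```

Note that the type `Mutex<Vec<i32>>` names the data being protected; there is no way to reach the vector without going through `lock`.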
This is a key difference where, with Rc, the modifications to the reference count are not done atomically. They're much, much cheaper. So it's much faster than the atomic reference counts of Arcs. But if an Rc were sent to another thread, because it was a non-atomic mutation, you could have a data race. So at compile time, we're able to say in Rust that Arcs are sendable, but these Rcs are not. And this is in contrast to C++'s std::shared_ptr, if you're familiar with that. They don't actually know whether this class is going to be sent across threads. So you have to pessimistically assume that it will be. So even if you know for a fact that your references are not actually escaping the thread you're using them in, you still have to pay for this atomic reference count overhead. So what this means is that in Rust, you can pick and choose whether your primitives are going to be thread-safe or not thread-safe. And if they're not thread-safe, you can be guaranteed at compile time that you're still not going to have any data races. These Rc pointers will never escape the thread that you're using them in. And if you do actually need that, then you can yourself opt in to the atomic reference counting overhead, which will be necessary. So that's an example of how using ownership and using borrowing, we can use these tools to give us all these great concurrency primitives, all these primitives like message passing, shared-state concurrency, mutexes. They're all built into the standard library of Rust. They're not actually in the language itself, which allows us to iterate on them, to make tweaks to them, and basically extend the language and then grow it a little bit beyond just the concepts of ownership and borrowing. And I want to go into now how these are implemented, what's actually going on under the hood. Because Arc is giving you this facade of shared ownership, actually. There are multiple references that can access this data.
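The distinction between Arc being sendable and Rc not being sendable can be sketched with a function bounded by the standard `Send` trait. The function names `send_to_thread` and `demo` are hypothetical; the talk's slide may have used a different signature.

```rust
use std::rc::Rc;
use std::sync::Arc;
use std::thread;

/// Accepts only values whose type is `Send`, moves the value to a new
/// thread, and hands it back. (Hypothetical helper, for illustration only.)
fn send_to_thread<T: Send + 'static>(value: T) -> T {
    thread::spawn(move || value).join().unwrap()
}

/// Shows which reference-counted pointer may cross a thread boundary.
fn demo() -> (i32, i32) {
    let shared = Arc::new(5);
    // Arc<i32> is Send, so this compiles.
    let back = send_to_thread(Arc::clone(&shared));

    let local = Rc::new(7);
    // Rc<i32> is not Send, so the next line would be rejected at compile time:
    // send_to_thread(Rc::clone(&local));
    (*back, *local)
}
```

The Rc stays usable on its own thread with cheap, non-atomic counting, while the compiler refuses to let it escape.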
So this strict concept of ownership or this strict concept of borrowing doesn't always apply. And this is where unsafe Rust comes into play. So Rust has this notion of unsafe code. It's a block of code delineated as saying that unsafe operations can happen within this block. And this is useful for doing things like talking to other languages, like for C FFI bindings. Because the compiler has no idea, when you call a function, what it's actually going to do on the other side. So it has to assume pessimistically that something memory unsafe is going to happen. You have to opt in to saying, no, it actually won't. And it's also great for what I was saying earlier about building Arc or building Vec. We can use unsafe code to build up new abstractions in the language. And the key idea here is that because we have told the compiler, trust me, I know what I'm doing within this unsafe block, I will maintain memory safety myself, we can make this safe abstraction around it. This is the crux of Rust, building these safe abstractions, which might have small, unsafe internals. But the safe abstraction is what everyone relies on as part of the public interface. So the way this typically works is the safe abstraction, the safe layer, will do some dynamic runtime checks, like making sure that indexes are in bounds. Or making sure of some other dynamic invariant that's very tough to reason about at compile time. And then it's going to add on top of that the guarantees it gets from ownership and borrowing, these static guarantees of, if I own it, I'm the only one, or shared references I can only read from, and mutable references I can either read or mutate. But they're totally unique. And using all these static guarantees, you can bend Rust a little bit in the ways that you'd like with unsafe code. And so you might be thinking, well, if we have unsafe code, haven't we just made Rust memory unsafe? Haven't we just broken down this idea of saying Rust is a safe language?
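To make the "safe layer with a runtime check around a small unsafe internal" idea above concrete, here is a toy sketch. The function `checked_get` is hypothetical; it simply mirrors what a bounds-checked accessor could do and is not how the standard library is actually written.

```rust
/// Returns a reference to the element at `index`, or `None` when the index
/// is out of bounds. (Hypothetical helper, for illustration only.)
fn checked_get<T>(items: &[T], index: usize) -> Option<&T> {
    // Dynamic runtime check performed by the safe layer.
    if index < items.len() {
        // SAFETY: `index` was just checked to be in bounds, so the
        // unchecked access cannot read past the end of the slice.
        Some(unsafe { items.get_unchecked(index) })
    } else {
        None
    }
}
```

Callers only ever see the safe signature. The borrow checker ties the returned reference to `items`, and the unsafe block is small enough to audit by hand.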
And it turns out the way this works in practice is that ownership/borrowing cover almost all use cases I've ever seen in terms of how you would architect a system or how you would design it. This concept of, I can either borrow it in a shared fashion or borrow it in a mutable fashion. It encompasses, maybe with a few little tweaks, what you're already doing today. This is already the best practice of how you're accessing data. And this is just codifying it at compile time. And as a testament to this, Cargo, which is Rust's package manager-- I'll talk about that in a second-- and Skylight, like I said earlier, have zero unsafe blocks. There's no unsafe code in these projects, except for [INAUDIBLE], which is kind of assumed. And these are fairly substantial projects that are doing some pretty crazy internal things, things with ownership, things with borrowing. And they haven't felt the need to fall down to this unsafe code to break any invariants. So they've been able to fit entirely within this programming model. But the great thing about unsafe, like I was saying with Arc, is we can extend the programming model. If you do find yourself needing to drop down to unsafe or to have a little bit of unsafe code, what it means is you're probably writing a very small primitive, a very small thing that you can very easily reason about and say, yeah, this is safe. The compiler can't totally reason about this. But it is a safe abstraction. And using that, you can add new primitives. Like we have these things called Cells, or RefCells, or Mutexes, or Arcs, all this fun stuff in the standard library. And you could build on top of what we're given by ownership/borrowing by bending ownership/borrowing just a little bit to give us this new set of primitives that we can leverage and take advantage of. So that's ownership/borrowing in a nutshell, how we can use it to build concurrency. And then how we can use all of these guarantees to use unsafe code to build new primitives.
And I'm going to talk a little bit now about using Rust today, just a quick tour of some of the stuff that we have and the tools that Rust provides. One of the primary ones is Cargo. So I said this earlier, but Cargo is Rust's package manager, where what it does is it manages all of your dependencies. It's very similar to Ruby's Bundler. And it'll take care of building your dependencies, building transitive dependencies, making sure that all your system libraries are in place. It'll take care of all this logic, allowing you to get to writing your application. So one of the great things about this is we can guarantee reproducible builds. So this is a very important part of Bundler, if you're familiar with that, with lock files. And we took the same concept to Cargo, where if I ever build a project I can then be guaranteed, if it's successfully built, that later in the future, I can build that project again. I don't have to worry about getting all the same versions of all the dependencies and making sure all the transitive dependencies work out. I can be very confident that the same set of code will build again. And then all the libraries today for Cargo are crates, and they're all hosted on crates.io. We have a very booming and active community today. And this has been absolutely critical to Rust's initial surge into the market. A lot of Rust's initial success has been that it's very, very easy to put your code up on crates.io and to also use everyone else's. So even now it's quickly becoming the case that if you have some pretty common use case, like gzip or bzip or various compression, or XML, or parsing JSON, it's already there on crates.io. And it's very easy to pull those crates down, use them yourself, browse documentation, and go through the whole process of using everyone else's Rust code yourself. And a little bit on Rust itself. Rust itself recently reached the 1.0 stable status this past May.
So this is a very large milestone for Rust, reaching this stable aspect. We're no longer breaking the language. We're no longer breaking the standard libraries. You can be guaranteed that Rust is not changing out from under you in the future. And coming along with this is this release train idea where we have this pipeline of channels, where we have the stable channel, the beta channel, and the nightly channel, where we promote these on a regular cycle, kind of like web browsers. And we'll see that we have a whole stabilization pipeline for features where the new features, we can iterate on very quickly in nightly. And you can have access to them basically immediately, as soon as they're implemented. You can give us feedback. We can fix bugs. We can iterate on them. But we prevent them from leaking into the beta and the stable channels. But then once a feature is actually stabilized in the nightly channel, it's a very short amount of time before it actually gets into beta, gets into stable, and everyone can start using it. So the key aspect here is "stability without stagnation." When you're using Rust, you can-- we're not done with Rust. We want to keep adding on new features. We want to keep adding new standard library APIs or new libraries themselves. So over time, we're going to be adding these things to Rust itself, while at the same time giving you a very strong stability guarantee. We're not going to be breaking code willy-nilly. We're not going to be breaking the language itself. All right, so some of the high-level conclusions of this talk that I want to make sure you walk away with are that Rust is combining these high-level features of JavaScript and Ruby with the low-level control of C and C++. So it's kind of unioning these two together with both safety and control. And Rust can give you very strong safety guarantees beyond what a garbage collector can give. So, for example, we have deterministic destruction. We forbid all data races. We forbid iterator invalidation.
All of these happen for free at compile time. And you can basically program very confidently. You can have fearless concurrency. You don't have to worry about these kinds of bugs happening in your code because Rust is preventing them all at compile time. All right. And that's what I have for today. So thank you for coming, and are there any questions? Yes. [APPLAUSE] AUDIENCE: I [INAUDIBLE] for two weeks. And you advertise it as a systems programming language. And one problem I've had is it must be that the standard library assumes allocation of the parents, basically. So are you at all interested in making a [INAUDIBLE] standardizing library of [INAUDIBLE] and other facilities which return allocation values? Or you're simply [INAUDIBLE] how to use this [INAUDIBLE]? ALEX CRICHTON: I am very much interested in this, actually. [? RAIF LEVINE: So ?] could you repeat the question, please? ALEX CRICHTON: Yes. So the question was, we have a standard library today. We advertise ourselves as a systems programming language, but the standard library assumes that allocation succeeds. It assumes it does not fail, so it'll abort the process if it does. So he's wondering whether we have any plans to kind of expand this where we can have optional-- like, we can say whether allocation failed or not by having an actual return value. And I can decide what to do with that. And the answer is yes. I very much want to be able to do this. So right now, we have this distinction where we have the standard library. And underneath it, we have a whole facade of libraries, at the core of which is actually libcore. And libcore is this library which doesn't actually assume allocation at all. It can't even allocate. And then once we eventually hit a point where we do assume allocation succeeds, we can build more things on top of that. So this will all start with libcore itself being stabilized. We want to export that as part of the stable channel of Rust itself.
And that will give you access to most of the language, not the collections, not the pointers. And then the story there is a little bit more murky in the future. I very much do want an Arc where the only difference is that the new method says whether it failed to allocate or not. We don't have concrete plans and designs yet, just an idea of how to go in that direction. Because the way to start off is to get the core aspect worked out. But it's definitely coming, and it's definitely a use case that Rust very much wants to fill. Yes. AUDIENCE: So there are a number of other type systems that deal with these type of things, like linear and [INAUDIBLE]. And I was wondering if you could tell us about what full programming languages or [INAUDIBLE] systems you looked at when choosing this design. And what you liked, and what you disliked. ALEX CRICHTON: Yeah. So the question is, there are a lot of type systems like Rust's, like linear and affine type systems, and what languages and type systems have we looked at to influence Rust's design, how it's come along over the years. And so, like I say, I personally have not been that influential in the design of the core type system. But from what I've heard, it's definitely linear and affine types. There are lots of papers on that which are very influential on Rust itself, especially when determining the linearity of references and things like that. But I think Cyclone might have been one of the larger languages that we've drawn from in terms of drawing on the experience and seeing what's happened there. But overall, it's actually drawn a lot from actual languages like C and C++, like going back to our fundamentals of seeing what it actually means to have no runtime, kind of in that aspect. So we've drawn things from other places. But those are the big ones that I know of. And I'd have to refer you to others to go into more detail about some of the academic research going on there. Yes. AUDIENCE: Would you a [INAUDIBLE] what was to get about the same as us?
ALEX CRICHTON: I'm sorry? AUDIENCE: [INAUDIBLE] in C++, can you at least take C++ by compile and changing [INAUDIBLE] to get about the same as us when we're equal in everything else? ALEX CRICHTON: So the question is, can we bolt safety onto C++? Can we tweak the standard library, tweak the compiler? Can I get these safety guarantees? Maybe check out something on safe primitives so we get a safe language. And the key, I think, here is, we've seen with C++11 and C++14, these new standards. They have things like unique pointer. They have shared pointer. They have lock guards. They have all this great new stuff. But the problem is you can still misuse it. It's still very easy to use these standard primitives, which are pretty safe. But you can use them in unsafe ways. AUDIENCE: Plus they can't [INAUDIBLE] on the [INAUDIBLE]. It can break it, and it's a lot of problems. ALEX CRICHTON: Yes. Those things, they have limits of backwards compatibility. And if we could break it, we could actually fix a lot of things. And it's kind of an expense at some point. So if you're moving C++ in a direction where you're stripping away all the unsafe features, stripping away all the stuff, then at what point do you actually create a new dialect of the language? And from what I've seen in C++ and Rust, the move semantics are very radically different in Rust than they are in C++. There are no move constructors. There are no copy constructors. There's no extra stuff happening there. And the other major thing is references. Dealing with lifetimes, I think, is just something you fundamentally cannot bolt onto C++ and still have the same language. And those are very unique to Rust. And they're the underpinnings of the safety itself. So those two aspects tied together, I feel, we could add them to C++. We can kind of tweak C++ and break it in various ways. But at some point, you really have just created a new language.
But definitely today, in terms of backwards compatibility, there's just no way that we could actually bolt safety onto C++. You can have all the static analysis you want. We've actually talked to tons and tons of people in various aspects of industry. They all have huge amounts of static analysis. But everything falls down at some point. It catches 90%, 95% of bugs. But that 5% leaking through is still going to have memory unsafety, security vulnerabilities, and all that good stuff. Yes. AUDIENCE: [INAUDIBLE]? ALEX CRICHTON: Yeah. So the question is, one of the great things about an existing language is we already have tons of libraries. We have decades of code, decades of experience writing all this. So how's Rust going to do all this? How's Rust going to get these libraries bootstrapped, get this up and running, how's the community running? So what I would say to that is this is where crates.io has been absolutely critical, having it there for Rust 1.0. With crates.io, it's incredibly easy to share libraries, to share our code. We've reduced the barrier to entry for writing this code, and publishing it, and getting everyone access to it, to a very, very small amount. And this is coupled with there not being many other systems languages-- I mean, you would never find a package manager for C and C++, like to actually build the C and C++ itself, that was just uniform across all projects. So Cargo has also been absolutely instrumental here in terms of making it very easy to depend on everyone else's code. So it's kind of reducing friction as much as possible and making it easy to jump in there. So we ourselves are trying to lead the-- we've led the way in terms of stabilization of the standard library. We have set some common idioms for the language, some common interfaces. But the libraries themselves, we unfortunately don't have the manpower to invest in these very, very high quality, very large libraries that one might expect.
So we have implementations in existence today, but they're kind of in progress. They're works in progress. They're all kind of out in open source. So it's definitely a downside. This is something where it will take time to flesh out in Rust itself these common libraries you would expect in C and C++. But a lot of the large ones are either very much underway, feature complete, or on their way. Does that answer your question? Yes. AUDIENCE: So I'll keep continuing on that line of thought. I know Servo has been felt throughout the closest thing to Rust. And so, my question is really about how big you intend to [INAUDIBLE]. So in a project like Chromium, or I'm sure Mozilla lets you have-- there's like base, which has a lot of these Chromium abstractions, really standard library type stuff that a really big part of my web browser needs to use. So does Servo define the sorts of abstractions for itself? Or do you intend the standard library for Rust to be the standard library for Servo? Or what's the story there? ALEX CRICHTON: Yeah. So the question is, where do we see the standard library going? Is it going to be a very large standard library with tons and tons of things? For example, is it very heavily influenced by Servo in terms of their needs, and how are those two going to play out, and does Servo have its own standard library? So the approach we have taken with the standard library today is fairly conservative. So it's not a very large standard library. I would not use the adjective "batteries included" to describe it. And the reason for this is, having gone through the whole stabilization process, there just wasn't enough that we felt comfortable stabilizing. Did I do something wrong? [? RAIF LEVINE: Yeah, ?] the slides aren't projecting, but I think that's OK. ALEX CRICHTON: OK. So Servo has, at this point, like 150 crates that it's depending on, 150 various libraries that are being built through Cargo.
So they're taking the same approach, where they're probably not going to create one monolithic standard library, but rather have lots of little components here and there. So the role of the Rust standard library is going to be to define the common interface for the entire language, so iterators, vectors, hash maps, the option type, the result type. These common abstractions you see in basically 99% of Rust code all over the place, that belongs in the standard library. Once you start getting above that, like some various futures or various weird concurrency primitives, those might be outside the standard library, but in their standalone crates. But we still officially support them. So I suspect the trajectory over time is to stay with a relatively conservative standard library, but a very diverse set of crates that are very small that just build on top of the standard library and kind of work like that. Yes. AUDIENCE: So my question basically is, all the codes I work on are kind of unique in a way that they're not many that are interested in, right? I mean, but there's this core library in C++ usually that everybody's using. And my question is then how are you doing things around each-- basically, our language is wrapping stuff. And how will you handle basic safety in such a department? ALEX CRICHTON: So the question is about how Rust interacts with other languages and how we deal with the safety around that in terms of talking to big projects elsewhere. So this is a very important aspect of Rust. We know that the world is not going to be rewritten in Rust overnight. We've got to deal with existing libraries. So the FFI boundary of Rust allows you to just kind of call out to external languages. It's just a call instruction. There's no overhead from this. It's exactly what you'd expect from C. So an example of this is Cargo, the package manager, which uses libgit2, which is a library written in C for managing Git repositories. And we just call straight out to that.
And then to maintain the safety of Rust itself is where ownership, and borrowing, and lifetimes come into play. So that's where you can create a safe abstraction around calling these unsafe libraries. So the C APIs typically have a method of saying, here's when you create the resource. And then you deallocate it when it goes out of scope. So for example, in the destructor for that type, we call the free function. And then when you access an internal value, you have the ability to tie the lifetimes together. So what's returned says, this is only valid while the struct is valid. So you can construct-- you can add, basically, the Rust type system on top of an existing C API. So you can kind of use these static guarantees of Rust, this ownership/borrowing, mutable references being unique, all that good stuff to maintain the safety. You have to be the one that actually creates that API and creates the safe interface to it. Does that answer your question? Yes. AUDIENCE: How do you manage quality and safety of the crates that are posted on [INAUDIBLE]? ALEX CRICHTON: So the question is, how do I manage the quality and the safety of the crates on crates.io? And this is actually a very interesting question. So the quality itself, we have a small set of crates that we curate. They're hosted in the rust-lang organization itself. So I would not describe them all as incredibly high quality. But we maintain them. We fix bugs. We push updates. We have a lot of continuous integration for them. But in general, you don't actually have this kind of guarantee of quality. You don't know the tests are running. You don't know that it works on your platforms. And this is stuff that we would like to expand to in the future. But it's not something that we're actively pursuing [INAUDIBLE]. We're not going to give C [INAUDIBLE] the whole Rust community. We'll very, very strongly encourage it, use Travis, or use AppVeyor, or whatever open source solution you want.
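The pattern described above (create the resource, free it in the destructor, and tie the lifetime of any internal pointer to the wrapper struct) can be sketched as follows. Everything named here is hypothetical: `counter_new`, `counter_value`, and `counter_free` are stand-ins written in Rust so the example is self-contained, whereas a real binding would declare the C library's functions in an `extern "C"` block and link against that library.

```rust
use std::os::raw::c_int;

// Stand-ins for a C library's API (hypothetical names, defined here in Rust
// only so that the sketch compiles without an external C library).
#[repr(C)]
struct RawCounter {
    value: c_int,
}

unsafe extern "C" fn counter_new(start: c_int) -> *mut RawCounter {
    Box::into_raw(Box::new(RawCounter { value: start }))
}

unsafe extern "C" fn counter_value(c: *const RawCounter) -> *const c_int {
    unsafe { &(*c).value as *const c_int }
}

unsafe extern "C" fn counter_free(c: *mut RawCounter) {
    unsafe { drop(Box::from_raw(c)) }
}

/// Safe wrapper that owns the raw resource.
struct Counter {
    raw: *mut RawCounter,
}

impl Counter {
    fn new(start: i32) -> Counter {
        // SAFETY: the stand-in constructor has no preconditions.
        let raw = unsafe { counter_new(start) };
        assert!(!raw.is_null());
        Counter { raw }
    }

    /// The returned reference borrows `self`, so the borrow checker will not
    /// let it outlive the wrapper (and therefore the underlying resource).
    fn value(&self) -> &i32 {
        // SAFETY: `self.raw` is valid for as long as `self` is alive.
        unsafe { &*counter_value(self.raw) }
    }
}

impl Drop for Counter {
    fn drop(&mut self) {
        // SAFETY: `self.raw` came from `counter_new` and is freed only once.
        unsafe { counter_free(self.raw) }
    }
}
```

With this in place, `Counter::new(41).value()` can be read safely, and the free function runs automatically when the `Counter` goes out of scope. The author of the wrapper is still the one responsible for getting the unsafe parts right.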
But you don't have that strong guarantee coming in. You have to be able to Google around for the libraries ahead of time. And it's the same thing with safety. There's not exactly a badge on crates.io that says, this crate has unsafe. You can't use a [INAUDIBLE]. If you use it, you might have unsafe code. So the community is very active in terms of these kind of high profile libraries. If there's unsafe code inside, and it actually is legitimately unsafe, then bug reports will be opened. And it'll be fixed very quickly. But overall, we don't actively curate crates in terms of auditing them for security, or auditing them for quality, or anything like that. Yeah. AUDIENCE: What about other [INAUDIBLE]? What if the person who was maintaining some crate that everybody needs gets hit by a bus? ALEX CRICHTON: Oh, so the question is, what do we do about ownership of crates in terms of what if the original owner just disappears? And we have systems for-- we can transfer ownership of crates here. We can transfer ownership between owners. Or, I mean, if someone's random crate ends up disappearing, then we probably don't take ownership of it. We're not holding ourselves responsible for all libraries and crates on crates.io. In the back. AUDIENCE: [INAUDIBLE]? So what is the current state of [INAUDIBLE]? ALEX CRICHTON: The question is, what does the performance of Rust look like? Because we're using LLVM as a back end, kind of relative to Clang? And I can say, we're basically on par with C++. We're not, like, disabling the optimizations, and we don't have any other fancy trickery going on there. But if you use Rust, you're going to get basically what you would get in C++. I don't know, is there a remote question? I don't know if it's switch stream. AUDIENCE: Yeah. You were doing a lot of type checking to get these guarantees. Is that drastically increasing your compile time? And is that scaling significantly to project size?
And is there a point where project scale gets unwieldy, especially for iterative development? ALEX CRICHTON: I'm not going to repeat this because it was online. But anyway, yes, so this is an interesting question. And the compilation model for Rust is fairly different than C++. So in C++, you compile one object at a time. And you include and parse all your headers. But in Rust, it's actually essentially a library at a time. It's a crate at a time. So when you compile a Rust crate, you're compiling essentially all the C++ files at the same time, kind of in similar models. So in that sense, the compile time for-- like the incremental compile time for one Rust library is going to be higher than a C++ library, because you have to recompile the entire library. But in terms of type checking, it's fundamentally way faster than C++, because we use traits in our generics. And we do not need to type check them after we've instantiated them. So we can type check everything once. And then we don't have to worry about it ever again. And when you depend on an upstream crate-- so, for example, unlike including some header file from a C++ library, you don't have to re-type check that every single time you run the compiler. So compile times are a little bit of a problem today because we're compiling all these big crates. So there are a lot of efforts underway. Like even today, by the end of the year, we're probably going to have incremental compilation, pipelined compilation, parallel compilation, all these great aspects. But in terms of just the raw compile time which you'd expect today, if you start from zero, Rust code will compile a little bit faster than C++. If you go in an incremental fashion, C++ might be a little faster because we don't have the incremental aspects. And I think there's one other aspect to your question. Or did I hit all your points? Oh, scalability, yes. So scalability, I think it definitely scales quite well today. We haven't run into lots of problems.
Because the compile time for one crate can be a little high, we might have to break that library into two separate crates. But that's generally a good exercise to do anyway. But Servo has not had problems with compile times that would not be solved with incremental compilation, for example. AUDIENCE: Just for reference-- hi, I'm Chandler. I work on [INAUDIBLE] in LLVM. But I wanted to let you know that the [INAUDIBLE] compiler spends almost no time type checking C++. It's not really measurable as part of C++ compile time for us. So type checking is never a compile time problem that we've run into. ALEX CRICHTON: Interesting. AUDIENCE: Alex? ALEX CRICHTON: Sure. AUDIENCE: What's the largest Rust code base you're aware of on the client and also on the server, if any? ALEX CRICHTON: So the two largest Rust code bases I know of are Servo and Rust. I don't know if you would-- in terms of a server or client, like an HTTP server or an HTTP client, I don't think we have that large a code base. It's not quite the niche that Rust is filling right now. So those would be the two. I guess you could qualify Servo as a client, as opposed to a server. But in terms of the largest server, I know crates.io is entirely written in Rust. But that would be the largest one that I personally know of. You had a question? AUDIENCE: How good is your debugging support? ALEX CRICHTON: The question is, how good is the debugging support? AUDIENCE: Yes. That's right. [INAUDIBLE] debugging, and debugging of the [INAUDIBLE], and in cases where it sometimes has a few bugs. ALEX CRICHTON: Sure. So the story here is, if you can debug C++, you can debug Rust. So we use LLVM as a back end. And we have debugging go through that. We have DWARF debug info. We have GDB integration in terms of pretty-printers. So if you can debug C++, you can debug Rust. So whatever you would expect to do there, you can just do the same thing. Because as we're using LLVM as a back end, it's all just native object files.
We're using the standard system linker, and all that good stuff. So the standard C++ tools-- and this applies to profiling, testing. All these kinds of various analyses will apply to Rust as well. AUDIENCE: One more question. ALEX CRICHTON: One more question? All right. Thank you so much. [APPLAUSE]
The Rust Programming Language (posted by Amy.Lin on 2017/02/16)