Gregorio: Hi. I'm Joe Gregorio, and I work in Developer Relations at Google. This talk is on REST and, in the talk, I presume you're familiar with the Atom Publishing Protocol. If you're not, you can watch my other video, "An Introduction to the Atom Publishing Protocol," and then come back and watch this one. So let's begin.

You may have heard the term REST, and a lot of protocols these days are advertising themselves as REST. REST comes from Roy Fielding's thesis and stands for Representational State Transfer. It's an architectural style. Now, an architectural style is an abstraction as opposed to a concrete thing. For example, this Shaker house is different from the Shaker architectural style. The architectural style of Shaker defines the attributes or characteristics you would see in a house built in that style. In the same way, the REST architectural style is a set of architectural constraints you would see in a protocol built in that style. HTTP is one such protocol. And, for the remainder of this talk, we're just going to talk about HTTP, and I'll refer back to the architectural constraints of REST as we work through that example. Now, it's simply not possible to cover every aspect of HTTP, so at the end of this presentation there will be a further reading list, if you'd like to learn more.

So why should you care about REST? Well, it's the architecture of the Web as it works today. And if you're going to be building applications on the Web, shouldn't you be working with the architecture instead of against it? And, hopefully, as we go through this video, you'll see there are many opportunities for increasing the performance and scalability of your application, and for solving some traditionally tricky problems, by working with HTTP and taking full advantage of its capabilities.

Let's get some of the basics down: some nomenclature and the operation of HTTP. At its simplest, HTTP is a request-response protocol. Your browser makes a request to the server, and the Web server gives you a response. The beauty of the Web is that it appears very simple, as if your browser is talking directly to the server. So, let's look in detail at a specific request and response. Here is a GET request to the URL http://example.org/news, and here's what the response looks like. It's a 200 response, and what you're seeing here are the headers and a little bit of the response body. The request is to a resource identified by a URI, in this case, like I said, http://example.org/news. Resources, and addressability, are very important. The URI is broken down into two pieces: the path goes into the request line, and you can see the host shows up in the Host header.

There is a method, and that's the action to perform on the resource. There are actually several different methods that can be used, GET, PUT, DELETE, HEAD, and POST among others, and each of those methods has particular characteristics. For example, GET is safe, idempotent, and cacheable. Cacheable means the response can be cached by an intermediary along the way, idempotent means the request can be repeated multiple times with the same effect as making it once, and safe means there are no side effects from performing that action. PUT is also idempotent, but not safe, and not cacheable. Same with DELETE: it is idempotent. HEAD is safe and idempotent. POST has none of those characteristics.
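To make the request line, Host header, and response headers concrete, here is a minimal sketch, not taken from the talk, that issues the same GET using only Python's standard library. http://example.org/news is just the example URI used above, so the exact response will differ.

import http.client

# The GET request described above: http://example.org/news.
conn = http.client.HTTPConnection("example.org")   # the host ends up in the Host header
conn.request("GET", "/news")                        # the method and path form the request line
resp = conn.getresponse()

print(resp.status, resp.reason)                     # e.g. 200 OK
for name, value in resp.getheaders():               # control data: Cache-Control, Content-Type, ...
    print(f"{name}: {value}")
body = resp.read()                                  # the representation, here an HTML document
conn.close()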
Also returned in that response was the representation of that resource, what lives at that URI. The representation is the body and, in this case, it was an HTML document. HTML is a form of hypertext, which means it has links to other resources. Here is a traditional link that you would click on to go to another page, but there's more than one kind of link. Here is a link to a CSS document that the browser will fetch and include to style the page. There are also other kinds of links. Here's one to a JavaScript document that will get pulled in, and this is a particularly important kind of hypertext. This is called Code on Demand, the ability to load code into the browser and execute it on the client. The response headers show control data, such as this header, which controls how long the response can be cached.

So now that we've looked at a simple HTTP request and response, let's go back and look at some of the characteristics that a RESTful protocol is supposed to have. Application state and functionality are divided into resources. Those resources are uniquely addressable using a universal syntax for use in hypermedia links. All resources share a uniform interface for transferring state between the client and the server, consisting of a constrained set of well-defined operations and a constrained set of content types, optionally supporting Code on Demand, and a protocol which is client-server, stateless, layered, and cacheable. We've already talked about many of these aspects of HTTP: we have resources that are identified by URIs, those resources have a uniform interface understanding a limited set of methods such as GET, PUT, POST, HEAD, and DELETE, and the representations are self-identifying, drawn from a constrained set of content types that might not only be hypertext but could also include Code on Demand, such as the example we saw with JavaScript. And we've even seen that HTTP is a client-server protocol.

To discuss the remainder of the characteristics of the protocol, we need to look at the underlying structure of the Web. We originally started out with a simplified example of how the Web appears to a client. Let's switch to using the right names for each of those pieces: the user agent and the origin server. The reality is that the connections between these pieces could be a lot more complicated. There can be many intermediaries between you and the server you're connecting to. By intermediaries, we mean HTTP intermediaries, which doesn't include devices at lower levels such as routers, modems, and access points. Those intermediaries are the layered part of the protocol, and that layering allows intermediaries to be added at various points in the request-response path without changing the interfaces between components, where they can act on passing messages, for example translating them or improving performance with caching. Intermediaries include proxies and gateways. Proxies are chosen by the client, while gateways are chosen by the origin server. Despite the slide showing only one proxy and one gateway, realize there may be several proxies and gateways between your user agent and origin server, or there may actually be none. Finally, every actor in the chain, from the user agent through the proxies and the gateways to the origin server, may have a cache associated with it.
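To illustrate the layering just described, here is a small sketch of a client-chosen proxy using Python's standard library. proxy.example.net:3128 is a hypothetical proxy address, and example.org/news is again the talk's example resource; neither is a real deployment.

import urllib.request

# A client-chosen proxy; proxy.example.net:3128 is hypothetical.
proxy = urllib.request.ProxyHandler({"http": "http://proxy.example.net:3128"})
opener = urllib.request.build_opener(proxy)

# The request looks exactly the same to the application whether it passes
# through zero, one, or several intermediaries; that is the layering at work.
with opener.open("http://example.org/news") as resp:
    print(resp.status, resp.reason)
    # Control data such as Cache-Control tells any cache along the way
    # (user agent, proxy, or gateway) whether and for how long it may
    # reuse this response.
    print(resp.headers.get("Cache-Control"))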
If an intermediary does caching, and a response indicates that it can be cached, in this case for an hour, then if a new request for that resource comes in within that hour, the cached response will be returned. These caches finish out the major characteristics of our REST protocol.

Now, we said this architecture had benefits. What are some of those? Let's first look at some of the performance benefits, which include efficiency, scalability, and user-perceived performance. For efficiency, all of those caches help along the way. Your request may not have to reach all the way back to the origin server or, in the case of a local user agent cache, you may never even hit the network at all. Control data allows the signaling of compression, so a response can be gzipped before being sent to user agents that can handle it. Scalability comes from many areas. The use of gateways allows you to distribute traffic among a large set of origin servers based on method, URI, content type, or any of the other headers coming in on the request. Caching helps scalability too, as it reduces the actual number of requests that make it all the way back to the origin server. And statelessness allows a request to be routed through different gateways and proxies, thus avoiding bottlenecks and allowing more intermediaries to be added as needed. Finally, user-perceived performance is increased by having a reduced set of known media types, which allows browsers to handle known types much faster, for example, partially rendering HTML documents as they download. Also, Code on Demand allows computations to be moved closer to the client or closer to the server, depending on where the work can be done fastest. For example, having JavaScript do form validation before a request is even made to the origin server is obviously faster than round-tripping the form values to the server and having the server return any validation errors. Similarly, caching helps here, as requests may not need to go all the way back to the origin server. Also, since GET is idempotent and safe, a user agent can pre-fetch results before they're needed, thus increasing user-perceived performance. There are lots of other benefits we won't cover, but they are outlined in Roy's thesis.

But all these benefits aren't free. You actually have to structure your application or service to take advantage of them. If you do, then you will get the benefits, and if you don't, you won't. To see how structuring helps, let's look at two protocols: XML-RPC and the Atom Publishing Protocol. This is what an XML-RPC request looks like, and here's an example response. All of the requests in XML-RPC are POSTs. So what do the intermediaries see of this request and response? Is it safe? No. Is it idempotent? No. Is it cacheable? No. Even if some calls were, the intermediaries would never know. All the requests go to the same URI, which means that if you're going to distribute many such calls among a group of origin servers, you would have to look inside the body for the method name. This gives the least amount of information to the Web, and thus it doesn't get any help from intermediaries and doesn't scale with off-the-shelf parts.

So let's take a look at the Atom Publishing Protocol. For authoring to begin in the Atom Publishing Protocol, a client needs to discover the capabilities and locations of the available collections. Service documents are designed to support this discovery. To retrieve a service document, we send a GET to its URI.
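As a rough sketch of that discovery step, assuming a hypothetical service document at /service.atomsvc on example.org (the path is invented purely for illustration), a client GET might look like this in Python's standard library:

import http.client

# Hypothetical service document URI; a real client would get the URI
# out of band or from configuration.
conn = http.client.HTTPConnection("example.org")
conn.request("GET", "/service.atomsvc", headers={"Accept-Encoding": "gzip"})
resp = conn.getresponse()

# Because this is a plain GET, every intermediary already knows it is safe,
# idempotent, and cacheable without looking inside the body.
print(resp.status, resp.reason)
print(resp.getheader("Content-Type"))   # self-identifying, e.g. application/atomsvc+xml
service_doc = resp.read()               # hypertext: it lists the collection URIs
conn.close()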
GET is safe, idempotent, cacheable, and gzippable. The response type is self-identifying: as you can see, there's a Content-Type header of application/atomsvc+xml that specifically identifies what the content is, and the response itself is hypertext. It contains URIs for each of the collections. What's highlighted in this slide is the relative URI for the collection. Once we have a collection URI, we can POST an entry to create a new member, and then GET, PUT, or DELETE the members at their own URIs. So here's an example of a GET to a collection document. Again, this is safe, idempotent, cacheable, and gzippable. The response is also self-identifying, as you have another content type, application/atom+xml. And again, the response is hypertext. Lastly, the edit URI identifies where the entry can actually be modified. You can do a GET on that URI to retrieve it, send a PUT to update the resource, or send a DELETE to remove it from the collection. So as you can see, the Atom Publishing Protocol is designed with RESTful characteristics in mind and gets many advantages from intermediaries and the network itself as those messages transfer back and forth.

So, let's look at some of the other idioms that you can use in building your RESTful protocol to get some of these advantages. For example, long-lived images. If you have large images that need to be transferred back and forth as part of your Web page, what you should do is set the cache lifetime for those images to be very long. If you need to update those images, upload a new image to a new URI and change the HTML to point to that new URI. Here's an example where I have big-image.png. And, if we retrieve that image, you'll see that the Cache-Control header has been set to a very long time, in this case, 30 days. If we made a mistake, or we'd like to update that image, what I need to do is upload a new image, big-image-2, set the Cache-Control for that to be very long, and then update the HTML. The idea here is that you keep the HTML with a short cache lifetime, and thus you can update it easily.

So there you go, a high-level view of REST and how it relates to HTTP. Here's the list of further reading that I had promised you. "RFC 2616" outlines what HTTP is. "RFC 3986" outlines the URI standard. You can read Roy Fielding's thesis, "Architectural Styles and the Design of Network-based Software Architectures." And there's also this "Caching Tutorial" by Mark Nottingham, which covers in detail many of the things we just talked about. Thanks and have fun.
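Returning to the long-lived image idiom described above, here is a minimal sketch of how a server might set those cache lifetimes. It uses Python's standard-library http.server purely for illustration; the file names, the 30-day and 60-second lifetimes, and the server address are assumptions mirroring the talk's example, not a real deployment.

from http.server import BaseHTTPRequestHandler, HTTPServer

# Illustrative content; a real server would read these from disk.
HTML_BYTES = b'<html><body><img src="/big-image-2.png"></body></html>'
IMAGE_BYTES = b"...png bytes..."

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/big-image-2.png":
            # Long-lived image: cache for 30 days (2592000 seconds).
            body, ctype, cache = IMAGE_BYTES, "image/png", "max-age=2592000"
        else:
            # Short-lived HTML, so pointing it at a new image URI takes effect quickly.
            body, ctype, cache = HTML_BYTES, "text/html", "max-age=60"
        self.send_response(200)
        self.send_header("Content-Type", ctype)
        self.send_header("Cache-Control", cache)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # To update the image, publish a new URI (e.g. big-image-3.png) with the same
    # long lifetime and change the HTML to reference it; old cached copies age out.
    HTTPServer(("localhost", 8000), Handler).serve_forever()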