Subtitles section Play video Print subtitles (smooth electronic music) - Hello, everybody, thank you very much for coming. Good morning. Hope you are doing well. I'd like to talk today about conflict resolution in distributed systems, that is if several people change some data independently of each other, what happens, how do we resolve those conflicts that'll occur. My background is I'm a researcher at the University of Cambridge. I was previously in industry and a bunch of internet start ups. So I worked at Linkedin for a couple of years, for example. At the moment, I'm working on this research product called Trve Data. Spelled T-R-V-E and what we're trying to do here is to bring end to end encryption to a larger range of applications. So think something like Google Docs where several people can edit a document at the same time online but without having to trust Google servers because what we want to do is to be able to put data on various servers in the cloud but not have to worry about what happens if they get compromised or so on. So that's kind of the background of all of this. I'm not talking about the encryption and the security protocols today, I'm only focusing on one little piece of that whole project which is what happens if several people edit data at the same time and how do we resolve that. So I'd like to start with a scenario that will probably be familiar with you, which is you, a little blue stick figure here, are hacking on some code on your computer and at some point you decide that this code is done and you commit it using your favorite version of control system. I'll just use git as an example here and so at this point, you'd put the code in the repository and then maybe you push it somewhere so that other people can see that `code as well. So in the case of Git, you maybe you'll push it to a repository on Github and this is now the communication mechanism for people with your team. So if there's somebody else, say, this little red stick figure who is also hacking on code, then well you can synchronize up through the central repository. This is all very familiar, this is what we do everyday and so the little red person here might independently at the same time also be working on the same code base and also do a commit and I'll, what happens if you this person now fetches from Github, well they'll have to either do a merge or rebase if that's how your work flow goes or something along those lines. So somehow these changes are going to have to be combined together and as you've probably experienced, if people change different files in the same repository, that's no problem, they will just get merged cleanly. If one person changes the beginning of a file and another person changes the end of the file, that's probably okay because the version control system will merge them automatically. If people change the same part of the same file, then you're going to have to resolve the merge conflict yourself and then we have these tools for doing three way mergers for copying patches from one side to another and figuring out what the results should be. So you've probably have to find with things like this. This is exactly the kind of problem I'm talking about. But this problem happens not only in software development. It's a very general purpose problem. So imagine you're a lawyer working in a law firm and maybe there's a contract being negotiated with, you've got a client on one side and the other company's law firm on the other side and everybody is sending these versions of contracts back and forth and the contracts are probably Microsoft Word documents because that's how lawyers work and they send these things by email and so you've got one person making changes to these Word documents and then hit save and then at the same time, maybe somebody else, maybe at the different company is also updating the same document. Changes it and now you email these changes to each other and so this is actually very much the same data flow and I actually just reused the same diagram and changed the labels. You've got the email as the communication path and at some point these changes are going to have to be merged together and now before Microsoft Word, I'm not sure there is even this kind of nice user interface for three way mergers. I know you can compare two documents but I think what people end up doing is manually copying the changes from one version of the document to another and so performing this merge really manually. So that's kind of this oh crap situation here. So in this case, it's kind of best to have like an informal lock where one person says, okay I'm going to be editing the document now. Please don't change it for the next day. I'll send it to you and then you can edit it. So people try to sequence their updates like this crude manual communication. What happens in another? Let's look at the third example. Let's look at a to do list. So this is maybe a shared to do list where me and my wife together have this shopping list where I can add stuff and she can add stuff and then whoever next goes to the shop can buy those things. So here buying milk is added to the to do list and let's say this to do list is stored on a central server. Now so thats what allows us to communicate and so I add buy milk to the to do list and press okay button and so it does a request, like maybe an http post request over the network, stores it there, it comes back and says okay that was added to the to do list and at the same time, maybe my wife goes, oh you need to water the plants, so I just remembered. So add stuff to the to do list also does this post to the server and comes back okay and so in this case, actually what has happened is if this central server stores its data in a data base and it uses something like transactions. If this is a relational database say, then we actually have serialization going on and that is these updates are actually applied in a sequential order, in a serial order. That's where serializable comes from and database transactions. Which means they're applied one at a time. So you don't actually have the same concurrency problem as we had with the code editing and with the Word documents being sent back and forth. Cause actually, there's one primary copy of the data and that lives on the server and that's being updated one transaction at a time in sequence. So in this case, you don't get this conflict resolution problem. So this seems nice but on the other hand you have a problem which is well, what if I don't have signal on my mobile phone right now or if the network is interuppted for some other reason or I can press the button to save and I'll just a spinning, spinning wait indicator and nothing will happen. So here we have this, it's kind of obvious, this problem if you don't have internet connection then you can't reach the central server so you can't store any data. You can't edit the data in anyway. So it doesn't work offline. So these are the problems here. We get the advantage from a central server that we don't have to worry about this concurrency and these different edits happening at the same time and having to merge those together but at the cost of it only works online. So we now we require constant connectivity to the server. Which is not so great. Especially if you're doing things on mobile devices. Another problem is that these requests are synchornous, so when I press the save button, I have to wait until I get the okay back from the server and only then I notice it is actually being saved. So until I get the okay back, I don't know whether the network request actually made it through or not. Maybe if the network is unrealiable. So that also can be a timing problem. Let's see in the document editing case, say in the case of Google Docs, you know you can press one letter and then a second later that letter will appear on the screen for another person who's editing the same document. So there the edits of, the units of editing is a single key stroke. Its not even like a commit or something like that. It's just a single letter and so in that case, if you wanted to send that through a central server than every time you press a letter, you'd have to first send that to the server, wait for it to be saved there, wait for the server to come back to you, say okay, and then you could display the letter on the screen and so you would have to wait that network round trip for every single change to the document. Which is actually what happens exactly if you're editing a file over SSH. So if you're logged into a Linux server and you're using Vim or Emax or something on the server and you type a letter then that edit actually lives on the server. So you've actually got that synchronus round trip. So if you've got a slow one, reliable network, and using SSH is pretty painful, as you've probably experienced. Our final problem with putting everything on a central server is that now there's this single point of failure and people make fun of Github for example, every time Github goes down, people say, oh we've got this nice decentrilized version control system and what do we do? We put everything back in a centralized service. Isn't that great? So the sync can be disrupted. If you worry about denial of surface attacks, for example, or if you worry about blocking, maybe in some countries that don't have such free internet access, if you're critical of the local regime or something like that, actually this kind of blockability is quite a problem. So what we'd like to do is to figure out how to not have these problems of the single central server but at the same time, not have all of the problems of having to do the merges by hand. So coming back to the to do list example, if we think about this having to synchronously having to communicate with a server, that's okay if the server is in the same town but if the server is on the other side of the planet, then this is quite slow because it's simply speed of light takes a while to go all the way round to the other side of the world and back again. So what if we just put data centers in several different places? So let's say when the blue person adds buy milk to the to do list, that request just goes to the blue person's local data center, call it data center one and get saved there, and when the red person adds water plants to the to do list, then that goes to the local data center, too. Those are two different local data centers. If the people are in different locations, and so now that's okay. So each person can get the response from their local data center and now these changes will be propagated asynchronously and so the speed of light going all the way around the planet is still as slow as it was before. So it could happen that two people make these changes without knowing about each other and then you simply don't know which one of these actually came first. Because you know it's not really defined. Did the blue come first or the red one. From the point of view of data center one, first the blue one came and then the red one came in but from the point of view of data center two it was first the red one then the blue one. So if this adding an item to the to do list means just appending it to the end of the list, then which order should these items appear? Now we certainly have to worry about the fact that the data center one might have buying milk first and watering the plants second and the other data center might have them flipped in the other order. Oh dear. Let's think this further. Even without two data centers, we can just make it really extreme and say okay we're not even going to talk to the data center, we're just going to talk to our local storage on the device and I'm going to treat the storage on my mobile phone as a data center though it's the best data center ever because I can't have a network interupption between me and the storage on my phone. So this communication is always going to work. So I can just store something locally and that will be nice and fast and the red user can similarly store something locally and then we can just kind of have some kind of blah blah network stuff to synchronize the changes around and if you think about it, this is exactly the same as the synchronization process that has to happen with the two data centers, exchanging data and this is exactly the same synchronization process that happened with the Git commits being sent via Github or with the Word documents being sent by email. It's all exactly the same, which is you've got some changes which are happening concurrently, independently from each other and those changes are being propagated asynchronously and here, of course, now we're back to this problem of conflict. So take the example of adding two items to the to do list. You can kind of imagine that quite easy to resolve. You just make sure that you somehow decide on a consistent order. Maybe using some Ids or something or some user Ids. You can order them arbtrially but what if one item, one change is deleted to do item and the other actually edits that same to do item. So the red one changes buying milk to buying soy milk and the blue one just deletes the buying milk item and these two changes happen without knowing about each other, so how do we resolve this? Because if you say that the deletion wins. So, okay this item was deleted, the to do list item was deleted so that edit to change buy milk to buy soy milk is just gone because the item that it refers to is no longer there but that means that we've forgotten about the fact that soy milk was added in there and so maybe that's an important fact that we've now lost. On the other hand, we can do it the way other way around. We could say that this buying soy milk wins but when the deletion has been lost and so maybe the deletion is meaningful. So how do we resolve these kind of conflicts? In general, we have concurrent operations that happen, concurrent doesn't mean necessarily at the same moment in time. It just means they happen without knowing about each other and somehow we need to achieve convergence where everybody agrees on the same data at the end, and so people talk about eventual consistency which is kind of the data base term that's often used for describing these types of system where you simply, you can allow different things to happen concurrently and you just want to end up in the same state at the end. The problem with the term eventual consistency is that it's actually very vaguely defined and so people are very imprecise when talking about it and I'd like to break it down into three points, three properties that are a bit more precise. So I'm going to firstly assume eventual delivery and that is we sent messages over the network and we're going to assume that the network is not interuppted forever. So we're going to assume that after some finite amount of time, you get a network connection again and so if you keep retrying then you can get the message through eventually. You kind of have to make this assumption because if you assume that a device can be offline forever, then it can never come in synch with the other devices again. Just by definition of offline. So we're just going to assume that eventually some messages go through but we're not going to make any assumptions about how quick that is. So if I go to Iceland and there's no mobile signal and I'm hiking somewhere up on the top of a volcano for two weeks, then I won't get any messages for two weeks and when I get back to Reykjavik, then I will get some messages when I reach an internet connection again. So that means my message delay is two weeks in that case. That's okay, we're just going to assume that's alright. Secondly, the second part of eventual consistency is making sure that everybody ends up in the same state. That is if we assume that eventually everybody gets all the messages, then if two people have received the same messages, then they should be in the same state. So that means even if they received the messages in a different order, they should end up in the same state and finally what we want is to not lose data. Now this, it seems like a kind of ridiculous point because of course nobody wants to lose data but it is actually quite important to make this explicit. So quite a few data base users use this mode called last writer wins where you say if two people change the data at the same time, we're just going to pick one of them arbitrally based on the time stamp as being the winner and the other ones were just going to throw away and so this is like equivalent if you think about the word document example of, well we've got two people editing the Word document and one person emailed their change to another and the other person is just going to say, well I made changes myself as well and I'm going to declare that my changes are newer than your changes and so I'm just going to ignore your changes, sorry, and so this is not very friendly to people because changes get lost. So let's require not losing data as well. So this is what I mean with eventual consistency and this is the kind of data structure to which we might want to apply it. So I'm going to model this to do list now as a JSON document and you can just imagine it's like a list of to do items which is like each to do item has a title and the flag done, whether it's done or not, true or false and then maybe there's some settings and stuff as well on the side, and so the main data structures that we have here are an ordered list. That is you've got a list of to do items and they have to appear in a certain order and the user specifies this order and also you have maps. So you've got, maybe one adjacent object inside another adjacent object or something like that. Now once we've got these data structures we can then make various changes to them. So as the user interact with the application, they will make changes. For example, they might set watering plants to true. So they say, okay this is now done, I'm going to check the box on my phone and what that does is set this done flag from false to true. So the operation here, the change operation, the edit operation, is assigning a value to a particular field in this JSON document. Another thing that might happen is somebody might edit a string. So change milk, change buy milk to buy soy milk. So by inserting a few letters into that string, another thing that might, people might do is insert a whole new list item of phone mum between buy milk and before watering water and plants. So this is editing the ordered list object to insert a new item. Another thing that people might do is add another key to this map here or they might even delete an entire entry. So the top level to do item, just delete the entire list. Why not? Maybe you have several lists or something like that. So these are the kinds of changes that people can make to these documents and what we want to do is have some way of resolving those changes. So that even if people make these changes concurrently to each other, we end up with eventual consistency. So we can model this document as a tree and we can have some data type annotations on it. So say the top level document is a map and inside it we have a list under the key to do and so on. Won't go into too much detail. So this is actually an algorithm that I developed with a colleague and we wrote this paper about it a few months ago. So if you're interested in that, you can find it online. It's called a Conflict-Free Replicated JSON Datatype. Has anyone here heard of CRDTS before? Okay a few people, cool. So this is an example of a CRDT. If you don't know what a CRDT is, don't worry. This paper is very theoretical . It looks inside, it looks kind of like this. So I'm not actually going to run through all of the operational semantics today, so don't worry about that. I'm just going to kind give the intuition about how the algorithm works and just show kind of some of the curious edge cases that occur there. Some of the stuff that we have to think about when trying to do this. So the hope is that this algorithm would allow people to concurrently edit JSON documents and merge those edits and end up with a sensible result at the end. So one example of a document, we saw the to do list example, another one might be simply a text document. So. So a text document consists of the file name, which is just a string. Consists of characters or of the body, of the actual text. So each individual characters. Like the smallest unit you can edit and like maybe there might be additional stuff for formatting, setting fonts and so on, I'm just going to leave all of that out. Another thing that's quite useful is cursor positions. So if you use Google Docs, you can see where in the document another user is editing right now. That's quite handy to see so that you don't change the same place at the same time while you're online. So you could imagine implementing that as a map from each client has a position. A position is like a location in the document and so that would allow you to keep track of other people's cursors. So I'll just focus on the body characters for now. So the ordered list of characters is what constitutes the content of the document. You might wonder, why this is, why this is a list of strings and not a list of characters. So begin footnotes. It's a list of strings because in Unicode, if you actually represent like one character, the smallest editable unit of the document is not necessarily a single Unicode code point. Because you have things like combining, combining marks, I think they're called for accents, and diacritics and various things or for emoji, I think the skin color annotation is a combining mark and so you end up with a sequence of several Unicode code points that constitutes an character from user points of view. If you're interested in this, look at the Unicode annex number 29. Anyway, end of footnote. This is completely irrelevant. I just wanted to mention that as well. I'll give a little demo of the text editor we implemented because its just kind of fun to make this a bit more interactive. So what I have here is something that doesn't work. Sorry. This is always the thing with demos. Maybe it has to be on the wifi. Okay, let's try again. Okay, good. So. So this is a very basic text editor that we implemented using this data structure, this algorithm that we developed and I've got actually two windows here, left and right, which are both instances of this text editor running and I can say hello to Berlin and it works. So you can see I can type something on one side and it appears on the other side and these two editors are actually communicating via a network connection. So they are two separate processes that are otherwise don't share anything. So they could easily be on two separate computers on other sides of the world, no problem and what I can now do is can I kill this server. So they use a web socket server here to communicate and when I kill the server, what I'm simulating is a network interupption between the two. So now both editors are offline. So this could be because actually it's a server outage or just because the clients have lost their internet connection, it doesn't really matter and so I can keep editing offline. So lets say here. So I'll say, Hello everyone at GOTO Berlin and you see it doesn't appear on the right hand side, because they're offline. So we have these two now and so I'm going to restart the server now and what we want is for all of those changes to be preserved. So make sure, look at that everyone at is still there and hope you're all doing well is still there. So when I restart the server, the editors in the background keep trying to reconnect to the server automatically and keep retrying in the background and eventually they will mange to connect and resychronize and now you see everyone at has been copied from the left to the right and the hope you're all doing well has been copied from the right to the left and we didn't have any like three way merge user interface for this. Like it just did this automatically. Now, let's have a look at how this algorithm actually works because that's kind of interesting. This is basically the same as what Google Docs does, right? So, Google Docs is a bit fussy with its offline support but if you force it into actually doing it, it essentially does the same kind of merge and I can run a bit of how the algorithm in Google Docs work. So imagine you have a document consisting of the letters H E L and O and you label each letter with the index of what position it is. So 0, 1, 2, 3 and on the left hand side we've got the green editor and on the right hand side, we've got the purple editor and each editor now makes an edit to this. So the left hand side inserts a second L character to change it to Hello and the right hand side inserts and exclamation mark at the end, making it H-E-L-O-exclamation mark and so now we want to merge these two changes that have happened concurrently and so in order to do that, we have the server to, this is, in this case, run by Google and these clients send essentially a dif or like an operation recording what the change, what change has been made by the user and sent that operation over to the server and so the left hand side says insert L at position three. So you see 0,1,2,3,4 so we inserted the L so the O has moved from position three to position four and so we inserted that L there and so this side, we have the inserting exclamation mark at the position four and so that describes the change that was made and now server forwards those changes over to the other client and so the green change, inserting the L gets forward by the server over here. Insert L at position three, position three is the right position here. So you end up with Hello exclamation mark. Which is what we wanted. However, if you think about what happens in the other direction. So this insert exclamation mark at position four, if we simply send that through unchanged, what we will get is Hell exclamation mark O because over here, we inserted the letter L at position three so the O moved along from three to four. So really we would need to change that to insert exclamation mark at position five and only then we would get the right outcome of having Hello exclamation mark and so what has to happen here is that this position four needs to be changed to position five because there was concurrently an insert at position three. So the server has to keep track of all of these things going on simultaneously and it has to transform the messages, rewriting this three to four. Some of the transformation happens on the client as well but this algorithm does actually depend on the server to do some work as well and that works. So this algorithm is called operational transformation and it's been around for quite awhile. So it was first discussed in the academic literature back in the 1980s. Although the first paper that presented this algorithm was actually incorrect and they said so in the paper. They said, heres a case in which our algorithm fails. Can someone help us fix it please? And then several people, researchers came along and proposed fixes to it, which worked, and there were several different competing algorithms there. The one that most of the, well done operational transformation based systems are kind of inherit from is called Jupiter from 1995 and that's what like Google Docs and Ethopad is based off and Google Wave which is now Apache Wave. They all use this Jupiter design which is to use a central server that does some of the transformations. Some of the others don't use a central server but instead keep track of some kind of multidimensonal hypercubes of all of the edits happening simultaneously. They start requiring quite a lot of memory, some of those algorithms. Which is why they're not used as often in practice and so this kind of works fine for Google's purposes but remember what we wanted is to be able to work within enter and encryption and so in that case we can't have a server transforming our messages because the server would then have to see the content of the messages and we want what we want is, well we want to avoid one central server because that's a single point of failure and we want to avoid that server seeing the content of our messages. So we want to be able to just forward the messages and this is where CRDT's come in. So CRDT stands for communative, no sorry, conflict free replicated data types which is a bit of a mouthful which is people just say CRDTS and essentially this is a family of data structures where several nodes can concurrently change the data and they could automatically merge and so they've got, as part of the definition of the data structure, they've got merge functions or functions which allow you to apply operations in a different order and still get the same outcome at the end and if we want to model something like a text document like what I had in our editor example, in that case we have an ordered list of characters and so the data type we want here is an ordered list and several different algorithms were, have been proposed to that (mumbles), more recent about in the last 10 years these things have come up and the one that I'll describe now and that our text editor has based off is called RGA, replicated growable array. Which came out of a Korean research group in 2011. So let me show you how this one works, in a nutshell. We start off with the same example document which is H-E-L-O but now instead of giving each letter just an index, 0,1,2,3 I'm going to give each a unique identifier. Each letter has a unique identifier which might be just 0a, 1a, 2a, 3a and we now have the same edits happen on the left side, we insert the letter L and on the right hand side, we insert the exclamation mark and every time we insert a new letter, we have to make up a new identifier for that letter and we're going to have a little rule for how we create new identifiers. So we have, they have to be globally unique and they also have the certain ordering property and so we're going to construct them as follows. Each identifier is a number and a letter and for the number, we chose one greater than the highest number we have so far in the document and so the document so far contains 0 to three as numbers so the next number we're going to pick is four and then we call the left hand one is node A, the right hand one is Node B and so that will be the letter that we use for the identifier and so if we assume that the names of the nodes are unique. So there's no other node called A, there's only one node called A and the, furthermore we assume that these numbers are generated by taking one plus the highest we have. So the numbers per node will always be unique. So that means we generate 4a on the left hand side and 4b on the right hand side. So it's the same four because they're both in the same starting position but they have two different node Ids so we get two different identifiers and so that works and now we can send these things via server or it doesn't even have to be a server that goes via P2P network or anything else you like as well. So I'll just use a server here for simplicity and instead of inserting at the particular position, we're now going to say insert L and we're going to say the idea of the new letter, which is 4a and we're going to say where to insert it by saying after position 2a and so 2a is the first L and so the new L with 4a get's inserted after 2a and on the other side, we insert the exclamation mark after 4, after 3a. 3a is the letter O and so inserting the exclamation mark with ID for b after 3a means put the exclamation mark after the O and so, we can now forward these messages. So insert L with 4a after 2a. 2a here still means the first L, so it's going to put the second L in the right place and give it ID 4a and on the other side, we can take this message and apply that. Insert exclamation mark after 3a. 3a is still the letter O even though we inserted the letter L before it. It hasn't changed the fact that the ID of the letter O is still 3a and so we can put the exclamation mark with 4b after the letter O, after 3a and so we end up with the same document in both places which is very nice. We can now just have some kind of network between these two. It doesn't have to transform the messages in anyway, it just needs to make sure it keeps retrying and eventually the messages get through. That's all we need here. We can encrypt them and it's all very nice. Now there's still a problem with this algorithm, which I wonder if anyone can spot? Yes? Deletion is one thing. Yes I, deletion is actually quite easy. So what we do for deletion is just to set a flag for every ID and say that if it's deleted then the ID actually remains there, secretly hidden, but we're just going to say okay, this O is gone now, so don't show it in the user interface. That's called a tombstone. There's not. Same position, exactly. So what if two people insert at the same position? So let's have a simpler example document here. A,b,c, with Ids 1a, 2a, 3a, and now let's say X and Y get inserted here between A and B. So the left hand inserts X and Y, gives them two new Ids according to the rules that I specified earlier. So 4a and 5a and and so what that means here is insert X after 1a, insert Y after 4a and on the other side, on the right hand side, I'm going to insert P and Q and we'll have the same rule again for assigning Ids so they'll get 4b and 5b as their Ids and they also appear between A and B and now we need to make sure that all of these letters end up in some kind of consistent order on both nodes afterwards and so lets take those operations and apply them to the other side. So insert P with the new Id 4b after 1a. 1a is the first letter A, so we insert P there and then insert Q with letter, with id 5b after 4b. 4b was the letter Q, which we've already inserted, so we can put the, with the letter P. 4b with the P. So the Q with 5b goes after the 4b so we put A, P, Q, X, Y, B, C. So now, okay that's fine. So far now, we need to make sure that on the other side, we're actually going to end up with the same thing. So we need to make sure the X ends up here after the P Q but before the B because we put the X here, then the two would be inconsistent. So, well how do we do this now? Because the message is going to be the same. The message is just going to be, insert X with ID 4a after 1a but if we put it straight after 1a, then the X will go here and we have an inconsistency. So I'm gong to introduce a small algorithm which uses these Ids and that is, look at the Id of the incoming operation here. When you're applying an operation that came in from a different node, in this case that's 4a. So 4a is the incoming operation and we first find the position after which we want to insert. So 1a is here. So we start here, after the A and we look at the next, the ID of the next element in that list. Which is 4b and if that ID here, 4b, is greater than the incoming element, ID 4a, then we're going to skip over it and if it's greater, the next one is greater again, we'll again skip over that and we keep skipping until we find an element that is less than the ID of the incoming operation. So 2a is less than 4a. So in this ordering here, we look at first the number than the letter. So here 4b is greater than 4a because B is greater than A. 5b is definitely greater than 4a because five is greater than four but two is less than four so we know now we have to insert between the 5b and the 2a. Whew and then for the final operation is actually easy. So this insert Y with Id 5a after 4a while 4a is here. So 5a just goes after it, that works, and if you're wondering the skipping here doesn't apply on this side because here the 4b is greater than 4a so we inserted this 4b here, we didn't skip over 4a because 4b is greater than 4a and so this actually works and you kind of have to actually prove it mathematically to convince yourself this really works in all cases but we have proved it and so we actually believed it and so hoorah, we now have an algorithm which will allow us to make these changes concurrently without any transformation, without any coordination and just allow us to end up in the same place. So let's say our document is Hey guys and we decided that is not gender neutral enough, so one person is going to change it to hey everyone and the other person is going to change it to hey folks and what does our editor do in this case now? So I described just now how deletion works and so deletion just means setting a flag on these so if two people are concurrently delete the same letter, they both delete the G of guys, well that deleting it once is the same as deleting twice, it doesn't get anymore deleted through being deleted several times. So those letters just get deleted and the insertion stay because insertion doesn't replace another insertion. So as these two merge, what you're going to end up with is the two insertion concatinated, so it will be hey everyone folks or maybe it will be hey folks everyone. So the order of those two is arbitrary, that just depends on coincidence of how the networking happens to work out and you know there's no order that is any, one order isn't any better than the other. The problem is just that what we have here, everyone folks is not an English word and so you kind of end up with junk in the document. Though this is actually what Google Docs does as well. So at this point I'm willing to just say, okay we'll tell the users that look here, this doesn't pass the spell checker and it maybe can pop up a warning saying hey several people edited the document at the same place. Actually Google Docs doesn't even do this warning. It just leaves humans to spot it by themselves and it seems to work in practice so I think at this point we just say okay, it's good enough. We're not going to try to do a grammatical analyze of the sentences and do the merges automatically so that they obey English grammar because that's just not going to work and people have tried that and it simply doesn't work. So we've got this CRDT, this data structure for all the lists which several people can change at the same time. Described deletion and how that works. There's still a lot of open questions there. So the problem with deletion here is we need to actually remember for a long time afterwards where a particular ID is because we use these Ids as a position in the list like address like a pointer and so we can't just delete, if something gets deleted from the document, we can't then just remove it and forget about it entirely because then some operation would come in and insert after 500 and 3a and we go 500 and 3a, sorry that was deleted. I don't know where we put that insertion. So we have to keep those things which are called tombstones. Also haven't talked about how to do undo, like you know, control zed type of undo or how to reorder the elements in the list. That's kind of interesting or how to make all of these Ids efficient so it doesn't take too much storage. So still a lot of open questions there. I think the basic algorithm is kind of fine but actually putting it into practice is still going to need a bit of work but I want to talk about some other stuff as well, actually so we were talking about these JSON documents earlier and so interesting things happened when several people concurrently change JSON documents that don't happen just in the case of a text document. So let's say we've got here, it's a bit of a contrived example I'm afraid, a map of colors and so this is, imagine this is an actual structure. So it's not like curly brace qoutation mark CO etc as a text document but it's actually the JSON syntax tree. Described here and so one person inserts a new color, the color red, into this map and so you would then end up with a map containing blue and red and another person concurrently decides actually they want to wipe out the entire contents of the map so it sets it to the empty map and then adds green into the map and so how do we now merge these concurrent changes that have happened here? What happens, what's the outcome, what is even the right behavior here? What do we expect? So we can think about it systematically and say, well okay blue, does it contain blue? Well blue was deleted on the right hand side by setting the whole map to empty so I guess blue should not be in the final result cause it was deleted and the left hand side didn't touch it. What about red? Red intially didn't exist but it was inserted on the right, on the left hand side and the right hand side didn't touch red so I guess we need red to be in the final result. What about green? Green was inserted on the right hand side and it was not touched on the left hand side, so yes we want green to be in the final results. So our final expected outcome is to contain red and green but not blue. That seem logical? I guess so. So we can do that and our algorithm does this. What this means here is that we have to keep track of what this setting the map to empty means because if you just take this and say, on the left hand client, you first add the red in and then you take that change that came from the right hand side which is setting the map to empty and if you just apply that naively and say you've got an operation that has sent the map to empty, well then you're going to wipe out the red because the red was added concurrently. So we have to somehow remember what the state of the map was at the time when you set it to empty. So we have here, the thing is known as a causal context or something like that. In databases like react or example, has anyone used react? They have this data types feature, okay. It works very similair to this that you have this little extra bit of information that you pass around in a HTP header and that, the purpose of that is to keep track of what is the state, what was the state of the document at this time when you saw it so that then later you know what the changes need to apply to. So let's look at a different example. Here we've got again an to do list example and we've got buying milk and it is not yet done and so on the left hand side an edit happens and it just deletes buy milk from the list because maybe it's been done or maybe we no longer need milk or whatever, it's just removed from the list. On the right hand side, somebody sets done to true. So taps the button to set buying milk done to true. What happens if we try to merge these two different updates? We can apply the same logic as we have just now with the colors with the nested maps. So with the colors what we said is okay, does it contain each of the different things depending on who edited what? Okay so the title of buying milk, on the left hand side that was deleted because we deleted the entire to do item. So I guess the title should not be in the result. On the right hand side, we didn't touch the title. So that's got to be deleted. What about done false? Well done false was deleted on the left hand side and over written on the right hand side with done true. So I guess done false is out. What about done true? Done true did not appear on the left hand side. On the right hand side, done true was added so we want done true to be in the end result. So if we apply exactly the same reasoning logic what we end up with is a to do item that contains the flag done true but no title and this is kind of not what you expect because like we've applied exactly the same logic but what we've ended up with something is that looks weird and it looks weird because we kind of have this implicit scheme idea we expected to do item to always have a title and the done field but this merge here has essentially tampered with our schema. So what do we do here? Maybe we need a schema to explicitly say exactly what fields something must have but in that case what do we do? So if somebody changes, if one person changes a field within an object that has a schema and another person deletes that entire object, well do we say the deletion wins? So in that case, any changes to stuff within the deleted item are just going to be lost but we said earlier we don't want to lose data so how do we resolve this? Well maybe we say, actually the, if somebody concurrently changes this and sets done to true then we actually going to bring the title back as well. So even though the title was actually deleted on the left hand side, we kind of ressurect this whole deleted item and say, okay it's gonna both the title and one true but then well we forgot about the fact that somebody deleted this item. So it seems like something has got to give here and we don't really know what it is. We could say that maybe overwriting something with an empty map should not have the same semantics as deleting it and re adding it again or maybe it should have the same semantics. So I'm afraid this is going to end on a slightly depressing note which is, like we simply don't know how to expose these kind of concurrently editable data structures to application in a way that is not horrendously computing and so I think that, I think there's a lot of value in having these kind of data structures that you can just merge automatically and not have to worry about writing manual conflict resolution code but at the same time, conccurenacy still is hard, even if you abstract away the concurrent communication and everything, you've still got the problem that somehow you can sometimes end up in these merge situations where's there no one right way of doing it and so if any of you have ideas, I'd love to hear them. Otherwise, we're continuing to work on this and just try different approaches, see what kind of APIs make sense to developers. We could go back to the old bad days and say okay, we're just going to have this, this oh crap scenario, we're just going to let users resolve all of the edits manually. I don't think that's really friendly to users and so I think that like the popularity of things like Google Docs and indeed like, Meld or things like that for conflict resolution get, shows that I actually, people do need a bit of tooling and help in order to, in order to resolve conflict. We could say, okay well just put everything on a central server and serialize everything. That's an option too but again, we've got this problem that you require a network communication all the time and if stuff doesn't work offline, which is also a shame. So I think it is worth working on this problem but it is an open problem. If you're interested in more details on this, I've put, tweeted the slides already so you can find it there and there are links to all of the papers. Heres the second page and here is a third page of references. So our paper is one of those and there are also several others about different CRTDs and different operational transformation functions. So that goes into vast amounts of detail. Here is a book that I've been writing that I've just sent off to the publishers just two days ago. So you an get an early release of this online already. This is, it's not specifically about merging, it's very broad, kind of introduction to the architecture of databases and what do databases do under the hood, also. So if you're interested in this sort of thing, I'd love if you could check it out. Maybe give me any feedback if you have any thoughts about it. That would be wonderful. Thank you very much all for coming and I hope you have a great rest of the conference. (applause) Do we have a minute or two for questions? I'm a bit short of time. We do? Okay, does anyone have any questions or comments or anything? We have some on the app. That's alright. So the question is, "Apart from the operational transformation algorithm, is there any algorithm which uses weighted operators to decide the last operation? For instance, the delete operation has the lowest impact for that loses. In terms of weight maybe, neural network probabilities making decision faster?" So you can used weighting except that doesn't fundamentally resolve the problem. Which is that you can end up then in some kind of scenarios where you've just got an impasse and you've got two things with the same weighting and one of them has got to win and so you can then arbitraily decide whether one thing should win over another and so that's what I was getting at with kind of these trade offs here of like, you could specify maybe in a schema maybe some kind of semantic annotation saying we want delete to win or we want update to win. The problem there is that it's really hard, I think, to communicate that to developers of like what does that actually mean? If you don't have PHD in distributed systems, will you still be able to understand what this flag in your schema actually means? So I think just making stuff comphrensiable in a way that like, somebody can just go and build an app and they don't have to know about all of the internal details of how conflict resolved internally. I think would be very good and so maybe priorities is one way of doing that but I'm not sure. The other question just there was, "How did I make my slides?" Which is using an iPad, I draw them by hand using a iPad app called Paper by a company called 53. - [Man] I think you should wrap up. - Okay, thank you very much for coming. (applause)
B1 server document data insert hand side list GOTO 2016 • Conflict Resolution for Eventual Consistency • Martin Kleppmann 41 12 colin posted on 2017/04/26 More Share Save Report Video vocabulary