  • Okay.

  • Aloha, Hawaii.

  • I'm Dave Aronson, the T. Rex of Codosaurus LLC, and I flew over here on my pterodactyl to teach you how to kill mutants.

  • So what are those, in our universe,

  • that of software development, not comic books?

  • They're something out of mutation testing.

  • So what on infinite Earths is mutation testing?

  • You might look at the name and think it's a way to test the mutations used in genetic algorithms or something.

  • But no, it's actually a way to test both our code and our unit test suite by using mutations.

  • Its most unusual benefit is to help ensure that our tests are strict, by finding the gaps that let our code get away with unintended behavior.

  • Lack of this strictness usually comes from lack of tests, of course, or from tests that were poorly written, or poorly maintained and just didn't keep pace with changes in the code.

  • Speaking of the code, mutation testing can also help ensure that our code is meaningful, by which I mean that any tiny little change to it has a noticeable effect on its behavior.

  • Lack of meaning usually comes from code being redundant or unreachable or otherwise not having any real effect.

  • Mutation testing puts these two together by checking, first, that some tiny little change did indeed have a noticeable effect,

  • and then, that our unit test suite spots that change and fails.

  • Not all the tests have to fail, but each change should make at least one unit test fail.

  • Now that's the plus side.

  • But as Fred Brooks told us back in '86, and as we were reminded the other day, there's no silver bullet. Besides which, they're for killing werewolves, not mutants.

  • The first drawback is that it's rather hard labor on the CPU and therefore usually rather slow.

  • We don't want to mutation test our entire codebase on every save; maybe over a lunch break for a small system, or overnight or over a weekend for a larger one.

  • Fortunately, most of the tools do include an incremental mode, so we can test just what we changed since last time. That, maybe we can do on every save, if we save early and save often like we know we're supposed to,

  • but don't always.

  • Another drawback is that it's not at all a beginner-friendly technique.

  • It tells us that some particular change to the code made no difference to our test results.

  • But what does that mean?

  • You'll find that interpreting the results of mutation testing involves a lot of asking,

  • "So what does that mean?"

  • recursively.

  • It takes a lot of interpretation to figure out what a mutant is trying to tell us.

  • They're almost as incoherent as zombies, but with a much larger vocabulary.

  • They're not always on about brains.

  • They're usually trying to tell us that our code is meaningless or our tests are lax or, of course, both.

  • But it can take a lot of time and effort to figure out exactly how, and sometimes it's a false alarm.

  • For instance, the mutant might not have made any test fail, but it might not have made any actual difference in the code's behavior either, at least not in anything we actually care about.

  • So those are the main pros and cons.

  • But what does mutation testing actually do?

  • It mutates, hence the name, copies of our code to create test failures, otherwise known as faults.

  • So mutation testing is a fault-based software testing technique, and this means it's sort of related to something you might already be familiar with:

  • Chaos Monkey, from Netflix.

  • Just like Chaos Monkey uses faults to help Netflix discover flaws in their error recovery process,

  • mutation testing uses faults to help us discover flaws in our code and our unit test suite.

  • But the way this works is kind of upside down from what Chaos Monkey does.

  • Chaos Monkey is best known for injecting faults into Netflix's production network.

  • If the customers don't notice and the metrics still look fine, then Netflix knows that their error recovery is working great.

  • But mutation testing injects not faults but changes.

  • It doesn't know whether these changes will produce faults or not.

  • We hope they all will,

  • but that's actually up to the unit test suite.

  • It injects them not into our actual network but into copies of our code.

  • And it does this in our test environment, not production.

  • Oh, and if everything still passes, that doesn't mean everything is okay.

  • That's when we have a problem.

  • Remember, each change should make at least one unit test fail.

  • So how does it do all of this?

  • Let's peel back one layer of the onion and look at it from a high-level view. First, our chosen tool breaks our code apart into pieces to test.

  • Usually these are going to be our functions.

  • For each function, it then tries to find the tests, because if there aren't any tests, well, it would be kind of redundant, not to mention impossible, to run its tests.

  • Assuming it finds any, the tool then makes mutants out of that particular function.

  • To do that, it looks closely at the function to see how it can be changed, and for each way that function can be changed,

  • the tool makes one mutant with just that one change in it. After the tool is done creating all the mutants it can from a given function, it iterates over the list.

  • And now we get to the heart of the matter. For each mutant made from a given function, the tool will run the unit tests for the original function, but using the mutant instead.

  • And if one of those unit tests fails, this is called killing the mutant. Quick side note:

  • some people object to the violent nature of this metaphor, especially since in the comic books, mutants are often used as a metaphor for marginalized people.

  • I'm trying to come up with a better term, like maybe "cover" or "rescue" or some such.

  • I couldn't come up with something good in time to change this talk, though. Well, anyway.

  • Killing the mutant means that the tiny little change that the tool made in order to create the mutant did indeed have a noticeable effect on the behavior,

  • and our test suite was strict enough to notice that change and fail at least one test.

  • After one test has failed, the tool will stop running tests against that mutant and move on to the next one.

  • We don't care how many more unit tests that mutant might make fail.

  • Like so much else in computer science, we only care about ones and zeros.

  • But if a mutant lets all the function's unit tests pass, then it has the superpower of mimicry, skilled enough to fool our tests.

  • This usually means that our code is meaningless or our tests are lax or both, and now it's up to us to figure out exactly how.
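
The loop described so far can be sketched roughly as below. This is only an illustration under stated assumptions, not how any particular tool is implemented: `runMutationTesting`, the `{ description, fn }` shape of a mutant, and the idea of a test as a function that throws on failure are all hypothetical.

```javascript
// Hypothetical sketch: each mutant is { description, fn }, and each test is a
// function that receives an implementation and throws if it misbehaves.
function runMutationTesting(mutants, tests) {
  const survivors = [];
  for (const mutant of mutants) {
    // Array.prototype.some stops at the first failing test, so we never run
    // more tests against a mutant that is already dead.
    const killed = tests.some((test) => {
      try {
        test(mutant.fn);
        return false; // test passed: the mutant is still alive
      } catch (err) {
        return true; // test failed: the mutant is killed
      }
    });
    if (!killed) survivors.push(mutant.description);
  }
  return survivors;
}

// Pretend the original function is (a, b) => a - b.
const mutants = [
  { description: "a + b", fn: (a, b) => a + b },
  { description: "b - a", fn: (a, b) => b - a },
];

// A lax test suite: subtracting zero cannot tell "-" from "+".
const tests = [
  (subtract) => {
    if (subtract(5, 0) !== 5) throw new Error("5 - 0 should be 5");
  },
];

const survivors = runMutationTesting(mutants, tests);
console.log(survivors);
```

Here the `b - a` mutant is killed, while the `a + b` mutant passes every test, which is the signal that the test inputs are too lax.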

  • Now let's peel back another layer of the onion and look at some technical details of how this works.

  • First, our tool parses our code, usually into an abstract syntax tree, or AST.

  • I think those were mentioned on Wednesday.

  • I know these boxes are a little small to read, but we don't need to actually understand this AST in detail.

  • After our tool makes an AST out of our code, it traverses that tree, looking for subtrees that represent our functions.

  • After finding them, it handles them pretty much like I described earlier, starting with looking for the tests.

  • But how does it do that?

  • This usually relies on us developers either annotating our code or following some kind of naming convention.

  • This is sometimes supplemented or even replaced by the tool looking at what functions each of the tests calls.

  • Next, the tool makes the mutants, and to make them from an AST subtree, it then traverses the subtree, just like it did to the whole thing.

  • But this time it's not looking for even smaller subtrees, twigs or something, to extract, but nodes where it can change something,

  • maybe substitute a different kind of node, or whatever.

  • For instance, suppose our tool was traversing the AST I showed earlier, and it's gotten down to this not-equals comparison, following those red lines.

  • For each way it can change that node, it will make a fresh copy of that entire subtree with just that node changed in that one way.

  • After it's done making as many copies, which are the mutants, as it can by mutating that node,

  • it'll continue traversing this subtree and do likewise to the rest of the nodes.
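
As a rough illustration of that traversal, here is a minimal sketch over a tiny hand-built tree. The node shape, the `REPLACEMENTS` table, and the helper names `makeMutants` and `toSource` are assumptions made up for this example; real parsers and mutation tools use richer trees and their own lists of changes.

```javascript
// Hand-built AST for the expression (a + b) !== c. Hypothetical node shape.
const ast = {
  type: "BinaryExpression",
  operator: "!==",
  left: {
    type: "BinaryExpression",
    operator: "+",
    left: { type: "Identifier", name: "a" },
    right: { type: "Identifier", name: "b" },
  },
  right: { type: "Identifier", name: "c" },
};

// Assumed table of ways each operator may be changed.
const REPLACEMENTS = {
  "!==": ["===", "<", ">"],
  "+": ["-", "*"],
};

// Walk the tree. For each way a node can be changed, make a fresh copy of the
// WHOLE tree with just that one node changed in that one way.
function makeMutants(root) {
  const mutants = [];
  function visit(node, path) {
    if (node === null || typeof node !== "object") return;
    if (node.type === "BinaryExpression") {
      for (const op of REPLACEMENTS[node.operator] || []) {
        const copy = JSON.parse(JSON.stringify(root)); // deep copy
        let target = copy;
        for (const key of path) target = target[key]; // find the same node in the copy
        target.operator = op;
        mutants.push(copy);
      }
    }
    // Continue traversing and do likewise to the rest of the nodes.
    for (const key of Object.keys(node)) visit(node[key], [...path, key]);
  }
  visit(root, []);
  return mutants;
}

// Turn a tree back into readable source, for display only.
function toSource(node) {
  if (node.type === "Identifier") return node.name;
  return `(${toSource(node.left)} ${node.operator} ${toSource(node.right)})`;
}

const mutants = makeMutants(ast);
console.log(mutants.map(toSource));
```

Each mutant differs from the original in exactly one node, and the original tree is left untouched because every change is made on a copy.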

  • I've been talking about it making changes.

  • So what kind of changes am I talking about?

  • There are quite a lot.

  • It can change a mathematical, logical, or bitwise operator from one to another.

  • In cases where it can substitute one of a totally different category, it can even do that.

  • For instance, in JavaScript, we can treat anything as Booleans, so x times y could become x and y.

  • It could swap the order of operands, in cases where that matters.

  • It can change a comparison from one to another.

  • It could insert or remove a logical negation.

  • It can remove a condition or a loop.

  • It could scramble or truncate the argument lists of function calls or function declarations.

  • It can replace the function's entire contents with returning a constant, returning either of the arguments, deliberately raising an error, all kinds of things, even nothing at all.

  • If the language permits, and JavaScript does, it could change

  • a literal or a variable or an expression or a function call to some other value, even one of a completely different type, like changing a number to,

  • uh, if I may quote Sméagol, string or nothing. There are many, many more.

  • But I trust you get the idea.

  • From here on, there are no low level details I want to add.

  • So let's look at some examples.

  • We'll start with an easy one.

  • Suppose we have a function like this.

  • Think about what a mutant made from this might return,

  • because that's what our unit tests are almost certainly looking at.

  • This doesn't seem to have any side effects.

  • Mainly such a mutant could return results such as any of these and many, many more.

  • But I had to stop somewhere.

  • Now, suppose we have one test, like so. Yes, this is a rather poor test. But even so, the vast majority of these mutants, the ones that would return these results, would still get killed by this test: the ones shown here in crossed-out green.

  • But addition, multiplication, and exponentiation in the reverse order all still get us the correct answer, and would therefore survive the test.

  • We know this because when we run our tool, it usually gives us a report that will look kind of sort of like this. The exact format, amount of context, and so forth

  • will vary enormously, depending on exactly what tool we're using,

  • but the information should be pretty much the same.

  • And that is that if we change the function called power, which is in file demo.js at line 42, in any of four different ways, all its unit tests would still pass.

  • And those four ways are to change line 42 to swap the arguments, or line 43 to change the exponentiation into addition or multiplication, or line 43 to swap the operands.
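
The slides are not part of this transcript, so the following is only a guessed reconstruction of the example: a `power` function, a handful of hypothetical mutants written out by hand, and the one weak test (two to the second power is four). The mutant names and the `passes` helper are made up for illustration.

```javascript
// Assumed reconstruction of the function under test.
function power(x, y) {
  return x ** y;
}

// A few mutants a tool might generate (hypothetical selection, hand-written).
const mutants = {
  "swap arguments": (y, x) => x ** y,
  "** becomes +": (x, y) => x + y,
  "** becomes *": (x, y) => x * y,
  "swap operands": (x, y) => y ** x,
  "** becomes -": (x, y) => x - y,
  "** becomes /": (x, y) => x / y,
  "return first argument": (x, y) => x,
};

// The deliberately poor test: power(2, 2) should be 4.
const passes = (fn) => fn(2, 2) === 4;

// Any mutant that passes the test survives.
const survivors = Object.keys(mutants).filter((name) => passes(mutants[name]));
console.log("original passes:", passes(power));
console.log("survivors:", survivors);
```

With inputs of two and two, swapping arguments or operands changes nothing, and addition and multiplication happen to give four as well, so those four mutants survive while the rest are killed.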

  • So what is this set of surviving mutants trying to tell us?

  • A good start to figuring that out is to ask ourselves: how are these mutants surviving?

  • And the usual answer is, well, they give the same result or have the same side effect as our original code. To determine how that happens,

  • a good start is to look at one mutant along with one test that it passes.

  • So let's start with the plus mutant.

  • Looking at this in combination with the test makes it pretty clear that this particular mutant survives because two plus two is the same as two to the second power.

  • So in order to kill this mutant, cover it, rescue it, whatever,

  • we need to have at least one test that uses inputs such that x plus y is not the same as x to the y. We can either add a test or tweak our existing test,

  • maybe to something like this. Two plus four is six, which is not 16,

  • so this will kill the plus mutant.

  • Better yet, two times four is eight, also not 16.

  • So as a side benefit, it kills the times mutant as well.

  • But the argument-swapping mutants still survive.

  • But that's okay.

  • We don't need to be a superhero about it and kill them all at once.

  • We can attack them separately.

  • To do that, we can once again either add a test or tweak our existing test, maybe to something like this. Three squared is nine, which is not eight,

  • so that will kill the argument-swapping mutants. But also, two plus three is five and two times three is six.

  • Both of those are not eight,

  • so the other mutants stay dead, and we don't get any zombie mutants. With these inputs,

  • the correct operation is the only simple, common one that will get us the right answer.
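
The arithmetic above can be checked mechanically. In this small sketch, the names `survivors`, `candidates`, and `report` are hypothetical, and the two argument-swapping mutants are represented by a single `y ** x` entry because they compute the same thing.

```javascript
// The mutants that survived the original power(2, 2) test.
const survivors = {
  "x + y": (x, y) => x + y,
  "x * y": (x, y) => x * y,
  "y ** x": (x, y) => y ** x, // stands in for both swapping mutants
};

// Candidate test inputs discussed in the talk.
const candidates = [[2, 2], [2, 4], [2, 3]];

// For each candidate, list the mutants that would still give the right answer.
const report = {};
for (const [x, y] of candidates) {
  const expected = x ** y;
  report[`power(${x}, ${y})`] = Object.keys(survivors).filter(
    (name) => survivors[name](x, y) === expected
  );
}
console.log(report);
```

Testing with two and four kills the plus and times mutants but leaves the swapped one alive, since four squared is also 16; testing with two and three leaves none alive.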

  • This may make mutation testing sound pretty simple, but this was a downright trivial example.

  • We could easily think up test inputs to make pretty much any reasonable mutant give different results from the original.

  • We could use, for instance, three to the fifth, or vice versa.

  • Whatever.

  • Lots of ways to skin that Flerken.

  • So let's look at a more complex example.

  • Suppose we have a function like this. sendMessage uses sendBytes to send as many bytes as sendBytes can handle, over and over, picking up where it left off last time, until the message is all sent: a fairly common pattern.

  • Now, a mutation testing tool could make lots and lots of mutants out of this, but the one I want to show you is this.

  • It's an example of removing a loop control by deleting those two lines with minus signs there.

  • Well, suppose that this mutant survives our test suite, which consists mainly of this.

  • There's a little more that I'm not going to show you quite yet, dealing with setting the size and creating the message.

  • But even just the fact that this mutant survives tells us something.

  • And that is that if a mutant that only goes through this loop body once has the same result as the original code does in the tests, then our tests are only making the code go through the loop body once.

  • So what does that mean?

  • It means that we're only testing sending a message small enough that sendBytes can handle it in one chunk.

  • The most likely cause of that, in turn, is that we're just not sending a big enough message to make it go through the loop body twice or more. In this case, for instance, we might have a maximum chunk size,

  • what sendBytes can handle in one shot, of ten thousand bytes, yet we're only testing with a tiny little three-byte message. And there, the fix is pretty clear.

  • We can just construct a larger message.

  • Take the size, add one,

  • make that big a message, and

  • there we go.
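
Since the slide code is not in the transcript, here is a guessed sketch of the situation. Everything in it is an assumption for illustration: the `MAX_CHUNK` constant, the fake `makeTransport` test double, the signatures of `sendMessage` and `sendBytes`, and the hand-written loop-removed mutant.

```javascript
const MAX_CHUNK = 10000; // assumed: the most sendBytes handles in one call

// Fake transport that records what it was given (hypothetical test double).
function makeTransport() {
  const transport = {
    received: "",
    // Accepts at most MAX_CHUNK characters and returns how many it took.
    sendBytes(bytes) {
      const chunk = bytes.slice(0, MAX_CHUNK);
      transport.received += chunk;
      return chunk.length;
    },
  };
  return transport;
}

// Original: keep sending, picking up where we left off, until all is sent.
function sendMessage(message, sendBytes) {
  let sent = 0;
  while (sent < message.length) {
    sent += sendBytes(message.slice(sent));
  }
}

// Mutant: the loop control is removed, so the body runs exactly once.
function sendMessageMutant(message, sendBytes) {
  let sent = 0;
  sent += sendBytes(message.slice(sent));
}

// The test: did the whole message arrive?
function deliversWholeMessage(send, message) {
  const transport = makeTransport();
  send(message, transport.sendBytes);
  return transport.received === message;
}

const small = "abc"; // the tiny three-byte message
const big = "x".repeat(MAX_CHUNK + 1); // take the size, add one

const results = {
  originalSmall: deliversWholeMessage(sendMessage, small),
  mutantSmall: deliversWholeMessage(sendMessageMutant, small),
  originalBig: deliversWholeMessage(sendMessage, big),
  mutantBig: deliversWholeMessage(sendMessageMutant, big),
};
console.log(results);
```

With the small message the mutant is indistinguishable from the original, so it survives; with a message one byte over the chunk size, the mutant delivers only the first chunk and the test kills it.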

  • But maybe it's not this particular cause.

  • Maybe we did test with the largest permissible message out of a predefined set of messages, or at least message sizes.

  • For instance, here we have small and large sizes, and we did test with the large, and yet the mutant survives.

  • In other words, we're still only going through that loop body once, because sendBytes is handling the whole message in one chunk.

  • So what's the mutant trying to tell us now?

  • Now it's trying to tell us that a version of sendMessage with the looping removed will do the job just fine.

  • And if we do that, and then remove everything else that that makes redundant, we wind up with this.

  • And now it's pretty clear that the ultimate message is that the whole sendBytes, excuse me, sendMessage function may well be redundant.

  • Now, I say "may well be" rather than "is" just because in real-world code, there may be some logging and error handling and whatnot

  • that we need to put in sendMessage, but at the very least, the looping was redundant.

  • Fortunately, when it's this kind of case, the solution is clear and easy.

  • Just get rid of all the extra stuff the mutant didn't have.

  • That will also make our code more maintainable by getting rid of unnecessary cruft.

  • So to summarize, mutation testing is a powerful technique to ensure that our code is meaningful and our tests are strict, but it's not so easy.

  • One thing that is easy:

  • It's easy to get started with in terms of setting up the tools and, if need be, annotating our tests, which may be tedious, but at least it should be easy.

  • But it's not so easy to interpret the results.

  • Nor is it easy on the CPU. Even if these drawbacks mean that it might not be appropriate for our current projects

  • right now, I still think it's just a really cool idea, in a geeky kind of way.

  • If you'd like to try mutation testing for yourself, here's a list of tools for popular languages and platforms, and some other ones. I doubt many of you are doing Fortran 77 these days.

  • As for JavaScript,

  • the only one I know of is Stryker.

  • There used to be one that was a plugin for the Grunt task runner, but that project has folded and its code has been migrated into Stryker. Does anybody still need some time to take pictures?

  • I saw some cameras up there.

  • Okay, lastly, a couple shout outs.

  • First, to Toptal,

  • a consulting network I'm in, whose speakers network helped me prepare and practice this presentation.

  • Please use that referral link

  • if you want to hire us or join us.

  • And secondly, to Markus Schirp, creator of Mutant, a mutation testing tool for Ruby, the main

  • one I've actually used. He's been very willing to answer my ignorant questions, there being no such thing as stupid

  • questions, and critique this presentation, at least the longer forms; this is the shortest one I've done so far.

  • And if you have any questions, well, we're not supposed to do Q and A.
