
  • (gentle instrumental music)

  • - Hi.

  • I'm Praveena.

  • Just the logistics:

  • please make sure you rate the session,

  • and if you have any questions,

  • please do put them in there as well.

  • I'm really interested to hear what you have to say

  • or what have you observed and things like that.

  • I know that Sam has already given me an introduction,

  • but I thought I'd just do it again.

  • I'm a software engineer at Neo Technology

  • or like how my Swedish colleagues would say, Neo4j.

  • I want to talk about microservices.

  • But before I proceed to present my case study,

  • I just wanted to understand,

  • to get a bearing on the audience,

  • so that I don't use a ton of jargon

  • and I don't explain things

  • that this audience already knows

  • so it'll be easier to skip through the parts,

  • the obvious ones at least.

  • How many of you have worked with microservices?

  • Okay.

  • That's good.

  • How many of you are developers?

  • Okay.

  • How many of you work in agile

  • methodologies?

  • How many of you have used like ...

  • Good.

  • That makes things really simple.

  • Microservices is a software architecture style

  • where you compose complex applications

  • through small independent processes.

  • When you look at microservices,

  • almost all major conferences around the world

  • have a separate track dedicated to microservices.

  • It's not just in terms of software architecture style;

  • even specific

  • language conferences have specific tracks on microservices.

  • So it's been gaining a ton of momentum in the recent years.

  • The places which

  • use microservices are

  • companies like Pinterest, Twitter, Hailo, Uber,

  • and a ton of these services which I already use

  • and I'm sure like many of you are end customers

  • of these services.

  • Netflix at least, come on.

  • Who isn't a customer of Netflix?

  • When you look at these companies,

  • you see that all of them have been there for

  • maybe a maximum of a decade, you could say.

  • They have a lot of young engineers working there.

  • In terms of legacy, they don't have

  • much to carry around, in some sense.

  • The things that microservices brings in are like ...

  • The kind of small gifts that come along

  • with using microservices are

  • the ability to do quick deployments:

  • you can release something to production

  • and then look at the feedback

  • and try and adapt to the feedback

  • as quickly as possible,

  • and you can scale up and you can scale down services

  • however you like based on how your user consumption is.

  • It's very, very easy to enhance features

  • or to add features.

  • The path from when you start development

  • to when it is deployed is covered really rapidly.

  • These are the kind of small gifts

  • that come along with microservices.

  • But what about age old systems?

  • I worked in a place as a consultant where

  • we were in an environment which had

  • Microsoft servers behind

  • this huge, what do you call it, architecture,

  • and we didn't know what was happening there.

  • But what we knew was that

  • the customers were kind of seeing patterns where they saw

  • new startups which came into their space,

  • their own publication domain,

  • where they were able to quickly add features,

  • the ones that the customers had been trying to add for years,

  • and then eat into their user market.

  • They were in a position where they were still

  • like market pioneers

  • but they slowly saw their user base

  • being eroded away by tons of new startups in their space.

  • They really wanted to change how they work.

  • They wanted to change how they deliver software

  • because ultimately that's what like,

  • that's the interface to their end users.

  • This web page that they were like ...

  • That is what they were serving to the end users.

  • I was,

  • I mean, not just me,

  • my team was dealing with a 20-year-old system

  • which was written in C

  • and the team did not have any business analyst

  • at the beginning.

  • We were basically given a part of the codebase,

  • not the entire one, a part of the codebase.

  • We were told like, "This is the thing.

  • "You don't have any business analysts.

  • "You just have to go ahead and re-engineer the code."

  • We don't know how things work or why things work,

  • but we want it to be exactly the same,

  • because we don't want to lose any user base

  • because some features which users were used to before

  • are not there.

  • They also had an ultimate goal:

  • they were moving their development center

  • across continents.

  • They wanted to say, "We've got a new development center

  • "and we are going to transform how this thing works,

  • "and we want to prove the success."

  • Remember the good old days when you were taking three months

  • to just release, not develop, just release.

  • We're done with that.

  • Now, what we want you to do is basically write code

  • and then release it in a week, maximum.

  • That's the target that they were looking at.

  • With any business critical application,

  • you always have tight deadlines which was the case

  • in our system as well.

  • When we saw

  • how people were tackling problems like these,

  • we saw microservices as the future,

  • this

  • great world

  • where it answered every single problem that we were having.

  • We wanted to release quickly.

  • We wanted to ...

  • Gauge? Gouge?

  • - Gauge. - Gauge, yeah.

  • Gauge customer feedback.

  • Yeah, basically.

  • When we looked at microservices,

  • we felt that it's the future.

  • Like, yes, that's the answer.

  • That's where we want to go to.

  • But it also seemed like a Utopian mystery to me

  • because when you look at ...

  • Just to get some basic definitions.

  • Utopia is like an imagined place where

  • everyone knows exactly what they're supposed to do

  • and everyone has a purpose in that world.

  • A mystery is something that,

  • I mean, the word just explains what it is:

  • it's impossible to understand or explain it.

  • When I was given this project

  • and I was looking at microservices,

  • that's exactly how I felt.

  • I felt like

  • all of this is great,

  • but it's used by companies which have been there for 10 years

  • and they basically know what they are doing

  • and they know their software well.

  • Whereas we were in that place.

  • It looked a lot like Utopian mystery to us

  • but we still decided to press on

  • because we were like, "We need to start delivering

  • "and let's just do something and see how it works out."

  • We decided to use microservices for our application.

  • We initially thought that

  • our monolithic application is going to be like that,

  • this

  • beautiful

  • multi-tier, tire?

  • Tier.

  • This is great audience.

  • Thank you.

  • Anyways, multi-tier cake.

  • We were all balancing it greatly.

  • We thought that when we changed it into microservices,

  • it was going to look like that, where you have

  • the same tiers of architecture

  • but you have smaller cupcakes,

  • where it's divided into really small, tiny nuggets

  • and you can just go ahead and bite into them.

  • Reality was (chuckles) much like this,

  • like literally.

  • I mean, not literally, figuratively it is like this.

  • I don't want to be ...

  • It was like this.

  • It isn't an easy thing

  • when you're dealing with a big pile of shit

  • and now you have to deal with 10 piles of little poo.

  • It's not like ...

  • Oh, my earring.

  • Sorry.

  • Anyways, it's not an easy problem to solve.

  • These are basically like ...

  • Oh, I ran a bit there.

  • These are basically 10 lessons

  • and if I could, I could talk at length

  • about each one of these and how important they are,

  • but I'm just giving you

  • a big gloss over

  • what I learned by using microservices.

  • The first lesson that we learned from doing things wrong was

  • we have to keep it small.

  • What do you mean by small?

  • What does micro in microservices mean?

  • Is it the number of services that you have in production?

  • To which

  • I'm gonna say the answer is no, it's not that.

  • Why is it important to have small services?

  • It is important to have small services

  • so that you can basically rewrite the entire service

  • if you want.

  • You can measure

  • the size of your service by answering this question

  • which is, how long does it take to rewrite

  • your entire service?

  • The ideal answer

  • for that should be in the order of two weeks.

  • You can say, "Hang on a minute Praveena,

  • "I thought you were saying that you're talking about

  • "20-year old system,

  • "and whatever you're saying right now,

  • "it just looks like hipster talk to me.

  • "Why?

  • "Why would you ever have to rewrite an entire service?

  • "I think you're lying," or something like that.

  • But there is actually a reason

  • why you want the service

  • to have the ability to be

  • rewritten, basically.

  • It is so that you can

  • work on a story and be able to

  • deploy it quickly to production.

  • So, how long does your story take

  • from development to deployment?

  • Ideally, what you would want

  • is when you put a story in,

  • you want it to be like a smooth ride across this lane.

  • This was our

  • card wall.

  • When you pull a story from analysis into ready for dev,

  • what you want is

  • a smooth ride across all these lanes

  • until you come to ready for prod.

  • That's your ideal situation

  • and you should be able to glide along

  • like you're riding a bicycle

  • on a really smooth plane.

  • But if you're working with a service which is massive,

  • then it's going to be an experience as though

  • you're riding with this

  • bicycle.

  • Because it's not going to be a pleasant experience

  • when you take a story, do the development,

  • and then push it into test,

  • and then wait because some other story is

  • having a dependency on the one that you're working on.

  • You will run into problems where

  • you face questions like,

  • I worked on a story to improve

  • the query performance and there is a UI bug, and why is ...

  • These are completely unrelated things,

  • why is my story blocked because of a UI bug?

  • I don't see any point in blocking my story.

  • If you have a service which is small enough

  • and it has one responsibility

  • and it knows and it does just one thing,

  • then it's very, very simple for you to be able

  • to make changes to that service

  • and be able to deploy it quickly.

  • Remember, one of the important things that we had

  • in our application was that

  • it took months to see any changes deployed to production.

  • And this is what we started to ...

  • This is what we wanted to avoid;

  • the first goal was to have quick deployments to production.

  • This was very important for us

  • to keep our services small.

  • The next one that I ...

  • I'm sorry.

  • The important thing to realize here is that

  • smaller codebases lead to

  • a really small context for change,

  • which means it helps you do autonomous delivery.

  • Small isn't just beautiful,

  • it's really practical for you to deploy small services

  • in a very big environment.

  • You have to remember why this is very, very important.

  • Lesson two is to focus on autonomy

  • from design to deployment.

  • Autonomy is basically the right of self-governance.

  • What does autonomy look like

  • in a microservices environment?

  • You would want, ideally when you're doing a deployment,

  • you basically say that like,

  • "I have developed

  • "and I was pairing or not pairing, it's fine.

  • "I'm done with development

  • "and testing is over on that story.

  • "Now I want to go and deploy."

  • What you want to do is just push a button

  • and it's deployed

  • or it gets deployed automatically.

  • You basically have absolutely no choreographed deploy- ...

  • You don't have to do any choreography at all.

  • You don't have to talk to people.

  • You don't have to talk to other services.

  • That's what you want.

  • Here is where reality kicks in.

  • Earlier I was talking about how

  • your service should be small.

  • If your service is small enough,

  • it will definitely have dependencies that

  • it needs to make its work happen.

  • Then in that case, how can you just deploy your service

  • without choreography on its dependent services?

  • Because more often than not

  • when you're working on a solid piece of work,

  • you do have to touch multiple services.

  • You can again say that this is one of those things

  • where people say one thing

  • but the reality is actually something else,

  • which was the case for us.

  • One thing that we (chuckles)

  • finally succumbed to was basically on saying that,

  • "There's always going to be dependencies,

  • "and we just have to deal with it."

  • That was an important turning point for us

  • because we tried really hard to have,

  • "Oh no, no, no, no,

  • "you're not supposed to do choreography at all.

  • "We shouldn't have to do this.

  • "We are doing microservices wrong."

  • The first step to (laughs) ...

  • The first step to recovery is acceptance.

  • We accepted that there's always going to be dependencies.

  • Now let's have a conversation

  • on how to deal with those dependencies.

  • What we wanted was no choreography in our deployments,

  • but what we needed

  • was actually easy choreography in deployments.

  • Which is why

  • this is like a user story card.

  • I don't know whether it's really clear,

  • but what you see in this ...

  • (murmurs)

  • What you see in this is like three different services

  • on a sticky note.

  • They have some numbers next to them.

  • In my project,

  • we

  • used RPMs.

  • All our services get packaged as RPMs

  • and then they are baked into an AMI and installed.

  • What we knew was that

  • when we developed this story,

  • we knew that this one has this ...

  • To implement the story,

  • we had to touch three different services.

  • Those three different services, once the changes were made,

  • were built into RPMs with those version numbers.

  • We basically marked those RPM versions on that story

  • and said, "Fine, if you're going to ...

  • "If this story ...

  • "If these RPMs are deployed in any environment,

  • "then it means that the story's feature

  • "is already there in that environment."

  • That was one easy way for us

  • to track how the services are dependent:

  • when you start working on the story,

  • we decided like,

  • these are the services that it's going to touch

  • and these are ...

  • Once the development is done,

  • we mark the service RPMs that it was using and say like,

  • "Fine, if you deploy these three services,

  • "then the story is done.

  • "It's basically there in that environment."

  • So we made a conscious decision to think about

  • what our deployment strategy is going to be

  • at the beginning of the story development.

  • When you do an analysis,

  • the first thing that we did was like,

  • the story is going to touch three services,

  • so let's just identify those services

  • and put it on the card.

  • Then when the development is done,

  • you just add the service RPMs there.
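The bookkeeping on the card can be sketched as a tiny check (the service names and versions here are invented for illustration): once every RPM the story touched is deployed at or past the marked version, the story's feature is in that environment.

```python
# Hypothetical story-card metadata: the services this story touched and
# the RPM versions that carry its changes (all names are invented).
story_rpms = {"auth-service": "2.4.1", "web-app": "5.0.3", "search": "1.9.0"}

def version_at_least(have: str, want: str) -> bool:
    return tuple(map(int, have.split("."))) >= tuple(map(int, want.split(".")))

def story_deployed(story: dict, deployed: dict) -> bool:
    """The story's feature is present in an environment once every RPM it
    touched is deployed at (or past) the version marked on the card."""
    return all(
        service in deployed and version_at_least(deployed[service], version)
        for service, version in story.items()
    )

qa = {"auth-service": "2.4.1", "web-app": "5.0.3", "search": "1.8.2"}
print(story_deployed(story_rpms, qa))  # prints False: search is still behind
```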

  • When you do

  • have

  • easy choreographed deployments,

  • there's always going to be breaking changes.

  • The important thing is to plan for breaking changes.

  • Sometimes when we were thinking about

  • the deployment strategy,

  • we realized that there are going to be breaking changes.

  • So the next thing to think about is how do we ensure

  • that the end user isn't going to be affected

  • when these breaking changes are going to be deployed?

  • How do you avoid breaking changes?

  • We had a bunch of,

  • a set of tools that we were using

  • sometimes in conjunction with other things

  • and sometimes by themselves.

  • We used semantic versioning.

  • We ensured that we had a tolerant reader in our APIs.

  • Sometimes we would do lock-step deployments.

  • The one thing that we used extensively was feature toggles,

  • and a feature toggle is set based on which environment

  • a certain story is in.

  • While a story is in development,

  • it will be toggled on in the dev environment,

  • but in the QA, staging, and production environments,

  • it will be toggled off.

  • Even when the RPMs actually reach those environments,

  • until we are able to test

  • and all those necessary checks are done,

  • they're actually not available to the end user.
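A minimal sketch of that idea (the feature and environment names are made up): the code ships to every environment, but the feature is only visible where its toggle is on.

```python
# Hypothetical per-environment toggle table; in a real setup this would
# live in configuration rather than in code.
TOGGLES = {
    "new-search": {"dev": True, "qa": False, "staging": False, "prod": False},
}

def is_enabled(feature: str, environment: str) -> bool:
    """Unknown features and environments default to off, so nothing
    leaks to the end user by accident."""
    return TOGGLES.get(feature, {}).get(environment, False)

print(is_enabled("new-search", "dev"))   # prints True
print(is_enabled("new-search", "prod"))  # prints False
```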

  • When you have feature toggles

  • and when you ensure that you have semantic versioning,

  • a tolerant reader, and other things,

  • you can try and avoid

  • breaking changes

  • and you can at least plan for contingency measures.
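A tolerant reader, for instance, can be sketched like this (the payload shape is invented for illustration): the consumer picks out only the fields it needs and ignores everything else, so the producer can add fields without breaking it.

```python
import json

def read_user(payload: str) -> dict:
    """Tolerant reader: take only what this consumer needs, treat
    optional fields as optional, and ignore anything unknown."""
    doc = json.loads(payload)
    return {
        "username": doc["username"],  # the one field we truly require
        "age": doc.get("age"),        # optional: None if the producer drops it
    }

old = '{"username": "ada", "age": 36}'
new = '{"username": "ada", "age": 36, "authToken": "xyz", "roles": ["admin"]}'
print(read_user(old) == read_user(new))  # prints True: the new fields don't break us
```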

  • When we did have breaking changes,

  • like if you see, this one says here, blocked on

  • some story here which is in analysis.

  • This is one way where we identified like

  • these stories, although they are independent,

  • if we clamp both of them together while in development,

  • it's going to take weeks for it to complete

  • and weeks for it to finish testing.

  • What we would go ahead and do

  • is actually divide that story into sensible parts

  • and then say it's going to be blocked

  • so do not deploy this until the other story is ready.

  • The developers can continue with their development.

  • The testers can continue with that.

  • When these two things are ready,

  • then you just basically turn the feature on.

  • That's one of the important things.

  • When we were able to plan for it

  • at the start of development, it made a lot of sense to us.

  • Another important thing that

  • microservices brings along is actually

  • having a heterogeneous architecture.

  • This is one of the things that people repeatedly say

  • you can exploit to your benefit:

  • you can choose the tools that work for you.

  • But when you work with old systems,

  • you have to be really careful on what you're doing.

  • As a consultant,

  • I have the responsibility to ensure that

  • I'm not choosing tools

  • that my customers are not able to

  • maintain at the end of it,

  • like when I'm out of the door.

  • There comes a certain responsibility

  • when you're deciding what you have to do.

  • In our case, we chose things that

  • we knew were going to add long-term

  • benefits to the customers.

  • They were done in really simple languages rather than like

  • ML or like

  • Clojure, dare I say? (laughs)

  • We used ChatOps.

  • We used things like Qbot

  • to make sure our deployment ...

  • You can basically ask Qbot, which is like a JavaScript

  • ChatOps bot that you can add to HipChat,

  • and Slack as well I suppose.

  • It will tell you what's the status of your deployment

  • and you can tell it,

  • "Notify me when this service is deployed."

  • So you can just go ahead and do your stuff,

  • go grab a coffee or something like that,

  • and it will tell you basically

  • when your deployment is done.

  • Other things that we used were Kibana dashboards,

  • which are again built on the

  • ELK Stack.

  • The data is saved in JSON,

  • people were able to interface with it,

  • and it's easy to learn as well.

  • We were able to

  • quickly

  • fix things.

  • We had the ability to add really small cards

  • which would go through all the lanes really quickly

  • if we knew that

  • we were getting user feedback on a certain thing.

  • That was like,

  • this is going to add long term value

  • but it's like a really short thing.

  • Can we just quickly do this?

  • This was not just for the end user.

  • This was also for things that

  • AppSec people needed

  • or things that the DevOps people needed.

  • We were able to quickly prioritize stories on those ends.

  • Other things that we did included

  • automating support scripts.

  • ChatOps was using JavaScript.

  • For support scripts, we realized ...

  • well, I was working on a Java application.

  • Now, Java and scripting don't really mix,

  • so we decided we will

  • use Python scripts,

  • because our infrastructure was on Ansible,

  • so we thought Python runs really well with Ansible.

  • We wrote really small scripts for

  • whenever systems would go down:

  • if a support engineer were to look at it,

  • how could they act really quickly?

  • We wrote these really small scripts.

  • The next thing we did was automate those support scripts.

  • Without even the intervention

  • of a support engineer,

  • the minute something goes down,

  • it shows a certain status automatically.
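A hedged sketch of the kind of support script that gets automated this way (the URL and service name are placeholders, and the real scripts did more than this): poll a health endpoint and report a coarse status with no engineer in the loop.

```python
import urllib.error
import urllib.request

def check_health(url: str, timeout: float = 2.0) -> str:
    """Report a coarse status for one service endpoint."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return "UP" if resp.status == 200 else f"DEGRADED ({resp.status})"
    except urllib.error.HTTPError as err:     # the service answered with an error code
        return f"DEGRADED ({err.code})"
    except (urllib.error.URLError, OSError):  # no answer at all
        return "DOWN"

if __name__ == "__main__":
    # Run from cron or a monitor so the status shows up the minute
    # something goes down, before a support engineer even looks.
    print("auth-service:", check_health("http://localhost:8080/health"))
```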

  • These are just examples of things that you can use even in

  • age old systems

  • where it's really hard to incorporate things.

  • Lesson four

  • was to pay attention

  • to the bounded context.

  • I think, I really do think,

  • no microservices talk gets complete

  • without mentioning bounded context

  • because it's such an important thing to consider

  • and it's very, very easy to miss as well.

  • Bounded context.

  • This is from Martin Fowler's blog

  • and Eric Evans on his

  • Domain-Driven Design book

  • talks extensively about this.

  • In simple terms, bounded context is like

  • in any application,

  • you have multiple contexts.

  • These multiple contexts

  • have models which are named exactly the same.

  • For example in a support context, you have customer,

  • and you also have a customer in a sales context,

  • but although these models are named the same

  • and they kind of indicate the same person,

  • the operations that that person has are completely different.

  • This is actually

  • a reflection

  • of humans

  • in reality.

  • I am here as a speaker but I'm also an employee

  • at what do you call it?

  • At Neo Technology.

  • This happens.

  • This happens quite a lot.

  • People sometimes

  • misinterpret

  • this and they try and

  • share operations and models between these two contexts.

  • That's a big no, please don't do that.

  • We were bitten really hard by that.

  • I'll quickly show you how.

  • We had two services, authentication service and the web app.

  • When the web app is trying to authenticate,

  • it uses the authentication service,

  • which returns a JSON response

  • with the auth token, username, and some user details,

  • let's say age for instance.

  • What the web app needed was

  • a response like this, which didn't need the auth token.

  • But if I were to

  • implement this inside the authentication service,

  • then it would mean that the authentication service

  • would have to return this response.

  • Which is what we did, honestly,

  • which is what we did.

  • Which meant like anytime web app had to change,

  • we had to change what authentication service was returning,

  • which was a big mistake on our part.

  • What we should have done instead is

  • made this transformation inside the web app

  • and left the authentication service alone.

  • So authentication service always gives this one response

  • and its consumers interpret it in different ways

  • based on what they want.
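That fix can be sketched like this (the field names are invented): the authentication service keeps one canonical response, and the web app adapts it locally, so a web-app change never forces a change in the authentication service.

```python
def authenticate():
    # Canonical response, owned by the authentication service.
    return {"authToken": "abc123", "username": "ada", "age": 36}

def web_app_view(auth: dict) -> dict:
    """The web app's own model: the transformation lives in the consumer,
    so it can change without touching the authentication service."""
    return {"username": auth["username"], "age": auth["age"]}

view = web_app_view(authenticate())
print(view)  # prints {'username': 'ada', 'age': 36}
```

Each consumer owns one of these small adapters, which is what keeps the bounded contexts separate.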

  • Lesson five was choose what works for you

  • and document your reasons.

  • I'm a software developer

  • and the last thing that I want to do

  • is write pages of Confluence document.

  • But it was very important for us to do this

  • one important step.

  • Because developers always like to

  • improve the system that they're working on,

  • which meant that we were having

  • the same conversations over and over again about

  • why are we doing this

  • and this shouldn't be this way kind of thing.

  • Which is fair.

  • I mean, I would do that in anything that I am in.

  • But what it meant was that people were getting tired

  • of explaining this again and again,

  • trying to convey what was happening.

  • One of the things that we did was

  • when we take a decision on something,

  • just understand the context why

  • a certain decision was taken

  • and just put it up somewhere,

  • in the mail or somewhere

  • so that it's communicated to the entire team

  • about what we are doing and why we decided to do this,

  • and re-evaluate it

  • anytime one of those context things changes.

  • So you don't end up having the same discussion

  • over and over again,

  • but you do have the discussion where it matters.

  • When the context in which

  • you came to a certain decision has changed,

  • then your decision no longer applies

  • because your context has changed.

  • We ensured that certain ...

  • We imparted certain things like that.

  • An example of this was

  • understanding where our constraints

  • and where our principles came from.

  • An example of principle was

  • we decided that we will do our validations

  • at every service level

  • which meant the validations were duplicated

  • but we were fine.

  • We had good reasons why the validations

  • should be duplicated,

  • which I'll probably cover in a short while.

  • The important thing was around constraints.

  • There are certain things which you can't change

  • in your project constraint.

  • For instance, we had ...

  • I did tell you that I was working on a Java application

  • and we had Python scripts.

  • When we came to write user journey tests,

  • we started with Ruby.

  • At that point, our clients went like, "No.

  • "We can't add one more language into the pot.

  • "Can you please just pick up something else?

  • "Either Python or Java to do this?"

  • So we did; after doing the user journey tests in Ruby,

  • we decided to redo the user journey tests in Java,

  • because Cucumber has a Java API as well.

  • Anyways, we decided to do that.

  • It made a lot of sense at that point why we had to do it

  • because it was from a constraint which

  • we had no control over

  • and there's no point in discussing this

  • over and over again whether

  • Ruby is better or Java is better

  • when your client says to choose Java.

  • About validations.

  • For principles, we decided to ...

  • An example of this is

  • the acceptance that it is okay to duplicate validations.

  • The important thing,

  • when I first became a developer,

  • that was taught in my training

  • over and over again, was DRY:

  • do not repeat yourself.

  • Why would I want to duplicate validations?

  • Because that's one of the tenets of software development.

  • You never repeat yourself.

  • But it comes from a greater understanding

  • where you understand that like, okay fine,

  • there are certain duplications which you never want to do.

  • But then there are certain duplications

  • which you should be doing.

  • An example of this one would be, say, validations

  • where you have your client-side validations in JavaScript:

  • when a user is entering something

  • and you say you match an email or something like that

  • and you say like, "Well, this is not a valid email address."

  • That's something that you would do in JavaScript.

  • Whereas when you're trying to save something

  • in your database,

  • that's a completely different kind of validation

  • which you should be doing which is,

  • does it have any SQL injection in it?

  • Does it have Bobby Tables

  • and things like that?

  • These validations serve two different purposes

  • when done on the client side

  • versus when done on the server side.

  • It is a good thing

  • to ensure that you do validations whenever it is necessary.
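As a sketch of why this duplication is harmless (both halves are shown in Python here, though the client side was JavaScript in the talk): the client-side check is a UX courtesy, while the server-side parameterized query protects the database no matter what the client claimed to validate.

```python
import re
import sqlite3

EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def client_side_valid(email: str) -> bool:
    """UX check: tell the user quickly that the address looks wrong."""
    return EMAIL.match(email) is not None

def save_email(conn: sqlite3.Connection, email: str) -> None:
    """Server-side: never trust the client. A parameterized query keeps
    Bobby Tables out regardless of any client-side validation."""
    conn.execute("INSERT INTO users (email) VALUES (?)", (email,))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT)")
save_email(conn, "Robert'); DROP TABLE users;--")  # stored as plain text, not executed
print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])  # prints 1
```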

  • In some cases where we had to duplicate validations,

  • we needed to check this over and over again.

  • There was this tension between like,

  • do we make shared libraries or shared clients

  • or do we

  • just

  • accept everything that comes by?

  • In our case,

  • we decided to do both.

  • One of the services which I was taking care of

  • published a client

  • for any of its consumers to connect to the service.

  • The client had all the validations that

  • the service would do.

  • If you were to use the client,

  • then you can ensure that all the validations are done.
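The published client can be sketched roughly like this (the service, its transport, and the validation rule are all invented for illustration): consumers who go through the client get the service's validations for free before anything goes over the wire.

```python
class OrderClient:
    """Hypothetical client a service publishes for its consumers."""

    def __init__(self, transport):
        # In the real client, transport would be an HTTP call to the service.
        self._transport = transport

    def place_order(self, quantity: int) -> dict:
        # The same validation the service itself enforces.
        if not isinstance(quantity, int) or quantity <= 0:
            raise ValueError("quantity must be a positive integer")
        return self._transport({"quantity": quantity})

# A stub transport stands in for the network here.
client = OrderClient(transport=lambda req: {"status": "accepted", **req})
print(client.place_order(3))  # prints {'status': 'accepted', 'quantity': 3}
```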

  • There were times where you don't want to

  • use a shared client,

  • but instead want to use a shared library.

  • An example where we use a shared library

  • which is a very good layer of abstraction was

  • managing negative TTL caching.

  • So we used ELBs.

  • When an ELB switch is being made,

  • that is, your service endpoint is being switched,

  • at that point, there are times where

  • your negative TTL caching affects

  • the service discovery.

  • We had to abstract that,

  • so it was a very important thing

  • for all the services

  • to basically have it done the same way.

  • There was no reason for it to be done in two different ways.

  • That was a very good example for us to do something

  • in a shared library.
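A rough sketch of what that shared abstraction guards against (the names and interface are invented): cache successful lookups for a short TTL, but never cache failures, so an endpoint switch is picked up on the next call instead of being masked by a cached miss.

```python
import time

class ResolverCache:
    """Caches successful lookups only; failed lookups are never cached,
    so a flipped ELB endpoint is retried immediately."""

    def __init__(self, resolve, ttl: float = 30.0, clock=time.monotonic):
        self._resolve = resolve
        self._ttl = ttl
        self._clock = clock
        self._cache = {}  # name -> (address, expiry)

    def lookup(self, name: str) -> str:
        entry = self._cache.get(name)
        if entry is not None and self._clock() < entry[1]:
            return entry[0]
        address = self._resolve(name)  # may raise; failures are NOT stored
        self._cache[name] = (address, self._clock() + self._ttl)
        return address
```

If `resolve` raises, nothing is stored, so the very next `lookup` tries again; a cached negative answer would instead hide the new endpoint for the length of its TTL.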

  • An example of abstracting something

  • which we shouldn't have is

  • abstracting models into shared libraries.

  • An example of this is what I spoke about earlier.

  • We saw that the support context and sales context had

  • basically customer and product shared.

  • Someone thought like, I mean,

  • I think in all fairness it could have been me.

  • It was like, "Oh, I know what to do here.

  • "We're doing the validations again and again.

  • "Let's just do the smart thing.

  • "Let's just extract that into a separate library

  • "and then ensure that

  • "those domain models are basically

  • "shared between those things."

  • What it did was basically

  • removed this boundary completely

  • and it made this entire thing into one

  • combined support-and-sales context.

  • It was a mess.

  • We couldn't add any features to one context

  • without affecting the other.

  • Which is why we wanted the microservices in the first place.

  • Which is why we wanted to keep it small in the first place.

  • But what's the point of keeping it small

  • when you share libraries in between?

  • You might as well have both of them together.

  • Which was a costly lesson for us because

  • by the time we wanted to change the web app

  • to do something else,

  • it was very much tied to the authentication service

  • and it meant that we had to ...

  • We spent like a month or so

  • trying to strip away features from web app,

  • and it was still a mess.

  • Which is a shame really.

  • Lesson six is to embrace Conway's law.

  • This is one of those things that gets mentioned

  • along with bounded context time and again

  • in any micro- ...

  • Sorry.

  • In any microservices talk.

  • Which is Conway's law.

  • Conway's law is basically something that has been

  • thrown around again and again, where

  • when you try and come up with a solution,

  • your solution is going to

  • mirror what your organization structure is.

  • So instead of fighting Conway's law,

  • it's very, very useful for you to embrace it

  • and make it work for you.

  • An example of that is

  • deciding between these two teams.

  • Initially we had a UI team,

  • which is why we had a web app service.

  • Then we had a platform team,

  • which is why we had

  • one AWS deployment architecture.

  • Then we had feature teams

  • which meant we had features in between.

  • But what this (laughs) meant was that

  • any time you had to add

  • a feature,

  • it is going to touch both the UI team and the feature team

  • and the platform team.

  • Which meant you had to somehow coordinate

  • giving or splitting work across these three different teams

  • and deployment between these three different teams.

  • It's a mess.

  • As I say it, I realize how difficult it was.

  • It's like a trauma,

  • all coming back to me.

  • It's not really a good feeling.

  • The last thing you want as a developer

  • is to sit in meetings about how you should develop code

  • or how you should deploy code.

  • You're like, "This is what I wanted to avoid.

  • "Why am I here again?"

  • An important thing that we decided after trying out

  • the wrong way of doing it in that context

  • is try and do it in a different way.

  • Which is actually make our teams,

  • like our feature teams have UI developers,

  • have a platform person,

  • and importantly, have a product owner within that.

  • When we had a separate product owner team,

  • somehow it was making things really difficult

  • when we had to make decisions.

  • We had to

  • send emails, wait for a long time,

  • and wait for a long time for it to

  • get business owner approval and things like that.

  • The minute we actually had a product owner in our team,

  • when he was attending our everyday stand-ups,

  • it was very easy for us to just go like,

  • "What do you think about this story?

  • "We have tested this.

  • "Should we just deploy this?"

  • He goes like, "Yup, if it's ..

  • "Yeah, sure, deploy it."

  • At the end of the stand-up it was deployed basically.

  • That's how simple things became in terms of deployment.

  • What you want to have

  • is vertical slicing of teams

  • instead of horizontal slicing.

  • Another way that we used Conway's law

  • to our benefit was to have self-contained systems

  • and just talking to a product owner about,

  • "Look, you're the product owner,

  • "you give us the requirements.

  • "You're not stopping changes from happening."

  • Because we are basically in this together.

  • We need to work out how we are going to work

  • but it's not like I'm suggesting something

  • and you are like, "No, that can't go in."

  • It's instead the other way around where

  • you're suggesting us something

  • and we are going to make sure that

  • that change is actually in.

  • That made tons of difference.

  • Lesson seven is to take monitoring seriously.

  • I can't stress how important it is

  • to ensure that you monitor your systems.

  • Because things can go wrong easily.

  • I mean, when you have one thing to take care of,

  • you just have to sit and stare at that one thing.

  • Whereas when you have hundreds of things to take care of,

  • you would go mental about this.

  • It's important to take your monitoring seriously

  • and ensure that you have proper alerts

  • to alert on just the important things,

  • not everything.

  • A region going down is important,

  • but a single instance going down may not be,

  • because you do have ELBs and things like that in front

  • which can take care of it.

  • That one can wait for a post-mortem.

  • You need to understand at which

  • different levels you have alerts.

  • We had our dashboards configured for

  • three important things.

  • The first one was business metrics.

  • The second and third were somewhat mingled together:

  • your application metrics and your system metrics.

  • Business metrics, this was a

  • huge win for us to win our product owner's confidence

  • because he could see, for anything he added,

  • how quickly it reached production

  • and how it affected the users'

  • interaction with the system.

  • We were showing him

  • after a certain feature was added,

  • how did the downloads increase or decrease

  • or how many users stayed on these pages

  • and things like that.

  • Those are the kind of metrics that they like to see.

  • Those are the kind of metrics which you should track as well

  • as a developer on your team to understand

  • how the software that you put out there

  • is interacting with the users.

  • We use

  • a ton of these tools,

  • whichever ones suited our needs,

  • and we worked on our dashboards continuously

  • as we went, because we were like,

  • "When we thought that this one thing was really important,

  • "now we actually fixed that problem,

  • "we automated that problem.

  • "It shouldn't happen anymore,

  • "so we can just push it down the stack

  • "and let's look for some other alert

  • "that we should look for."

  • We worked on this continuously.

  • It wasn't just one thing that was just there.

  • An example of this is,

  • how many of you have used Hystrix?

  • Netflix Hystrix.

  • It's a great

  • library.

  • It gives you circuit breakers.

  • If any of our dependent services weren't working,

  • we decided that we would show the content to the user anyway,

  • because the user shouldn't be penalized for

  • our problem of not being able to

  • put software out there properly.

  • That was an example of a very good fallback

  • we had in our system.

  • Hystrix also gives out metrics for you.

  • We were tracking fallbacks in our application

  • and for certain services, you'd see,

  • like here, there is a huge spike,

  • which means that something went wrong there at that point,

  • like maybe a region went down and something happened.

  • But the user got the content anyway.
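Hystrix itself is a Java library, but as a rough illustration only, here is a minimal Python sketch of the circuit-breaker-with-fallback idea it implements. All names here are made up for the sketch, not Hystrix's actual API:

```python
class CircuitBreaker:
    """Minimal circuit breaker: after `threshold` consecutive
    failures the circuit opens and calls go straight to the
    fallback, so the user still gets content."""

    def __init__(self, command, fallback, threshold=3):
        self.command = command      # the real call, e.g. a request to a service
        self.fallback = fallback    # e.g. cached/stale content shown anyway
        self.threshold = threshold
        self.failures = 0

    @property
    def open(self):
        return self.failures >= self.threshold

    def call(self, *args):
        if self.open:
            # Circuit open: skip the failing service entirely.
            return self.fallback(*args)
        try:
            result = self.command(*args)
            self.failures = 0       # a success closes the circuit again
            return result
        except Exception:
            self.failures += 1
            return self.fallback(*args)  # user is not penalized
```

A real implementation (as Hystrix does) would also track a rolling error window and half-open the circuit after a timeout; counting fallback invocations is exactly the kind of application metric described above.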

  • Other things that we were tracking

  • were things like availability zones.

  • Is there any CloudWatch alarm?

  • Not like specific alarms.

  • It's like, if there's any CloudWatch alarm

  • that's on, then that panel would go red and

  • a support engineer would go ahead and look at it.

  • That was an example of an application metric.

  • This is an example of a business metric.

  • We track the user interaction

  • and how much time the user spent

  • in our application.

  • We send an event out when the user enters,

  • enters the system,

  • and we also send an event out

  • when the user exits the system.

  • We were able to see something happening here.

  • That's around the time when the fallbacks were firing

  • and we were like, "Okay, there's definitely something wrong."

  • Those are the ways that we were able to

  • show to the product owner like,

  • "This application that we've put out there,

  • "this work that we're doing in terms of infrastructure,

  • "it's for you as well.

  • "You can add these dashboards."

  • He was really happy.
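A minimal sketch of that kind of business metric, pairing the enter and exit events into time spent per user. The event shape, a `(user, kind, timestamp)` tuple, is an assumption made for illustration:

```python
def session_durations(events):
    """Pair 'enter'/'exit' events per user and return the total
    time each user spent in the application, in seconds.
    events: iterable of (user, kind, timestamp) tuples,
    where kind is 'enter' or 'exit'."""
    entered = {}    # user -> timestamp of their open session
    durations = {}  # user -> accumulated seconds
    for user, kind, ts in events:
        if kind == "enter":
            entered[user] = ts
        elif kind == "exit" and user in entered:
            durations[user] = durations.get(user, 0) + (ts - entered.pop(user))
    return durations
```

A sudden drop in these durations is exactly the kind of signal that, correlated with fallback spikes, tells you something is wrong.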

  • Going on to the next lesson.

  • An important thing for us was to make sure

  • that we do testing.

  • This shouldn't be a point of contention

  • in this century.

  • I mean, it's important that we have to do testing.

  • It's important that we have to do testing

  • at different levels.

  • In our case,

  • when you talk about microservices, people always say that,

  • "You have done your development properly,

  • "just release it into production

  • "and if you see something go wrong,

  • "just pull it back or roll back

  • "or release a fix."

  • I was working in a system where our product owners

  • were not used to change at a rapid pace.

  • This was basically scaring them that

  • they could potentially release software and lose customers.

  • They weren't basically ready for it.

  • In our case, we decided

  • that what we would do is have a QA environment

  • which mirrors exactly what the production does

  • and we would run a soak test so that there is always

  • user interaction happening with the QA environment.

  • Anytime we were finished with a story,

  • we would check whether the QA environment was free

  • and we would just release it in QA.

  • Because the soak test was running continuously,

  • it was giving us feedback if anything were to go wrong.

  • That covered 80% of our cases when things could go wrong.

  • The product owners seemed to come around after that,

  • when they saw that things went wrong in QA

  • and developers were working to fix it.

  • It was very easy for us to say

  • once something is QA testing done,

  • it's ready for production.

  • We can release it immediately in a matter of minutes.

  • Once QA testing is done,

  • we would talk about it in the stand-up.

  • We'll be like, "We saw this.

  • "We saw these problems, we fixed it.

  • "Can we release in prod?"

  • It got to a point where the

  • product owner was like, "Yeah, sure.

  • "Go ahead, release it, it's fine.

  • "If it's done in QA, if all the test has passed,

  • "just go ahead and release it."

  • It went there in a matter of months.

  • It was a great feeling for us

  • to be able to do that.

  • This was basically how we thought about our testing.

  • When we released something in production,

  • it's like Schrödinger's cat.

  • They say that you never know whether the cat is

  • dead or alive when it's in a box.

  • The moment you open it is when you see

  • it's dead or alive.

  • But no, you can actually test it.

  • If you shake the box and the cat shouts,

  • then it's probably alive.

  • That's how we tested in our QA environment.

  • Apart from this,

  • to ensure that

  • our production is always online,

  • we had a ton of other measures that we added

  • apart from just testing.

  • Chaos Monkey is an important thing that

  • I think if you're working in a microservice environment,

  • even if Chaos Monkey isn't automated in your setup,

  • just try running it as an

  • exercise for the week and see what happens.

  • Chaos Monkey will take care of bringing down systems

  • and you can actually track

  • whether your infrastructure is resilient enough to handle

  • chaos.

  • I think one of the important things that we did

  • which was like a no-brainer

  • and it was really quick fix for all our services

  • was to add health checks.

  • But the health check would not only ...

  • Say for example the web app,

  • the health check would not only check

  • whether the web app is up and running

  • and able to accept requests,

  • it would also check whether it can

  • talk to the authentication service it was dependent on,

  • or the five other services it was dependent on.

  • We tied the health checks

  • of a service to the health checks

  • of its dependent services.

  • That was a very, very easy win for us

  • when we broke any contracts between those two services.

  • We'd deploy something in QA,

  • or someone would deploy a dependent service in QA,

  • and then we'd see the health checks

  • of the dependent services go down

  • and we're like, "Wait, there's something wrong here.

  • "Maybe this shouldn't go to production.

  • "Maybe we should just stop and see what happens."

  • Just by doing that,

  • it gave us tons of insight

  • into how we were doing things wrong,

  • and all we had to do was go and quickly fix things one by one

  • and it was a very, very simple thing to do.

  • If you can do one thing to your system right now

  • which you haven't already done,

  • please go and add health checks.

  • And make sure those health checks also cover

  • your dependent services.
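A minimal sketch of a health check that also covers its dependencies. It assumes each dependency is reachable through a probe callable; in practice these probes would be HTTP calls to each dependent service's own health endpoint:

```python
def health_check(self_ok, dependency_probes):
    """Aggregate health: the service reports 'UP' only if its own
    check passes AND every dependency it relies on (e.g. the
    authentication service) reports healthy too.

    self_ok: callable returning True when this service can accept requests
    dependency_probes: dict of name -> callable returning True if healthy
        (in reality, each probe would hit that dependency's health URL)
    """
    statuses = {"self": bool(self_ok())}
    for name, probe in dependency_probes.items():
        try:
            statuses[name] = bool(probe())
        except Exception:
            statuses[name] = False   # unreachable counts as down
    return {"status": "UP" if all(statuses.values()) else "DOWN",
            "checks": statuses}
```

Wiring this into the deploy pipeline is what makes broken contracts visible immediately: deploy a dependency, and every service that relies on it goes red at once.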

  • In terms of testing, we had our ...

  • A test pyramid looks like this.

  • You need to have tons of unit,

  • I mean not tons.

  • You need to have a lot of unit tests underneath

  • and then on top of it, you have integration test

  • and then functional test.

  • You can't see clearly, I'm really sorry.

  • That's a contract test there.

  • You basically want to have something like that.

  • Sorry, that's a journey test.

  • But in our case, it actually became like this.

  • We had a unit test, integration test, a functional test,

  • and then in between, we have something called

  • as a service test which was

  • checking whether the service by itself is working

  • by sending

  • dummy requests into it

  • and checking whether we are getting a proper response back.

  • That was a service test which was run

  • anytime a service was deployed.

  • On top of that was the contract test

  • which is something that our dependent services would give us

  • so that we add it in our pipeline

  • and we go like, "Now this thing has been deployed,

  • "the service is working,

  • "but is it still working the way my

  • "consumers expect it to?"

  • That's what the contract tests are for.
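A minimal sketch of a consumer-driven contract check: the consumer hands the provider its expectations, and the provider runs them in its own pipeline after every deploy. The endpoint and field names here are made up for illustration:

```python
# One consumer-supplied expectation: the products endpoint must keep
# returning these fields with these types (hypothetical names).
CONSUMER_CONTRACT = {
    "endpoint": "/api/products",
    "required_fields": {"id": int, "name": str, "price": float},
}

def check_contract(response_json, contract):
    """Return a list of violations; an empty list means the provider
    still honours what this consumer expects of it."""
    problems = []
    for field, ftype in contract["required_fields"].items():
        if field not in response_json:
            problems.append(f"missing field: {field}")
        elif not isinstance(response_json[field], ftype):
            problems.append(f"wrong type for {field}")
    return problems
```

In practice a tool like Pact automates this exchange, but the idea is the same: the service test proves the service runs, the contract test proves it still serves its consumers.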

  • On top of that was a journey test which was actually

  • replicating how a user interacts with the system.

  • If we were to depend on just the journey tests

  • and not do any of these other things,

  • we wouldn't know, when something fails,

  • what's failing, basically.

  • So it's important for you to realize

  • how you ensure that your service by itself is working,

  • then that the service within its ecosystem is working,

  • and then that the service for your users is working.

  • That's very important for you to

  • think about in your testing strategy.

  • Note that tests are there to validate constraints.

  • They shouldn't be constraints themselves.

  • If you see yourself whenever you have to add a feature

  • going and editing all the way down like the pyramid,

  • then you're probably not testing it at the right level

  • and that's an indication that you have to have a look at

  • and definitely fix it whenever you can.

  • Lesson number nine is definitely invest in ...

  • That should say 10.

  • I'm sorry, nine.

  • Lesson number nine is to invest in infrastructure.

  • Again, just so you guys know,

  • this is an indicative chart of

  • how the story split was.

  • There was actual data behind it

  • but for reasons, I can't share that with you.

  • This is an indicative chart.

  • When we started our application

  • between infrastructure and feature,

  • at the start we had like hundred percent

  • infrastructure stories

  • and then it started going down.

  • This is for one service.

  • Then we added one more service.

  • This is how it looked like.

  • We started getting feature stories at the beginning

  • because some infrastructure work has already been done.

  • Then for the third service, this is how it looks.

  • As we went by, we saw that the

  • amount of infrastructure code that needs to be done

  • was reducing.

  • Don't be discouraged.

  • It can be very daunting when you start with infrastructure.

  • It's like, I have to reinvent the wheel again.

  • Why do I have to do it?

  • But perseverance is key.

  • Definitely you need to invest in infrastructure.

  • Why wouldn't you at this age?

  • That's an important thing.

  • You need to embrace new technology.

  • It's an evolving ecosystem out there

  • and you are losing out if you're not actually

  • tapping on to that potential.

  • Digital disruption has already happened.

  • Blockbuster used to be a big thing.

  • Now, I don't own a TV, I don't own a DVD player.

  • My MacBook doesn't have a DVD slot.

  • That's where it is at.

  • Now, a lot of things cut out the middle man.

  • If you're not embracing new technology

  • and if you're not adopting things that's already out there,

  • you're going to be phased out sooner rather than later.

  • I think it's important for really old companies

  • to understand this and not fight this

  • just because you have the user base

  • because it's not going to work out that way.

  • At least maybe for now it works out,

  • but it won't be like that for too long.

  • Every environment like banking, technology,

  • other services, everyone

  • is working on this space

  • and you need to tap into that potential

  • to make things happen for you.

  • There are new kids on the block.

  • I would definitely have you please

  • go ahead and read about it.

  • If you're high up in the ladder,

  • please ensure that you know

  • what these things are.

  • The three factors that people say

  • contribute to your personal growth are

  • autonomy, mastery, and purpose.

  • I think your microservices

  • also need to have these three things in them

  • for them to work properly,

  • which is like ensure that it has proper autonomy

  • and it knows what it is doing.

  • It has the right tools to do its job

  • and it has a purpose to exist.

  • If it doesn't have a purpose, just don't leave it out there.

  • Microservices help you to improve in iterations,

  • which is very, very important in agile software development:

  • improve in iterations, adapt to feedback.

  • And having autonomy and taking control of

  • business as well as technology in smaller teams

  • helps you to deliver software quickly.

  • You have to ensure that there is

  • high cohesion within your services

  • but that they are coupled loosely enough

  • that you can alter each one

  • whenever you need.

  • You need to be able to

  • embrace Conway's law.

  • If your team isn't structured to do that,

  • then that's definitely going to get into the way of

  • you adopting microservices properly.

  • Why do microservices?

  • It's a lot of fun doing it out there.

  • As a developer, it gives me great purpose

  • when I work on something

  • and I don't have to wait for it like six months

  • to be released to the user.

  • I enjoy it when I see something that I worked on

  • and I can actually show to my mom.

  • Like, "You know that thing?

  • "I worked on it."

  • I think

  • I deserve bragging rights on the things that I worked on.

  • That's my summary.

  • These are resources.

  • I'm just reading my slides at this point.

  • Image credits.

  • Thank you.

  • (audience applauding)

(gentle instrumental music)
