  • Data science plays a key role in the selection of influenza vaccines.

  • What may sound like an excerpt from a sci-fi novel is, in fact, a real-life application

  • of modern data science techniques improving lives today.

  • In this video, we'll talk about viruses and vaccines.

  • We'll explore machine learning's role in the preparation of influenza vaccines and

  • the ways to visualize and analyze genome data using data science techniques.

  • (These include maximum likelihood (ML) and different substitution models.)

  • We'll also mention platforms where you can store and analyze gene data or even your

  • own genome if you've got it.

  • But first things first: let's see what viruses are and how they operate!

  • What are viruses?

  • Viruses are tiny infectious particles (not cells) that can cause illness in different organisms, like birds and mammals,

  • including humans.

  • In the case of influenza, there are two distinct surface proteins, H and N, which the virus uses to

  • enter host cells (the H protein, hemagglutinin) and to release newly made virus particles from them (the N protein, neuraminidase).

  • Now, these proteins vary a bit in their structure, so different versions of them are identified

  • by a number.

  • An example of that is the H3N2, which contains the third variant of the H protein and second

  • variant of the N protein.

  • Both H3N2 and H1N1 are called subtypes of Influenza.

  • And they're also the two most common subtypes to infect humans.

  • H3N2 is an important example of the flu virus.

  • Also known as the Hong Kong flu, it caused a pandemic in 1968, resulting in over a million

  • deaths worldwide.

  • The virus was highly contagious and spread quickly through the population, starting from

  • Asia and later reaching America via troops returning from Vietnam.

  • By the end of 1969, the virus had reached parts of Africa and South America, as well.

  • And if you thought this was bad, hold on to your hats!

  • There's an even more dangerous influenza subtype: H1N1, the subtype behind the Spanish

  • flu.

  • H1N1 was responsible for the swine flu pandemic of 2009, as well as the devastating Spanish

  • flu of 1918.

  • The 1918 pandemic was extremely lethal, resulting in over 30 million deaths worldwide.

  • The reasons behind the high mortality of the virus still remain a mystery.

  • While some scientists suggest an unusually aggressive form of the virus was involved,

  • others claim it was the circumstances surrounding the infection (overcrowded and unhygienic

  • camps during the war) that contributed to the high death toll.

  • At this point, you're probably thinking: “If this virus can be so dangerous or potentially

  • lethal, how can we protect ourselves against it?”

  • The short answer: influenza vaccines, commonly known as flu shots.

  • So, what is a vaccine and how does it work?

  • Nowadays, vaccines can include forms of a weakened virus, which our immune system

  • learns to recognize and deactivate.

  • In the case of the influenza vaccine, it includes some forms of H1N1 and H3N2 viruses we talked

  • about earlier.

  • Influenza vaccines are formulated annually.

  • But why do they need to change the vaccine each year?

  • The answer lies in two phenomena in genetics: antigenic drift and antigenic shift.

  • Hold on, wait, what are those?

  • Let's start with antigenic drift.

  • Imagine you have a group of people, stranded on a raft in the sea.

  • Over time, the people on the raft slowly change in appearance: they grow beards, their hair gets

  • longer, and they get more tanned.

  • In essence, they remain the same people but slightly changed.

  • This is what antigenic drift means - slow changes over time.

  • And what about an antigenic shift?

  • Now, if two people on the raft mix their genomes (as none of the kids are calling it) and create

  • a progeny, a.k.a. a child, it will contain a mixture of both their traits.

  • So, the antigenic shift is the exchange of genetic material and the creation of a new

  • organism.

  • Because of antigenic drift, influenza mutates and changes quickly, making it difficult to

  • find a vaccine against all possible mutated viruses.

  • Antigenic shift, in turn, causes the emergence of new influenza subtypes, such as the H3N2

  • or H1N1 we talked about earlier.
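To make the difference between drift and shift concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption: the toy viruses, the short made-up sequences, and the helper names `drift` and `shift` are hypothetical and do not come from any real influenza dataset or library. Drift is modeled as a few random point mutations in one segment, shift as a new virus taking whole segments from two parents.

```python
import random

BASES = "ACGU"  # influenza has an RNA genome, so U instead of T


def drift(segment, n_mutations, rng):
    """Antigenic drift: a few random point mutations in one gene segment."""
    seq = list(segment)
    for pos in rng.sample(range(len(seq)), n_mutations):
        # Pick a different base so that every hit is a real change.
        seq[pos] = rng.choice([b for b in BASES if b != seq[pos]])
    return "".join(seq)


def shift(virus_a, virus_b, segments_from_b):
    """Antigenic shift: a new virus takes whole segments from two parents."""
    child = dict(virus_a)
    for name in segments_from_b:
        child[name] = virus_b[name]
    return child


rng = random.Random(0)
# Hypothetical toy viruses: segment name -> short made-up sequence.
human_virus = {"H": "AUGGCAUCGA", "N": "AUGCCGUUAG"}
avian_virus = {"H": "AUGUUUACGC", "N": "AUGAAACCGU"}

drifted_h = drift(human_virus["H"], 2, rng)          # same protein, slightly changed
reassorted = shift(human_virus, avian_virus, ["H"])  # new H, old N

print("drifted H segment:", drifted_h)
print("reassorted virus: ", reassorted)
```

The drifted segment differs from its parent in only two positions, while the reassorted virus carries an entirely different H segment, which is why a shift can produce a new subtype in one step.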

  • So, when scientists decide which virus types to include in the vaccine, they need to think

  • about how to make it most effective.

  • And that depends on how closely the vaccine resembles the types of influenza viruses which

  • will dominate during the upcoming flu season.

  • This is where data science comes into play.

  • Based on existing data about former and current virus spread and variants, scientists try

  • to model and predict the future behavior of viruses, using machine learning algorithms.

  • To do that, they first need an appropriate way to handle information about viruses, or

  • more precisely their genomes.

  • This is done via analysis of genetic data.

  • But what's genetic data, exactly?

  • Genetic data includes the genome of organisms or some parts of it.

  • It usually consists of DNA, represented in the form of strings of letters.

  • In the case of influenza, the genome is made of RNA instead, which some viruses have as their genetic material.
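As a rough illustration of what "genetic data as strings" can look like in practice, the sketch below stores two short RNA fragments as Python strings and runs a few basic operations on them. The sequences are made up for the example, and the helper names (`rna_to_dna`, `hamming`) are hypothetical, not part of any genomics package.

```python
from collections import Counter


def rna_to_dna(rna):
    """Rewrite an RNA string in the DNA alphabet (U becomes T)."""
    return rna.upper().replace("U", "T")


def hamming(a, b):
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Made-up fragments standing in for the same gene in two strains.
strain_1 = "AUGGCAUCGA"
strain_2 = "AUGGCUUCGG"

composition = Counter(strain_1)   # how often each base occurs
as_dna = rna_to_dna(strain_1)     # the same fragment in DNA letters
n_diffs = hamming(strain_1, strain_2)

print(composition, as_dna, n_diffs)
```

Counting differences between aligned strings like this is one of the simplest ways to turn sequences into numbers that later analysis steps can use.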

  • Alright!

  • Once we have our genetic data, it's time to decide how to best visualize it.

  • Though there are many options, we'll talk about one in particular:

  • the staple phylogenetic tree.

  • Phylogenetic trees, also known as evolutionary trees, represent the closeness of different

  • species in terms of their genetics.

  • Basically, they are a diagram showing the evolutionary relationships between species.

  • In the case of influenza, such trees can be used to visualize different strains of the

  • virus.
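One simple way to see how such a tree can be derived from sequence data is average-linkage clustering (often called UPGMA): repeatedly merge the two closest groups of strains. The sketch below is an assumption-laden toy, not the method used by any particular influenza study. The strain names and sequences are invented, distances are plain mismatch counts, and the tree is returned as nested tuples rather than drawn.

```python
from itertools import combinations


def hamming(a, b):
    """Number of differing positions between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def upgma(sequences):
    """Build a tree (nested tuples) by repeatedly merging the closest clusters."""
    # Map each subtree to the names of the strains it contains.
    clusters = {name: [name] for name in sequences}

    def avg_dist(c1, c2):
        pairs = [(a, b) for a in clusters[c1] for b in clusters[c2]]
        total = sum(hamming(sequences[a], sequences[b]) for a, b in pairs)
        return total / len(pairs)

    while len(clusters) > 1:
        left, right = min(combinations(clusters, 2), key=lambda p: avg_dist(*p))
        members = clusters.pop(left) + clusters.pop(right)
        clusters[(left, right)] = members
    return next(iter(clusters))


# Made-up aligned fragments for four hypothetical strains.
strains = {
    "strain_a": "AUGGCAUCGA",
    "strain_b": "AUGGCAUCGG",
    "strain_c": "AUGUUUACGC",
    "strain_d": "AUGUUUACCG",
}

tree = upgma(strains)
print(tree)
```

Strains that differ in only a few positions end up as neighbors in the tree, while more distant strains are joined closer to the root.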

  • Let's put all of this together and get to the final point: prediction using data science.

  • Using information obtained from phylogenetic trees combined with different machine learning

  • techniques, you can model future behavior or spread of the Influenza virus.

  • One of the methods involves nonnegative least-squares optimization, which measures distances between

  • branches of a phylogenetic tree.

  • It uses a bidirectional weighted phylogenetic tree and determines sets of coding changes

  • on the surface of the H protein.

  • The model can then identify the antigenic impact of different influenza strains.
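The core idea of a nonnegative least-squares fit on a tree can be sketched as follows: given measured distances between pairs of strains, find branch weights that are all zero or positive and whose sums along the tree paths match those distances as closely as possible. This is only a toy under stated assumptions. The four-strain tree, the distance values, and the small coordinate-descent solver `nnls_coordinate_descent` are all hypothetical stand-ins written for this example, not the published method or a library routine.

```python
def nnls_coordinate_descent(A, b, sweeps=500):
    """Minimize ||A x - b||^2 subject to x >= 0 with cyclic coordinate descent."""
    n_rows, n_cols = len(A), len(A[0])
    x = [0.0] * n_cols
    residual = [-bi for bi in b]  # residual = A x - b, and x starts at zero
    for _ in range(sweeps):
        for j in range(n_cols):
            col = [A[i][j] for i in range(n_rows)]
            norm_sq = sum(c * c for c in col)
            if norm_sq == 0:
                continue
            grad = sum(c * r for c, r in zip(col, residual))
            new_xj = max(0.0, x[j] - grad / norm_sq)  # clip at zero
            delta = new_xj - x[j]
            if delta:
                residual = [r + delta * c for r, c in zip(residual, col)]
                x[j] = new_xj
    return x


# Toy unrooted tree ((A,B),(C,D)) with branches a, b, c, d and a middle branch m.
# Each row marks which branches lie on the path between one pair of strains.
branches = ["a", "b", "c", "d", "m"]
paths = [
    [1, 1, 0, 0, 0],  # A-B
    [1, 0, 1, 0, 1],  # A-C
    [1, 0, 0, 1, 1],  # A-D
    [0, 1, 1, 0, 1],  # B-C
    [0, 1, 0, 1, 1],  # B-D
    [0, 0, 1, 1, 0],  # C-D
]
# Made-up pairwise distances between the four strains.
observed = [3.0, 2.5, 4.5, 3.5, 5.5, 4.0]

weights = nnls_coordinate_descent(paths, observed)
branch_weights = dict(zip(branches, weights))
print(branch_weights)
```

Branches that receive a large weight are the ones along which most of the measured change accumulated, which is the kind of signal used to point at the mutations that matter.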

  • Another way to perform phylogenetic analyses is to use the PAML package, which contains

  • programs for phylogenetic analyses of genetic data using maximum likelihood (ML).

  • How is it done?

  • By taking a set of trees and evaluating their log-likelihood values under different models.

  • These models estimate some parameters while allowing for others to vary.

  • This way they can incorporate the variety of gene types in influenza strains and their

  • surface H protein.
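To give a feel for what "evaluating log-likelihood values under a model" means, the sketch below scores candidate evolutionary distances between two sequences under the Jukes-Cantor (JC69) substitution model and keeps the one with the highest log-likelihood. This is not PAML and does not use its programs or file formats. The site and difference counts are made-up numbers, and JC69 is chosen here only because it is the simplest substitution model.

```python
import math


def jc69_log_likelihood(n_sites, n_diffs, d):
    """Log-likelihood of two aligned sequences at distance d under JC69."""
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * decay  # a site shows the same base in both sequences
    p_diff = 0.25 - 0.25 * decay  # a site shows one specific different base
    return (
        n_sites * math.log(0.25)  # equal base frequencies under JC69
        + (n_sites - n_diffs) * math.log(p_same)
        + n_diffs * math.log(p_diff)
    )


# Made-up data: 100 aligned sites, 10 of which differ.
n_sites, n_diffs = 100, 10

# Score candidate distances on a grid and keep the most likely one.
candidates = [i / 1000 for i in range(1, 2001)]
best_d = max(candidates, key=lambda d: jc69_log_likelihood(n_sites, n_diffs, d))

# Closed-form JC69 estimate, used as a cross-check on the grid search.
p = n_diffs / n_sites
analytic_d = -0.75 * math.log(1.0 - 4.0 * p / 3.0)

print(best_d, analytic_d)
```

Full phylogenetic software applies the same principle to whole trees and richer models, comparing the log-likelihoods they produce rather than a single pairwise distance.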

  • Of course, there are other methods you can use to make predictions in biology.

  • Our aim is to provide you with an overview of two main ones, and we trust you can delve

  • into and explore other methods on your own if you find this topic interesting.

  • And that pretty much brings us to a close.

  • We went all the way from learning about the flu and how a virus works, through the history

  • of the first vaccine and the biggest flu pandemics, to the antigenic shifts and drifts.

  • That was fun, right?

  • We discussed different types of biological data and their visualization.

  • Finally, we learned how to make predictions using different machine learning techniques.

  • But, before we go, let's round off with something about data science and its diverse

  • applications.

  • Data science is not just a tool used in the IT Domain or by large corporations.

  • It plays an important role in the life sciences, and its medical and biological applications

  • are becoming more and more widespread.

  • In fact, big tech companies like Google and Amazon started their own genome projects recently,

  • allowing users to store and analyze their own genome on their respective cloud platforms.

  • Microsoft entered the field too, with the release of Microsoft Genomics on their Azure

  • cloud.

  • So, if the big players are on it, it's a safe bet to assume that genomes and their

  • analytics using machine learning are definitely worth looking into.

  • Ok, guys and gals.

  • I hope we managed to shed light on influenza vaccines and the data science behind them.

  • If you enjoyed the content of our video, please click the like button and share the story

  • with your friends!

  • And, if you're curious to find out more on the topic, you can follow the link to the

  • article in the description.

  • Thanks for watching!
