My name is Sue Wessler. I am a Professor of Genetics at the University of California, Riverside. And my lab studies transposable elements. The title of these two presentations is The Dynamic Genome. In the first talk I introduced transposable elements by describing their discovery by Barbara McClintock, how they move, and how that discovery over the years was recognized as a major revolution in biology as it became appreciated that transposable elements are the major component of most of the genomes of higher eukaryotes. In this talk I am going to go into detail about how my lab studies the evolutionary impact of transposable elements on genomes, and how we develop strategies to identify elements that have an impact on genome evolution. I have divided this talk into three parts. In the first part I talk about the transition from genetic approaches to genomic approaches in order to identify elements that in fact impact genome evolution. The elements that were discovered in my lab are called MITEs, and I will tell you about that discovery in the second part of this talk. And in the final part of the talk I will tell you about how MITEs are able to increase their copy number in the genome without harming the host significantly. So to review the first part, we talked about the genetic analysis of transposable elements, how genetic analysis led to the discovery of transposable elements, and I used this spotted corn kernel as an example to tell you really how powerful the genetic analysis was. So when you see a spotted corn kernel like this, we know, the geneticist knows, that the reason that kernel is spotted is because there are active transposable elements. That is, the spots reflect the movement of transposable elements. The other thing the genetics tells us is exactly where in the genome that active transposable element is.
So for example, here when we are looking at spotted corn kernels, we know that there is an active transposable element, in other words, one that is capable of moving, in a gene responsible for kernel pigmentation. The other thing that the genetics tells us is the type of element that's there. And I described at the beginning of the first talk the difference between autonomous elements, that is, ones that encode transposase, and non-autonomous elements. Those are the elements that don't make transposase but are able to move if there is an autonomous element in the genome. McClintock and others were able to deduce this just by looking at the behavior of transposable elements in crosses. Now that is the good news about the genetic analysis. Unfortunately the genetic analysis is limited in its scope. By its very nature genetics depends on the analysis of mutant alleles, and so the transposable elements that were being studied were the ones causing mutations. They were mutagenic elements. Now because these elements cause mutations, there aren't many copies of them in the genome. So, I mentioned in the first talk that up to 50-80% of the genome sequence is derived from transposable elements. However, the elements that cause mutations are not those elements. And you can understand that an element that causes mutations will eventually, if its copy number increases too much, kill the host. So these are a special class of transposable elements that cause mutations. And as such these elements really have a minimal impact on genome evolution. They're bad. They are really bad. So McClintock, as I also described in the first talk, not only discovered transposable elements, but she hypothesized that they were also tools that diversify organisms.
And to review, she hypothesized that transposable elements that are in the genome do not move around frequently, that there are conditions, such as changes in climate for example, that could activate transposable elements, that this activation would generate genetic diversity in the population by increasing the frequency of mutation, and that some of these transposable element mutations may be adaptive. I will come back to this scenario at the end of the talk when I show you how the elements that we have identified in plant genomes fit this scenario very, very nicely. So to review, I described in the last talk the two general classes of transposable elements in the genome. The first class, which is called Class 2, are DNA transposons. These are the elements that were discovered by McClintock genetically. We know these elements have a typical structure of terminal inverted repeats, that they encode a single protein necessary for the movement of the element, and that is transposase. The other class of elements, which I am not going to go into extensively in this talk, nor did I in the last talk, are called retrotransposons. These are elements that encode reverse transcriptase, and they move through an RNA intermediate, again by mechanisms that were described in the last talk. I also want to re-introduce you to transposable element families because we are going to revisit families in this talk. Transposable element families contain autonomous elements, that is, the elements that encode transposase, and non-autonomous elements. These are elements that don't encode transposase, but are able to move utilizing the transposase that's encoded by the autonomous element. Something else that I discussed in the first talk is just how prevalent transposable elements are in genomes. So this was a human gene where I showed you the exons that are in the gene, and this is really a pretty typical gene.
And what we find is that in the non-coding regions, there are many, many, many transposable elements, that some human genes have over a hundred transposable elements in their introns. Now the element that I am referring to, most of the elements in the human genome are called Alu. They are a Class 1 element, a retrotransposon, present at an astonishing copy number of over a million copies. It's almost ten percent of the human genome. Now one of the things, if we are interested in the evolutionary impact of a particular transposable element, one question we could ask is, so for example, if we picked out one of these elements, we could ask what happened when it inserted? Did it change the expression of the gene? These are the questions, if we are interested in the evolutionary impact, these are the questions that we would like to be able to address. Unfortunately we can't do that with most of the human elements. The insertions may have changed gene expression, but we have no way to address that now. And the reason is, first of all, that in the human population virtually all of us, 99.99% of us, have exactly these insertions. That's because these elements moved millions of years ago. So what that means is that if we want to know how the insertion of a particular element changed the expression of a gene, if at all, we are too late. So what we want to do is identify a group of organisms where these high copy number elements are actively transposing. And I am going to talk about that strategy, that's exactly what this talk is about. So here is our strategy for analyzing the impact of transposable elements on genome evolution. And this is a figure from the previous talk which is sort of a typical region of a genome, a grass genome, and this is from the barley genome, and the blue boxes are genes and the triangles are transposable elements. So in barley about 85% of the genome is derived from transposable elements.
So the strategy that we would like to use to identify evolutionarily relevant transposons is to find a species that is in the midst of genome expansion, where these high copy number elements are moving, are increasing their copy number. And we would then go ahead and identify and isolate an active element. So this is not one of the mutagens that was identified by the geneticists, but in fact these are the high copy number elements that are now increasing in copy number. OK, so we would then ask the question, how is this element able to increase its copy number so extensively without harming the host? What are its strategies for success? And success in this case is defined by being able to increase your copy number without killing or harming the host, and possibly even by benefiting the host in some way. And we are going to address all of those issues in this talk. So in the first talk we talked about the discovery of transposable elements in maize. Well, maize is a member of a larger group of organisms. It's a grass. These are the most important organisms for human health, for the human diet. More calories come from members of the grass clade than any other group of organisms on this planet. We are familiar with maize. The other members of the grass clade are rice, which actually is the most important source of human calories, sorghum, which is also a very important crop plant especially in Africa, and finally barley. Another member of this family, which I am not showing, is wheat. So what you would notice here, those numbers are the size of the genome. The maize genome is about the same size as the human genome, at 2500 megabasepairs. The rice genome is much, much smaller. It is almost ten-fold smaller. And what is remarkable is that here are these plants that are so incredibly similar, yet their genome size differs dramatically, by more than ten-fold. So these organisms diverged from a common ancestor only about 70 million years ago.
And the main reason for this difference in genome size is this dramatic amplification, expansion of transposable elements. And this slide helps explain in part how that can happen. These organisms, the grasses, in fact have about the same gene number. They have about 30,000 genes, give or take a few thousand. And so the genomes of these organisms are largely syntenic. The genes are mostly in the same order. So in rice you could see, with the smallest genome, I've shown three genes there in pink, yellow and blue that are pretty close together. In maize those genes are further apart. And in rice those genes, I'm sorry, in barley those genes are even further apart. So what's happening here is transposable elements, which are the squares and circles and ellipses in between, transposable elements are inserting massively between the genes and expanding the genome. So this is largely responsible for the difference in genome size. And these are the safe havens where transposable elements can go without harming the host. So I want to show you a little bit at a higher resolution to show you what elements are involved. So I introduced to you before that there were two types of transposable elements. There were Class 1 elements which are retrotransposons, and then Class 2 elements which are DNA transposons. The retrotransposons are generally these big elements that make RNA copies. That RNA copy is then made into double stranded DNA. The double stranded DNA can insert back into the genome. It almost makes copies like a printing press, like an old fashioned mimeograph machine, which most of you probably never experienced. So what you see here is that the huge blocks, which could be hundreds of kb, that separate some genes in the grass genome are largely retrotransposons that are inserted into each other, literally driving the genes apart. So as we say it's almost like genes sitting in a sea of transposable elements. Now these are not the elements that we are going to talk about today.
Instead we are going to talk about elements that I think are probably more involved in diversifying the genome. So what I have done here is I've blown up the area of one gene and broken it up into its exons and introns. And I've shown you that sitting in plant genes, much like the Alu elements that are inserted in human genes, are little elements, little transposable elements called MITEs. These are DNA transposons. They are non-autonomous elements and in the next few slides I am going to tell you a little bit more about MITEs because they were discovered in my lab at least two decades ago. So the way the first MITE was discovered was it was an insertion sitting in a mutant gene. And this is the work of several people in my lab, especially Tom Bureau and Rita Varagona. What you see here is a maize gene, and sitting in it is a little DNA transposon, disrupting the gene, causing a mutation. When Tom Bureau isolated this transposon, like I said, this was back in the early 90s, when he isolated the transposon, he took the sequence and compared it to several other maize genes or plant genes that had been deposited into databases. This was before you did BLAST searches. This was in the early 90s when there were some wildtype genes that had been sequenced. And what he found was that in fact sequences like this little element were present in many of the other wildtype genes. And I am showing that here. So what you see is here are some wildtype genes that were revealed by this computer search. And so for example, this gene has an insertion in an intron. So these are normal genes. So these insertions are in non-coding regions. They are not affecting the expression of the gene. This one is in the 5' promoter region. And the one at the end here is in the 3' region. So he discovered these elements. They are called MITEs, which stands for miniature inverted repeat transposable elements. What is similar about these elements is their structure.
Their sequence may not be similar and is not similar when you go from organism to organism. But these are the most predominant transposable element type associated with the genes of plants. So let me tell you where MITEs fit in in a transposable element family. I told you before about autonomous elements. I told you about non-autonomous elements. MITEs are non-autonomous elements, but they have no coding capacity. So in this case I have shown them. They look like they are a deletion derivative of the autonomous element, but have none of the coding sequence of the transposase. That is one possibility. The other thing, first of all, MITEs are very, very small. Very short. And they can attain very high copy numbers. So whereas for most non-autonomous elements there may be five or ten of them in the genome, maybe up to 50, there can be thousands of MITEs. I am going to talk more about that in a bit. Here's an example of MITEs that look like an autonomous element that's in the genome. Its terminal inverted repeats, its ends, are very, very similar. So we can look at this and say, Ah! This autonomous element must move these MITEs. We've done lots of experiments over the years to validate that. There are other MITEs in the genome that don't look like any other element that's in the genome except for the ends, except for the terminal inverted repeats. And I think you might remember from the first talk that the terminal inverted repeats are critical because that is where the transposase binds and facilitates the transposition of the element. So MITEs are short, miniature inverted repeat transposable elements, I won't say that anymore, I'll just say MITEs from now on. They are short elements. They can attain very, very high copy numbers unlike most other DNA transposons. And that is what is relevant here. It's the high copy number elements that I will argue are the ones that have an impact in diversifying genomes.
Not the lower copy number elements that McClintock and others had discovered, which cause mutations. So fortunately, MITEs are not restricted to plant genomes. And I say fortunately because a lot of my work over the years was funded by the National Institutes of Health, and if you are working on a plant system, it's nice to be able to say that what you find in this plant will be relevant to human health. And we all try to say that. So in this case this is the transposable element composition of a mosquito genome, Aedes aegypti. And it turns out that about 16% of this genome is derived from MITEs. From transposable elements. So what I would say in my grant application is that by understanding how MITEs move, and how they increase their copy number, in a plant genome, we can use that information to understand how they expand in animal genomes. And there are MITEs in zebrafish, and in most higher eukaryotes, but none of them to date have been shown to be moving. So the other thing about MITEs that's really relevant is that they are preferentially in genic regions. Now remember again from the first talk we said that a very small percentage of the genomes of plants are genic. So it may just be like 10 or 20% where genes are, but MITEs preferentially go into genic regions. And we will talk a little bit later about that preference. So we have this situation here where we have these very high copy numbers, so we have elements that expand, that increase their copy number, and they go into genes, but they are not killing the host. So how do they do that? So not only do they not kill the host, but they seem to be beneficial. So what I've shown here are two examples of wildtype genes. In the first one we see the red exons, and what I am showing is that the sequences that initiate transcription are actually derived from the MITE sequence that is in the promoter region.
Similarly, there are other MITE examples of wildtype genes that have MITEs that in fact carry the sequences for transcription termination. So MITEs carry in some cases, regulatory sequences that are used by genes. MITEs also contribute to allelic diversity. So what I have shown here is the same gene or alleles of the same gene, one with a MITE and one without. So here would be a wonderful example to be able to say, okay, I have a gene without a MITE, I have one with a MITE, what is... how do they differ in expression? And that may be able to tell us what is the impact of insertion of that MITE. Unfortunately, when we look at a database, and we harvest all of these related sequences and genes it turns out that these genes don't just differ, the alleles don't just differ by the presence or absence of the MITE sequence. They have many other single nucleotide polymorphisms, indels, that differentiate them. And this indicates that this insertion happened a very long time ago. So these are essentially dead elements, and really we can't tell by comparing the expression of these two genes what the impact of the MITE was because there are lots of other differences between those two genes. So as I said, these are old insertions. Okay, so what we need are active MITEs in order to understand how they... and we need to catch them in the act of increasing their copy number. So what we need is a situation like this, and I am going to come back to this later, and that is two genes that differ only in the presence or absence of the MITE. Then we can compare the two genes and say, okay, if this one for example is expressed in the roots and the leaves, and this one is just expressed in the roots, we can say that the MITE sequences allowed this gene to be expressed in a different tissue. It diversified its expression. Ok. So we want alleles that only differ by the presence or absence of the MITE. So now I am going to.... 
the next part of the talk is that quest, that search for active MITEs. So what I am showing here is a phylogenetic tree. And it's called a star phylogeny. So the way we interpret these, so what's done is you take all of the MITE sequences that are in a genome, you put it into a computer program, and it generates a tree that tells you how these sequences are related to each other. So this is, what you see, what this tells us, the story that this tree tells us is sometime long, long ago, there was a single element or a couple of related elements. They increased their copy number. They were all identical. They increased their copy number. And then somehow transposition stopped and over the thousands or millions of years these sequences drifted. They accumulated mutations and now you see this star phylogeny. So that's a story that this tree tells us. And that's a typical MITE family tree. Okay, so what it tells us is that MITEs amplify rapidly from one or a few nearly identical copies. Ok. So if we look at a typical, the genome of a higher plant or animal, what we see are lots of bursts. And I have, for convenience, I am showing the same tree that I have cut and pasted but in fact each of these trees should be different since it is a different MITE that started out as one copy and then increased their copy number. So forgive me because these trees are not easy to draw. So you see there are just, genomes are filled with MITEs, tens of thousands of them that all started from a few elements, and burst, but over evolutionary time. So in order to understand how those elements amplified without killing the host we need to, as I said before, catch a MITE in the act of bursting, and that is that central region, this red circle here, where the element is rapidly amplifying, and that is what the rest of this talk is about. So in order to identify active MITEs, my lab had to switch directions. 
And I think this happens frequently that sometimes the organism you work on isn't ideal for the questions that you want to address. And so this was from the first talk I showed you- this was the first maize genetics group, or the maize genetics group, R. A. Emerson's group at Cornell. And there is a picture of Barbara McClintock at the end. And here's the spotted kernels that she used to discover transposable elements. And I mentioned that this is a shed in the Cornell plantation. And this picture was taken in 1929. Well, what we did in 2002 is we got a new group of researchers together, involving Susan McCouch at Cornell, Sean Eddy, who at that time was at WashU, Zhirong Bao, many other people, and we took a picture in front of the same shed in the Cornell plantation. And so this is our collaborative group that focused on identifying active MITEs, and to do that we had to switch organisms, and we switched to rice. And I will tell you in a second why we did that. I think I mentioned that maize has this really, really large genome of 2500 megabasepairs. That it's about the same size as the human genome, very dynamic, very complex. The rice genome is significantly smaller, almost, about 6 fold smaller. And it is of this group of grasses that I talked about before. It has the smallest genome of the cereal grasses, 350 megabases. And for that reason, and plus because it is so important to human health, it was the first grass genome that was completely sequenced. And for us to do this project we required the complete genomic sequence in order to identify the active MITE. We weren't going to use a genetic approach. We were going to use a computational approach, and that is what I am going to talk about. So here is the strategy. We had the complete genome sequence of rice, and when I say we, the person mainly responsible for this- two people- Zhirong Bao who is a graduate student in the lab of Sean Eddy, who, as I said, was at WashU at that time. 
And Ning Jiang who was a graduate student in my lab. Zhirong Bao had devised a computer program called RECON. So what are we looking for? We are trying to find an element, a MITE, in the genome sequence, that has the features of an active transposon. What are those features? First of all, a transposable element will have many copies. So we are looking for something that has multiple copies, but we are looking for something where those copies were generated very recently. So those copies, remember, we are looking at the red region of that phylogenetic tree, those copies should be identical or nearly identical. Because when an element duplicates the two copies are identical, and over time they drift, and that is what the star phylogeny shows. So what was done, we took the rice genome sequence, compared it to itself, and in that way identified high copy number repeats. And about three thousand repeats were identified. These could be genes. These could be transposable elements. Now then the human being has to come in, or the human being came in devising the RECON protocol, but what you have to do, and this was done by Ning Jiang, she manually searched each of these three thousand families to try to find a sequence that looked like a MITE. And she found one, obviously, or I wouldn't be talking about it now. OK, so what she found was a family that had fifty-one nearly identical copies in the sequenced genome, which is called Nipponbare. That is the name of the strain. There were 51 copies in this genome. And it had the structure of a MITE. And here it is. It is called mPing, and it is 430 basepairs in length. So the problem though is that when you do computational analysis, and you identify something that looks like it should be active, you've got to go back to the bench and prove that it is active. This is what we call a candidate. It's a candidate in development.
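The screening criteria described here (many copies, nearly identical copies, MITE-like structure) can be sketched in Python. This is only a minimal illustration, not the RECON program and not the lab's actual procedure: the function names, the thresholds (`min_copies`, `min_identity`, `max_length`, `tir_length`) and the toy sequences are all assumptions made up for the example, and the copies are assumed to be already aligned and of equal length.

```python
from itertools import combinations

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_terminal_inverted_repeats(seq, tir_length=10, max_mismatches=1):
    """True if the left end matches the reverse complement of the right end."""
    left = seq[:tir_length]
    right = reverse_complement(seq[-tir_length:])
    mismatches = sum(a != b for a, b in zip(left, right))
    return mismatches <= max_mismatches


def mean_pairwise_identity(copies):
    """Average fraction of matching positions over all pairs of copies.

    Assumes pre-aligned copies of equal length (a simplification for
    this sketch; real repeat families need a proper alignment step).
    """
    pairs = list(combinations(copies, 2))
    if not pairs:
        return 1.0
    identities = [sum(x == y for x, y in zip(a, b)) / len(a) for a, b in pairs]
    return sum(identities) / len(identities)


def is_active_mite_candidate(copies, min_copies=3, min_identity=0.98,
                             max_length=600, tir_length=10):
    """Apply the three hypothetical filters: copy number, identity, structure."""
    if len(copies) < min_copies:
        return False
    if any(len(c) > max_length for c in copies):
        return False
    if mean_pairwise_identity(copies) < min_identity:
        return False
    return all(has_terminal_inverted_repeats(c, tir_length) for c in copies)


# Toy data (invented sequences, not mPing).
LEFT_END = "GGCCAGTCAC"
element = LEFT_END + "ATATATTTTAAATTT" + reverse_complement(LEFT_END)
diverged = LEFT_END + "GCGCGCGCGCGCGCG" + reverse_complement(LEFT_END)

young_family = [element, element, element]   # recent burst: identical copies
old_family = [element, diverged, element]    # copies have drifted apart

print(is_active_mite_candidate(young_family))  # True
print(is_active_mite_candidate(old_family))    # False
```

A family that passes filters like these is still only a candidate; as the talk says next, activity has to be shown at the bench.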
You've got to then do an experiment that validates, that shows it moving around. So what Ning did was she took cells that were in the freezer for 4 years. She popped these cells out, and we had a cell culture. So cell culture is an environment where transposable elements have been shown to move around in other situations. So what we had is we had DNA from two samples, one before cell culture, and then one after cell culture. The question we are asking is can we see the movement of the mPing element. And we use a technique called transposon display. This is what we used years ago. Now all you have to do is sequence the genome. And we'll talk about that later. But this was the technology available to us at the time. And what you do is you make a primer, and here we made a primer that was near the end of the mPing element. And then what we are going to do is we are going to take genomic DNA, we are going to cut it up with a restriction enzyme. And we are going to put adaptors at the end of the genomic DNA fragments. We are then going to do PCR using the primer from the end of the genomic fragment and a primer from the mPing element. And you resolve this on a gel, and that is what you see here. So what you see in lanes 1 and 2 is a situation where lane 1 is the DNA of the rice plant before cell culture, and lane 2 is the DNA from the cell culture. And nothing is happening with any of the bands. You see one and two, they look exactly the same. That's because the mPing element is not moving around in that cell culture. Each band has at one end the mPing primer, which you see over here. At the other end of the band is the region in the genomic DNA where the adaptor sequence is. Okay, this is a modification of a technique called RFLP. I am sorry, AFLP. So anyway, lanes three and four are far more interesting. And this is something that... there are...
Science is really slow, but there are these days you have which you always remember. And this was a day. It was a Sunday morning and I turned on my computer and Ning had sent me this picture that showed that the mPing element was moving around in the rice genome, and it was, needless to say, it made my day. So what you see in lane three is the plant before it went into cell culture. And there are only a couple of copies of mPing in that strain. However after cell culture there are hundreds of copies. So what we can actually do is cut out those bands, re-amplify them, and sequence them, and determine the position of insertion of mPing in the cell culture DNA. So what I want to show you here is where mPing fits in. mPing is a MITE. It is a non-autonomous element as are all MITEs. It doesn't code for anything. It is only 430 basepairs in length. Xiaoyu Zhang who was a graduate student in my lab took the mPing sequence and BLASTed it, compared it to the entire rice genome sequence. And he found a single transposon that looked like it was the autonomous element for this family. So that is called Ping. So we have Ping. We have mPing. And there was only a single copy of that element in the entire Nipponbare genome. And we went on over the years to show that the transposase from Ping is able to move mPing, and I am not going to talk about that in this talk. So what I want to tell you about is the copy number of mPing. Because I've told you that MITEs are great because they attain these really high copy numbers of hundreds to thousands. And yet I am bragging about some element that has fifty copies. That's not a whole lot. And in fact when we look at a lot of strains... what I've shown here is the Nipponbare, the sequenced genome, which has 51 copies as Nb. And then when we look at a lot of other Japonica strains, and this was done in collaboration with Susan McCouch. We obtained the strain collection. We see that there are very low numbers of mPing from 25 up to 38. 
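The before and after comparison in transposon display, described earlier, can be mimicked on toy sequences. The sketch below is a deliberately simplified, assumed model: it reads one strand in one direction, ignores adaptor length, and uses a made-up 6-base "element end" primer and a made-up 4-base restriction site. None of these names or sequences come from the actual experiment.

```python
def display_band_sizes(genome, element_end_primer, restriction_site):
    """Return sorted fragment sizes, one per copy of the element end.

    Each size runs from the start of the primer match to the end of the
    next downstream restriction site (a stand-in for the adaptor end).
    """
    sizes = []
    start = genome.find(element_end_primer)
    while start != -1:
        cut = genome.find(restriction_site, start + len(element_end_primer))
        if cut != -1:
            sizes.append(cut + len(restriction_site) - start)
        start = genome.find(element_end_primer, start + 1)
    return sorted(sizes)


def new_bands(before, after):
    """Band sizes present after but not before, i.e. candidate new insertions."""
    return sorted(set(after) - set(before))


# Hypothetical primer, site and toy "genomes".
PRIMER = "ACGTAC"
SITE = "TTAA"
genome_before = "CCCC" + PRIMER + "GGGGG" + SITE + "CCCC"
genome_after = genome_before + PRIMER + "GG" + SITE + "C"  # one extra copy

before = display_band_sizes(genome_before, PRIMER, SITE)
after = display_band_sizes(genome_after, PRIMER, SITE)
print(before)                    # [15]
print(after)                     # [12, 15]
print(new_bands(before, after))  # [12]
```

Identical band lists correspond to lanes that look the same (no movement); extra sizes in the second list correspond to new bands on the gel.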
That's not the burst that I've talked about. So fortuitously two other groups in Japan had identified the mPing element, but they identified it in plants, in living plants. And I am going to talk about those plants in a second, but when we analyzed those plants, we found that these four related landraces, as I've shown here, had over 600 copies of the mPing element. So this really said to us that here's an element that is capable of increasing, and has increased, its copy number very, very quickly. But the next question, remember one of the things we are interested in, is how can the element do this. Here we have 4 strains where the mPing element had increased tremendously to 600 copies. What I want to show you in the next slide is just how closely related all of these strains are, how the major difference between these strains is the mPing insertion sites. And again, this is another transposon display where in the first lane, the Nb is Nipponbare, with its sequenced genome, and where there are 50 copies of mPing. What you see is fewer than 50 copies, and I am not going to go into why that is. It's the way the experiment is done. The next three lanes are three of the four landraces where there are 600 copies of the mPing element. And what you'll notice first of all is that the patterns are very, very different. So the insertion sites for the elements are very, very different for each of those. So from that we concluded that the mPing element had, at least at some stages, amplified independently. At some point there probably was one strain in which the mPing element had activated, had become active. And then landraces are strains of rice that farmers have grown in particular areas. So these strains were then grown in particular areas. They were kept separate. And now we are bringing them back together in the lab.
So what you see is the different patterns, but you can see that there are many more bands. So we say that they burst independently in these three strains. What I want to show in the next slide, not next slide, but the next experiment is just how similar these strains all are. So what you see here, is using, in the first, in the gel on the left, we've used the mPing element the primer from the mPing element, remember I showed you that before, in PCR. What we are using in this second panel is a primer from a different element. So exactly the same genomic DNA preparations, but the difference is one of the primers in PCR. This is a primer from a retrotransposon called Dasheng, which was also identified in my lab. There are about 1500 copies of this in the genome, but what you see, again using the exactly the same DNA preparations, is that the pattern for all of those strains is exactly the same. So what that, or virtually the same. So what that says is the major difference between these strains is the different insertion sites of mPing. Now what I want to tell you is I want to show you the experiment we devised, and this is Eunyoung Cho in my lab, devised this experiment to see if the mPing element was still transposing in these high copy number strains. And this is a, people in this area of transposon biology, people will look at one organism and see transposons, and look at another and see them in different places. And they'll say, okay, the transposon is moving. They are probably not. What they are seeing is just polymorphism. The only way you can really see transposition to say something is transposing, is if you see it right before your eyes. And that is what I am going to show you right here. So what's done is we have ten. We have one plant that was grown up, one rice plant, and from that plant Eunyoung took ten seeds. Rice is a selfer. It self-pollinates, so all those seeds are virtually identical. She planted the seeds, she grew them up. 
She isolated genomic DNA. She did transposon display using the mPing primer. And what you see are the ten lanes of the 10 individual plants there. And you'll see differences, and I'll explain that in a minute. So she took the last plant here, the very last one. She took 10 seeds from that plant and she grew the next generation. And she did the same thing, transposon display. She took ten seeds from the last plant here. She grew them up. Here is the last generation. So by comparing these three panels and by looking at the white arrows, the white empty arrows, what you'll notice is that there's a band, say, in the F1, that then in the next generation is segregating. Okay. So you see that band. That is a heritable insertion of mPing. It is now segregating because when it inserts in the F1 generation, it is heterozygous. And in the next generation we can see it segregating. So essentially by looking at this what you can see is what I think is pretty remarkable. That is this rapid increase in the mPing copy number over a very, very short period of time. A couple of generations. And so, just by simply counting bands, we can determine how many new insertions there are per generation. And that is what I will show you here. So we had basically found that there were approximately forty new insertions per plant, per generation, and that 80 percent of these insertions are heritable. This blew us away. We had no idea that a transposable element could increase its copy number so rapidly. So now we have the material to address one of the questions that I posed at the very beginning, which is how do transposable elements amplify without killing their host? And then the second question I will end the talk with is does amplification actually benefit the host? So the way we're addressing this first question is to ask where are the elements going?
Where are they in the genome? So experimentally the way we addressed that question was to take genomic DNA from these plants and essentially amplify out the ends of the element, much like in transposon display, but we are not going to run a gel. So we are taking genomic DNA, and you'll see in a second, we are not taking it from one plant. We are taking it from a small population. We are using the primer from the mPing element and a flanking primer in an adapter and amplifying up all of these flanking regions, and then instead of running it on a gel, because we don't want to look at a few insertions, we want to look at all the insertions, we use high-throughput sequencing. In this case we used 454 sequencing to sequence hundreds of thousands of regions flanking the mPing insertion sites to try to find out where mPing is. But more importantly, and this was an experiment done by Ken Naito when he was a postdoc in my lab, what he did is he figured out that the capacity for sequencing is so great that we don't have to restrict ourselves to one plant. He can actually determine where mPing is in a small population of plants. And he chose the number 24, so we took 24 rice plants, and he was able to barcode the PCR reaction so we could tell which plant the PCR products came from. And so what you see on the left, in the gel, is some of the PCR products from a subset of the 24 plants. And you can see that in blue I am showing you the bands that are shared by all of the plants. In pink I am showing you the bands that are only present in one plant. The ability to look independently at shared and unshared insertions is very powerful. Because what we want to know is not just what are the insertions that are present in all 24 plants; those we will call "old" insertions. We want to know what are the new insertions.
What are the ones that are just happening. And the reason is, it is possible that the old insertions have been filtered by selection. So that over generations insertions that are in places like exons or whatever have been removed because they have been detrimental. So by looking at shared and unshared we are able to get the whole spectrum of new and old insertions. Old is also not really old. This initial burst really happened maybe over the last 50 to 75 years. But these unshared insertions happened in our greenhouse. So essentially, if all of the bar-coded plants or some of them show the same insertion, we call that shared. If only one of the plants shows an insertion, we call that unshared. So he was able to determine 928 shared insertion sites, and 736 unshared insertion sites. Ok, so, as I said before, these unshared insertion sites are de novo insertions. They just happened. And they are heterozygous. Heterozygous because when insertion happens it goes into one of the two alleles. And heterozygous is important because if it is a detrimental insertion, if it's a detrimental mutation, it is likely to be a recessive mutation. So, I don't expect you to see this. It is just really to impress you. What you are seeing is that each of these graphs is a different rice chromosome, and I am going to blow it up in a second. The shared insertions are shown in blue and the unshared insertions are shown in red. And it essentially shows that mPing can insert throughout the 12 chromosomes of rice. It is on every single chromosome. And this shows a single chromosome, chromosome 4, and we are looking at the insertions on chromosome 4. And chromosome 4 has a very large region of heterochromatin. And most of the transposable elements, the DNA transposons, don't insert into heterochromatin.
They insert near genes, which are in euchromatin. But we do have a few insertions that are in or near heterochromatin. So what we have learned from this analysis, first of all, is that the distribution of shared and unshared insertions is the same. It is exactly the same. There is no difference in the insertion preference of either class. Remember I said before that MITEs prefer to insert into single copy regions, intergenic regions, and sure enough, 91% of the insertions that we found were in single copy sequences. The genome average of single copy sequences is about 54%. So this is saying that MITEs do prefer to insert into single copy sequences. And we found that even when mPing does insert into heterochromatin, it's actually inserting near genes that are in the heterochromatin. So now we have looked at a gross scale throughout the chromosomes. Let's look more closely and see where mPing is inserting in and around genes. So here is a summary, with a very surprising result. So what we see, we are looking at insertions that are in the 5' untranslated region, in the exon sequences, in intron sequences, and in the 3' UTR. And we are looking again at a summary of a very large number of genes, and what you are seeing is the percentage of insertions. And so what we find is that the grey here is the expected number of insertions given the composition of the genome. The pink are the unshared insertions and the blue are the shared insertions. And the only thing that really stands out on this histogram is this. We find that there are far fewer insertions into exon sequences than we would expect by chance. Almost ten fold fewer insertions. Both the shared and the unshared. And the unshared is significant again because it tells us that mPing prefers not to insert into exon sequences. How it does this, how it knows exon from intron we can only speculate at this point, and I'll speculate a little later.
So mPing has another insertion preference. And so here are genic regions, so what you are seeing at the top is the region around the gene. And what we are seeing on the X axis is the percentage of insertions. And the blue or purple bar is the actual mPing insertions and the grey dotted line is our control, that is, what we would expect to see in those regions if we just sampled genome sequences. And where the dotted line is, that is the transcriptional start site, and we go upstream from there to -1, -3, and that's in kb. And what you see is that there is a spike. There is a preference for mPing inserting within 1 kb of the transcription start site. And when we take this together, you say how could this possibly happen? We don't know. And that is obviously an area of intensive interest. One of the ideas is that in plant genomes, and in other genomes, it is known that the region just upstream of the transcription start site has fewer nucleosomes. Exons have fewer nucleosomes, I'm sorry, exons have more nucleosomes. And so, introns have fewer nucleosomes compared to exons, so it seems that mPing is avoiding insertion into dense chromatin regions, or relatively dense chromatin regions. But this is pure speculation at this point. It is the only thing that is consistent with this pattern of insertion, that is, avoiding exons and preferring insertion near the transcription start site. So let me summarize at this point, and that is we find that 91% or so of the mPing insertions are in single copy regions near genes. That exon insertions are 10 fold under-represented. So the element is avoiding insertion into exons. And that insertions within 1 kb of the transcription start site are also enriched. Ok, and finally that the distributions of shared and unshared insertions are indistinguishable, meaning that this is the insertion preference of the mPing element. I said we don't understand it, but this is the preference.
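The shared/unshared bookkeeping and the single copy comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the lab's actual pipeline: the function names, the plant barcodes, and the toy coordinates are all hypothetical. Only the rule (an insertion seen in more than one barcoded plant is "shared", one seen in a single plant is "unshared") and the 91% and 54% figures come from the talk.

```python
from collections import defaultdict


def classify_insertions(reads):
    """Split insertion sites into shared and unshared sets.

    reads: iterable of (plant_barcode, site) pairs, where site is any
    hashable description of an insertion position (hypothetical format).
    A site seen in more than one plant is "shared"; a site seen in
    exactly one plant is "unshared" (a candidate de novo insertion).
    """
    plants_by_site = defaultdict(set)
    for barcode, site in reads:
        plants_by_site[site].add(barcode)
    shared = {s for s, plants in plants_by_site.items() if len(plants) > 1}
    unshared = {s for s, plants in plants_by_site.items() if len(plants) == 1}
    return shared, unshared


def fold_difference(observed_fraction, expected_fraction):
    """Ratio of the observed fraction of insertions to the genome-wide expectation."""
    return observed_fraction / expected_fraction


# Toy data: hypothetical barcodes and coordinates, not real results.
toy_reads = [
    ("p01", ("chr1", 1000)),
    ("p02", ("chr1", 1000)),
    ("p03", ("chr1", 1000)),
    ("p01", ("chr4", 2500)),
    ("p02", ("chr7", 300)),
    ("p02", ("chr7", 300)),  # a repeated read from the same plant still counts as one plant
]
shared_sites, unshared_sites = classify_insertions(toy_reads)

# Figures quoted in the talk: 91% of insertions in single copy sequence
# versus a genome average of about 54%.
single_copy_fold = fold_difference(0.91, 0.54)
```

With the toy data, the chr1 site ends up in the shared set and the chr4 and chr7 sites in the unshared set. The same ratio applied to the single copy figures gives roughly a 1.7-fold preference.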
So at the beginning of this section I posed two questions that we want to answer with these experiments. The first one is how do transposable elements amplify without killing their host? And the answer to that question for mPing is the following. And that is the rapid amplification of a successful element, and mPing is a successful element, has really a more modest impact on the host than previously thought. So actually, when I first saw this data I was kind of disappointed because I was like, "boy, it is just not doing a whole lot." And then it kind of dawned on me that that is what successful elements have to do. I mean in order to be successful, and success again is defined as being able to attain very, very high copy number, it has to do little harm. So the second question, which is more difficult to address, is does the amplification actually benefit the host? And we have some data that suggests that it does, and these are experiments that we are pursuing now, and I am going to tell you what those results are. And really what we want to do is look at the impact of mPing insertions on host transcription. So I showed you at the beginning of this talk the experimental material we needed in order to address the question of what is the impact of insertion on diversifying gene expression, for example. And what we needed were alleles that differ by the presence or absence of a transposable element. Now we have lots of those examples. So as I said, we wanted alleles that only differ by the presence or absence of the MITE. And this summarizes really what we have now. We have 710 genes, and EG4, I didn't show this before, EG4 is one of the landraces. It is a strain for which we determined the mPing insertion sites. So by comparing the mPing insertions in EG4 with the same genes in Nipponbare, we now have essentially 710 genes that have alleles that differ largely by the presence of this mPing element.
Of those 710 genes, almost 400 have insertions within the promoter region, the 5' untranslated... I'm sorry, the promoter region. 120 or so have insertions within the gene. And 193 have insertions downstream of the gene. So the question we are asking is what is the impact of insertion on transcription. To do that, Ken Naito, when he was a postdoc in the lab, compared the transcription of the 710 alleles in EG4 versus Nipponbare, initially under normal growth conditions in the greenhouse. So to do this he used a microarray of rice genes. He did microarray analysis of 31,000+ rice genes. He isolated RNA from Nipponbare and EG4 seedlings. And essentially he determined that for a significant percentage of these alleles, there was no difference in gene transcription. So for 78%, the transcript levels for these genes were largely the same in Nipponbare as in EG4. So this is a pretty benign effect on host transcription. So what you see in this slide is a comparison of the expression of the remaining alleles, that is, those where we did see a difference between the transcription in EG4 and Nipponbare. And for approximately three quarters of those alleles, we saw upregulation in EG4. That is, the presence of the mPing element was correlated with increased transcription of the gene. And what you see here is that most of that difference came from insertions that were in the 5' upstream regions. So this is upstream of the transcription start site. We do see also differences with insertions in introns. Many of the intron insertions, many meaning of the remaining 25% that show an effect, most of those in fact were upregulated. There weren't many that were downregulated except the few that were in exons, and that is understandable. So what I want to do now is to show you how we confirmed this microarray analysis. So what you'll see... I'll just take one of these over here. To the left.
What you see is a particular allele, and this is OS... some long number. The insertion of mPing is at -2497. So it is 2.5 kb upstream from the transcription start site. And yellow here is transcription in EG4. Gray is transcription in Nipponbare. So what we are doing here is we are isolating RNA and instead of using the microarrays, we are doing PCR, quantitative PCR. And what we find in every case we check, for the particular alleles we are looking at, the ones that showed upregulation by microarrays, is that we're able to confirm that result using quantitative PCR. Yes, indeed there is more transcription from the EG4 alleles. Now we have a problem. And I'll try to make this as simple as possible. There's another difference between Nipponbare alleles and EG4 alleles besides the presence or absence of mPing. And that is that the allele in Nipponbare, which doesn't have the transposon in it, is in a genome that only has 50 mPings. Whereas the EG4 alleles, all of them, are in genomes that have, I show here a thousand, but 500-1000 mPing elements. So it is possible that the differences that we are seeing in transcription between EG4 and Nipponbare are due to that load of 1000 elements in the background. What we need is a control. We need a control where we can compare the alleles with and without mPing in the same type of genomic background. And again, I wouldn't be telling you that we need this control if we didn't have one. And so I mentioned at the beginning that EG4 is one of a couple of landraces in which the mPing element has burst, where we have many copies of mPing. And recall I showed this transposon display, and so what I am showing here is EG4 is one of these landraces. Another is A123, and another is A157. And the other thing you'll notice is that the mPing insertions in those strains differ. Just by looking at the transposon display you can see that the patterns differ.
This means that the insertions are different. They are in different places in the genome. So this allows us to have valid controls. So here we see alleles that differ between Nipponbare and EG4. The control is that we can identify, in A123 for example, genes where A123 has the Nipponbare allele. Okay. So in that way we are able to compare A123 and Nipponbare to EG4, and in that way we sort of eliminate the complication of 1000 mPing insertions in the background. And I'll show you that data now. So here what I am showing is a histogram. This is a particular gene, Os... so on... It's a rice gene. An annotated rice gene. The -600 means that it has an insertion of mPing 600 basepairs upstream of the transcription start site. And that is shown in this schematic below here. So what you see is that in Nipponbare, which is the gray, we see a level of transcription which is set arbitrarily as 1. In EG4 we see about 5-fold more transcripts. Now we also have the blue. The blue is A123, which is another landrace where there is no mPing in that position. So A123 has the Nipponbare allele, and even with the 1000 copies of mPing in the background, we still see the lower expression of the gene. So with the Nipponbare allele and the A123 allele, we are getting about the same level of transcription. A154 has the same insertion as EG4. And as you see, we get increased transcription. So this clearly tells us that the difference in transcription is due to the mPing, somehow. We don't know how. It is due to the insertion of the mPing at -600 from the transcription start site of this gene. Here's another experiment. I am not going to show you all of them. Don't worry. Here's another gene, it is Os01g0.... whatever. This insertion is at -2.5 kb from the transcription start site.
Again, so what we have here, the negative next to Nipponbare means that there is no mPing in that gene. The + means that there is. So here EG4 and A123 have that allele with mPing. A157 doesn't. Again, the expression has to do with the presence of mPing, in this case 2.5 kb upstream from the transcription start site. So, just to summarize this part: the impact of mPing insertion on nearby gene transcription. In the vast majority of alleles we see no impact. No effect. This would be a neutral mutation. Of the 710 alleles we are comparing, for 111 we see upregulation of the nearby gene. And for 45 we see downregulation of the nearby gene. Now the question that we are going to ask is does the presence of mPing affect transcription in a different way? Does it in this case confer stress inducibility on nearby genes? Now remember from McClintock's scenario she mentioned the possibility that transposable elements are induced by stress. So here we are going to look at something a little different. We are going to ask does the presence of a transposable element cause the nearby gene to be stress inducible? So in this experiment, and I'll lead you through this here, we are looking at three different stresses: cold, high salt and desiccation, dryness. So this is a gene which has an mPing element at -55. So 55 basepairs from the transcription start of this gene. And this is a gene, you'll see the control under normal conditions, that is one of that vast majority, 78%, that show no effect under normal growth conditions. That is what you see in the control there. However, when we subject these plants to cold or salt, and we meaning Ken Naito again, we see that the strain EG4, which has mPing at -55, shows increased transcription. Not much, but we see reproducible increased transcription, whereas the other high copy strains that don't have this allele with mPing do not respond. So here's another example. This is an mPing element in gene Os02...
it has an insertion at -41, 41 basepairs upstream of the transcription start. What we see is that here EG4, in blue, and A123, in yellow, have the mPing containing allele. And you see those are the ones that are induced by cold and salt. What's nice here is we are not seeing any effect of desiccation. We see a consistent effect of cold and salt. We don't know the mechanism for this. It is under investigation. So one of the things you might wonder is how is the transposon affecting transcription? Is it acting as an enhancer? Or is it acting as a new promoter? A site of transcription initiation? So we have several intron insertions and we can do the same experiment. Here we have Os... another gene... which has an mPing element only in EG4 in an intron. And when we do the same experiment, under room temperature, RT is room temperature, normal conditions, there is no difference in the transcription of the allele with and without mPing. However when we look at the situation in the cold, we see that it is cold-inducible. So here the transposon in a distant intron is affecting the inducibility of this gene. And we see that in the next slide also. This is another gene, a very large gene, with an mPing element in an intron. And these introns are in the EG4 allele. I am sorry, these mPing insertions are in the EG4 allele and in the A123. And again, we see that those two are inducible, suggesting that mPing sequences are in some way acting to enhance transcription under cold conditions. We didn't test dry and salt in this experiment. Okay, so let me give you conclusions from this part of the talk. The first thing is what we found surprisingly, or maybe we were surprised, but that is why you do experiments, to get surprised, and then you see the results and you say, "Oh that makes sense". That massive amplification is largely benign, up to a point.
And when I say up to a point, we've caught this element in the act of amplifying. Obviously at some point if the number of elements transposing gets too great it is going to start causing some damage, and that is one of the things we are really, really interested in. When does this activation stop? What happens? And we don't know yet. The next conclusion is that the amplification has a subtle impact on the expression of many genes. It causes stress induction. It induces the expression of some genes, but it really is tweaking them. Most of the change in expression we see is maybe a two-fold, three-fold increase. And again, it produces stress inducible networks. And I say cold and salt. For others, I'll give you a taste of where this is going. And the other thing that is significant is that it generates dominant alleles. So think about a population. Remember I said that when these elements insert they are heterozygous. If an insertion causes overexpression, that will be a phenotypic change that can possibly be seen in a heterozygous organism. So we don't have to wait for this to become homozygous. So I want to go back to McClintock's scenario, her scenario for how transposable elements can function as tools to generate diversity. Transposable elements usually don't move around, and we know that now. We know that the vast majority of transposons in the genome are inactive. Even though genomes are 20%, 50%, 80% derived from transposable elements, most are inactive, and they are inactive because they have accumulated mutations. Or the few that are active are being epigenetically restrained by the host. It is possible that somehow stress conditions may activate transposons. Now I haven't shown you that. We started with the EG4 strain, where the system was active already. We don't know how it became active. That obviously is something that we are very interested in.
And we think that it is possible, because Ping and mPing are present in most rice strains, that in most rice strains these elements are epigenetically silenced. But that somehow in these few strains, these landraces including EG4, the element became activated. We do not know how. That obviously is an area of future research, and that is a critical area because that's how we think most genomes are sort of poised. Many genomes. They have the ability for active transposable elements to start amplifying. But what the switch is and how it is thrown is the subject of future research. Again, the movement of transposable elements generates genetic diversity and increases the mutation frequency. McClintock looked at mutagens. She looked at elements that... geneticists look at mutants. These are, as I said at the beginning, not insertions that will benefit the organism. However, we have been able to identify an element where most of the insertions are benign. And, as we said, a rare TE induced mutation may be adaptive. So I want to sort of speculate a little bit, and tell you about how we sort of fit mPing into this model, that somehow a stress could have induced Ping. Ping is the autonomous element. Again, this is a black box. We do not know. We weren't there when this happened. We came upon the strain, or our Japanese collaborators came upon the strain, when it was already active. This leads to the massive and rapid amplification of mPing that we're seeing. It is still in progress. This generates tens of thousands of new alleles. Now we looked at 24 plants, but imagine a field of a thousand plants, with mPing accumulating 25-40 new insertions per plant. What is really interesting, a point I haven't made, is that rice is a selfer. So it selfs. There is no new genetic information coming into populations. The same genetic information is being scrambled up by recombination or whatever.
mPing, and transposons generally, are a way to dramatically diversify the genetic material without having gene flow into the population. So we see at the transcription level that this amplification creates transcriptional changes, and we are hypothesizing that these changes can lead to quantitative variation. So changes in cold tolerance, changes in drought tolerance. Changes in desiccation. But that is the point we are now testing. And I am going to end by telling you, very briefly, about the experiments that we are currently doing to address the question. Really the smoking gun question. And that is what are the phenotypic consequences of the mPing burst on EG4? We've talked about transcriptional changes, but are there phenotypic changes that go along with those? And we are doing that in a number of ways. The first thing, and again we are taking advantage of the wonderful new high throughput sequencing technologies. So one of the things that has allowed this project to move forward, and I think for most of us in molecular biology, is the technology that really drives the questions that we ask. And that we can get deeper and deeper into a particular problem as the technology changes. And the availability of high throughput sequencing has allowed us to address questions that we didn't even dream about, you know, as recently as 5 or 6 years ago. In this case what we can do is, as shown here, we can... Well, first of all, we know that Nipponbare and EG4 differ phenotypically in several characteristics. They have different flowering time. They have different average height. They differ in some of the stress responses. We want to know, for example, if any of those differences are due to one or more mPing insertions. In order to figure this out the first thing we have to do is to figure out the...
we have to know is there more going on in EG4 and the landraces than just mPing amplifying. Because remember when we sequenced the insertion sites we used an approach where we used PCR primers and only amplified the element and flanking sequences. Well again, we were limited before by the technology. Now we can actually sequence the entire genome, and in fact we've done that. So EG4 is currently being re-sequenced using next generation technology so that we can see whether mPing is the only transposon that is amplifying in the genome. And so far the preliminary answer to that is yes. It appears that mPing is the only transposon that is amplifying at this time. The other thing is we are doing transcriptomics. Rather than looking at particular individual genes, we are looking in a strand specific way at the entire genome of, I'm sorry, of EG4 and Nipponbare. And this is done in collaboration with Tom Brutnell's group at Cornell, where they have developed a really nice protocol to do single strand RNAseq. Now, the approach that is traditionally used to find the regions of the genome that are responsible for quantitative traits is a mapping population, or a recombinant inbred population. And our collaborators at Kyoto University in the Tanisaka and Okumoto lab have over the last decade been developing this incredibly valuable resource. So what they did over ten years ago was to cross EG4 with Nipponbare. Now what I want to point out is these are inbred lines. So they have two copies of exactly the same gene at every single locus. So we have EG4 crossed with Nipponbare. We have our F1 progeny. Many, many F1's. Those F1's are then selfed. Self-crossed for ten generations. So we now have growing in this country 275 recombinant inbred lines. These lines have mosaic chromosomes that are derived from EG4 or Nipponbare. And they are displaying different traits.
So we are phenotyping them now, and I'll talk about that in a second. So the question really, and I think on the next slide I go into more here. So we are looking at these recombinant inbred lines. We are assaying the RILs, RILs for recombinant inbred lines, for morphological traits and stress responses. And we are doing something that again I would never have thought would be possible even in the grant application I wrote a couple of years ago. I didn't even write that we would do this because it wasn't affordable, but now again technology, the costs have come down. We are actually re-sequencing all 275 RILs to find out exactly the mosaic structure of their chromosomes. What parts came from Nipponbare, with its mPings? What parts came from EG4, with its mPings? So that we can ultimately correlate the mPing insertions with the various phenotypes. Now again, this is just correlative at this point. We then will have to prove that the candidates that we find, the mPing insertions and alleles, are the ones responsible for that phenotypic difference. So many of us in the field think of Barbara McClintock as the first genomicist. The first person who thought of the genome as an entity, not just of single genes. And there is a quote from her Nobel lecture which I want to end with. And it is her thinking about the genome, and it really presents the challenge that I have at least felt and have gone with in my lab. And that is, "In the future, attention undoubtedly will be centered on the genome, with greater appreciation of its significance as a highly sensitive organ of the cell that monitors genome activities and corrects common errors, senses unusual and unexpected events, and responds to them, often by restructuring the genome." She ends by saying, "We know about the components of genomes that could be made available for such restructuring." In part the transposable elements that she discovered.
"We know nothing, however, about how the cell senses danger and instigates responses to it that are truly remarkable." I'd like to say that we are beginning to understand that black box of the connection between the outside world and the genomic changes. And I think transposable elements are certainly part of that.
Susan Wessler (UC Riverside) Part 2: How transposable elements amplify throughout genomes. Posted by Chang Pei Li on 2015/11/28.