
#MicroTwJC: Dicing up Viral Genomes

In this week's Microbiology Twitter Journal Club, we have been challenged to dissect not one, but two papers.
Fortunately, the University of California San Francisco has made one of the papers widely available, so you can access it here, and the ever important supplemental material can be found here.
The title of the paper we'll be going through is "Antiviral RNA Interference in Mammalian Cells".
So let's quickly refresh ourselves on how viruses cause infections. They attach to the surface of a host cell and inject their genetic material (which can be in the form of either DNA or RNA) into said cell. The virus's genetic material does all of the hard work, finding ways to trick the host into replicating it and creating new viruses, possibly killing the host cell in the process.

There are ways in which we can disrupt the replication cycle of viruses, and I've talked about some of the new drugs being developed in previous blog posts.
A key part of any virus's replication cycle is the need for it to use RNA. RNA is the tool cells use to deliver the genetic code instructions to ribosomes, which use that code to make proteins. Viruses take advantage of this system by delivering their own RNA to the ribosomes and tricking them into making viral proteins. This paper focuses on one way that cells could potentially resist infection.

Many organisms produce a protein in their cells known as "DICER". This protein can recognise double stranded RNA, and when it does, it cuts it up into much smaller RNA strands, around 20-25 bases long.

But the fun simply does not stop here, because those small RNA fragments created by DICER will play an important role. They get taken up by the RNA-induced Silencing Complex (RISC), a complex set of molecular machinery which uses these short RNA fragments to recognise complementary sequences in complete RNA strands. Once the RISC recognises these RNAs, it breaks them down.
I explain this in the simplified diagram below, which in no way was affected by the Doctor Who season 6 marathon I just finished.

The Silence, I mean the RNA-induced Silencing Complex (RISC), plays a key role in regulating the cellular RNAs within a cell. Can this system also be used to fight against viruses?
It's known that if we artificially introduce short synthetic RNA fragments that match viral RNA into mammalian cells, the cells primed with this RNA can resist infection. In fact, it is known that some invertebrates and plants can use this system to fight viruses without any outside assistance.
Can mammalian cells do it as well?
How would we even go about investigating this question?

The Experimental Set Up

Firstly, we need an experimental set up that will allow us to pull apart all of the individual parts that contribute to the creation of siRNAs, and allow those siRNAs to regulate degradation of mRNAs. We can then knock out or alter some of the genes that regulate key steps in the production of siRNAs and their implementation in the RISC. The researchers decided to use mouse embryonic stem cells because these cells can survive without a fully functioning siRNA system.
The next thing we need to get straight is what kind of virus we want to use for this investigation. They selected the Encephalomyocarditis virus (EMCV). This virus stores its genome in the form of single stranded RNA, and is known to make its host cells produce a lot of double stranded RNA during its infection cycle.
The researchers made a recombinant form of the virus genome using PCR, and then amplified it up in a special mammalian cell type called BHK to create a pure form of the virus. They then transferred this to another culture and let it go through a full infection cycle, to increase the amount of virus available for the infection experiments.

There are a number of different lineages of embryonic stem cells that the researchers could potentially use, but they wanted to make sure to select one which would produce loads of virus. The more viral RNA going through a system, the more likely they are to detect even a weak host effect acting against it.
They tested 3 lineages of stem cells, named E14, PGK and HM1. 
After they infected the cells, they waited until either 3 or 6 hours post infection (hpi) to take samples from the cells, and measured the amounts of specific proteins within the cells (which are indicated by the black smudges). 
They measured the dose of virus by adding samples to a standard tissue culture at different dilutions, and then determining the amount of virus needed to kill off 50% of the cells in the tissue culture. The less you need to dilute a sample to get to this point, the more virus there actually is in the original sample, so I'm guessing the lower numbers mean there is more virus.

In this figure, we want to find the cells that produce the highest levels of the viral protein VP1. The researchers also measured the abundance of Actin, a normal component of the cell which doesn't generally vary that much, as a control, to show that they aren't cheating by including more cells in the prep. The E14 lineage wins this hands down, so this was used for further studies.

They next took samples of cells 3 hours and 6 hours after being infected, and extracted all of the RNA from these cells. The Y-axis of the graph indicates "read counts", which essentially describe how much of an RNA transcript there is. Of the total RNA within a cell, around 0.7% belonged to the virus after 6 hours.
Figure 1B shows how the amounts of virus RNA vary over the course of an infection. The RNAs are classed based on the size of the RNA transcripts. At 3 hours post infection, there isn't much viral RNA hanging around the cell, but at 6 hours (the grey bars) we can see a sharp spike in the amount of transcripts.
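For anyone who wants to play with the idea, here is a rough Python sketch of how reads could be binned by length to get a size profile like the one in Figure 1B. It is not the authors' pipeline; the function names and the toy reads are my own made-up assumptions.

```python
from collections import Counter

def size_distribution(reads):
    """Count how many reads there are at each length (in nucleotides)."""
    return Counter(len(read) for read in reads)

def fraction_in_range(reads, low=21, high=23):
    """Fraction of reads whose length falls within [low, high] nucleotides."""
    if not reads:
        return 0.0
    in_range = sum(1 for read in reads if low <= len(read) <= high)
    return in_range / len(reads)

# Hypothetical toy reads, NOT data from the paper.
toy_reads = ["A" * 22, "C" * 22, "G" * 21, "U" * 30, "A" * 19]

if __name__ == "__main__":
    for length, count in sorted(size_distribution(toy_reads).items()):
        print(length, count)
    print("fraction 21-23 nt:", fraction_in_range(toy_reads))
```

A real analysis would of course start from sequencing files rather than a hand-typed list, and would normalise the counts (for example to reads per million) before comparing time points.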

There is an interesting spike for transcripts 21, 22 and 23 nucleotides long, which intriguingly is the size that DICER tends to chop RNA molecules into.
But at this point in the paper, we don't know whether these are all the natural sizes of the virus transcripts.
So let's drill down deeper. Where are these viral transcripts coming from?
We know that EMCV creates a double stranded RNA molecule, and these RNAs could come from either strand. So the researchers looked at the code for each strand, and checked which part of the EMCV genome these small RNA strands came from.
These are shown in the graph below, with the top graph indicating where the RNAs come from on the positive strand, and the bottom graph showing the numbers of RNAs coming from the negative strand.
The blue bars indicate strands in the 21-23 range, i.e. the kinds of RNA we'd expect to see accumulate if DICER is chopping these strands up. The 24-44 nucleotide length RNAs are shown by the grey bars.
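To make the strand business a bit more concrete, here is a minimal Python sketch of assigning a short read to the positive or negative strand by exact matching, and sorting it into the blue (21-23) or grey (24-44) size class. Real read mappers are far more sophisticated, and everything here (the toy genome, the function names, the exact-match rule) is an assumption of mine rather than the method used in the paper.

```python
# Toy illustration only: the genome below is made up, not EMCV sequence.
COMPLEMENT = str.maketrans("ACGU", "UGCA")
TOY_GENOME = "GGAUCCGAUUACGGCUAAGCUUGCAUGCCUGCAGGUCGACUCUAGAGGAUCCCC"

def reverse_complement(rna):
    """Return the reverse complement of an RNA string (A, C, G, U only)."""
    return rna.translate(COMPLEMENT)[::-1]

def map_read(read, genome):
    """Return (strand, position) for an exact match, or None if unmapped.

    position is the 0-based offset, on the positive strand, of the
    leftmost genome base covered by the read.
    """
    pos = genome.find(read)
    if pos != -1:
        return ("+", pos)
    pos = genome.find(reverse_complement(read))
    if pos != -1:
        return ("-", pos)
    return None

def size_class(read):
    """Sort a read into the size classes used for the bar colours."""
    n = len(read)
    if 21 <= n <= 23:
        return "21-23"
    if 24 <= n <= 44:
        return "24-44"
    return "other"

if __name__ == "__main__":
    plus_read = TOY_GENOME[5:27]
    minus_read = reverse_complement(TOY_GENOME[7:29])
    for read in (plus_read, minus_read, "A" * 22):
        print(read, map_read(read, TOY_GENOME), size_class(read))
```

Counting the mapped positions per strand and per size class would give the raw numbers behind a two-panel bar chart of this kind.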

You should be able to see that the top graph has most of the bars, and the higher bars. Most of these short RNAs are coming from the positive strand of the EMCV genome. The positive strand has pretty much all of the grey bars, and the authors suggest that these grey bars represent natural breakdown products of the RNA. The blue bars are found on both strands, and are primarily found within the first 200 nucleotides of the genome, which is known as the 5' untranslated region. There is a much smaller region shared by both strands on the opposite end of the genome in the 3' untranslated region. There are also a ton of different regions on the positive strand. But the problem with interpreting most of the data from the positive strand is that breakdown products will be mixed in with those that have been genuinely broken up by DICER.

These two regions of the virus genome play an important role in allowing the genome to duplicate during an infection. They can lasso important enzymes onto the genome to start the replication process, or to start translating its genetic code into viral proteins.

In the next figure, the authors zoom into the first 300 base pairs of the genome from the last figure to show what's happening in more detail. Each of the bars is now numbered as well. You may notice that the bars 1&2 and 3&4 tend to mirror each other. The sequences for these bars are shown above, and they are complementary. They form duplexes with slight overhangs, such as might be created if they had been chopped up by DICER.
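The "duplex with slight overhangs" idea can be sketched in a few lines of Python. This is a toy under my own assumptions (a made-up genome, 22-nucleotide strands, and 2-nucleotide overhangs at each 3' end), not the sequences from the figure.

```python
# Toy illustration only: sequence and coordinates are made up.
COMPLEMENT = str.maketrans("ACGU", "UGCA")
TOY_GENOME = "GGAUCCGAUUACGGCUAAGCUUGCAUGCCUGCAGGUCGACUCUAGAGGAUCCCC"

def reverse_complement(rna):
    """Return the reverse complement of an RNA string (A, C, G, U only)."""
    return rna.translate(COMPLEMENT)[::-1]

def dicer_like_pair(genome, start, length=22, overhang=2):
    """Return the plus-strand read at `start` and the minus-strand read
    that pairs with it, leaving `overhang` unpaired bases at each 3' end."""
    if start < overhang or start + length > len(genome):
        raise ValueError("duplex would run off the end of the toy genome")
    plus_read = genome[start:start + length]
    minus_read = reverse_complement(
        genome[start - overhang:start + length - overhang]
    )
    return plus_read, minus_read

def forms_duplex_with_overhangs(plus_read, minus_read, overhang=2):
    """True if the two reads base-pair everywhere except for `overhang`
    unpaired bases at the 3' end of each strand."""
    if len(plus_read) != len(minus_read) or len(plus_read) <= overhang:
        return False
    return plus_read[:-overhang] == reverse_complement(minus_read[:-overhang])

if __name__ == "__main__":
    plus_read, minus_read = dicer_like_pair(TOY_GENOME, 10)
    print("plus :", plus_read)
    print("minus:", minus_read)
    print("duplex with overhangs:",
          forms_duplex_with_overhangs(plus_read, minus_read))
```

The point of the check is that a perfectly blunt pair (a read and its exact reverse complement) fails it, while a pair offset by two bases passes, which is the signature being looked for in the mirrored bars.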

You may be wondering what those circular graphs mean. They describe the phase of the register of an RNA transcript. The "register" of an RNA transcript describes which reading frame the sequence sits in, i.e. where it starts. We are assuming that the actual read length here is 22, and that there are 22 registers. These are represented by the 22 arms of these plots. The actual data is represented by the jagged black line, which represents the percentage of reads within a specific register.
The top radar plot shows the registers for the positive strand, and the bottom one shows the registers for the negative strand (although the labelling is really ambiguous, and could suggest that they are taking the registers of RNA strands that are 21 base pairs (top) or 23 base pairs (bottom)).
The top radar plot shows that most of its transcripts fall within the second register. The 23 base pair long strands tend to start in the 22nd register. So the genome at this end is being parcelled up into registers that are off by two base pairs. Those base pairs are likely left dangling off to the side in different directions, leaving overhangs.
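As far as I can tell, the numbers behind a radar plot like this boil down to taking each read's start position modulo 22 and counting. Here is a hedged Python sketch of that calculation; the 1-based coordinate convention, the function names and the toy start positions are all my own assumptions, and the paper may well define its registers differently.

```python
from collections import Counter

def register_of(start, period=22):
    """Register (1..period) for a 1-based start coordinate."""
    return (start - 1) % period + 1

def register_percentages(starts, period=22):
    """Percentage of reads falling in each register, for all registers."""
    counts = Counter(register_of(start, period) for start in starts)
    total = len(starts)
    return {
        register: (100.0 * counts.get(register, 0) / total) if total else 0.0
        for register in range(1, period + 1)
    }

# Hypothetical start positions, NOT data from the paper.
phased_starts = [2, 24, 46, 68, 2, 24]      # every start is 22 apart
scattered_starts = [1, 5, 9, 14, 30, 41]    # no consistent spacing

if __name__ == "__main__":
    for name, starts in (("phased", phased_starts),
                         ("scattered", scattered_starts)):
        percentages = register_percentages(starts)
        top_register = max(percentages, key=percentages.get)
        print(name, top_register, round(percentages[top_register], 1))
```

Reads chopped out in neat consecutive 22-nucleotide steps pile up in one register, giving a single spike on the plot, whereas random breakdown products spread across many registers.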
To confirm that most of these transcripts were 22 base pairs long, they extracted all of the RNA from an infected cell and ran it out on a northern blot specifically geared to spot low nucleotide products.
They showed that most of the transcripts are about 22 base pairs long. It's shown alongside some Arabidopsis thaliana RNAs for size comparison, a Not infected (NI) control, and with the ubiquitous U6 RNA transcript to show that the cells in both samples were basically alive.
We have so far learned that the 5' edge of the virus genome appears to be heavily attacked by DICER.
But this is all circumstantial evidence. We don't know that DICER is involved in this at all. The only way to show it would be to remove DICER from the equation completely.
In this next set of experiments, the researchers tried infecting embryonic stem cells that could not produce DICER, to see whether infections with EMCV would produce the same outcome.
They used embryonic stem cells that had been genetically modified with "flox" tags around the DICER genes. When activated, these tags cause the entire DICER gene to be removed.
They then infected these stem cells with EMCV, and checked to see whether the 22bp RNAs were still detectable when DICER wasn't present.
Essentially this figure is the same deal as the last one, with one little extra. We get to look at miR-16 in the control embryonic stem cells and the DICER knockout cells. Cells with DICER present can make miR-16, and those without cannot.
The control cells have detectable 21bp transcripts, and the DICER knockout cells do not. The transcripts probably were genuine viral siRNAs.
To further highlight this, they looked at whether the siRNAs could be incorporated into the RISC. The RISC consists of multiple proteins, and one of the most important ones is the Argonaute protein. It is this protein which directly binds to the siRNA and brings it into the RISC.
They used embryonic stem cells that expressed a specially tagged version of the Argonaute protein (AGO2). During an infection, the siRNAs would be bound by the Argonaute protein. At the end of the infection, the researchers harvested all of the Argonaute protein, and got it to spit out the siRNA sequence so that they could run it on a northern blot.

The Argonaute protein specifically works by picking up siRNAs, and the researchers use it here to confirm that the EMCV short RNAs are bound by the Argonaute protein just like real siRNAs.
So what would happen if we were to take these siRNAs gathered by the Argonaute protein and find out where they fit on the EMCV genome?
We can see that viral siRNAs bound by Argonaute come primarily from the positive strand of the EMCV genome, and from a number of hotspots, some of which may be crucial to the function of the viral genomic RNA.
But it should be noted that the total make-up of the RNAs bound to the Argonaute protein fundamentally differs from the small transcripts that were freely extracted from the medium in the previous experiments. This suggests something else may be going on, that some unknown process may be causing this difference.

The authors of this paper then decided that it would be interesting to look at what would happen if they infected the embryonic stem cells at different developmental stages. At day 10, embryonic stem cells stop producing a protein called OCT4 as a sign of their new relative maturity.

The point here is that after a certain stage of maturity, we simply don't see the siRNAs anymore, even though infected cells at both maturities were still producing viral protein.
When the researchers looked at the RNA transcripts, they found again that the more mature cells had lower levels of short transcripts, especially those in the 21-23 nucleotide range, i.e. the siRNAs.

If we look at the 5' portion of the genome again and the numbers of transcripts produced from it in the less mature and more mature cells, we get the graph below:

Whilst siRNA transcripts are still detectable at day 10, they are so much lower than the transcripts from the younger cells. Good to know.

But the question remains as to whether this whole siRNA system actually helps protect cells from viruses. The problem here is that many viruses have Viral Suppressors of RNA interference (VSRs) which act to directly confound RNA interference. Whilst these proteins are active, it would be nigh impossible to tell whether siRNAs in mammals were having an effect on the virus. They have to be removed if we really want to get answers over whether siRNA systems have effects on viral infection.
Unfortunately, whilst EMCV is purported to produce a VSR, it has not yet been identified. So they switched their work to focus on a different positive strand RNA virus with a known VSR, called Nodamura virus. It produces a B2 protein which binds to and protects double stranded viral RNA.
The researchers used a mutant of Nodamura virus which didn't have B2, and used that to infect embryonic stem cells along with its wild type counterpart.
The researchers then extracted the RNA from the infected cells, pulled out the genomic viral RNA for each virus, and ran them out on a gel.
As can be seen, the genome (consisting of RNA1) and the sub-genomic RNA produce much weaker bands when B2 is not present, suggesting that the B2 mutant may not be present in high numbers within an infected cell.
The researchers then looked even more directly at the sizes of the RNA transcripts. The Nodamura virus without B2 had a spike in the amount of small RNA transcripts between 21-23 nucleotides long.
This could suggest that the unprotected Nodamura virus's genome is being chopped up into small RNA fragments.

Now let us take a look at where in the Nodamura virus's genome these RNA transcripts come from. We're looking at both the wild type and the mutant versions of the virus.

Again most of the reads come from the positive strand of the viral genome. But the interesting thing here is that without B2, nearly all of the viral short transcripts are now 21-23 base pairs long. Furthermore, we yet again see that they tend to come from the 5' portion of the virus genome. We can zoom in on the 5' untranslated region again, to see how much of a difference the lack of protective B2 makes to the amount of short RNAs.

Were you missing the Phase Register radial graphs? We have more of those for you in the next figure.
These ones are at least more clearly labelled.

We are looking at the wild type Nodamura virus in the first two plots. There is no distinct phase register for the short RNAs on either of these strands. This makes sense, considering that the virus stops DICER from working. The RNA doesn't get broken down into neat 21-23 base pair chunks. All we are seeing in these graphs are the natural degradation products of the RNA, distributed across random registers.
But when we look at the mutant, which is vulnerable to DICER, we see that the short RNAs on the positive strand and negative strand all fall into just one of the 22 registers. They are two nucleotides out of phase with each other, again suggesting that these are siRNAs with nucleotide overhangs.
In the next figure, the researchers show the sequences of the siRNAs from the first 180 nucleotides of the 5' end of the genome.
The black bold numbers indicate the numbers of each transcript, and the normal numbers indicate the position of the transcript along the virus's genome. The transcripts marked XXX are ones that were not detected. The blue coloured siRNA sequences are ones that were also shown to come up in a different study which used newborn mice instead of embryonic stem cells, and the overhangs are in red.
This shows that these effects in embryonic stem cells are also mirrored in mouse models, although since I don't have access to the other article, I can't give you the full story on this figure. I don't know how the researchers in the other paper came to their data, I don't know how the transcript numbers could compare. I have to back away from this one.
So for the next part of their work, they decided to knock out the Argonaute protein. They used the flox system previously used to knock out DICER to knock out the Argonaute gene. This is triggered through the addition of the drug tamoxifen. So the next graph shows that the Argonaute protein is knocked out when tamoxifen is added, even when the cells are infected with Nodamura virus with or without active B2.

In the next figure they try to look at the survival of the embryonic stem cells after they've been infected with Nodamura virus, but they use a really odd method of measuring the effect of removing Argonaute.
They infected embryonic stem cells that have active Argonaute protein and inactive Argonaute protein with both versions of the Nodamura virus (wild type and B2 mutant). What is odd is how they represent the data.
They take the ratio of the Nodamura virus genome abundance in cells without Argonaute protein (E7 cells with tamoxifen added) compared to cells with active Argonaute protein.
So in both cases there appears to be more Nodavirus RNA present when the Argonaute protein is knocked out, which makes sense because the whole RNA induced Silencing Complex pretty much falls apart without it. In the virus that doesn't produce B2, the ratio is a whole lot higher, because in the wild-type E7 the genome is more vulnerable to being chewed up by the RNA Silencing Complex.
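If the ratio idea feels slippery, here is the arithmetic as a tiny Python sketch. The abundance values are numbers I invented purely to show the calculation, and the function name is hypothetical. None of this comes from the paper's data.

```python
def knockout_ratio(abundance_without_ago, abundance_with_ago):
    """Viral RNA abundance in Argonaute-knockout cells divided by the
    abundance in cells with active Argonaute."""
    if abundance_with_ago <= 0:
        raise ValueError("abundance with Argonaute must be positive")
    return abundance_without_ago / abundance_with_ago

# Invented example abundances (arbitrary units), NOT measurements.
wild_type_virus = {"without_ago": 120.0, "with_ago": 100.0}
b2_mutant_virus = {"without_ago": 60.0, "with_ago": 10.0}

if __name__ == "__main__":
    for name, values in (("wild type", wild_type_virus),
                         ("B2 mutant", b2_mutant_virus)):
        ratio = knockout_ratio(values["without_ago"], values["with_ago"])
        print(name, "ratio:", ratio)
```

A ratio above 1 means there is more viral RNA when Argonaute is gone. In this made-up example the mutant has the larger ratio even though it has less RNA in absolute terms, which is exactly why a ratio on its own can hide what the raw blots show.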
But they show the actual data in the Northern blots shown below. The NI and NoV columns are pretty much controls. The important one to pay attention to is the NoV^B2, which shows what happens when you add the mutant Nodamura virus to cells with and without active Argonaute protein. When Argonaute is there, and the RNA Silencing Complex is active, we can see that there is less of the Nodamura virus RNA hanging around, and when it isn't there, there is more of it.

So this is the evidence they present to show that mammalian cells can use siRNAs to break down viral dsRNA.


The first set of figures suggests that EMCV is broken down into short interfering RNAs. In the next set of figures, they looked at what happened when they knocked out DICER, although they apparently missed the opportunity to catalogue the siRNA present during infection in as much detail as they had in the previous figure. A missed opportunity?
They then focussed on the Argonaute protein, the protein which draws the siRNAs into the RNA silencing complex. When they extracted the Argonaute proteins from their stem cells and got them to spit out whatever siRNA they had bound, they found something interesting. They found that the content of these siRNAs differed from the total siRNA content of the cells. They suggest that some other process may be delivering siRNAs to these Argonaute proteins, processes that don't require DICER.
Hey, wouldn't it be cool if they had embryonic stem cells without DICER which they could do deep sequencing on and get an idea about the viral siRNA generated by other processes within mammalian cells?
The researchers then showed that more mature embryonic stem cells don't produce siRNAs even though the virus is still there.
We then go through this whole dance again with Nodamura virus, so that we can see what happens when we remove a virus's defences against siRNA silencing. It looks like the B2 viral suppressor of siRNA works very well, and when it's removed there is less intact viral RNA and more viral derived siRNA.

It's hard to properly critique a paper when you have very little idea about how the experiments were designed. I have no idea about the sizes of the embryonic stem cell populations we are dealing with in each figure. I don't have any idea of how variable the viral siRNA content is from infected cell to infected cell. Most of the results are presented in transcript numbers (except for one of the figures where they inexplicably switch to reads per million), but I don't know what that relates to. Is it transcripts per cell, or for each millilitre of culture?
They don't use any statistics that I'm really familiar with, and whilst I'm fascinated by the stuff they did do, I don't have enough time to learn it all.
If there is anyone who can give me an explanation of how single spectrum analysis works, and what the researchers use it for in this paper's supplementary section, I would be most indebted.

The problem with this paper is that it makes me paranoid. I've got a nasty feeling that if I could better understand the data, and was more fully up to date with virology and deep sequencing techniques, my opinion might be different.
The researchers do show an effect, but they are also (quite rightly) modest about it, suggesting it is only really relevant in the embryonic stages of a mammalian cell.
Unfortunately, I don't have access to the second paper for this journal club, meaning that I'll have to hitch a ride to the British Library over the week-end to get it. I'll have to save my thoughts on the wider applicability of this research for after I've read it, and the associated commentary pieces.

Join in the Microbiology Twitter Journal Club discussion, 8pm GMT next Tuesday, follow the #microtwjc hashtag.
I apologise for yet again producing a review that is almost as long as the paper.

Maillard P.V., Ciaudo C., Marchais A., Li Y., Jay F., Ding S.W. & Voinnet O. (2013). Antiviral RNA Interference in Mammalian Cells, Science, 342 (6155) 235-238. DOI:
