
Does the mRNA of the covid19 spike protein contain any nuclear localization signals


Does the covid19 spike protein amino acid sequence, as used in the covid19 vaccines, contain a nuclear localization signal? If it does, isn't there a chance that the RNA can find its way to the cell's nucleus?

The AZ and J&J vaccines use adenovirus vector DNA to encode the spike protein. Is there a chance that the DNA of the vaccine could get incorporated into the genome, as described here https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2570152/ ?

There is a further discussion of this potential problem here https://cassandravoices.com/science-environment/science/healthy-people-do-not-require-genetic-vaccination/


Does the mRNA of the covid19 spike protein contain any nuclear localization signals?

I take it you are talking about the coronavirus SARS-CoV-2, and asking whether its RNA genome contains any nuclear localization signals that act post-translationally as signals to specifically import the spike proteins into the nucleus. I cannot find any reports where any of the spike proteins are nucleus-imported, i.e. the spike proteins do not have any (working) nuclear localization signal as far as we can tell.
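To make "contains a nuclear localization signal" a bit more concrete: one rough way to screen a protein sequence is to look for the classical monopartite consensus K-(K/R)-X-(K/R). The sketch below is only an illustration under that assumption. The function name and the toy input are mine, and a pattern hit is not evidence of a working signal (nor does the absence of a hit prove a protein stays out of the nucleus).

```python
import re

# Classical monopartite NLS consensus K-(K/R)-X-(K/R). A sequence match is
# only a candidate, not evidence of a working nuclear localization signal.
# The lookahead lets overlapping hits be reported as well.
CLASSICAL_NLS = re.compile(r"(?=(K[KR].[KR]))")


def find_nls_candidates(protein: str) -> list[tuple[int, str]]:
    """Return (0-based position, matched residues) for every consensus hit."""
    return [(m.start(), m.group(1)) for m in CLASSICAL_NLS.finditer(protein.upper())]


# Toy input: the SV40 large T antigen NLS (PKKKRKV) with hypothetical flanking residues.
print(find_nls_candidates("MAPKKKRKVG"))
```

Any hit found this way would still need experimental localization data, such as the study linked further down, before it says anything about the real protein.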

Some of the proteins coded within its RNA genome are however imported into the nucleus. These are not the spike proteins though.

If you are interested in where the proteins localize within the infected cell, I suggest the following article as an excellent starting place: A systemic and molecular study of subcellular localization of SARS-CoV-2 proteins

As for the mRNA, I cannot find any evidence that it localizes to the nucleus. However, there have been reports that other coronaviruses' mRNA can localize to the nucleus, that CoV-1 contains a nuclear export motif, and that other coronavirus RNA can localize at the nuclear pore complex to interfere with trafficking between the cytoplasm and the nucleus (i.e. at the gateway into the nucleus, but not inside!). This has been shown at least once more. However, these findings do not concern the spike proteins, but rather proteins like Nsp1 (non-structural protein 1).

There has also been an interesting pre-print floating around since 2020 about the possibility of SARS-CoV-2 integrating into the human genome if we were to ectopically add a reverse transcriptase (RNA -> DNA writing enzyme) to the equation. Please read it with a grain of salt: I don't know whether the paper has been peer-reviewed, and unfortunately I cannot comment right now on the veracity of the research. It is also an in vitro experiment performed under circumstances that do not exist in humans, so the conclusions do not apply to discussions on the COVID-19 pandemic; it is only a proof of concept of the virus's ability to integrate, rather than a demonstration that this occurs in vivo.

Here comes the cognate question:

Will we ever find the spike protein-coding sequences integrated into our genome?

A protein inside the nucleus cannot integrate into the DNA genome, though. With RNA, you would also require a reverse transcriptase to integrate it into DNA, which humans do not have. Therefore the odds of finding a spike protein-coding sequence in the genome, especially in the germline, if you are concerned with cross-generational effects, are virtually and practically zero.

EDIT:

You additionally ask in your edit whether DNA in the AZ and J&J vaccines could get incorporated into the genome (integration events). This is a known concern as it leads to carcinogenesis. You specifically ask about adeno-associated virus vectors. Below I quote an excerpt from Ura et al. (2014) Developments in Viral Vector-Based Vaccines from part 3.3. Adeno-Associated Virus Vectors:

Generally, recombinant AAV vectors are generated by deletion of the Rep and Cap coding regions between the ITRs. These regions are used for endogenous transgene expression. Owing to the deletion of these regions, AAV vectors cannot integrate into the host genome, and their DNA also persists in an episomal form. This preferable feature in the AAV vectors boosts their safety profile, by preventing the onset of tumorigenesis.

In other words, the virus is recombined and its ability to perform genome integration is taken away prior to any consideration of its use as a vector for the delivery of genes or immunogenic particles, such as the ones you would see in CoV-2 vaccines.

The first clinical trial of a therapeutic retroviral vector took place in 1990. Subsequent clinical studies have raised serious concerns regarding genotoxicity, mainly due to possible viral genome integration. The AAV vector has the ability to express episomal genes without integrating itself into the host genome, and has hence been approved by the EMA for clinical use.

As you can see, this has been a top consideration for a couple of decades now. It is of course not an issue likely to be overlooked by regulatory bodies, researchers or vaccine-producing companies.


Intracellular localization of Crimean-Congo Hemorrhagic Fever (CCHF) virus glycoproteins

Crimean-Congo Hemorrhagic Fever virus (CCHFV), a member of the genus Nairovirus, family Bunyaviridae, is a tick-borne pathogen causing severe disease in humans. To better understand the CCHFV life cycle and explore potential intervention strategies, we studied the biosynthesis and intracellular targeting of the glycoproteins, which are encoded by the M genome segment.

Results

Following determination of the complete genome sequence of the CCHFV reference strain IbAr10200, we generated expression plasmids for the individual expression of the glycoproteins GN and GC, using CMV- and chicken β-actin-driven promoters. The cellular localization of recombinantly expressed CCHFV glycoproteins was compared to authentic glycoproteins expressed during virus infection using indirect immunofluorescence assays, subcellular fractionation/western blot assays and confocal microscopy. To further elucidate potential intracellular targeting/retention signals of the two glycoproteins, GFP-fusion proteins containing different parts of the CCHFV glycoprotein were analyzed for their intracellular targeting. The N-terminal glycoprotein GN localized to the Golgi complex, a process mediated by retention/targeting signal(s) in the cytoplasmic domain and ectodomain of this protein. In contrast, the C-terminal glycoprotein GC remained in the endoplasmic reticulum but could be rescued into the Golgi complex by co-expression of GN.

Conclusion

The data are consistent with the intracellular targeting of most bunyavirus glycoproteins and support the general model for assembly and budding of bunyavirus particles in the Golgi compartment.


Reverse Engineering Moderna’s SARS-CoV-2 Vaccine

In this article, we are going to have a closer look at the mRNA technology used for Moderna’s SARS-CoV-2 vaccine. We are going to decode the sequence disclosed by Moderna’s patent and go through it bit by bit. We are going to learn the basics of our genetic code and why an mRNA vaccine is effective and safe for human use.

Wait, I said we are going to decode a sequence. Why so? After all, the vaccine is a liquid that gets injected into your arm. Well, that’s a good question to start with. But to answer it we need to get through a little bit of the RNA basics.

DNA and RNA: The Basics

DNA is pretty much like a digital code. In computers, we store information in 0 and 1 (a bit) — or the presence or absence of a charge. That is the basic flow of how information is transferred in digital systems.

Nature stores its basic information in 4 different molecules, the ‘nucleotides’: A, C, G and U/T. These 4 nucleotides make up the chains of our DNA or RNA.

In computers, we group 8 bits (0 or 1) as one byte. The byte is the common unit in which information is processed. Historically, the byte was the number of bits used to encode a single character of text or a symbol in a computer.

Nature groups 3 nucleotides into a ‘codon’, which is the typical processing unit in nature. The codon is the number of nucleotides used to encode one of 20 different proteinogenic amino acids. These amino acids make up every single one of our proteins and enzymes.

Nature stores these codes in its DNA in the nucleus of every cell. If the information is needed, it is then ‘transcribed’ into a messenger RNA (mRNA). This is a somewhat short-lived copy of our DNA, a blueprint our protein machinery can read to synthesize the things that we are made of: proteins.
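To make the blueprint idea tangible, here is a minimal Python sketch of the two steps just described: copying a stretch of DNA into mRNA letters and chopping the result into codons. The function names and the tiny example gene are made up for illustration, and real transcription involves far more machinery than swapping one letter.

```python
def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T is written as U)."""
    return coding_dna.upper().replace("T", "U")


def split_codons(mrna: str) -> list[str]:
    """Split an mRNA into consecutive three-letter codons, dropping any remainder."""
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]


mrna = transcribe("ATGGCTTAA")   # hypothetical toy gene
print(mrna)                      # AUGGCUUAA
print(split_codons(mrna))        # ['AUG', 'GCU', 'UAA']
```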

How Do Vaccines Work?

To understand Moderna’s mRNA code we must first understand how vaccines work: Vaccines teach our immune system what a pathogen looks like so it can develop antibodies against it.

Pathogens are bacteria, viruses, fungi or parasites that can cause disease within our body. These pathogens are made of several subparts, called antigens, that are unique to each specific pathogen. Our immune system can learn to recognize these little subparts of the pathogens and produce antibodies in an immune response. These little antibodies will attach to the pathogen and help our immune cells to fight it.

The idea behind vaccines is to teach our immune system what the antigen looks like, without encountering the pathogen or getting ill. Historically, this has been done using inactivated or weakened pathogens, or parts of the pathogen (antigens) that trigger an immune response. Modern vaccines contain the blueprint (e.g. mRNA) to the antigen, rather than the antigen itself. Our cells are then ‘transfected’ with the blueprint of the antigen using a nanoparticle drug delivery (LNP) system. This way our cells can temporarily produce the antigen themselves and prompt our immune system to respond and produce antibodies against it.

BioNTech’s BNT162b2 and Moderna’s mRNA-1273 vaccine are two of these newer ‘blueprint’ vaccines. Both of the vaccines contain a volatile genetic blueprint — an mRNA — to encode the well-known SARS-CoV-2 ‘spike’ protein. And this SARS-CoV-2 spike protein is the antigen to stimulate the immune response.

However, our cells are very unenthusiastic about foreign genetic blueprints thrown at them, so a lot of modification and technology was needed to make the blueprint work. So let’s have a closer look at Moderna’s mRNA patent.

The Code: Moderna’s mRNA-1273 SARS-CoV-2 Vaccine

Unlike BioNTech, Moderna has not published the exact code of its mRNA vaccine. But Moderna holds a patent for its vaccine, which is published and contains the most information about the technology. Hence, it is possible to reverse engineer the structure of the vaccine with good accuracy.

A closer look at the patent US 10,702,600 shows us the following:

They use a highly modified mRNA encoding the full-length SARS-CoV-2 spike protein together with some other functional elements, which we will have a closer look at later. If we draw a scheme with all the elements covered by the patent, we get this:

This is our code on a high level. But what does it all mean?

The Cap: Marks the RNA as What it is

We’ll start with the cap which is depicted as a little hat. The cap has the following composition:

In some embodiments, a 5’ terminal cap is 7mG(5’)ppp(5’)N1mpNp.

7mG(5′)ppp(5′)N1mpNp translates into the following nucleotide code: GNN, where G is altered (7-methylated), the first N is altered (1-methylated) and the second N is just any possible nucleotide.

This standard three-nucleotide sequence is found in virtually all eukaryotes (e.g. animals, plants, fungi) and most viruses and is an evolutionarily conserved modification of eukaryotic mRNA. The mRNA cap has multiple functions: It is essential for the initiation of translation, that is, the start of protein synthesis from the blueprint. Further, it makes the mRNA look legitimate to our cells and protects it from degradation. It marks the mRNA as coming from the cell’s nucleus; however, that’s not the case for our vaccine.

The cap can be compared to the flag on a ship: identifying where it comes from and what it is. The same as pirates use other flags, viruses use our natural flag to deceive and disguise.

The “five-prime untranslated region” (5’-UTR)

The 5’-UTR region is just another standard sequence found in every mRNA. The 5’ marks the reading direction of the sequence. Just as we read from left to right, mRNA is read from 5’ to 3’.

In some embodiments, an in vitro transcription template encodes a 5’ untranslated (UTR) region, contains an open reading frame, and encodes a 3’ UTR and a polyA tail.

Untranslated region means that this part of the code is not translated into any part of the protein. However, its main function is to provide a sequence that our protein factory (the ribosome) can grab onto and hold firm: the (ribosomal) binding site. Further, it contains the start sequence for the initiation of protein synthesis: the Kozak sequence.

The Signal Peptide

We have to go through one last layer of informational sequence before we continue with the antigen itself.

Once the ribosome has produced the protein, the protein still needs to go somewhere. And this is the function of the signal peptide. The signal peptide is attached to the protein and contains information about further processing (e.g. cutting, folding, cell exiting) in the cell’s post-production department: the endoplasmic reticulum.

In the case of the virus, the ‘S glycoprotein signal peptide’ sends the protein immediately into the endoplasmic reticulum for further processing and afterwards to the cell membrane for the virus assembly.

BioNTech’s vaccine uses the natural virus’ signal peptide for the protein. In contrast, Moderna’s vaccine uses a different signal peptide that performs the same functions. It uses one of the following:

HuIgGk IgE signal peptide, heavy chain epsilon-1 signal peptide, Japanese encephalitis PRM signal sequence, VSVg protein signal sequence or Japanese encephalitis JEV signal sequence.

These signal peptides are either derived from our cells (IgE & IgG) or different viruses. The utilized signal peptides are shorter and display a highly efficient assembly and secretion of the produced viral particles. During the process, the signal peptide is cleaved off the protein by enzymes.

The Actual (Modified) Spike Protein

Up to this point we just encountered instructive sequences for the identification, production, processing and secretion of the protein. Now let’s go to the actual antigen: the SARS-CoV-2 Spike protein. The spike protein sits in the lipid membrane (basically a fat droplet) of the virus and binds to the host cell receptor to induce fusion of the membranes.

So here Moderna did publish the exact sequence of the SARS-CoV2 Spike protein. (It can be verified in the UniProt database using the identifier P59594)

But the illustrated sequence you see is not the mRNA Sequence (remember: A, C, G and U), but the resulting sequence of amino acids.

So, if the mRNA is our blueprint, this sequence is the finished product and not the underlying code any more. The reason why Moderna chose to publish the amino acid sequence instead of the mRNA sequence is that they don’t want to disclose all the details of their code.

But just using a sequence that is already published wouldn’t make a great innovation. So Moderna made at least three state-of-the-art changes to their sequence that can be found in the patent:

1. Modification: Disguise the mRNA

The human body has established a pretty powerful anti-virus system. Our cells are extremely unhappy about foreign RNA and try their best to get rid of it before it does anything.

This is the main problem with using an mRNA as a vaccine: it needs to sneak past our immune system. Overcoming this challenge is one of the selling points of many RNA technology companies.

So how does Moderna sneak its mRNA past the antivirus system? The answer is found in the patent and is an exceptionally clever bit:

In some embodiments, 100% of the uracil in the open reading frame have a chemical modification. In some embodiments, a chemical modification is in the 5-position of the uracil. In some embodiments, a chemical modification is a N1-methyl pseudouridine.

Every Uracil (‘U’) was replaced by a 1-methyl-3’-pseudouridine. It is slightly chemically altered to decrease its susceptibility to degradation by our enzymes. It cannot be attacked by degrading enzymes (nucleases) as the alteration has made it too slippery for them. However, our protein machinery (ribosome) is not that picky when it comes to protein synthesis and is still able to use it:

The use of 1-methyl-3’-pseudouridine is currently the most promising nucleotide substitution that substantially outperforms all other modified nucleotides studied.

2. Modification: Maintain the Proteins Natural Structure

The next modification is also a particularly smart one. In the patent, we find claims that the amino acid sequence used can differ by as much as 20% from the natural sequence.

In some embodiments, the amino acid sequence of the SARS-CoV antigenic polypeptide is, or is a fragment, or is a homolog or variant having at least 80% (e.g. 85%, 90%, 95%, 98%, 99%) identity to, the amino acid sequence identified by any one of SEQ ID NO 29, 32 or 34.

Unfortunately, how much it does vary is not disclosed by the patent, partly because they do not want to disclose the details of their technology and partly to extend the scope of the patent.
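To get a feeling for what “at least 80% identity” means, here is a small Python sketch that compares two already aligned sequences position by position. This is a simplification on my part: the function name and the 10-letter toy fragments are hypothetical, and identity figures in patents are usually computed with proper alignment tools rather than a plain position-wise comparison.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions with identical residues in two aligned sequences."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


natural = "ACDEFGHIKV"   # hypothetical 10-residue fragment
variant = "ACDEFGHIPP"   # same fragment with the last two residues replaced by proline
print(percent_identity(natural, variant))   # 80.0
```

In a full-length protein, swapping two residues would leave the identity far above that 80% floor, which is why the claim covers much more than any one specific sequence.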

But it’s likely they used a state-of-the-art alteration in the protein sequence, the same as or similar to the one used by BioNTech. If we have a look at their sequence we find that two amino acids, a lysine and a valine, have been replaced by two prolines.

It turns out that these two changes are essential to maintain vaccine efficiency. Why so? If we look at an electron microscope image we see that the Spike protein is usually incorporated in the shell of the virus together with some other proteins.

In the case of the vaccine, all these crucial parts of the natural virus are missing. This leads to a different conformation or shape of the spike protein in our cells. This altered shape leads to a huge loss of vaccine efficiency. Our body would develop an immune response against a wrongly shaped protein and hence would not be able to identify the viral protein when encountered.

To prevent this, Moderna did alter two amino acids to maintain the natural structure of the protein.

3. Optimizing the Protein Synthesis

The last change from the original nucleotide sequence Moderna made is called ‘Codon Optimization’. They optimized the sequence to achieve a much higher production rate of the spike protein. Remember the goal of the vaccine is to get the cell to produce high amounts of the SARS-CoV-2 spike protein, to trigger a high immune response with minimal mRNA required.

So let’s quickly get back to the basics: nature groups 3 nucleotides into 1 codon that translates into one of 20 different proteinogenic amino acids. But wait, with 4 possible nucleotides (A, C, G and U/T) and a length of 3, we have 4³=64 possible combinations, but only 20 amino acids to encode. That means that multiple codons can encode the same amino acid. This is why the genetic code is often described as degenerate, or redundant: a single amino acid may be coded for by more than one codon.

For this reason, it is possible to change the codon (triplet) but still encode for the same amino acid.
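The counting argument above can be checked in a few lines of Python. The sketch below builds all 64 codons and groups them by the amino acid they encode, using the standard genetic code written in its usual compact UCAG order. The variable names are my own.

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group synonymous codons under the amino acid they encode.
synonyms = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    synonyms[aa].append(codon)

print(len(CODON_TABLE))    # 64 codons in total
print(len(synonyms) - 1)   # 20 amino acids (the stop signal is not counted)
print(synonyms["L"])       # leucine alone is encoded by six codons
```

Methionine (AUG) and tryptophan (UGG) are the only amino acids with a single codon, so almost every position in a protein leaves some freedom in how it is spelled.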

But why would someone change the code, if the result, the translated protein, would still be the same?

It turns out that changes in the RNA characters can make our machinery (enzymes) translate the code faster and more accurately. One of the reasons for this is that our machinery uses a pool of “transfer blocks” (tRNA pool), that transfer the right amino acid based on the matching codon. However, some of these transfer blocks are more abundant and more frequently used, thus accelerating the translation.

But there are many other reasons, e.g. RNA with higher amounts of ‘G’ and ‘C’ is converted more efficiently into proteins.

But why does nature not always use the most efficient and accurate code?

There are many reasons why nature uses inefficient sequence patterns. One way our cells regulate how many proteins are synthesized is by simply altering the speed of the production. Another reason is that protein synthesis fits in a complex network of other processes, some of them being slower or needing more time (e.g. protein folding). As many of the processes are sequential, it can be beneficial to slow things down to not waste resources.

However, this is not the case for the Moderna vaccine. The vaccine needs to display a high efficiency to produce enough antigen to trigger an immune response. Moderna did the codon optimization by using algorithms and services from GeneArt (Life Technologies) and DNA2.0 (Menlo Park Calif).
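How Moderna’s providers actually optimize codons is not disclosed, so the following Python sketch is only a toy version of the idea. It assumes one single rule, taken from the G/C remark above: for every amino acid, pick the synonymous codon with the most G and C. Real tools weigh many more factors (tRNA abundance, RNA structure, repeats), and all names here are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def gc_count(seq: str) -> int:
    return seq.count("G") + seq.count("C")


# For each amino acid, keep the synonymous codon with the most G/C
# (ties go to the alphabetically first codon, so the result is deterministic).
PREFERRED = {}
for codon, aa in sorted(CODON_TABLE.items()):
    if aa not in PREFERRED or gc_count(codon) > gc_count(PREFERRED[aa]):
        PREFERRED[aa] = codon


def translate(mrna: str) -> str:
    """Translate an mRNA codon by codon ('*' stands for stop)."""
    return "".join(CODON_TABLE[mrna[i:i + 3]] for i in range(0, len(mrna) - 2, 3))


def optimize(mrna: str) -> str:
    """Swap every codon for its GC-richest synonym; the protein is unchanged."""
    return "".join(PREFERRED[aa] for aa in translate(mrna))


original = "AUGUUAAAAUAA"   # hypothetical toy ORF: Met-Leu-Lys-stop
optimized = optimize(original)
print(optimized, translate(original) == translate(optimized))
```

The point of the example is the invariant, not the rule: the nucleotide text changes while the translated protein stays exactly the same.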

The “three-prime untranslated region” (3’-UTR)

Much like our ribosome machinery needed a starting point to lead into the sequence (5’-UTR), we find a similar sequence at the end of the RNA: the 3’-untranslated region. Just like the 5’-UTR, this region is not translated into any part of the protein but does contain many elements that play a crucial role in gene expression by “influencing the localization, stability, export, and translation efficiency of an mRNA.” Much could be said about the variety of mechanisms by which the 3’-UTR works, but probably the most prominent one is a sequence that can be targeted by a small interfering RNA (siRNA). These little siRNA molecules contain a complementary sequence and can thus interfere with the mRNA and block it; we call this ‘silencing’.

However, certain 3’-UTR sequences are very successful at enhancing protein expression and increasing RNA stability. These sequences are a nice add-on to any RNA therapeutic, and as their use is described in the patent, it is likely Moderna chose some of them.

The end of it: AAAAAAAA

The end of every human mRNA is polyadenylated. That means that the sequence ends in a long run of As, or in other words it has a poly(A) tail. The poly(A) tail is important for nuclear export, translation and stability of the mRNA. It is another tag on the mRNA that marks it as what it is and helps it to get out of the nucleus of the cell. But probably its most important function is to give the RNA an expiry date: the shorter the tail, the more likely the mRNA is to expire. The poly(A) tail is shortened over time, and, when it is short enough, the mRNA has expired and is enzymatically degraded.

With this, we know how Moderna’s mRNA vaccine was engineered to display an efficient production of the SARS-CoV-2 spike protein. And for the most part, we understand why these elements have been used. If we add the function of each element of the mRNA to our first scheme of the mRNA, it looks like this:
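The expiry-date picture can be written as a toy rule in Python: measure the run of As at the 3’ end and call the mRNA expired once that run drops below some threshold. The threshold of 10 used here is an arbitrary placeholder of mine, not a biological constant, and the function names are hypothetical.

```python
def poly_a_tail_length(mrna: str) -> int:
    """Count the uninterrupted run of A nucleotides at the 3' end."""
    return len(mrna) - len(mrna.rstrip("A"))


def is_expired(mrna: str, min_tail: int = 10) -> bool:
    """Toy rule: the mRNA counts as expired once its tail is shorter than min_tail.

    min_tail is an arbitrary illustrative threshold, not a measured value.
    """
    return poly_a_tail_length(mrna) < min_tail


fresh = "GGCUGC" + "A" * 25   # hypothetical message with a long tail
worn = "GGCUGC" + "A" * 5     # the same message after its tail was shortened
print(poly_a_tail_length(fresh), is_expired(fresh))
print(poly_a_tail_length(worn), is_expired(worn))
```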

The mRNA vaccine displays the incredible knowledge researchers have gathered over the past years and a product that is considered safe and effective for human use.

But wait, one commonly raised concern hasn’t been addressed so far: Can an mRNA vaccine interfere with or alter our genetic information, our DNA? This is a far-fetched argument and there are many reasons why it cannot:

Even if RNA and DNA are chemically similar, RNA is still too different to be integrated into our genetic pool: single-stranded vs double-stranded and uracil vs thymine, to point out just a few differences. In over 60 years of RNA research, it has not been observed that mRNA could integrate into our DNA.

Furthermore, RNA cannot easily be converted to DNA, and there are just a few exceptions to that central dogma of molecular biology. One is the human immunodeficiency virus (HIV), which can reverse transcribe its RNA to DNA. But it does not do so randomly for every mRNA. Specific elements and aiding molecules (primers) need to be present for this to work.

But even if RNA could do this, it would not be relevant to us at all. Our genetic information sits safely in our nucleus. Everything that the virus does and everything the vaccine does happens outside the nucleus. As we remember, the mRNA is tagged multiple times for transport out of the nucleus and does not contain any elements that could provide its entrance into the closely sealed nucleus.

But even if RNA could do all that and also overcome the last barrier of our cells and get into our precious nucleus, it would still not be relevant to us. In the end, it contains only natural RNA elements with a few viral RNA elements, the same ones our cells encounter with every common cold or contain naturally. If you fear getting RNA into your cells, well, it’s too late for that: at this very moment hundreds of thousands of common viral RNAs enter and exit our cells, not harming our DNA at all. The mRNA vaccine is no exception to that.

Those who claim that vaccination could change our genes must also claim that infection with SARS-CoV-2 itself or any other virus could change our genes. We are not turning into genetic mutants if we just get a common cold.

If you want to hear more about this amazing technology, just let me know. I am going to write a second part about Moderna’s mRNA vaccine that covers all the additional adjuvants and excipients to get the RNA into our cells.

Lastly, I want to acknowledge the author Bert Hubert, who recently wrote a similar article about the BioNTech mRNA vaccine, which provided me with the basis for this article. I highly recommend reading it.


Results

LINE-1 constitutes a major category of m6A-methylated RNAs in human cells

We developed a new method to examine m6A landscape on nascent RNAs, which we refer to as m6A inscribed Nascent Transcript Sequencing (MINT-Seq) (Supplementary information, Fig. S1a; Materials and Methods). This method was based on 4-thiouridine (4SU) metabolic labeling of nascent RNAs, followed by tandem purification with streptavidin beads and an m6A antibody, and we added dual spike-in controls to verify the purification efficiency and sensitivity. A fraction of biotin-purified 4SU-marked nascent RNA was used for Transient Transcriptome sequencing (TT-Seq),[50] which served as the input for MINT-Seq (Supplementary information, Fig. S1a–c). Analysis of paired MINT-Seq and TT-Seq in K562 cells uncovered 59,706 m6A peaks on nascent RNAs transcribed in less than 5 min, as compared to 19,306 peaks found on steady-state RNAs (i.e., by conventional MeRIP-Seq with total RNA-seq as inputs) (Supplementary information, Fig. S1d and Table S1). A remarkable number of m6A peaks were only found in nascent RNAs (> 40k, Supplementary information, Fig. S1d, e), which are largely uncharacterized m6A sites that cannot be robustly detected by regular MeRIP studies (Supplementary information, Fig. S1h). A similar pattern was found in HeLa cells (Supplementary information, Fig. S1f). These nascent RNA m6A peaks are discovered in part due to our robust enrichment of nascent RNAs, as revealed by an extremely high intron/exon ratio in the TT-Seq (Supplementary information, Fig. S1g). Strikingly, a very high (~30%) percentage of nascent RNA m6A peaks overlap with annotated retrotransposons including non-LTR (e.g., LINEs) and LTR retrotransposons (e.g., ERVs), which is significantly higher than expected (Fig. 1a, left vs right). Among these, LINEs showed the highest numbers of MINT-Seq peaks (22.4% of all peaks), representing strong enrichment of m6A peaks (~4-fold higher than expected, Supplementary information, Fig. S2a). Consistently, L1 RNAs contain the highest levels of m6A among RTEs by calculating the FPKM ratios between MINT-Seq and TT-Seq (Supplementary information, Fig. S2b). LTR retrotransposons such as ERVs showed moderate levels of m6A but SINEs showed no m6A peak enrichment and overall low methylation level (Supplementary information, Fig. S2a, b), as exemplified by Alu (a major type of primate-specific SINEs), consistent with its overall low A/T constituents.[13] This strong enrichment of m6A on L1s (Fig. 1a; Supplementary information, Fig. S2a, b) suggests its yet unappreciated role in L1 expression control or mobilization.

a Pie charts showing the genomic distribution of m6A peaks on non-LTR (LINE, SINE) and LTR retrotransposon elements based on K562 MINT-Seq. Left, Genomic distribution of MINT-Seq m6A peaks. Right, Expected distribution of MINT-Seq m6A peaks. These expected percentages were calculated based on a null hypothesis that any transcribed regions in the genome have equal chances to contain m6A peaks. Thus, from the TT-Seq reads mapped to LINE, SINE, and LTR elements in the reference genome (hg19), we can deduce the peaks to be expected from these regions. b A snapshot of genome browser tracks of TT-Seq, MINT-Seq, H3K36me3 ChIP-Seq data in K562 cells, together with the LINE and gene annotations in genome hg19 (below the tracks). RefSeq RNA gene LINC00534 is shown that it contains many strong intronic m6A peaks perfectly overlapping L1s (arrows). (+) and (−) in the data tracks indicate Watson and Crick strands. c A bar plot showing numbers of intronic L1s that are sense- (blue) or antisense- (green) oriented to the hosting genes. The “Expected” denote numbers calculated using all intronic L1s, while “Observed” using intronic L1s overlapping m6A MINT-Seq peaks. P-value was calculated with Fisher’s exact test. d A density plot showing the percentage of L1 distribution based on the length of all hg19 annotated L1s (gray) or of the m6A-marked L1s (red). e A plot showing relative m6A levels (MINT-Seq/TT-Seq) across all MILs (intronic L1s that overlap MINT-Seq peaks). A subset of MILs harboring exceptionally high levels of m6A was identified as Super-MILs (n = 393), achieved by using the slope of the distribution curve (blue line and green point indicate the boundary between Super-MILs and Typical MILs). f–h Boxplots showing features of Super-MILs, Typical MILs and the Control L1s (transcribed intronic L1s without m6A peaks), in terms of sequence divergence as compared to L1 consensus (f), length (g) and m6A motif (RRACH) density (h). P-values were calculated with Mann–Whitney U tests. i, j Boxplots of the same three groups of L1s as in the previous panels, showing their transcript levels (i), and relative RNA stability (calculated by taking the ratio between RNA-Seq and TT-Seq FPKM, panel j). P-values were calculated with Mann–Whitney U tests.
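The null model described for panel a (peaks expected in proportion to transcribed signal) can be sketched as follows. This is an illustrative reading of the caption rather than the authors' code: the function names are hypothetical and the numbers in the usage lines are placeholders, not values from the study.

```python
def expected_peak_fractions(reads_by_class: dict[str, float]) -> dict[str, float]:
    """Null model: peaks fall on each class in proportion to its TT-Seq reads."""
    total = sum(reads_by_class.values())
    return {name: reads / total for name, reads in reads_by_class.items()}


def fold_enrichment(observed: dict[str, float], expected: dict[str, float]) -> dict[str, float]:
    """Observed peak fraction divided by the fraction expected under the null."""
    return {name: observed[name] / expected[name] for name in observed}


# Placeholder inputs for illustration only (not data from the paper).
tt_seq_reads = {"LINE": 5.0, "SINE": 20.0, "other": 75.0}
observed_fractions = {"LINE": 0.20, "SINE": 0.10, "other": 0.70}

expected = expected_peak_fractions(tt_seq_reads)
for name, value in fold_enrichment(observed_fractions, expected).items():
    print(name, round(value, 2))
```

A fold value above 1 for a class means it carries more peaks than its share of transcription would predict, which is the sense in which LINEs are reported as enriched.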

Signals of m6A are particularly strong on L1s located in gene introns (Fig. 1b; Supplementary information, Fig. S2c). For example, for the LINC00534 RNA gene, while TT-Seq displays a broad and “flat” pattern across the entire transcription unit, MINT-Seq signals enrich to several “islands”, which perfectly overlap annotated L1s (Fig. 1b). Intronic L1 sequences distribute either in the same or reverse direction as the hosting genes (i.e., sense vs antisense), at a ratio of approximately 1:2 (Fig. 1c). Interestingly, m6A peaks were strongly biased to mark intronic L1s sense-oriented to host genes, suggesting the deposition of m6A is likely guided by L1 RNA sequences rather than L1 DNA sequences or associated chromatin status (Fig. 1c, see below and Discussion). As m6A is a mark on RNAs, we use “m6A-marked L1s” to denote the RNA transcripts; where applicable, we use “L1 regions” to denote the genomic sequences.

We compared the length of m6A-marked L1s to all annotated L1s in the human genome and found that m6A-methylated L1s are generally longer and are enriched for full-length L1s (Fig. 1d). Based on m6A peaks in MINT-Seq (FDR < 0.01 by MACS2) and signals of transcription (TT-Seq, FPKM > 0.1), we identified 4315 m6A-methylated intronic L1s (MILs) in K562 cells (Materials and Methods; Supplementary information, Tables S2, S3). Among these, a subset of MILs harbors exceptionally high levels of m6A, reminiscent of the exceptionally high level of histone acetylation H3K27ac at specific enhancer regions that gave rise to the concept of super- or stretch-enhancers 51,52 (e.g., arrows in Fig. 1b). We therefore used an analogous computational strategy to rank MILs based on m6A levels, which permitted the identification of a subset of MILs with exceptionally high m6A levels that we referred to as Super-MILs (Fig. 1e; Materials and Methods). Compared to other transcribed intronic L1s without the m6A mark (i.e., Control L1s), Super-MILs and MILs show lower sequence divergence from the L1 consensus sequence and are longer, suggesting that they are evolutionarily younger 53 (Fig. 1f, g). Super-MILs are the least divergent (i.e., youngest), while their length is overall similar to that of MILs. We performed de novo RNA motif analyses of m6A peaks on MILs, and found that the top motif was “AAAGAC”, resembling the well-known m6A motif “RRACH” (where R = A/G, and H = A/C/U) 54,55 (Supplementary information, Fig. S2d). Indeed, the L1 m6A level was positively correlated with RRACH motif density, which was particularly high on Super-MILs, moderately high on typical MILs and low on other transcribed L1s without m6A (Fig. 1h). In the human cell types we studied, there are often ~200–400 Super-MILs (Fig. 1e; Supplementary information, Fig. S2f). Interestingly, the landscapes of both MILs and Super-MILs showed quite strong degrees of cell type specificity (Supplementary information, Fig. S2e, f), which is not just the consequence of cell type-specific transcription considering the fact that the majority of these MILs or Super-MILs are transcribed in other cell types (Supplementary information, Fig. S2g), suggesting that levels of m6A deposition are not solely dependent on L1 RNA sequences. Overall, these results suggest that a group of evolutionarily young L1s are deposited with a high level of m6A on their transcripts in a very early stage of nascent RNA production.
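As an illustration of the RRACH motif density discussed above, the following is a minimal Python sketch, not the pipeline used in this study. It assumes the motif definition given in the text (R = A/G, H = A/C/U), accepts T in place of U so that DNA-style sequences can be scanned, and reports counts per 100 nucleotides. The function names and example sequences are hypothetical.

```python
import re

# RRACH: R = A/G, H = A/C/U. T is accepted as an alias for U (assumption),
# so DNA-style consensus sequences can be scanned directly.
# The lookahead lets adjacent motifs that share one base both be counted.
RRACH = re.compile(r"(?=([AG][AG]AC[ACUT]))")


def rrach_count(seq: str) -> int:
    """Count RRACH occurrences in a sequence (case-insensitive)."""
    return sum(1 for _ in RRACH.finditer(seq.upper()))


def rrach_density(seq: str, per: int = 100) -> float:
    """Return the number of RRACH motifs per `per` nucleotides."""
    if not seq:
        return 0.0
    return rrach_count(seq) * per / len(seq)


if __name__ == "__main__":
    toy = "GGACU" * 2 + "C" * 90  # hypothetical 100-nt sequence
    print(rrach_count(toy), rrach_density(toy))
```

Comparing such densities between groups of L1s (for example Super-MILs, typical MILs and Control L1s) would mirror the kind of comparison described for Fig. 1h, although the exact counting rules used there are not stated in this section.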

While many MILs can only be detected at the nascent RNA stage, i.e., solely by MINT-Seq (arrows in Supplementary information, Fig. S2c), Super-MILs are often readily detectable in the steady-state RNA methylome by MeRIP-Seq (yellow highlights in Supplementary information, Fig. S2c). By analyzing total RNA-Seq, we found that the RNA abundance of Super-MILs was much higher than that of MILs and other intronic L1s (Fig. 1i). These results suggest that m6A levels positively correlate with L1 RNA stability. Indeed, by inferring RNA stability via calculating the signal ratio between steady-state transcripts (RNA-Seq) and nascent transcripts (TT-Seq), Super-MILs were found to be more stable than typical MILs or other L1s (Fig. 1j). The high detectability of most Super-MILs and some MILs also allowed us to use published MeRIP-Seq data to analyze the L1 RNA m6A methylome (see below).
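The stability proxy described above (steady-state signal divided by nascent signal) can be sketched as follows. This is only an illustration: the pseudocount, the function names and the example FPKM values are assumptions and are not taken from the study.

```python
import math


def stability_ratio(rnaseq_fpkm: float, ttseq_fpkm: float,
                    pseudocount: float = 0.01) -> float:
    """Steady-state (RNA-Seq) over nascent (TT-Seq) signal.

    The pseudocount (an assumption) avoids division by zero for
    elements with no detectable nascent signal.
    """
    return (rnaseq_fpkm + pseudocount) / (ttseq_fpkm + pseudocount)


def log2_stability(rnaseq_fpkm: float, ttseq_fpkm: float,
                   pseudocount: float = 0.01) -> float:
    """Log2 of the stability ratio, convenient for comparing groups."""
    return math.log2(stability_ratio(rnaseq_fpkm, ttseq_fpkm, pseudocount))


if __name__ == "__main__":
    # Hypothetical values: same nascent output, different steady-state levels.
    print(log2_stability(8.0, 2.0))  # more stable element
    print(log2_stability(1.0, 2.0))  # less stable element
```

A higher ratio means more RNA persists at steady state per unit of nascent transcription, which is the sense in which Super-MILs are described as more stable.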

RNA m6A modification is an evolutionary feature of young L1 transcripts

We examined the evolutionary trajectory of different L1 sub-families in humans and observed a strong correlation between m6A levels and L1 evolutionary ages (r = −0.958, P < 1.45e−09, Fig. 2a), with the youngest L1 sub-families 56 such as L1HS (a.k.a., L1PA1), L1PA2 and L1PA3 being the most methylated (Fig. 2a). We reached this conclusion either by using uniquely mapped reads for analyses, or by using the expectation-maximization (EM) algorithm of the TEtranscripts pipeline to include non-uniquely mapped reads 2,57 (Supplementary information, Fig. S3a; Materials and Methods). Interestingly, the densities of the “RRACH” motif in the consensus sequences of different L1 sub-families are also correlated with the evolutionary ages of L1s: the younger families have higher densities (Fig. 2a, right side heatmap). Looking into MILs in other species, we analyzed MeRIP-Seq of nuclear RNAs from mouse embryonic stem cells (mESCs) 58 and identified 2033 mouse MILs (Supplementary information, Fig. S3b). Consistent with human MILs, mouse MILs are also longer than average and carry less divergent sequences from consensus (Supplementary information, Fig. S3c, d). The correlation between m6A levels and L1 evolutionary ages is overall conserved in mice 59 (Fig. 2b). The youngest and retrotranspositionally active sub-families, L1Md_T, L1Md_A and L1Md_Gf, are highly methylated, and are also of higher RNA abundance (Fig. 2b). Together, these results support the conclusion that high RNA m6A is a conserved feature of evolutionarily younger L1s observed across species.
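The rank correlation between relative m6A level and sub-family age can be illustrated with a small, dependency-free Spearman implementation. This is a sketch rather than the analysis code of the study: the sub-family ages and m6A ratios below are invented for illustration, and the relative m6A level is assumed to be a simple ratio of two FPKM values, as described for Fig. 2a.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical sub-families: age (Myr) and relative m6A level.
    ages = [3.0, 7.5, 12.0, 18.0, 25.0]
    m6a_ratio = [2.1, 1.9, 1.2, 1.3, 0.6]
    print(round(spearman(ages, m6a_ratio), 3))
```

A strongly negative coefficient, as reported in Fig. 2a, indicates that younger sub-families tend to carry higher relative m6A levels.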

a A ranked bar plot shows relative levels of m6A (ratios between MINT-Seq FPKM and TT-Seq FPKM) on different human L1 sub-families, and their estimated evolutionary ages (dots connected by the yellow line). The heatmap on the right shows RRACH motif densities on L1 consensus sequences (from Dfam, https://dfam.org/) of each sub-family (numbers indicate motif counts per 100 nucleotides). Myr, millions of years. The r value (correlation coefficient) and P-value indicate the Spearman’s rank correlation between the m6A methylation levels and the estimated ages of L1 sub-families. b A ranked bar plot generated in the same way as in panel a but for mouse L1s, using relative m6A levels calculated from published data (nuclear RNA MeRIP-Seq FPKM/RNA-Seq FPKM, Wen et al. 58 ). The r value and P-value indicate the Spearman’s rank correlation between the m6A methylation levels and the estimated ages of L1 sub-families. c A diagram showing the features of L1s, including retrotransposition-competent L1 (RC-L1, green) and other L1s that are no longer capable of transposing (dead L1s, yellow), characterized by truncations (wedged edges) and mutations (red stars). In the lower part, a diagram of the ~6 kb full-length L1 sequence with known features is shown. d A density plot showing the “relative” first nucleotide position of the MILs’ 5’ ends after aligning them to the consensus sequences of L1s. e A snapshot of genome browser tracks of MCF7 H3K4me3 ChIP-Seq, MINT-Seq (±), MeRIP-Seq (±), and mappability score (from ENCODE) for the Chr22-q12.1 L1HS-Ta (in the intron of TTC28 gene in an antisense direction). Blue highlight indicates the L1HS-Ta region and yellow the TTC28 gene TSS. f A diagram showing the questions raised by our findings, with some of them pursued in the following part of this paper. Red text indicates some important unknowns.

We queried features of the hosting genes of these evolutionarily young L1s. MILs show no obvious preference in terms of locations in the host genes (e.g., towards 5′ or 3′ ends; Supplementary information, Fig. S3e). Functional enrichment analysis of the hosting genes in K562 cells identified “regulation of double-strand break repair” (Supplementary information, Fig. S3f) as the most enriched term. Similar functional terms were also identified for MIL-hosting genes in HeLa, MCF7, and mouse ESCs (Supplementary information, Fig. S3f).

m6A deposits to both autonomous RC-L1s and co-transcribed dead L1s

In most somatic tissues, live L1s are epigenetically silenced via DNA 5-cytosine methylation (5mC) and/or histone H3 methylation (H3K9me3). 14,15,16,60,61 We asked which epigenetic features are present on the genomic regions of MILs. Using published whole genome bisulfite sequencing (WGBS) data in K562, 62 we found little enrichment of DNA 5mC on the genomic regions coding for MILs (Supplementary information, Fig. S4a). It was reported that some intronic L1s are repressed by the HUSH complex, which facilitates H3K9me3 deposition and transcriptional suppression. 63,64 Analysis of ChIP-Seq data found mild enrichments of H3K9me3 or HUSH components (i.e., MORC2, MPP8, and TASOR) on the genomic regions of MILs (Supplementary information, Fig. S4a). Only a small fraction of MILs overlapped with H3K9me3 peaks (Supplementary information, Fig. S4b), and an even smaller number overlapped with HUSH complex binding (Supplementary information, Fig. S4c). Furthermore, H3K9me3 was deposited more often to intronic L1s that are antisense to host genes (Supplementary information, Fig. S4d), while MORC2 marked both sense and antisense L1s similarly (Supplementary information, Fig. S4e), indicating a lack of directionality preference. By contrast, m6A strongly prefers to be deposited to sense L1s (Fig. 1c). The deposition of m6A in gene regions was reported to be mediated by the elongation-associated histone modification H3K36me3. 65 By inspecting the browser tracks, we did not find strong overlaps between the m6A signals on L1s and the H3K36me3 peaks (Fig. 1b), which is generally applicable to all MILs (Supplementary information, Fig. S4a).

Retrotranspositionally competent L1s (RC-L1s) use their autonomous promoters near the 5’ end to drive transcription of a ~6kb-long intronless RNA. 66,67 Characteristic promoter-associated histone marks H3K4me3 and H3K27ac were deposited to RC-L1 promoters. 67,68 Analysis of published ChIP-Seq identified no enrichment of these marks at the 5’ ends of genomic regions coding for MILs, indicating that most MIL RNAs are not independently transcribed (Supplementary information, Fig. S4a). Indeed, most MILs were truncated and mutated as compared to consensus sequences (Fig. 2c, d), and have lost their promoters or 5’UTR (3390 out of 4315 have lost their 0–1 kb regions, Fig. 2d). Analyses of ATAC-Seq, GRO-CAP, other ChIP-Seq of RNA polymerase II (RNAPII) or transcription factors/coactivators reported to bind L1 promoters (e.g., YY1, MYC and EP300) 67,69,70 showed no enrichment on the 5’ ends of genomic regions coding for MILs, which can be exemplified by two prominent Super-MILs (both are > 6 kb and are located in the introns of PSMA1 and ZRANB3 genes, respectively), whereas the signals are high on annotated human gene promoters/TSSs (Supplementary information, Fig. S5a, b). We experimentally used the CRISPR interference (CRISPRi) system (dCas9-KRAB together with negative control or specific gRNAs) to suppress transcription from the PSMA1 promoter 71 and found concomitant reduction of both the PSMA1 mRNA and the Super-MIL residing in its intron (Supplementary information, Fig. S5c, d). These results together indicate that the majority of MILs are not transcribed via autonomous promoters; instead, they are co-transcribed with hosting genes.

Some intronic L1s were reported to be mis-spliced into hosting mRNAs. 29 To test the commonality of this behavior for MILs, we used a de novo transcript assembly method, StringTie, to identify transcripts from RNA-Seq data, 72 and examined the frequency of MILs being spliced into mRNA transcripts (Supplementary information, Fig. S5e). As expected, most de novo transcripts overlapping annotated GENCODE genes showed multiple exons (Supplementary information, Fig. S5e, f). By contrast, when de novo called RNA transcripts overlap MILs, they are primarily single exonic, and the majority of MIL-containing de novo transcripts (346 out of 400) do not contain any GENCODE protein-coding exons, indicating that MILs are rarely spliced into host gene mRNAs (Supplementary information, Fig. S5e, g). This result can be exemplified by the raw RNA-Seq data aligned to the ZRANB3 Super-MIL region: while exons flanking the Super-MIL are generally spliced together, Super-MIL reads are not spliced to exons (Supplementary information, Fig. S5h).

We also examined whether high m6A methylation applies to RC-L1s, which in humans belong to L1HS, mostly the L1HS-Ta subset (Ta: transcribed subset a). 21,73 While the extremely repetitive nature of L1HS precludes their full alignment by short-read sequencing, there are a few that can be detected based on uniquely mappable regions in the L1 body and immediate downstream sequences. 68 Breast cancer cell line MCF7 harbors one such RC-L1 in the first intron of TTC28 gene in the antisense direction (a.k.a., Chr22-q12.1 L1HS-Ta) 68 (Fig. 2e). This is the most active L1 in human cancers, responsible for nearly a quarter of all cancer-associated L1 retrotransposition (particularly in breast cancers it drives ~70% of retrotransposition events). 22 MINT-Seq in MCF7 cells revealed that this L1HS-Ta RNA is highly m6A-methylated (Fig. 2e). In this case, in contrast to most other MILs, a strong H3K4me3 peak can be seen on its promoter because RC-L1s are autonomous transcription units (Fig. 2e).

Taken together, these data demonstrated that: 1) the category of MILs is predominantly composed of retrotranspositionally dead L1s, which are not, or are only weakly, associated with conventional epigenetic/chromatin states (5mC DNA methylation, histone H3K9me3, H3K36me3, or H3K4me3); 2) Super-MILs and MILs are rarely spliced to adjacent gene mRNAs, which together with the fact that their RNAs are more stable than flanking introns (Fig. 1i; Supplementary information, Fig. S5b) suggests that they are processed post-transcriptionally from introns (see Discussion); 3) there is high m6A methylation of a single active L1HS-Ta in Chr22-q12.1 (Fig. 2e), suggesting that RC-L1s share similar RNA m6A features with other MILs/Super-MILs. Several important questions are raised by these data (Fig. 2f): what are the potential m6A readers of the methylated L1 RNAs? How would the m6A mark and its readers impact L1s, i.e., the expression or retrotransposition of RC-L1s, or the potential of dead MILs to impact hosting genes? What are the implications of these processes for human development or diseases?

MILs are bound by heteromeric RBPs

To identify potential regulatory proteins of MIL RNAs, we analyzed a large collection of enhanced Cross-Linking and Immunoprecipitation (eCLIP-Seq) data generated by the ENCODE consortium in K562 cells 74 (Supplementary information, Table S5). By unbiasedly comparing the eCLIP binding sites of ~150 RBPs with the m6A MINT-Seq peaks on MILs, we identified more than a dozen RBPs that displayed particularly strong binding with MILs, including scaffold attachment factor B2 (SAFB2), its ~70% homologous paralogue SAFB, and RBM15, a nuclear adapter protein that recruits m6A methyltransferase to Xist lncRNA 75 (Fig. 3a). Except for RBM15, none of the other MIL-RBPs has been suggested to be m6A regulators/readers, and their roles as RBPs are poorly characterized, particularly for SAFB2, SAFB, HLTF, UCHL5, PPIL4, LARP4, BUD13 and ZNF622 (Fig. 3a). The strong enrichment of SAFB2, RBM15 and SAFB on MILs is shown by metagene analyses of eCLIP-Seq signals (Fig. 3b), and is exemplified by the DNAH14 locus (Fig. 3c). As a control, another abundant RBP in the nucleus, hnRNPU (a.k.a., SAF-A), displays no binding with MILs (Fig. 3b). UV cross-linking used by eCLIP-Seq predominantly reveals direct protein–RNA interactions; 74,76 therefore, these results suggested that MILs are bound by a large collection of RBPs (which we will refer to as MIL-RBPs). The strong binding of m6A-MILs by these RBPs, but not by other abundant nuclear RBPs such as hnRNPU, suggests that they may be putative novel m6A-RNA binding proteins (i.e., readers). One known nuclear m6A reader protein, YTHDC1, 44 was not included in ENCODE eCLIP-Seq datasets. Re-analysis of published iCLIP-Seq data 75 showed that YTHDC1 also binds MILs (Supplementary information, Fig. S6a). Our RIP-qPCR using a native antibody against YTHDC1 confirmed that it bound m6A-marked L1s, including L1HS and some Super-MILs (Supplementary information, Fig. S6b). We applied similar eCLIP-Seq analysis to gene 3’ UTRs, the canonical m6A sites on mRNAs. 55 This analysis revealed another list of RBPs, including the recently reported m6A readers IGF2BP1 and IGF2BP2 (Supplementary information, Fig. S6c). 77 We refer to this group of RBPs as 3UTR-m6A-RBPs, which showed a limited overlap with MIL-RBPs. RBM15 is one of the RBPs that exist in both lists (Fig. 3a; Supplementary information, Fig. S6c), suggesting that it is a shared adapter for m6A deposition at both locations. 75 Some top MIL-RBPs were not found in 3UTR-m6A-RBPs, such as SAFB, HLTF, UCHL5, LARP4, and RBFOX2. The lack of binding of MIL-RBPs to 3’UTRs suggests that their interactions with MILs are not solely dependent on m6A signals.

a A ranked bar plot showing the numbers of m6A peaks on L1s that overlap with RBP eCLIP peaks (ENCODE K562 datasets). The blue bars indicate observed numbers, and the green bars indicate expected numbers calculated using randomly shuffled regions. The statistical significance for each RBP enriching on MILs was calculated by comparing the observed to the expected numbers; the P-values (the red dot) are labeled based on the scale shown on top of the panel (Fisher’s exact tests). The first two RBPs (SAFB2 and HLTF) had P-values too significant to be included in the scale (i.e., -Log10 of P-values > 600); therefore, no red dot is shown. b Metagene profile plots of eCLIP-Seq signals showing the binding of SAFB, RBM15, SAFB2 and hnRNPU on MILs, with signals from the same-molecular-weight input controls (gray) plotted as background. Read densities were centered on intronic L1 m6A peaks (±3 kb). c A genome browser snapshot of TT-Seq, MINT-Seq, and multiple eCLIP-Seq data at the DNAH14 locus. Yellow highlight indicates a Super-MIL region. d A box plot showing the SAFB binding intensities (eCLIP reads normalized to input) on Super-MILs, typical MILs and Control L1s (the same groups in Fig. 1). P-values: Mann–Whitney U test. e Western blots of SAFB, SAFB2 and YTHDC1 following biotinylated RNA pull-down using in vitro synthesized RNAs (with or without m6A) against K562 whole cell lysate. The RNAs were either m6A-marked (+) or non-methylated (−). f Distribution of SAFB eCLIP-Seq binding sites on the L1HS consensus sequence. Length and position of six L1HS fragments (F1 to F6) are shown and are used for RNA pull-down in panel g. g Western blots of SAFB following RNA pull-down using in vitro synthesized biotinylated L1 RNA fragments. Lower panel: quantitation of western blots showing the binding affinity between full length L1HS and its fragments with SAFB. h Coomassie blue staining of proteins in the biotinylated RNA pull-down using in vitro synthesized L1HS RNA (with or without m6A) against the recombinant full-length SAFB protein expressed in insects. The lower gel picture shows equal amount of L1HS RNAs used for pull down.
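The observed-versus-expected comparison described for panel a can be sketched in Python as below. This is a simplified illustration under stated assumptions, not the procedure used for the figure: intervals are half-open and lie on a single hypothetical chromosome, shuffling relocates each peak uniformly while preserving its length, and the significance test (Fisher’s exact test in the caption) is omitted. All names and coordinates are hypothetical.

```python
import random


def overlaps(a, b):
    """True if half-open intervals a=(start, end) and b=(start, end) overlap."""
    return a[0] < b[1] and b[0] < a[1]


def count_overlapping(peaks, sites):
    """Number of peaks that overlap at least one binding site."""
    return sum(1 for p in peaks if any(overlaps(p, s) for s in sites))


def shuffled(peaks, genome_length, rng):
    """Relocate every peak uniformly at random, keeping its length."""
    out = []
    for start, end in peaks:
        length = end - start
        new_start = rng.randrange(0, genome_length - length + 1)
        out.append((new_start, new_start + length))
    return out


def expected_overlap(peaks, sites, genome_length, n_shuffles=100, seed=0):
    """Mean overlap count over `n_shuffles` random relocations of the peaks."""
    rng = random.Random(seed)
    total = sum(
        count_overlapping(shuffled(peaks, genome_length, rng), sites)
        for _ in range(n_shuffles)
    )
    return total / n_shuffles


if __name__ == "__main__":
    m6a_peaks = [(100, 200), (1000, 1100), (5000, 5100)]  # hypothetical
    rbp_sites = [(150, 180), (1050, 1060), (9000, 9050)]  # hypothetical
    print(count_overlapping(m6a_peaks, rbp_sites))
    print(expected_overlap(m6a_peaks, rbp_sites, genome_length=100_000))
```

An observed count far above the shuffled expectation corresponds to the kind of enrichment that the blue and green bars in panel a are described as comparing.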

Among the MIL-RBPs, we chose to focus on SAFB, which was recently reported to regulate L1 retrotransposition in a CRISPR screen. 63 This choice was also motivated by its uniquely strong effects on L1 RNA expression and retrotransposition (see below, Fig. 4). SAFB is a protein associated with the nuclear matrix, 78,79 a structure considered important for maintaining high-order chromatin architecture and gene regulation, although this remains controversial. 80,81 Analysis of eCLIP data showed that SAFB displayed a significantly higher affinity for Super-MILs than typical MILs or non-m6A marked L1s (Fig. 3d), suggesting that SAFB is potentially an m6A-L1-RNA reader. We used in vitro biotinylated RNA pulldown experiments to study their binding. RNAs labeled with or without m6A were in vitro transcribed for pull-down against K562 cell lysates. Western blots following this experiment showed that SAFB specifically binds to L1HS RNA but not to a length-matched control RNA (Fig. 3e). Importantly, the affinity of SAFB to L1HS was significantly enhanced by the presence of m6A, while the control RNA showed negligible SAFB binding regardless of the m6A status (Fig. 3e). SAFB2, similar to SAFB, exhibited stronger affinity to m6A marked L1HS (Fig. 3e). By contrast, a canonical m6A reader, YTHDC1, bound both RNAs in their m6A-marked forms but showed negligible affinity to non-methylated RNAs (Fig. 3e). We also tested the binding between SAFB and a Super-MIL in the PSMA1 gene, using L1HS and a non-L1 intronic region as controls (Supplementary information, Fig. S6d and Table S4). The results showed that SAFB binds the Super-MIL and L1HS with similar affinity, displaying a stronger binding to the m6A-labeled L1s, but it does not bind non-L1 RNA regardless of the m6A presence (Supplementary information, Fig. S6d).

a A heatmap showing the RNA expression changes of L1 sub-families after knocking down target proteins (based on re-analysis of ENCODE K562 RNA-Seq data). Fold changes were based on comparison to respective sgRNA or shRNA controls. b, c Heatmaps showing the Log2 fold change of L1 RNA abundances (b, measured by ribo-depleted total RNA-Seq) and m6A ratio (c, measured by FPKM of MeRIP-Seq / FPKM of RNA-Seq) of different L1 sub-families after co-depletion of METTL3 and METTL14 by siRNAs (siMETTL3/14). RNA-Seq after transfection by a scramble control siRNA (siCTL) was used as control. d A line plot showing RNA stability of L1HS after flavopiridol treatment for the indicated time. RNA abundance was calculated by RT-qPCR and normalized to 0 h time point in each group. e Normalized RNA expression levels of an active L1HS-Ta (Chr22-q12.1) after depletion of METTL3, YTHDC1 and SAFB by siRNA in MCF7 cells. f Bar plot showing the normalized L1-Neo retrotransposition activity in cells with specific target proteins depleted by siRNAs. Cell colony numbers were counted and compared to the control group. Representative pictures of cell colonies growing in a culture dish are shown on the right. g Sequence comparison between the L1 consensus reporter construct (Con, gray bars) and an L1 RRACH mutant construct (Mut, blue bars). The numbers of RRACH motifs in each 500 bp bin throughout the L1HS are shown. h RT-qPCR following m6A RIP (meRIP) showing the relative m6A levels of reporter L1HS RNAs (the Con was set as 1). m6A levels were normalized with synthetic m6A-RNA spike-in. i Similar to panel d, RNA stability of Con L1HS and Mut L1HS RNAs was measured. j Similar to panel f, L1-Neo reporter assay showing the retrotransposition activity of Con L1HS and Mut L1HS. k UV-RIP assay using a native antibody against SAFB showing the normalized binding between SAFB and Con / Mut L1HS RNAs. l Expression changes of Con L1HS or Mut L1HS RNAs after knocking down SAFB. All qPCR data show means ± SD. *P < 0.05, **P < 0.01, ***P < 0.001, Student’s t-test.

The binding of SAFB to m6A-L1 transcripts may be based on specific RNA regions or motifs. To identify such regions/motifs that may explain the high affinity of SAFB binding, we re-aligned SAFB eCLIP-Seq binding sites to a pseudo genome of L1HS consensus sequence (Fig. 3f; Materials and Methods). We found a few regions of L1HS that appear to be the “high affinity” sequences (e.g., the 5’UTR, the end of ORF1 and the middle part of ORF2, Fig. 3f). Based on these, we divided L1HS into 6 fragments to perform biotinylated RNA pulldown assay (fragments 1 to 6, or F1 to F6, Fig. 3f). This experiment revealed that none of the fragments showed strong binding with SAFB (Fig. 3g), although F4 (~3000–4300 nt of the L1HS consensus) possessed a detectable level of binding (Fig. 3g). We sought to identify putative SAFB binding motifs from its eCLIP binding sites on L1s, using a dedicated CLIP-Seq motif discovery tool (i.e., GraphProt). 82,83 This identified a putative SAFB motif consisting of A/G-enriched pentamers (e.g., GAAAA, Supplementary information, Fig. S6e), consistent with a previous report by iCLIP. 84 However, this putative motif showed a broad occurrence on the L1 sequence (Supplementary information, Fig. S6f). The density of these pentameric motifs appears similar on the full-length L1HS as compared to the L1 fragments (Supplementary information, Fig. S6f). This result suggests that the density of pentameric motifs cannot explain the selective binding of SAFB to full length L1 RNA rather than L1 fragments (Fig. 3g). In addition, we found no correlation between the density of these pentamers on each MIL and the respective SAFB eCLIP-Seq binding affinity (Supplementary information, Fig. S6g). These data indicate that the binding between L1 and SAFB is unlikely to be mediated by a short RNA motif, but is more likely mediated by RNA structures depending on long sequences.

To study the direct binding between SAFB and L1 RNA, we generated recombinant full-length SAFB protein and mixed it with the L1HS RNAs in vitro. This confirmed that they directly bind each other and that m6A enhances their interaction (Fig. 3h). We further conducted an RNA competition assay to characterize the binding affinity between SAFB and L1 RNAs. In this assay, immobilized SAFB/L1–RNA complex was subjected to competition by various forms of L1HS RNA, antisense L1HS RNA or a control length-matched RNA (Supplementary information, Fig. S6h). Our results verified that SAFB binds L1 RNA in both non-m6A- and m6A-modified forms, but displays higher affinity for its m6A form (Supplementary information, Fig. S6i, j). SAFB showed no detectable binding with L1 RNA in the antisense direction or with a control RNA, regardless of whether they were m6A-marked (Supplementary information, Fig. S6i). Collectively, our results indicate that SAFB is a reader of m6A-L1 RNAs, but not a reader of the m6A mark itself; such “reader” behavior depends on the presence of long L1 RNA sequences and was enhanced by m6A. This binding is distinct from canonical m6A/reader binding such as that of YTHDC1, which recognizes the m6A mark 85 (Fig. 3f; Supplementary information, Fig. S6d).

m6A modification versus SAFB: opposite roles in controlling L1 expression and retrotransposition

We next examined the roles of m6A modification and MIL-RBPs. Analysis of ENCODE RNA-Seq generated in K562 cells 74,86 showed that depletion of the top MIL-RBPs elicited variable changes of L1 RNA expression (Fig. 4a). Depletion of RBM15, a nuclear adapter that recruits m6A methyltransferase, 75 markedly reduced the RNA levels of many L1 subfamilies (Fig. 4a). Knockdown of the newly identified m6A-L1 reader, SAFB, strongly increased L1 RNA expression (Fig. 4a), indicating that it acts as an L1 suppressor, 63 whereas knockdown of SAFB2, another top MIL-RBP and a SAFB homolog, caused negligible effects (Fig. 4a). We confirmed these changes on L1HS and two Super-MILs by RT-qPCRs (Supplementary information, Fig. S7a, b). For other newly identified MIL-RBPs, some were also found to regulate L1 RNA expression, i.e., UCHL5, KHDRBS1 and PPIL4, but the extents of L1 increases upon their knockdown were not as strong as those seen after SAFB depletion (Fig. 4a). We noticed that the impact of SAFB and RBM15 depletion on L1s was more prominent for young L1s (i.e., L1PA1-6, Fig. 4a), which correlates well with their higher m6A levels (Fig. 2a). Indeed, the knockdown of these two factors most prominently impacted Super-MILs as they carry the highest m6A levels (Supplementary information, Fig. S7c). Together with the data on positive correlation between m6A and L1 stability (Fig. 1j), these results suggested that m6A deposition promotes L1 RNA expression or stability, whereas the novel L1 reader SAFB counteracts such roles.

We examined this hypothesis, first, by co-depleting the m6A methyltransferase (i.e., writer) complex METTL3 and METTL14 using siRNAs (Supplementary information, Fig. S7d, e). This resulted in a significant reduction of m6A level as well as RNA abundance of L1s (Fig. 4b, c; Supplementary information, Fig. S7d, f), indicating a positive role of m6A on L1 expression. The changes of L1 abundance shown by RNA-Seq were more prominent for m6A-marked L1 RNAs than for L1 RNAs without this mark, suggesting m6A dependence (Supplementary information, Fig. S7g). It is notable that the m6A reduction on L1s after writer knockdown was significant but incomplete (Fig. 4c; Supplementary information, Fig. S7f), consistent with a previous suggestion that even residual amounts of METTL3/14 complex may be sufficient to generate a significant level of m6A on many RNAs. 87 Another RNA modification, N6,2′-O-dimethyladenosine (m6Am), can also be recognized by the anti-m6A antibody during MeRIP or MINT-Seq. 88,89 Knockdown of PCIF1, 90,91 the methyltransferase of m6Am, did not affect the expression levels of L1 RNAs (Supplementary information, Fig. S7b), suggesting this mark was not directly involved in L1 control.

To corroborate these findings, we reanalyzed a series of recently published RNA-Seq data, 58,92,93,94,95,96,97 and found that depletion of m6A writers or reader (METTL3, METTL14, ZC3H13, YTHDC1) generally reduced levels of young L1 RNAs in both mouse and human cells (Supplementary information, Fig. S8a–f), indicating a positive role of m6A on L1 expression. In mouse, L1Md_T, L1Md_A and L1Md_Gf are the youngest sub-families and are also the main groups known to be retrotranspositionally active. 59,98 The RNA abundances of these youngest sub-families were reduced significantly in mouse ESCs upon knockout (KO) of m6A writers (Supplementary information, Fig. S8a–d), correlating with their higher m6A levels shown above (Fig. 2b). By contrast, the abundances of some relatively older L1 RNAs, such as L1Md_F, were moderately increased or unchanged in several datasets (Supplementary information, Fig. S8a–d). Consistent with our hypothesis, L1 RNA stability was globally reduced in the absence of an m6A writer, as shown by re-analysis of a published time-course RNA-Seq dataset after METTL14 knockdown 65 (Supplementary information, Fig. S8g). By contrast, L1 stability was increased by SAFB knockdown (Fig. 4d). These results indicate that m6A and its reader SAFB oppositely control L1 RNA expression, at least in part, via modulating RNA stability.

L1 RNA is the key intermediate for its retrotransposition. We interrogated an important question mentioned earlier (Fig. 2f): how does the m6A mark impact L1 retrotransposition activity? By using specific RT-qPCR primers targeting the aforementioned L1HS-Ta at Chr22-q12.1 (Fig. 2e), the most active RC-L1 in human cancers, 22 we examined its expression upon depleting SAFB, METTL3 and YTHDC1 in MCF7 cells where this L1HS-Ta is active. 68 This experiment revealed reduction of this single live L1 (Fig. 4e), in a manner similar to pan-L1HS (the entirety of all L1HS in the genome) or other dead L1s. Importantly, by a well-established L1-neo retrotransposition reporter assay 99 (Supplementary information, Fig. S9a), we found L1 retrotransposition activity was significantly increased after SAFB depletion, but impaired by YTHDC1 and METTL3 knockdown (Fig. 4f; Supplementary information, Fig. S9b). These changes of retrotransposition activity were not attributed to different transfection efficiency or proliferation rates because a co-transfected puromycin-resistant construct led to similar cell survival rates (Supplementary information, Fig. S9c). RT-qPCR across introns or exons of the neomycin gene in the reporter showed no decrease of its splicing, ruling out that the effects were due to splicing alteration of reporter RNA (Supplementary information, Fig. S9d). Consistent with the increased L1 retrotransposition in reporter assays, after culturing SAFB-depleted cells for > 20 passages, 100 we found a significantly higher number of L1HS in their genomic DNA (Supplementary information, Fig. S9e). There was no alteration of an inactive L1 subfamily, the L1PA2 (Supplementary information, Fig. S9e), supporting that the L1 retrotransposition changes were specific effects. The L1 copy increase was abolished in the presence of an inhibitor of reverse transcriptase, lamivudine (3TC) (Supplementary information, Fig. S9f), suggesting bona fide retrotransposition events. These data together indicated that the known m6A writer/reader positively promoted L1 retrotransposition, while SAFB specifically suppressed it.

Direct and positive roles of m6A on L1 RNA expression and retrotransposition

Although these data are highly consistent, the knockdown of m6A writers can potentially affect other genes and may impact L1s indirectly. We sought to consolidate a direct role of m6A on L1 control. m6A deposits to the RRACH motif on mRNAs, 54,55,89,101 and we found this to be consistent on L1s (Supplementary information, Fig. S2d). We therefore conducted an “m6A mutagenesis” experiment by generating an L1-neo reporter construct with ~35% (from 141 to 92) of its RRACH motifs mutated to lose the “RAC” (Fig. 4g; Supplementary information, Fig. S9a). We ensured that this “m6A mutagenesis” introduced minimal L1 RNA sequence change (<1%, i.e., 55 out of ~6000 nt) and no amino acid difference. We hereafter investigated the causal function of m6A by comparing this m6A-mutant L1-neo reporter (i.e., Mut) to the original reporter with consensus L1HS sequence (i.e., Con) (Fig. 4g; Supplementary information, Table S4).
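The proportions quoted for the mutagenesis design can be checked with a short calculation. The sketch below uses only the numbers given in the text (141 and 92 RRACH motifs, 55 changed nucleotides, a reporter length of roughly 6000 nt); the variable and function names are illustrative, and the length is treated as approximate.

```python
def fraction_removed(before: int, after: int) -> float:
    """Fraction of items lost when a count drops from `before` to `after`."""
    return (before - after) / before


motifs_before, motifs_after = 141, 92  # RRACH motifs in Con vs Mut
changed_nt, reporter_len = 55, 6000    # reporter length is approximate

frac_motifs = fraction_removed(motifs_before, motifs_after)  # 49 / 141
frac_sequence = changed_nt / reporter_len                    # 55 / 6000

if __name__ == "__main__":
    print(f"RRACH motifs removed: {frac_motifs:.1%}")
    print(f"Sequence changed:     {frac_sequence:.2%}")
```

The two fractions show why the design is informative: about a third of the candidate m6A sites are removed while less than 1% of the nucleotide sequence is altered.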

Consistent with prediction, the Mut reporter expressed an L1HS RNA with a lower m6A level than that from the Con reporter (Fig. 4h). The Mut L1HS RNA was also less stable (Fig. 4i), consistent with a positive role of m6A on L1 RNA stability (Fig. 1i; Supplementary information, Fig. S8g). Importantly, the Mut L1HS showed lower retrotransposition activity than the Con L1HS (Fig. 4j). Northern blot showed a similar RNA profile for the two reporter L1 RNAs (Supplementary information, Fig. S9g), ruling out that RRACH mutations may cause premature transcriptional termination or aberrant splicing of L1 RNAs. These results demonstrated a direct role of m6A in promoting L1 RNA stability and retrotransposition activity. We also assessed whether the binding or roles of SAFB on L1s directly depend on m6A. UV crosslinked RNA Immunoprecipitation (UV-RIP) assay showed that, as compared to the Con L1HS, the Mut L1HS was less bound by SAFB (Fig. 4k). This is consistent with the quantitative m6A reduction on Mut L1HS (Fig. 4h), confirming that SAFB-L1 binding is directly mediated by m6A levels (also see Fig. 3). As a consequence of lower SAFB binding, the Mut L1HS RNA was less affected by SAFB depletion as compared to the Con L1HS RNA (Fig. 4l).

Taken together, these data demonstrated that m6A is a unique mark that benefits L1 expression (both RC-L1 and dead L1s) and retrotransposition (RC-L1s), which at least in part was mediated by RNA stability control. By contrast, SAFB is a host factor that counteracts such beneficial roles of m6A through directly binding m6A-L1s to decrease their abundance, consistent with its role as an L1 suppressor identified in a CRISPR screening. 63

MILs are a novel class of regulatory elements that often suppress hosting gene transcription

The large number of MILs we identified in each cell type (~2000 to > 4000 per cell type) are predominantly sense-oriented to hosting genes (Fig. 1c), are co-transcribed without autonomous promoters, and are often quite stable but are not spliced into host mRNAs (Supplementary information, Fig. S5), raising the question of how they impact host gene expression (Fig. 2f). Importantly, the intronic L1s can be either pre-existing/annotated in the human genome or created by de novo L1 insertion. 22,30,31,32,33,34 We already observed that MILs tend to exist in host genes associated with DNA damage repair and response (DDR genes, Supplementary information, Fig. S3f). Further examining MIL-hosting human genes, we found that they have a median length of >100 kb, which is significantly longer than that of all RefSeq genes, or of genes hosting non-m6A-L1s (Fig. 5a). Interestingly, human DDR genes (Supplementary information, Table S6) are overall longer than average, and the subset of genes that host MILs are particularly long (Fig. 5b). Disrupted transcription of long genes has been hypothesized to underlie disease etiology, 39,41,102 but the underlying mechanisms are elusive. We therefore sought to test the hypothesis that the MILs harbored in long genes may regulate their expression, representing an unappreciated L1–host interaction mechanism (Fig. 2f).

a A box plot showing the lengths of human genes that host Super-MILs, Typical MILs and Control L1s. The length of all hg19 RefSeq genes is also plotted as a comparison. b A box plot showing the lengths of human DNA damage repair genes (DDR, n = 448) and of DDR genes that host MILs (n = 37). The length of all RefSeq human genes is also shown. c A diagram showing the strategy to calculate the TBI for intronic L1s based on TT-Seq data. A smaller index indicates a stronger blocking role, while a number close to 1 indicates a lack of blocking. d A box plot showing the TBIs of Super-MILs, Typical MILs and Control L1s, which indicate their impact on respective hosting genes (based on K562 TT-Seq using the equation in panel c). e Genome browser snapshots showing the KO design of three different L1s in ZRANB3 introns: an antisense L1 (AS-L1, no m6A in MINT-Seq), a MIL (low m6A) and a Super-MIL (strong m6A). Yellow highlights show the locations of these regions in the ZRANB3 gene (bottom of this panel). Targeted mRNA regions by qPCR primers are also shown. f RT-qPCR results showing the expression levels of ZRANB3 mRNA after knocking out three different L1 regions (as in panel e). g Genome browser snapshot of TT-Seq showing the increase of transcription of the ZRANB3 gene after Super-MIL KO. MINT-Seq indicates the m6A levels of Super-MILs. The dashed line indicates the 3’ end of the Super-MIL being deleted. WT, wildtype; KO #1/#2, two knockout cell clones. The TBI of the Super-MIL calculated from each TT-Seq is labeled. h RT-qPCR results showing the expression levels of PSMA1 mRNA after knockout or inversion of a Super-MIL region (see panel i). i Similar to panel g. Genome browser snapshot of TT-Seq showing the increase of transcription of the PSMA1 gene after Super-MIL KO and/or inversion. WT, wildtype; KO #1/#2, two knockout cell clones; Inversion #1/#2, two Super-MIL inverted cell clones. The TBI of the Super-MIL calculated from each TT-Seq is labeled. Primers for mRNA RT-qPCR are indicated. 
For all box plots, P-values were calculated with Mann–Whitney U test and are labeled in each panel. For RT-qPCR results, data show means ± SD. *P < 0.05; **P < 0.01; ***P < 0.001, Student’s t-test.

Taking advantage of TT-Seq, we developed a computational strategy that we referred to as the transcription blocking index (TBI) to measure the transcriptional activity of hosting genes after each intronic L1 as compared to that before it (Fig. 5c). TBI is calculated for individual L1s and quantitatively reflects how strongly they impact the transcription of their hosting genes: a number close to 1 indicates a lack of blocking, and the lower the TBI, the stronger the transcription-blocking effect (Fig. 5c). Remarkably, this analysis revealed that MILs exhibited a significantly stronger transcription-blocking effect on hosting genes than that conferred by control intronic L1s without m6A marks (Fig. 5d). The blocking effects of MILs are correlated with their m6A levels, i.e., Super-MILs showed significantly lower TBIs as compared with typical MILs (Fig. 5d). This observation suggests that MILs may represent a previously unappreciated large category of transcriptional elements for human gene regulation.
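As a rough illustration of the idea, the sketch below computes a TBI-like ratio from a per-bin coverage vector. The exact equation is given in Fig. 5c and is not reproduced in the text, so the form used here (mean signal in a window after the L1 divided by mean signal in a window before it) is an assumption, as are the function name, the `window` parameter and the toy coverage values.

```python
from statistics import mean


def tbi(coverage, l1_start, l1_end, window=3):
    """TBI-like ratio: mean signal after the L1 / mean signal before it.

    coverage is a list of per-bin TT-Seq signal ordered 5'->3' along the
    host gene; the L1 occupies bins [l1_start, l1_end). The window size
    and the ratio form are assumptions, not the published equation.
    """
    upstream = coverage[max(0, l1_start - window):l1_start]
    downstream = coverage[l1_end:l1_end + window]
    if not upstream or not downstream:
        raise ValueError("L1 is too close to the gene boundary")
    up = mean(upstream)
    if up == 0:
        raise ValueError("no upstream signal; ratio is undefined")
    return mean(downstream) / up


# Hypothetical coverage: signal halves across the L1 (bins 3 to 5).
blocked = [10, 10, 10, 8, 6, 5, 5, 5, 5]
# Hypothetical coverage with no drop across the same interval.
unblocked = [10, 10, 10, 10, 10, 10, 10, 10, 10]
print(tbi(blocked, 3, 6), tbi(unblocked, 3, 6))
```

Under this reading, a value near 1 means transcription passes through the L1 unimpeded, and smaller values mean stronger blocking, which matches the interpretation stated above.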

To functionally validate this finding, we knocked out several genomic L1 regions coding for MIL RNAs or control L1s by CRISPR/Cas9. We selected the ZRANB3 locus because it harbors not only a Super-MIL, but also two other L1 regions of similar length, which differ in having lower m6A (a MIL) or no m6A (an intronic L1 anti-sense to the ZRANB3 gene, AS-L1) (Fig. 5e). KO of the Super-MIL resulted in a significantly increased expression of ZRANB3 mRNA (~2.8-fold) in two independent cell clones (Fig. 5e, f). By contrast, the deletion of an AS-L1 region that does not generate m6A-RNA caused no consistent/significant change of ZRANB3 expression in three cell clones (Fig. 5f). KO of a low-m6A MIL (MIL-KO, Fig. 5e) moderately increased ZRANB3 mRNA expression as compared with WT cells (Fig. 5f). We conducted TT-Seq in the wild-type and Super-MIL KO cells to test the transcriptional basis of ZRANB3 upregulation (Fig. 5g). In support of a role of the Super-MIL as a “transcriptional roadblock”, a TBI of around 0.543 in WT cells (Fig. 5g, comparing TT-Seq signals to the left vs those to the right of the dashed line) was increased to ~1.06 after the Super-MIL deletion; notably, the transcription downstream of the Super-MIL was restored to a level comparable to the upstream region (Fig. 5g, comparing signals of KO and WT to the left of the dashed line). Such transcriptional “unblocking” is consistent with the significant upregulation of ZRANB3 mRNA (Fig. 5f). We observed very consistent results for another Super-MIL located in a PSMA1 intron, i.e., genetic deletion of this Super-MIL significantly increased the expression of PSMA1 mRNA (~2-fold) in two cell clones (Fig. 5h). In this case, TT-Seq showed that the TBI was increased to ~0.7 (Fig. 5i). The PSMA1 mRNA increase was clearly due to the removal of a “transcriptional roadblock”, as TT-Seq showed that the transcription increases were specific to the regions downstream of the Super-MIL (left of the dashed line, Fig. 5i), whereas little difference was found for the regions upstream of it (right side of the KO region in Fig. 5i).

Both our global analysis and locus-specific deletion of multiple L1s (Fig. 5d–i) support that only sense-direction L1s coding for Super-MIL RNAs tend to act as strong transcriptional roadblocks. To further corroborate this conclusion, and to delineate whether the effects of Super-MIL deletion should be attributed to the m6A-L1 RNA or the DNA region, we conducted an L1 inversion experiment. We identified two cell clones in which the original Super-MIL region was inverted. This can be verified by the presence of direction-flipped TT-Seq signals from the other strand that were not seen in WT cells (Fig. 5i). In cells with L1 inversion, PSMA1 mRNA expression was still increased by ~2-fold as compared to WT, which appeared indistinguishable from Super-MIL KO (Fig. 5h); consistently, TT-Seq showed almost identical patterns of transcription and similar TBI levels in cells with L1 KO or inversion (Fig. 5i).

Moreover, we developed an “m6A eraser” system by fusing catalytically-dead Cas13d 103 (i.e., dCasRx) with either the m6A demethylase FTO (dCasRx-FTO) or an FTO mutant with no enzymatic activity (H231A/D233A, 104 dCasRx-FTO-mut) (Supplementary information, Fig. S10a). Targeted m6A editing of the ZRANB3 Super-MIL by wildtype FTO, but not by the mutant FTO, reduced its m6A level and expression, resulting in increased ZRANB3 mRNA expression (Supplementary information, Fig. S10b–d). Together, this series of data from deleting or inverting L1s and from dCasRx-FTO editing demonstrated that Super-MILs attenuate host gene expression by blocking its transcription, and that the blocking effect is dependent on L1 RNA directionality and its m6A level.

SAFB/B2 safeguard long gene transcription by antagonizing MILs as transcriptional roadblocks

Although SAFB2 knockdown alone caused negligible effects on L1 expression (Fig. 4a; Supplementary information, Fig. S7b), it interestingly exacerbated the L1HS increase in cells depleted of SAFB (Fig. 6a). This finding suggests a partial functional compensation between these two homologous proteins in MIL RNA control, reminiscent of a similar phenomenon of YTHDF readers in mRNA control. 105 In accord, RNA-Seq validated this collaborative suppression of L1 RNAs by SAFB and SAFB2, in that siSAFB&B2 caused a stronger increase of L1 abundance than siSAFB alone (Fig. 6b). The double knockdown specifically affected the m6A-marked L1s (Supplementary information, Fig. S10e), consistent with their preferential binding on m6A-L1 RNAs.

a RT-qPCR results showing the expression changes of L1HS after knocking down SAFB or SAFB2 in K562 cells by siRNA. The mRNA levels of SAFB and SAFB2 are also shown. siSAFB or siSAFB2 indicates single knockdown; siSAFB&B2 indicates dual depletion. b A heatmap showing the RNA abundances of different L1 sub-families after depleting SAFB (siSAFB) or co-depleting SAFB and SAFB2 (siSAFB&B2) by siRNAs. Data are from RNA-Seq and indicate Log2 fold changes of expression in the knockdown group as compared to the siCTL group. c Box plot showing the TBIs (see Fig. 5c) of Control L1s, Super-MILs or typical MILs on their hosting genes. Data are based on TT-Seq in K562 cells with the same groups of knockdown as in panel b. P-values were calculated with a paired Student’s t-test. d Genome browser snapshot showing TT-Seq signals over the PSMA1 (upper) or ZRANB3 gene loci (lower) in control or specific knockdown cells as indicated. The MINT-Seq track on top of each plot indicates m6A signals for strong Super-MILs. The dashed lines denote the 3’ end of the Super-MIL regions. Yellow highlights point to regions with strong transcriptional change. The TBI of the Super-MIL in each TT-Seq is labeled. e, f RT-qPCR results showing mRNA reduction of ZRANB3 (e) and PSMA1 (f) after depletion of SAFB (siSAFB) or co-depletion of SAFB and SAFB2 (siSAFB&B2) in wild type (WT) K562 cells or corresponding Super-MIL knockout cells. g Heatmaps generated by analyzing RNA-seq of siCTL, siSAFB and siSAFB&B2 K562 cells, showing the fold changes of MIL-hosting genes that are DDR genes. Scale bars are shown on the right (i.e., downregulated genes are in blue colors). h Similar to panel g, but these plots show fold changes of MIL-hosting genes that are putative L1 suppressors (by Liu et al., 63 left panel and by Mita et al., 11 right panel). i A regression plot showing the relationship between gene length and expression changes after SAFB or SAFB&B2 depletion in K562 cells. 
MIL-hosting genes were divided into five equal-numbered groups by length (x-axis), and the fold changes of each group are shown (the dots indicate mean fold changes for genes of each group, and vertical lines denote the 95% confidence interval of mean fold changes). The lines show linear regression and the shaded areas of matched colors denote a 95% confidence interval for that regression. There are 422 long genes (> 200 kb) harboring MILs. Data of siSAFB or siSAFB&B2 in this figure were based on polyA selected RNA-Seq. For all qPCRs, data show means ± SD. *P < 0.05; **P < 0.01; ***P < 0.001, Student’s t-test.

The broad “over-activation” of MILs after SAFB&B2 depletion provided an opportunity to examine their regulatory roles on host gene transcription in a global manner. We conducted TT-Seq in cells depleted of SAFB alone or of both SAFB&B2, and calculated the TBI for intronic L1s on hosting genes (Fig. 5c). This analysis revealed that SAFB knockdown reduced the overall TBIs of Super-MILs and MILs, indicating enhanced “transcriptional blocking” of hosting genes by Super-MILs (Fig. 6c). In accord, siSAFB&B2 double knockdown further exacerbated this effect (Fig. 6c). We ranked all MILs based on their delta TBI (TBI in siSAFB&B2 – TBI in siCTL), finding a general reduction by siSAFB&B2, with ~37% of MILs showing a TBI decrease greater than 0.1 (Supplementary information, Fig. S10f, g). The MILs with decreased TBIs have higher m6A levels than those not showing obvious changes (Supplementary information, Fig. S10h), in support of m6A-dependent regulation of TBIs by SAFB&B2. As examples, TT-Seq tracks are shown for two Super-MILs in PSMA1 and ZRANB3 introns, which illustrate that they became strong “transcriptional blockades” (Fig. 6d). The TBIs were accordingly reduced (labeled on the plots) after depletion of SAFB/SAFB2. The enhanced blocking by over-active MILs is highly consistent with the loss of blocking upon MIL KO in Fig. 5g, i. These results together demonstrated that a large group of MILs are intronic transcriptional roadblocks, acting in a manner dependent on their m6A levels; SAFB and SAFB2 function in a collaborative manner to rectify the transcriptional defects of host genes caused by such roadblocks.
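The delta-TBI ranking described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the dictionaries of per-MIL TBIs, the MIL identifiers and all values are hypothetical, and only the definition of delta TBI (knockdown minus control) and the 0.1 threshold come from the text.

```python
def delta_tbi(tbi_kd: dict, tbi_ctl: dict) -> dict:
    """Delta TBI per MIL: TBI in knockdown minus TBI in control."""
    return {mil: tbi_kd[mil] - tbi_ctl[mil] for mil in tbi_ctl if mil in tbi_kd}


def rank_by_delta(deltas: dict) -> list:
    """MIL identifiers sorted from largest decrease to largest increase."""
    return sorted(deltas, key=deltas.get)


def fraction_decreased(deltas: dict, threshold: float = 0.1) -> float:
    """Fraction of MILs whose TBI dropped by more than the threshold."""
    hits = sum(1 for d in deltas.values() if d < -threshold)
    return hits / len(deltas)


# Hypothetical TBIs for four MILs in control and double-knockdown cells.
ctl = {"MIL_a": 0.9, "MIL_b": 0.8, "MIL_c": 0.5, "MIL_d": 1.0}
kd = {"MIL_a": 0.6, "MIL_b": 0.78, "MIL_c": 0.2, "MIL_d": 1.05}

deltas = delta_tbi(kd, ctl)
print(rank_by_delta(deltas))
print(fraction_decreased(deltas))
```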

A novel L1–host interaction between MILs, SAFB/B2 and host genes

As a result of “MIL transcriptional blockade”, mRNA levels of hosting genes such as PSMA1 and ZRANB3 were significantly decreased by either SAFB or SAFB&B2 knockdown (left halves of Fig. 6e, f), which is consistent with their increases elicited by MIL KO (Fig. 5f, h). Importantly, these two host genes were not or less affected by siSAFB or siSAFB&B2 when their Super-MILs were deleted (right halves of Fig. 6e, f), indicating that SAFB and SAFB/B2 act on these genes directly via the Super-MILs.

The apparent suppressive roles of MILs reminded us of the interesting features of MIL-hosting genes, namely their enrichment for DNA damage repair (DDR) genes and long genes (Fig. 5a, b; Supplementary information, Fig. S3f). Inspection of MIL-hosting DDR genes identified ZRANB3, SMARCAL1, ATR, ATRX, RB1, FANCC, FANCD2, FANCI (Supplementary information, Table S6), many of which are important genome guardians that may prevent L1 mobilization. 11,19,20,63 As an example, ZRANB3 is a DNA translocase crucial for replication fork maintenance, 106 and was recently revealed as a suppressor of L1 retrotransposition. 11 This gene was extensively shown to harbor strong MILs and was suppressed by them (Figs. 5e–g, 6d, e). Globally, DDR genes that host MILs commonly displayed reduced expression upon SAFB or SAFB&B2 knockdown in RNA-Seq (Fig. 6g). The effect was stronger after SAFB&B2 co-depletion (Fig. 6g), consistent with a more dramatic increase of MIL expression (Fig. 6a–c). This unbiased analysis not only revealed reduction of the few genes mentioned earlier (ZRANB3, SMARCAL1, ATR, FANCD2, and FANCC), 106 but also identified other DDR genes that host, and are suppressed by, MILs, such as SPIDR, 107 ERCC6L2 108 and BRIP1 (BRCA1 interacting protein, a.k.a., FANCJ 109 ) (Fig. 6g, left panel). We directly compiled two separate lists of MIL-hosting genes that have been identified as L1 suppressors. 11,63 A number of them showed significantly inhibited expression upon MIL over-activation (i.e., by SAFB or SAFB&B2 depletion) (Fig. 6h). The reduced expression of MIL-hosting DDR genes upon SAFB&B2 knockdown was consistently found in other cell types we examined (Supplementary information, Fig. S10i).

Another major feature of MIL-hosting genes is that they are very long (Fig. 5a, b). To test whether long genes are more vulnerable to MIL blocking, we plotted hosting gene length against expression changes after SAFB or SAFB&B2 knockdown. This analysis showed an interesting trend that long genes were preferentially, and more significantly, suppressed (Fig. 6i), in support of the notion that MILs are important regulators of long human genes. In addition, longer genes bear on average more MILs (Supplementary information, Fig. S10j), which may contribute to the stronger changes of long genes upon siSAFB&B2. For example, human ZRANB3 has a length of 336 kb, and its expression was reduced by ~60% after depletion of SAFB&B2 (Fig. 6e), whereas Super-MIL KO increased the expression by ~3-fold (Fig. 5f). As controls, we ruled out the possibility that transient knockdown of SAFB&B2 impacts the overall chromatin state and thereby indirectly causes the expression changes of L1 RNAs or host genes; for example, H3K9me3, the histone modification often associated with heterochromatin and RTE suppression, was not affected (Supplementary information, Fig. S10k).

SAFB/B2 and MILs impact long genes involved in crucial neuronal/synaptic functions

This intriguing cross-talk between MILs, DDR factors and long genes is reminiscent of a noted long gene enrichment/regulation in the brain, 38,39 particularly because both DDR 110,111 and L1 activity 7,32,33,37 are uniquely crucial in this tissue. Long genes are associated with vital neuronal functions and are involved in NNDs. 38,39,40 We hence interrogated potential roles of MILs as unappreciated regulatory elements in the human brain. By analyzing published MeRIP-Seq datasets in fetal human tissues, 112 we identified a large number of MILs in each tissue, among which the fetal brain contains one of the largest numbers (n = 3339) (Fig. 7a). MIL-hosting genes in the fetal brain are enriched for the functional terms “neuronal system”, “cell projection”, “synapsis organization” and “synaptic genes” (Fig. 7b), a large number of which are crucial for neuronal and synaptic functions. As examples, strong MILs were harbored in key neuronal/neurodevelopmental genes GPHN (Gephyrin), 113 UBE3A, 114 and CTNND2 (delta2-catenin) 115 (two examples shown in Fig. 7c), in synaptic genes such as DLG2 (coding for the postsynaptic density protein-93), 33 in genes coding for crucial transmembrane molecules (CNTNAP4 and CTNNA2), 116,117 and in major neurotransmitter receptor genes such as GABA receptor type-A γ3 (GABRG3) and Glutamate Receptor AMPA Type-4 (GRIA4) (Supplementary information, Table S7). DDR genes were also enriched as MIL hosts in the fetal brain, but were not as highly ranked as neuronal or synaptic genes (not shown). In terms of gene length, neuronal/synaptic genes are longer than average (Fig. 7d), which is a known feature, 38,39 but neuronal/synaptic genes that harbor MILs are exceptionally long (Fig. 7d; Supplementary information, Table S7).

a A barplot showing the MIL numbers identified by MeRIP-Seq in 8 fetal human tissues (Xiao et al. 112 ). b The top functional gene categories for which MIL-hosting genes in the fetal human brain are enriched against fetal brain expressed genes (RNA-Seq FPKM > 0.1) (by Metascape; see Materials and methods). c Genome browser snapshots showing tracks of fetal brain RNA-Seq and MeRIP-Seq at the CTNND2 and GPHN loci. MILs are highlighted. d A boxplot showing the length of four groups of genes. All genes, all RefSeq genes; neuronal, genes associated with neuronal or synaptic functions; MIL hosts, fetal brain genes that host MILs; MIL hosts & neuronal, the shared group between these two (gene lists in Supplementary information, Table S7). e A boxplot showing the expression levels of MILs after co-depletion of SAFB and SAFB2 as compared to control (siCTL). Data were generated from polyA RNA-Seq in human NPC cells. f A boxplot showing the expression changes of MIL-hosting DDR or neuronal/synaptic genes after siSAFB&B2 knockdown. The y-axis indicates Log2 fold changes based on hNPC RNA-Seq. g A regression plot showing the relationship between gene length and expression changes after SAFB&B2 depletion in human NPCs, similar to Fig. 6i. h A barplot showing the observed or expected numbers of brain MIL-hosting genes that overlap with SFARI autism-associated genes. The expected number was calculated on the basis that all genes expressed in hNPCs (n = 19,286) have a chance to be SFARI genes. P-value and odds ratio: Fisher’s exact test. i A violin plot showing the Log2 fold changes of MIL-hosting genes listed in the SFARI database (the red group in panel h) after SAFB&B2 co-depletion. The 20 most down-regulated genes are labeled. 
j A model figure showing the major findings of this work: 1), m6A on RC-L1s promotes the retrotransposition activity of these live L1s; 2), m6A on retrotranspositionally dead MILs mediates their roles in acting as transcriptional roadblocks that preferentially impede long human genes, which include DDR genes and neuronal/synaptic genes; 3), the m6A-L1 readers SAFB and SAFB2 represent a host defense system that on one hand binds RC-L1s to inhibit their expression and retrotransposition, and on the other hand reduces MIL expression to safeguard the transcription of long human genes; 4), MILs represent an unappreciated large category of cell-type-specific transcriptional elements for gene regulation in health or diseases. The colored round objects denote m6A-L1-binding RBPs that often associate with the nuclear matrix. Arrows indicate positive regulation or Pol2 direction, while “----|” indicates negative regulation or suppression. The P-value for panel d was calculated by Mann–Whitney U test; P-values in panels e and f were calculated with paired Student’s t-tests (siSAFB&B2 vs siCTL). Data in this figure from hNPCs after siSAFB&B2 were based on polyA selected RNA-Seq.

Human neural progenitor cells (hNPCs) are known to harbor high L1 activity. 118 To confirm the roles of MILs/SAFB&B2 in brain cells, we generated hNPCs from induced human pluripotent stem cells (iPSCs) with a high purity, as shown by expression of the marker genes NESTIN, SOX1 and SOX2 in immunofluorescence (Supplementary information, Fig. S11a). Co-depletion of SAFB&B2 in hNPCs demonstrated that their function is highly conserved with that found in the transformed cells (Fig. 6). First, their co-depletion strongly induced L1 RNAs, to a level higher than that induced by SAFB knockdown alone (Supplementary information, Fig. S11b), and RNA-Seq analyses confirmed a global increase of MILs (Fig. 7e). Second, the upregulation of MILs was accompanied by a significant decrease of their hosting genes, including DDR genes and a large category of neuronal/synaptic genes (Fig. 7f). The genes down-regulated by SAFB&B2 knockdown (FDR < 0.05 by EdgeR) were highly enriched for functions related to nervous system development (Supplementary information, Fig. S11c). Third, when we ranked the MIL-hosting genes in NPCs by their length, it was obvious that long genes, especially those >100 kb, were particularly inclined to be inhibited (Fig. 7g).

The human brain also contains a myriad of non-neuronal cells that are not differentiated from NPCs. Microglia are the resident macrophages in the brain and are crucial in development and diseases, 119,120 but L1 expression and function in this cell type are less explored. We conducted TT-Seq/MINT-Seq in HMC3 cells, a transformed primary cell type of embryonic human microglia, 121,122 which identified 1607 MILs (Supplementary information, Fig. S11d). In contrast to the fetal human brain, the GO terms for MIL-hosting genes in HMC3 are similar to those in other somatic cell lines, such as DDR genes (Supplementary information, Fig. S11e), suggesting a unique program of MILs in the cells of neuronal lineage. Consistently, RT-qPCR showed that SAFB also suppressed L1 RNA expression in human microglia (Supplementary information, Fig. S11f). These results together established a general mechanism in which SAFB&B2 suppress MILs in long genes to safeguard their transcriptional output and critical functions.

Implication of MILs in neurodevelopmental diseases

Aberrant L1 activity has been widely observed in NNDs. 31,34,43 Of the genes affected by MILs/SAFB&B2, a large number are associated with NNDs. We examined the list of genes compiled by the SFARI database that are implicated in autism spectrum disorder (ASD) 123 (https://www.sfari.org/resource/sfari-gene/). There is a strong association between autism-associated genes and MIL-hosting genes in the fetal brain (170 out of 861 SFARI genes are MIL hosts, Fisher’s exact test, P-value < 6.19e−31, Fig. 7h). Notably, the SFARI genes that host MILs are largely down-regulated in hNPCs after SAFB&B2 depletion (Fig. 7i; Supplementary information, Fig. S12a), including many aforementioned genes crucial in synaptic and neuronal functions (top 20 down-regulated genes shown in Fig. 7i). SFARI genes without MILs were largely unaffected (Supplementary information, Fig. S12b), supporting that the effects of SAFB&B2 on neuronal/synaptic genes are specifically mediated by MILs.
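The overlap statistics quoted here can be reasoned about with a small sketch. It assumes a one-sided Fisher's exact test (equivalent to a hypergeometric upper tail); the text does not state whether the test was one- or two-sided. The function names are hypothetical, and because the total number of MIL-hosting genes is not given in this section, the example uses toy numbers rather than the paper's data.

```python
from math import comb


def expected_overlap(n_hosts: int, n_sfari: int, n_universe: int) -> float:
    """Expected overlap if every gene in the universe is equally likely
    to be a SFARI gene."""
    return n_hosts * n_sfari / n_universe


def hypergeom_upper_tail(k: int, n_hosts: int, n_sfari: int, n_universe: int) -> float:
    """P(overlap >= k): one-sided Fisher's exact test for enrichment."""
    total = comb(n_universe, n_hosts)
    upper = min(n_hosts, n_sfari)
    favourable = sum(
        comb(n_sfari, i) * comb(n_universe - n_sfari, n_hosts - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


# Toy numbers (hypothetical): 10 genes, 4 "SFARI" genes, 5 "MIL hosts",
# and all 4 SFARI genes are hosts.
print(expected_overlap(5, 4, 10))
print(hypergeom_upper_tail(4, 5, 4, 10))
```

To apply this to the values in the text, one would pass k = 170, n_sfari = 861 and n_universe = 19,286, together with the number of MIL-hosting genes among expressed genes, which has to be taken from the underlying gene lists.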


Results

Sas4p was the connective core subunit of the SAS-I complex

The yeast SAS-I complex consists of at least three subunits, as was reported previously (Meijsing and Ehrenhofer-Murray, 2001; Osada et al., 2001). In order to investigate more closely the protein-protein interactions within the SAS-I complex, we used several different assays. In a yeast two-hybrid assay, a Sas4 fusion to the Gal4 DNA-binding domain interacted with the preys Sas2p and Sas5p (Fig. 1A), but not with the empty vector. Using Sas5p as the bait in a two-hybrid screen (Fromont-Racine et al., 1997), SAS4 was isolated six times on three overlapping fragments, limiting the region of interaction with Sas5p to the C-terminal part (amino acids 318-381; data not shown).

Protein-protein interactions within the SAS-I complex. (A) Sas4p interacted with Sas2p and Sas5p in the two-hybrid system as shown by growth or no growth of the reporter strain in the absence of histidine. GBD, Gal4 DNA-binding domain (pAS-BC); GAD, Gal4 activation domain (pACTIIst); see Table 2 for plasmid details. (B) Coimmunoprecipitation analysis of Sas5p-HA3/myc6-Sas2p, Sas4p-myc9/Sas2p and Sas5p-HA3/myc6-Sas4p interactions. Bound proteins were analyzed on 12.5% SDS polyacrylamide gels, immunoblotted and probed with α-myc or α-Sas2 antibodies. 2Δ, SAS2Δ; 4Δ, SAS4Δ; 5Δ, SAS5Δ. (C) GST-pulldown assay to determine direct interactions within the SAS-I complex. Purified GST-Sas fusion proteins were immobilized on glutathione agarose beads and incubated with in vitro-translated radiolabeled SAS-I proteins (abbreviated 2, 4 and 5) or luciferase (L, negative control). An asterisk indicates degradation products of radiolabeled Sas4p. Washed beads were eluted with sample buffer, separated on 12.5% SDS polyacrylamide gels, and bands were visualized using a PhosphorImager.


No direct interaction was detected between Sas2p and Sas5p in the two-hybrid assay, suggesting that Sas4p was the bridging subunit between Sas2p and Sas5p. This observation was confirmed by two other assays. Coimmunoprecipitation of Sas2p with Sas5p occurred only when Sas4p was present, whereas deletion of Sas5p or Sas2p had no influence on the interaction between Sas4p and Sas2p or Sas5p and Sas4p, respectively (Fig. 1B). This observation was supported by TAP-purifications of the SAS-I complex with extracts prepared from wild-type and sas4Δ strains in which SAS5 was genomically TAP-tagged. In wild-type extracts, Sas2p and Sas4p co-purified with Sas5-TAP; by contrast, Sas2p was not recovered in the absence of Sas4p (data not shown). Additionally, in a GST pulldown assay with in vitro-translated radiolabeled SAS-I subunits, we observed direct interactions between GST-Sas2p and radiolabeled Sas4p, between GST-Sas4p and both Sas2p and Sas5p, and between GST-Sas5p and Sas4p (Fig. 1C). GST-Sas2p and GST-Sas4p also appeared to interact with themselves (radiolabeled Sas2p and Sas4p, respectively), but not with the negative control (Luciferase, Fig. 1C). Sas2p interacts with itself in two-hybrid assays (Meijsing and Ehrenhofer-Murray, 2001); moreover, we cannot rule out the possibility that Sas4p dimerizes under certain conditions. Little radiolabeled Sas2p or Sas5p was bound by GST-Sas5p or GST-Sas2p, respectively. Together, these data further support the notion that Sas4p is the connecting piece between Sas2p and Sas5p.

Sas2p and Sas5p interacted with the importins Pse1p and Kap123p

We used a yeast two-hybrid screen (Fromont-Racine et al., 1997) to identify proteins that interact with Sas5p. Among the positive clones that interacted with the bait Sas5p, we isolated the ORF YMR308c, which corresponds to the PSE1 gene, eight times. Interaction of Sas5p and Pse1p occurred within three overlapping C-terminal fragments of Pse1p (Fig. 2A). Pse1p belongs to the family of nuclear transport receptors called karyopherins (Kaps) or importins and was previously reported to support the nuclear import of a number of transcription factors (Chook and Blobel, 2001). Pse1p is homologous to Kap123p and is functionally redundant with it in mediating nuclear import of ribosomal proteins (Rout et al., 1997), ribosome-associated proteins (Franke et al., 2001) and histones H3 and H4 (Mosammaparast et al., 2002a). Hence, we were also interested in testing for a physical interaction between the SAS-I complex and Kap123p.

Sas2p and Sas5p interacted with the importins Pse1p and Kap123p. (A) Domains in Pse1p. The blue bar indicates the importin β homology domain containing an Armadillo repeat (ARM, red). The C-terminal fragments of Pse1p interacting with Sas5p in the two-hybrid assay are illustrated by black lines. (B) Coimmunoprecipitation analysis of Sas5p-myc9/Pse1p-HA3, Sas5p-myc9/Kap123p-HA3, myc6-Sas2p/Pse1p-HA3 and myc6-Sas2p/Kap123p-HA3 interactions. Bound proteins were separated on 12.5% SDS polyacrylamide gels, immunoblotted and probed with an α-HA antibody. Pse1p-HA3 coimmunoprecipitated with Sas5p-myc9 or myc6-Sas2p (upper panels), and Kap123p-HA3 co-precipitated with Sas5p-myc9 or myc6-Sas2p (lower panels).


We performed coimmunoprecipitation experiments in order to verify the two-hybrid interaction between Sas5p and Pse1p in vivo, and to test for the interactions of Sas2p and Sas5p with Kap123p or Pse1p. For this purpose, subunits of the SAS-I complex, Pse1p or Kap123p were epitope tagged. In coimmunoprecipitation assays with myc-tagged SAS-I subunits immobilized on beads, significant coprecipitates of Pse1p-HA3 and Kap123p-HA3 were detected with Sas5p-myc9 and myc6-Sas2p, respectively, but not with myc6-Sas4p (Fig. 2B, and data not shown). Some non-specific binding of Pse1p-HA3 and Kap123p-HA3 to the myc-antibody was also observed, but to a much lesser extent than the coimmunoprecipitation (Fig. 2B). These observations suggested that Sas2p and Sas5p were potential import substrates of Pse1p and Kap123p.

Pse1p and Kap123p were required for the nuclear import of Sas2p and Sas5p

To determine the role of Pse1p, Kap123p, and potentially other karyopherins in the import of Sas2p and Sas5p in vivo, GFP fusions of the SAS-I proteins were generated and their subcellular localizations were analyzed in cells harboring mutations or deletions in specific KAP genes. GFP fusions of Sas2p, Sas4p and Sas5p showed predominantly nuclear localizations in the wild-type strain: GFP-Sas4p was exclusively localized in the nucleus, whereas Sas2p and Sas5p exhibited an additional slightly elevated GFP signal in the cytoplasm (Fig. 3). The nuclear accumulation of GFP-Sas2p and GFP-Sas5p, but not of GFP-Sas4p, was clearly decreased in kap123Δ or pse1-1 cells (Fig. 3). Interestingly, these effects were already visible in pse1-1 cells at the permissive temperature (23°C), and a shift to the nonpermissive temperature had no further consequences. By contrast, the nuclear localization of GFP-Sas4p in pse1-1 cells did not change even under nonpermissive conditions (data not shown). However, the nuclear import of Sas2p and Sas5p was not completely blocked in pse1-1 and kap123Δ strains, as some nuclear GFP-Sas2p and GFP-Sas5p signal still remained in these kap mutants. In agreement with this, pse1-7 and kap123Δ strains did not show one of the sas silencing phenotypes, the suppression of an HMR silencing defect (Ehrenhofer-Murray, 1997; data not shown). We used pse1-7 in this assay because crosses with the pse1-1 mutant showed very poor spore viability, such that pse1-1 was not manageable in these genetic assays (data not shown). The absence of a silencing phenotype was not surprising, given that Pse1p and Kap123p have overlapping functions and that one can substitute for the other. Alternatively, other importins may take over when the Pse1p and Kap123p import pathways are disturbed. Additionally, we generated pse1-1sas4Δ and kap123Δsas4Δ double mutant strains, in which we analyzed the subcellular localizations of GFP-Sas2p and GFP-Sas5p. We did not detect a significant difference in the localizations of Sas2p and Sas5p in the double mutant strains when compared with those in the respective pse1-1 or kap123Δ single mutant (data not shown).

Nuclear accumulation of GFP-Sas2p and GFP-Sas5p was significantly decreased by kap123Δ or pse1-1. GFP-Sas2p (pAE1098), GFP-Sas4p (pAE1101) or GFP-Sas5p (pAE1106) were expressed in wild-type (WT) and kap mutant strains (see Tables 1 and 2), and the GFP tag was detected by fluorescence microscopy. The Hoechst staining visualizes the nucleus. Bars, 5 μm.

In order to test whether the nuclear import of Sas2p and Sas5p was specifically mediated by Kap123p and Pse1p, we studied subcellular localization of the SAS-I proteins in various other kap mutants. Five more deletions of KAP genes, KAP108/SXM1, KAP114, KAP119, KAP122/PDR6 and KAP142/MSN5, were at our disposal. We found no significant decrease in the nuclear accumulation of GFP-Sas2p, GFP-Sas4p and GFP-Sas5p in cells lacking Kap108p, Kap114p, Kap122p or Kap142p (Fig. 4). Taken together, these results showed that the transport receptors Kap123p and Pse1p were required for the nuclear import of Sas2p and Sas5p, but not Sas4p, whereas the other karyopherins tested were not involved in the transport.

Nuclear localization of GFP-Sas2p and GFP-Sas5p was impaired in kap123Δ, but not in kap108Δ, kap119Δ, kap122Δ and kap142Δ strains. GFP-Sas2p, GFP-Sas4p or GFP-Sas5p (see Table 2) were detected in wild-type (WT) and kap deletion strains (as indicated) by fluorescence microscopy. Bar, 5 μm.

Identification of potential nuclear localization signals within the SAS-I complex

In order to identify putative NLSs in the three SAS-I subunits, we analyzed the amino acid (aa) sequences of Sas2p, Sas4p and Sas5p using the Prosite patterns/profiles motif scan program (Falquet et al., 2002). Bipartite NLSs were predicted for Sas2p (aa 19-36) and Sas4p (aa 336-353, Fig. 5A), whereas no such sequence was proposed for Sas5p. Additionally, the Prosite motif scan revealed a cullin homology region for Sas4p (aa 91-275, Fig. 5A), a motif frequently found in proteins involved in cell cycle transitions. A sequence analysis with SMART, a web tool (http://smart.embl.de/) for the identification of protein domains (Letunic et al., 2004), displayed the known Sas2p histone acetyltransferase (HAT) domain of the MYST family (aa 126-314, Fig. 5A). A zinc finger precedes the HAT domain and mediates the interaction of Sas2p with Sas4p (Meijsing and Ehrenhofer-Murray, 2001). SMART analysis of Sas5p revealed a region common to the YEATS family of proteins, which includes YNK7, ENL, AF-9 and TFIIF (small subunit) that are implicated in stimulation of transcription (aa 6-114, Fig. 5A).

Deletion analysis of Sas2p, Sas4p and Sas5p and their cellular localizations. (A) Predicted domains, motifs and nuclear localization signals (NLS) of Sas2p, Sas4p and Sas5p. The indicated domains/motifs were found by sequence analysis with the Prosite motif scan and the SMART domain search program. NLSs are depicted as green triangles; an asterisk indicates that the NLS could not be verified experimentally. In Sas2p, the histone acetyltransferase (HAT) domain is shown (red), preceded by a zinc finger motif (yellow). A cullin homology domain (orange) was found in Sas4p. Sas5p has sequence homology to the YEATS family of proteins (blue hexagon). (B) The N-terminal domains of Sas2p and Sas5p were necessary for their nuclear accumulation. Full-length or fragments of Sas2p, Sas4p and Sas5p (pAE1098-pAE1108, Table 2) were expressed as fusions to GFP in wild-type yeast cells and visualized by fluorescence microscopy. The start and end points of each fusion are indicated by the corresponding amino acids to the left of each image. Bar, 5 μm.

Several deletion mutants of the SAS-I proteins were created in order to test which parts of the proteins were necessary for nuclear accumulation and whether or not the proposed NLSs were genuine (Fig. 5). Nuclear accumulation was significantly decreased in deletions of a large N-terminal region of Sas2p (aa 1-146) as well as a smaller region, which both encompass the putative NLS (Fig. 5B), suggesting that Sas2p (aa 1-48) indeed contained an NLS. Various N-terminal deletions of Sas4p had no effect on the nuclear localization, not even a deletion covering the entire N-terminus including the cullin homology region (aa 1-287, Fig. 5B). Deletion of the C-terminal aa 328-481 also had no effect on the nuclear signal of Sas4p, although a suggested bipartite NLS (aa 336-353) lies within this deletion. Only a remaining C-terminal part of Sas4p (aa 377-481) was not exclusively localized to the nucleus, although it cannot be ruled out that this fusion was no longer retained in the nucleus because of back-diffusion due to its small size. Together, the region of Sas4p responsible for the nuclear accumulation was presumably located within aa 288-327 (Fig. 5B). Finally, a deletion of the C-terminal half of Sas5p (aa 124-248) did not significantly decrease the nuclear signal of the GFP-Sas5p fusion, whereas deletion of the N-terminal half (aa 1-123) significantly reduced the nuclear accumulation (Fig. 5B), pointing to a yet unidentified signal sequence for nuclear import in the N-terminus of Sas5p.

The fact that Sas2p contained a predicted bipartite NLS of the 'classical' type prompted us to assay the GFP-Sas fusions for nuclear localization in the kap60 temperature-sensitive alleles srp1-31 and srp1-49. However, the nuclear accumulation of GFP-Sas2p, GFP-Sas4p and GFP-Sas5p was not decreased in srp1-31 or srp1-49 mutants, not even at the non-permissive temperature (data not shown), indicating that the Kap95p/Kap60p import receptor was not required for SAS-I import.

Interestingly, the three GFP-Sas reporters displayed a nuclear localization even when the other SAS-I subunits were deleted; for instance, GFP-Sas4p remained nuclear in the absence of SAS2 or SAS5 or both (data not shown), suggesting that the subunits were imported independently of each other.

Sas5p participated in protein complexes of different molecular weight in pse1-1 and wild-type cells

We performed size fractionation experiments of total protein lysates from pse1-1 and its isogenic wild-type strain, which were previously transformed with pRS426-HA-Sas5 (pAE625, see Table 2) and cultured at permissive temperature (24°C). Analysis of the gel filtration fractions from both experiments revealed that the two HA-Sas5p elution profiles differed in two ways: first, the pse1-1 elution profile contained a new peak of Sas5p (fractions 67-75), corresponding to proteins of significantly lower molecular weight and lacking in the wild-type profile, while the fractions corresponding to the SAS-I complex (fractions 57-65) were similar to wild-type (Fig. 6). Second, a minor part of Sas5p was also found in higher molecular mass fractions of the wild-type profile (fractions 37-43), indicating that Sas5p was also a component of larger protein complexes in wild-type extracts, but not in the pse1-1 mutant (Fig. 6). Interestingly, it was previously reported that Sas5p is part of larger protein complexes (Meijsing and Ehrenhofer-Murray, 2001). The nature of this higher molecular weight assembly is not known, but further investigations will show whether its formation may be dependent on the nuclear import of Sas5p.

Size fractionation of wild-type and pse1-1 protein extracts prepared from cells (AEY2956 or AEY2957) grown at 24°C and expressing HA-Sas5p (pAE625). The elution profiles of HA-Sas5p were analyzed by gel electrophoresis on 12.5% SDS polyacrylamide gels and immunoblotting of the indicated fractions using an anti-HA antibody. The elution peaks of marker proteins are labeled above.

The putative Sas2p NLS showed similarity with those of histones H3 and H4

Two other nuclear proteins were recently reported to be substrates of the Kap123p and Pse1p import pathways, the histones H3 and H4 (Mosammaparast et al., 2002a). Both H3 and H4 contain an NLS at their N-terminal domains; the minimal NLS of H3 was localized to residues 1-28, while that of H4 was limited to residues 1-21 (Mosammaparast et al., 2002a). Even though this H4(1-21)-GFP2 fusion did not appear entirely nuclear, its nuclear signal was easily detectable above the cytoplasmic background. We generated an alignment of these minimal NLSs with the proposed NLS of Sas2p and found considerable agreement among these sequences. Ten out of 22 aligned residues were highly conserved in the three proteins (Fig. 7). This agreement prompted us to search for proteins containing a consensus NLS based on our alignment that could serve as a potential signal sequence for the Kap123p and Pse1p nuclear import pathways. Using the BLASTP tool (Altschul et al., 1997) and each of the aligned NLSs as the input query, we identified three other nuclear proteins with significant matches to the consensus NLS: histone H1, Snf12p/Swp73p, a subunit of the SWI/SNF chromatin remodeling complex, and the histone H2A variant Htz1p, the latter having the best agreement with the consensus sequence (Fig. 7). Remarkably, all aligned NLSs are located at the very N-terminal ends of these proteins. The presence of a similar consensus sequence in Snf12p, histone H1 and Htz1p raises the question of whether these nuclear proteins may constitute further import substrates of Kap123p or Pse1p. However, Sas5p (this study), a number of ribosomal proteins (Pemberton et al., 1998) and ribosome-associated proteins (Franke et al., 2001) are also substrates of these import pathways and do not display any sequence homology with this consensus NLS.

Alignment of the suggested NLSs of Sas2p and histones H3 and H4. Histone H1, Snf12p and the histone variant Htz1p were found by BLAST P queries using the NLSs of Sas2p, histones H3 and H4, respectively, and showed significant similarity to the consensus sequence.


General Transfection Protocol

Preparing Cells for Transfection

Removing Adherent Cells Using Trypsin

Trypsinizing cells prior to subculturing or cell counting is an important technique for successful cell culture. The following technique works consistently well when passaging cells.

Materials Required:

  • 1X trypsin-EDTA solution
  • 1X PBS or 1X HBSS
  • adherent cells to be subcultured
  • appropriate growth medium (e.g., DMEM) with serum or growth factors or both added
  • culture dishes, flasks or multiwell plates, as needed
  • hemocytometer
  1. Prepare a sterile trypsin-EDTA solution in a calcium- and magnesium-free salt solution such as 1X PBS or 1X HBSS. The 1X solution can be frozen and thawed for future use, but trypsin activity will decline with each freeze-thaw cycle. The trypsin-EDTA solution may be stored for up to 1 month at 4°C.
  2. Remove medium from the tissue culture dish. Add enough PBS or HBSS to cover the cell monolayer: 2ml for a 150mm flask, 1ml for a 100mm plate. Rock the plates to distribute the solution evenly. Remove and repeat the wash. Remove the final wash. Add enough trypsin solution to cover the cell monolayer.
  3. Place plates in a 37°C incubator until cells just begin to detach (usually 1–2 minutes).
  4. Remove the flask from the incubator. Strike the bottom and sides of the culture vessel sharply with the palm of your hand to help dislodge the remaining adherent cells. View the cells under a microscope to check whether all cells have detached from the growth surface. If necessary, cells may be returned to the incubator for an additional 1–2 minutes.
  5. When all cells have detached, add medium containing serum to cells to inactivate the trypsin. Gently pipet cells to break up cell clumps. Cells may be counted using a hemocytometer, distributed to fresh plates for subculturing, or both.

Typically, cells are subcultured to prepare for transfection the next day. The subculture should bring the cells of interest to the desired confluency for transfection. As a general guideline, plate 5 × 10⁴ cells per well in a 24-well plate or 5.5 × 10⁵ cells for a 60mm culture dish for 80% confluency on the day of transfection. Change cell numbers proportionally for different size plates (see Table 3).

Table 3. Area of Culture Plates for Cell Growth.

Size of Plate    Growth Area (cm²) a    Relative Area b
24-well          1.88                   1X
96-well          0.32                   0.2X
12-well          3.83                   2X
6-well           9.4                    5X
35mm             8.0                    4.2X
60mm             21                     11X
100mm            55                     29X

a This information was calculated for Corning® culture dishes.
b Relative area is expressed as a factor of the total growth area of the 24-well plate recommended for optimization studies. To determine the proper plating density, multiply 5 × 10⁴ cells by this factor.
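
The scaling rule in footnote b can be sketched as a small calculation. This is only an illustration: the relative-area factors and the 5 × 10⁴ baseline come from Table 3, while the names `RELATIVE_AREA`, `BASE_CELLS_24_WELL` and `cells_to_plate` are hypothetical.

```python
# Relative growth areas copied from Table 3 (24-well plate = 1X).
RELATIVE_AREA = {
    "24-well": 1.0,
    "96-well": 0.2,
    "12-well": 2.0,
    "6-well": 5.0,
    "35mm": 4.2,
    "60mm": 11.0,
    "100mm": 29.0,
}

# Baseline plating density for one well of a 24-well plate (from the text).
BASE_CELLS_24_WELL = 5e4


def cells_to_plate(plate: str) -> float:
    """Return the suggested number of cells for the given plate format."""
    return BASE_CELLS_24_WELL * RELATIVE_AREA[plate]


if __name__ == "__main__":
    for plate in RELATIVE_AREA:
        print(f"{plate}: {cells_to_plate(plate):.0f} cells")
```

For the 60mm dish this gives 5 × 10⁴ × 11 = 5.5 × 10⁵ cells, which agrees with the guideline stated above the table.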

Preparing DNA for Transfection

High-quality DNA free of nucleases, RNA and chemicals is as important for successful transfection as the transfection reagent chosen. See the Protocols and Applications Guide chapter on DNA purification for information about purifying transfection-quality DNA.

Example Protocol Using ViaFect™ Reagent

We strongly recommend that you optimize transfection conditions for each cell line. If you have optimized transfection parameters, use the empirically determined conditions for your experimental transfections.

If you choose not to optimize transfection parameters, use the general conditions recommended below.

Materials Required:

  • cell culture medium with serum appropriate for the cell type being transfected
  • serum-free cell culture medium for complex formation (such as Opti-MEM® I reduced-serum medium)
  • 96-well or other culture plates
  • U- or V-bottom dilution plates or microcentrifuge tubes

The total volume of transfection complex (medium, DNA and ViaFect™ Transfection Reagent) to add per well of a 96-well plate is 5–10µl. The following protocol is a guideline for transfecting approximately 10–20 wells, depending on the volume of ViaFect™ Transfection Reagent:DNA mixture used. For additional wells, scale volumes accordingly.

  1. To a sterile tube or U- or V-bottom plate, add 90–99µl of serum-free medium prewarmed to room temperature so that the final volume after adding the DNA is 100µl. Add 1µg of plasmid DNA to the medium, and mix. For a 3:1 ViaFect™ Transfection Reagent:DNA ratio, add 3µl of ViaFect™ Transfection Reagent, and mix immediately.
  2. Incubate the ViaFect™ Transfection Reagent:DNA mixture for 5–20 minutes at room temperature.
    Optional: Add mixture to cells without an incubation period.
    Note: Longer incubations may adversely affect transfections.
  3. Add 5–10µl of the ViaFect™ Transfection Reagent:DNA mixture per well to a 96-well plate containing 100µl of cells in growth medium. We suggest 10µl of mixture as a starting point. Mix gently by pipetting or using a plate shaker for 10–30 seconds. Return cells to the incubator for 24–48 hours.
    Note: The total growth medium volume may vary depending on well format and your laboratory's common practices.
  4. Measure transfection efficiency using an assay appropriate for the reporter gene. For transient transfection, cells are typically assayed 24–48 hours after transfection.
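
The volumes in the protocol above can be worked through with a short calculation. This is a rough sketch, not part of the vendor protocol: the DNA stock concentration (0.5µg/µl) is an assumed example value, the function names are hypothetical, and the small volume contributed by the reagent itself is ignored when estimating DNA per well.

```python
def complex_recipe(dna_ug=1.0, dna_stock_ug_per_ul=0.5, ratio=3.0,
                   final_volume_ul=100.0):
    """Volumes for one tube of reagent:DNA complex.

    ratio is microliters of reagent per microgram of DNA (e.g. 3 for 3:1).
    dna_stock_ug_per_ul is an assumed stock concentration, not from the protocol.
    """
    dna_volume_ul = dna_ug / dna_stock_ug_per_ul
    return {
        "dna_volume_ul": dna_volume_ul,
        # Medium is chosen so that medium + DNA reaches the final volume.
        "medium_ul": final_volume_ul - dna_volume_ul,
        "reagent_ul": ratio * dna_ug,
    }


def dna_per_well_ng(complex_ul, dna_ug=1.0, final_volume_ul=100.0):
    """Approximate DNA delivered per well, ignoring the reagent volume."""
    return complex_ul * dna_ug * 1000.0 / final_volume_ul


def wells_covered(complex_ul, final_volume_ul=100.0):
    """Approximate number of wells one tube of complex can serve."""
    return int(final_volume_ul // complex_ul)


if __name__ == "__main__":
    print(complex_recipe())
    for volume in (5.0, 10.0):
        print(volume, dna_per_well_ng(volume), wells_covered(volume))
```

With these assumptions, 2µl of DNA stock goes into 98µl of medium with 3µl of reagent, and adding 5 or 10µl of complex per well delivers roughly 50 or 100ng of DNA to about 20 or 10 wells.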

Optimizing Transfection with Lipid Reagents

In previous sections, we discussed factors that influence transfection success. Here we present a method to optimize transfection of a particular cell line with a single transfection reagent. For more modern lipid-based reagents such as the ViaFect™ Transfection Reagent, we recommend initially testing 50–100ng of DNA per well of a 96-well plate at reagent:DNA ratios of 3:1 or 4:1 for adherent cell lines or 2:1 for suspension cell lines. Figure 4 outlines a typical optimization matrix. When preparing the ViaFect™ Transfection Reagent:DNA complex, the incubation time may require optimization; we recommend 5–20 minutes.

Figure 4. Transfection optimization using the ViaFect™ Transfection Reagent. TF-1 cells were plated in growth media without antibiotics at 30,000 cells per well in a white 96-well assay plate and transfected with a CMV-luc2 plasmid using various lipid (ViaFect™ Transfection Reagent):DNA ratios. The DNA concentration was held constant at 1µg per 100µl of Opti-MEM® I reduced-serum medium, and the amount of ViaFect™ Transfection Reagent was varied to obtain the indicated ratios. Either 5 or 10µl of transfection complex was then added to cells in the 96-well plate. Twenty-four hours after transfection, the ONE-Glo™ + Tox Luciferase Assay was performed. Note that 0:1 is the negative control with DNA but no lipid. These results show that, for this particular cell line, a 2:1 ViaFect™ Transfection Reagent:DNA ratio gave optimal results.

For traditional cationic lipid reagents, we recommend testing various amounts of transfected DNA (0.25, 0.5, 0.75 and 1µg per well in a 24-well plate) at two charge ratios of lipid reagent to DNA (2:1 and 4:1; see Figure 5). This brief optimization can be performed using a transfection interval of 1 hour under serum-free conditions. One 24-well culture plate per reagent is required for the brief optimization with adherent cells (three replicates per DNA amount).

Figure 5. Suggested plating format for initial optimization of cationic lipid transfection conditions.
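
The brief optimization described above (four DNA amounts, two charge ratios, three replicates) fills exactly one 24-well plate. The following sketch enumerates one possible layout; the assignment of DNA amounts to rows and ratios to columns is an assumption made for illustration and is not necessarily the arrangement shown in Figure 5.

```python
from itertools import product

DNA_AMOUNTS_UG = [0.25, 0.5, 0.75, 1.0]  # one DNA amount per plate row
CHARGE_RATIOS = [2, 4]                   # lipid:DNA charge ratios (2:1, 4:1)
REPLICATES = 3                           # replicate wells per condition


def optimization_layout():
    """Map each well of a 24-well plate (rows A-D, columns 1-6) to (DNA µg, ratio)."""
    layout = {}
    for row, dna_ug in zip("ABCD", DNA_AMOUNTS_UG):
        # Columns 1-3 hold the first ratio, columns 4-6 the second.
        conditions = product(CHARGE_RATIOS, range(REPLICATES))
        for column, (ratio, _replicate) in enumerate(conditions, start=1):
            layout[f"{row}{column}"] = (dna_ug, ratio)
    return layout


if __name__ == "__main__":
    for well, (dna_ug, ratio) in optimization_layout().items():
        print(f"{well}: {dna_ug} µg DNA at {ratio}:1")
```

Enumerating the wells this way makes it easy to confirm that every DNA amount is tested at both ratios in triplicate before plating.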

A more thorough optimization can be performed to screen additional charge ratios, time points and effects of serum-containing medium at the DNA amounts found to be optimal during initial optimization studies. One hour or two hours for the transfection interval is optimal for many cell lines. In some cases, however, it may be necessary to test charge ratios and transfection intervals outside of these ranges to achieve optimal gene transfer.

Some transfection methods require removing medium with reagent after incubation; others do not. Read the technical literature accompanying the selected transfection reagent to learn which method is appropriate for your system. However, if there is excessive cell death during transfection, consider decreasing time of exposure to the transfection reagent, decreasing the amounts of DNA and reagent added to cells, plating additional cells and removing the reagent after the incubation period and adding complete medium.

Endpoint Assays

Many transient expression assays use lytic reporter assays like the Dual-Luciferase® Assay System (Cat.# E1910) and Bright-Glo™ Assay System (Cat.# E2610) 24 hours after transfection. The Nano-Glo® Dual-Luciferase® Reporter Assay System (Cat.# N1610) allows detection in just a few hours after transfection. However, the time frame for most assays can vary (24–72 hours after transfection), depending on protein expression levels. Reporter-protein assays use colorimetric, radioactive or luminescent methods to measure enzyme activity present in a cell lysate. Some assays (e.g., Luciferase Assay System) require that cells are lysed in a buffer after removing the medium, then mixed with a separate assay reagent to determine luciferase activity. Others are homogeneous assays (e.g., Bright-Glo™ Assay System) that include the lysis reagent and assay reagent in the same solution and can be added directly to cells in medium. Examine the reporter assay results and determine where the greatest expression (highest reading) occurred. These are the conditions to use with your constructs of interest.

Alternative detection methods include histochemical staining of cells (determining the percentage of cells that are stained in the presence of the reporter gene substrate; Figure 6), fluorescence microscopy (Figure 7) or cell sorting if using a fluorescent reporter like the Monster Green® Fluorescent Protein phMGFP Vector (Cat.# E6421).

Figure 6. Histochemical staining of RAW 264.7 cells for β-galactosidase activity. RAW 264.7 cells were transfected using 0.1µg DNA per well and a 3:1 ratio of FuGENE® HD to DNA. Complexes were formed for 5 minutes prior to applying 5µl of the complex mixture to 50,000 cells/well in a 96-well plate. Twenty-four hours post-transfection, cells were stained for β-galactosidase activity using X-gal. Data courtesy of Fugent, LLC.

Figure 7. Fluorescence microscopy of cells transfected with ViaFect™ Transfection Reagent. iCell® human tissue cells in 96-well plates were transfected with ViaFect™ Transfection Reagent and a GFP reporter plasmid at the specified reagent:DNA ratio and GFP expression was imaged after transfection. Panel A. iCell® Hepatocytes with a 6:1 reagent:DNA ratio imaged one day after transfection. Panel B. iCell® Cardiomyocytes with a 2:1 reagent:DNA ratio imaged one day after transfection. Panel C. iCell® DopaNeurons with a 4:1 reagent:DNA ratio imaged three days after transfection. Data courtesy of Cellular Dynamics International.

Real-Time Assays

For some types of transfection experiments, especially those examining the changes in gene expression levels associated with pathological mechanisms, monitoring reporter activity in living cells is desirable. Such real-time assays can provide valuable information on the expression of multiple genes in a dynamic fashion. The Nano-Glo® Live Cell detection options (Cat.# N2011) are designed to detect NanoLuc® luminescence from living cells using nonlytic protocols. These assays monitor luminescence at a single time point or continuously for up to 72 hours without compromising cell viability.


Materials and Methods

FUS constructs

FUS and altFUS sequences were obtained from Bio Basic Gene Synthesis service. All FUS constructs were subcloned into pcDNA3.1- (Invitrogen) using Gibson assembly (New England Biolabs, E26115). FUS and altFUS wild-type sequences correspond to that of the human FUS canonical transcript (ENST00000254108 or NM_004960). FUS and altFUS proteins were either untagged or tagged with V5 (GKPIPNPLLGLDST) and 2 Flag (DYKDDDDKDYKDDDDK), respectively. When tagged, FUS was tagged at the N-terminus, and altFUS was tagged at the C-terminus. For immunofluorescence assays, N-terminal GFP-tagged FUS was also cloned into pcDNA3.1- by Gibson assembly. The necessary gBlocks were purchased from IDT. The monocistronic constructs FUS(Ø) and FUS(Ø)-R495x were generated by mutating all altFUS methionine codons (ATG) to threonine codons (ACG). These mutations are synonymous in the FUS CDS (TAT > TAC, both coding for tyrosine). The altFUS-mutated sequence was obtained from Bio Basic Gene Synthesis service and then subcloned into FUS sequences in pcDNA3.1- using Gibson assembly. The bicistronic constructs are named as follows throughout the article: FUS, FUS-R495x, or FUS(Flag) and FUS(Flag)-R495x when altFUS is Flag-tagged in the +2 reading frame. The monocistronic constructs are named as follows throughout the article: FUS(Ø) or FUS(Ø)-R495x to indicate altFUS absence.
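
The logic behind the monocistronic design can be checked on a toy example: a single T > C substitution turns ATG into ACG in the +2 frame while the overlapping codon in the FUS frame changes from TAT to TAC, both tyrosine. The six-nucleotide sequences below are hypothetical and only illustrate the principle; they are not taken from the FUS transcript, and `translate` is an illustrative helper using the standard genetic code.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(seq, offset=0):
    """Translate complete codons of seq, starting at the given 0-based offset."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE[seq[i:i + 3]] for i in range(offset, len(seq) - 2, 3)
    )


if __name__ == "__main__":
    wild_type = "TATGGC"  # hypothetical: TAT GGC in frame +1, ATG in frame +2
    mutated = "TACGGC"    # same stretch after the T > C substitution
    for name, seq in (("wild type", wild_type), ("mutated", mutated)):
        print(name, "frame +1:", translate(seq, 0), "frame +2:", translate(seq, 1))
```

In this toy case the reference frame reads Tyr-Gly before and after the change, whereas the +2 frame switches from Met to Thr, which is the property the constructs rely on.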

Cell culture, transfections, Western blots and immunofluorescence

HEK293 and HeLa cell cultures tested negative for mycoplasma contamination (ATCC 30–1012K). Transfections, immunofluorescence, confocal analyses and Western blots were carried out as previously described (Vanderperre et al, 2011). For FUS knockdown, 150,000 HEK293 cells in a 6-well plate were transfected with 25 nM FUS SMARTpool: siGENOME siRNA (Dharmacon, Canada, L-009497-00-0005) or ON-TARGET plus Nontargeting pool siRNAs (Dharmacon, D-001810-10-05) with DharmaFECT one transfection reagent (Dharmacon, T-2001–02) according to the manufacturer’s protocol. Cell media were changed every 24 h, and cells were processed 72 h after transfection. For immunofluorescence, primary antibodies were diluted as follows: anti-Flag (Sigma, F1804) 1/1,000, anti-TOMM20 (Abcam, ab186734) 1/500, anti-V5 (Cell Signalling Technologies, #13202) 1/1,000, anti-TDP-43 (ProteinTech, 10782-2-AP) 1/500 and anti-TIAR (Cell Signalling Technologies, #8611) 1/1,600. For Western blots, primary antibodies were diluted as follows: anti-Flag (Sigma, F1804) 1/8,000, anti-V5 (Sigma, V8012) 1/8,000, anti-actin (Sigma, A5441) 1/40,000, anti-FUS (Abcam, ab84078) 1/500, anti-altFUS (Abcam, custom antibody) 1/3,000, anti-H3 (Cell Signalling Technologies, 9715S) 1/6,000, anti-LC3 (Cell Signalling Technologies, #2775) 1/1,000, anti-Hsp70 (Thermo Fisher Scientific, MA3-028) 1/1,000, anti-Tubulin (Thermo Fisher Scientific, a11126) 1/2,000 and anti-VDAC (Abcam, ab15895) 1/10,000. The altFUS antibody was generated by injecting two rabbits, each with 2 unique altFUS peptides (Appendix Fig S1A). The purified antibody from rabbit 2 was used in this study at a 1/2,000 dilution. As our custom antibody is a polyclonal one raised against 2 peptides, it sadly did not recognize the native form of the protein. Mitochondrial morphology was evaluated using the microP tool (Peng et al, 2011). A minimum of 100 cells per replicate were counted across 3 independent experiments (n = 3, i.e. minimum 300 cells for each experimental condition). Colocalization analyses were performed using the JACoP plugin (Just Another Colocalization Plugin) implemented in ImageJ software, as previously described (Samandi et al, 2017). When specified, images obtained by confocal microscopy on the Leica TCS SP8 STED 3X were deconvolved using the Huygens software (Scientific Volume Imaging B.V., Hilversum, Netherlands). The software uses a signal reassignment algorithm for deconvolution, and identical deconvolution parameters were applied to all images. The default parameters were used, including the classic maximum-likelihood estimation (CMLE) algorithm, signal-to-noise ratio and background estimation radius. The maximum iteration number was set at 30. Human tissue lysates for altFUS endogenous expression were purchased from Zyagen Laboratories (San Diego, California, USA).

Ribo-seq data and conservation analyses

Global aggregate reads for initiating and elongating ribosome footprints across all available studies were downloaded from the Gwips portal (https://gwips.ucc.ie/), for Homo sapiens and for Mus musculus. For altFUS protein conservation analysis, all FUS mRNAs with at least EST evidence were retrieved across all available species from NCBI RefSeq. We performed an in silico 3-frame translation and retrieved the best matching protein sequence per species that displayed a minimum of 20% sequence identity with the human altFUS sequence over 25% of human altFUS length. AltFUS homologous sequences were found in 83 species, and we manually added that of Drosophila melanogaster, which displayed a 37.5% sequence identity over 19% of the human altFUS length. All retrieved altFUS sequences were then aligned using Clustal Omega with default parameters.
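
The two retention thresholds described above (at least 20% identity over at least 25% of the human altFUS length) can be expressed as a simple predicate. This is a minimal sketch of the stated criteria only, not the authors' pipeline; the function name and argument names are hypothetical.

```python
def passes_conservation_filter(identity_pct, coverage_pct,
                               min_identity_pct=20.0, min_coverage_pct=25.0):
    """Return True if a candidate meets both thresholds stated in the text.

    identity_pct: percent identity with the human altFUS sequence.
    coverage_pct: percent of the human altFUS length covered by the match.
    """
    return identity_pct >= min_identity_pct and coverage_pct >= min_coverage_pct


if __name__ == "__main__":
    # Values reported in the text for Drosophila melanogaster.
    print(passes_conservation_filter(identity_pct=37.5, coverage_pct=19.0))
```

With the reported Drosophila melanogaster values (37.5% identity over 19% of the length), the coverage threshold is not met, which is consistent with that sequence having been added manually.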

Peptide-centric analysis of proteomics data sets

The stand-alone PepQuery tool (v1.0) (Wen et al, 2019) was downloaded from the PepQuery website (http://www.pepquery.org/). The tool was run on the following data sets from the TCGA consortium: colon cancer (COCA) proteome, ovarian cancer (OVCA) proteome, phosphoproteome and glycoproteome, and the breast cancer (BRCA) proteome and phosphoproteome. The reference database was set to the Ensembl database (hg38_Ensembl_20190910). The following parameters were set for all runs unless specified: carbamidomethylation of cysteine as fixed modification (as well as iTRAQ 4-plex of K, iTRAQ 4-plex of peptide N-term for BRCA and OVCA); oxidation of methionine as variable modification (as well as iTRAQ 4-plex of Y for BRCA and OVCA); a maximum of 3 modifications per peptide; trypsin digestion with a maximum of 1 miscleavage; precursor tolerance of 10 ppm (20 ppm for COCA); fragment mass tolerance of 0.05 Da (0.6 Da for COCA); the hyperscore as scoring metric; and 10,000 randoms. For phosphoproteomes, phosphorylation of Y, T and S was added as variable modifications. For the glycoproteome, deamidation of Q and N was added as variable modifications. PepQuery was run in the protein mode, with altFUS (IP_243680) whole sequence as input. Spectra were visualized and drawn using an in-house python script.
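
To give a sense of scale for the precursor tolerances quoted above, a parts-per-million tolerance can be converted into an absolute m/z window. The sketch below is a generic unit conversion and does not reproduce PepQuery's internals; the function name and the example m/z of 1000 are illustrative assumptions.

```python
def ppm_window(mz, ppm):
    """Return the (low, high) m/z bounds for a relative tolerance in ppm."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta


if __name__ == "__main__":
    for ppm in (10, 20):  # tolerances quoted in the text
        low, high = ppm_window(1000.0, ppm)
        print(f"{ppm} ppm at m/z 1000: {low:.4f} to {high:.4f}")
```

At an example m/z of 1000, 10 ppm corresponds to about ±0.01 and 20 ppm to about ±0.02, both tighter than the fixed 0.05 Da fragment tolerance.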

Human induced pluripotent stem cell differentiation into motor neurons

Directed differentiation to human iPSC-motor neurons was performed as previously reported (Hall et al, 2017). Briefly, iPSCs were maintained on Geltrex (Life Technologies) with Essential 8 Medium (Life Technologies) and passaged using EDTA (Life Technologies, 0.5 mM). All cell cultures were maintained at 37°C and 5% carbon dioxide. For motor neuron differentiation, iPSCs were differentiated to neuroepithelium by plating to 100% confluency in chemically defined medium consisting of DMEM/F12 GlutaMAX, Neurobasal, L-Glutamine, N2 supplement, non-essential amino acids, B27 supplement, β-mercaptoethanol (Life Technologies) and insulin (Sigma). Cells were treated with the following small molecules from day 0–7: 1 µM dorsomorphin (Millipore), 2 µM SB431542 (Tocris Bioscience) and 3.3 µM CHIR99021 (Miltenyi Biotec). At day 8, cells were patterned for 7 days with 0.5 µM retinoic acid and 1 µM purmorphamine. At day 14, spinal cord motor neuron precursors were treated with 0.1 µM purmorphamine for a further 4 days before being terminally differentiated for > 10 days in 0.1 µM Compound E (Enzo Life Sciences) to promote cell cycle exit. Throughout the neural conversion and patterning phase (D0-18), the neuroepithelial layer was enzymatically dissociated twice (at D4-5 and D10-12) using dispase (GIBCO, 1 mg/ml).

Preparation of tissue lysates of the motor cortex of ALS patients

Approximately 100 mg of motor cortex from 4 sporadic ALS and 4 C9orf72-ALS cases was lysed in a 10× volume of RIPA buffer (50 mM Tris–HCl pH 7.8, 150 mM NaCl, 0.5% sodium deoxycholate, 1% NP40 supplemented with protease inhibitors and EDTA) using TissueLyser equipment (Qiagen). Lysates were incubated on ice for 20 min followed by centrifugation at 20,000 g for 20 min at 4°C. Supernatant was taken as “RIPA fraction”, and pellets were resuspended in RIPA and SDS (final concentration of 2%). 3 sporadic ALS and 3 C9orf72-ALS samples were subsequently used as they were sufficiently concentrated to load 100 μg of protein onto SDS–PAGE gels.

Recombinant proteins purification

GFP-FUS recombinant protein was purified from HEK293 cells. Briefly, the GFP-FUS construct was transfected in HEK293 cells. Cells were grown up to 80% confluence, rinsed twice with PBS, pelleted and snap-frozen at −80°C. Cells were then resuspended in lysis buffer (10 mM HEPES-NaOH pH 7.6, 3 mM MgCl2, 300 mM KCl, 5% glycerol, 0.5% NP-40) complemented with phosphatase and protease inhibitors, and incubated on ice for 30 min. The lysates were centrifuged at 16,000 g for 10 min at 4°C. The supernatant was added to GFP-trap sepharose beads (ChromoTek, Germany) and incubated at 4°C for 3 h on rotation. The beads were then washed 5 times with the washing buffer (10 mM HEPES-NaOH pH 7.6, 3 mM MgCl2, 5% glycerol, 0.5% NP-40) at increasing concentrations of KCl: twice with 100 mM KCl, once with 200 mM KCl and twice with 400 mM KCl. The GFP-FUS protein was then eluted from the beads using 50 μl of acidic glycine solution (0.1 M, pH 3). After centrifugation at 2,500 g for 3 min at 4°C, the elution was transferred to a new Eppendorf tube and equilibrated with 50 μl of Tris solution (1 M, pH 8). The elution step was repeated three times for a final total volume of 300 μl.

GST-altFUS recombinant protein was purified from Rosetta™ competent cells. The altFUS sequence was subcloned in the pGEX-4T1 vector. Briefly, GST-altFUS expression was induced in competent cells at an OD600 of 0.5 with 0.5 mM of IPTG and 2% ethanol. Cells were then left to grow for 1.5 h at 37°C. Cells were then centrifuged at 5,000 rpm for 10 min at 4°C, resuspended in lysis buffer (1 mM EDTA, 250 mM NaCl, 10% glycerol, 1 mM DTT, 25 mg/ml lysozyme, 1% Triton X-100, in PBS) and sonicated prior to incubation on rotation for 30 min at 4°C. Both the soluble and insoluble fractions were then collected and loaded on an acrylamide gel. After Coomassie staining, GST-altFUS protein was found mostly in the insoluble fraction. Thus, the pellet was resuspended in 8 M urea lysis buffer to ensure resuspension and then dialysed using a 2K 0.5ML 10/PK Slide-A-Lyzer cassette (#PI66205, Thermo Fisher) in the usual lysis buffer (no urea). The dialysed lysate was then incubated with GST-trap sepharose beads (Glutathione Sepharose 4B, GE Healthcare, 17-0756-01) for 3 h on rotation at 4°C. The beads were then washed 5 times: twice with lysis buffer, once with PBS and twice with washing buffer (50 mM Tris pH 8.0, 250 mM NaCl, 0.1% Triton X-100). The GST-altFUS protein was then eluted by incubating the beads in the elution buffer (10 mM reduced glutathione, 50 mM Tris pH 8.0, 250 mM NaCl, 10% glycerol and 0.1% Triton X-100) on rotation for 15 min at 4°C. After centrifugation at 150 g for 3 min at 4°C, the elution was transferred to a new Eppendorf tube. The elution step was repeated twice.

The GFP-FUS and GST-altFUS recombinant proteins were quantified against a standard curve of a commercial LSD1 recombinant protein (BML-SE544, Enzo Life Sciences) of known concentration. The concentration evaluated by this standard curve was then confirmed by NanoDrop quantification of total protein.

Mitochondrial extracts and cellular fractionation

Mitochondrial extracts were prepared as previously described (Delcourt et al, 2018). Briefly, HEK293 cells grown up to 80% confluence were rinsed twice with PBS and gathered using a cell scraper. Cells were pelleted by centrifugation at 500 g for 10 min at 4°C. Supernatant was discarded, and cells were suspended in mitochondrial buffer (210 mM mannitol, 70 mM sucrose, 1 mM EDTA, 10 mM HEPES-NaOH, pH 7.5, 0.5 mM PMSF and EDTA-free protease inhibitor (Thermo Fisher Scientific)). Cells were disrupted by 15 consecutive passages through a 25G1 0.5 × 25 needle syringe on ice, followed by a 3-min centrifugation at 2,000 g at 4°C. Supernatant was collected, and the pellet was suspended in mitochondrial buffer. The cell disruption was repeated four times, and all retrieved supernatants containing mitochondria were again passed through the syringe needle in mitochondrial buffer and cleared by centrifugation for 3 min at 2,000 g at 4°C. Supernatants were pooled and centrifuged for 10 min at 13,000 g at 4°C to pellet mitochondria. The pellet was suspended in 200 μl of mitochondrial buffer until further processing. Cellular fractionation was performed using the Cell Fractionation Kit (#9038S, Cell Signaling Technology). Briefly, HEK293 cells were grown up to 80% confluence, washed twice with PBS and gathered using a cell scraper. Cells were spun at 350 g for 5 min at 4°C, and 2.5 × 10⁶ cells were suspended in 500 μl of ice-cold PBS. An aliquot of 100 μl was spun at 350 g for 5 min at 4°C, resuspended in SDS buffer (4% SDS, Tris–HCl 100 mM pH 7.6) and kept as WCL (whole cell lysate). The rest of the collected cells (remaining 400 μl) were spun at 500 g for 5 min at 4°C. Supernatant was discarded, and the pellet was resuspended in 500 μl of CIB (cytoplasmic isolation buffer) from the kit, vortexed for 5 sec and incubated on ice for 5 min. After centrifugation at 500 g for 5 min at 4°C, the supernatant was collected as the cytoplasmic fraction. The pellet was resuspended in 500 μl of MIB buffer (membrane isolation buffer) from the kit, vortexed for 15 sec and incubated on ice for 5 min. After centrifugation at 8,000 g for 5 min at 4°C, the supernatant was collected as the membrane and organelles fraction. To each 100 μl of fraction was added 60 μl of loading buffer 1× (from Cold Spring Laemmli sample buffer: 50 mM Tris pH 6.8, 2% SDS, 10% glycerol, 5% β-mercaptoethanol) before processing for Western blot.

Mitochondrial membrane potential measurements

Mitochondrial membrane potential was measured by flow cytometry in HEK293 cells using TMRE (tetramethylrhodamine ethyl ester, Abcam, ab113852). FCCP was used as a positive control to validate each independent experiment. Cells were grown up to 80% confluence and washed twice with PBS. The cells were then incubated for 5 mins at 37°C, 5% CO2 with PBS/A (0.2% BSA in PBS) solution (experimental) or 3 μM FCCP in PBS/A solution (positive control). Then, 100 nM of TMRE was added and cells were incubated 15 min at 37°C, 5% CO2. After incubation, cells were trypsinized and centrifuged at 800 g for 5 min at 4°C and resuspended in 500 μl of PBS and kept on ice. Cells were immediately analysed by flow cytometry. A gate for living cells was set, as well as a second gate to filter out cell doublets. TMRE fluorescence (PE-A) was recorded over a minimum of 50,000 gated cells for each experimental condition. The mean TMRE fluorescence intensity was measured over 3 independent experiments for each experimental condition.

Stimulated Emission Depletion (STED) microscopy

Samples were prepared as described above for confocal microscopy. A Leica TCS SP8 STED 3X was used with a 100x objective lens and immersion oil for Dual Color STED images. Images were obtained by sequential scanning of a given area. The combination of Alexa Fluor 488 (Thermo Fisher Scientific, A-11017) and Alexa Fluor 568 (Thermo Fisher Scientific, A-21069) dyes was chosen for STED imaging. Alexa Fluor 488 dye was excited with a white light laser (WLL) at 488 nm and was depleted using the 660 nm STED laser. Alexa Fluor 568 dye was excited with a WLL at 561 nm and was depleted using the 660 nm STED laser. The STED laser (660 nm) was applied at 80% of maximum power.

Fast Protein Liquid Chromatography (FPLC) and affinity purification–mass spectrometry (AP-MS)

Mitochondrial extracts of HEK293 cells were centrifuged at 13,000 g for 10 min at 4°C to remove the supernatant and were resuspended in FPLC buffer (50 mM Tris–HCl, 1 mM EDTA, 150 mM NaCl, 1% Triton X-100, pH 7.5, filtered with 0.2-μm filters) at 2 mg/ml for a total of 4 mg of mitochondrial proteins. Samples were incubated on ice for 15 min and then centrifuged at 10,000 g for 5 min at 4°C, and the supernatant was loaded in the injection syringe without disrupting the pellet. The FPLC was performed on a HiLoad 16/60 Superdex 200 pg Column (GE Healthcare, Chicago, USA) at 4°C. The column was pre-equilibrated with the FPLC buffer for up to 0.2 CV (column volume), and the sample was applied at a flow rate of 0.5 ml/min with a pressure alarm set at 0.5 MPa. The elution was performed over 72 fractions of 1.5 ml for a maximum of 1.1 CV. For altFUS probing by Western blot, proteins were precipitated from 150 μl of each 4 fractions in technical duplicates. First, 600 μl of methanol was added to each tube and mixed gently, before adding 150 μl of chloroform. Tubes were gently inverted 10 times before adding 450 μl of milli-Q H2O and vortexing briefly. After centrifugation at 12,000 g for 3 min, the upper phase was discarded, and 400 μl of methanol was added. Tubes were centrifuged at 16,000 g for 4 min, and the pellet was resuspended in loading buffer. For interactome analysis by mass spectrometry, fractions of interest (8–14) were pooled together and incubated at 4°C overnight with magnetic FLAG beads (Sigma, M8823) pre-conditioned with FPLC buffer. The beads were then washed 3 times with 5 ml of FPLC buffer and 5 times with 5 ml of 20 mM NH4HCO3 (ABC). Proteins were eluted and reduced from the beads using 10 mM DTT (15 min at 55°C) and then treated with 20 mM IAA (1 h at room temperature in the dark). Proteins were digested overnight at 37°C by adding 1 μg of Trypsin (Promega, Madison, Wisconsin) in 100 μl ABC. Digestion was quenched using 1% formic acid, and supernatant was collected. Beads were washed once with acetonitrile/water/formic acid (1/1/0.01 v/v) and pooled with supernatant. Peptides were dried with a SpeedVac, desalted using a C18 Zip-Tip (Millipore Sigma, Etobicoke, Ontario, Canada) and resuspended into 30 μl of 1% formic acid in water prior to MS analysis.

Mass spectrometry analysis

Peptides were separated in a PepMap C18 Nano Column (75 μm × 50 cm, Thermo Fisher Scientific). The setup used a 0–35% gradient (0–215 min) of 90% acetonitrile, 0.1% formic acid at a flow rate of 200 nl/min followed by acetonitrile wash and column re-equilibration for a total gradient duration of 4 h with a RSLC Ultimate 3000 (Thermo Fisher Scientific, Dionex). Peptides were sprayed using an EASY-Spray source (Thermo Fisher Scientific) at 2 kV coupled to a Quadrupole-Orbitrap (Q Exactive, Thermo Fisher Scientific) mass spectrometer. Full-MS spectra within a m/z 350–1,600 mass range at 70,000 resolution were acquired with an automatic gain control (AGC) target of 1e6 and a maximum accumulation time (maximum IT) of 20 ms. Fragmentation (MS/MS) of the top ten ions detected in the Full-MS scan was performed at 17,500 resolution, with an AGC target of 5e5, a maximum IT of 60 ms and a fixed first mass of 50, within a 3 m/z isolation window at a normalized collision energy (NCE) of 25. Dynamic exclusion was set to 40 s. Mass spectrometry RAW files were searched with the Andromeda search engine implemented in MaxQuant 1.5.5.1. The digestion mode was set at Trypsin/P with a maximum of two missed cleavages per peptide. Oxidation of methionine and acetylation of N-terminal were set as variable modifications, and carbamidomethylation of cysteine was set as fixed modification. Precursor and fragment tolerances were set at 4.5 and 20 ppm, respectively. Files were searched using a target-decoy approach against UniprotKB (Homo sapiens 03/2017 release) with the addition of altFUS sequence for a total of 92,949 entries. The false discovery rate (FDR) was set at 1% for peptide spectrum match, peptide and protein levels. Protein interactions were then scored using the SAINT algorithm, with Mock cells as control and the magnetic FLAG beads in HEK293 cell CRAPome (Choi et al, 2011). Proteins with a SAINT score above 0.99 were considered, as well as those presenting a SAINT score above 0.88 with a minimum of two unique peptides.
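
The interactor selection rule stated above can be written as a short filter. This is only a sketch of the two thresholds as worded in the text; the function name and the example prey entries are hypothetical and do not come from the data set.

```python
def is_retained_interactor(saint_score: float, unique_peptides: int) -> bool:
    """Keep a prey if SAINT > 0.99, or if SAINT > 0.88 with at least two unique peptides."""
    if saint_score > 0.99:
        return True
    return saint_score > 0.88 and unique_peptides >= 2


# Hypothetical scored preys: (name, SAINT score, number of unique peptides).
scored_preys = [
    ("PREY_A", 1.00, 1),
    ("PREY_B", 0.93, 3),
    ("PREY_C", 0.93, 1),
    ("PREY_D", 0.50, 8),
]

retained = [name for name, score, peptides in scored_preys
            if is_retained_interactor(score, peptides)]
```

Here the number of unique peptides only matters for preys whose score falls between the two thresholds.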

Biological processes and cellular compartment enrichment analysis

Proteins identified in the altFUS interactome were screened for cellular compartment and biological processes enrichment using Gene Ontology (GO) enrichment. Proteins were queried against the whole human proteome for cellular compartment and against the human mitochondrial proteome (MitoCarta 2.0) for biological processes. The statistical analysis used Fisher’s exact test with an FDR set at 1%.
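
For readers unfamiliar with the enrichment statistic, a one-sided Fisher's exact test for over-representation of a GO term can be computed from the hypergeometric distribution. The sketch below is illustrative only: the function name and the counts are hypothetical, it is not the tool used in the study, and the FDR correction across terms is not shown.

```python
from math import comb


def enrichment_p_value(hits: int, set_size: int,
                       annotated: int, background: int) -> float:
    """One-sided Fisher's exact test (hypergeometric upper tail).

    hits       -- proteins in the query set carrying the GO term
    set_size   -- total proteins in the query set
    annotated  -- proteins in the background carrying the GO term
    background -- total proteins in the background
    Returns P(X >= hits) when set_size proteins are drawn without replacement.
    """
    max_hits = min(set_size, annotated)
    total = comb(background, set_size)
    tail = sum(
        comb(annotated, k) * comb(background - annotated, set_size - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


# Hypothetical counts: 5 of 5 query proteins annotated, 5 of 10 background proteins annotated.
p = enrichment_p_value(hits=5, set_size=5, annotated=5, background=10)
```

The choice of background matters: the same query set tested against the whole proteome or against a mitochondrial reference set gives different values of `annotated` and `background`, and therefore different P values.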

Autophagic flux measurements

The mCherry-GFP-LC3 reporter was used to evaluate the autophagic vesicles within HeLa cells by confocal microscopy. Before fusion with the lysosome, the LC3 molecules on the autophagosome display a yellow fluorescence (combined mCherry and GFP fluorescence). After fusion, the GFP fluorescence is quenched by the lysosomal pH, and as such, the LC3 molecules display a red signal (mCherry alone). This allows a visual representation of the autophagic flux in a given cell. Cells treated with 50 nM bafilomycin for 4 h were used as a positive control to validate each independent experiment. Observations were made across 2 technical duplicates for each biological condition, across 3 independent experiments (n = 3). Alternatively, the autophagic flux was also evaluated by LC3 probing before and after bafilomycin treatment (50 nM for 4 h). The quantification corresponds to the treated/untreated ratio of LC3-II abundance.

Cytoplasmic aggregates measurements

Images of HeLa cells were taken by confocal microscopy and then processed using the ImageJ 3D Objects Counter plugin. FUS cytoplasmic aggregates were then quantified in number and size (μm²) for each cell. A total of 100 cells across two technical replicates were taken for each independent experiment (n = 3, i.e. a minimum of 300 cells per biological condition).

Transgenic Drosophila and climbing assay

The bicistronic constructs, FUS and FUS-R495x, and the monocistronic constructs, altFUS, FUS (Ø) and FUS (Ø) -R495x, were subcloned in the pUASTattB expression vector for site-specific insertion into attP2 on chromosome 3. Transgenic flies were generated by Best Gene (Best Gene Inc., California, USA). The Elav-GeneSwitch-GAL4 driver (stock number: 43642, genotype: y[1] w[*] PGSG301) and the UAS-mCherry flies (stock number: 35787, genotype: y[1] sc[*] v[1] PattP2) were purchased from Bloomington (Bloomington Drosophila Stock Center, Indiana, USA). All stocks were in a w1118 background and were cultured on standard medium at 25°C or room temperature. Transgenic flies were crossed with the Elav-GeneSwitch-GAL4 driver strain. The F1 was equally divided into two groups with equal proportions of males and females: one group was fed standard food supplemented with ethanol (0.2%, control flies) and the other standard food supplemented with RU-486 at 10 μM diluted in ethanol (induced flies). The climbing assay was performed as previously described (Chambers et al, 2013). Briefly, flies were transferred into an empty vial and tapped to the bottom. After 18 s, flies that had reached the top of the tube were counted as successful. The assay was done at days 1, 10 and 20 post-induction, across 4 independent F1. Five flies were taken at days 1, 10 and 20 post-induction to validate expression of the proteins of interest.

Statistical analyses and representation

Unless otherwise stated, the statistical analysis carried out was a two-way ANOVA with Tukey’s multiple comparison correction. The box plots represent the mean with the 5th–95th percentiles. The bar graphs represent the mean, and error bars correspond to the standard deviation. When using parametric tests, normality of data distribution was verified beforehand using the Shapiro–Wilk test.


The Molecular Biology of Coronaviruses

This chapter discusses the manipulation of clones of coronavirus and of complementary DNAs (cDNAs) of defective-interfering (DI) RNAs to study coronavirus RNA replication, transcription, recombination, processing and transport of proteins, virion assembly, identification of cell receptors for coronaviruses, and processing of the polymerase. The coronavirus genome is a nonsegmented, single-stranded, positive-sense RNA. Its size ranges from 27 to 32 kb, which is significantly larger than that of other RNA viruses. The gene encoding the large surface glycoprotein is up to 4.4 kb, encoding an imposing trimeric, highly glycosylated protein. This soars some 20 nm above the virion envelope, giving the virus the appearance, with a little imagination, of a crown or coronet. Coronavirus research has contributed to the understanding of many aspects of molecular biology in general, such as the mechanism of RNA synthesis, translational control, and protein transport and processing. It remains a treasure capable of generating unexpected insights.


Discussion

The specific transcriptional induction of ciliary genes is precisely controlled and coordinated during cilia formation, and inhibition of transcription impairs ciliogenesis (10). Although the significance of the transcriptional regulation of ciliary gene expression was initially documented in Chlamydomonas (11), the mechanisms underlying this regulation remain poorly understood. To our knowledge, the mechanisms and genes that regulate the transcription and expression of ciliary genes have not been reported in Chlamydomonas or other unicellular organisms.

It has been generally acknowledged that the last eukaryotic common ancestor is a flagellated unicellular organism (26). Despite flagellar genes being under direct transcriptional regulation in both unicellular and multicellular organisms, the evolution of the transcriptional program controlling flagellar assembly is largely unknown (26). Previous studies have shown that RFX transcription factors, FOXJ1 transcription factors, and ciliary genes all evolved independently (26–28). The transcriptional control of ciliary genes generally regulates the formation of one type of cilium in unicellular organisms. Nevertheless, in multicellular organisms, many ciliary genes are differentially regulated with cell type-specific patterns of expression to generate ciliary diversity (10). RFX and FOXJ1 are key regulators of ciliary gene expression in animals (10). However, both are absent from many unicellular organisms, including Chlamydomonas, indicating that the transcriptional control of ciliogenesis is fundamentally different in unicellular and multicellular organisms.

Previous data suggest that XAP5 proteins play vital roles in many biological processes (17–20). However, little is known about the precise molecular function of the conserved nucleus-localized protein XAP5. In the current study, an in vitro binding analysis with predicted transcription regulatory elements revealed that XAP5 could specifically recognize and bind to a motif with a CTGGGGTG core in the promoter regions of ciliary genes (Fig. 3). Moreover, we demonstrated that XAP5 induced the promoter activities of targeted ciliary genes (Fig. 4). Therefore, our results show that XAP5 functions as a transcription factor to regulate flagellar assembly in Chlamydomonas. The transcriptional control of more than 100 flagellar genes was XAP5 dependent, whereas the expression of several flagellar genes was XAP5 independent, implying that the regulation of flagellar genes at the transcriptional level is a highly complex process and that XAP5-independent flagellar gene expression can be regulated by other, undiscovered transcriptional mechanisms. In addition, the promoter regions of almost all the XAP5-dependent flagellar genes contained the putative XAP5-binding motif. Thus, it is possible to use the XAP5-binding motif to predict XAP5 targets, particularly those among flagellar genes, which provides a way to identify genes potentially involved in ciliary assembly and function.
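
As an illustration of how such a motif-based prediction might look in practice, the sketch below scans a promoter sequence for exact matches to the CTGGGGTG core on both strands. It is a simplified assumption-laden example: exact matching of the core only, both strands treated equally, and hypothetical function names and sequences; it is not the prediction procedure used in the study.

```python
CORE_MOTIF = "CTGGGGTG"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif_sites(promoter: str, motif: str = CORE_MOTIF) -> list[tuple[int, str]]:
    """Return (0-based start, strand) for every exact motif match on either strand."""
    promoter = promoter.upper()
    sites = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = promoter.find(pattern)
        while start != -1:
            sites.append((start, strand))
            start = promoter.find(pattern, start + 1)
    return sorted(sites)


# Hypothetical promoter fragments, for illustration only.
forward_sites = find_motif_sites("AAACTGGGGTGTTT")
reverse_sites = find_motif_sites("TTCACCCCAGTT")
```

A real target prediction would more likely use a position weight matrix and a defined promoter window rather than exact string matching, so candidate genes found this way would still need experimental confirmation.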


Results

Lack of Mirc11 does not alter NK cell development

Mirc11 consists of miR-23a, miR-27a, and miR-24-2 (Supplementary Fig. S1A) and acts as a switch in regulating the lineage commitment of hematopoietic stem and progenitor cells (HSPC; refs. 40, 41). A lack of Mirc11 did not alter the percentages of CD3ε− NK1.1+ NK cells in the bone marrow (BM) and other peripheral organs (spleen, liver, lung, and blood; Supplementary Fig. S1B). Expression of NK cell maturation markers CD51 (αv) and CD49b was unaffected by Mirc11 deficiency. A decrease in expression of CD51 (42) and CD27, an increase in integrin CD11b (αMβ2), and eventually, loss of CD27 define functional maturation; these were comparable between Mirc11-deficient and WT mice (Supplementary Fig. S1C and S1D). NK cell subsets defined by the stochastic acquisition of distinct Ly49 receptors such as Ly49C/I, Ly49H, Ly49A, Ly49D, and Ly49G2 (42) were similar (Supplementary Fig. S1C and S1E). Terminally mature NK cells, as defined by KLRG1 expression (43), were unaltered in Mirc11 −/− relative to WT mice. Thus, lack of Mirc11 did not affect the development and maturation of NK cells.

Effect of Mirc11 on NK cell–mediated cytotoxicity

Next, we evaluated the cytotoxic potential of naïve NK cells against B16F10 tumor cells that express CD155 (induced-self), a ligand for DNAM-1; parental EL4 thymoma cells (self); EL4 cells stably transfected with H60 (EL4 H60), a ligand for NKG2D (induced-self); RMA cells (self); RMA/S cells that lack the normal expression of MHC class I H-2b (missing-self); and YAC1 cells that are of H-2a strain background (allo). Naïve NK cells were only able to mediate detectable cytotoxicity against missing-self and allo targets, with a statistically significant difference between WT and Mirc11 −/− mice being reached only for missing-self (Supplementary Fig. S2A).

IL2 plays a role in the clearance of B16F10 cells in vivo (44). We expanded purified splenic NK cells with IL2 and tested their cytotoxic potential on day 7. NK cells from Mirc11 −/− and WT mice were similarly cytotoxic to B16F10 cells and other targets (Supplementary Fig. S2B). The reduction in cytotoxicity of IL15-cultured NK cells from Mirc11 −/− or WT mice was only significant for missing-self or allo targets (Supplementary Fig. S2C). Thus, Mirc11 does not regulate the cytotoxic potential of NK cells, consistent with the role of IL2, which is known to bypass the requirement of WASp-dependent actin polymerization in NK cells (45).

In a transplant rejection model, host WT or Mirc11 −/− (H-2b) mice were challenged with donor-derived splenocytes from C57BL/6 (H-2b; “self”), β2M tm1Unc/J (H-2b but negative for cell-surface H2-Kb and H2-Db; “missing-self”) and BALB/c (H-2d; “nonself”) mice. Donor splenocytes were labeled and injected. The number of remaining donor splenocytes after 18 hours represented a surrogate marker for NK cell–mediated killing. Loss of Mirc11 significantly impaired the ability of NK cells to clear “missing-self” but not “nonself” targets (Supplementary Fig. S2D), suggesting that NK cells partially rely on Mirc11 for regulation of cell-mediated cytotoxicity.

Mirc11 is essential for proinflammatory responses of NK cells

To address whether the lack of Mirc11 affects the production of inflammatory cytokines, IL2- or IL15-cultured NK cells were cocultured with tumor targets (Fig. 1A and B). A lack of Mirc11 significantly impaired the ability of NK cells to produce IFNγ. Although culturing NK cells from Mirc11 −/− mice with IL2 helped to overcome the defect in antitumor cytotoxicity, it did not rescue IFNγ production (Fig. 1A). This indicates that Mirc11 regulates the cytotoxicity and cytokine production via distinct mechanisms.

Lack of Mirc11 reduces NK cell–mediated cytokine production in vitro. A and B, Intracellular IFNγ is measured in IL2- or IL15-cultured NK cells from WT (n = 3) and Mirc11 −/− (n = 3) mice after coculture with indicated targets (effector:target = 1:1). C, Quantitative analyses of cytokines and chemokines produced by WT (n = 6) or Mirc11 −/− (n = 6) NK cells following activation with indicated antibodies and IL12 and IL18. D, Venn diagram depicting differentially expressed genes (FDR < 0.05) in NK cells following anti-NKG2D activation (Mirc11 −/−, n = 3; WT, n = 3). E, Volcano plot depicting alterations in the transcriptome of NK cells (n = 3, 3). All genes with −Log10 (P values) greater than 80 are plotted at a Y-axis value of 80. F, Heat map derived from RNA-seq of IL2-cultured NK cells stimulated with anti-NKG2D (n = 3, 3). Bar graphs represent the mean with the standard deviations; P values were obtained using an unpaired t test (*, P < 0.05; **, P < 0.01; ***, P < 0.001).

To determine the specific signaling pathways affected by the absence of Mirc11, we stimulated NK cells with plate-bound antibodies. IL2-cultured NK cells were activated with anti-NKG2D (DAP10 and DAP12), anti-NCR1 (FcεRIγ and CD3ζ), anti-CD137 (Lck-Fyn and TRAF2), anti-CD244 (SAP-Fyn), or anti-Ly49H (DAP10 and DAP12). Culture supernatants were analyzed for IFNγ, TNFα, GM-CSF, CCL3 (MIP1α), CCL4 (MIP1β), and CCL5 (RANTES). Generation of these cytokines and chemokines was significantly impaired in NK cells from Mirc11 −/− mice compared with WT (Fig. 1C). IL12 and IL18 can also induce NK cells to produce cytokines (46), and analyses of their supernatants revealed that the production of IFNγ, GM-CSF, CCL3, CCL4, and CCL5 was comparable between Mirc11 −/− and WT mice. Thus, the function of Mirc11 is required downstream of activation receptor–mediated signaling, and lack of the Mirc11 complex did not render NK cells globally hyporesponsive.

To use an unbiased approach to identify target genes regulated by Mirc11, we performed transcriptome-wide RNA-seq analyses of purified IL2-cultured splenic NK cells activated with plate-bound anti-NKG2D. Total mRNA was isolated, transcribed, and sequenced. Unsupervised analyses of transcriptional profiles of nonstimulated and anti-NKG2D–activated NK cells using principal component analyses (PCA) revealed that stimulated WT NK cells possessed a transcriptome distinct from Mirc11 −/− NK cells (Supplementary Fig. S3A). We used statistical filtering (P < 0.05; Mann–Whitney test with Benjamini–Hochberg correction, stimulated vs. unstimulated) to identify genes that corresponded to different conditions based on the similarity of available transcripts. Among the unstimulated controls, a set of 255 genes was distinctly expressed in NK cells from the WT that were low or not expressed in the Mirc11 −/− NK cells (Fig. 1D). Following anti-NKG2D–mediated activation, WT NK cells differentially expressed 1,074 transcripts compared with the unstimulated, of which 240 overlapped between WT and Mirc11 −/− mice. A total of 834 genes were expressed only in WT NK cells following anti-NKG2D–mediated activation and were low or absent in the Mirc11 −/− NK cells. Indeed, comparison of the differentially expressed transcripts revealed attenuation of the expression of proinflammatory cytokines, chemokines, and cytotoxic granule-associated factors in the Mirc11 −/− NK cells (Fig. 1E and F). Collectively, this shows that Mirc11 plays a role in positively regulating production of proinflammatory factors.
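
The Benjamini–Hochberg correction mentioned above adjusts each raw P value for the number of genes tested. The following is a minimal sketch of that step alone, with a hypothetical function name and made-up P values; it is not the analysis pipeline used for these data.

```python
def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted P values in the original input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity of adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up raw P values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(adjusted) if q < 0.05]
```

The gene counts quoted in the text are also internally consistent: of the 1,074 transcripts differentially expressed in stimulated WT NK cells, 240 are shared with Mirc11 −/− cells, leaving 1,074 − 240 = 834 specific to WT.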

Mirc11 is obligatory for NK cell–mediated in vivo clearance of L. monocytogenes

To evaluate the role of Mirc11 in the IFNγ-dependent clearance of L. monocytogenes infection, we systemically infected mice with live L. monocytogenes (2 × 10⁴, ∼0.5 LD50) and tested for expression of individual members of Mirc11 in purified splenic NK cells after 48 hours. Significant changes in the expression were observed for the miR-27a transcript, followed by the miR-24-2 and the miR-23a, suggesting a regulatory function of these transcripts during Listeria infection (Fig. 2A). Compared with WT mice, the number of bacteria in the livers of Mirc11 −/− mice was higher after 48 hours (Fig. 2B). We also observed a significant reduction in the percentages of IFNγ-positive CD3ε− NK1.1+ NK cells from Mirc11 −/− compared with WT mice (Fig. 2C) during L. monocytogenes infection.

Mirc11 is obligatory for NK cell–produced IFNγ-dependent in vivo clearance of L. monocytogenes and B16F10. A, Relative expression of members of Mirc11 in NK cells from WT mice (n = 3) following L. monocytogenes infection. B, Quantification of bacterial burden in the liver (n = 9, 9). Bacterial burden from individual mice is shown. C, Percentage of WT (n = 9) and Mirc11 −/− (n = 9) splenic NK cells producing IFNγ after infection with L. monocytogenes. D and E, NK cells isolated from spleens of mixed BM chimeras 1 (n = 5). Total NK cell number (D) and percentage of IFNγ+ NK cells (E). F, Venn diagram depicting the number of differentially expressed genes (FDR < 0.05) in fresh NK cells (n = 3, 3, 3, 3) after L. monocytogenes infection. G, Volcano plot depicting alterations in the transcriptomic profiles of NK cells. Genes with −Log10 (P values) greater than 80 at the Y-axis are plotted. H, RNA-seq heat map analyses of freshly isolated NK cells from mice infected with L. monocytogenes (n = 3, 3). I, Pulmonary pseudometastases in lungs of mice after intravenous injection of B16F10 cells (n = 4, 4). Left, freshly isolated, representative lungs. Middle, hematoxylin and eosin–stained lung sections. Right, magnified regions (100×). J, Quantification of lung nodules. WT (n = 6) and Mirc11 −/− (n = 8) mice were injected intravenously with either 2 × 10⁵ (left) or 10⁶ (right) B16F10 cells, and lungs were harvested at 14 or 7 days, respectively. Double-blinded counting was used. K, Relative expression of members of Mirc11 in NK cells isolated from lungs of B16F10-challenged WT mice (n = 3). The data presented are a compilation of three independent experiments showing the mean with standard error. Data were analyzed using an unpaired t test (*, P < 0.05; **, P < 0.01; ***, P < 0.001). Mann–Whitney U test was used for statistical analyses of BM chimerism.

To ensure that the impairment in cytokine production observed in Mirc11 −/− NK cells was intrinsic, we generated mixed BM chimeras. Equal numbers of BM-derived cells from WT (CD45.1+ B6.SJL) and Mirc11 −/− (CD45.2+ C57BL/6) mice were transferred into Rag2 −/− γc −/− mice. Six weeks later, mice were challenged with L. monocytogenes (2 × 10⁴). After 48 hours, spleens were analyzed for the percentage of IFNγ-positive CD3ε− NK1.1+ NK cells. Whereas the percentage of NK cells did not differ, indicating comparable survival rates of NK cells between the two genotypes (Fig. 2D), IFNγ production was significantly reduced in NK cells from Mirc11 −/− compared with that from WT mice (Fig. 2E).

To further assess transcriptomic changes, WT and Mirc11 −/− mice were again challenged with L. monocytogenes. After 48 hours, splenic CD3ε − NK1.1 + NK cells were sorted, and total mRNA for each group (unchallenged or challenged) was sequenced. Unsupervised analyses of transcriptomic profiles using PCA revealed that the transcriptome of NK cells from the WT mice was distinct from that from Mirc11 −/− mice (Supplementary Fig. S3A). Statistical analyses revealed the differentially expressed genes in the WT and Mirc11 −/− mice. NK cells derived from WT and Mirc11 −/− mice challenged with L. monocytogenes showed differential expression of 6,519 and 7,537 genes, respectively. NK cells from the spleen of the nonchallenged WT and Mirc11 −/− mice had differential expression of 3,433 and 4,184 genes, respectively. In NK cells derived from nonchallenged mice, a set of 740 genes were differentially expressed in WT compared with those of Mirc11 −/− mice with a difference in the expression of 2-fold or more change in either direction (Fig. 2F). Following L. monocytogenes challenge, NK cells from WT mice expressed 5,302 transcripts, of which 1,217 were expressed by Mirc11 −/− mice. After normalizing quantities of NK cell transcripts from Mirc11 −/− to WT quantities, we plotted all genes using Volcano plots to determine overall change in the transcriptomic profile (Fig. 2G). Analyses of a select panel of transcripts encoding proinflammatory factors using heat maps of normalized expression values (Log2) revealed the inability of NK cells from Mirc11 −/− mice to transcribe many of these genes (Fig. 2H). Transcripts that encode many proinflammatory cytokines (Ifng, Lif, Tnfa, Csf2), chemokines (Ccl22, Ccl19, Ccl17, Ccl8, Ccl12, Ccl7), cytotoxic granule–associated factors (Gzmc, Gzmb, Prf1, Gzmf, Gzma), and interleukins (Il10, Il17a, Il27, Il6, Il12a, Il15, Il22, Il22b, Il18, Il23a, Il7, Il2, Il4) were reduced by more than 50% in NK cells from the Mirc11 −/− mice relative to WT (Fig. 2H). Thus, Mirc11 plays a role in positively regulating the production of proinflammatory factors.
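The selection criteria quoted above (FDR < 0.05 and a change of 2-fold or more in either direction) can be illustrated with a minimal sketch. It assumes Benjamini-Hochberg adjustment, which the text does not specify, and the tuple layout and function names are hypothetical and not taken from the authors' pipeline.

```python
import math

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order.

    Assumption: the FDR reported in the text is BH-style; the source
    does not name the adjustment method.
    """
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

def differentially_expressed(genes, fdr_cutoff=0.05, min_fold=2.0):
    """Filter genes by adjusted P value and fold change.

    genes: list of (name, mean_wt, mean_ko, pvalue) tuples (hypothetical layout).
    Returns (name, log2 fold change KO/WT, adjusted P) for genes passing both cutoffs.
    """
    fdr = benjamini_hochberg([g[3] for g in genes])
    hits = []
    for (name, wt, ko, _), q in zip(genes, fdr):
        log2fc = math.log2(ko / wt)
        # "2-fold or more in either direction" means |log2FC| >= 1.
        if q < fdr_cutoff and abs(log2fc) >= math.log2(min_fold):
            hits.append((name, log2fc, q))
    return hits
```

Plotting the log2 fold change against the negative log10 of the P value for every gene gives the kind of volcano plot described for Fig. 2G.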

Mirc11 is required for NK cell–mediated in vivo clearance of B16F10

To evaluate the role of the Mirc11 in regulating antitumor response in vivo, we used a B16F10 melanoma-based pulmonary pseudometastasis model (47), where the host mice depend on NK cell–mediated IFNγ production for tumor clearance (48). Mice were challenged intravenously, the lungs were harvested when indicated, and the number of nodules counted. Gross analyses and hematoxylin and eosin staining (Fig. 2I) and quantitation (Fig. 2J) indicated significantly more pseudometastases in the lungs of Mirc11 −/− mice compared with WT mice. Expression of Mirc11 increased following tumor challenge compared with that of nonchallenged mice (Fig. 2K). This suggests that an increase in Mirc11 expression correlates with an augmented ability of NK cells to mediate clearance of B16F10 tumors.

Identification of Mirc11 targets in NK cells

We next identified the mRNA targets of Mirc11 by using RNA-seq data to determine the differentially targeted transcripts using Fisher exact test to compare splenic NK cells that were derived from the nonchallenged and L. monocytogenes–challenged WT and Mirc11 −/− mice. Potential targets of miR-23a, miR-27a, and miR-24-2 were identified based on the presence of the unique “seed” sequences present in the 3′ untranslated region (UTR) of the transcripts (Fig. 3A). We identified a set of targets based on “the aggregate probability of conserved targeting” (PCT ref. 49). In silico predictions that matched the 3′ UTR of the transcripts and their orthologs based on the UCSC whole-genome alignments identified a total of 6,606 genes that could be targeted by Mirc11 (Fig. 3A). Among these, 5,972 were present in the RNA-seq data obtained from splenic NK cells following L. monocytogenes infection (Fig. 3B). Only a fraction of the genes (617) were differentially expressed between WT and the Mirc11 −/− mice under nonchallenged condition (Fig. 3C). However, following Listeria challenge, 2,073 of these transcripts were differentially expressed between the NK cells derived from WT and the Mirc11 −/− mice (Fig. 3D). Lack of miR-24-2 was associated with the most differentially expressed genes (1,169), followed by miR-23a (1,145) and miR-27a (394). Many of these genes can be targeted by one or more of the three miRNAs.
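The seed-based prediction described above can be sketched in a few lines. This is only an illustration of canonical seed matching (miRNA nucleotides 2–8 pairing with a complementary site in the 3′ UTR); it does not reproduce TargetScan's conservation scoring or PCT calculation, and the sequences used with it should be treated as toy inputs.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def seed_site(mirna, length=7):
    """Return the mRNA site (5'->3', RNA alphabet) that pairs with the miRNA seed.

    The seed is taken as miRNA nucleotides 2..(1 + length), counted from the 5' end.
    """
    seed = mirna.upper().replace("T", "U")[1:1 + length]
    # The target site is the reverse complement of the seed.
    return "".join(COMPLEMENT[base] for base in reversed(seed))

def find_seed_matches(utr, mirna, length=7):
    """Return 1-based start positions of every seed match in a 3' UTR sequence."""
    site = seed_site(mirna, length)
    utr = utr.upper().replace("T", "U")
    positions = []
    start = utr.find(site)
    while start != -1:
        positions.append(start + 1)
        start = utr.find(site, start + 1)  # allow overlapping matches
    return positions
```

A seed match found this way is only a candidate site; as the reporter assays in this study show, a predicted site does not guarantee functional repression.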

Qualitative alterations in potential target transcripts of Mirc11. A, Venn diagram depicting the number of total target input modulated by miR-23a, miR-27a, and miR-24-2 and the overlapping targets. B, Venn diagram depicting the number of total target genes expressed in NK cells and modulated by miR-23a, miR-27a, and miR-24-2 and the overlapping targets. C and D, Venn diagram depicting the number of total target genes in NK cells freshly isolated or following L. monocytogenes infection. E–G, Hierarchical clustering of all potential target genes of Mirc11 in NK cells from L. monocytogenes–challenged mice. TargetScan 7.1–based in silico analyses was used to identify mRNAs present in the total genome-wide RNA-seq analyses of NK cells from mice that were unchallenged or challenged with L. monocytogenes.

Analyses of the identities of the targets revealed clusters of genes that control apoptosis and cell survival, metabolic regulators, transcriptional activators of cytokines and chemokines, and signaling proteins (Fig. 3E–G). The first group was transcripts encoding transcriptional activators (Stat1, Ctnnb1p1, Zfp799, Zfp113, Zfp397, Zfp329, IRF4, Pparg) or repressors (ATF3, Runx2, Hic2) that may control production of inflammatory cytokines and chemokines. Alterations in the amount of ATF3, IRF4, and Runx2 transcripts may suggest mechanisms. We found the amounts of ATF3 transcripts reduced in NK cells from Mirc11 −/− mice compared with those from WT. ATF3 interacts with the cis-regulatory element and represses transcription of the Ifng gene (50). Therefore, a reduction in ATF3 does not provide an explanation of the reduction in IFNγ or other inflammatory cytokine production that we observe in Mirc11 −/− mice. After L. monocytogenes infection, we observed an increase in IRF4-encoding transcripts in the NK cells from WT but not Mirc11 −/− mice. Mirc11 has a broader transcriptional influence, since genes that were not encoding cytokines and chemokines were also altered.

Mirc11 is a positive regulator of NF-κB– and AP-1–mediated gene transcription

Transcriptional induction of proinflammatory factors is primarily mediated by NF-κB (p50/Rel-A) and AP-1 (c-Fos/c-Jun; ref. 51). Analyses of the activation status of Jnk1/2 that is upstream of NF-κB and AP-1 revealed a reduction in their phosphorylation following anti-NKG2D–mediated activation (Supplementary Fig. S3B). Similar reductions were seen in the phosphorylation of Erk1/2 but not of p38. To define the effect of Mirc11 on NF-κB and AP-1 activation, we performed in silico regulatory network genome-wide analyses using Detecting Mechanism of Action by Network Dysregulation (DeMAND; ref. 52) and a precompiled Bayesian network based on the gene-expression profiles of 254 B-cell lymphoma cell preparations on U95av2 arrays. Following stimulation with anti-NKG2D, the DeMAND analysis revealed that the Rel-A (p65, P < 10⁻⁷) and Jun (P < 10⁻⁸) networks were altered between the WT and Mirc11 −/− NK cells (Supplementary Fig. S4A and S4B). Likewise, gene network analysis using the IPA software tool revealed that the NF-κB (P < 10⁻⁴²) and AP-1 (P < 10⁻¹³) pathways were enriched among the differentially expressed gene sets in NK cells derived from the L. monocytogenes–challenged mice (Fig. 4). A total of 280 differentially expressed genes that are transcriptionally regulated by NF-κB were identified, with 64 upregulated and 216 downregulated. A total of 320 differentially expressed genes that are targets of AP-1 were identified, with 132 upregulated and 188 downregulated. Hierarchical clustering revealed distinct gene groups either upregulated or downregulated as presented in heat maps (Fig. 4A and B). Functional classification and Fisher exact test further identified enriched transcripts that encode proinflammatory cytokines, chemokines, and other transcription factors. Thus, the reduced response of Mirc11 −/− NK cells is most likely due to disruption of the activation of NF-κB and AP-1.

Mirc11 targets NF-κB– and AP-1–mediated gene transcriptions. Hierarchical clustering of NF-κB (A) or AP-1 (B) target genes that are differentially expressed between NK cells derived from WT and Mirc11 −/− mice using an unpaired t test following L. monocytogenes infection. Gene sets were identified using IPA informatics software through classification into gene ontology (GO) categories with an FDR of 0.01% based on biological process and molecular function categories with a minimum of 2-fold change restriction. RNA-seq data from WT (n = 3, 3) and Mirc11 −/− (n = 3, 3) are compared and shown.

To validate these in silico findings, we determined the functional status of both NF-κB and AP-1 in NK cells from Mirc11 −/− mice. We stimulated NK cells from WT and Mirc11 −/− mice with plate-bound anti-NKG2D, prepared nuclear lysates, and quantified the binding of NF-κB and AP-1 to chromosomal DNA using electrophoretic mobility shift assay (EMSA). We found that both NF-κB and AP-1 pathways are activated in WT NK cells but are reduced in Mirc11 −/− NK cells (Fig. 5A).

Mirc11 targets NF-κB– and AP-1–mediated gene transcriptions. A, Nuclear translocation of NF-κB and AP-1 in NK cells following activation with anti-NKG2D (n = 3, 3). Data presented are a representative of three independent experiments. B, Venn diagram of the number of differentially expressed (FDR < 0.05) and shared transcripts from gene expression profiles of NK cells from in vivo L. monocytogenes–challenged mice (n = 3, 3) and in vitro anti-NKG2D–activated (n = 3, 3) cultures. C, Differentially expressed known target transcripts of NF-κB and AP-1 in the absence of Mirc11 that are shared between NK cells from in vivo L. monocytogenes–challenged mice (n = 3, 3) and in NK cells that were activated in vitro with anti-NKG2D (n = 3, 3). D, GSEA was used to show and compare the set of gene targets that are regulated downstream of NF-κB and TNFα in NK cells that were anti-NKG2D activated (n = 3, 3) or NK cells derived from L. monocytogenes–challenged WT (n = 3) and Mirc11 −/− (n = 3) mice. The data presented are a compilation of three independent experiments.

Next, we compared the gene-expression profiles from the RNA-seq libraries made from NK cells activated with anti-NKG2D and from NK cells from L. monocytogenes–infected mice to identify shared targets. NK cells from L. monocytogenes–challenged mice (WT vs. Mirc11 −/− mice) contained 6,073 differentially expressed genes, compared with 628 differentially expressed genes in NK cells that were stimulated in vitro with anti-NKG2D (WT vs. Mirc11 −/− mice). There were 446 genes that were shared between the two RNA-seq libraries, of which many were known to be transcriptionally regulated by NF-κB, AP-1, or both (Fig. 5B). These shared groups of genes included proinflammatory cytokines and chemokines. Expression of most of the target genes in NK cells from Mirc11 −/− mice was significantly reduced, indicating that Mirc11 functions as a central repressor of one or more negative regulators of NF-κB and AP-1 activation pathways (Fig. 5C). Also, we evaluated the gene set enrichment for NF-κB and TNFα response pathways. Analysis of NK cells stimulated in vitro with anti-NKG2D or derived from L. monocytogenes–infected mice revealed that these pathways were significantly impaired in NK cells lacking Mirc11 (FDR-adjusted P < 0.05; Fig. 5D). Overall, transcriptome-wide RNA-seq analyses revealed that the primary target genes of the Mirc11 involved transcriptional regulation by NF-κB and AP-1. NK cells from WT mice infected with L. monocytogenes exhibited a proinflammatory gene set similar to that of human chronic inflammatory bowel disease; however, NK cells from Mirc11 −/− mice lacked this profile (Supplementary Fig. S5).

A20, Cbl-b, and Itch are direct targets of Mirc11 in mice and humans

We tested whether the deubiquitinating enzymes and E3 ligases with membrane proximal functions were principal targets of Mirc11. We analyzed two deubiquitinating enzymes (A20 and Cyld) and two E3 ligases (Cbl-b and Itch) with established roles in NF-κB and AP-1 signaling (53) in ex vivo–purified splenic NK cells 48 hours after L. monocytogenes infection. NK cells from noninfected WT mice contained all four proteins (Fig. 6A). Upon L. monocytogenes infection, splenic NK cells from WT mice reduced expression of Cbl-b, Cyld, and Itch, whereas expression of A20 remained undetectable. NK cells from Mirc11 −/− mice had more A20, Cbl-b, Cyld, and Itch regardless of the infection status. We validated these results in IL2-cultured splenic NK cells activated with plate-bound anti-NKG2D. NK cells from WT and deficient mice expressed all four proteins, but expression of all four was higher in NK cells from Mirc11 −/− mice (Fig. 6B).

Mirc11 targets members of E3 ligases in NK cells from mice and humans. A, Expression of E3 ligases A20, Cbl-b, Itch, and Cyld in freshly isolated NK cells from WT (n = 2) and Mirc11 −/− (n = 2) mice following L. monocytogenes infection. B, Expression of A20, Cbl-b, Itch, and Cyld E3 ligases in IL15-cultured NK cells following anti-NKG2D activation (n = 3, 3). C, Predicted interactions between the 3′ UTR of transcripts encoding A20, Cbl-b, Itch, and Cyld containing target sequences and members of Mirc11. D, Dual luciferase assay measuring activity of miR-23a, miR-24-2, miR-27a, or control mimetics (CM) as a ratio of Renilla to firefly luciferase on the 3′ UTR of select target genes in HEK293T cells 2 days after transfection. Data are normalized to control 3′ UTR plus no miRNA condition. Transfection was done in triplicate, and the average with standard deviation is shown. E, Forced expression of the members of the Mirc11 cluster in human NK cells enhances IFNγ production. IFNγ production by primary human NK cells transduced with pre-miR lentiviral vectors for human miR-23a, miR-24-2, or miR-27a or CM. Forty-eight hours following transduction, NK cells were cocultured with K562 cells, and the percentage of intracellular IFNγ + CD3ε − CD56 + NK cells was enumerated. NK cells from a total of 4 normal healthy individuals were used. The data presented are a compilation of three independent experiments showing the mean with standard error obtained using an unpaired t test (**, P < 0.01; ***, P < 0.001).

To identify the mechanistic link between Mirc11 and these E3 ligases, we analyzed the 3′ UTR sequences of A20, Cbl-b, Cyld, and Itch. Using TargetScan, we found that the 3′ UTR of all four E3 ligases contained predicted binding sites for one or more members of Mirc11 (Fig. 6C). To identify which miRNA targeted Tnfaip3 (A20), we cloned their putative interacting sequences from 3′ UTR regions downstream of the firefly luciferase reporter gene in a pMIR-Report vector. We identified and cloned one, two, three, and two 3′ UTR regions of Tnfaip3, Cblb, Cyld, and Itch, respectively (Fig. 6C). These sequences or control mimetics were cotransduced with vectors expressing miRNA23a, miRNA24-2, and miRNA27a into HEK293T cells. After 48 hours, cells were lysed, and the luciferase activity was measured.
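As a rough illustration of how dual-luciferase readings of this kind are commonly reduced to a single relative activity, the sketch below divides each well's reporter signal by its co-transfected normalizer signal and then scales to a reference condition (here assumed to be the control 3′ UTR without miRNA). The function name, argument names, and any readings passed to it are hypothetical; which luciferase serves as reporter and which as normalizer is left to the caller.

```python
from statistics import mean, stdev

def relative_luciferase(reporter, normalizer, ref_reporter, ref_normalizer):
    """Return (mean, standard deviation) of normalized reporter activity.

    reporter / normalizer: per-well readings for the test condition.
    ref_reporter / ref_normalizer: per-well readings for the reference
    condition (assumed here to be control 3' UTR with no miRNA).
    """
    # Average reporter/normalizer ratio in the reference condition.
    ref_ratio = mean(r / n for r, n in zip(ref_reporter, ref_normalizer))
    # Per-well ratios of the test condition, scaled so the reference equals 1.
    ratios = [(r / n) / ref_ratio for r, n in zip(reporter, normalizer)]
    return mean(ratios), stdev(ratios)
```

A value well below 1 for a given miRNA and 3′ UTR pair would be read as repression of the reporter by that miRNA.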

The 3′ UTR of Cblb contained two seed matches between 3,216–3,236 and 5,879–5,901 nts targeted by miR-27a-3p and miR-23a-3p, respectively. Although the proximal sequence for miR-27a-3p contained 13 noncontiguous nucleotides (out of 21) that were complementary, it was unable to block the translation of luciferase. However, the distal sequence that was targeted by miR-23a-3p with a 7mer seed match was functional as indicated by the reduction in luciferase activity. The 3′ UTR of Cyld contained three sequences between 3,842–3,864, 4,031–4,051, and 5,979–6,000 nts, all of which were targeted by miR-24-2-3p. However, none of these could block the translation of luciferase. miR-24-1, the paralog of miR-24-2, contains an identical sequence (54). The 3′ UTR of Itch contained two seed matches between 2,882–2,900 and 4,649–4,669 nts that were targeted by miR-27a-3p and miR-23a-3p, respectively. Incorporation of the target sequences in the 3′UTR of luciferase indicated that the miR-27a-3p could reduce the translation of luciferase but not the miR-23a-3p. Thus, miRNA23a significantly reduced the luciferase activity of vector that contained the 3′ UTR of Tnfaip3. Similarly, miRNA27a reduced luciferase activity of vectors containing 3′ UTR sequences of either Cblb or Itch (Fig. 6D). With three different 3′ UTR sequences of Cyld, we did not observe any reduction of luciferase activity, indicating that Mirc11 members may not target Cyld (Fig. 6D). Thus, Mirc11 controls the activity of A20, Cbl-b, and Itch in NK cells by binding to the 3′ UTR of their respective transcripts and regulating their translation.

We investigated whether this regulatory mechanism was active in human NK cells. We purified CD3ε − CD56 + NK cells from human peripheral blood mononuclear cells (PBMC) and transduced them with lentiviral pLenti-TetCMV vectors encoding individual pre-miRs or all three pre-miRs that encode the members of the Mirc11. NK cells were activated with plate-bound anti-NKG2D (1D11, 5 μg/mL) for 24 hours, and the percentage of IFNγ-positive cells was quantified by flow cytometry. Transduction of pre-miR23a or pre-miR-27a increased the percentage of IFNγ + NK cells compared with that of empty vector (Fig. 6E). Out of 4 individual PBMCs transduced with pre-miR24-2, only one showed an augmentation of percentage of IFNγ + NK cells. These results indicate that Mirc11 suppresses the translation of specific deubiquitinating enzymes or E3 ligases to allow maximum threshold of activation.

Mirc11 augments K63 polyubiquitination of TRAF6

To establish the roles of A20, Cbl-b, Cyld, and Itch as suppressors of NK cell activation, we used their respective gene knockout mice. The total number of splenic NK cells or the expression of KLRG1 between the knockout mice to their respective WT controls was comparable (Supplementary Fig. S6A and S6B). Splenic NK cells from Tnfaip3 fl/fl Rosa Cre-ER mice were cultured with IL15 for 7 days, and on day 4, tamoxifen was added to the culture to induce the deletion of A20-encoding alleles. Splenic NK cells from mice lacking Cblb, Cyld, and Itch were also prepared by culturing with IL15. Although NK cells from three of the knockout models mediated comparable cytotoxicity against EL4 H60 , RMA/S, and YAC1 compared with WT controls (Supplementary Fig. S6C), NK cells without Cbl-b exhibited increased killing. Next, we stimulated the IL15-cultured NK cells with plate-bound anti-NKG2D for 18 hours, and the supernatants were analyzed for IFNγ, GM-CSF, CCL3, CCL4, and CCL5 (Supplementary Fig. S7). The absence of Tnfaip3, Cblb, Cyld, or Itch significantly increased the generation of these cytokines and chemokines, thus corroborating data from the Mirc11 −/− mice, suggesting that this regulatory mechanism primarily affects the production of proinflammatory factors.

A lack of Mirc11 reduces K63 and increases K48 polyubiquitination of TRAF6

K63-polyubiquitination of TRAF6 recruits TAB2 and TAB3, which activate TAK1 (55) to ultimately activate NF-κB (56). We examined TRAF6 ubiquitination in NK cells following anti-NKG2D mAb–mediated activation by immunoprecipitation from NK cell lysates and probed for K63 and K48 ubiquitination. We observed a ladder of high molecular mass of K63-polyubiquitinated TRAF6 in WT (Fig. 7A, lanes 1 and 2) but not Mirc11 −/− NK cells (Fig. 7A, lanes 3 and 4). However, TRAF6 did not contain any K48 polyubiquitination in WT NK cells (Fig. 7B, lanes 1 and 2) but was abundant in Mirc11 −/− NK cells (Fig. 7B, lanes 3 and 4). K63 ubiquitination of RIP1 was augmented in WT NK cells but not in Mirc11 −/− NK cells (Fig. 7C). We verified TRAF2 modification after activation and did not observe any change in K63 ubiquitination (Fig. 7D, lanes 1–4), suggesting TRAF6 is the primary target of A20-, Cyld-, Cbl-b–, and Itch-mediated suppression of cytokine production.

Lack of Mirc11 reduces K63 and increases K48 polyubiquitination of TRAF6. A, IL15-cultured NK cells (n = 2, 2) were activated with anti-NKG2D, TRAF6 was immunoprecipitated, and the extent of K63 polyubiquitination was analyzed. B, Immunoprecipitated TRAF6 was analyzed for the extent of K48 polyubiquitination (n = 2, 2). C, IL15-cultured NK cells were activated with anti-NKG2D for 15 minutes, RIP1 was immunoprecipitated, and the extent of K63 polyubiquitination was analyzed (n = 2, 2). D, IL15-cultured NK cells were activated with anti-NKG2D for 15 minutes, TRAF2 was immunoprecipitated, and the extent of K63 polyubiquitination is shown (n = 2, 2). E, IL15-cultured NK cells were activated with anti-NKG2D in the presence of either recombinant mutant K63 or mutant K48 ubiquitin proteins. Eighteen hours following activation, supernatants were analyzed for the production of indicated cytokines (n = 3, 3). The data are a compilation of three independent experiments; the mean is plotted, with error bars representing the standard error of the mean. Results were analyzed using an unpaired t test (**, P < 0.01; ***, P < 0.001).

To confirm these findings, we incubated IL15-cultured NK cells with mutant K63 or mutant K48 ubiquitin proteins that cannot form polyubiquitin chains with other ubiquitin molecules (57). We coincubated NK cells with either K63R or K48R ubiquitins, activated them with anti-NKG2D, and quantified the amount of IFNγ. Incubation of NK cells from WT but not Mirc11 −/− mice with K63R significantly reduced the production of IFNγ, TNFα, GM-CSF, CCL4, and CCL5 (Fig. 7E). Addition of K48R did not augment the production of these cytokines in NK cells from either WT or Mirc11 −/− mice. Thus, Mirc11 temporally targets and silences these E3 ligases during the active phase of receptor-mediated stimulation, enabling optimal NK cell–mediated production of proinflammatory factors.


RESULTS

PKD1, but not PKD2, plays an important role in murine embryogenesis

PKD serine kinases are large proteins comprising multiple interaction domains, including both protein−protein- and protein−lipid-binding regions, and have the potential to function as scaffolds. To explore the physiological role of PKD2 in vivo, we therefore decided to produce mice where wild-type PKD2 alleles were replaced with mutant alleles encoding alanine substitutions for Ser 707 and Ser 711 in the PKD catalytic domain activation loop region. Phosphorylation of these two serine residues is critical for PKD2 catalytic activity [36], hence studies of PKD2 SSAA mice allows an assessment of the importance of PKD2 catalytic activity in vivo while bypassing any impact of removing the scaffold function of PKD2. Accordingly, PKD2 S707A and S711A mutations were knocked into the wild-type Prkd2 locus in mouse ES cells by homologous recombination. The resulting ES cells were used to derive mice expressing PKD2 SSAA under the control of its natural promoter (Figure 1). Previous studies have shown that homozygous deletion of PKD1 alleles causes embryonic lethality [24]. In striking contrast, homozygous PKD2 SSAA -knockin mice were viable, fertile and were phenotypically indistinguishable from their wild-type littermates (Figure 1C and results not shown). This suggests that normal embryonic development is not perturbed by the loss of PKD2 catalytic function.

Generation of mice with a knockin PKD2 mutation

(A) Depiction of the endogenous mouse Prkd2 allele containing exons 10–18, the knockin construct and the targeted allele with the neomycin cassette removed by Cre recombinase. The black/grey rectangles represent exons and the black arrowheads represent LoxP sites. Thick black lines indicate the positions of the probes used for Southern blot analysis. The knockin allele containing the Ser 707 /Ser 711 mutation in exon 16 is illustrated as a grey rectangle. (B) Genotypes of PKD2 SSAA mutant mice were determined by PCR amplification of genomic DNA. The wild-type allele generates a 236 bp product, whereas the knockin allele generates a 344 bp product. The larger knockin allele product is due to the presence of the 108 bp LoxP site and flanking region, which remains in an intronic region following Cre-mediated excision of the neomycin-selection cassette. (C) Heterozygous matings for PKD2 SSAA mice were set up and the progeny were genotyped as described in the Materials and methods section. The number (and percentage) of each genotype observed followed by its expected Mendelian frequency. P=0.482 (χ 2 test).

We considered the possibility that the small amount of residual catalytic activity still present in the PKD2 SSAA mutant protein (see Figure 4A) might be sufficient to permit normal embryogenesis. Therefore, to validate our hypothesis, we generated two additional PKD mutant mouse models. The first had wild-type PKD1 alleles replaced by mutant PKD1 alleles lacking the critical PKC-dependent serine phosphorylation sites, Ser 744 and Ser 748 (Figure 2A). We also generated PKD2-deficient mice (PKD2 GT mice) using an ES cell line containing a gene-trap cassette inserted into the Prkd2 locus (Figure 2C). Our DNA sequence analysis of this ES cell line confirmed that the cell line contained only a single gene-trap cassette, located within the PKD2 locus. We mapped the gene-trap insertion site to intron 15, which disrupts and eliminates the catalytic domain of PKD2 (Figure 2C). Importantly, PKD2 GT/GT mice were born at the normal expected frequency (Figure 2E) and were viable, fertile and phenotypically indistinguishable from their wild-type littermates. In contrast, when heterozygous PKD1 SSAA mice were inter-crossed, wild-type and heterozygous newborn mice, but not homozygous PKD1 SSAA mice, were easily identified (Figure 2E). Of the 177 observed live births, two mice were homozygous for the PKD1 SSAA allele (approximately 1%). This indicates that homozygous expression of the PKD1 SSAA allele causes embryonic lethality with incomplete penetrance. This is consistent with the report by Olson and colleagues that deletion of exons 12–14 of PKD1, which eliminates the PKD1 catalytic domain, also causes embryonic lethality with incomplete penetrance [24]. Analysis of the stage of embryogenesis that required PKD1 catalytic activity revealed that normal Mendelian frequencies of homozygous PKD1 SSAA fetuses could be identified before day 9.5 of embryogenesis, but not later (Figure 2E). Hence the majority of embryos with homozygous substitutions for PKD1 SSAA alleles perish early in development. Collectively, these results demonstrate that the catalytic activity of PKD1, but not PKD2, is essential for normal mouse embryogenesis.
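The comparison of observed genotype counts with expected Mendelian frequencies can be sketched as a χ² goodness-of-fit test against the 1:2:1 ratio expected from a heterozygous inter-cross. The function name is hypothetical, and the closed-form P value used here is valid only for three genotype classes (2 degrees of freedom). Only the total of 177 live births and the two homozygous pups come from the text; any split between wild-type and heterozygous counts supplied to the function is an assumption made for illustration.

```python
import math

def mendelian_chi_square(observed, ratio=(1, 2, 1)):
    """Chi-square goodness-of-fit test for three genotype classes.

    observed: counts ordered as (wild-type, heterozygous, homozygous mutant).
    ratio: expected Mendelian ratio for a heterozygous inter-cross.
    Returns (chi-square statistic, P value).
    """
    if len(observed) != 3 or len(ratio) != 3:
        raise ValueError("this sketch handles exactly three genotype classes")
    total = sum(observed)
    expected = [total * r / sum(ratio) for r in ratio]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 2 degrees of freedom the chi-square survival function is exp(-x / 2).
    p_value = math.exp(-chi2 / 2.0)
    return chi2, p_value

# Hypothetical split: 60 wild-type and 115 heterozygous pups are invented;
# only the total (177) and the 2 homozygous pups are reported in the text.
chi2, p = mendelian_chi_square((60, 115, 2))
```

With so few homozygous pups among 177 births, any plausible split of the remaining counts gives a strong deviation from the 1:2:1 expectation, which is the basis for inferring embryonic lethality.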

Generation of PKD2 gene-trap mutant mice and PKD1-knockin mutant mice

(A) Depiction of the endogenous mouse Prkd1 allele containing exons 15–17, the knockin construct and the targeted allele with the neomycin cassette removed by Cre recombinase. The white/grey rectangles represent exons and the black arrowheads represent LoxP sites. Thick black lines indicate the positions of the probes used for Southern blot analysis. The knockin allele containing the Ser 744 /Ser 748 mutation in exon 16 is illustrated as a grey rectangle. (B) PKD1 SSAA -knockin mice were genotyped by PCR-amplification of genomic DNA over the LoxP insertion site as described in the Materials and methods section. (C) Depiction of the wild-type Prkd2 allele and the gene-trap-targeted Prkd2 allele present in the E14Tg2a.4 ES cell line. This insertion disrupts the catalytic domain structure and deletes several key motifs that are important for catalysis and substrate binding (the DFG and APE motifs) as well as the Ser 707 and Ser 711 phosphorylation sites that are important for PKD2 activation. (D) PKD2 GT mice were genotyped by PCR amplification of genomic DNA as described in the Materials and methods section. (E) Heterozygous matings for PKD2 GT mice and for PKD1 SSAA mice were set up, and the live progeny and PKD1 SSAA -knockin embryos were genotyped as described in the Materials and methods section. The number (and percentage) of each genotype observed is shown, followed by its expected Mendelian frequency. The data were analysed using a χ 2 test to determine statistical significance.

PKD1 and PKD2 expression patterns in adult mice

Analysis of EST (expressed sequence tag)/gene expression databases indicates that many adult tissues show co-expression of different PKD isoforms at the mRNA level (Figure 3A). However, the relative expression of specific PKD isoforms at the protein level in different tissues and cells is not well defined. In the context of PKD1 and PKD2, which are highly homologous, most antibodies generated against these kinases cannot discriminate between the two isoforms. For example, the most commonly used ‘pan’-PKD antisera, used to detect PKD1 in a variety of tissues, were raised against the peptide sequence EEREMKALSERVSIL (corresponding to amino acids 904–918 in murine PKD1), but react equally well with PKD2. In SDS/PAGE analysis, it is theoretically possible to discriminate PKD1 and PKD2 on the basis of subtle differences in their electrophoretic mobility [10]. However, because the electrophoretic mobility of PKD isoforms can be altered by protein phosphorylation [37], it is not possible to use differential migration on SDS/PAGE gels as a reliable criterion to distinguish PKD1 from PKD2 expression.

Differential expression of PKD1 and PKD2 isoforms in adult tissues

(A) Prkd1, Prkd2 and Prkd3 gene expression analysis in adult murine tissues. mRNA expression data in the GeneAtlas MOE430 (gcrma) data set were downloaded from the BioGPS gene portal hub (http://www.biogps.gnf.org) and expressed as the fold change in expression in the indicated tissues, compared with that expressed in NIH 3T3 cells. (B) Western blot analysis of PKD1, PKD2 and Akt expression in wild-type compared with PKD2 GT/GT thymocytes. Blots are representative of three or more separate experiments. Molecular masses are indicated in kDa. (C) Western blot analysis of PKD1, PKD2 and Akt expression in wild-type compared with PKD2 GT/GT adult tissues. Blots are from two separate PKD2 SSAA/SSAA mice and their wild-type littermates. Closed arrows indicate PKD2; open arrows indicate PKD1.

In this respect, the exons encoding the region of the protein recognized by the pan-PKD antisera will not be expressed in PKD2 GT/GT mice. Accordingly, Western blot analysis with the pan-PKD antisera of tissues from wild-type compared with PKD2 GT/GT mice will allow an assessment of which adult tissues express PKD1 and PKD2. Western blot analysis of lymphoid tissue extracts from PKD2 GT/GT mice, notably the thymus (Figure 3B), lymph nodes and spleen (Figure 3C), reveals that these tissues do not contain any protein reactive with the pan-PKD1/2 antisera, indicating that T- and B-lymphocytes, which comprise the majority of cells within these tissues, specifically express PKD2, but not PKD1. In contrast, deletion of PKD2 causes little or no loss of total PKD1/2 protein in the brain or the pancreas, but reduces overall PKD1/2 protein levels in heart and kidney (Figure 3C). Hence PKD1 and PKD2 are co-expressed in many adult tissues, albeit at differing levels, but one exception is in lymphoid cells, which appear to express only PKD2.

PKD2 is the major PKD isoform expressed in murine lymphocytes

The data suggesting that PKD2, but not PKD1, is selectively expressed in lymphoid cells in adult mice were confirmed by in vitro kinase assays, comparing total PKD1/2 catalytic activity in wild-type and PKD2 GT/GT thymocytes. Substantial peptide substrate phosphorylation was detected in PKD1/2 immunoprecipitates prepared from phorbol-ester-activated wild-type thymocytes, but not in those prepared from PKD2 GT/GT thymocytes (Figure 4A). Another well-characterized marker of endogenous PKD1/PKD2 catalytic activity is the phosphorylation status of a conserved C-terminal autophosphorylation site (Ser 916 in PKD1; Ser 873 in PKD2). As shown in Figure 4(B), phorbol-ester-activated wild-type, but not PKD2 GT/GT, thymocytes contain active autophosphorylated PKD1/2 proteins.

PKD2 is the dominant PKD isoform expressed in murine lymphoid tissue and cells

(A) Global PKD1/2 catalytic activity in wild-type and PKD2 mutant thymocytes. PKD1/2 proteins were immunoprecipitated from phorbol-ester-activated thymocytes using pan-PKD1/2 antisera [33] before PKD catalytic activity was measured in in vitro peptide substrate kinase assays, as described in the Materials and methods section. Data are from eight wild-type, eight PKD2 GT/GT and six PKD2 SSAA/SSAA thymi, assayed in three separate experiments. Results are means + S.E.M.; statistical significance was calculated using Student's t test. (B) PKD1/2 autophosphorylation on Ser 916 and Ser 873 respectively in whole-cell extracts prepared from control and phorbol-ester-activated wild-type, heterozygous and homozygous PKD2 GT and PKD2 SSAA thymocytes. Blots are representative of two independent experiments. (C) Analysis of PKD activation loop and ERK1/2 phosphorylation in whole-cell extracts prepared from wild-type, PKD2 GT/GT and PKD2 SSAA/SSAA splenocytes activated with phorbol ester for 15 min. Similar results were observed in phorbol-ester-activated thymocytes. (D) PKD activation loop and Ser 916 phosphorylation in wild-type, PKD1 −/− and PKD3 −/− DT40 B-cells activated with phorbol ester for 15 min. (E) Analysis of PKD activation loop phosphorylation in PKD3 immunoprecipitates prepared from untreated and phorbol-ester-activated wild-type and PKD2 GT/GT thymocytes. (F) PKD3 expression and catalytic activity in primary wild-type and homozygous PKD2 GT and PKD2 SSAA lymphoid cells. PKD3 proteins were immunoprecipitated from phorbol-ester-activated thymocytes using a specific anti-PKD3 antibody [34] before PKD3 catalytic activity was measured in in vitro peptide substrate kinase assays, as described in the Materials and methods section. Results are from four to six wild-type, PKD2 GT/GT and PKD2 SSAA/SSAA thymi, assayed in three separate experiments, and are means + S.E.M. Student's t test was used to calculate statistical significance.

We also examined total PKD1/2 catalytic activity in the PKD2 SSAA-knockin lymphocytes. There was no loss of PKD2 protein expression in homozygous PKD2 SSAA lymphocytes, but total PKD1/2 catalytic activity was severely impaired, as assessed by in vitro kinase assays of PKD2 proteins immunoprecipitated from phorbol-ester-stimulated wild-type compared with PKD2 SSAA/SSAA thymocytes (Figure 4A). Moreover, PKD2 SSAA mutant proteins could not effectively autophosphorylate on Ser 873 in response to phorbol ester stimulation (Figure 4B). We did, however, detect a small amount of residual PKD1/2 catalytic activity in activated PKD2 SSAA/SSAA thymocytes compared with PKD2 GT/GT thymocytes. As lymphoid cells do not express PKD1 (see above), low levels of PKD2 catalytic activity can apparently be induced in PKD2 SSAA mutant thymocytes independently of PKC-mediated phosphorylation of the PKD2 catalytic domain.

Strikingly, we observed that, although phorbol ester treatment induced significant PKD activation loop phosphorylation in wild-type lymphocytes, this was undetectable in whole-cell extracts prepared from phorbol-ester-stimulated PKD2 SSAA/SSAA or PKD2 GT/GT lymphocytes, despite normal induction of ERK1/2 (extracellular-signal-regulated kinase 1/2) phosphorylation (Figure 4C). This was surprising as PKD3 is expressed in mammalian lymphocytes [34] (Figure 4E) and the PKD activation loop segment is fully conserved in all three mammalian PKD isoforms. Hence PKD ‘activation loop’ phospho-specific antibodies should cross-react equally with PKD1 (phosphorylated on Ser 744/Ser 748), PKD2 (phosphorylated on Ser 707/Ser 711) and PKD3 (phosphorylated on Ser 730/Ser 734). To clarify whether the PKD activation loop phospho-specific antibody could indeed detect active phosphorylated PKD3 proteins, we analysed avian DT40 B-lymphocytes, which are known to express PKD3 and PKD1 at approximately equimolar levels, but which do not express PKD2 [4]. The data in Figure 4(D) show that deletion of PKD1 in DT40 B-lymphocytes results in a complete loss of PKD1 Ser 916 phosphorylation, but only a 2–3-fold reduction in total PKD activation loop phosphorylation levels, after phorbol ester stimulation. Similarly, in PKD3 −/− DT40 cells, there was no change in PKD1 Ser 916 phosphorylation and again only a 2–3-fold reduction in PKD activation loop phosphorylation levels after phorbol ester stimulation (Figure 4D). Thus PKD activation loop phospho-specific antibodies can very efficiently detect active phosphorylated PKD3 in cells that express high PKD3 protein levels. Indeed, when we immunoprecipitated PKD3 proteins from phorbol-ester-activated PKD2 GT/GT thymocytes, we were now able to detect PKD3 activation loop phosphorylation (Figure 4E). Accordingly, the failure to detect any PKD activation loop phosphorylation in whole-cell extracts prepared from PKD2 mutant lymphocytes argues that PKD3 does not make a large quantitative contribution to the total PKD protein pool in murine lymphoid cells/tissues.

Although PKD3 protein expression is very low in murine lymphocytes, it was important to assess whether the loss of PKD2 catalytic activity had any impact on PKD3 expression or activity. We therefore immunoprecipitated PKD3 from control and phorbol-ester-stimulated wild-type and PKD2 mutant thymocytes. In these experiments, we found that PKD3 expression levels and catalytic activity were normal in wild-type, PKD2 GT/GT and PKD2 SSAA/SSAA thymocytes (Figure 4F). Thus the loss of PKD2 expression and/or catalytic activity in lymphoid cells is not compensated for by increased expression/activity of other PKD isoforms. Nor is there any evidence that expression of a catalytically inactive PKD2 has any dominant-negative effect on PKD3 expression or activity.

PKD2 is dispensable for the development of mature T-lymphocytes

Since PKD2 is the major PKD isoform expressed in the thymus and in mature T-lymphocytes (results not shown), we asked whether PKD2 catalytic function is essential for normal T-lymphocyte development. Previous studies have shown that PKD can be activated by the pre-TCR in T-cell progenitors in the thymus and by the mature α/β TCR complex in peripheral T-cells [30,33]. To date, experiments exploring the consequence of PKD activity for TCR function have used transgenesis techniques to target an activated PKD mutant protein (comprising the catalytic domain of PKD1) to either the membrane or cytosol of pre-T-cells. These experiments demonstrate that constitutively active PKD signalling can substitute for the pre-TCR to drive T-cell proliferation and differentiation in recombinase-gene-null mice [30]. However, although experiments with such gain-of-function mutants can reveal the functional capacity of a protein kinase, they do not assess whether the kinase is essential for a particular biological process.

We therefore investigated T-cell development in PKD2 GT/GT and PKD2 SSAA/SSAA mice using flow cytometry to define the major thymic and peripheral T-cell subpopulations. Early T-lymphocyte progenitors in the thymus are DN for the MHC co-receptors CD4 and CD8. These DN progenitors undergo TCRβ locus rearrangements to produce a TCRβ polypeptide that permits cell-surface expression of the pre-TCR complex. The pre-TCR then supports the survival and rapid clonal expansion of DN progenitors along with their differentiation into CD4 + CD8 + DP thymocytes. TCRα chain gene rearrangements then occur, and cells that express a functional, but non-self-reactive, α/β TCR complex differentiate into either CD4 + or CD8 + SP T-lymphocytes. Thymi from PKD2 SSAA/SSAA and PKD2 GT/GT mice did not display any obvious abnormalities and contained normal numbers of DN, DP and mature SP thymocytes that express high levels of the mature α/β TCR complex (Figures 5A and 5B). Moreover, these mature SP thymocytes were able to exit the thymus and populate peripheral tissues normally, since the frequencies and total numbers of peripheral CD4 + and CD8 + T-lymphocytes in the lymph nodes and spleen of wild-type, PKD2 SSAA/SSAA and PKD2 GT/GT mice were comparable (Figures 5C and 5D, and results not shown). Similarly, B-lymphocyte development within the bone marrow occurred normally in the absence of PKD2 catalytic activity (results not shown), resulting in normal total numbers of mature IgM/IgD-expressing B-lymphocytes in the spleens and lymph nodes of PKD2 SSAA/SSAA and PKD2 GT/GT mice (Figures 5C and 5D, and results not shown). Thus PKD2 kinase activity is dispensable for the normal development of mature, peripheral T- and B-lymphocytes.