Chapter 7 LTR activation

Previously @ref{repeats}, we showd the repeat elemnets profiling which revealed LTR activation by DUXC and DUX4 in the canine model. Here we investigated the subfamilies of LTR - ERVL and ERVL-MaLR. In the canine model, DUX4 and DUXC both activate elements form MaLR and ERVL. In addition, we found that DUXC prones to activate more ERVL elements (MLT2D, MER76, HERVL74, LTR1A2, LTR101, LTR33C) wheareas DUX4 to more MaLR elements (MLT1A/B/E2/J).

In this chapter, we (1) explain the background of ERVL and ERVL-MaLR subfamilies in human and murine, (2) exam ERVL and ERVL-MaLR activation by DUXC and DUX4 in canine myoblast, and (3) compare to the LTR subfamily activation in human myoblast.

Load and datsets and libraries

7.1 Takaway

The ERVL-MaLR elements in canine are categorized as MLT1, presenting ancient mammalian LTRs. Most of the MLT1s annotated in the canine genome were mobilized before the split of common ancient mammalian of canine and primate lineages, except MLT1A0 (-int) and MLT1A1 (-int) that might have become more mobilized in the primate lineage. Compared with DUXC, DUX4 induces higher expression levels of MLT1s in canine, though its affinity toward MLT1s has no apparent correlation with the copy number or mobilization in the primate lineage.

ERVL1: ERV1 (denoted LTR* and MER*) are enriched in both DUX4 and DUXC expressed canine skeletal muscle; 50% and 18% of ERVL1 elements are positively affected by DUXC and DUXC, respectively. DUX4 induced more carnivore-specific ERV1 (CarLTR, CfERV1, CdERVF) than DUXC.

ERVL-MaLR: Studies indicate that DUX4 uses MLT1 as promoter to re-program nearby genes’s structure during cleavage-stage and FSHD affected skeletal muscle. MLT1 presents ancient mammalian LTRs whereas THE1 and MST radiats in primate lineage. In canine genome, most of the ERVL-MaLR elements are MLT1 and mobilized before the primate lineages; the copy numbers of MLT1s existed in canine genome are compariable to that of in human genome, excpet MLT1A1 (2.2) and MLT1A0-int (>100), both of which are DUX4 preferred repeats for retrontransposon-mediated transcriptome in humand and canine myoblosts. DUX4 and DUXC positively affect 51% and 25% of ERVLs, respectively.

ERVL: (denoted MLT2, MER* and LTR*) DUX4 and DUXC positively affect 49% and 23% of ERVLs, respectively.

7.2 background in murine and human

Though sharing common ancestry, MuERV-L (denoted MT2_Mm and MT2) evolved differntly from MaLR in murine. The ERVL-MaLR include three major subfamilies: MLT, ORR1 and MT. MLT; MLT1 and MLT2, present ancenstral mammalian LTR; ORR1 and MT are the youngest repeats in rodent, and they radiate during rodent evolution. (Note: MT, ORR1 nd MT2 radiate during rodent evolution.)

Whidden et al. (Jennifer L. Whiddon 2017), Hackdrison et al. (???) and Franke et al (Vedran Franke 2017) have investigated the activities of the mouse endogenous retrovirus type-L (MuERV-L) and nonautonomous mammalian (ERVL-MaLR) subfamilies under the influence of Dux and early embroynic activation. The studies confirm the species-specific preference on activation of retrotransposon subfamilies during the 2C-like (cleavage) stage or Dux/DUX4 over-expression: human bound to ancestral (MLT1) and primate-specific (THE1, MST) ERVL-MaLR subfamilies whereas murine to MuERV-L and young submailies of MaLR that evolved during rodent evolution.

Frank 2015 (Vedran Franke 2017) showed that one of the ERVL functions is gene-remodeling via four co-options:
1. 5’ exon via retrotransoson-derived spliced donor (SD)
2. 3’ exon via retrotransposon-derived SD or poly-(A)
3. internal exon via retrotransposon-derived SD or DA
4. intraexonic, not contribute to mRNA process

In general, the younger ERVL subfamilies (Vedran Franke 2017) - ORR1, MT, MT2 - showed higher propotion of 5’exon co-option than older family (MLT) in murine. Human bound to use more MLT1 (subfamily of MaLR) 5’exon co-option, contributing MLT to promotor and first exons. Unlike rodent, human does not use MT2 for 5’exon co-option; Rodent uses more MT (subfamily of MaLR) and MT2 (subfamily of MuERL-V) 5’exon co-option. Whidden et al. (Jennifer L. Whiddon 2017) confirmed that Dux and DUX4 express different subsets of MaLR transcripts in C2C12: Dux uses MT2 (MuERL-V) and MT as promoters whereas DUX4 uses MLT as promoters to activate the targets.

Previous chapter @ref{repeats} stated that DUXC showed higher propotion of ERVL differential expression whereas DUX4 showed higher propotion in that of MaLR. Althought we observe differntial expression in MaLR in DUXC and DUX4, the question is are they in distinct subfamilies of MaLR?

7.3 ERVL subfamily activation in canine

Scatter plot of log2FC of DUXC and DUX4

Figure 7.1: Scatter plot of log2FC of DUXC and DUX4

7.4 MaLR activation in canine

Scatter plot of log2FC of DUXC and DUX4

Figure 7.2: Scatter plot of log2FC of DUXC and DUX4

Visualizaiton of the number of ERVL subfamilies activated by DUX4 and DUXC in canine.

7.5 Compare to CinC and HinC to HinH

Here we compared LTR retroelment activation of CinC and HinC to that of HinH. We aimed to answer whether copy number is associated with higher induction by DUX4 than by DUXC? Update: The answer is no. Copy number is not associated with higher induction of DUX4 nor DUXC.

7.5.1 Copy number

The code below estimates the copy number of LTR subfamilies in hg38 and canFam3.

##      repName n.canFam3 n.hg38    ratio
## 1    MLT-int       171    394 2.304094
## 2   MLT1-int       226    251 1.110619
## 3      MLT1A      8364   9312 1.113343
## 4  MLT1A-int       371    456 1.229111
## 5     MLT1A0     11780  21235 1.802632
## 6 MLT1A0-int       414    971 2.345411

Tidy up activated LTR retroelements MaLR, ERVL, and ERV1) in HinC, CinC and HinH together.

7.5.2 ERVL-MaLR

The bar plot below displays signicantly expressed ERVL-MaLR elements in either CinC, HinC or HinH (\(adj-p < 0.05\)). Only MLT1A1 and MLT1A0-int have higher copy number in human (ratio > 2) compared to canine (ratio > 0), but they don’t have higher induction in human.

repName expression (logFC) in ERVL-MaLR family in CinC, HinC and HinH; Orange indicates up/down-regulation; * indicates higher copy number in human.

Figure 7.3: repName expression (logFC) in ERVL-MaLR family in CinC, HinC and HinH; Orange indicates up/down-regulation; * indicates higher copy number in human.

7.5.3 ERVL

Shown below is the significently expressed ERVL elememts expression in either CinC, HinC, or HinH. MER21B and MER21-int have highr copy number in human.

repName expression (logFC) in ERVL family in CinC, HinC and HinH; Orange indicates up/down-regulation; * indicates higher copy number in human.

Figure 7.4: repName expression (logFC) in ERVL family in CinC, HinC and HinH; Orange indicates up/down-regulation; * indicates higher copy number in human.

7.5.4 ERV1

Shown below is the significently expressed ERV1 elements in either CinC, HinC, or HinH. Suprisingly, DUX4 up-regulated more carnior-specific retroelement than DUXC in canine.

repName expression (logFC) in ERV1 family in CinC, HinC and HinH; Orange indicates up/down-regulation; * indicates higher copy number in human. # indicates the copy number ratio (human/canine) is not available.

Figure 7.5: repName expression (logFC) in ERV1 family in CinC, HinC and HinH; Orange indicates up/down-regulation; * indicates higher copy number in human. # indicates the copy number ratio (human/canine) is not available.

7.6 DUXC-ALT and ERVL transcriptions

Does DUXC-ALT induce ERVL expression? There are 683 repeat elements exhibiting some expression in DUXC-ALT expressed cell line. The binding sites of DUXC-ALT suggested that DUXC-ALT mostly binds to LTR retroelements. We aim to exam further into DUXC-ALT’s function by looking at the expession fold changes.

7.6.1 ERV subfamily expression

Shown below is the histogram of DUXC-ALT’s repeat elements expression. Most the expression of the expression of repeat elements do not have apparent changes by DUXC-ALT, however, some ERV transcripts are present with mild fold changes. This histogram below shows that a few repeat element transcripts are present with mild fold changes.

Histogram of log2 Fold Change of CALTinC repeat elements

Figure 7.6: Histogram of log2 Fold Change of CALTinC repeat elements

Disregard the p-adj and p-value, we listed the repeat elements with log2 fold change greater and less then 0.5, and most of them are ERVL1. This might support the peak calling of DUXC-Alt where more than 60% of binding sites are overlapping with LTRs.

Table 7.1: Repeat element with log2FC > 0.5 in DUXC-Alt.
repName log2FoldChange pvalue padj repClass repFamily
CarERV2b1_LTR 0.7129453 1 1 LTR ERV1
CarLTR1A1C 0.6510292 1 1 LTR ERV1
LTR109A2 0.8342682 1 1 LTR ERV1
LTR55 0.6147695 1 1 LTR ERVL?
LTR72B_CF 0.6561270 1 1 LTR ERV1
LTR81B 0.5263868 1 1 LTR Gypsy
LTR90B 0.9274856 1 1 LTR LTR
MamGypLTR3 0.7477494 1 1 LTR Gypsy
MamGypLTR3a 0.6591307 1 1 LTR Gypsy
MER101_CF 0.5369590 1 1 LTR ERV1
MER88 0.8910420 1 1 LTR ERVL
MLT2E 0.6513342 1 1 LTR ERVL
Table 7.2: Repeat element with log2FC < -1 in DUXC-Alt.
repName log2FoldChange pvalue padj repClass repFamily
(CATCG)n -1.378227 0.8176728 1 Simple_repeat Simple_repeat
(CCCCAA)n -1.329845 0.8430315 1 Simple_repeat Simple_repeat
(CCGGA)n -1.430167 0.7516265 1 Simple_repeat Simple_repeat
(GAGAA)n -3.657527 0.4001370 1 Simple_repeat Simple_repeat
(GTGTG)n -1.087761 0.9440174 1 Simple_repeat Simple_repeat
(TAGG)n -2.614182 0.3792413 1 Simple_repeat Simple_repeat
(TTCGGG)n -1.300816 0.4825708 1 Simple_repeat Simple_repeat
5S -1.405084 0.6721832 1 rRNA rRNA
7SLRNA -1.598665 0.0323175 1 srpRNA srpRNA
CarLTR1A2B -1.363858 0.7539420 1 LTR ERV1
CfERV1b_LTR -1.194750 0.7512672 1 LTR ERV1
CfERVF1_LTR -1.222406 0.7152418 1 LTR ERV1
DNA1_Mam -1.104721 0.9054235 1 DNA TcMar
Eulor5A -1.665583 0.6233808 1 DNA? DNA?
L1M3b -1.215819 0.6528945 1 LINE L1
L1M3e -1.278423 0.6166096 1 LINE L1
LTR109A1 -1.536070 0.6485770 1 LTR ERV1
LTR81C -1.375129 0.5548298 1 LTR Gypsy
LTR91 -1.879468 0.5683125 1 LTR ERVL
MamGypLTR2c -1.111055 0.7839170 1 LTR Gypsy
MER45R -1.272985 0.6944189 1 DNA hAT-Tip100
MER74C -1.256894 0.7651719 1 LTR ERVL
MER87 -1.367775 0.8284964 1 LTR ERV1
MLT1A0-int -1.083928 0.8646201 1 LTR ERVL-MaLR
MLT1A1 -1.892689 0.5746772 1 LTR ERVL-MaLR
MLT1G1-int -1.273883 0.5981955 1 LTR ERVL-MaLR

DUXC-Alt suppressed many of ERVL and ERVL-MaLR (MLT1) elements, though weak p-values are presented.

# References

Jennifer L. Whiddon, Chao-Jen Wong, Ashlee T. Langford1. 2017. “Conservation and Innovation in the Dux4-Family Gene Network.” Nature Genetic. https://doi.org/10.1038/ng.3846.

Vedran Franke, Rosa Karlic, Sravya Ganesh. 2017. “Long Terminal Repeats Power Evolution of Genes and Gene Expression Programs in Mammalian Oocytes and Zygotes.” Genomic Research, no. 27. https://doi.org/10.1101/gr.216150.116.