Chapter 9 DUXC peaks and LINE1s

To answer whether DUXC and DUX4 directly regulate L1s, we check whether the DUXC and DUX4 motifs are enriched in the peaks that overlap with L1s. To do so, we took the following steps:

  1. use the narrowed peaks (+/- 50 bp from the summit) and select the subset that overlaps with the differentially expressed L1s (LFC > 1; adjusted p-value < 0.05; compared with control luciferase samples)
  2. get the sequences of the selected peaks
  3. use MEME to find the DUXC/DUX4 motifs in these peaks
  4. use Biostrings::matchPattern() to test for motif enrichment

Results: DUXC’s L1-overlapped peaks have a palindromic motif, AATCAATTGATT (243/691); DUX4’s have the motif TAA(G/T/C)TCAATCA (431/999). Neither the full DUXC motif nor the full DUX4 motif is enriched in peaks overlapping with up-regulated L1s. However, 41% of DUXC and 47% of DUX4 L1-overlapped peaks contain the canonical motif AATCA; among those that do, ~70% have AATCA located within 25 bp of the summit.
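The claim that AATCAATTGATT is palindromic can be checked directly: a DNA motif is palindromic when it equals its own reverse complement. The snippet below is a minimal plain-Python sketch of that check, not the R/Bioconductor code used in this analysis; the helper names `reverse_complement` and `is_palindromic` are introduced here for illustration only.

```python
# Complement table for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(seq: str) -> bool:
    """A motif is palindromic if it reads the same on both strands."""
    return seq.upper() == reverse_complement(seq)


duxc_motif = "AATCAATTGATT"
print(is_palindromic(duxc_motif))   # True
print(reverse_complement("AATCA"))  # TGATT
```

The second line of output shows why: the motif is the half-site AATCA, a two-base spacer AT, and then TGATT, which is AATCA on the opposite strand.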

Load packages and datasets

9.1 Narrow Peaks and expressed L1s

  1. Narrow the peaks of CCH and hDUX4 to 101 bp (+/- 50 bp from the summits).
  2. Load the read counts, a DESeqDataSet instance of canFam3 RMSK:
  3. Find peaks overlapping with up-regulated L1s
  • DUXC: 691 out of 6031 narrowed peaks overlap with up-regulated L1s (11.46%)
  • DUX4: 17908 out of 85825 narrowed peaks overlap with up-regulated L1s (20.87%)
## DUXC monoclonal sample:
## 875 out of 6031 narrowed peaks are overlapping with L1s (14.51%).
## 691 out of 6031 narrowed peaks are overlapping with up-regulated L1s (11.46%).
## DUX4 monoclonal sample:
## 21888 out of 85825 narrowed peaks are overlapping with L1s (25.50%).
## 17908 out of 85825 narrowed peaks are overlapping with up-regulated L1s (20.87%).
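The narrowing and overlap-counting steps above can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the analysis code (which is R-based and not shown in this section): coordinates are taken to be 1-based and inclusive, the names `Interval`, `narrow_to_summit`, `count_overlapping` and `percent` are hypothetical, and the toy peaks and L1 intervals are made up for the example.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive (assumption)
    end: int    # 1-based, inclusive (assumption)


def narrow_to_summit(chrom: str, summit: int, flank: int = 50) -> Interval:
    """Keep summit +/- flank, giving a 2 * flank + 1 bp window (101 bp here)."""
    return Interval(chrom, max(1, summit - flank), summit + flank)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def count_overlapping(peaks, repeats) -> int:
    """Number of peaks that overlap at least one repeat element."""
    return sum(any(overlaps(p, r) for r in repeats) for p in peaks)


def percent(n: int, total: int) -> float:
    return round(100 * n / total, 2)


# Toy data (made up): three summits and two L1 elements.
peaks = [narrow_to_summit("chr1", 1000),
         narrow_to_summit("chr1", 5000),
         narrow_to_summit("chr2", 300)]
l1s = [Interval("chr1", 1040, 2000), Interval("chr2", 1000, 7000)]

n = count_overlapping(peaks, l1s)
print(n, "of", len(peaks), "peaks overlap an L1")  # 1 of 3 peaks overlap an L1

# The percentages reported above follow from the same arithmetic.
print(percent(691, 6031), percent(17908, 85825))   # 11.46 20.87
```

The nested loop is quadratic and only suitable for toy input; for tens of thousands of peaks an interval index (as provided by genomic-range libraries) would be the practical choice.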

9.1.1 Motif discovery using L1-overlapped peaks

The code below writes FASTA files for the narrowed peaks that overlap with up-regulated L1s and runs MEME to examine the enrichment of DUXC/DUX4 motifs. DUXC’s L1-overlapped peaks have a palindromic motif, AATCAATTGATT (243/691); DUX4’s L1-overlapped peaks have the motif TAA(G/T/C)TCAATCA (431/999). Both motifs contain the canonical second half of the DUX4/DUXC motif, AATCA. Does that mean both DUXC and DUX4 were attracted to this half motif, creating artifactual peaks?

The top motif for DUXC L1-overlapped peaks is AATCA AT TGATT (243/691); the top motif for DUX4 L1-overlapped peaks is TAA G/T/C T C AATCA (431/999). Both share the canonical DUX4/DUXC/mDux motif, AATCA, but neither matches the whole DUXC or DUX4 motif.
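As a rough illustration of the FASTA and MEME step, the sketch below formats peak sequences as FASTA text and assembles a MEME command line without running it. It is plain Python rather than the original code, and every parameter value (number of motifs, motif widths, the `zoops` model, searching both strands) is an assumption, since the settings actually used are not stated here. The function names, file name and output directory are hypothetical.

```python
def to_fasta(records, width: int = 60) -> str:
    """Format (name, sequence) pairs as FASTA text, wrapping at `width`."""
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def meme_command(fasta_path: str, out_dir: str,
                 nmotifs: int = 3, minw: int = 6, maxw: int = 15):
    """Build (but do not run) a MEME call; all values are assumed defaults."""
    return ["meme", fasta_path,
            "-dna",               # DNA alphabet
            "-revcomp",           # also consider the reverse strand
            "-mod", "zoops",      # zero or one motif occurrence per sequence
            "-nmotifs", str(nmotifs),
            "-minw", str(minw), "-maxw", str(maxw),
            "-oc", out_dir]       # output directory, overwritten if present


# Toy record (made-up sequence and name).
records = [("peak_1", "GGAATCAATTGATTCC")]
print(to_fasta(records), end="")
print(" ".join(meme_command("duxc_l1_peaks.fa", "meme_duxc")))
```

The command list could be handed to `subprocess.run` once MEME is installed; it is left unexecuted here so the sketch has no external dependency.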

[Figure: DUXC L1-overlapped peaks motif (two images)]

9.1.2 Match DUX4 and DUXC motif to L1-overlapped peaks

Here I break the DUX4 and DUXC motifs into two separate parts: the first part is the diverged one and the second is the canonical AATCA. I then use Biostrings::matchPattern() to check whether the L1-overlapped peaks contain the short forms (each part alone) and the continuous form (the full motif). About 41% (DUXC) and 47% (DUX4) of these peaks contain the canonical motif AATCA.

## DUXC mono: 41.18 % have motif AATCA (canonical 2nd motif)
## DUX4 mono: 47.00 % have motif AATCA (canonical 2nd motif)
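The percentages above come from exact-string matching of a short motif against each peak sequence. A minimal plain-Python sketch of that calculation is shown below; it stands in for, and is not, the Biostrings-based code. Whether the original search covered both strands is not stated, so the `both_strands` switch is an assumption, and the function names and toy sequences are made up.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def contains_motif(seq: str, motif: str, both_strands: bool = True) -> bool:
    """Exact match of `motif` in `seq`, optionally also on the reverse strand."""
    seq, motif = seq.upper(), motif.upper()
    if motif in seq:
        return True
    return both_strands and reverse_complement(motif) in seq


def percent_with_motif(seqs, motif: str, both_strands: bool = True) -> float:
    """Percentage of sequences that contain the motif at least once."""
    hits = sum(contains_motif(s, motif, both_strands) for s in seqs)
    return round(100 * hits / len(seqs), 2)


# Toy sequences: one forward hit, one reverse-strand hit (TGATT), two misses.
seqs = ["GGAATCAGG", "CCTGATTCC", "GGGGCCCC", "ACGTACGT"]
print(percent_with_motif(seqs, "AATCA", both_strands=False))  # 25.0
print(percent_with_motif(seqs, "AATCA", both_strands=True))   # 50.0
```

Because a 5-mer occurs often by chance in 101 bp of sequence, a percentage like this is best read against a background set of comparable sequences rather than on its own.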

9.1.3 Location of AATCA on L1-overlapped peaks

About 70% of AATCA occurrences (67% and 69% in the two samples) are located within 25 bp of the summit.

## 67% between [25, 75]  (summit is at 50)

## 69% between [25, 75]  (summit is at 50)
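One way to arrive at such a figure is to record the start position of every AATCA match in each 101-bp peak and count how many fall in the window [25, 75]. The sketch below does this in plain Python; it is not the original code, and it assumes 1-based match starts on the forward strand only, which may differ from how positions were defined in the analysis. The function names and toy sequences are hypothetical.

```python
def match_starts(seq: str, motif: str):
    """1-based start positions of all (possibly overlapping) exact matches."""
    starts = []
    i = seq.find(motif)
    while i != -1:
        starts.append(i + 1)
        i = seq.find(motif, i + 1)
    return starts


def percent_near_summit(seqs, motif: str, lo: int = 25, hi: int = 75) -> float:
    """Percentage of motif matches whose start lies within [lo, hi]."""
    starts = [s for seq in seqs for s in match_starts(seq.upper(), motif)]
    if not starts:
        return 0.0
    near = sum(lo <= s <= hi for s in starts)
    return round(100 * near / len(starts), 2)


# Toy 101-bp sequences with AATCA starting at positions 45, 1 and 61.
seqs = ["G" * 44 + "AATCA" + "G" * 52,
        "AATCA" + "G" * 96,
        "G" * 60 + "AATCA" + "G" * 36]
print([match_starts(s, "AATCA") for s in seqs])  # [[45], [1], [61]]
print(percent_near_summit(seqs, "AATCA"))        # 66.67
```

For reference, the window [25, 75] spans 51 of the 97 possible start positions for a 5-mer in a 101-bp sequence, so roughly 53% of matches would land there if positions were uniform.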