Skip to content
Snippets Groups Projects
README.md 3.62 KiB
Newer Older
Guillaume Devailly's avatar
Guillaume Devailly committed
# PigHeaT RRBS analysis
Guillaume Devailly's avatar
Guillaume Devailly committed

Analysis of PigHeaT RRBS data.

Whole-blood samples were taken form back-cross pigs Large White x (Large White x Créole pigs) that were raised in either a temperate or a tropical facility.
Guillaume Devailly's avatar
Guillaume Devailly committed

Reads were processed by Sonia using Remi's RRBS pipeline: [private link to pipeline](https://github.com/seynard/pipeline_RRBS/tree/main)
Guillaume Devailly's avatar
Guillaume Devailly committed

## 01. metadata cleaning
Guillaume Devailly's avatar
Guillaume Devailly committed
[Link to script](01_data_import.R)

Over the **479 initial RRBS samples**, one file was not successfully processed (over-covered, not enough allocated compute ressources).
Two fruther libraries had 0 CpG sites covered with at least 10 reads: GC45472 and URZ3149.
Metadata was gathered from the [PigHeaT sharepoint](https://sites.inra.fr/site/pig-heat/) for the remainig 476 samples, leading to [this metadata table](data/metadata_rrbs_phenotypes.tsv).

## 02. BED import into a BSseq object
Guillaume Devailly's avatar
Guillaume Devailly committed
[Link to script](02_data_import_genobioinfo.R)

BED import into a BSseq object was performed on the [GenoBioinfo cluster](https://bioinfo.genotoul.fr/) as following:
```bash
sbatch 02_data_import_genobioinfo.R
```

## 03. BSseq filtering
Guillaume Devailly's avatar
Guillaume Devailly committed
[Link to script](03_bsseq_cleaning.R)

BED filtering was performed on the [GenoBioinfo cluster](https://bioinfo.genotoul.fr/) as following:
```bash
sbatch 03_bsseq_cleaning.R
```

The initial object was a mostly empty matrix of 477 RRBS samples over 3.155.227, strand-merged, CpG sites.
Only CpG sites with at least 10 reads in more than 80% of the samples (more than 381 samples) were kept, resulting in 221.580 CpG sites.
Only RRBS samples with at least 10 reads in more than 30% of the 221.580 CpG sites (more than 66.474 CpG sites) were kept, resulting in 435 RRBS samples.
CpG sites that were overlapping biscuit-identified SNP were removed, filtering an additional 479 CpG.
The filtered object was consituted of **221.357 CpG sites and 435 RRBS samples**.

## 04. Differential analysis using DSS
Guillaume Devailly's avatar
Guillaume Devailly committed
[Link to script](04_differential_analysis.R)
Average DNA methylation was oberved on the 221.357 CpG sites.

![Histogram of % DNA methylation at CpH sites](plots/meanhist_rrbs_pigheat.svg)

Guillaume Devailly's avatar
Guillaume Devailly committed
We observed a bimodal distribution of DNA methylation.

The link between the average and the Standard Deviation of the DNA methylation levels was also investigated:

![Histogram of % DNA methylation at CpH sites](plots/meansd_rrbs_pigheat.svg)

Only variable CpG sites (SD > 0.05) were kept for the differential analysis, **keeping 69.854 CpG sites**.
Guillaume Devailly's avatar
Guillaume Devailly committed

Differential analysis was perfomed using [DSS](https://www.bioconductor.org/packages/DSS/) v 2.54.0, using unsmoothed DNA methylation data,
`DMLfit.multiFactor` on `~ environment + sex`, and FDR multiple-testing correction.
This resulted in **179 (0.26%) CpG sites differentially methylated between the two environments**, 
and **2110 (3.0%) CpG sites differentially methylated between the sexes**. 
23.6% (498) of the sex-DMC were on the X chromosome.
Guillaume Devailly's avatar
Guillaume Devailly committed

DMC were exported as `.bed` files: [env DMC](data/dmc_env_sscrofa11.1.bed), [sex DMC](data/dmc_sex_sscrofa11.1.bed).

![IGV tracks](plots/dmc_igv_snapshot.png)

Here are two env DMC examples.

![env DMC 1](plots/dmc_env_10_19399360.svg)
![env DMC 2](plots/dmc_env_6_87725135.svg)

Here is one sex DMC example.
Guillaume Devailly's avatar
Guillaume Devailly committed
![sex DMC](plots/dmc_sex_X_101283728.svg)
Guillaume Devailly's avatar
Guillaume Devailly committed

Guillaume Devailly's avatar
Guillaume Devailly committed
The heatmap of sex DMC is ok-ish:

![sex heatmap](plots/heatmap_dmc_sex.svg)

While the heatmap for env DMC is ugly due to high individual variability:

![env heatmap](plots/heatmap_dmc_env.svg)

(Grey = not enough coverage to compute % meCpG)

Mahattan plots of DMC were also generated, were the genome wide threshold in red is the FDR = 0.05 threshold.

![env manhattan](plots/manhattan_env.png)

![sex manhattan](plots/manhattan_sex.png)