Update Snakefile

Alexis Mergez authored 1 year ago

Moved reference haplotype to data/haplotypes folder.
Removing reference folder

2c495cac

Pan1chr

Snakemake workflow for building a pangenome at the chromosome scale. Tools used within the workflow:

PanGeTools: https://forgemia.inra.fr/alexis.mergez/pangetools
PanGraTools: https://forgemia.inra.fr/alexis.mergez/pangratools

The workflow uses the following file architecture:

pan1c
├── config.yaml
├── copyHaplotypes.sh
├── data
│   ├── haplotypes
│   └── reference
├── runSnakemake.sh
├── scripts
│   ├── bin_split.py
│   └── statsAggregation.py
├── Snakefile
└── README.md

Prepare your data

This workflow accepts chromosome-level as well as contig-level assemblies.
FASTA files must be compressed with bgzip2 (included in PanGeTools). Sequence names should follow the pattern <haplotype name>#<ctg|chr name> (for example, CHM13#chr01).
Make your input files read-only to prevent Snakemake from modifying them.
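Before launching the workflow, the input files can be checked against these two requirements. The Python sketch below assumes that the bgzip2 tool mentioned above writes standard BGZF output (as htslib's bgzip does), that only the first whitespace-separated token of a FASTA header is the sequence name, and that a name contains exactly one '#'. NAME_PATTERN, is_bgzf and invalid_names are hypothetical helpers, not part of the workflow.

```python
import gzip
import re

# Assumed reading of the naming rule: a non-empty haplotype name, one '#',
# then a non-empty contig/chromosome name, with no whitespace.
NAME_PATTERN = re.compile(r"^[^#\s]+#[^#\s]+$")


def is_bgzf(path):
    """Return True if `path` starts with a BGZF block header."""
    with open(path, "rb") as handle:
        header = handle.read(18)
    return (
        len(header) >= 14
        and header[:2] == b"\x1f\x8b"   # gzip magic bytes
        and bool(header[3] & 0x04)      # FEXTRA flag: an extra field is present
        and header[12:14] == b"BC"      # BGZF subfield identifier
    )


def invalid_names(path):
    """Return the sequence IDs of a (b)gzipped FASTA that do not match NAME_PATTERN."""
    bad = []
    # BGZF is a series of standard gzip members, so the gzip module can read it.
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                tokens = line[1:].split()
                seq_id = tokens[0] if tokens else ""
                if not NAME_PATTERN.match(seq_id):
                    bad.append(seq_id)
    return bad
```

For example, running invalid_names on each file in data/haplotypes and stopping on a non-empty result catches naming problems before Snakemake starts. Relax NAME_PATTERN if your haplotype names legitimately contain more than one '#'.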
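A minimal way to do this from Python, assuming all inputs sit in one folder such as data/haplotypes; make_read_only is a hypothetical helper. It clears the write bits for user, group and others, which is what `chmod a-w` does from the shell.

```python
import stat
from pathlib import Path


def make_read_only(folder):
    """Clear the write bits of every regular file in `folder`; return the files changed.

    Subdirectories are skipped. Path.chmod follows symlinks, so a symlinked
    input has its target made read-only.
    """
    changed = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            mode = path.stat().st_mode
            path.chmod(mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
            changed.append(path)
    return changed
```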

Usage

Create the data/haplotypes and data/reference folders as shown in the file architecture above.
Put all haplotypes in data/haplotypes and symlink the reference file into data/reference. If you don't want the reference to appear in the graphs, remove it from data/haplotypes.
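These two steps can be scripted. The sketch below is a hypothetical Python helper (prepare_data), not the copyHaplotypes.sh script listed in the file architecture. It assumes the reference FASTA is one of the haplotype files you pass in, copies the haplotypes, and links the reference by absolute path so that the link stays valid when the reference is left out of data/haplotypes.

```python
import shutil
from pathlib import Path


def prepare_data(root, haplotype_files, reference, include_reference=True):
    """Hypothetical helper: fill <root>/data/haplotypes and <root>/data/reference.

    include_reference=False keeps the reference out of data/haplotypes,
    i.e. out of the graphs, while still linking it in data/reference.
    """
    root = Path(root)
    hap_dir = root / "data" / "haplotypes"
    ref_dir = root / "data" / "reference"
    hap_dir.mkdir(parents=True, exist_ok=True)
    ref_dir.mkdir(parents=True, exist_ok=True)

    reference = Path(reference).resolve()
    for fasta in map(Path, haplotype_files):
        if not include_reference and fasta.resolve() == reference:
            continue
        target = hap_dir / fasta.name
        # Skip files already in place (they may be read-only and cannot be overwritten).
        if not target.exists():
            shutil.copy2(fasta, target)

    # Absolute symlink to the original reference, replaced if it already exists.
    link = ref_dir / reference.name
    if link.is_symlink() or link.exists():
        link.unlink()
    link.symlink_to(reference)
    return hap_dir, ref_dir
```

For example, `prepare_data(".", ["/path/to/CHM13.fa.gz", "/path/to/HG002.fa.gz"], "/path/to/CHM13.fa.gz", include_reference=False)` would keep CHM13 out of the graphs while still exposing it in data/reference (the paths are placeholders).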

Set the reference name and the Apptainer image path in config.yaml.
Adjust the variables in runSnakemake.sh to your needs (job name, mail, etc.), then run runSnakemake.sh and wait!