Alexis Mergez authored
Moved reference haplotype to data/haplotypes folder.
Removing reference folder
2c495cac

Pan1chr

Snakemake workflow for building a pangenome at the chromosome scale. Tools used within the workflow:

The workflow's file layout is as follows:

pan1c
├── config.yaml
├── copyHaplotypes.sh
├── data
│   ├── haplotypes
│   └── reference
├── runSnakemake.sh
├── scripts
│   ├── bin_split.py
│   └── statsAggregation.py
├── Snakefile
└── README.md

Prepare your data

This workflow accepts chromosome-level as well as contig-level assemblies.
FASTA files must be compressed with bgzip (included in PanGeTools). Sequence names should follow the pattern <haplotype name>#<ctg|chr name> (for example, CHM13#chr01).
Make your input files read-only to prevent Snakemake from modifying them.
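The naming rule and the read-only advice can both be checked before launching the workflow. The following is a minimal Python sketch, not part of the repository: the script name check_inputs.py and both helper functions are hypothetical, and it assumes that each side of the # must be non-empty and free of whitespace and further # characters. Because bgzip output is a valid multi-member gzip stream, Python's standard gzip module can read it directly.

```python
import gzip
import os
import re
import stat
import sys

# Assumed reading of the README rule <haplotype name>#<ctg|chr name> (e.g. CHM13#chr01):
# both parts non-empty, with no whitespace and no extra '#'.
NAME_PATTERN = re.compile(r"^[^#\s]+#[^#\s]+$")


def invalid_headers(fasta_gz_path):
    """Return the sequence names of a bgzip/gzip FASTA that do not match NAME_PATTERN."""
    bad = []
    with gzip.open(fasta_gz_path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                # The sequence name is the first whitespace-delimited token after '>'.
                tokens = line[1:].split(maxsplit=1)
                name = tokens[0] if tokens else ""
                if not NAME_PATTERN.match(name):
                    bad.append(name)
    return bad


def make_read_only(path):
    """Clear the write bits for user, group and others, keeping the other mode bits."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


if __name__ == "__main__":
    # Hypothetical usage: python check_inputs.py data/haplotypes/*.fa.gz
    for path in sys.argv[1:]:
        bad = invalid_headers(path)
        if bad:
            print(f"{path}: {len(bad)} badly named sequence(s), e.g. {bad[0]!r}")
        else:
            make_read_only(path)
            print(f"{path}: OK, now read-only")
```

Files are only locked once all their sequence names pass the check, so a file that still needs renaming stays writable.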

Usage

Create data/haplotypes and data/reference as shown in the layout above.
Place all haplotypes in data/haplotypes and symlink the reference file into data/reference. If you don't want the reference to appear in the graphs, remove it from data/haplotypes.
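Creating the folders and the reference symlink can also be scripted. The sketch below is a hypothetical Python helper: prepare_layout is not part of the repository and does not reproduce whatever copyHaplotypes.sh does. It creates both folders, links the reference through an absolute path, and assumes the haplotype files are placed in data/haplotypes separately.

```python
from pathlib import Path


def prepare_layout(workdir, reference_fasta):
    """Create data/haplotypes and data/reference under workdir and symlink the reference.

    Returns the path of the created symlink inside data/reference.
    """
    data = Path(workdir) / "data"
    (data / "haplotypes").mkdir(parents=True, exist_ok=True)
    reference_dir = data / "reference"
    reference_dir.mkdir(parents=True, exist_ok=True)

    # Absolute target, so the link stays valid regardless of the current directory.
    target = Path(reference_fasta).resolve()
    link = reference_dir / target.name
    if link.is_symlink():
        link.unlink()  # replace a link left over from a previous run
    elif link.exists():
        # Never delete a real file that happens to sit in data/reference.
        raise FileExistsError(f"{link} exists and is not a symlink")
    link.symlink_to(target)
    return link
```

For example, prepare_layout(".", "/path/to/CHM13.fa.gz") would create data/reference/CHM13.fa.gz pointing at that file. Since data/reference only holds a link, the target must stay in place: if you remove the reference from data/haplotypes to keep it out of the graphs, keep the actual file somewhere else.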

Set the reference name and the Apptainer image path in config.yaml.
Adjust the variables in runSnakemake.sh to your needs (job name, mail, etc.), then run runSnakemake.sh and wait!