Pan1chr
Snakemake workflow for building a pangenome at the chromosome scale. Tools used within the workflow:
- PanGeTools : https://forgemia.inra.fr/alexis.mergez/pangetools
- PanGraTools : https://forgemia.inra.fr/alexis.mergez/pangratools
The file architecture for the workflow is as follows:
pan1c
├── config.yaml
├── copyHaplotypes.sh
├── data
│ ├── haplotypes
│ └── reference
├── runSnakemake.sh
├── scripts
│ ├── bin_split.py
│ └── statsAggregation.py
├── Snakefile
└── README.md
Prepare your data
This workflow accepts chromosome-level assemblies as well as contig-level assemblies.
FASTA files must be compressed with bgzip (included in PanGeTools).
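A plain gzip file is easy to mistake for a bgzip one, so a quick check before launching the workflow can save a failed run. The sketch below is not part of the workflow: `looks_like_bgzf` is a hypothetical helper that inspects the first 16 bytes of a file for the BGZF block header that bgzip writes (gzip magic, extra-field flag, and a `BC` subfield).

```python
import struct


def looks_like_bgzf(path):
    """Return True if the file starts with a BGZF block header."""
    with open(path, "rb") as handle:
        header = handle.read(16)
    if len(header) < 16:
        return False
    # gzip magic (1f 8b), deflate method (08), FEXTRA flag set (bit 2 of FLG)
    if header[0:3] != b"\x1f\x8b\x08" or not header[3] & 4:
        return False
    # Extra field of length 6 holding the 'BC' subfield with a 2-byte payload
    (xlen,) = struct.unpack("<H", header[10:12])
    (slen,) = struct.unpack("<H", header[14:16])
    return xlen == 6 and header[12:14] == b"BC" and slen == 2
```

Calling `looks_like_bgzf` on each file in data/haplotypes flags any file that was compressed with plain gzip instead.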
Sequence names should follow the pattern <haplotype name>#<ctg|chr name> (for example, CHM13#chr01).
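The naming pattern can be checked before running anything. The sketch below is illustrative only: `split_sequence_name` and `invalid_names` are hypothetical helpers, and they assume that a valid name contains exactly one `#` with non-empty text on both sides, which the pattern suggests but does not state explicitly. Because bgzip output is gzip-compatible, the standard `gzip` module can read it.

```python
import gzip


def split_sequence_name(name):
    """Split '<haplotype name>#<ctg|chr name>' into (haplotype, sequence).

    Raises ValueError when the name does not match the assumed pattern.
    """
    haplotype, sep, sequence = name.partition("#")
    # Assumption: exactly one '#', with non-empty text on both sides
    if not sep or not haplotype or not sequence or "#" in sequence:
        raise ValueError(f"unexpected sequence name: {name!r}")
    return haplotype, sequence


def invalid_names(fasta_gz):
    """Return the sequence names in a compressed FASTA that break the pattern."""
    bad = []
    with gzip.open(fasta_gz, "rt") as handle:
        for line in handle:
            if not line.startswith(">"):
                continue
            # The name is the first whitespace-delimited token after '>'
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            try:
                split_sequence_name(name)
            except ValueError:
                bad.append(name)
    return bad
```

An empty list from `invalid_names` means every header in the file matches the assumed pattern.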
Make your input files read-only to prevent Snakemake from modifying them.
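One way to do this is to strip the write permission bits from every input file. The following is a minimal sketch rather than a script shipped with the workflow: `make_read_only` is a hypothetical helper, and it deliberately skips symlinks because changing their mode would alter the link target instead.

```python
import stat
from pathlib import Path


def make_read_only(directory):
    """Remove write permission from every regular file under `directory`."""
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    changed = []
    for path in sorted(Path(directory).rglob("*")):
        # Skip symlinks: chmod would follow them and modify the target
        if path.is_symlink() or not path.is_file():
            continue
        mode = stat.S_IMODE(path.stat().st_mode)
        path.chmod(mode & ~write_bits)
        changed.append(path)
    return changed
```

Running `make_read_only("data")` from the workflow directory would cover both input folders, leaving any symlinked file to be protected at its real location.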
Usage
Create data/haplotypes and data/reference as shown in the file tree above.
Put all haplotypes in data/haplotypes and symlink the reference file in data/reference. If you don't want the reference to appear in graphs, remove it from data/haplotypes.
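The directory setup and the symlink can be scripted. This is a sketch under assumptions, not part of the workflow: `prepare_layout` is a hypothetical helper, and the symlink is assumed to keep the reference file's own name.

```python
from pathlib import Path


def prepare_layout(workdir, reference_fasta):
    """Create data/haplotypes and data/reference, then symlink the reference."""
    workdir = Path(workdir)
    haplotypes = workdir / "data" / "haplotypes"
    reference = workdir / "data" / "reference"
    haplotypes.mkdir(parents=True, exist_ok=True)
    reference.mkdir(parents=True, exist_ok=True)

    # Use an absolute target so the link works from any working directory
    source = Path(reference_fasta).resolve()
    link = reference / source.name
    if not link.is_symlink() and not link.exists():
        link.symlink_to(source)
    return link
```

The function returns the path of the symlink in data/reference and leaves an existing link or file of the same name untouched.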
Set the reference name and the Apptainer image path in config.yaml.
Change the variables in runSnakemake.sh to match your needs (job name, mail, etc.).
Run runSnakemake.sh and wait!