Usage

Align Contigs to Reference

The first step is to align the contigs to a reference genome and write the result to a MAF-formatted file. Many alignment tools are available; we have had success with Mugsy, a very fast multiple whole-genome alignment tool.
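For orientation, a MAF file is a series of alignment blocks. Each block opens with an "a" line (usually carrying a score) followed by one "s" line per aligned sequence. The sketch below reads such blocks into plain dictionaries. It is an illustration only, not part of this package: the function name parse_maf and the dictionary keys are hypothetical, and only the "a" and "s" line types are handled.

```python
def parse_maf(lines):
    """Yield one dict per alignment block: {"score": float, "rows": [...]}."""
    block = None
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line ends the current block.
            if block is not None:
                yield block
                block = None
            continue
        if line.startswith("#"):
            continue  # header or comment line
        fields = line.split()
        if fields[0] == "a":
            if block is not None:
                yield block
            attrs = dict(f.split("=", 1) for f in fields[1:] if "=" in f)
            block = {"score": float(attrs.get("score", 0.0)), "rows": []}
        elif fields[0] == "s" and block is not None:
            src, start, size, strand, src_size, text = fields[1:7]
            block["rows"].append({
                "src": src,
                "start": int(start),        # 0-based start in the source
                "size": int(size),          # number of non-gap bases
                "strand": strand,
                "src_size": int(src_size),  # full length of the source
                "text": text,               # aligned text, gaps as "-"
            })
    if block is not None:
        yield block


example = """##maf version=1
a score=100.0
s ref.chr1 0 8 + 100 ACGT-ACGT
s contig1 5 9 + 50 ACGTTACGT

a score=42.0
s ref.chr1 20 4 + 100 ACGT
s contig2 0 4 - 30 ACGT
"""

blocks = list(parse_maf(example.splitlines()))
for b in blocks:
    print(b["score"], [row["src"] for row in b["rows"]])
```

The sequence names and scores in the example are made up; real aligner output may carry extra attributes on the "a" line, which this sketch ignores.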

Stitch Together Scaffold

Once the contigs are aligned to the reference, the next step is to stitch the alignment blocks together into a scaffold. The maf_net.py utility does this by reassembling the reference sequence from the MAF blocks, using the highest-scoring block at each location in the genome to assemble the scaffold genome.
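The idea of keeping the highest-scoring block at each location can be sketched in a few lines. This is not the maf_net.py implementation, only a simplified illustration under stated assumptions: each block is a hypothetical (ref_start, score, contig_seq) tuple, the contig sequence is ungapped relative to the reference, and uncovered positions are filled with "N".

```python
def stitch_scaffold(ref_length, blocks, fill="N"):
    """Build a scaffold string of ref_length bases from scored blocks.

    blocks: iterable of (ref_start, score, contig_seq) tuples (hypothetical
    layout). Blocks are painted in ascending score order, so wherever blocks
    overlap, the highest-scoring one is written last and wins.
    """
    scaffold = [fill] * ref_length
    for start, score, seq in sorted(blocks, key=lambda b: b[1]):
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < ref_length:
                scaffold[pos] = base
    return "".join(scaffold)


blocks = [
    (0, 50.0, "AAAAA"),  # covers reference positions 0-4
    (3, 80.0, "CCCC"),   # covers 3-6 and outscores the first block
    (8, 10.0, "GG"),     # covers 8-9
]
scaffold = stitch_scaffold(10, blocks)
print(scaffold)
```

Painting in score order keeps the sketch short; a real tool also has to handle gaps, strand, and ties between blocks.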

Genome Annotation and Comparison

Once a draft genome has been completed, it can be useful to migrate annotations from an annotated reference to the new genome. This step also generates a summary of the changes at both the nucleic acid and amino acid levels.

Run compare_genomes.py to migrate annotations and generate a list of differences between two species. The script requires an aligned FASTA file (typically the one generated by the previous scaffold-stitching step) and a GFF file of the features (genes, exons, etc.) to migrate.
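Migrating a feature comes down to translating reference coordinates into coordinates on the new genome through the alignment. The sketch below shows one way this could work for a pair of gapped sequences of equal length. It is an assumption-laden illustration, not the logic of compare_genomes.py: the function names are hypothetical, and coordinates are 0-based and half-open, whereas GFF uses 1-based inclusive coordinates.

```python
def build_coordinate_map(ref_aln, new_aln, gap="-"):
    """Map ungapped reference positions to ungapped new-genome positions."""
    if len(ref_aln) != len(new_aln):
        raise ValueError("aligned sequences must have the same length")
    mapping = {}
    ref_pos = new_pos = 0
    for r, n in zip(ref_aln, new_aln):
        if r != gap and n != gap:
            mapping[ref_pos] = new_pos  # column where both genomes have a base
        if r != gap:
            ref_pos += 1
        if n != gap:
            new_pos += 1
    return mapping


def lift_feature(start, end, mapping):
    """Lift the 0-based half-open interval [start, end) onto the new genome.

    Returns (new_start, new_end), or None if no base of the feature aligns.
    """
    mapped = [mapping[p] for p in range(start, end) if p in mapping]
    if not mapped:
        return None
    return mapped[0], mapped[-1] + 1


ref_aln = "ACG-TACGT"
new_aln = "A-GTTAC-T"
mapping = build_coordinate_map(ref_aln, new_aln)
print(lift_feature(1, 5, mapping))
print(lift_feature(6, 7, mapping))
```

A feature whose bases are all deleted in the new genome lifts to None, which is one natural place to record a difference between the two genomes.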

To check the coding sequences, translate them to protein sequences with translate_cds.py. Translation errors, such as missing start or stop codons or extra stop codons, are printed to STDERR.
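The kinds of checks described above can be sketched as follows. This is a minimal illustration, not the code of translate_cds.py: it assumes the standard genetic code, treats ATG as the only start codon, and uses a hypothetical check_cds function that returns a list of problem descriptions.

```python
import sys
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def check_cds(seq):
    """Return a list of problems found when translating a coding sequence."""
    seq = seq.upper()
    problems = []
    if len(seq) % 3 != 0:
        problems.append("length is not a multiple of 3")
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # Codons with ambiguous bases translate to "X".
    protein = "".join(CODON_TABLE.get(c, "X") for c in codons)
    if not seq.startswith("ATG"):
        problems.append("missing start codon")
    if not protein.endswith("*"):
        problems.append("missing stop codon")
    if "*" in protein[:-1]:
        problems.append("extra stop codon")
    return problems


for name, cds in [("ok", "ATGAAATAA"), ("bad", "ATGTAAAAATAA")]:
    for problem in check_cds(cds):
        # Report problems on STDERR, keeping STDOUT free for sequence output.
        print(f"{name}: {problem}", file=sys.stderr)
```

Organisms that use an alternative genetic code or non-ATG start codons would need a different table and start-codon check.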