[Last update: Thu Mar 16 06:17:38 CET 2017]

IMPORTANT: Minor improvements to the nuclear genome assemblies and annotations were made on Sun Jan 29 22:04:59 CST 2017. The mitochondrial genome assemblies and annotations of S. paradoxus UFRJ50816 and UWOPS91-917.1 were updated on Thu Mar 16 06:17:38 CET 2017. If you downloaded the corresponding data before our last update, please use the new data for your analysis.


The genomes were sequenced to ~100-200x coverage with PacBio reads and assembled using the standard HGAP pipeline with Quiver polishing. We also generated ~200-500x Illumina paired-end reads and used Pilon to correct the remaining sequencing errors. After manual curation, the final assemblies reached chromosome-equivalent completeness for both nuclear and mitochondrial genomes. Based on each assembly, we conducted comprehensive annotation of various genomic features, such as centromeres, protein-coding genes, tRNAs, Ty retrotransposable elements, core X-elements, Y’-elements and mitochondrial RNA genes.

Here, we provide the assembly, annotation, CDS, and proteome files listed below. The assembly, CDS, and proteome files are in FASTA format. The annotation files are in GFF format. All files are compressed with gzip. After downloading these files, you can use the command “gunzip *.gz” to decompress them.
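As an alternative to decompressing first, the gzip-compressed FASTA files can also be read directly. The following is a minimal sketch in Python using only the standard library; the function names read_fasta and summarize are illustrative and not part of this data release, and any file name you pass in is whatever you saved the download as.

```python
import gzip
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs from a plain or gzip-compressed FASTA file."""
    path = Path(path)
    # Pick the opener from the file extension; both are opened in text mode.
    opener = gzip.open if path.suffix == ".gz" else open
    header, chunks = None, []
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new record starts: emit the previous one, if any.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    # Emit the final record.
    if header is not None:
        yield header, "".join(chunks)


def summarize(path):
    """Return {sequence_id: length}, taking the ID as the first word of each header."""
    return {header.split()[0]: len(seq) for header, seq in read_fasta(path)}
```

Calling summarize on a downloaded assembly file gives a quick check of how many sequences it contains and how long each one is, which is a simple way to confirm that a download is complete and up to date.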


Nuclear Genomes

Species Strain Assembly Annotation CDSs Proteome
S.c. S288C GENOME GFF CDS PEP
S.c. DBVPG6044 GENOME GFF CDS PEP
S.c. DBVPG6765 GENOME GFF CDS PEP
S.c. SK1 GENOME GFF CDS PEP
S.c. Y12 GENOME GFF CDS PEP
S.c. YPS128 GENOME GFF CDS PEP
S.c. UWOPS03-461.4 GENOME GFF CDS PEP
S.p. CBS432 GENOME GFF CDS PEP
S.p. N44 GENOME GFF CDS PEP
S.p. YPS138 GENOME GFF CDS PEP
S.p. UFRJ50816 GENOME GFF CDS PEP
S.p. UWOPS91-917.1 GENOME GFF CDS PEP



Mitochondrial Genomes

Species Strain Assembly Annotation CDSs Proteome
S.c. S288C GENOME GFF CDS PEP
S.c. DBVPG6044 GENOME GFF CDS PEP
S.c. DBVPG6765 GENOME GFF CDS PEP
S.c. SK1 GENOME GFF CDS PEP
S.c. Y12 GENOME GFF CDS PEP
S.c. YPS128 GENOME GFF CDS PEP
S.c. UWOPS03-461.4 GENOME GFF CDS PEP
S.p. CBS432 GENOME GFF CDS PEP
S.p. N44 GENOME GFF CDS PEP
S.p. YPS138 GENOME GFF CDS PEP
S.p. UFRJ50816 GENOME GFF CDS PEP
S.p. UWOPS91-917.1 GENOME GFF CDS PEP



Other Useful Data

1) Re-annotation of Saccharomyces arboricolus (strain H6), which was originally sequenced and annotated a few years ago (Liti et al. BMC Genomics, 2013).

Species Strain Assembly Annotation CDSs Proteome
S.a. H6 GENOME GFF CDS PEP

2) The subtelomere annotations for all 12 strains as well as the SGD reference (version R64-1-1) in GFF3 format.

3) The hidden Markov model (HMM) that we built for the yeast core X-element, in case anyone is interested.
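For a first look at any of the GFF or GFF3 annotation files listed above, the records can be tallied by feature type. The sketch below is an illustration only: it relies on the standard nine-column, tab-separated GFF3 layout, the function name count_feature_types is hypothetical, and the feature type names you will see depend on the file itself.

```python
import gzip
from collections import Counter


def count_feature_types(path):
    """Count GFF3 records by feature type (the third tab-separated column)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    counts = Counter()
    with opener(path, "rt") as handle:
        for line in handle:
            # An optional ##FASTA directive marks embedded sequence; stop there.
            if line.startswith("##FASTA"):
                break
            # Skip comments, directives, and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            # A valid GFF3 feature line has exactly nine columns.
            if len(fields) != 9:
                continue
            counts[fields[2]] += 1
    return counts
```

The returned Counter maps each feature type to its number of records, so count_feature_types(path).most_common() lists the types from most to least frequent.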


Finally, we also host all the supplementary data sets generated in this study for public access.


Raw Reads Accession

The PacBio sequencing reads for this project have been deposited in the European Nucleotide Archive (ENA) under project PRJEB7245. The strain-to-read mapping information is provided here. The Illumina sequencing reads for this project have been deposited in the Sequence Read Archive (SRA) under project PRJNA340312.