Comprehensive annotation of the transcriptome of the human fungal pathogen Candida albicans using RNA-seq

ECM331 does not contain an intron in any of the three species. The parasexual cycle in Candida albicans provides an alternative pathway to meiosis for the formation of recombinant strains. NAG3 is a tandem duplicate of NAG4 (Additional file 3, cluster 59), which occurred in the ancestor of C.

Fungal Genomics

Some of the 6,354 predicted ORFs are likely to be spurious.

The task of defining the complete set of transcripts that an organism expresses is complicated by the fact that transcriptomes are dynamic entities that change in response to the extracellular environment.

Supplementary Information

The analysis was performed on a total of 41 genes. Our analysis of gene expression of all the novel transcripts as well as the previously annotated genes has revealed a number of interesting features of gene expression. A critical piece of information missing from the Materials and methods is how reads that could be mapped to multiple places were dealt with during alignment.

Seventy out of 215 intron-containing ORFs have reciprocal best matches with S.
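The idea of a reciprocal best match can be illustrated with a minimal Python sketch. It assumes similarity hits are already available as (query, subject, score) tuples; the function names and the toy identifiers below are hypothetical and are not taken from the analysis described here.

```python
def best_hits(hits):
    """Map each query to its single highest-scoring subject.

    `hits` is an iterable of (query, subject, score) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Toy data with made-up identifiers and scores (illustration only).
a_vs_b = [("a1", "b1", 250.0), ("a1", "b2", 90.0), ("a2", "b2", 120.0)]
b_vs_a = [("b1", "a1", 245.0), ("b2", "a1", 95.0), ("b2", "a2", 80.0)]
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

In this toy case only a1 and b1 choose each other; a2 selects b2, but b2 prefers a1, so that pair is not reported.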

Comparative Genomics

In the cell cycle, a number of differences from S. Phenotypic switching is spontaneous and happens at lower rates; in certain strains, up to seven different phenotypes are known. In addition to the well-described secreted aspartyl proteinases, lipases, and high-affinity iron transporters, C.


Hall and Dietrich [49] showed that the original eukaryotic biotin pathway was lost in the last common ancestor of Candida and Saccharomyces species, but it has been rebuilt through horizontal gene transfer from bacterial species via transfers of BIO3 from δ-proteobacteria and BIO4 from α-proteobacteria, followed by gene duplication and neofunctionalization. Two genes in C. The MEP family, encoding three ammonium permeases in C. In general, sequence at the ends of phrap contigs came from a single read and therefore was often of low quality and sometimes chimeric. An enrichment analysis was performed using FatiGO (Al-Shahrour et al.).

GC content and 27-mer count analyses of the sequencing reads revealed two peaks with similar GC content but different coverage (Figures 1A,B). Homologs assembled separately by phrap account for 19% of the genome but contain 65% of the polymorphisms. We included tests of distributions and post hoc analyses, including Kolmogorov-Smirnov test and Kruskal-Wallis test with Dunn's multiple comparison (Figure 2—figure supplement 2B, Figure 2—figure supplement 3A-E, Figure 6A and B, and Figure 6—figure supplement 1A and B).
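As a rough illustration of the GC content and k-mer count analyses mentioned above, the following Python sketch computes the GC fraction of a sequence and a k-mer multiplicity histogram from a list of reads. The function names are hypothetical, the default k = 27 simply mirrors the 27-mers in the text, and dedicated k-mer counters would normally be used on real sequencing data.

```python
from collections import Counter


def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_counts(reads, k=27):
    """Count every k-mer occurring in the reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def multiplicity_histogram(counts):
    """Map k-mer multiplicity -> number of distinct k-mers seen that often."""
    return Counter(counts.values())


# Toy reads with k = 3 so the result can be checked by hand.
toy_counts = kmer_counts(["ACGTACGT", "ACGTTT"], k=3)
toy_hist = multiplicity_histogram(toy_counts)
```

Peaks in such a histogram correspond to groups of k-mers seen at similar coverage, which is the kind of signal the two-peak observation relies on. This sketch does not collapse reverse complements, which is a simplifying assumption.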

The excess of indels whose length is a multiple of three is concentrated almost completely in the coding fraction of the genome as defined by the reduced ORF set (Fig. We have also identified 190 genes truncated at the ends of contigs, only 35 of which have an identical counterpart on a potentially overlapping contig. The OPT2, OPT3, and OPT4 genes are highly similar to each other.

CHEF gel densitometry

The mitochondrial genome of each strain was reconstructed with the FastaAlternateReferenceMaker tool of GATK v3. An illustrative example of such patterns is presented in Figure 1C. The graphical representation of only intra-chromosomal repeat matches (Figure 2A) identified chromosome arms that were repeat-rich or repeat-poor. The superassembly process continued these trends while delivering a product very close to independently derived estimates of the genome size. Furthermore, 41 gene products have been associated with an EC number, indicating an enzymatic activity, with phospholipases being the most abundant. Tandem gene duplication is one mechanism by which species acquire new genes and, by extrapolation, new functions.

The physical map of Candida albicans

The sum of the contig lengths exceeded the genome size by ≈20%. This family was previously suggested as a potential antifungal target, as there are no homologs in humans [31]. Fluconazole-resistant pathogens Candida inconspicua and C. ECM33 and orf19. Overall, our results suggest that the majority of Candida tandem duplicates are under the influence of strong purifying selection, presumably to conserve gene function.

As detailed in Materials and Methods, we used assembly version 19 of the C.

Author Information

CRISPR/Cas9 has been adapted to be used in C. We provide a whole-genome description of heterozygosity in the organism. Genome annotation predicted 5,079 proteins (see section "Materials and Methods"). We devised a purely computational method to define a comprehensive list of multigene families using NCBI-BLAST and custom Perl scripts. This clear diploid status of CBS180 is atypical of strains in clade 2.
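One common way to turn pairwise similarity hits into multigene families is single-linkage clustering, sketched below in Python with a union-find structure. This is not the NCBI-BLAST and Perl pipeline referred to above, whose details are not given here; the E-value cutoff, the function name, and the gene identifiers are all assumptions made for illustration.

```python
def cluster_families(genes, hits, max_evalue=1e-10):
    """Group genes into families by single-linkage clustering.

    `genes` lists every gene identifier; `hits` is an iterable of
    (gene_a, gene_b, evalue) tuples whose identifiers all appear in `genes`.
    The cutoff `max_evalue` is a hypothetical choice.
    """
    parent = {gene: gene for gene in genes}

    def find(gene):
        # Follow parent links to the root, halving the path on the way.
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for a, b, evalue in hits:
        if a != b and evalue <= max_evalue:
            parent[find(a)] = find(b)

    groups = {}
    for gene in genes:
        groups.setdefault(find(gene), []).append(gene)
    # Keep only families with at least two members.
    return sorted(sorted(members) for members in groups.values() if len(members) > 1)


# Toy input with made-up identifiers and E-values (illustration only).
genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
hits = [("g1", "g2", 1e-50), ("g2", "g3", 1e-40),
        ("g4", "g5", 1e-30), ("g6", "g4", 0.5)]
families = cluster_families(genes, hits)
```

Single linkage merges two genes whenever any chain of significant hits connects them, so g1 and g3 fall in one family even without a direct hit. A stricter linkage rule would produce smaller families.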

Strain SC5314 (7) was chosen for large-scale sequencing because of its widespread and increasing use in molecular analyses, virulence in animal models, and apparent standard diploid electrophoretic karyotype. Compared to the S. To identify subclusters with functional enrichment, we determined a significant Pearson correlation through permutation analysis as done previously (Brown et al. Read mapping and variant calling of all strains against this final mitochondrial assembly were performed as described above.
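The use of permutation analysis to judge whether a Pearson correlation is significant can be sketched as follows. This is a generic two-sided permutation test written with the Python standard library, not the procedure of the cited work; the number of permutations, the fixed seed, and the toy data are assumptions.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the observed |r|.

    y is shuffled repeatedly to build the null distribution; one is added
    to numerator and denominator so the estimate is never exactly zero.
    """
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


# Made-up, strongly correlated toy profiles (illustration only).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]
r = pearson(x, y)
p = permutation_pvalue(x, y)
```

A correlation threshold can then be chosen as the value of |r| that shuffled data rarely exceed, rather than relying on a parametric assumption about the distribution of expression values.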


It is the etiological agent of mucosal infections such as oral and vaginal thrush and can also disseminate through the bloodstream to establish infection at several different anatomical sites (Klepser 2020). Whole-genome sequencing was performed at the Genomics Unit of the Centre for Genomic Regulation (CRG) on a HiSeq 2500 instrument.

The annotation effort described here did not edit the underlying assembly 19 sequence. To explore these scenarios, we first analyzed the MAT locus of the different strains (Supplementary Figure 3). We have identified 602 transcripts that do not correspond to known annotated features in the CGD. In yeast, genes involved in the control of mating type are found in these silent regions, and SIR2 represses their expression by maintaining a silent-competent chromatin structure in this region. At the DNA level, positive selection may be detected by comparing the rate of amino acid-altering (nonsynonymous) nucleotide substitutions with the rate of synonymous substitution (dN/dS).
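The dN/dS comparison can be made concrete with a small Python sketch. It assumes the numbers of synonymous and nonsynonymous sites and differences have already been counted for a pair of aligned coding sequences, and it applies the Jukes-Cantor correction for multiple hits; this simplified estimator and all names and numbers below are illustrative assumptions, not the method used in the analysis.

```python
from math import log


def jukes_cantor(p):
    """Correct a proportion of differing sites for multiple substitutions."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion must lie in [0, 0.75)")
    return -0.75 * log(1 - 4 * p / 3)


def dn_ds(nonsyn_diffs, syn_diffs, nonsyn_sites, syn_sites):
    """Ratio of nonsynonymous to synonymous substitution rates.

    Differences and sites are counts per category for one sequence pair.
    """
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    if ds == 0:
        raise ValueError("dS is zero; the ratio is undefined")
    return dn / ds


# Made-up counts for one hypothetical gene pair (illustration only).
omega = dn_ds(nonsyn_diffs=5, syn_diffs=20, nonsyn_sites=700, syn_sites=300)
```

A ratio well below 1 is usually read as purifying selection, a ratio near 1 as neutral evolution, and a ratio above 1 as a possible sign of positive selection.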

We therefore restricted our analysis to these species. For this, pairwise comparisons between overlapping LOH blocks were performed using bedtools jaccard v2.

Identification of Heterozygosity in Strain SC5314

We discuss this finding in comparison with two other medically important Candida hybrid lineages:

