Comprehensive annotation of the transcriptome of the human fungal pathogen Candida albicans using RNA-seq

The cluster is found in the Candida species, but NAG1 and DAC1 are missing in the Saccharomyces lineage. The genome size and physical map of C.

ECM331 does not contain an intron in any of the three species. The parasexual cycle in Candida albicans provides an alternative pathway to meiosis for the formation of recombinant strains. NAG3 is a tandem duplicate of NAG4 (Additional file 3, cluster 59), which occurred in the ancestor of C.

(4800), allowed us to identify C.

The OPT family. This is seen somewhat in the IGV snapshot where the repeat regions have an increase in read coverage compared to the internal unique sequence. In an earlier study, spurious genes in S. Among the protein differences, for 94 there was no ORF (100 amino acids or greater) on the homologous supercontig obviously encoding an allele, and for 57 others the ORF was fragmented into more than one ORF on the homologous supercontig.

Fungal Genomics

Colombo and E. Evolution of pathogenicity and sexual reproduction in eight Candida genomes. Some of the 6,354 predicted ORFs are likely to be spurious.

The task of defining the complete set of transcripts that an organism expresses is complicated by the fact that transcriptomes are dynamic entities that change in response to the extracellular environment. Shown is a pair of homologous supercontigs (10065 and 20205) built from phrap contigs 1563, 2303*, 998, 2231, and 1981, where * denotes sequence complementation.

Supplementary Information

The analysis was performed on a total of 41 genes. Our analysis of gene expression of all the novel transcripts as well as the previously annotated genes has revealed a number of interesting features of gene expression. Genes were initially added to pillars based on automated assignments derived from best bidirectional BLASTP searches. A critical piece of information missing from the Materials and methods is how reads that could be mapped to multiple places were dealt with during alignment.
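The best bidirectional BLASTP assignment mentioned above can be sketched as follows. This is a minimal illustration, not the pipeline used for the pillars: the function names, the tuple layout (query, subject, bit score), and the gene identifiers are all hypothetical, and the hits are assumed to have already been parsed from BLASTP output.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query, subject, bitscore) tuples (assumed layout).
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)   # species A proteins searched against B
    reverse = best_hits(b_vs_a)   # species B proteins searched against A
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical identifiers and scores, for illustration only.
a_vs_b = [("A1", "B1", 200.0), ("A1", "B2", 150.0), ("A2", "B2", 90.0)]
b_vs_a = [("B1", "A1", 210.0), ("B2", "A1", 140.0), ("B2", "A2", 95.0)]
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

Here A2's best hit is B2, but B2's best hit is A1, so only the A1/B1 pair would be kept as an automated assignment.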

Seventy out of 215 intron-containing ORFs have reciprocal best matches with S.

Comparative Genomics

8 (4), R52 (2020) PUBMED 17419877 REFERENCE 3 (residues 1 to 162) AUTHORS Jones,T. In the cell cycle, a number of differences from S. (033), Even so, we could identify exactly the same clades and sub-clades in the two trees (Supplementary Figure 9). Phenotypic switching is spontaneous and happens at lower rates, and in certain strains up to seven different phenotypes are known. 64%) is much higher than the divergence observed between most distantly related strains of well-recognized yeast species (i. In addition to the well-described secreted aspartyl proteinases, lipases, and high-affinity iron transporters, C. Standard finishing experiments designed to close gaps, normally undertaken after completing an assembly, were inappropriate if most apparent gaps were caused by separate assembly of heterozygous sequence, not lack of data.


All raw data and statistical analyses were updated in the source data files (Figure 2—source data 1-4 and Figure 6—source data 1). The high prevalence of aneuploidies in some C. Our phylogeny places the S. Firstly, CBS180, and the three strains of clade 1, 14ANR23920, 9_16, and CI1, were mainly diploid (r2 > 0. ) Nucleotide resolution is provided in Figure 3B and C and Figure 4B and C to show that all breakpoints occur within 2 kb of a repeat sequence. For example, to visualize the distribution of spacer lengths between repeat matches on each chromosome we generated new Figure 2—figure supplement 2B and Figure 2—source data 2, and found there was a significant difference in the distribution of spacer lengths across all chromosomes (p < 0. ) Members of the MEP gene family encode ammonium permeases and, along with the OPT family described below, feature prominently in our list of fungal-specific genes.

1% of phase zero, one, and two introns, respectively. Heterozygosity patterns in Candida inconspicua type strain genome. Interestingly, mitochondrial genomes from all strains of clade 2 revealed some short deleted regions, from which we highlight a major 1. We clarified the nomenclature for LTRs and retrotransposons. First, a point of clarification: The centromeres as identified by Sanwal et al5 are listed in the “gene” row. Broth microdilution was performed according to EUCAST guidelines (Arendrup et al. ) Mapping results were inspected with IGV version 2.

There is however an inversion of the surrounding region between SC5314 and WO-1 (Figure 4); this appears to result from a rearrangement between two members of the oligopeptide transporter gene family, OPT9 (a pseudogene) and OPT1. Some metabolic pathway clusters result from tandem duplication; for example, AOX1 and AOX2 (encoding cyanide insensitive enzymes required for an alternative pathway of aerobic respiration) in C. (7%, respectively) supporting paralogous duplication of multiple contiguous ORFs. The identifiers for these data are as follows:

Correlating STR distribution with Gene Ontology annotations shows that a significant proportion of the C. We removed ORFs smaller than 300 bp with no significant sequence similarity to other genes, either within the C. 2020; Bensen et al. It is difficult to discern any of this from Figure 3—figure supplement 1. Expression of this gene is not influenced by galactose in C. The reads of all strains were mapped against this region. The phylogenomic distribution of the Nag regulon is intriguing.

Hall and Dietrich [49] showed that the original eukaryotic biotin pathway was lost in the last common ancestor of Candida and Saccharomyces species, but it has been rebuilt through horizontal gene transfer from bacterial species via transfers of BIO3 from δ-proteobacteria and BIO4 from α-proteobacteria, followed by gene duplication and neofunctionalization. Two genes in C. The MEP family, encoding three ammonium permeases in C. In general, sequence at the ends of phrap contigs came from a single read and therefore was often of low quality and sometimes chimeric. An enrichment analysis was done using FatiGO (Al-Shahrour et al. )

GC content and 27-mer count analyses of the sequencing reads revealed two peaks with similar GC content but different coverage (Figures 1A,B). Homologs assembled separately by phrap account for 19% of the genome but contain 65% of the polymorphisms. We included tests of distributions and post hoc analyses, including Kolmogorov-Smirnov test and Kruskal-Wallis test with Dunn’s multiple comparison (Figure 2—figure supplement 2B, Figure 2—figure supplement 3A-E, Figure 6A and B, and Figure 6—figure supplement 1A and B). The "+" button outputs CGOB pillar data in a tabulated format. 5 mM hydrogen peroxide) (for expression data, see Fig.

The excess of indels with length a multiple of three is concentrated almost completely in the coding fraction of the genome as defined by the reduced ORF set (Fig. ) The increased filamentation responses found in C. Total DNA integrity and quantity of the samples were assessed by means of agarose gel, NanoDrop 1000 Spectrophotometer (Thermo Fisher Scientific, United States) and Qubit dsDNA BR assay kit (Thermo Fisher Scientific). We have also identified 190 genes truncated at the ends of contigs, only 35 of which have an identical counterpart on a potentially overlapping contig. The OPT2, OPT3, and OPT4 genes are highly similar to each other. Albicans itself is the present participle of the Latin word albicō, meaning "becoming white".

CHEF gel densitometry

B8441 sequence is now available in all of the CGD tools, including BLAST. Each C. Shown are eight supercontigs accounting for 93% of the sequence of chromosome 7, ordered and oriented by physical map data. The problem is now fixed, and we have also implemented measures to alert us to this kind of issue in the future. 4955 may have arisen through duplication, and ECM331 is the result of reverse transcription of one of the intron-containing paralogs, in an ancestor of the three species. Both assembly and tiling path problems can be avoided by brute force, e. The annotated assemblies from this project are available in GenBank for the following genomes: Published ALS gene sequences can be found on the CGD Web site.

The switching is reversible, and colony type can be inherited from one generation to another. In addition, it has recently been reported in traditional alcoholic beverages such as oil palm wine and a sorghum beer called tchapalo (Egue et al. ) On the other hand, a commercial test was also used. 2 (Shimodaira and Hasegawa, 2020) was used to assess the confidence that a given tree could correspond to a given alignment. LOCUS XP_721313 162 aa linear PLN 04-APR-2020 DEFINITION peptidylprolyl isomerase [Candida albicans SC5314]. For example in C. To have a control of the two expected scenarios, we decided to do the same analysis for C.

The mitochondrial genome of each strain was reconstructed with the FastaAlternateReferenceMaker tool of GATK v3. An illustrative example of such patterns is presented in Figure 1C. This section now includes: The graphical representation of only intra-chromosomal repeat matches (Figure 2A) identified chromosome arms that were repeat-rich or -poor. The superassembly process continued these trends while delivering a product very close to independently derived estimates of the genome size. Furthermore, 41 gene products have been associated with an EC number, indicating an enzymatic activity, with phospholipases being the most abundant. Tandem gene duplication is one mechanism by which species acquire new genes, and by extrapolation, new functions.

The physical map of Candida albicans

The sum of the contig lengths exceeded the genome size by ≈20%. This family was previously suggested as a potential antifungal target, as there are no homologs in humans [31]. Fluconazole-resistant pathogens Candida inconspicua and, C. ECM33 and orf19. Overall our results suggest that the majority of Candida tandem duplicates are under the influence of strong purifying selection, presumably to conserve gene function. Observed breakpoints had significantly more overlap with long repeat sequences than expected given the total genome coverage of long repeat sequences (p < 0. )

An extended stay for up to 21 more days compared to non-infected patients is not uncommon. The proportion is color-coded according to the color bar shown. As detailed in Materials and Methods, we used assembly version 19 of the C.

However, the third gene (orf19. ) Currently there are 155 metabolic pathways that have been manually curated by the Candida Genome Database. Especially high-risk individuals are patients who have recently undergone surgery or a transplant, or who are in intensive care units (ICUs) [72]. Candida albicans infections are the top source of fungal infections in critically ill or otherwise immunocompromised patients. The three genes involved in the conversion of Nag to fructose-6-phosphate encode hexokinase kinase (HXK1/orf19. ) Furthermore, genes specifically duplicated in C. 5 (Gurevich et al. )

Author Information

Members of each of these families are differentially expressed as a function of the yeast–hyphae transition, phenotypic switching, or timing during experimental infection. The best studied switching mechanism is the white to opaque switching (an epigenetic process). No significant differences were found between the electrophoretic karyotype of the sequencing strain SC5314 and CBS5736. It may also affect a number of other regions.

CRISPR/Cas9 has been adapted to be used in C. We provide a whole-genome description of heterozygosity in the organism. Genome annotation predicted 5,079 proteins (see section "Materials and Methods"). We devised a purely computational method to define a comprehensive list of multigene families using NCBI-BLAST and custom Perl scripts. This clear diploid status of CBS180 is atypical of strains in clade 2. Among these 70 ORFs, 25 introns (35. ) The orientation of supercontigs 10110 and 10253 is uncertain.

Strain SC5314 (7) was chosen for large-scale sequencing because of its widespread and increasing use in molecular analyses, virulence in animal models, and apparent standard diploid electrophoretic karyotype. Compared to the S. 7362 (SKN1) and orf19. To identify subclusters with functional enrichment, we determined a significant Pearson correlation through permutation analysis as done previously (Brown et al. ) Read mapping and variant calling of all strains against this final mitochondrial assembly was performed as mentioned before. The K-mer Analysis Toolkit (KAT; Mapleson et al. )


It is the etiological agent of mucosal infections such as oral and vaginal thrush and can also disseminate through the bloodstream to establish infection at several different anatomical sites (Klepser 2020). Whole-genome sequencing was performed at the Genomics Unit of the Centre for Genomic Regulation (CRG) with a HiSeq2500 machine. (1) We think it is important to add a more detailed description (in the Materials and methods section) and critical discussion (in the Results or Discussion section) on how the repeats were mapped starting from short-read sequences.

The annotation effort described here did not edit the underlying assembly 19 sequence. To explore these scenarios, we first analyzed the MAT locus of the different strains (Supplementary Figure 3). We have identified 602 transcripts that do not correspond to known annotated features in the CGD. In yeast, genes involved in the control of mating type are found in these silent regions, and SIR2 represses their expression by maintaining a silent-competent chromatin structure in this region. TITLE The diploid genome sequence of Candida albicans JOURNAL Proc. From the eight chromosomes and the assembly gaps due to the copies of the MRS, one has ≈20 contigs as a lower bound. At the DNA level, positive selection may be detected by comparing the rate of amino acid altering (nonsynonymous) nucleotide substitutions with the rate of synonymous substitution (dN/dS).
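As a rough illustration of the dN/dS comparison described above, the sketch below computes the ratio from pre-counted differences and sites. The text does not say which estimator was used; the Jukes-Cantor correction here (as in the Nei-Gojobori approach) is an assumption, the function names are hypothetical, and the counts in the example are invented purely for illustration.

```python
import math


def jukes_cantor(p):
    """Correct a proportion of differing sites p for multiple substitutions."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def dn_ds(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Return dN/dS from counts of differences and potential sites."""
    d_n = jukes_cantor(nonsyn_diffs / nonsyn_sites)  # nonsynonymous rate
    d_s = jukes_cantor(syn_diffs / syn_sites)        # synonymous rate
    if d_s == 0.0:
        raise ValueError("dS is zero; the ratio is undefined")
    return d_n / d_s


# Invented counts: 10 of 300 nonsynonymous sites differ, 20 of 100 synonymous.
ratio = dn_ds(10, 300, 20, 100)
```

A ratio well below 1, as in this example, is conventionally read as purifying selection, while a ratio above 1 is taken as a signal of positive selection.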

We therefore restricted our analysis to these species. For this, pairwise comparisons between overlapping LOH blocks were performed using bedtools jaccard v2. Identification of Heterozygosity in Strain SC5314. We discuss this finding in comparison with two other medically important Candida hybrid lineages:

Similar to previous studies (Pryszcz et al.