eVOC 2.9 Release Notes
This document describes the eVOC v2.9 ontology and data mapping releases including new features, statistics and known issues.
- Date: June 2007
- Organism: Homo sapiens
- eVoke Release: Generated from eVoke Data Release 2.9
Overview
The data consists of a set of hierarchical ontologies and several gene expression data types or platforms that are curated by annotating them across all the ontologies. The data types are annotated with controlled terms in the ontologies that describe the samples used in gene expression experiments.
The following updated gene expression data types are mapped to the controlled terms in version 2.9:
Data Relationships
The cDNA library sample data is manually curated and annotated across all the ontologies by mapping each library directly to the controlled terms. The other data types are mapped to the controlled terms transitively, via the manually curated data, and, depending on the data type, are also mapped transitively to each other.

Figure: Data Relationships
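The transitive mapping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the identifiers (`lib_A`, `est_1`, `cluster_X`), the term names, and the exact chain (EST to library, cluster to EST) are hypothetical and are not taken from the eVOC data files.

```python
from typing import Dict, Set

# Hypothetical sample data: library IDs, EST IDs, cluster IDs and term
# names are invented for illustration and do not come from eVOC.
library_terms: Dict[str, Set[str]] = {
    "lib_A": {"liver", "adult"},
    "lib_B": {"brain", "fetus"},
}
est_library: Dict[str, str] = {
    "est_1": "lib_A",
    "est_2": "lib_A",
    "est_3": "lib_B",
}
cluster_ests: Dict[str, Set[str]] = {
    "cluster_X": {"est_1", "est_3"},
    "cluster_Y": {"est_2"},
}


def est_terms(est_id: str) -> Set[str]:
    """Terms an EST inherits from its manually curated source library."""
    library_id = est_library.get(est_id)
    if library_id is None:
        return set()
    # Copy so callers cannot mutate the curated mapping.
    return set(library_terms.get(library_id, set()))


def cluster_terms(cluster_id: str) -> Set[str]:
    """Terms a cluster inherits as the union of its member ESTs' terms."""
    terms: Set[str] = set()
    for est_id in cluster_ests.get(cluster_id, set()):
        terms |= est_terms(est_id)
    return terms
```

Only the libraries carry direct annotations in this sketch; every other data type derives its terms by following links back to a library, which is why a change to one library annotation propagates to all dependent entries.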
What's New
Data
- Data mappings are current as of January 2007
Ontologies
- Human ontologies:
  - 29 new terms were added to the ontologies.
  - The structure of the eVOC Cell Type ontology was updated to align and cross-reference it with the OBO Cell Ontology. During the restructuring, more than half of the terms in the eVOC Cell Type ontology were moved under a new parent. Because the OBO Cell Ontology is itself still changing, further structural changes and refinements to the eVOC Cell Type ontology can be expected.
  - Cross-references to the UniProt Knowledgebase mammalian tissue vocabulary (tisslist) have been added to the Anatomical System, Cell Type, Pathology and Developmental Stage ontologies.
- Mouse and developmental ontologies
Statistics
Data
Statistics for All Data Mappings in eVOC
| Data type | Number present | Number different from previous release | Number absent (1) |
| --- | --- | --- | --- |
| cDNA libraries | 8,401 | 6 | 0 |
| EST sequences | 6,053,168 | 32,825 | 0 |
| RefSeq cDNA sequences | 22,026 | 115 | 255 |
| H-Inv cDNA sequences | 38,410 | -24 | 2,708 |
| UniGene clusters | 50,793 | 61 | 2,095 |
| H-Inv clusters | 19,495 | -14 | 21,623 |
| Genes | 23,306 | -15 | 15,445 |
Notes:
1. This column gives the number of data entries from the original data source at the given date (see table in the Overview) that are not included in the eVOC ontologies. Please refer to Known Issues for further details.
Statistics for Manually Curated Data Types in eVOC
| Ontology name | Annotated data (1) / Total data |
| --- | --- |
| cDNA libraries | |
| Anatomical System | 7,846 / 8,401 (93%) |
| Cell Type | 661 / 8,401 (8%) |
| Developmental Stage | 6,836 / 8,401 (81%) |
| Pathology | 7,093 / 8,401 (84%) |
| Associated With | 0 / 8,401 (0%) (2) |
| Treatment | 0 / 8,395 (0%) (2) |
| Pooling | 8,159 / 8,401 (97%) |
| Experimental Technique | 8,401 / 8,401 (100%) |
| Tissue Preparation | 6,971 / 8,401 (83%) |
| Microarray Platform | 8,401 / 8,401 (100%) (3) |
Notes:
1. All libraries and samples are mapped to each ontology but are considered annotated in eVOC only if they are mapped to terms other than "unclassifiable" and "pending".
2. Mapping of the libraries to the Associated With and Treatment ontologies is work in progress and the libraries are associated with the term "pending" in these two ontologies.
3. cDNA library data is mapped to the term "not applicable" in the Microarray Platform ontology.
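The annotation rule in the notes above, and the "annotated / total (percent)" figures in the table, can be reproduced with a short Python sketch. The function names and the representation of a library's mapping as a set of term strings are assumptions made for illustration; eVOC's own tooling is not shown in these notes.

```python
from typing import Dict, Iterable, Set

# Terms that do not count as a real annotation, per note 1 above.
PLACEHOLDER_TERMS = frozenset({"unclassifiable", "pending"})


def is_annotated(terms: Iterable[str]) -> bool:
    """True if at least one mapped term is not a placeholder term."""
    return any(term not in PLACEHOLDER_TERMS for term in terms)


def count_annotated(mappings: Dict[str, Set[str]]) -> int:
    """Count the libraries in one ontology that carry a real annotation."""
    return sum(1 for terms in mappings.values() if is_annotated(terms))


def coverage_row(annotated: int, total: int) -> str:
    """Format a table cell such as '7,846 / 8,401 (93%)'."""
    percent = round(100 * annotated / total) if total else 0
    return f"{annotated:,} / {total:,} ({percent}%)"


# Hypothetical library IDs, used only to exercise the functions.
example = {
    "lib_A": {"liver"},
    "lib_B": {"pending"},
    "lib_C": {"unclassifiable"},
    "lib_D": {"not applicable"},
}
```

Under this rule "not applicable" counts as an annotation, which is consistent with the Microarray Platform row reporting 100% even though every cDNA library is mapped to that term.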
Ontologies
| Ontology name | Total number of terms (1) |
| --- | --- |
| Anatomical System | 515 |
| Cell Type | 187 |
| Developmental Stage | 156 |
| Pathology | 198 |
| Associated With | 23 |
| Treatment | 61 |
| Tissue Preparation | 7 |
| Experimental Technique | 27 |
| Pooling | 7 |
| Microarray Platform | 18 |
| Human Development | 657 |
| Mouse Development | 368 |
| Theiler Stage | 31 |
Note:
1. Synonyms are not included in the total number of terms.
Known Issues
- Some cDNA libraries are included but not yet annotated to appropriate terms across all ontologies, due to the following curation issues:
  - Lack of relevant annotative information submitted with the data. Data without sufficient annotative information in the raw data file is mapped to the term "unclassifiable" in eVOC. Submitters are contacted on an ongoing basis to enrich the data with more appropriate annotation information.
  - Lack of appropriate eVOC ontologies or terms to accommodate the diversity of annotative information submitted. Existing ontologies are extended and updated, and new ontologies are developed, on an ongoing basis to enrich the data with more appropriate annotation information.
- UniGene clusters that consist entirely of mRNA sequences are not yet included in eVOC. Consequently, H-Inv cDNA, RefSeq cDNA and Gene data related only through such UniGene clusters are not included either. (See figure under Data Relationships.)
- 8,802 genes within eVOC that have LocusLink IDs have no official HUGO gene names and symbols. These genes are included to maximize gene expression mining, and are assigned gene names and symbols using the following convention (note that these temporary names are updated as official gene names and symbols are made available):
  - Gene names: "NO_NAME_<LocusLink ID>", where the actual LocusLink ID replaces the "<LocusLink ID>".
  - Gene symbols: "NO_SYMBOL_<LocusLink ID>", where the actual LocusLink ID replaces the "<LocusLink ID>".
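The temporary naming convention above is simple enough to express as a small Python sketch. The function names and the `official_name` / `official_symbol` parameters are hypothetical; only the "NO_NAME_" and "NO_SYMBOL_" prefixes come from these release notes, and the example LocusLink ID is invented.

```python
from typing import Optional, Tuple, Union

LocusLinkId = Union[int, str]


def placeholder_gene_name(locuslink_id: LocusLinkId) -> str:
    """Temporary gene name for a gene without an official HUGO name."""
    return f"NO_NAME_{locuslink_id}"


def placeholder_gene_symbol(locuslink_id: LocusLinkId) -> str:
    """Temporary gene symbol for a gene without an official HUGO symbol."""
    return f"NO_SYMBOL_{locuslink_id}"


def gene_labels(
    locuslink_id: LocusLinkId,
    official_name: Optional[str] = None,
    official_symbol: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (name, symbol), falling back to placeholders when missing."""
    name = official_name or placeholder_gene_name(locuslink_id)
    symbol = official_symbol or placeholder_gene_symbol(locuslink_id)
    return name, symbol
```

For an invented ID such as 12345 with no official labels, `gene_labels(12345)` yields the pair `NO_NAME_12345` and `NO_SYMBOL_12345`; once official labels are supplied they take precedence, mirroring the note that temporary names are replaced as official ones become available.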