eVOC 2.8 Release Notes

This document describes the eVOC v2.8 ontology and data mapping releases including new features, statistics and known issues.

  • Date: 21 November 2006
  • Organism: Homo sapiens
  • eVoke Release: Generated from eVoke Data Release 2.7

Please note that for eVOC 2.8 only the ontologies were updated and expanded. The annotations and data mappings described below are still based on eVOC 2.7. If you require them, we recommend using the 2.7 release for the time being. Both annotations and mappings will be updated shortly.

Overview

The data consists of a set of hierarchical ontologies and several gene expression data types or platforms that are curated by annotating them across all the ontologies. The data types are annotated with controlled terms in the ontologies that describe the samples used in gene expression experiments.

The following updated gene expression data types are mapped to the controlled terms in version 2.7:

Data type / platform | Original data source | Updated through
cDNA libraries | NCBI: GenBank Release 145 and daily dbEST updates; http://www.ncbi.nlm.nih.gov/Genbank/; Restricted to: EST sequence cDNA libraries | February 4, 2005
EST sequences | NCBI: GenBank Release 145 and daily dbEST updates; http://www.ncbi.nlm.nih.gov/Genbank/ | February 4, 2005
RefSeq cDNA sequences | NCBI: UniGene Build #180; http://www.ncbi.nlm.nih.gov/UniGene/ | January 20, 2005
H-Inv cDNA sequences | H-InvDB (Version_1.8); http://www.jbirc.aist.go.jp/hinv/ | December 1, 2004
UniGene clusters | NCBI: UniGene Build #180; http://www.ncbi.nlm.nih.gov/UniGene/ | January 20, 2005
H-Inv clusters | H-InvDB (Version_1.8); http://www.jbirc.aist.go.jp/hinv/ | December 1, 2004
Genes | NCBI: LocusLink; http://www.ncbi.nlm.nih.gov/LocusLink/ | February 4, 2005

Data Relationships

The cDNA library sample data is manually curated and annotated across all the ontologies by mapping them directly to the controlled terms. The other data types are mapped transitively to the controlled terms via the manually curated data and, depending on the data type, are mapped transitively to each other.


[Figure: Data Relationships]
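
The transitive mapping described above can be illustrated with a minimal Python sketch. All identifiers and term values below are hypothetical, and the chain used here (library to EST to cluster) is an assumption made for illustration; the actual relationships between data types are those shown in the figure.

```python
# Hypothetical data, for illustration only (not taken from eVOC).

# Manually curated: cDNA library -> one term per ontology.
library_terms = {
    "lib_A": {"Anatomical System": "liver", "Pathology": "normal"},
    "lib_B": {"Anatomical System": "brain", "Pathology": "tumor"},
}

# Each EST is linked to the library it was derived from.
est_library = {"est_1": "lib_A", "est_2": "lib_A", "est_3": "lib_B"}

# Each cluster is linked to its member ESTs.
cluster_ests = {"cluster_X": ["est_1", "est_3"], "cluster_Y": ["est_2"]}


def est_terms(est_id):
    """Terms an EST inherits from its manually curated source library."""
    return library_terms[est_library[est_id]]


def cluster_terms(cluster_id):
    """Per-ontology union of the terms inherited by the cluster's ESTs."""
    merged = {}
    for est_id in cluster_ests[cluster_id]:
        for ontology, term in est_terms(est_id).items():
            merged.setdefault(ontology, set()).add(term)
    return merged
```

In this sketch only the libraries carry direct annotations; every other data type obtains its terms by following links back to a curated library.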

What's New

Data

  • No changes in ontology annotations or data mappings for this release

Ontologies

  • Human ontologies:

  • Mouse and developmental ontologies

    • The eVOC ontology system now contains three new ontology files within this directory: the Mouse and Human Developmental eVOC ontologies, as well as a Theiler Stage ontology for mouse. The Mouse Developmental ontology represents all anatomical structures throughout the 28 Theiler stages of mouse development, whereas the Human Developmental ontology represents all anatomical structures throughout the 23 Carnegie stages of human development. The Theiler Stage ontology is a hierarchy of the Theiler stages of mouse development, categorized as embryonic, fetal or adult development. A combination of terms from the Mouse Development and Theiler Stage ontologies therefore represents an anatomical structure on the spatial and temporal level.

    • The terms in the developmental ontologies are derived from Edinburgh Mouse Atlas Project (EMAP) and Mouse Anatomy (MA) for mouse, and Edinburgh Human Developmental Anatomy (HUMAT) for human.
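
As a rough illustration of how a term from the Mouse Development ontology can be paired with a Theiler stage to describe a structure in both space and time, consider the following Python sketch. The class name, field names and example values are hypothetical and are not part of eVOC; only the range of 28 Theiler stages comes from the description above.

```python
from dataclasses import dataclass

THEILER_STAGE_COUNT = 28  # number of Theiler stages, as stated above


@dataclass(frozen=True)
class StagedStructure:
    """An anatomical structure (spatial) at a Theiler stage (temporal)."""

    structure: str      # term from the Mouse Development ontology
    theiler_stage: int  # stage number, 1 through 28

    def __post_init__(self):
        # Reject stage numbers outside the 28 Theiler stages.
        if not 1 <= self.theiler_stage <= THEILER_STAGE_COUNT:
            raise ValueError(f"invalid Theiler stage: {self.theiler_stage}")


# Hypothetical example: the same structure at two different stages
# yields two distinct spatio-temporal annotations.
early = StagedStructure("heart", 12)
late = StagedStructure("heart", 20)
```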

Statistics

Data

Statistics for All Data Mappings in eVOC

Data type | Number present | Number different from previous release | Number absent [1]
cDNA libraries | 8,401 | 6 | 0
EST sequences | 6,053,168 | 32,825 | 0
RefSeq cDNA sequences | 22,026 | 115 | 255
H-Inv cDNA sequences | 38,410 | -24 | 2,708
UniGene clusters | 50,793 | 61 | 2,095
H-Inv clusters | 19,495 | -14 | 21,623
Genes | 23,306 | -15 | 15,445

Notes:

  1. This column refers to the number of data entries from the original data source at the given date (see the table in the Overview) that are not included in the eVOC ontologies. Please refer to Known Issues for further details.

Statistics for Manually Curated Data Types in eVOC

Ontology name | Annotated data [1] / Total data
cDNA libraries
Anatomical System | 7,846 / 8,401 (93%)
Cell Type | 661 / 8,401 (8%)
Developmental Stage | 6,836 / 8,401 (81%)
Pathology | 7,093 / 8,401 (84%)
Associated With | 0 / 8,401 (0%) [2]
Treatment | 0 / 8,395 (0%) [2]
Pooling | 8,159 / 8,401 (97%)
Experimental Technique | 8,401 / 8,401 (100%)
Tissue Preparation | 6,971 / 8,401 (83%)
Microarray Platform | 8,401 / 8,401 (100%) [3]

Notes:

  1. All libraries and samples are mapped to each ontology but are considered annotated in eVOC only if they are mapped to terms other than "unclassifiable" and "pending".

  2. Mapping of the libraries to the Associated With and Treatment ontologies is work in progress and the libraries are associated with the term "pending" in these two ontologies.

  3. cDNA library data is mapped to the term "not applicable" in the Microarray Platform ontology.
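
The counting rule in note 1 can be sketched in Python as follows. The function name and the example mapping are hypothetical; the only rule taken from the notes is that a library counts as annotated unless its term is "unclassifiable" or "pending". Rounding the percentage to the nearest whole number is an assumption, although it agrees with the figures in the table (for example, 7,846 / 8,401 gives 93%).

```python
# Terms that do not count as an annotation, per note 1.
PLACEHOLDER_TERMS = {"unclassifiable", "pending"}


def annotation_coverage(mapped_terms):
    """Return (annotated, total, percent) for a single ontology.

    mapped_terms maps each library ID to the term it is mapped to.
    """
    total = len(mapped_terms)
    annotated = sum(
        1 for term in mapped_terms.values() if term not in PLACEHOLDER_TERMS
    )
    percent = round(100 * annotated / total) if total else 0
    return annotated, total, percent


# Hypothetical mapping of four libraries within one ontology.
example = {
    "lib_1": "liver",
    "lib_2": "pending",
    "lib_3": "unclassifiable",
    "lib_4": "not applicable",
}
coverage = annotation_coverage(example)  # (2, 4, 50)
```

Under this rule "not applicable" counts as an annotation, which is consistent with the Microarray Platform ontology being reported at 100%.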

Ontologies

Ontology name | Total number of terms [1]
Anatomical System | 512
Cell Type | 180
Developmental Stage | 156
Pathology | 191
Associated With | 23
Treatment | 58
Tissue Preparation | 7
Experimental Technique | 27
Pooling | 7
Microarray Platform | 18
Human Development | 657
Mouse Development | 368
Theiler Stage | 31

Note:
  1. The synonyms are not included in the total number of terms.

Known Issues

  1. Some cDNA libraries are included but not yet annotated to appropriate terms across all ontologies due to the following curation issues:

    • Lack of relevant annotative information submitted with the data. Data without sufficient annotative information in the raw data file is mapped to the term "unclassifiable" in eVOC. Submitters are contacted on an on-going basis to enrich the data with more appropriate annotation information.

    • Lack of appropriate eVOC ontologies or terms to accommodate the diversity of annotative information submitted. Existing ontologies are extended and updated and new ontologies are developed on an on-going basis to enrich the data with more appropriate annotation information.

  2. UniGene clusters that consist entirely of mRNA sequences are not yet included in eVOC. H-Inv cDNA, RefSeq cDNA and Gene data related only through such UniGene clusters are consequently not included either. (See the figure under Data Relationships.)

  3. 8,802 genes within eVOC that have LocusLink IDs have no official HUGO gene names or symbols. These genes are included to maximize gene expression mining, and are assigned gene names and symbols using the following convention (note that these temporary names are updated as official gene names and symbols become available):

    • Gene names: "NO_NAME_<LocusLink ID>", where the actual LocusLink ID replaces the "<LocusLink ID>".

    • Gene symbols: "NO_SYMBOL_<LocusLink ID>", where the actual LocusLink ID replaces the "<LocusLink ID>".
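
The naming convention above can be expressed as a short Python sketch. The function name and the example ID are hypothetical; only the "NO_NAME_" and "NO_SYMBOL_" prefixes come from the convention itself.

```python
def placeholder_gene_labels(locuslink_id):
    """Return the temporary (name, symbol) pair for a gene without
    an official HUGO name, following the convention described above."""
    return f"NO_NAME_{locuslink_id}", f"NO_SYMBOL_{locuslink_id}"


# Hypothetical LocusLink ID, used for illustration only.
name, symbol = placeholder_gene_labels(12345)
# name is "NO_NAME_12345" and symbol is "NO_SYMBOL_12345"
```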


Page last modified on November 21, 2006, at 01:18 PM