
eVOC 2.9 Release Notes

This document describes the eVOC v2.9 ontology and data mapping releases including new features, statistics and known issues.

  • Date: June 2007
  • Organism: Homo sapiens
  • eVoke Release: Generated from eVoke Data Release 2.9

Overview

The data consists of a set of hierarchical ontologies and several gene expression data types, or platforms, that are annotated across all the ontologies. Each data type is annotated with controlled terms from the ontologies that describe the samples used in gene expression experiments.

The following updated gene expression data types are mapped to the controlled terms in version 2.9:

Data type / platform | Original data source | Updated through
cDNA libraries | NCBI: GenBank Release 158, http://www.ncbi.nlm.nih.gov/Genbank/ (restricted to EST sequence cDNA libraries) | January 4, 2007
EST sequences | NCBI: GenBank Release 158, http://www.ncbi.nlm.nih.gov/Genbank/ | January 4, 2007
RefSeq cDNA sequences | NCBI: UniGene Build #199, http://www.ncbi.nlm.nih.gov/UniGene/ | January 4, 2007
H-Inv cDNA sequences | H-InvDB (Version_3.8), http://www.jbirc.aist.go.jp/hinv/ | January 4, 2007
UniGene clusters | NCBI: UniGene Build #199, http://www.ncbi.nlm.nih.gov/UniGene/ | January 4, 2007
H-Inv clusters | H-InvDB (Version_3.8), http://www.jbirc.aist.go.jp/hinv/ | January 4, 2007
Genes | NCBI: LocusLink, http://www.ncbi.nlm.nih.gov/LocusLink/ | January 4, 2007

Data Relationships

The cDNA library sample data is manually curated and annotated across all the ontologies by mapping it directly to the controlled terms. The other data types are mapped to the controlled terms transitively, via the manually curated data, and, depending on the data type, are also mapped transitively to each other.


[Figure: Data Relationships]
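
The transitive mapping described above can be illustrated with a minimal Python sketch. It is not eVOC's actual implementation: the identifiers, the dictionaries, and the specific chain (EST to its source cDNA library, cluster to its member ESTs) are assumptions made up for illustration. Only the cDNA libraries carry directly curated terms; everything else inherits them.

```python
# Hypothetical data: all identifiers below are invented for illustration.
# Manually curated mapping: cDNA library -> controlled terms.
library_terms = {
    "lib_A": {"liver", "adult"},
    "lib_B": {"brain"},
}
# Assumed relationships between the other data types.
est_to_library = {"est_1": "lib_A", "est_2": "lib_B", "est_3": "lib_A"}
cluster_to_ests = {"cluster_X": ["est_1", "est_2"], "cluster_Y": ["est_3"]}


def terms_for_est(est_id):
    """An EST inherits the terms of the cDNA library it came from."""
    library_id = est_to_library.get(est_id)
    return set(library_terms.get(library_id, set()))


def terms_for_cluster(cluster_id):
    """A cluster inherits the union of the terms of its member ESTs."""
    terms = set()
    for est_id in cluster_to_ests.get(cluster_id, []):
        terms |= terms_for_est(est_id)
    return terms


if __name__ == "__main__":
    print(sorted(terms_for_cluster("cluster_X")))
```

Under this scheme, a cluster with no member ESTs ends up with no terms, which is consistent with the known issue about UniGene clusters that consist entirely of mRNA sequences.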

What's New

Data

  • Data mappings are current as of January 2007

Ontologies

  • Human ontologies:
    • 29 new terms were added to the ontologies.
    • The structure of the eVOC Cell Type ontology was updated to align and cross-reference it with the OBO Cell ontology. During this restructuring, more than half of the terms in the eVOC Cell Type ontology were moved under a new parent. Because the OBO Cell ontology is itself still changing, further structural changes and refinements to the eVOC Cell Type ontology can be expected.
    • Cross-references to the UniProt Knowledgebase mammalian tissue vocabulary (tisslist) have been added to the Anatomical System, Cell Type, Pathology and Developmental Stage ontologies.

  • Mouse and developmental ontologies

Statistics

Data

Statistics for All Data Mappings in eVOC

Data type | Number present | Number different from previous release | Number absent (note 1)
cDNA libraries | 8,401 | 6 | 0
EST sequences | 6,053,168 | 32,825 | 0
RefSeq cDNA sequences | 22,026 | 115 | 255
H-Inv cDNA sequences | 38,410 | -24 | 2,708
UniGene clusters | 50,793 | 61 | 2,095
H-Inv clusters | 19,495 | -14 | 21,623
Genes | 23,306 | -15 | 15,445

Notes:

  1. This column gives the number of data entries in the original data source at the given date (see the table in the Overview) that are not included in the eVOC ontologies. Please refer to Known Issues for further details.

Statistics for Manually Curated Data Types in eVOC

Ontology name | Annotated data (note 1) / Total data

cDNA libraries

Anatomical System | 7,846 / 8,401 (93%)
Cell Type | 661 / 8,401 (8%)
Developmental Stage | 6,836 / 8,401 (81%)
Pathology | 7,093 / 8,401 (84%)
Associated With | 0 / 8,401 (0%) (note 2)
Treatment | 0 / 8,395 (0%) (note 2)
Pooling | 8,159 / 8,401 (97%)
Experimental Technique | 8,401 / 8,401 (100%)
Tissue Preparation | 6,971 / 8,401 (83%)
Microarray Platform | 8,401 / 8,401 (100%) (note 3)

Notes:

  1. All libraries and samples are mapped to each ontology but are considered annotated in eVOC only if they are mapped to terms other than "unclassifiable" and "pending".

  2. Mapping of the libraries to the Associated With and Treatment ontologies is a work in progress; for now, the libraries are associated with the term "pending" in these two ontologies.

  3. cDNA library data is mapped to the term "not applicable" in the Microarray Platform ontology.
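
The counting rule in note 1 can be sketched in Python as follows. This is a hedged illustration rather than the eVOC curation code: the function names, the shape of the `mappings` dictionary, and the sample library IDs are all assumptions. It treats a library as annotated in an ontology when at least one of its mapped terms is neither "unclassifiable" nor "pending", and rounds the percentage to a whole number as the table does.

```python
# Terms that do not count as a real annotation (from note 1).
PLACEHOLDER_TERMS = {"unclassifiable", "pending"}


def is_annotated(terms):
    """True if at least one mapped term is not a placeholder term."""
    return any(term not in PLACEHOLDER_TERMS for term in terms)


def coverage(mappings):
    """Return (annotated, total, percent) for one ontology.

    `mappings` is a hypothetical dict: library ID -> list of mapped terms.
    """
    total = len(mappings)
    annotated = sum(1 for terms in mappings.values() if is_annotated(terms))
    percent = round(100 * annotated / total) if total else 0
    return annotated, total, percent


def format_row(ontology_name, mappings):
    """Format one table row in the style used above."""
    annotated, total, percent = coverage(mappings)
    return f"{ontology_name} | {annotated:,} / {total:,} ({percent}%)"


if __name__ == "__main__":
    # Invented sample data for a single ontology.
    sample = {
        "lib_1": ["liver"],
        "lib_2": ["unclassifiable"],
        "lib_3": ["pending"],
        "lib_4": ["not applicable"],
    }
    print(format_row("Example Ontology", sample))
```

Note that under this rule "not applicable" counts as an annotation, which matches the 100% figure reported for the Microarray Platform ontology.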

Ontologies

Ontology name | Total number of terms (note 1)
Anatomical System | 515
Cell Type | 187
Developmental Stage | 156
Pathology | 198
Associated With | 23
Treatment | 61
Tissue Preparation | 7
Experimental Technique | 27
Pooling | 7
Microarray Platform | 18
Human Development | 657
Mouse Development | 368
Theiler Stage | 31

Note:
  1. The synonyms are not included in the total number of terms.

Known Issues

  1. Some cDNA libraries are included but not yet annotated to appropriate terms across all ontologies due to the following curation issues:

    • Lack of relevant annotation information submitted with the data. Data without sufficient annotation information in the raw data file is mapped to the term "unclassifiable" in eVOC. Submitters are contacted on an ongoing basis to enrich the data with more appropriate annotation information.

    • Lack of appropriate eVOC ontologies or terms to accommodate the diversity of annotation information submitted. Existing ontologies are extended and updated, and new ontologies are developed, on an ongoing basis to enrich the data with more appropriate annotation information.

  2. UniGene clusters that consist entirely of mRNA sequences are not yet included in eVOC. Consequently, H-Inv cDNA, RefSeq cDNA and Gene data related only through such UniGene clusters are not included either. (See the figure under Data Relationships.)

  3. 8,802 genes within eVOC with LocusLink IDs have no official HUGO gene names and symbols. These genes are included to maximize gene expression mining, and are assigned gene names and symbols using the following convention (note that these temporary names are updated as official gene names and symbols become available):

    • Gene names: "NO_NAME_<LocusLink ID>", where the actual LocusLink ID replaces the "<LocusLink ID>".

    • Gene symbols: "NO_SYMBOL_<LocusLink ID>", where the actual LocusLink ID replaces the "<LocusLink ID>".
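
The naming convention in issue 3 can be expressed as a short Python sketch. The function names and the example LocusLink ID are hypothetical and only illustrate the string pattern described above; a check such as `is_placeholder` might be useful for filtering temporary names out of downstream analyses.

```python
NAME_PREFIX = "NO_NAME_"
SYMBOL_PREFIX = "NO_SYMBOL_"


def placeholder_gene_name(locuslink_id):
    """Build the temporary gene name for a gene without an official name."""
    return f"{NAME_PREFIX}{locuslink_id}"


def placeholder_gene_symbol(locuslink_id):
    """Build the temporary gene symbol for a gene without an official symbol."""
    return f"{SYMBOL_PREFIX}{locuslink_id}"


def is_placeholder(value):
    """True if a gene name or symbol follows the temporary convention."""
    return value.startswith((NAME_PREFIX, SYMBOL_PREFIX))


if __name__ == "__main__":
    example_id = 12345  # made-up LocusLink ID, for illustration only
    print(placeholder_gene_name(example_id))
    print(placeholder_gene_symbol(example_id))
```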



Page last modified on June 13, 2007, at 02:34 PM