
eVOC 2.7 Release Notes

This document describes the eVOC v2.7 ontology and data mapping releases, including new features, statistics and known issues.

  • Date: 10 July 2005
  • Organism: Homo sapiens
  • eVoke Release: Generated from eVoke Data Release 2.7


Overview

The data consists of a set of hierarchical ontologies and several gene expression data types (platforms). Each data type is curated by annotating it across all the ontologies with controlled terms that describe the samples used in gene expression experiments.

The following updated gene expression data types are mapped to the controlled terms in version 2.7:

Data type / platform | Original data source | Updated through
cDNA libraries | NCBI: GenBank Release 145 and daily dbEST updates, http://www.ncbi.nlm.nih.gov/Genbank/ (restricted to EST sequence cDNA libraries) | February 4, 2005
EST sequences | NCBI: GenBank Release 145 and daily dbEST updates, http://www.ncbi.nlm.nih.gov/Genbank/ | February 4, 2005
RefSeq cDNA sequences | NCBI: UniGene Build #180, http://www.ncbi.nlm.nih.gov/UniGene/ | January 20, 2005
H-Inv cDNA sequences | H-InvDB (Version_1.8), http://www.jbirc.aist.go.jp/hinv/ | December 1, 2004
UniGene clusters | NCBI: UniGene Build #180, http://www.ncbi.nlm.nih.gov/UniGene/ | January 20, 2005
H-Inv clusters | H-InvDB (Version_1.8), http://www.jbirc.aist.go.jp/hinv/ | December 1, 2004
Genes | NCBI: LocusLink, http://www.ncbi.nlm.nih.gov/LocusLink/ | February 4, 2005

Data Relationships

The cDNA library sample data is manually curated and annotated across all the ontologies by mapping each library directly to the controlled terms. The other data types are mapped transitively to the controlled terms via the manually curated data and, depending on the data type, are also mapped transitively to each other.


Figure: Data Relationships
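
The transitive mapping can be illustrated with a minimal Python sketch. It is not eVOC's actual implementation. It assumes one possible chain (cDNA library to EST sequence to cluster), and all identifiers (`lib_A`, `est_1`, `cluster_X`), the dictionary layout and the function names are hypothetical. Only the ontology and term names are taken from these notes.

```python
# Hypothetical sample data: library, EST and cluster identifiers are made up.
# Manually curated step: cDNA library -> controlled terms, per ontology.
library_terms = {
    "lib_A": {"Anatomical System": {"cecum"}, "Pathology": {"adenocarcinoma"}},
    "lib_B": {"Anatomical System": {"cecum"}, "Pathology": {"normal"}},
}

# Assumption: each EST sequence originates from exactly one cDNA library.
est_library = {"est_1": "lib_A", "est_2": "lib_B", "est_3": "lib_A"}

# Assumption: each cluster groups one or more EST sequences.
cluster_ests = {"cluster_X": ["est_1", "est_2"], "cluster_Y": ["est_3"]}


def terms_for_est(est_id):
    """An EST inherits the terms of the library it came from."""
    return library_terms[est_library[est_id]]


def terms_for_cluster(cluster_id):
    """A cluster inherits the union of the terms of its member ESTs."""
    merged = {}
    for est_id in cluster_ests[cluster_id]:
        for ontology, terms in terms_for_est(est_id).items():
            merged.setdefault(ontology, set()).update(terms)
    return merged


print(terms_for_cluster("cluster_X"))
```

In this sketch only the library annotations are entered by hand; every other data type derives its terms by following links back to a library.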

What's New

Data

  • 6 new cDNA libraries added:
    • NIH_MGC_262
    • Homo sapiens thyroid differential display
    • Human embryonic stem cells
    • Random primed cDNA library from Homo sapiens peripheral blood leukocytes mRNA
    • Homo sapiens pancreatic islet
    • Human fetal brain first-strand cDNA

  • 157 ORESTES cDNA libraries previously misannotated are now classified correctly. These libraries, which were mapped to normal or unclassifiable in the Pathology ontology, are now correctly mapped to adenocarcinoma.

Ontologies

  • Anatomical System:
    • Added term "cecum" (with synonym "caecum").

  • Cell Type:
    • epithelium renamed to epithelial cell.
    • transitional renamed to transitional cell.
    • mesothelium renamed to mesothelial cell.
    • neurothelium renamed to neurothelial cell.
    • retinal pigment epithelium renamed to retinal pigment epithelial cell.
    • Synonym pigmented retinal epithelium (of retinal pigment epithelial cell) changed to pigmented retinal epithelial cell.

  • Pathology:
    • Added term hyperthyroidism.
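
Scripts that store Cell Type term names from an earlier release may need to translate them. The following Python sketch shows one way to do so. The renames are copied from the list above, while the function name `migrate_cell_type` and the lookup-table approach are illustrative assumptions and not part of eVOC.

```python
# Cell Type renames in version 2.7 (old term -> new term), from these notes.
CELL_TYPE_RENAMES = {
    "epithelium": "epithelial cell",
    "transitional": "transitional cell",
    "mesothelium": "mesothelial cell",
    "neurothelium": "neurothelial cell",
    "retinal pigment epithelium": "retinal pigment epithelial cell",
}

# Synonym rename in version 2.7 (old synonym -> new synonym).
CELL_TYPE_SYNONYM_RENAMES = {
    "pigmented retinal epithelium": "pigmented retinal epithelial cell",
}


def migrate_cell_type(term):
    """Return the 2.7 name for a Cell Type term or synonym.

    Terms that were not renamed are returned unchanged.
    """
    if term in CELL_TYPE_RENAMES:
        return CELL_TYPE_RENAMES[term]
    return CELL_TYPE_SYNONYM_RENAMES.get(term, term)


print(migrate_cell_type("mesothelium"))
```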

Statistics

Data

Statistics for All Data Mappings in eVOC

Data type | Number present | Number different from previous release | Number absent (note 1)
cDNA libraries | 8,401 | 6 | 0
EST sequences | 6,053,168 | 32,825 | 0
RefSeq cDNA sequences | 22,026 | 115 | 255
H-Inv cDNA sequences | 38,410 | -24 | 2,708
UniGene clusters | 50,793 | 61 | 2,095
H-Inv clusters | 19,495 | -14 | 21,623
Genes | 23,306 | -15 | 15,445

Notes:

  1. This column gives the number of data entries from the original data source at the given date (see the table in the Overview) that are not included in eVOC. Please refer to Known Issues for further details.

Statistics for Manually Curated Data Types in eVOC

Ontology name | Annotated data (note 1) / Total data
cDNA libraries |
Anatomical System | 7,846 / 8,401 (93%)
Cell Type | 661 / 8,401 (8%)
Developmental Stage | 6,836 / 8,401 (81%)
Pathology | 7,093 / 8,401 (84%)
Associated With | 0 / 8,401 (0%) (note 2)
Treatment | 0 / 8,395 (0%) (note 2)
Pooling | 8,159 / 8,401 (97%)
Experimental Technique | 8,401 / 8,401 (100%)
Tissue Preparation | 6,971 / 8,401 (83%)
Microarray Platform | 8,401 / 8,401 (100%) (note 3)

Notes:

  1. All libraries and samples are mapped to each ontology but are considered annotated in eVOC only if they are mapped to terms other than "unclassifiable" and "pending".

  2. Mapping of the libraries to the Associated With and Treatment ontologies is work in progress and the libraries are associated with the term "pending" in these two ontologies.

  3. cDNA library data is mapped to the term "not applicable" in the Microarray Platform ontology.
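
The "annotated" counts follow the rule in note 1. As a rough illustration, the Python sketch below counts a library as annotated when it is mapped to at least one term other than "unclassifiable" or "pending". The input layout (one set of terms per library) and the function name `coverage` are assumptions made for this example, not a description of how eVOC stores its mappings.

```python
# Terms that do not count as an annotation (see note 1).
NOT_ANNOTATED = {"unclassifiable", "pending"}


def coverage(mapped_terms):
    """Summarise annotation coverage for one ontology.

    mapped_terms: list with one set of term names per library (assumed layout).
    Returns (annotated, total, percent), with percent rounded to an integer.
    """
    total = len(mapped_terms)
    # A library is annotated if any of its terms lies outside NOT_ANNOTATED.
    annotated = sum(1 for terms in mapped_terms if terms - NOT_ANNOTATED)
    percent = round(100 * annotated / total) if total else 0
    return annotated, total, percent


# Hypothetical four-library example.
sample = [{"cecum"}, {"unclassifiable"}, {"pending"}, {"not applicable"}]
print(coverage(sample))
```

Because "not applicable" is not an excluded term, libraries mapped to it count as annotated, which is consistent with the 100% figure for Microarray Platform. The same rounding reproduces the table: `round(100 * 7846 / 8401)` gives 93 for Anatomical System.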

Ontologies

Ontology name | Total number of terms (note 1)
Anatomical System | 395
Cell Type | 161
Developmental Stage | 154
Pathology | 176
Associated With | 23
Treatment | 23
Tissue Preparation | 7
Experimental Technique | 27
Pooling | 7
Microarray Platform | 18

Note:
  1. The synonyms are not included in the total number of terms.

Known Issues

  1. Some cDNA libraries are included but not yet annotated to appropriate terms across all ontologies due to the following curation issues:

    • Lack of relevant annotative information submitted with the data. Data without sufficient annotative information in the raw data file is mapped to the term "unclassifiable" in eVOC. Submitters are contacted on an ongoing basis to enrich the data with more appropriate annotation information.

    • Lack of appropriate eVOC ontologies or terms to accommodate the diversity of annotative information submitted. Existing ontologies are extended and updated, and new ontologies are developed, on an ongoing basis to enrich the data with more appropriate annotation information.

  2. UniGene clusters that consist entirely of mRNA sequences are not yet included in eVOC. Consequently, H-Inv cDNA, RefSeq cDNA and Gene data related only through such UniGene clusters are not included either. (See the figure under Data Relationships.)

  3. 8,802 genes within eVOC with LocusLink IDs have no official HUGO gene names and symbols. These genes are included to maximize gene expression mining and are assigned temporary gene names and symbols using the following convention (the temporary names are updated as official gene names and symbols become available):

    • Gene names: "NO_NAME_<LocusLink ID>", where the actual LocusLink ID replaces "<LocusLink ID>".

    • Gene symbols: "NO_SYMBOL_<LocusLink ID>", where the actual LocusLink ID replaces "<LocusLink ID>".
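
The naming convention is simple enough to express in a few lines. The Python sketch below is illustrative only: the function names are hypothetical and the LocusLink ID 12345 is a made-up example value.

```python
def placeholder_name(locuslink_id):
    """Build the temporary gene name for a gene without an official name."""
    return f"NO_NAME_{locuslink_id}"


def placeholder_symbol(locuslink_id):
    """Build the temporary gene symbol for a gene without an official symbol."""
    return f"NO_SYMBOL_{locuslink_id}"


def is_placeholder(value):
    """Return True if a gene name or symbol follows the temporary convention."""
    return value.startswith(("NO_NAME_", "NO_SYMBOL_"))


# 12345 is a made-up LocusLink ID used only for illustration.
print(placeholder_name(12345), placeholder_symbol(12345))
```

A check such as `is_placeholder` can be used to filter out genes without official names when only officially named genes are wanted.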



Page last modified on July 05, 2005, at 05:51 PM