Metadata

Metadata is associated with much publicly available microarray data. Metadata includes information about the experimenter, experimental protocols, sample type (genotype, plant developmental stage, organ/cell type/environmental perturbations, etc.), and so on. Unfortunately, although there are detailed standards for microarray metadata, most public databases do not actually follow these standards. There is no universal metadata format; rather, the format varies considerably from database to database. In addition, experimenters frequently enter metadata inconsistently, sometimes despite careful instructions in the database about what each metadata entry should mean. For example, for some experiments in NASCArrays, the entries under "treatment" are used to distinguish one sample from another (e.g., some samples are entered as "control" and others are entered as "sprayed with GA"). However, in other experiments, the entries under "treatment" describe the treatment that all samples in that experiment were subjected to (e.g., for all samples, "treated with Marathon"). In the absence of a human metadata curator, this type of inconsistency decreases the usefulness of metadata. Nevertheless, despite these limitations, the available metadata is critical for understanding the significance of experiments.
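The two uses of the "treatment" field described above can be told apart automatically, at least in simple cases. The following is a minimal sketch in Python, not a description of any database's own tooling. It assumes the metadata has already been reduced to (experiment, sample, treatment) records; the function name, the record layout, the experiment identifiers, and the three labels it returns are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def classify_treatment_usage(records):
    """Label how each experiment uses its "treatment" field.

    `records` is an iterable of (experiment_id, sample_id, treatment)
    tuples. This layout is an assumption made for the sketch, not a
    format used by NASCArrays or any other database.
    """
    # Collect the distinct, normalized treatment values per experiment.
    by_experiment = defaultdict(set)
    for experiment_id, _sample_id, treatment in records:
        by_experiment[experiment_id].add(treatment.strip().lower())

    usage = {}
    for experiment_id, values in by_experiment.items():
        values.discard("")  # ignore empty entries
        if not values:
            usage[experiment_id] = "missing"
        elif len(values) == 1:
            # Every sample carries the same entry, so the field
            # describes the experiment as a whole.
            usage[experiment_id] = "experiment-wide"
        else:
            # Entries differ, so the field distinguishes samples.
            usage[experiment_id] = "per-sample"
    return usage


# Hypothetical records built from the example entries quoted above.
example_records = [
    ("EXP-A", "s1", "control"),
    ("EXP-A", "s2", "sprayed with GA"),
    ("EXP-B", "s1", "treated with Marathon"),
    ("EXP-B", "s2", "Treated with Marathon "),
]
treatment_usage = classify_treatment_usage(example_records)
```

A check of this kind only flags the pattern; deciding what an experiment-wide entry actually means would still require a curator.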

There are currently no public databases for metabolomics data; however, a suggested general format for the metadata has been described. This format includes the kind of MIAME-style experimental information that is used for microarray metadata. It also takes into account that metabolomics is an emerging field, in which methodologies are varied, new methods are rapidly being developed, and there is not yet a clear standard for metabolite identification. Thus, it describes how to include detailed explanations of factors such as the metabolomics technologies and experimental conditions.

We are developing methods for automated parsing of metadata from other public microarray databases, initially focusing on NASCArrays, GEO, and BarleyBase/PlexDB. GEO contains a tremendous amount of data and has a reasonable metadata format; thus, an automated parsing process for GEO metadata will facilitate analysis of microarray data from a wide variety of species. BarleyBase/PlexDB is completely MIAME-compliant and contains a deep assortment of data from some widely sampled plant species such as barley, wheat, and rice. Any user-created XML file can also be loaded into MetaOmGraph as metadata, although certain formatting is required for some of its sorting functions. The metadata file included with the Affy.ath1.data1 project can be used as a prototype.
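To illustrate what automated parsing of XML metadata might look like, the following minimal sketch uses Python's standard xml.etree.ElementTree module to flatten a small XML document into one record per sample. The element and attribute names (experiment, sample, field, id, name) and the sample contents are hypothetical; they are not the layout required by MetaOmGraph, nor the export format of NASCArrays, GEO, or BarleyBase/PlexDB. For the formatting that MetaOmGraph actually expects, the prototype file mentioned above remains the reference.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata layout, invented for this sketch only.
SAMPLE_XML = """
<experiments>
  <experiment id="EXP-A">
    <sample id="s1">
      <field name="treatment">control</field>
      <field name="organ">leaf</field>
    </sample>
    <sample id="s2">
      <field name="treatment">sprayed with GA</field>
      <field name="organ">leaf</field>
    </sample>
  </experiment>
</experiments>
"""


def parse_metadata(xml_text):
    """Return a list of dicts, one per sample, from the assumed layout."""
    root = ET.fromstring(xml_text)
    rows = []
    for experiment in root.iter("experiment"):
        for sample in experiment.iter("sample"):
            row = {
                "experiment": experiment.get("id"),
                "sample": sample.get("id"),
            }
            # Each <field name="..."> child becomes one column.
            for field in sample.findall("field"):
                row[field.get("name")] = (field.text or "").strip()
            rows.append(row)
    return rows


metadata_rows = parse_metadata(SAMPLE_XML)
```

Flattening to one record per sample keeps the result easy to sort or filter by any field, whatever the source database's original nesting.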