BioAssay Ontology

Knowledge-based description of chemical biology assays and screening results


During the last few years, small-molecule biological assays performed at publicly funded screening centers have been generating very large amounts of data. The largest effort is the NIH Molecular Libraries Program, which aims to develop novel chemical tools (chemical probes) for interrogating biological systems using high-throughput screening (HTS). The large data sets generated by these HTS campaigns are deposited in PubChem. Other public resources for small-molecule screening data include ChemBank and the Psychoactive Drug Screening Program (PDSP) Ki database. Beyond PubChem and other public databases, pharmaceutical companies hold even larger screening data sets.

This NIH-funded project focuses on developing a semantics-based solution that comprises both content and software components. We will enable researchers to search, retrieve, compare, and analyze diverse biological screening data sets (such as those in PubChem). This will dramatically increase the value of existing high-throughput data for the chemical biology, screening, and cheminformatics communities, and will also facilitate collaboration. For semantics-based querying and analysis, we are developing an ontology that conceptualizes the biological screening knowledge domain. Ontologies have been used in biology for over a decade (for example, the Gene Ontology, GO), and they are a pillar of the Semantic Web. Using our BioAssay Ontology, screening experiments will be described by concepts, their properties, and the relationships between them. The power of such a formal description is that relationships that are not explicitly stated can be inferred. To build the ontology, we are combining domain expert knowledge with automated text mining and natural language processing methods.
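As a minimal sketch of what inferring relationships that are not explicitly stated can mean, the Python snippet below applies two RDFS-style rules (transitivity of subClassOf, and propagation of an instance's type up the class hierarchy) to a few subject-predicate-object triples until no new facts appear. All class and assay names are hypothetical placeholders, not actual BioAssay Ontology terms, and this naive fixed-point loop stands in for the OWL reasoner a real ontology would use.

```python
# Hypothetical mini-ontology: the names below are illustrative only,
# not terms taken from the BioAssay Ontology.
SUBCLASS = "subClassOf"
TYPE = "type"


def infer(triples):
    """Return the closure of `triples` under two RDFS-style rules:
    (1) A subClassOf B and B subClassOf C  =>  A subClassOf C
    (2) x type B and B subClassOf C        =>  x type C
    """
    facts = set(triples)
    while True:
        new = set()
        for (a, p, b) in facts:
            for (c, q, d) in facts:
                # Both rules need the second triple to be a subClassOf link
                # whose subject matches the first triple's object.
                if q == SUBCLASS and b == c:
                    if p == SUBCLASS:
                        new.add((a, SUBCLASS, d))  # rule (1)
                    elif p == TYPE:
                        new.add((a, TYPE, d))  # rule (2)
        if new <= facts:  # fixed point: nothing new was derived
            return facts
        facts |= new


# Explicitly stated facts (hypothetical).
stated = {
    ("LuciferaseReporterAssay", SUBCLASS, "ReporterGeneAssay"),
    ("ReporterGeneAssay", SUBCLASS, "CellBasedAssay"),
    ("ExampleAssay_1", TYPE, "LuciferaseReporterAssay"),
}

# Facts that follow from the stated ones but were never written down.
inferred = infer(stated) - stated
for triple in sorted(inferred):
    print(triple)
```

Here a query for all cell-based assays would return ExampleAssay_1 even though that fact was never stated directly; it follows from the class hierarchy.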

A longer-term goal is to facilitate the integration of screening data with other types of life science data, such as biological pathways, disease networks, and structural biology, in order to analyze HTS results in the context of specific mechanisms of action and to facilitate the transformation of data into knowledge.