Abstract
This document introduces the use of the OWL Web Ontology Language for medical and biosciences applications. The focus is on basic ideas and the approach to creating collaborative applications. Use of the Protégé Java APIs for accessing data within OWL documents is demonstrated using the BioPAX Level 2 ontology. The intended audience is people in science, medicine, or software engineering who want to understand the basics of this area, and software developers new to the technology.
Introduction
The Web Ontology Language (OWL) represents the meanings of terms in vocabularies and the relationships between those terms in a way that is suitable for processing by software. Organizing information using OWL can be very powerful because it can eliminate the need for applications to embed logic specific to their own software. The logic can be externalized to the OWL data, which can be authored by subject matter experts and updated without changing application source code. OWL is particularly useful for representing large volumes of complex data, such as that found in medicine and biosciences. In this introductory article I will describe the basics of OWL and demonstrate some of its uses with medical and biology examples.
The representation of terms from vocabularies together with their relationships is called an ontology. Here the term ontology has been borrowed from philosophy, where it refers to the art of describing the various kinds of things that exist and how they are related to one another.
OWL is developed as an extension of the Resource Description Framework (RDF), which is a language for representing resources, such as RSS feeds. See this site's Frequently Asked Questions for more detail on RSS feeds. OWL differs from most other XML variants, and from XML Schema itself, in that they are used to define the structure of information within a document but not to support reasoning outside their own context. One of the goals of the OWL initiative from the W3C is to encourage the development of ontologies by various groups with specific subject matter expertise, such as medical and bioscience groups, and at the same time to encourage the development of generic OWL processing and reasoning tools that process the specific ontologies.
There are a number of ontology projects that exist at present. Here
are some of them.
| Project | Ontology |
|---|---|
| BioPAX Level 2 covers metabolic pathways [16] | Metabolic Pathways |
| US National Library of Medicine, Unified Medical Language System [18] | General Medical Knowledge |
| The Open Biomedical Ontologies [21] project has a number of ontologies in medical and biological areas. It is sponsored by The National Center for Biomedical Ontology [24] | Animal natural history and life history |
| | Arabidopsis development and gross anatomy |
| | Biological imaging methods |
| | Biological process |
| | BRENDA tissue / enzyme source |
| | C. elegans development and gross anatomy |
| | Cell type |
| | Cellular component |
| | Cereal plant development, gross anatomy, and traits |
| | Chemical entities of biological interest |
| | Dictyostelium discoideum anatomy |
| | Drosophila development and gross anatomy |
| | Event (INOH pathway ontology) and codes |
| | eVOC (Expressed Sequence Annotation for Humans) |
| | FlyBase Controlled Vocabulary |
| | Fungal gross anatomy |
| | Habronattus courtship |
| | Human developmental anatomy and disease |
| | Loggerhead nesting |
| | Maize gross anatomy |
| | Mammalian phenotype |
| | Medaka fish anatomy and development |
| | MESH |
| | Microarray experimental conditions |
| | Molecular function |
| | Molecule role (INOH Protein name/family name ontology) |
| | Mosquito gross anatomy |
| | Mouse adult gross anatomy |
| | Mouse gross anatomy and development |
| | Mouse pathology |
| | Multiple alignment1 |
| | NCBI organismal classification and |
| | Thesaurus |
| | Pathways |
| | Physico-chemical methods, properties, and processes |
| | Plant environmental conditions, growth, developmental stage, and structure |
| | Plasmodium life cycle |
| | Protein covalent bond, domain, and Protein-protein interaction |
| | Sequence types and features |
| | Systems Biology |
| | UniProt taxonomy |
| | Zebrafish anatomy and development |
| The Gene Ontology Project [22] | Genes |
| Standards and Ontologies for Functional Genomics [27] | Human and mouse anatomies |
| MGED Open Source Projects [28] | Microarray Gene Expression Data |
| Plant Ontology™ Consortium [27] | Plant structures and growth and developmental stages |
| The Trial Bank Project [30] | Clinical trial database |
| Case Western Reserve University, Matthias Samwald [37] | Psychoactive Drug Screening Program Ki database |
There is no need for the subject to be an Internet resource, such as the web page in the example above. It may be a person as well. In this case a URI is still used to identify the subject.
For example, the statement
Fred Flinstone was examined on April 12, 2006
can be modeled in RDF. Fred Flinstone is not an Internet resource, but he can be described using this RDF fragment, which makes use of a URI for Fred:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:medcomterms="http://www.medicalcomputing.net/terms/">
<rdf:Description rdf:about="http://www.w3.org/People/EM/contact#fred_flinstone">
<medcomterms:was-examined>April 12, 2006</medcomterms:was-examined>
</rdf:Description>
</rdf:RDF>
The same person can be described in more detail using the contact vocabulary:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:contact="http://www.w3.org/2000/10/swap/pim/contact#">
<contact:Person rdf:about="http://www.w3.org/People/EM/contact#fred_flinstone">
<contact:fullName>Fred Flinstone</contact:fullName>
<contact:mailbox rdf:resource="fflinstone@medicalcomputing.net"/>
</contact:Person>
</rdf:RDF>
The person Fred Flinstone is identified by the URI http://www.w3.org/People/EM/contact#fred_flinstone. Fred's full name and mailbox are listed as properties. Properties are a fundamental idea in RDF. A property is a binary relationship, either between two individuals (objects) or between an individual and a data value. Instances of properties can be written as ordered pairs. For example, this instance of the property mailbox can be written
(http://www.w3.org/People/EM/contact#fred_flinstone,
"fflinstone@medicalcomputing.net").
The domain of a property restricts the individuals to which the property can be applied. For example, we could define the domain of the was-examined property to be people. The range of a property restricts the values it can take. For example, the range of the was-examined property may be defined as a valid date before the current point in time.
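The domain and range checks described above can be sketched in plain Java. This is an illustration only, not how an RDF toolkit works: the class and method names (WasExaminedProperty, inDomain, inRange, accepts) are hypothetical, the domain is reduced to a single class name, and the range is taken to be any date in the article's "April 12, 2006" format that is not in the future. It follows this article's reading of domain and range as restrictions; RDF Schema reasoners typically use them to infer class membership instead of rejecting data.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

/** Illustrative only: a property with a domain (class of the subject) and a range (past dates). */
public class WasExaminedProperty {

    // Assumed domain: only individuals of the class "Person" may carry this property.
    static final String DOMAIN_CLASS = "Person";

    // Date format used in the article's example, e.g. "April 12, 2006".
    static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("MMMM d, yyyy", Locale.ENGLISH);

    /** Domain check: does the subject belong to the class the property applies to? */
    public static boolean inDomain(String subjectClass) {
        return DOMAIN_CLASS.equals(subjectClass);
    }

    /** Range check: does the value parse as a date that is not after 'today'? */
    public static boolean inRange(String value, LocalDate today) {
        try {
            return !LocalDate.parse(value, FORMAT).isAfter(today);
        } catch (DateTimeParseException e) {
            return false; // not a date at all, so outside the range
        }
    }

    /** A (subject, value) pair is accepted only if both checks pass. */
    public static boolean accepts(String subjectClass, String value, LocalDate today) {
        return inDomain(subjectClass) && inRange(value, today);
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.now();
        System.out.println("Person examined April 12, 2006: "
                + accepts("Person", "April 12, 2006", today));
        System.out.println("WebPage examined April 12, 2006: "
                + accepts("WebPage", "April 12, 2006", today));
    }
}
```

Passing the current date in as a parameter keeps the range check deterministic, which makes it easier to test than a call to the system clock inside the method.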
The concept of a class is described in RDF Schema as a group of objects that share properties. For example, person could be defined as a class because individuals belonging to that class all have names and some have email addresses. The use of the term individual here is not casual. RDF uses the term individuals to describe instances of classes. For example, Fred Flinstone is an instance of the class person, so he is an individual. Despite this example, the term individual in RDF and OWL does not refer only to people. Classes may be organized in a hierarchy using the subClassOf relation. For example, employee could be defined as a subclass of person.
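The consequence of the subClassOf relation, that an individual of a subclass is also an individual of every ancestor class, can be sketched in a few lines of Java. The class ClassHierarchy and its methods are hypothetical names chosen for this illustration, and each class is given at most one direct superclass for brevity, whereas RDF Schema and OWL allow several.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative only: a class hierarchy built from subClassOf statements. */
public class ClassHierarchy {

    // Maps each class name to its direct superclass (single parent, a simplification).
    private final Map<String, String> parentOf = new HashMap<>();

    /** Records the statement "subclass subClassOf superclass". */
    public void addSubClassOf(String subclass, String superclass) {
        parentOf.put(subclass, superclass);
    }

    /** Returns the class itself followed by every ancestor reachable via subClassOf. */
    public Set<String> classesOf(String cls) {
        Set<String> result = new LinkedHashSet<>();
        String current = cls;
        // Set.add returns false for a repeated name, which also stops on a cycle.
        while (current != null && result.add(current)) {
            current = parentOf.get(current);
        }
        return result;
    }

    /** True if cls is the same as, or a descendant of, ancestor. */
    public boolean isSubClassOf(String cls, String ancestor) {
        return classesOf(cls).contains(ancestor);
    }

    public static void main(String[] args) {
        ClassHierarchy hierarchy = new ClassHierarchy();
        hierarchy.addSubClassOf("Employee", "Person");
        // An individual typed as Employee is therefore also a Person.
        System.out.println(hierarchy.classesOf("Employee")); // prints [Employee, Person]
    }
}
```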
The OWL Web Ontology Language
There are a number of tools available for working with OWL, and many of them are freely available. This section lists some of them, and the references section at the end of this article lists more.
Searching ontologies is something that most projects are going to need to do. XML in general can be searched with XPath, a standard from the W3C [31]. XPath is implemented by most Extensible Stylesheet Language Transformations (XSLT) processors. Xalan is the XSLT processor used in the XSLT example in the related article Introduction to Extensible Markup Language for Chemistry and Biosciences.
XPath is a query language that can retrieve a selection of nodes from the input documents, or an atomic value, or more generally, any sequence allowed by the data model. This can be enough for many applications, but a more specific language is also available: SPARQL, a query language for RDF [32]. McCarthy [33,34] discusses the use of SPARQL. ARQ [35] is a SPARQL processor for Jena. An online demo is provided at the sparql.org site [36].
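As a rough illustration of the XPath approach, the sketch below uses the XPath support in the standard Java library to find the direct subclasses of a class in a small OWL document. The sample ontology, the class name OwlXPathSearch, and the method directSubclasses are assumptions made for this example. The expression matches only one particular way of writing subClassOf in RDF/XML; the same statement can be serialized in other shapes that this query would miss, which is one reason a graph-level language such as SPARQL is attractive.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Illustrative only: searching an OWL document with XPath. */
public class OwlXPathSearch {

    // Hypothetical two-class ontology, written in one specific RDF/XML style.
    public static final String SAMPLE =
          "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'"
        + " xmlns:rdfs='http://www.w3.org/2000/01/rdf-schema#'"
        + " xmlns:owl='http://www.w3.org/2002/07/owl#'>"
        + "<owl:Class rdf:ID='Person'/>"
        + "<owl:Class rdf:ID='Employee'>"
        + "<rdfs:subClassOf rdf:resource='#Person'/>"
        + "</owl:Class>"
        + "</rdf:RDF>";

    // Maps the prefixes used in the XPath expression to namespace URIs.
    static final NamespaceContext PREFIXES = new NamespaceContext() {
        public String getNamespaceURI(String prefix) {
            switch (prefix) {
                case "rdf":  return "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
                case "rdfs": return "http://www.w3.org/2000/01/rdf-schema#";
                case "owl":  return "http://www.w3.org/2002/07/owl#";
                default:     return XMLConstants.NULL_NS_URI;
            }
        }
        public String getPrefix(String uri) { return null; }              // not needed here
        public Iterator<String> getPrefixes(String uri) { return null; }  // not needed here
    };

    /** Returns the rdf:ID of each owl:Class declared as a direct subclass of parentId. */
    public static List<String> directSubclasses(String owlXml, String parentId) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for prefixed XPath steps to match
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(owlXml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(PREFIXES);
            NodeList ids = (NodeList) xpath.evaluate(
                    "/rdf:RDF/owl:Class[rdfs:subClassOf/@rdf:resource='#" + parentId + "']/@rdf:ID",
                    doc, XPathConstants.NODESET);

            List<String> result = new ArrayList<>();
            for (int i = 0; i < ids.getLength(); i++) {
                result.add(ids.item(i).getNodeValue());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not query the OWL document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(directSubclasses(SAMPLE, "Person")); // prints [Employee]
    }
}
```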
Creating an Ontology
This section describes the creation of an ontology using human physiology as an example. A realistic ontology would be too complex to serve as a good example, so the one shown here is deliberately simplified. The OWL document is physiology.owl and it imports the document biochemistry.owl. An ontology may be created by editing an OWL document with a text editor or by using one of a number of tools available (see [10,11,12,13]). These particular OWL documents were created using Protégé.
The OWL document starts with the rdf:RDF element, the top-level tag described previously. The XML namespace attributes define the default (xmlns) and the base (xml:base) namespaces as http://www.medicalcomputing.net/owl/physiology.owl#. The biochemistry namespace, discussed below, is defined with the xmlns:biochemistry attribute. A number of other namespaces are also defined, including RDF (xmlns:rdf), RDF Schema (xmlns:rdfs), XML Schema (xmlns:xsd), OWL (xmlns:owl), DAML+OIL (xmlns:daml), and Dublin Core (xmlns:dc). The owl:Ontology element contains a comment describing the purpose of the document and imports the biochemistry OWL document.
A physiological system is defined by the class PhysiologicalSystem.
The various kinds of physiological system (circulatory, endocrine, gastrointestinal, etc.) will be defined as subclasses of the class PhysiologicalSystem. This is appropriate because the circulatory system is a physiological system, the endocrine system is a physiological system, and so on. Subclassing defines an "is a" relation, also known as an inheritance, a parent-child, or a type specialization relation. The article Ontology Development 101: A Guide to Creating Your First Ontology [8] discusses in more detail when to use subclassing versus properties (a "has a" relation) when authoring ontologies.
I have included a lengthy description of the class PhysiologicalSystem, even though this is just an example, for a reason: to emphasize that this is where a subject matter expert (a physician in this case) should pour out their knowledge. That knowledge is what users of the ontology are looking for and hoping to benefit from. What matters is the information created by subject matter experts and making it readily available to users. The XML, the software, and all that goes with it exist to support human users.
The class NervousSystem is defined to be a subclass of
PhysiologicalSystem
<owl:Class rdf:ID="NervousSystem">
<rdfs:comment rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
The nervous system senses the outside environment, the state of the body itself,
and initiates movement of the musculoskeletal system.</rdfs:comment>
<rdfs:subClassOf>
<owl:Class rdf:ID="PhysiologicalSystem"/>
</rdfs:subClassOf>
</owl:Class>
This can also be shown with a Unified Modeling Language (UML) class
diagram, which also shows the other PhysiologicalSystem
subclasses.

The open arrow in the diagram represents inheritance.
Next I will introduce a model to represent measurements and
observations that a physician may use in
determining the health of the physiological systems described
above. I will create a base class called Measurement
and from that derive classes for pulse, blood pressure, body
temperature, and the blood concentration of relevant chemical
substances, such as glucose, calcium, sodium, and so on. The list
is incomplete but it illustrates the approach.

UML Class Diagram to Model
Physiological Measurements
The reader might think at this point that the ontology for
physiology will
become very large and we should create separate ontologies for each
specialization. That is a good argument. Then should not
the BloodConcentrations belong to a pathology ontology? Also,
shouldn't the CirculatorySystem belong to a cardiovascular ontology?
The reasons that I did not do this are
To describe the substances measured in the blood I have created an
example biochemistry ontology. The relationship between the
physiology ontology and the biochemistry ontology is shown below.

This new ontology illustrates two points
The blood concentration measures a particular chemical
substance. This can be described as a property of the BloodConcentration
element:
<owl:ObjectProperty rdf:ID="substanceMeasured">
<rdfs:comment xml:lang="en">The substance measured in the laboratory test.</rdfs:comment>
<rdfs:label xml:lang="en">Substance Measured</rdfs:label>
<rdfs:range rdf:resource="http://www.medicalcomputing.net/owl/biochemistry.owl#Chemical"/>
<rdfs:domain rdf:resource="#BloodConcentration"/>
</owl:ObjectProperty>
A property is a map from the domain to a range. Here the domain is
the set of BloodConcentration entities and the
range is the set of Chemical entities, which are shown in
the class diagram below.

To define a benchmark for interpreting results, create a reference element for measurements in general and, for blood concentration in particular, define a healthy range. This is shown in the class diagram below.

Class Diagram for Healthy Range
In this diagram the dashed line represents a dependency or a "has a" relation. A measurement has a reference; a blood concentration has a healthy
range.
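The difference between the "is a" and "has a" relations in this model can be sketched in Java. The class names Measurement, BloodConcentration, Reference, and HealthyRange follow the text, but everything else is an assumption made for illustration: treating HealthyRange as a kind of Reference, the isNormal and isHealthy methods, and the numeric limits, which are placeholder values and not clinical guidance.

```java
/** Illustrative only: "is a" (inheritance) versus "has a" (reference) in the measurement model. */
public class HealthyRangeSketch {

    /** A benchmark against which a measured value is interpreted. */
    public interface Reference {
        boolean isNormal(double value);
    }

    /** Assumed here to be a kind of Reference defined by a lower and an upper limit. */
    public static class HealthyRange implements Reference {
        private final double low;
        private final double high;

        public HealthyRange(double low, double high) {
            this.low = low;
            this.high = high;
        }

        public boolean isNormal(double value) {
            return value >= low && value <= high;
        }
    }

    /** Base class: every measurement has a value and has a reference. */
    public static class Measurement {
        protected final double value;
        protected final Reference reference;

        public Measurement(double value, Reference reference) {
            this.value = value;
            this.reference = reference;
        }

        public boolean isHealthy() {
            return reference.isNormal(value);
        }
    }

    /** A blood concentration is a measurement, and its reference is a healthy range. */
    public static class BloodConcentration extends Measurement {
        private final String substanceMeasured;

        public BloodConcentration(String substanceMeasured, double value, HealthyRange range) {
            super(value, range);
            this.substanceMeasured = substanceMeasured;
        }

        public String getSubstanceMeasured() {
            return substanceMeasured;
        }
    }

    public static void main(String[] args) {
        HealthyRange range = new HealthyRange(70, 100); // placeholder limits
        BloodConcentration glucose = new BloodConcentration("glucose", 85, range);
        System.out.println(glucose.getSubstanceMeasured() + " healthy: " + glucose.isHealthy());
    }
}
```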
Instantiating Individuals of Classes
Classes are defined with the intent of being instantiated by individuals. For example, the upper value for diastolic blood pressure, as defined by the American Heart Association, can be written in RDF as an individual.
A variation on the subject matter expert is the person creating individual instances of classes. This person need not be an expert in data modeling but needs to be able to select the appropriate class to instantiate. In the medical information system this would be the person entering the laboratory test information.
The other users of this medical application are physicians and
patients. Those users explore and otherwise make use of the
content created by the subject matter expert to interpret (physician)
and view (patient) the laboratory test results in the context provided
by the subject matter expert.
Traversing the BioPAX OWL Ontology with the Protégé Java APIs
The Protégé project includes a library for reading and manipulating OWL documents. This is described in The Protégé-OWL API - Programmer's Guide [15]. This section demonstrates the Protégé-OWL API for reading an OWL file. The program traverses the BioPAX Level 2 OWL file and generates HTML documentation for the ontology. The source code is in the file BioPaxLevel2.java. The generated HTML file is bio_pax_level2.html.
The method getClassList() builds an HTML select widget with one option for each class defined in the ontology.
private static StringBuffer getClassList(OWLModel owlModel) {
StringBuffer sb = new StringBuffer(
" <p>Go to BioPax Class: " +
" <select id='jump_select' onchange='jump(this);'>\n");
// One option per user-defined named class in the ontology
Collection classes = owlModel.getUserDefinedOWLNamedClasses();
for (Iterator it = classes.iterator(); it.hasNext();) {
OWLNamedClass cls = (OWLNamedClass) it.next();
sb.append(" <option value='" + cls.getName() + "'>" + cls.getName() + "</option>\n");
}
sb.append(
" </select></p>\n" +
" </form>\n");
return sb;
}
The main method then iterates over the classes again to give a place for the select widget to jump to.
Collection classes = owlModel.getUserDefinedOWLNamedClasses();
int i = 0;
for (Iterator it = classes.iterator(); it.hasNext();) {
OWLNamedClass cls = (OWLNamedClass) it.next();
detail.append(
" <a name='" + cls.getName() + "'/>" +
" <h2>Class " + cls.getName() + "</h2>\n" +
" <p>Parent: ");
The next few lines find the parent classes, each of which is hyperlinked to its own definition.
Collection superclasses = cls.getNamedSuperclasses();
for (Iterator superclassIt = superclasses.iterator(); superclassIt.hasNext(); ) {
String parent = ((OWLNamedClass)superclassIt.next()).getName();
detail.append("<a href='#" + parent + "'>" + parent + "</a> ");
}
detail.append("</p>\n");
Finally, the program appends the comments to the HTML file.
detail.append(
" <p id='" + cls.getName() + "' tabindex='" + i + "'>\n");
// Append each comment attached to the class
Collection comments = cls.getComments();
for (Iterator commentIt = comments.iterator(); commentIt.hasNext(); ) {
detail.append(commentIt.next());
}
detail.append("</p>\n");
i++;
}
html.append(detail);
The markup tags for the end of the HTML file are added and the buffer is written out to the file bio_pax_level2.html using the method writeToFile(). To run the program, enter the following command at a command prompt.
>java -cp classes;lib/AbstractSyntax.jar;lib/commons-lang-2.0.jar;lib/commons-logging.jar;lib/icu4j.jar;lib/jena.jar;lib/OWLDoc.jar;lib/protege.jar;\
lib/protege-owl.jar;lib/standard-extensions.jar net.medicalcomputing.owl.example.BioPaxLevel2
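This assumes that the class files are in a directory called 'classes' and all the jar files are in a directory called 'lib'. The class file is BioPaxLevel2.class. The semicolon-separated classpath shown is the Windows form; on Unix-like systems the entries are separated with colons.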