Basic Biological Chemistry with Extensible Markup Language

Alex Amies  April 30, 2006



Background

There are a number of references available on basic biology.  I like the freshman textbook Life: The Science of Biology by Purves et al. [1].  What I will attempt to do here is give a quick overview of some of the basics of biological chemistry through hands-on work with relevant software and real data.  I also want to keep the barriers to entry for using the software and data to a minimum, in terms of both computing skills and biology background.

With this approach, readers with a stronger background in computing than in the natural sciences will not have to spend a great deal of time learning biology just to get to some programming applications.  Likewise, people with a biology, chemistry, or medicine background will need only basic knowledge of computing.

I will be using Chemical Markup Language (CML) to illustrate basic biological chemistry.  To get started, I will introduce CML.  For a review of XML as it relates to chemistry and biosciences see the article Introduction to XML for Chemistry and Biosciences and for a review of tools for use with CML see the article Tools for Working with Chemical Markup Language.

Chemical Markup Language

The Chemical Markup Language (CML) [2] project provides a description of fundamental chemical properties of molecules.  It goes a bit beyond the simple formulas you might remember from freshman chemistry.  It includes information on bonds, geometry, and spectra, although I will not go into those in detail.  I will be using an open source (which implies freely available) tool called JChemPaint [3], which is a two-dimensional molecule viewer and editor written in Java.  Some of the CML documents that I use as starting points are from Henry S. Rzepa's The Chemical Semantic Web site [4] (such as the arginine molecule document below) and were converted from CML1 to CML2.  The others were either converted from MDL Mol files from the ChemExper chemistry database [6], which can also be opened by JChemPaint, or drawn using JChemPaint.

Another source for chemical structures is http://www.chmoogle.com/index.htm.  There are also some non-CML-based visualization tools, such as Visual Molecular Dynamics [15], at the web site for the Theoretical and Computational Biophysics Group at the University of Illinois at Urbana-Champaign.

Let's have a look at some CML now.  Here is part of a CML document for arginine, an amino acid discussed below.  The full file is arginine2.cml.

<?xml version="1.0" encoding="UTF-8" ?>
<molecule convention="MDLMol" id="arginine" title="ARGININE"
          xmlns="http://www.xml-cml.org/schema">
  <atomArray>
    <atom id="a1" elementType="C" hydrogenCount="0" x2="0.7386" y2="0.1493"/>
    <atom id="a2" elementType="C" hydrogenCount="0" x2="-0.3772" y2="-0.6129"/>
    ...
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="1"/>
    ...
  </bondArray>
</molecule>

This example focuses on geometry.  The molecule element contains an atomArray, which enumerates the atoms, and a bondArray, which specifies how the atoms are connected.  The position of each atom in a Cartesian plane is given by the floating-point numbers in the x2 and y2 attributes of the atom element; the elementType attribute gives the chemical element.  Within the CML schema there is also the ability to represent other chemical properties, including charge, isotope, shell occupancy, and so on.
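For readers who would like to see how such a document can be read by a program, here is a minimal sketch in Python using only the standard library.  It is an illustration under assumptions rather than part of any CML toolkit: the function names read_molecule and bond_length are my own, and the embedded fragment contains only the two atoms and one bond shown in the excerpt above, not the full arginine file.  The computed distance is in whatever drawing units the x2 and y2 coordinates use.

```python
import math
import xml.etree.ElementTree as ET

# Namespace taken from the xmlns attribute of the molecule element above.
CML_NS = {"cml": "http://www.xml-cml.org/schema"}

# Truncated fragment: only the atoms and bond visible in the excerpt.
FRAGMENT = """<molecule convention="MDLMol" id="arginine" title="ARGININE"
          xmlns="http://www.xml-cml.org/schema">
  <atomArray>
    <atom id="a1" elementType="C" hydrogenCount="0" x2="0.7386" y2="0.1493"/>
    <atom id="a2" elementType="C" hydrogenCount="0" x2="-0.3772" y2="-0.6129"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="1"/>
  </bondArray>
</molecule>"""


def read_molecule(cml_text):
    """Return (atoms, bonds) parsed from a CML molecule document.

    atoms maps atom id -> (elementType, x2, y2)
    bonds is a list of (first atom id, second atom id, order)
    """
    root = ET.fromstring(cml_text)
    atoms = {}
    for atom in root.findall("cml:atomArray/cml:atom", CML_NS):
        atoms[atom.get("id")] = (
            atom.get("elementType"),
            float(atom.get("x2")),
            float(atom.get("y2")),
        )
    bonds = []
    for bond in root.findall("cml:bondArray/cml:bond", CML_NS):
        first, second = bond.get("atomRefs2").split()
        bonds.append((first, second, bond.get("order")))
    return atoms, bonds


def bond_length(atoms, bond):
    """Distance between the two bonded atoms in the 2D drawing plane."""
    first, second, _order = bond
    _, x1, y1 = atoms[first]
    _, x2, y2 = atoms[second]
    return math.hypot(x2 - x1, y2 - y1)


if __name__ == "__main__":
    atoms, bonds = read_molecule(FRAGMENT)
    for bond in bonds:
        print(bond, round(bond_length(atoms, bond), 4))
```

To try it on one of the downloaded files instead of the embedded fragment, read the file contents and pass them to read_molecule in place of FRAGMENT.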

The current version of CML is CML2.  With a couple of exceptions, I provide data files in CML2 for all the molecules described in this article.  To view them in two dimensions, follow these steps:

  1. Download and install the (freely available) Java 5 runtime or development environment from Sun Microsystems or IBM [13].
  2. Download JChemPaint [3].  There is no installer; just save the Java jar file to disk.
  3. Save the CML files from this article to your hard disk by right-clicking each hyperlink.
  4. From a command-line prompt, change to the folder where the JChemPaint jar was saved and start JChemPaint with the command

    java -jar jchempaint-2.2.0.jar

  5. Open the CML file you just saved with the File | Open menu in JChemPaint.

A screenshot of JChemPaint is shown below.

Screenshot of JChemPaint with an Arginine Molecule Loaded

In JChemPaint try changing the view options to switch on and off display of carbon and hydrogen atoms using the View | Rendering Options menu.  You can also explore editing the molecules or creating new ones and saving them to CML files.

To view the molecules in three dimensions, follow these steps:

  1. Install the Java 5 runtime or development environment as above
  2. Download Jmol [9], which is also a freely available Java tool
  3. Unzip the Jmol zip file to a directory
  4. Download the CML file as above
  5. Start Jmol with the command

    java -jar Jmol.jar

  6. Open the CML file with the File | Open menu
  7. Drag the mouse around the canvas to rotate the molecule in three dimensions

A screenshot of Jmol with a cholesterol molecule loaded is shown below.

Screenshot of Jmol with a Cholesterol Molecule Loaded

Monomers

Organisms are mostly made up of three substances (excluding water):

A monomer is a chemical unit that can be chained together with other chemical units to form a polymer.  Monomers are combined to form polymers in a process called condensation.

Condensation of Two Monomers

In the diagram, R stands for a general monomer.  Water is released in the reaction.  The reverse reaction is hydrolysis, in which water is consumed and a polymer is broken apart into individual monomers.
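The bookkeeping behind condensation can be sketched in a few lines of Python.  This is an illustration under stated assumptions, not a chemistry library: it assumes a linear chain in which each link between neighboring monomers releases exactly one water molecule, the function name condensation_product_mass is my own, and the monomer masses in the example are hypothetical round numbers rather than data for a real molecule.

```python
WATER_MOLAR_MASS = 18.015  # g/mol, molar mass of H2O


def condensation_product_mass(monomer_masses):
    """Return (polymer molar mass, water molecules released).

    Assumes a linear chain: joining n monomers forms n - 1 links,
    and each link releases one water molecule.
    """
    if not monomer_masses:
        raise ValueError("at least one monomer is required")
    waters_released = len(monomer_masses) - 1
    polymer_mass = sum(monomer_masses) - waters_released * WATER_MOLAR_MASS
    return polymer_mass, waters_released


if __name__ == "__main__":
    # Two hypothetical monomers of 100.0 g/mol each (illustrative values).
    mass, waters = condensation_product_mass([100.0, 100.0])
    print(f"polymer mass: {mass:.3f} g/mol, water released: {waters}")
```

Hydrolysis runs the same arithmetic in reverse: the water consumed adds its mass back, so the separated monomers together weigh more than the polymer they came from.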

Functional Groups in Organic Chemistry

There are a number of important functional groups in organic chemistry.  The shape and charge of these functional groups help determine the behavior of molecules having different side chains (again denoted R in the table).

Functional Group      Class                Formula                     CML
Hydroxyl (OH)         Alcohols             Hydroxyl diagram            hydroxyl.cml
Aldehyde (CHO)        Aldehydes            Aldehyde diagram            aldehyde.cml
Keto (CO)             Ketones              Keto diagram                keto.cml
Carboxyl (COOH)       Carboxylic acids     Carboxyl diagram            carboxyl.cml
Amino (NH₂)           Amines               Amino group diagram         amino.cml
Phosphate (OPO₃²⁻)    Organic phosphates   Phosphate group diagram     phosphate.cml
Sulfhydryl (SH)       Thiols               Sulfhydryl group diagram    sulfhydryl.cml
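The group formulas in the table can also be broken down into per-element atom counts, which is a handy sanity check when comparing them with the atoms listed in the matching CML file.  The following is a small sketch under assumptions: the dictionary GROUP_FORMULAS and the function element_counts are my own illustrative names, the formulas are written without charges (so phosphate appears as OPO3), and the parser only handles flat formulas without parentheses.

```python
import re
from collections import Counter

# Formulas from the table above, written as plain text and without charges.
GROUP_FORMULAS = {
    "hydroxyl": "OH",
    "aldehyde": "CHO",
    "keto": "CO",
    "carboxyl": "COOH",
    "amino": "NH2",
    "phosphate": "OPO3",
    "sulfhydryl": "SH",
}

# An element symbol (capital letter, optional lowercase letter)
# followed by an optional count.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def element_counts(formula):
    """Count the atoms of each element in a flat formula such as 'COOH'."""
    counts = Counter()
    for symbol, number in TOKEN.findall(formula):
        counts[symbol] += int(number) if number else 1
    return counts


if __name__ == "__main__":
    for name, formula in GROUP_FORMULAS.items():
        print(name, dict(element_counts(formula)))
```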


Please send me ideas and opinions by email at webmaster@medicalcomputing.net or add comments to my blog.  The content may become part of the web site.

© 2006 Alex Amies