Tools for Working with Chemical Markup Language

Alex Amies  March 11, 2006

Introduction

Chemical Markup Language (CML) [1] is a language for describing the properties and behavior of molecules.  It is commonly used to specify the geometric arrangement of atoms within molecules, which makes it useful for modelling the two- and three-dimensional shapes of molecules and their physical properties.
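To make the idea of "geometric arrangement of atoms" concrete, here is a minimal sketch in Python that reads atom positions and bonds from a small CML 2 style document.  The water molecule is hand-written for illustration with approximate coordinates, the namespace URI is an assumption about what a CML 2 file declares, and the helper names read_atoms and read_bonds are hypothetical.  It uses only the Python standard library, not any of the tools reviewed in this article.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for CML 2 documents in this sketch.
CML_NS = "http://www.xml-cml.org/schema"

# Hand-written sample: a water molecule with approximate 3D coordinates.
WATER_CML = """\
<molecule xmlns="http://www.xml-cml.org/schema" id="water">
  <atomArray>
    <atom id="a1" elementType="O" x3="0.000" y3="0.000" z3="0.000"/>
    <atom id="a2" elementType="H" x3="0.757" y3="0.586" z3="0.000"/>
    <atom id="a3" elementType="H" x3="-0.757" y3="0.586" z3="0.000"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="1"/>
    <bond atomRefs2="a1 a3" order="1"/>
  </bondArray>
</molecule>
"""


def read_atoms(cml_text):
    """Return a list of (id, element, (x, y, z)) tuples, one per atom."""
    root = ET.fromstring(cml_text)
    atoms = []
    for atom in root.iter(f"{{{CML_NS}}}atom"):
        xyz = tuple(float(atom.get(axis)) for axis in ("x3", "y3", "z3"))
        atoms.append((atom.get("id"), atom.get("elementType"), xyz))
    return atoms


def read_bonds(cml_text):
    """Return a list of (first_atom_id, second_atom_id) tuples, one per bond."""
    root = ET.fromstring(cml_text)
    return [
        tuple(bond.get("atomRefs2").split())
        for bond in root.iter(f"{{{CML_NS}}}bond")
    ]


if __name__ == "__main__":
    for atom_id, element, xyz in read_atoms(WATER_CML):
        print(atom_id, element, xyz)
    print(read_bonds(WATER_CML))
```

Because the data is plain XML, any general-purpose XML parser can read it; a dedicated CML toolkit adds chemistry-aware validation on top.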

There are tools on the market that do molecular modelling without using CML.  Why use CML instead of a vendor's proprietary format?

  1. CML is a standard that allows data to be exchanged between tools.
  2. It is also documented and readable, which allows you to read and understand the raw data.  You might even want to write your own tool.  For example, a tool might combine molecules by 'editing' or 'cutting and pasting' them.
  3. You can transform CML to something else, such as HTML, PDF, or comma-separated values, with Extensible Stylesheet Language (XSL) transforms, or make it part of a web service.
  4. You avoid the vendor lock-in that comes with a vendor's proprietary format.  Open standards work to vendors' advantage too, because they increase customers' confidence and save the vendor the cost of developing its own format.
  5. Since chemistry is a basis of other branches of science, medicine, and engineering, we may want to embed CML within other XML variants.
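As an illustration of points 2 and 3, the following sketch converts the atoms of a CML 2 style document to comma-separated values.  It uses Python's standard library rather than XSL, simply to keep the example self-contained.  The carbon dioxide sample is hand-written with approximate coordinates, the namespace URI is an assumption, and the function name cml_to_csv is hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespace assumed for CML 2 documents in this sketch.
CML_NS = "http://www.xml-cml.org/schema"

# Hand-written sample: linear carbon dioxide, coordinates approximate.
CO2_CML = """\
<molecule xmlns="http://www.xml-cml.org/schema" id="co2">
  <atomArray>
    <atom id="a1" elementType="O" x3="-1.16" y3="0.0" z3="0.0"/>
    <atom id="a2" elementType="C" x3="0.0" y3="0.0" z3="0.0"/>
    <atom id="a3" elementType="O" x3="1.16" y3="0.0" z3="0.0"/>
  </atomArray>
</molecule>
"""

COLUMNS = ["id", "elementType", "x3", "y3", "z3"]


def cml_to_csv(cml_text):
    """Return CSV text with a header row and one row per atom."""
    root = ET.fromstring(cml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(COLUMNS)
    for atom in root.iter(f"{{{CML_NS}}}atom"):
        # Attribute values are copied through as text, without conversion.
        writer.writerow([atom.get(name) for name in COLUMNS])
    return out.getvalue()


if __name__ == "__main__":
    print(cml_to_csv(CO2_CML), end="")
```

The same mapping could be written as an XSL stylesheet; the point is only that an open, documented format makes this kind of conversion a few lines of work.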

This article evaluates several tools for use with CML.  The tools are not listed in any particular order.  To understand this article you should be familiar with XML and basic programming terminology.

Summary

| Product | Vendor / Organization | Focus | Compatibility |
| --- | --- | --- | --- |
| Jumbo | CML / SourceForge Project | Toolkit for parsing and processing CML data.  Getting started documentation needed. | CML1, CML2 |
| Marvin | ChemAxon | Two dimensional viewer and editor, some 3D analysis | CML2 |
| Jmol | SourceForge Project | Three dimensional analysis and visualization | CML1, CML2 |
| JChemPaint | SourceForge Project | Two dimensional viewer and editor, some 3D analysis | CML1, CML2 |
| XDrawChem | SourceForge Project | Two dimensional viewer and editor, focus on UNIX | CML1, CML2 |
| Chemical Markup Language Demo | Adobe Systems | Web based demo for five molecules | CML1 |
| JOELib2 | SourceForge Project | Chemical expert system mainly used for converting chemical file formats. | |
| Eclipse EMF | Eclipse Foundation | Generic XML toolkit.  Generates object model, validator, and editor from XML Schema. | CML2 |
| Bioclipse | The International Bioclipse Association | Java-based, open source, visual platform for chemo- and bioinformatics | CML1, CML2 |
| Scalable Vector Graphics Example | Johanne Jean-Baptiste | Article describing application of SVG. | CML2 |
| Open Babel | SourceForge Project | Conversion between file formats. | CML1, CML2 |

CML Background

The main driving force behind CML is the Unilever Centre for Molecular Sciences Informatics at the University of Cambridge.  CML is an open source project hosted on SourceForge.  The current version, CML 2.1.1, is described by an XML Schema document.  The older CML 1 may not be supported by all tools.  Besides the properties of molecules the new version of CML describes

Jumbo

Jumbo [2] is an open source project available from SourceForge under the same project as CML.  It was originally a molecular browser but has since evolved into a toolkit written in Java.  A number of other tools use the Jumbo libraries, including Jmol and JChemPaint (see below).

The current stable version is Jumbo 5.01.  It works with Java 5.0.  The jar file shipped includes source code, binaries, and documentation.  You can download this release, or get the most recent code from CVS.  The libraries include code to parse and write CML files.  The distribution includes JavaDoc but no programming guide or programming examples.  There is some informational material on the wiki.
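Since the Jumbo distribution ships without programming examples, the sketch below shows what the "write" half of a CML toolkit does.  It is written in Python with the standard library and is not the Jumbo API.  The namespace URI is an assumption about CML 2 files, and the function names build_molecule and to_cml_text are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for CML 2 documents in this sketch.
CML_NS = "http://www.xml-cml.org/schema"

# Serialize CML elements with a default namespace instead of an ns0: prefix.
ET.register_namespace("", CML_NS)


def _q(tag):
    """Return the namespace-qualified form of a CML tag name."""
    return f"{{{CML_NS}}}{tag}"


def build_molecule(mol_id, atoms, bonds):
    """Build a molecule element.

    atoms: iterable of (atom_id, element_symbol, (x, y, z))
    bonds: iterable of (first_atom_id, second_atom_id, order)
    """
    molecule = ET.Element(_q("molecule"), {"id": mol_id})
    atom_array = ET.SubElement(molecule, _q("atomArray"))
    for atom_id, element, (x, y, z) in atoms:
        ET.SubElement(atom_array, _q("atom"), {
            "id": atom_id,
            "elementType": element,
            "x3": f"{x:.3f}",
            "y3": f"{y:.3f}",
            "z3": f"{z:.3f}",
        })
    bond_array = ET.SubElement(molecule, _q("bondArray"))
    for first, second, order in bonds:
        ET.SubElement(bond_array, _q("bond"), {
            "atomRefs2": f"{first} {second}",
            "order": str(order),
        })
    return molecule


def to_cml_text(molecule):
    """Serialize a molecule element to a string."""
    return ET.tostring(molecule, encoding="unicode")


if __name__ == "__main__":
    water = build_molecule(
        "water",
        [("a1", "O", (0.0, 0.0, 0.0)),
         ("a2", "H", (0.757, 0.586, 0.0)),
         ("a3", "H", (-0.757, 0.586, 0.0))],
        [("a1", "a2", 1), ("a1", "a3", 1)],
    )
    print(to_cml_text(water))
```

A real toolkit would also validate the result against the CML schema, which this sketch does not attempt.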

Marvin

Marvin [4] is a commercial tool suite produced by ChemAxon, with a free evaluation version available for download from their web site.  I used MarvinView version 4.04, which is one of a larger suite of products developed by ChemAxon.  The web site has a very active forum for questions.  A number of different file formats are supported.

The Java WebStart version I tried got stuck at one point because of a hidden dialog.  Using the Control-Tab key sequence fixed the problem.  Marvin is unable to use CML1 data, such as the files from the molecules index [3].

A screenshot of MarvinView displaying an arginine molecule is shown below.

MarvinView Displaying a 2D Image of Arginine

In MarvinView you can rotate the molecule by dragging the mouse.  There are many display and chemical analysis options and editing features.  All of ChemAxon's products support CML, including MarvinSketch, used for editing structure diagrams, and MarvinSpace, an OpenGL three-dimensional viewer.

Jmol

Jmol [5] is a three-dimensional visualization tool that can accept CML molecular data as input.  It can also output CML to files.  It is another open source Java project hosted on SourceForge.  It comes in several flavors: an applet, a desktop application, and an application programming interface.  It supports a number of different file formats.

The wiki has links to web sites that make use of the Jmol applet, both commercial and academic.

Screenshot of Jmol Applet Showing 3D Structure of Glucose

A screenshot of the desktop version of Jmol with a cholesterol molecule is shown below.  The screenshot is a flat, two-dimensional image, but in the application the molecule can be rotated by dragging the mouse across the canvas.

Screenshot of the Jmol Desktop Application

The current version of Jmol is Jmol 10.  It has features for two- and three-dimensional visualization and for showing information about the molecule, such as bond angles.  It supports many data formats, in addition to CML, including

JChemPaint

JChemPaint [7] is a Java SourceForge project for editing two-dimensional molecular structures.  The version that I tried was 2.2.0.  After downloading the jar file, start JChemPaint using the command:

java -jar jchempaint-2.2.0.jar

I had the IBM Java 1.5 Java Virtual Machine installed.  A screenshot is shown below.

Screenshot of JChemPaint Displaying Arginine

JChemPaint allows you to draw molecules and export the drawing as an image, export the data to CompChem format [8], and save drawings in several formats, including MDL MOL [9], Scalable Vector Graphics (SVG), and SMILES.  You can also use JChemPaint to make drawings of molecules and then save them as CML.  For example, the drawing of a generic amino acid below was made this way.

Drawing of a Generic Amino Acid with JChemPaint

The CML created for the generic amino acid is here.

XDrawChem

XDrawChem [11] is a two-dimensional molecule drawing program for UNIX systems, including Linux and Mac OS X.  An out-of-date version for Windows is also available.

Adobe Chemical Markup Language Demo

The Adobe Chemical Markup Language demo [12] is a very professional demonstration of SVG technology.  It shows three-dimensional visualization of several molecules in a web-based application, including the ability to change the view angle by dragging the mouse.  However, it is limited to the five molecules selected.  The CML shown is CML1.

Extensible Stylesheet Language Transformations

Extensible Stylesheet Language Transformations (XSLT) are a way of transforming one variety of XML into another.  The file cml1toxhtml.xslt, written by this author, transforms CML1 to CML2.  It can be used with the Java XSLT processor described in Introduction to XML for Chemistry and Biosciences.  It is not all-purpose but can be used with the files in the molecules index [3].
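To show the kind of mapping such a transform performs, here is a rough Python sketch that rewrites a CML1-style molecule into CML2-style attributes.  It is not the cml1toxhtml.xslt stylesheet and does not use XSLT.  It assumes that CML1 stores atom and bond properties in child elements carrying a builtin attribute and that CML2 stores the same values as attributes.  The sample document, the namespace URI, and the function name cml1_to_cml2 are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for the CML 2 output in this sketch.
CML2_NS = "http://www.xml-cml.org/schema"
ET.register_namespace("", CML2_NS)

# Hand-written CML1-style fragment (assumed layout): one O-H bond.
CML1_SAMPLE = """\
<molecule id="fragment">
  <atomArray>
    <atom id="a1">
      <string builtin="elementType">O</string>
      <float builtin="x3">0.000</float>
      <float builtin="y3">0.000</float>
      <float builtin="z3">0.000</float>
    </atom>
    <atom id="a2">
      <string builtin="elementType">H</string>
      <float builtin="x3">0.757</float>
      <float builtin="y3">0.586</float>
      <float builtin="z3">0.000</float>
    </atom>
  </atomArray>
  <bondArray>
    <bond id="b1">
      <string builtin="atomRef">a1</string>
      <string builtin="atomRef">a2</string>
      <string builtin="order">1</string>
    </bond>
  </bondArray>
</molecule>
"""


def _q(tag):
    """Return the namespace-qualified form of a CML 2 tag name."""
    return f"{{{CML2_NS}}}{tag}"


def _builtin_values(element):
    """Return (name, text) pairs for child elements with a builtin attribute."""
    return [
        (child.get("builtin"), (child.text or "").strip())
        for child in element
        if child.get("builtin")
    ]


def cml1_to_cml2(cml1_text):
    """Return CML2-style text for a CML1-style molecule."""
    old = ET.fromstring(cml1_text)
    new = ET.Element(_q("molecule"), dict(old.attrib))

    atom_array = ET.SubElement(new, _q("atomArray"))
    for old_atom in old.iter("atom"):
        attrs = dict(old_atom.attrib)
        attrs.update(_builtin_values(old_atom))  # child elements become attributes
        ET.SubElement(atom_array, _q("atom"), attrs)

    bond_array = ET.SubElement(new, _q("bondArray"))
    for old_bond in old.iter("bond"):
        attrs = dict(old_bond.attrib)
        refs = []
        for name, value in _builtin_values(old_bond):
            if name == "atomRef":
                refs.append(value)  # collected into a single atomRefs2 attribute
            else:
                attrs[name] = value
        attrs["atomRefs2"] = " ".join(refs)
        ET.SubElement(bond_array, _q("bond"), attrs)

    return ET.tostring(new, encoding="unicode")


if __name__ == "__main__":
    print(cml1_to_cml2(CML1_SAMPLE))
```

Like the stylesheet described above, this is not all-purpose: it handles only atoms and bonds laid out as in the sample.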

JOELib

JOELib [19] is a cheminformatics algorithm library used for converting chemical file formats, including CML.  Its features include query and substructure search based on SMARTS, a SMILES extension.

Please send me ideas and opinions by email at webmaster@medicalcomputing.net or add comments to my blog.  The content may become part of the web site.

© 2006 Alex Amies