Computational Proteomics: Algorithms for Classifying Prostate Cancer

Collaborators:

Professors Dayanand Naik, Michael Wagner (Math and Statistics, ODU); Srinivas Kasukurti, Raghu Ram Devineni (Computer Science, ODU); Professors George Wright, Jr., John Semmes, Bao-Ling Adam (Eastern Virginia Medical School).

Objective:

One of the goals of proteomics is to develop fast, automatable, and inexpensive techniques for identifying proteins in mixtures, matching the recent developments in genomics. One novel technology for accomplishing this is Surface-Enhanced Laser Desorption/Ionization (SELDI) mass spectrometry. We are collaborating with Prof. Wright's group in analyzing protein mass spectral data obtained by SELDI from serum and prostate fluid samples. The goal is to use the protein signatures in the mass spectra to classify patient samples into four groups: healthy, early prostate cancer, late prostate cancer, and benign prostatic hyperplasia (BPH).

Approach:

The data obtained from a SELDI mass spectrum is an array of intensities recorded at varying values of the mass-to-charge ratio (hereafter called "mass"). We process this data to extract peaks, and then assign each peak to an interval on the mass axis, so that every sample is represented by intensity measurements at the same set of mass intervals.
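The peak extraction and matching steps can be illustrated with a minimal sketch. It is not the project's actual pipeline: the function names find_peaks and bin_peaks, the height threshold, the bin edges, and the toy spectrum are all assumptions made for illustration. Peaks are taken to be strict local maxima above a threshold, and each bin keeps its largest peak intensity.

```python
def find_peaks(masses, intensities, min_height=0.0):
    """Return (mass, intensity) pairs at strict local maxima above min_height."""
    peaks = []
    for i in range(1, len(intensities) - 1):
        y = intensities[i]
        if y > min_height and y > intensities[i - 1] and y > intensities[i + 1]:
            peaks.append((masses[i], y))
    return peaks


def bin_peaks(peaks, bin_edges):
    """Map peaks onto fixed mass intervals, keeping the largest intensity per bin.

    Bin j covers bin_edges[j] <= mass < bin_edges[j + 1]; empty bins get 0.0.
    """
    features = [0.0] * (len(bin_edges) - 1)
    for mass, height in peaks:
        for j in range(len(bin_edges) - 1):
            if bin_edges[j] <= mass < bin_edges[j + 1]:
                features[j] = max(features[j], height)
                break
    return features


# Toy spectrum (made-up numbers, not real SELDI data).
masses = [1000, 1010, 1020, 1030, 1040, 1050, 1060, 1070]
intensities = [0.1, 0.9, 0.2, 0.1, 0.3, 1.5, 0.4, 0.1]
bin_edges = [1000, 1025, 1045, 1080]  # hypothetical intervals shared by all samples

peaks = find_peaks(masses, intensities, min_height=0.5)
features = bin_peaks(peaks, bin_edges)
print(features)
```

Because every sample is binned against the same edges, the resulting feature vectors have equal length and can be compared position by position.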

Using samples of known diagnosis from each of the four groups as training data, we have applied several statistical classification procedures to classify samples whose group is unknown. Among these are Fisher's linear discriminant analysis, quadratic discriminant analysis, non-parametric methods, k-nearest neighbor classifiers, and support vector machines. We have also employed protein selection and principal component analysis to reduce the dimension of the data and to improve the classification.
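As an illustration of one of the procedures listed above, here is a minimal sketch of a k-nearest neighbor classifier. The function name knn_classify, the choice of Euclidean distance, the value k=3, and the toy feature vectors are assumptions for this example only. It requires Python 3.8 or later for math.dist.

```python
import math
from collections import Counter


def knn_classify(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training samples."""
    by_distance = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Toy binned-intensity vectors (made-up numbers, two of the four groups only).
train_X = [[0.9, 0.0, 1.5], [1.0, 0.1, 1.4], [0.1, 1.2, 0.0], [0.2, 1.0, 0.1]]
train_y = ["late cancer", "late cancer", "healthy", "healthy"]

label_a = knn_classify(train_X, train_y, [0.8, 0.2, 1.3], k=3)
label_b = knn_classify(train_X, train_y, [0.0, 1.1, 0.2], k=3)
print(label_a, label_b)
```

In practice the value of k and the distance measure would be chosen from the data, for example by cross-validation.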

Currently we are able to distinguish healthy samples from late cancer samples with small error rates. With continued development of our methods together with suitable normalization techniques, we expect to distinguish the early cancer and BPH samples from each other. We are completing a study that compares the error rates of the various discrimination techniques in classifying prostate cancer.
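One common way to compare error rates across classifiers is leave-one-out cross-validation. The sketch below is an assumption about how such a comparison could be set up, not a description of the study itself. The helper names are hypothetical, the data is made up, and the two rules shown (a 1-nearest-neighbor rule and a nearest-centroid rule) are simple stand-ins for the techniques named above.

```python
def loo_error_rate(classify, X, y):
    """Leave-one-out error rate: hold out each sample and train on the rest."""
    errors = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if classify(train_X, train_y, X[i]) != y[i]:
            errors += 1
    return errors / len(X)


def nearest_neighbor(train_X, train_y, x):
    """1-nearest-neighbor rule using squared Euclidean distance."""
    def sq_dist(i):
        return sum((a - b) ** 2 for a, b in zip(train_X[i], x))
    return train_y[min(range(len(train_X)), key=sq_dist)]


def nearest_centroid(train_X, train_y, x):
    """Assign x to the class whose mean vector is closest (stand-in rule)."""
    groups = {}
    for row, label in zip(train_X, train_y):
        groups.setdefault(label, []).append(row)

    def sq_dist_to_centroid(label):
        rows = groups[label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        return sum((c - v) ** 2 for c, v in zip(centroid, x))

    return min(groups, key=sq_dist_to_centroid)


# Toy two-feature data (made-up); the last healthy sample lies near the cancer group.
X = [[0.9, 0.0], [1.0, 0.1], [0.8, 0.2],
     [0.1, 1.2], [0.2, 1.0], [0.0, 1.1], [0.7, 0.3]]
y = ["late cancer"] * 3 + ["healthy"] * 4

err_nn = loo_error_rate(nearest_neighbor, X, y)
err_centroid = loo_error_rate(nearest_centroid, X, y)
print(err_nn, err_centroid)
```

Passing the classifier as a function keeps the error estimate identical across methods, so differences in the reported rates reflect the classifiers rather than the evaluation procedure.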

We plan to apply our methods to SELDI mass spectral data from other cancers. Once the classification error rates are low enough, SELDI mass spectral analysis could become a convenient, low-cost diagnostic tool for cancer. We also plan a longitudinal study of the protein signatures in patients of different ages and races as they are treated for cancer.
