Data for Data Fusion in Metabolomic Cancer Diagnostics


The dataset combines fluorescence spectroscopy, 1H-NMR spectroscopy (CPMG and NOESY-Presat) and biomarker measurements (TIMP-1 and CEA) on human plasma samples (sodium citrate anticoagulant). The samples come from a study that included patients undergoing large bowel endoscopy due to symptoms that could be associated with colorectal cancer (CRC) (Lomholt et al. 2009; Nielsen et al. 2008). The original dataset contains case-control samples, with one case (verified colorectal cancer) and three controls for each case. The control group in this dataset consists of subjects in whom benign colorectal adenomas were found. The controls are matched by age, gender and location of tumors.


The fluorescence data are represented as PARAFAC scores (see Lawaetz et al. 2012 for details on the PARAFAC models). The NMR data are represented as PCA scores (1st component) of the integrated peaks (see Bro et al. 2012 for details). The biomarker data are log transformed (base 2). The biomarkers TIMP-1 and CEA are known to change with age and gender. This has been corrected for in the biomarker concentrations by subtracting the concentration of a matched sample from another control group (no findings). Corrections were done on both case and control samples.
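The biomarker correction described above can be illustrated with a minimal sketch. The text does not state whether the matched-sample subtraction was done before or after the log transform; the sketch assumes it was done on the log2 scale, which makes the corrected value a log2 ratio. The function name and the concentrations are hypothetical and are not taken from the dataset.

```python
import math

def corrected_log2(sample_conc, matched_conc):
    """Return log2(sample) - log2(matched no-findings control).

    Assumption: the subtraction is applied to log2-transformed
    concentrations. Both inputs must be positive.
    """
    return math.log2(sample_conc) - math.log2(matched_conc)

# Hypothetical concentrations, for illustration only.
example_value = corrected_log2(8.0, 2.0)
```

Under this assumption, a corrected value of 0 means the sample has the same concentration as its matched no-findings control, and each unit corresponds to a doubling.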


The aim of our study was to show how the extended profile of the combined data gave better options for discriminating between cancer and control samples.


The data are saved in one dataset with 94 samples and 476 variables. The first two variables are the biomarkers, the next 19 are the fluorescence data as PARAFAC scores, and the last 455 are the NMR peaks. Of the NMR variables, the first 201 are from CPMG and the last 254 are from NOESY.
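The variable layout above can be turned into column ranges for splitting the data into its blocks. The following is a minimal Python sketch; the block names and helper functions are illustrative and are not part of the dataset object. Ranges are 0-based and half-open, as is usual in Python. In MATLAB's 1-based indexing the same blocks are columns 1-2, 3-21, 22-222 and 223-476.

```python
# Block widths in column order, taken from the description above.
BLOCKS = [
    ("biomarkers", 2),             # TIMP-1 and CEA
    ("fluorescence_parafac", 19),  # PARAFAC scores
    ("nmr_cpmg", 201),             # CPMG peaks
    ("nmr_noesy", 254),            # NOESY peaks
]

def block_ranges(blocks):
    """Return {name: (start, stop)} as 0-based, half-open column ranges."""
    ranges = {}
    start = 0
    for name, width in blocks:
        ranges[name] = (start, start + width)
        start += width
    return ranges

RANGES = block_ranges(BLOCKS)

def select_block(row, name):
    """Slice one block out of a single sample row of 476 values."""
    start, stop = RANGES[name]
    return row[start:stop]

# Usage with a placeholder row (column indices stand in for real values).
row = list(range(476))
cpmg_values = select_block(row, "nmr_cpmg")
```

The widths sum to 476, which matches the stated number of variables and serves as a quick check that the data were loaded with the expected layout.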


All class data (cancer/adenoma, case control, age, gender) are stored in the dataset object. For example, cancer/adenoma status is found in Data.class{1,1}, and the class labels in Data.classlookup{1,1}.


Get the data HERE


The data are available in MATLAB 7 format and stored as dataset objects (get freeware dataset object).

If you use the data, please refer to:

Rasmus Bro, Hans Jørgen Nielsen, Francesco Savorani, Karin Kjeldahl, Ib Jarle Christensen, Nils Brünner and Anders Juul Lawaetz, “Data fusion in metabolomic cancer diagnostics”, Metabolomics, In Press, DOI:10.1007/s11306-012-0446-0



1.   Bro, Rasmus, Hans Nielsen, Francesco Savorani et al., 2012. "Data fusion in metabolomic cancer diagnostics." Metabolomics. In Press. DOI:10.1007/s11306-012-0446-0

2.   Lawaetz, Anders, Rasmus Bro, Maja Kamstrup-Nielsen et al., 2012. "Fluorescence spectroscopy as a potential metabonomic tool for early detection of colorectal cancer." Metabolomics 8 (supplement 1): 111-121.

3.   Lomholt, A. F., G. Hoyer-Hansen, H. J. Nielsen et al., 2009. "Intact and cleaved forms of the urokinase receptor enhance discrimination of cancer from non-malignant conditions in patients presenting with symptoms related to colorectal cancer." British Journal of Cancer 101: 992-997.

4.   Nielsen, H. J., N. Brunner, C. Frederiksen et al., 2008. "Plasma tissue inhibitor of metalloproteinases-1 (TIMP-1): a novel biological marker in the detection of primary colorectal cancer. Protocol outlines of the Danish-Australian endoscopy study group on colorectal cancer detection." Scandinavian Journal of Gastroenterology 43: 242-248.