Metabolomics: from a chemometric point of view

Research output: Book/Report › Ph.D. thesis › Research

Standard

Metabolomics: from a chemometric point of view. / Kamstrup-Nielsen, Maja Hermann.

Department of Food Science, University of Copenhagen, 2013. 141 p.

Harvard

Kamstrup-Nielsen, MH 2013, Metabolomics: from a chemometric point of view. Department of Food Science, University of Copenhagen.

APA

Kamstrup-Nielsen, M. H. (2013). Metabolomics: from a chemometric point of view. Department of Food Science, University of Copenhagen.

Vancouver

Kamstrup-Nielsen MH. Metabolomics: from a chemometric point of view. Department of Food Science, University of Copenhagen, 2013. 141 p.

Author

Kamstrup-Nielsen, Maja Hermann. / Metabolomics: from a chemometric point of view. Department of Food Science, University of Copenhagen, 2013. 141 p.

Bibtex

@phdthesis{838887a649c64ca98c0dc2cf7e62e9d8,
title = "Metabolomics: from a chemometric point of view",
abstract = "Metabolomics is the analysis of the whole metabolome and the focus in metabolomics studies is to measure as many metabolites as possible. The use of chemometrics in metabolomics studies is widespread, but there is a clear lack of validation in the developed models. The focus in this thesis has been how to properly handle complex metabolomics data, in order to achieve reliable and valid multivariate models. This has been illustrated by three case studies with examples of forecasting breast cancer and early detection of colorectal cancer based on data from nuclear magnetic resonance (NMR) spectroscopy (Paper II), fluorescence spectroscopy (Paper III) and gas chromatography coupled to mass spectrometry (GC-MS). The principles of the three data acquisition techniques have been briefly described and the methods have been compared. The techniques complement each other, which makes room for data fusion where data from different platforms can be combined. Complex data are obtained when samples are analysed using NMR, fluorescence and GC-MS. Chemometrics methods which can be used to extract the relevant information from the obtained data are presented. Focus has been on principal component analysis (PCA), parallel factor analysis (PARAFAC), PARAFAC2 and partial least squares discriminant analysis (PLS-DA) all being described in depth. It can be a challenge to determine the appropriate number of components in PARAFAC2, since no specific tools have been developed for this purpose. Paper I is a presentation of a core consistency diagnostic aiding in determining the number of components in a PARAFAC2 model. It is of great importance to validate especially PLS-DA models and if not done properly, the developed models might reveal spurious groupings. Furthermore, data from metabolomics studies contain many redundant variables. These have been suggested to be eliminated using an approach termed reduction of redundant variables (RRV), which is time consuming but efficient, since the curse of dimensionality is reduced and the risk of over-fit is decreased. The use of appropriate multivariate models in metabolomics studies has been presented in the three case studies. In the first case study, plasma samples from healthy individuals have been analysed by NMR. Some have developed breast cancer later in life and these have been separated from healthy individuals by means of a properly validated PLS-DA model based on NMR data with RRV and known risk markers. The sensitivity and specificity values are 0.80 and 0.79, respectively, for a test set validated model. The second case study is based on plasma samples with verified colorectal cancer and three types of control samples analysed by fluorescence spectroscopy. The acquired data have been analysed by PARAFAC models and the components from the PARAFAC models have been used as variables in seven PLS-DA models in order to separate the cancer samples from the control groups. Sensitivity and specificity values of approximately 0.75 make fluorescence spectroscopy a potential tool in early detection of colorectal cancer. Finally, plasma samples have been analysed using GC-MS. The method requires extensive sample preparation and therefore the study can only be considered a feasibility study with room for optimization. However, 14 plasma samples were analysed and the results indicate that GC-MS-based metabolomics in combination with PARAFAC2 modelling is applicable for extracting relevant biological information from the plasma samples. Overall, the work in this thesis shows that suitable and properly validated chemometrics models used in metabolomics are very useful in forecasting and early detection of cancer. The use of chemometrics in metabolomics can e.g. increase the understanding of the underlying etiology of cancer and could be extended to cover other diseases as well.",
author = "Kamstrup-Nielsen, {Maja Hermann}",
year = "2013",
language = "English",
school = "Department of Food Science, University of Copenhagen",

}

RIS

TY - BOOK

T1 - Metabolomics

T2 - from a chemometric point of view

AU - Kamstrup-Nielsen, Maja Hermann

PY - 2013

Y1 - 2013

N2 - Metabolomics is the analysis of the whole metabolome and the focus in metabolomics studies is to measure as many metabolites as possible. The use of chemometrics in metabolomics studies is widespread, but there is a clear lack of validation in the developed models. The focus in this thesis has been how to properly handle complex metabolomics data, in order to achieve reliable and valid multivariate models. This has been illustrated by three case studies with examples of forecasting breast cancer and early detection of colorectal cancer based on data from nuclear magnetic resonance (NMR) spectroscopy (Paper II), fluorescence spectroscopy (Paper III) and gas chromatography coupled to mass spectrometry (GC-MS). The principles of the three data acquisition techniques have been briefly described and the methods have been compared. The techniques complement each other, which makes room for data fusion where data from different platforms can be combined. Complex data are obtained when samples are analysed using NMR, fluorescence and GC-MS. Chemometrics methods which can be used to extract the relevant information from the obtained data are presented. Focus has been on principal component analysis (PCA), parallel factor analysis (PARAFAC), PARAFAC2 and partial least squares discriminant analysis (PLS-DA) all being described in depth. It can be a challenge to determine the appropriate number of components in PARAFAC2, since no specific tools have been developed for this purpose. Paper I is a presentation of a core consistency diagnostic aiding in determining the number of components in a PARAFAC2 model. It is of great importance to validate especially PLS-DA models and if not done properly, the developed models might reveal spurious groupings. Furthermore, data from metabolomics studies contain many redundant variables. These have been suggested to be eliminated using an approach termed reduction of redundant variables (RRV), which is time consuming but efficient, since the curse of dimensionality is reduced and the risk of over-fit is decreased. The use of appropriate multivariate models in metabolomics studies has been presented in the three case studies. In the first case study, plasma samples from healthy individuals have been analysed by NMR. Some have developed breast cancer later in life and these have been separated from healthy individuals by means of a properly validated PLS-DA model based on NMR data with RRV and known risk markers. The sensitivity and specificity values are 0.80 and 0.79, respectively, for a test set validated model. The second case study is based on plasma samples with verified colorectal cancer and three types of control samples analysed by fluorescence spectroscopy. The acquired data have been analysed by PARAFAC models and the components from the PARAFAC models have been used as variables in seven PLS-DA models in order to separate the cancer samples from the control groups. Sensitivity and specificity values of approximately 0.75 make fluorescence spectroscopy a potential tool in early detection of colorectal cancer. Finally, plasma samples have been analysed using GC-MS. The method requires extensive sample preparation and therefore the study can only be considered a feasibility study with room for optimization. However, 14 plasma samples were analysed and the results indicate that GC-MS-based metabolomics in combination with PARAFAC2 modelling is applicable for extracting relevant biological information from the plasma samples. Overall, the work in this thesis shows that suitable and properly validated chemometrics models used in metabolomics are very useful in forecasting and early detection of cancer. The use of chemometrics in metabolomics can e.g. increase the understanding of the underlying etiology of cancer and could be extended to cover other diseases as well.

UR - https://soeg.kb.dk/permalink/45KBDK_KGL/fbp0ps/alma99122888352605763

M3 - Ph.D. thesis

BT - Metabolomics

PB - Department of Food Science, University of Copenhagen

ER -

ID: 91920046