
Combining experimental and in silico methods for comprehensive compound dereplication of natural products for mass spectrometry based metabolomics

Author: Arpana Vaniya
Publisher: Davis, Calif. : University of California, Davis, 2017.
Dissertation: Ph. D. University of California, Davis 2017
Edition/Format: Thesis/dissertation : Document : Thesis/dissertation : eBook; Computer File : English

Details

Genre/Form: Dissertations, Academic; Academic theses
Material Type: Document, Thesis/dissertation, Internet resource
Document Type: Internet Resource, Computer File
All Authors / Contributors: Arpana Vaniya
ISBN: 9780355462043, 0355462044
OCLC Number: 1032568208
Notes: Advisor: Oliver Fiehn.
Degree granted in Chemistry.
Description: 1 online resource
Responsibility: by Arpana Vaniya.

Abstract:

Metabolomics is a rapidly growing field of "omics" research in which metabolites are analyzed in biological systems. Over the past decade, mass spectrometry (MS) based metabolomics has been used for its superior analytical performance to reveal how these biological systems respond to genetic and environmental changes. MS is both sensitive and selective and, when combined with separation methods such as liquid chromatography (LC-MS) or gas chromatography (GC-MS), is capable of providing comprehensive information for metabolic profiling. However, in untargeted metabolomics, the identification of small molecules is the bottleneck. In the research described here, I have combined in silico and experimental methods for compound dereplication of natural products using MS-based metabolomics. Chapter 1 addresses advances in fragmentation trees and mass spectral trees for unknown metabolite identification. Tools for metabolite identification developed over the past 10 years are discussed, including algorithms, software, mass spectral libraries, and databases that implement fragmentation and mass spectral trees. Due to the inherent complexity of natural products in plants and microbes, unknown compound identification is increasingly difficult and limiting. Resolving this problem requires better computational tools and more informative data, such as those acquired by multi-stage mass spectrometry (MSⁿ). MSⁿ yields more fragmentation data and allows the more detailed structural elucidation needed for compounds with positional isomers. The limitation of using tandem mass spectrometry (MS/MS) alone is that many ions are shared between positional isomers, so full structural information is not available to elucidate an unknown metabolite. Fragmentation trees and mass spectral trees both describe the fragmentation processes of a metabolite and aid in fragmentation rule generation and substructure identification.
The major difference between them is that fragmentation trees use elemental compositions to describe the fragmentation process, whereas mass spectral trees (ion trees) use precursor and product ion spectra from MSⁿ acquisition. In recent years, there has been a large increase in efforts to develop MSⁿ (n > 2) data and tools for both structure elucidation and spectral annotation using fragmentation and mass spectral trees. Chapter 2 describes the research and development of iTree, an MSⁿ mass spectral tree library of plant natural products, and its use in the identification of natural products. In metabolomics, mass spectral library searching is a standard method for compound identification, more precisely known as compound dereplication. Mass spectral libraries are available either freely or commercially and can contain both experimental and in silico MS/MS reference spectra. However, many of these MS/MS libraries and databases contain far fewer MSⁿ (n > 2) reference spectra. To date, the largest MSⁿ (n > 2) libraries are HighChem and mzCloud, which also support mass spectral trees. The chemical coverage of such libraries and databases is very low compared with the number of known compounds. iTree was developed to expand the coverage of fragmentation spectra for natural products. iTree contains more than 2,000 natural products and more than 9,000 ion tree spectra annotated with in silico generated substructures from both Mass Frontier 7.0 and CFM-ID. iTree is freely available through MassBank of North America (MoNA), an open-access mass spectral database. Because of the large number of natural products, specifically flavonoid aglycones, in iTree, previously published fragmentation rules were studied and validated. A new rule was proposed for flavanonols: a loss of CCO that occurs specifically in this class.
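To make the distinction between the two tree types concrete, here is a minimal Python sketch under stated assumptions: the names `FragNode`, `IonTreeNode`, `neutral_losses`, and `walk` are hypothetical, and all formulas and m/z values are toy data, not entries from iTree. A fragmentation-tree node stores only an elemental composition, so each edge can be labeled with a neutral-loss formula by subtraction. An ion-tree node stores an actual MSⁿ spectrum, and its children are spectra acquired from selected product ions. The toy C2O edge only illustrates the kind of CCO loss mentioned for flavanonols and makes no claim about a real fragment assignment.

```python
from __future__ import annotations

import re
from collections import Counter
from dataclasses import dataclass, field

# Monoisotopic masses (Da) of the elements used in this toy example.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}


def parse(formula: str) -> Counter:
    """Parse a simple formula such as 'C15H11O7' into element counts."""
    counts: Counter = Counter()
    for elem, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[elem] += int(n) if n else 1
    return counts


def fmt(counts: Counter) -> str:
    """Format counts with C first, H second, then the rest alphabetically; skip zero counts."""
    order = sorted(counts, key=lambda e: (e != "C", e != "H", e))
    return "".join(f"{e}{counts[e] if counts[e] > 1 else ''}" for e in order if counts[e] > 0)


def mono_mass(counts: Counter) -> float:
    return sum(MONO[e] * n for e, n in counts.items())


@dataclass
class FragNode:
    """Fragmentation tree node: an elemental composition, not a spectrum."""
    formula: str
    children: list[FragNode] = field(default_factory=list)


@dataclass
class IonTreeNode:
    """Ion (mass spectral) tree node: one MS^n spectrum acquired from a precursor ion."""
    precursor_mz: float
    stage: int  # n in MS^n
    peaks: list[tuple[float, float]]  # (m/z, relative intensity)
    children: list[IonTreeNode] = field(default_factory=list)


def neutral_losses(node: FragNode) -> list[tuple[str, str, str]]:
    """Label every (parent, child) edge with the neutral loss formula parent - child."""
    edges = []
    for child in node.children:
        loss = parse(node.formula)
        loss.subtract(parse(child.formula))
        edges.append((node.formula, child.formula, fmt(loss)))
        edges.extend(neutral_losses(child))
    return edges


def walk(node: IonTreeNode, depth: int = 0):
    """Depth-first traversal of an ion tree, yielding (depth, node)."""
    yield depth, node
    for child in node.children:
        yield from walk(child, depth + 1)


# Hypothetical toy data, not taken from iTree.
frag_root = FragNode("C15H11O7", [FragNode("C13H11O6"), FragNode("C15H9O6")])
ion_root = IonTreeNode(303.05, 2, [(285.04, 100.0), (125.02, 40.0)],
                       [IonTreeNode(285.04, 3, [(257.05, 100.0)])])

for parent, child, loss in neutral_losses(frag_root):
    print(f"{parent} -> {child}: loss of {loss} ({mono_mass(parse(loss)):.4f} Da)")
for depth, node in walk(ion_root):
    print("  " * depth + f"MS{node.stage} of m/z {node.precursor_mz}: {len(node.peaks)} peaks")
```

The fragmentation tree can be built from formulas alone, while the ion tree mirrors the nested acquisition order of an MSⁿ experiment.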
In addition, iTree was used to profile secondary metabolites in the roots and nodules of the host plant Datisca glomerata. More than 100 natural products were identified by combining LC-MSⁿ, high-resolution LC-MS/MS, and ion tree analysis using iTree. Overall, iTree has been shown to facilitate metabolite identification for plant natural products. Although MSⁿ (n > 2) data are more useful for complex structural elucidation, the predominant data type in untargeted metabolomics is MS/MS. For this reason, in silico tools that interpret MS and MS/MS spectra alone must be evaluated. In Chapters 3 through 5, I discuss how the Critical Assessment of Small Molecule Identification (CASMI) has enabled such an evaluation by presenting unknown challenge data sets to the metabolomics community, which participants use to evaluate the tools and methods they currently apply for unknown compound identification. The results submitted by each participant are compared and discussed to provide insight into how in silico tools can be improved to advance the accuracy of unknown compound identification. Chapter 3 focuses specifically on the performance of MS-FINDER, software that uses MS and MS/MS spectra for structural elucidation of unknown compounds, in CASMI 2016 Category 1. The aim of this category was to identify 19 natural products from high-resolution LC-MS and LC-MS/MS challenge data sets; one challenge was excluded by the organizers after submission, leaving 18. Molecular formulas were first determined with MS-FINDER, Seven Golden Rules, and SIRIUS to obtain a consensus formula for structural dereplication by MS-FINDER. MS-FINDER correctly identified 89% of the molecular formulas, SIRIUS 61%, and Seven Golden Rules 83% when using the Dictionary of Natural Products (DNP) as a targeted database.
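The abstract does not say how the consensus formula was derived. As one hypothetical way to combine ranked outputs from several tools, the sketch below uses a Borda-style vote. The scoring scheme, the function name `consensus_formula`, and the candidate lists are assumptions for illustration only; they are neither the procedure used in the dissertation nor real output from MS-FINDER, SIRIUS, or Seven Golden Rules.

```python
from collections import defaultdict


def consensus_formula(rankings: dict[str, list[str]], top_k: int = 3) -> list[tuple[str, int]]:
    """Borda-style vote over each tool's ranked formula candidates.

    A formula at 0-based rank r in a tool's top-k list earns (top_k - r) points;
    results are sorted by score (descending), then alphabetically for ties.
    """
    scores: dict[str, int] = defaultdict(int)
    for ranked in rankings.values():
        for r, formula in enumerate(ranked[:top_k]):
            scores[formula] += top_k - r
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical candidate lists for a single challenge (not actual tool output).
rankings = {
    "MS-FINDER": ["C15H12O7", "C16H16O6"],
    "SIRIUS": ["C16H16O6", "C15H12O7"],
    "Seven Golden Rules": ["C15H12O7", "C14H12O8"],
}
print(consensus_formula(rankings))  # → [('C15H12O7', 8), ('C16H16O6', 5), ('C14H12O8', 2)]
```

A rank-weighted vote rewards formulas that several tools place near the top, even when no single tool ranks them first.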
Structural dereplication was approached with two methods: avaniya001 (MS-FINDER only) and avaniya002 (MS-FINDER with mass spectral library searching). This allowed us to evaluate two components: first, MS-FINDER as an in silico tool for compound identification, and second, the importance of mass spectral library searching for compound identification. In avaniya001, we correctly identified 53% of the structures as the top hit, 72% within the top 3, and 78% within the top 10. In avaniya002, we correctly identified 78% of the structures as the top hit and 83% within the top 3. We correctly identified 14 of the 18 challenges in the CASMI 2016 Category 1 contest; the winner, Dejan Nikolic from the University of Illinois, correctly identified 15 of 18. The results of CASMI 2016 Category 1 show that in silico software such as MS-FINDER is capable of identifying unknown compounds by interpreting MS and MS/MS spectra, but the broader challenge of unknown compound identification remains a work in progress and will not be solved by in silico tools alone. For this broader challenge, in silico tools must be combined with mass spectral library searching to improve the accuracy of compound identification. Chapter 4 addresses my approach to Category 2 of CASMI 2016, using MS-FINDER only, without mass spectral library searching. The aim of this category was to determine the "Best Automatic Structural - In Silico Fragmentation Only" method on a data set of 208 challenges. Category 2 evaluated the status of current in silico tools for mass spectral interpretation and the accuracy of compound identification without relying on mass spectral libraries, databases, or additional metadata. Using MS-FINDER, 34% of the challenges received gold medal ranks, which placed Team Vaniya in third place.
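The scoring behind mass spectral library searching (the component that distinguishes avaniya002 from avaniya001) varies between tools. A common choice is a cosine similarity over matched peaks. The minimal sketch below assumes centroided spectra, a fixed absolute m/z tolerance, and greedy one-to-one peak matching; the function names and library entries are hypothetical, and this is not the search algorithm of any particular library or of MS-FINDER.

```python
import math

# A centroided spectrum: list of (m/z, intensity) pairs.
Spectrum = list[tuple[float, float]]


def cosine_score(query: Spectrum, reference: Spectrum, tol: float = 0.01) -> float:
    """Greedy peak-matching cosine similarity between two centroided spectra."""
    # All peak pairs within tolerance, largest intensity products first.
    pairs = sorted(
        (
            (qi * ri, i, j)
            for i, (qmz, qi) in enumerate(query)
            for j, (rmz, ri) in enumerate(reference)
            if abs(qmz - rmz) <= tol
        ),
        reverse=True,
    )
    used_q: set[int] = set()
    used_r: set[int] = set()
    dot = 0.0
    for prod, i, j in pairs:
        # Each peak may be matched at most once.
        if i not in used_q and j not in used_r:
            used_q.add(i)
            used_r.add(j)
            dot += prod
    norm_q = math.sqrt(sum(i * i for _, i in query))
    norm_r = math.sqrt(sum(i * i for _, i in reference))
    return dot / (norm_q * norm_r) if norm_q and norm_r else 0.0


def library_search(query: Spectrum, library: dict[str, Spectrum],
                   tol: float = 0.01) -> list[tuple[str, float]]:
    """Score the query against every library spectrum and rank hits by similarity."""
    hits = [(name, cosine_score(query, spec, tol)) for name, spec in library.items()]
    return sorted(hits, key=lambda h: -h[1])


# Hypothetical library and query spectra.
library = {
    "compound A (hypothetical)": [(100.0, 1.0), (150.0, 0.5)],
    "compound B (hypothetical)": [(120.0, 1.0)],
}
print(library_search([(100.002, 0.9), (150.001, 0.6)], library))
```

Greedy one-to-one matching prevents a single intense peak from being counted against several reference peaks; production tools typically add intensity weighting and precursor-mass filters on top of this.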
A closer look at the results revealed that MS-FINDER as an in silico tool only, without metadata, correctly identified 22% of the structures as the top-ranking hit, 38% within the top 3 hits, and 49% within the top 10.
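The top-1, top-3, and top-10 figures reported for the CASMI challenges are top-k accuracies: the fraction of challenges whose correct structure appears among the first k ranked candidates. A minimal sketch, assuming each challenge is summarized by the 1-based rank of the correct answer (or `None` if it was never proposed); the function name and the ranks are invented for illustration and are not actual CASMI results.

```python
from __future__ import annotations


def top_k_accuracy(ranks: list[int | None], ks: tuple[int, ...] = (1, 3, 10)) -> dict[int, float]:
    """Fraction of challenges whose correct structure appears within the top k candidates.

    ranks[i] is the 1-based rank of the correct structure for challenge i,
    or None if it was not among the candidates at all.
    """
    if not ranks:
        raise ValueError("need at least one challenge")
    n = len(ranks)
    return {k: sum(1 for r in ranks if r is not None and r <= k) / n for k in ks}


# Hypothetical ranks for ten challenges, not actual CASMI results.
ranks = [1, 2, 1, None, 5, 12, 1, 3, 1, 1]
print(top_k_accuracy(ranks))  # → {1: 0.5, 3: 0.7, 10: 0.8}
```

Because the counts are cumulative, top-k accuracy can only stay the same or rise as k grows, which is why each reported series increases from top-1 to top-10.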

On average, all tools correctly identified about 20% of the structures as the top-ranking structure and only about 50% within the top 10 hits. Lastly, Chapter 5 discusses the evaluation of current software for molecular formula determination by revisiting CASMI 2014. Determining the molecular formula is the first step in compound identification, so it is important to evaluate current software to establish the level of confidence in the elemental compositions it assigns to unknown compounds. The CASMI 2014 challenge data set of 42 high-resolution LC-MS and LC-MS/MS mass spectra was used because no contest category after that year focused solely on determining the correct molecular formulas. In this study, we used the Seven Golden Rules algorithm, MS-FINDER, SIRIUS, and CFM-ID and compared their results with the original CASMI 2014 submissions. Seven Golden Rules correctly identified 36% of the formulas when considering all possible formulas; however, results were boosted to 71% when using DNP and to 45% when using PubChem as a targeted database. MS-FINDER correctly identified 86% of the formulas, SIRIUS correctly identified 50% regardless of the mass tolerance used, and CFM-ID correctly identified 76% when using HMDB and 69% when using KEGG. In the original submission, 81% of the molecular formulas were correctly identified by using Seven Golden Rules with MS/MS library and database queries and restrictions. Nonetheless, it is crucial to remember that compound identification starts with determining the correct molecular formula. Current software has vastly improved the calculation of elemental compositions, but no single tool identified 100% of the molecular formulas correctly.
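As an illustration of the kind of checks formula-assignment software applies, here is a minimal sketch of two generic filters: the ppm mass error of an observed [M+H]+ ion against each candidate formula, and a ring-plus-double-bond-equivalent (RDBE) check that rejects negative or half-integer values. It is not an implementation of Seven Golden Rules, MS-FINDER, SIRIUS, or CFM-ID; real tools apply further constraints (Seven Golden Rules, for example, also scores isotope patterns and element ratios). The function names, observed m/z, candidate list, and 5 ppm tolerance are assumptions.

```python
from __future__ import annotations

import re

# Monoisotopic masses (Da) for CHNOPS and the proton mass.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "P": 30.97376163, "S": 31.97207100}
PROTON = 1.00727646688


def parse(formula: str) -> dict[str, int]:
    """Parse a simple formula such as 'C15H12O7' into element counts."""
    counts: dict[str, int] = {}
    for elem, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[elem] = counts.get(elem, 0) + (int(n) if n else 1)
    return counts


def neutral_mass(formula: str) -> float:
    return sum(MONO[e] * n for e, n in parse(formula).items())


def ppm_error(observed_mz: float, formula: str, adduct: float = PROTON) -> float:
    """Mass error of an observed ion (default [M+H]+) against a candidate neutral formula."""
    theoretical = neutral_mass(formula) + adduct
    return (observed_mz - theoretical) / theoretical * 1e6


def rdbe(formula: str) -> float:
    """Rings plus double bonds: C - H/2 + (N + P)/2 + 1 (halogens and Si not handled)."""
    c = parse(formula)
    return c.get("C", 0) - c.get("H", 0) / 2 + (c.get("N", 0) + c.get("P", 0)) / 2 + 1


def filter_candidates(observed_mz: float, candidates: list[str], tol_ppm: float = 5.0) -> list[str]:
    """Keep formulas within tol_ppm whose RDBE is a non-negative integer, best match first."""
    kept = [f for f in candidates
            if abs(ppm_error(observed_mz, f)) <= tol_ppm
            and rdbe(f) >= 0 and rdbe(f).is_integer()]
    return sorted(kept, key=lambda f: abs(ppm_error(observed_mz, f)))


# Hypothetical [M+H]+ observation and candidate list.
print(filter_candidates(305.0656, ["C15H12O7", "C16H16O6"]))  # → ['C15H12O7']
```

Here C16H16O6 is rejected because its [M+H]+ lies roughly 119 ppm from the observed value; a filter like this only narrows the candidate list and cannot by itself guarantee the correct formula.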


Linked Data


Primary Entity

<http://www.worldcat.org/oclc/1032568208> # Combining experimental and in silico methods for comprehensive compound dereplication of natural products for mass spectrometry based metabolomics
    a bgn:Thesis, pto:Web_document, schema:MediaObject, schema:Book, schema:CreativeWork ;
    bgn:inSupportOf "" ;
    library:oclcnum "1032568208" ;
    library:placeOfPublication <http://id.loc.gov/vocabulary/countries/cau> ;
    schema:author <http://experiment.worldcat.org/entity/work/data/5039930370#Person/vaniya_arpana> ; # Arpana Vaniya
    schema:datePublished "2017" ;
    schema:exampleOfWork <http://worldcat.org/entity/work/id/5039930370> ;
    schema:genre "Dissertations, Academic"@en ;
    schema:genre "Academic theses"@en ;
    schema:inLanguage "en" ;
    schema:name "Combining experimental and in silico methods for comprehensive compound dereplication of natural products for mass spectrometry based metabolomics"@en ;
    schema:productID "1032568208" ;
    schema:url <https://search.proquest.com/docview/1970429651?accountid=14505> ;
    schema:workExample <http://worldcat.org/isbn/9780355462043> ;
    wdrs:describedby <http://www.worldcat.org/title/-/oclc/1032568208> ;
    .


Related Entities

<http://experiment.worldcat.org/entity/work/data/5039930370#Person/vaniya_arpana> # Arpana Vaniya
    a schema:Person ;
    schema:familyName "Vaniya" ;
    schema:givenName "Arpana" ;
    schema:name "Arpana Vaniya" ;
    .

<http://worldcat.org/isbn/9780355462043>
    a schema:ProductModel ;
    schema:isbn "0355462044" ;
    schema:isbn "9780355462043" ;
    .

<http://www.worldcat.org/title/-/oclc/1032568208>
    a genont:InformationResource, genont:ContentTypeGenericResource ;
    schema:about <http://www.worldcat.org/oclc/1032568208> ; # Combining experimental and in silico methods for comprehensive compound dereplication of natural products for mass spectrometry based metabolomics
    schema:dateModified "2019-05-14" ;
    void:inDataset <http://purl.oclc.org/dataset/WorldCat> ;
    .

