
Unsupervised learning across multiple datasets

Author: Katie Planey; Olivier Michel Simonne Gevaert; Mark A Musen; Julia Salzman; Stanford University. Program in Biomedical Informatics.
Publisher: 2015.
Dissertation: Ph. D. Stanford University 2015
Edition/Format: Thesis/dissertation : Document : eBook : Computer File : English

Details

Genre/Form: Academic theses
Material Type: Document, Thesis/dissertation, Internet resource
Document Type: Internet Resource, Computer File
All Authors / Contributors: Katie Planey; Olivier Michel Simonne Gevaert; Mark A Musen; Julia Salzman; Stanford University. Program in Biomedical Informatics.
OCLC Number: 934036901
Notes: Submitted to the Program in Biomedical Informatics.
Description: 1 online resource
Responsibility: Katie Planey.

Abstract:

Subtypes define distinctive subgroups of objects found within a larger cohort; these subtypes can help domain experts define actionable recommendations for each subgroup to improve outcomes. With the relatively recent explosion of large datasets accompanied by large numbers of features, a popular way to define subtypes is unsupervised learning, or clustering, algorithms. Unfortunately, unsupervised learning algorithms have a serious drawback: there is no ground truth. While a set of clusters may correlate strongly with an outcomes variable, an outcomes, or response, variable is not used in an unsupervised learning algorithm; this means that the accuracy of clusters derived from such algorithms, by nature, cannot be quantified. One way to ensure subtypes represent true signal is to conduct the clustering analysis on multiple datasets. However, there is a lack of methods for unsupervised learning across multiple datasets. In this dissertation, I propose novel methods for unsupervised clustering across multiple datasets, by finding a consensus across clusters derived from each individual dataset. I propose an algorithm, COINCIDE, that encompasses these novel methods; COINCIDE interprets each cluster as a node in a network. I apply COINCIDE to cancer gene expression and pathology datasets, and finally sepsis gene expression datasets, to illustrate the ability of COINCIDE to conduct unsupervised learning across multiple datasets to discover robust subtypes.


Linked Data


Primary Entity

<http://www.worldcat.org/oclc/934036901> # Unsupervised learning across multiple datasets
    a pto:Web_document, bgn:Thesis, schema:Book, schema:MediaObject, schema:CreativeWork ;
    bgn:inSupportOf "" ;
    library:oclcnum "934036901" ;
    schema:contributor <http://experiment.worldcat.org/entity/work/data/2871939914#Organization/stanford_university_program_in_biomedical_informatics> ; # Stanford University. Program in Biomedical Informatics.
    schema:contributor <http://experiment.worldcat.org/entity/work/data/2871939914#Person/musen_mark_a> ; # Mark A. Musen
    schema:contributor <http://experiment.worldcat.org/entity/work/data/2871939914#Person/salzman_julia> ; # Julia Salzman
    schema:contributor <http://experiment.worldcat.org/entity/work/data/2871939914#Person/gevaert_olivier_michel_simonne> ; # Olivier Michel Simonne Gevaert
    schema:creator <http://experiment.worldcat.org/entity/work/data/2871939914#Person/planey_katie> ; # Katie Planey
    schema:datePublished "2015" ;
    schema:description "Subtypes define distinctive subgroups of objects found within a larger cohort; these subtypes can help domain experts define actionable recommendations for each subgroup to improve outcomes. With the relatively recent explosion of large datasets accompanied by large numbers of features, a popular way to define subtypes is unsupervised learning, or clustering, algorithms. Unfortunately, unsupervised learning algorithms have a serious drawback: there is no ground truth. While a set of clusters may correlate strongly with an outcomes variable, an outcomes, or response, variable, is not used in an unsupervised learning algorithm; this means that the accuracy of clusters derived from such algorithms, by nature, cannot be quantified. One way to ensure subtypes represent true signal is to conduct the clustering analysis on multiple datasets. However, there is a lack of methods for unsupervised learning across multiple datasets. In this dissertation, I propose novel methods for unsupervised clustering across multiple datasets, by finding a consensus across clusters derived from each individual dataset. I propose an algorithm, COINCIDE, that encompasses these novel methods; COINCIDE interprets each cluster as a node in a network. I apply COINCIDE to cancer gene expression and pathology datasets, and finally sepsis gene expression datasets, to illustrate the ability of COINCIDE to conduct unsupervised learning across multiple datasets to discover robust subtypes."@en ;
    schema:exampleOfWork <http://worldcat.org/entity/work/id/2871939914> ;
    schema:genre "Academic theses"@en ;
    schema:inLanguage "en" ;
    schema:name "Unsupervised learning across multiple datasets"@en ;
    schema:productID "934036901" ;
    schema:publication <http://www.worldcat.org/title/-/oclc/934036901#PublicationEvent/2015> ;
    schema:url <http://purl.stanford.edu/zw234xf4051> ;
    wdrs:describedby <http://www.worldcat.org/title/-/oclc/934036901> ;
    .

Related Entities

<http://experiment.worldcat.org/entity/work/data/2871939914#Organization/stanford_university_program_in_biomedical_informatics> # Stanford University. Program in Biomedical Informatics.
    a schema:Organization ;
    schema:name "Stanford University. Program in Biomedical Informatics." ;
    .

<http://experiment.worldcat.org/entity/work/data/2871939914#Person/gevaert_olivier_michel_simonne> # Olivier Michel Simonne Gevaert
    a schema:Person ;
    schema:familyName "Gevaert" ;
    schema:givenName "Olivier Michel Simonne" ;
    schema:name "Olivier Michel Simonne Gevaert" ;
    .

<http://experiment.worldcat.org/entity/work/data/2871939914#Person/musen_mark_a> # Mark A. Musen
    a schema:Person ;
    schema:familyName "Musen" ;
    schema:givenName "Mark A." ;
    schema:name "Mark A. Musen" ;
    .

<http://experiment.worldcat.org/entity/work/data/2871939914#Person/planey_katie> # Katie Planey
    a schema:Person ;
    schema:familyName "Planey" ;
    schema:givenName "Katie" ;
    schema:name "Katie Planey" ;
    .

<http://experiment.worldcat.org/entity/work/data/2871939914#Person/salzman_julia> # Julia Salzman
    a schema:Person ;
    schema:familyName "Salzman" ;
    schema:givenName "Julia" ;
    schema:name "Julia Salzman" ;
    .

<http://www.worldcat.org/title/-/oclc/934036901>
    a genont:InformationResource, genont:ContentTypeGenericResource ;
    schema:about <http://www.worldcat.org/oclc/934036901> ; # Unsupervised learning across multiple datasets
    schema:dateModified "2019-08-10" ;
    void:inDataset <http://purl.oclc.org/dataset/WorldCat> ;
    .