Large-scale and high-dimensional statistical learning methods and algorithms

Author: Junyang Qian; Trevor Hastie; Manuel Rivas; Robert Tibshirani; Stanford University. Department of Statistics.
Publisher: [Stanford, California] : [Stanford University], 2020. ©2020
Dissertation: Ph.D. Stanford University 2020. Thesis
Edition/Format: Thesis/dissertation : Document : eBook : Computer File : English

Details

Material Type: Document, Thesis/dissertation, Internet resource
Document Type: Internet Resource, Computer File
All Authors / Contributors: Junyang Qian; Trevor Hastie; Manuel Rivas; Robert Tibshirani; Stanford University. Department of Statistics.
OCLC Number: 1157264777
Notes: Submitted to the Department of Statistics.
Description: 1 online resource
Responsibility: Junyang Qian.

Abstract:

In the past two decades, fields such as genomics, neuroscience, economics, and Internet services have produced increasingly large datasets that have high dimension, large sample size, or both. These datasets offer unprecedented opportunities to retrieve and infer valuable information, but they also pose new challenges for statistical methodology and computational algorithms. On the one hand, we want to formulate a reasonable model that captures the desired structure and improves the quality of statistical estimation and inference. On the other hand, with increasingly large datasets, computation can be a major hurdle to reaching meaningful conclusions. This thesis stands at the intersection of the two topics: it proposes statistical methods that capture desired structure in the data and seeks scalable approaches to optimizing the computation for very large datasets. We propose a scalable and flexible framework for solving large-scale sparse regression problems with the lasso/elastic-net, and a scalable framework for solving sparse reduced rank regression in the presence of multiple correlated responses and other nuances such as missing values. Optimized implementations for genomics data in the PLINK 2.0 format are provided in the R packages snpnet and multiSnpnet, respectively. Both methods are demonstrated on the very large and ultrahigh-dimensional UK Biobank studies, where they show significant improvement over traditional predictive modeling methods.

In addition, we consider a different class of high-dimensional problems: heterogeneous causal effect estimation. Unlike in supervised learning, the main challenge is that the historical data never show the other side of the coin, so we have no access to the ground truth for the difference among treatments. We propose adaptations of nonparametric statistical learning methods, in particular gradient boosting and multivariate adaptive regression splines, to estimate treatment effects from the available predictors. The implementation is provided in the R package causalLearning.

Linked Data

Primary Entity

<http://www.worldcat.org/oclc/1157264777> # Large-scale and high-dimensional statistical learning methods and algorithms
    a bgn:Thesis, pto:Web_document, schema:CreativeWork, schema:MediaObject, schema:Book ;
    bgn:inSupportOf "" ;
    library:oclcnum "1157264777" ;
    library:placeOfPublication <http://id.loc.gov/vocabulary/countries/cau> ;
    schema:author <http://experiment.worldcat.org/entity/work/data/10254945458#Person/qian_junyang> ; # Junyang Qian
    schema:contributor <http://experiment.worldcat.org/entity/work/data/10254945458#Organization/stanford_university_department_of_statistics> ; # Stanford University. Department of Statistics.
    schema:contributor <http://experiment.worldcat.org/entity/work/data/10254945458#Person/tibshirani_robert> ; # Robert Tibshirani
    schema:contributor <http://experiment.worldcat.org/entity/work/data/10254945458#Person/hastie_trevor> ; # Trevor Hastie
    schema:contributor <http://experiment.worldcat.org/entity/work/data/10254945458#Person/rivas_manuel> ; # Manuel Rivas
    schema:copyrightYear "2020" ;
    schema:datePublished "2020" ;
    schema:exampleOfWork <http://worldcat.org/entity/work/id/10254945458> ;
    schema:inLanguage "en" ;
    schema:name "Large-scale and high-dimensional statistical learning methods and algorithms"@en ;
    schema:productID "1157264777" ;
    schema:url <http://purl.stanford.edu/xf104bg8789> ;
    wdrs:describedby <http://www.worldcat.org/title/-/oclc/1157264777> ;
    .

Related Entities

<http://experiment.worldcat.org/entity/work/data/10254945458#Organization/stanford_university_department_of_statistics> # Stanford University. Department of Statistics.
    a schema:Organization ;
    schema:name "Stanford University. Department of Statistics." ;
    .

<http://experiment.worldcat.org/entity/work/data/10254945458#Person/hastie_trevor> # Trevor Hastie
    a schema:Person ;
    schema:familyName "Hastie" ;
    schema:givenName "Trevor" ;
    schema:name "Trevor Hastie" ;
    .

<http://experiment.worldcat.org/entity/work/data/10254945458#Person/qian_junyang> # Junyang Qian
    a schema:Person ;
    schema:familyName "Qian" ;
    schema:givenName "Junyang" ;
    schema:name "Junyang Qian" ;
    .

<http://experiment.worldcat.org/entity/work/data/10254945458#Person/rivas_manuel> # Manuel Rivas
    a schema:Person ;
    schema:familyName "Rivas" ;
    schema:givenName "Manuel" ;
    schema:name "Manuel Rivas" ;
    .

<http://experiment.worldcat.org/entity/work/data/10254945458#Person/tibshirani_robert> # Robert Tibshirani
    a schema:Person ;
    schema:familyName "Tibshirani" ;
    schema:givenName "Robert" ;
    schema:name "Robert Tibshirani" ;
    .

<http://id.loc.gov/vocabulary/countries/cau>
    a schema:Place ;
    dcterms:identifier "cau" ;
    .

<http://www.worldcat.org/title/-/oclc/1157264777>
    a genont:InformationResource, genont:ContentTypeGenericResource ;
    schema:about <http://www.worldcat.org/oclc/1157264777> ; # Large-scale and high-dimensional statistical learning methods and algorithms
    schema:dateModified "2020-06-09" ;
    void:inDataset <http://purl.oclc.org/dataset/WorldCat> ;
    .