
Best practices in data cleaning : a complete guide to everything you need to do before and after collecting your data

Author: Jason W Osborne
Publisher: Thousand Oaks, Calif. : SAGE, ©2013.
Edition/Format: Book : English
Database: WorldCat
Summary:
"Many researchers jump straight from data collection to data analysis without realizing how analyses and hypothesis tests can go profoundly wrong without clean data. This book provides a clear, step-by-step process for examining and cleaning data in order to decrease error rates and increase both the power and replicability of results. Jason W. Osborne, author of Best Practices in Quantitative Methods (SAGE, 2008), provides easily implemented, research-based suggestions that will motivate change in practice by empirically demonstrating, for each topic, the benefits of following best practices and the potential consequences of not following these guidelines. If your goal is to do the best research you can do, draw conclusions that are most likely to be accurate representations of the population(s) you wish to speak about, and report results that are most likely to be replicated by other researchers, then this basic guidebook is indispensable."--Publisher's website


Details

Document Type: Book
All Authors / Contributors: Jason W Osborne
ISBN: 9781412988018, 1412988012
OCLC Number: 730403666
Description: xv, 275 pages : illustrations ; 23 cm
Contents:
ch. 1 Why Data Cleaning Is Important: Debunking the Myth of Robustness --
Origins of Data Cleaning --
Are Things Really That Bad? --
Why Care About Testing Assumptions and Cleaning Data? --
How Can This State of Affairs Be True? --
The Best Practices Orientation of This Book --
Data Cleaning Is a Simple Process; However ... --
One Path to Solving the Problem --
For Further Enrichment --
SECTION I BEST PRACTICES AS YOU PREPARE FOR DATA COLLECTION --
ch. 2 Power and Planning for Data Collection: Debunking the Myth of Adequate Power --
Power and Best Practices in Statistical Analysis of Data --
How Null-Hypothesis Statistical Testing Relates to Power --
What Do Statistical Tests Tell Us? --
How Does Power Relate to Error Rates? --
Low Power and Type I Error Rates in a Literature --
How to Calculate Power --
The Effect of Power on the Replicability of Study Results --
Can Data Cleaning Fix These Sampling Problems? --
Conclusions --
For Further Enrichment --
Appendix --
ch. 3 Being True to the Target Population: Debunking the Myth of Representativeness --
Sampling Theory and Generalizability --
Aggregation or Omission Errors --
Including Irrelevant Groups --
Nonresponse and Generalizability --
Consent Procedures and Sampling Bias --
Generalizability of Internet Surveys --
Restriction of Range --
Extreme Groups Analysis --
Conclusion --
For Further Enrichment --
ch. 4 Using Large Data Sets With Probability Sampling Frameworks: Debunking the Myth of Equality --
What Types of Studies Use Complex Sampling? --
Why Does Complex Sampling Matter? --
Best Practices in Accounting for Complex Sampling --
Does It Really Make a Difference in the Results? --
So What Does All This Mean? --
For Further Enrichment --
SECTION II BEST PRACTICES IN DATA CLEANING AND SCREENING --
ch. 5 Screening Your Data for Potential Problems: Debunking the Myth of Perfect Data --
The Language of Describing Distributions --
Testing Whether Your Data Are Normally Distributed --
Conclusions --
For Further Enrichment --
Appendix --
ch. 6 Dealing With Missing or Incomplete Data: Debunking the Myth of Emptiness --
What Is Missing or Incomplete Data? --
Categories of Missingness --
What Do We Do With Missing Data? --
The Effects of Listwise Deletion --
The Detrimental Effects of Mean Substitution --
The Effects of Strong and Weak Imputation of Values --
Multiple Imputation: A Modern Method of Missing Data Estimation --
Missingness Can Be an Interesting Variable in and of Itself --
Summing Up: What Are Best Practices? --
For Further Enrichment --
Appendixes --
ch. 7 Extreme and Influential Data Points: Debunking the Myth of Equality --
What Are Extreme Scores? --
How Extreme Values Affect Statistical Analyses --
What Causes Extreme Scores? --
Extreme Scores as a Potential Focus of Inquiry --
Identification of Extreme Scores --
Why Remove Extreme Scores? --
Effect of Extreme Scores on Inferential Statistics --
Effect of Extreme Scores on Correlations and Regression --
Effect of Extreme Scores on t-Tests and ANOVAs --
To Remove or Not to Remove? --
For Further Enrichment --
ch. 8 Improving the Normality of Variables Through Box-Cox Transformation: Debunking the Myth of Distributional Irrelevance --
Why Do We Need Data Transformations? --
When a Variable Violates the Assumption of Normality --
Traditional Data Transformations for Improving Normality --
Application and Efficacy of Box-Cox Transformations --
Reversing Transformations --
Conclusion --
For Further Enrichment --
Appendix --
ch. 9 Does Reliability Matter? Debunking the Myth of Perfect Measurement --
What Is a Reasonable Level of Reliability? --
Reliability and Simple Correlation or Regression --
Reliability and Partial Correlations --
Reliability and Multiple Regression --
Reliability and Interactions in Multiple Regression --
Protecting Against Overcorrecting During Disattenuation --
Other Solutions to the Issue of Measurement Error --
What If We Had Error-Free Measurement? --
An Example From My Research --
Does Reliability Influence Other Analyses? --
The Argument That Poor Reliability Is Not That Important --
Conclusions and Best Practices --
For Further Enrichment --
SECTION III ADVANCED TOPICS IN DATA CLEANING --
ch. 10 Random Responding, Motivated Misresponding, and Response Sets: Debunking the Myth of the Motivated Participant --
What Is a Response Set? --
Common Types of Response Sets --
Is Random Responding Truly Random? --
Detecting Random Responding in Your Research --
Does Random Responding Cause Serious Problems With Research? --
Example of the Effects of Random Responding --
Are Random Responders Truly Random Responders? --
Summary --
Best Practices Regarding Random Responding --
Magnitude of the Problem --
For Further Enrichment --
ch. 11 Why Dichotomizing Continuous Variables Is Rarely a Good Practice: Debunking the Myth of Categorization --
What Is Dichotomization and Why Does It Exist? --
How Widespread Is This Practice? --
Why Do Researchers Use Dichotomization? --
Are Analyses With Dichotomous Variables Easier to Interpret? --
Are Analyses With Dichotomous Variables Easier to Compute? --
Are Dichotomous Variables More Reliable? --
Other Drawbacks of Dichotomization --
For Further Enrichment --
ch. 12 The Special Challenge of Cleaning Repeated Measures Data: Lots of Pits in Which to Fall --
Treat All Time Points Equally --
What to Do With Extreme Scores? --
Missing Data --
Summary --
ch. 13 Now That the Myths Are Debunked ... : Visions of Rational Quantitative Methodology for the 21st Century.
Responsibility: Jason W. Osborne.

Abstract:

This book provides a clear, step-by-step process of examining and cleaning data in order to decrease error rates and increase both the power and replicability of results.

Reviews

Editorial reviews

Publisher Synopsis

"This book provides the perfect bridge between the formal study of statistics and the practice of statistics. It fills the gap left by many of the traditional texts that focus either on the technical …"

 

Similar Items

Related Subjects (2)

User lists with this item (1)


Linked Data


<http://www.worldcat.org/oclc/730403666>
    rdf:type schema:Book ;
    library:oclcnum "730403666" ;
    owl:sameAs <info:oclcnum/730403666> ;
    schema:name "Best practices in data cleaning : a complete guide to everything you need to do before and after collecting your data"@en ;
    schema:inLanguage "en" ;
    schema:copyrightYear "2013" ;
    schema:datePublished "2013" ;
    schema:about <http://id.worldcat.org/fast/1742283> ;
    schema:about <http://id.worldcat.org/fast/1122933> ;
    schema:exampleOfWork <http://worldcat.org/entity/work/id/1186972753> ;
    umbel:isLike <http://bnb.data.bl.uk/id/resource/GBB1D1809> .

<http://id.worldcat.org/fast/1742283>
    rdf:type schema:Intangible ;
    schema:name "Quantitative research"@en .

<http://id.worldcat.org/fast/1122933>
    rdf:type schema:Intangible ;
    schema:name "Social sciences--Methodology"@en .

Content-negotiable representations
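One such content-negotiable representation is JSON-LD. As a minimal sketch built only from the schema.org properties visible in the Linked Data section above (not the exact serialization WorldCat's endpoint returns), the record could be rendered with Python's standard library:

```python
import json

# Hypothetical JSON-LD rendering of this record, using only the
# schema.org properties shown in the Linked Data section above.
record = {
    "@context": "http://schema.org/",
    "@id": "http://www.worldcat.org/oclc/730403666",
    "@type": "Book",
    "name": ("Best practices in data cleaning : a complete guide to "
             "everything you need to do before and after collecting your data"),
    "inLanguage": "en",
    "datePublished": "2013",
    "copyrightYear": "2013",
    "about": [
        {"@id": "http://id.worldcat.org/fast/1742283",
         "name": "Quantitative research"},
        {"@id": "http://id.worldcat.org/fast/1122933",
         "name": "Social sciences--Methodology"},
    ],
    "exampleOfWork": "http://worldcat.org/entity/work/id/1186972753",
}

# Serialize; a JSON-LD processor (or a server honoring
# Accept: application/ld+json) would emit an equivalent document.
print(json.dumps(record, indent=2))
```

The same triples could equally be requested as Turtle or RDF/XML from a content-negotiating endpoint by varying the `Accept` header.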
