Text mining in practice with R

Author: Ted Kwartler
Publisher: Hoboken (NJ) : John Wiley & Sons, cop. 2017.
Edition/Format: Print book : English : 1st ed.
Summary:

A reliable, cost-effective approach to extracting priceless business information from all sources of text. Excavating actionable business insights from data is a complex undertaking, and that ...

Details

Document Type: Book
All Authors / Contributors: Ted Kwartler
ISBN: 9781119282013 1119282012 9781119282099 1119282098 9781119282082 111928208X
OCLC Number: 1016088008
Description: xii, 307 p. : ill. ; 24 cm.
Contents:
Foreword
Chapter 1: What is Text Mining?
  1.1 What is it?
    1.1.1 What is text mining in practice?
    1.1.2 Where does text mining fit?
  1.2 Why we care about text mining?
    1.2.1 What are the consequences of ignoring text?
    1.2.2 What are the benefits of text mining?
    1.2.3 Setting Expectations: When text mining should (and should not) be used.
  1.3 A basic workflow. How the process works.
  1.4 What tools do I need to get started with this?
  1.5 A Simple Example
  1.6 A Real World Use Case
  1.7 Summary
Chapter 2: Basics of text mining
  2.1 What is Text Mining in a practical sense?
  2.2 Types of Text Mining: Bag of Words.
    2.2.1 Types of Text Mining: Syntactic Parsing.
  2.3 The text mining process in context
  2.4 String Manipulation: Number of Characters & Substitutions
    2.4.1 String Manipulations: Paste, Character Splits & Extractions
  2.5 Keyword Scanning
  2.6 String Packages stringr & stringi
  2.7 Preprocessing Steps for Bag of Words Text Mining
  2.8 Spell Check
  2.9 Frequent Terms & Associations
  2.9 Delta Assist Wrap Up
  2.10 Summary
Chapter 3: Common Text Mining Visualizations
  3.1 A tale of two (or three) cultures
  3.2 Simple Exploration: Term Frequency, Associations & Word Networks
    3.2.1 Term Frequency
    3.2.2 Word Associations
    3.2.3 Word Networks
  3.3 Simple Word Clusters: Hierarchical Dendrograms
  3.4 Word Clouds: Overused but Effective
    3.4.1 One Corpus Word Clouds
    3.4.2 Comparing and Contrasting Corpora in Word Clouds
    3.4.3 Polarized Tag Plot
  3.5 Summary
Chapter 4: Sentiment Scoring
  4.1 What is Sentiment Analysis?
  4.2 Sentiment Scoring: Parlor Trick or Insightful?
  4.3 Polarity: Simple Sentiment Scoring
    4.3.1 Subjectivity Lexicons
    4.3.2 Qdap's Scoring for positive and negative word choice
    4.3.3 Revisiting Word Clouds...Sentiment Word Clouds
  4.4 Emoticons :) Dealing with these perplexing clues
    4.4.1 Symbol-Based Emoticons Native to R
    4.4.2 Punctuation Based Emoticons
    4.4.3 Emoji
  4.5 R's Archived Sentiment Scoring Library
  4.5 Sentiment the tidytext way
  4.6 Airbnb.com Boston Wrap Up
  4.7 Summary
Chapter 5: Hidden Structures: Clustering, String Distance, Text Vectors & Topic Modeling
  5.1 What is clustering?
    5.1.1 K Means Clustering
    5.1.2 Spherical K Means Clustering
    5.1.3 K Mediod Clustering
    5.1.4 Evaluating the cluster approaches
  5.2 Calculating & Exploring String Distance
    5.2.1 What is string distance?
    5.2.2 Fuzzy Matching-amatch, ain
    5.2.3 Similarity Distances- stringdist, stringdistmatrix
  5.3 LDA Topic Modeling Explained
    5.3.2 Topic Modeling Case Study
    5.3.2 LDA & LDAvis
  5.4 Text to Vectors using "text2vec"
    5.4.1 text2vec
  5.5 Summary
Chapter 6: Document Classification: Finding Clickbait from Headlines
  6.1 What is document classification?
  6.2 Clickbait Case Study
    6.2.2 Session & Data Set Up
    6.2.3 GLMNET Training
    6.2.4 GLMNET Test Predictions
    6.2.5 Test Set Evaluation
    6.2.6 Finding the most impactful words
    6.2.7 Case study Wrap Up: Model Accuracy & Improving Performance Recommendations
  6.3 Summary
Chapter 7: Predictive Modeling: Using text for classifying & predicting outcomes.
  7.1 Classification Vs Prediction
  7.2 Case Study I: Will this patient come back to the hospital?
    7.2.2 Patient Readmission in the Text Mining Workflow
    7.2.3 Session & Data Set Up
    7.2.4 Patient Modeling
    7.2.5 More Model KPI: AUC, Recall, Precision & F1
      7.2.5.1 Additional Evaluation Metrics
    7.2.6 Apply the model to new patients
    7.2.7 Patient Readmission Conclusion
  7.3 Case Study II: Predicting Box Office Success
    7.3.2 Opening Weekend Revenue in the Text Mining Workflow
    7.3.3 Session & Data Set Up
    7.3.4 Opening Weekend Modeling
    7.3.5 Model Evaluation
    7.3.6 Apply the Model to new Movie Reviews
    7.3.7 Movie Revenue Conclusion
  7.4 Summary
Chapter 8: The OpenNLP Project
  8.1 What is the OpenNLP project?
  8.2 R's OpenNLP Package
  8.3 Named Entities in Hillary Clinton's Email
    8.3.1 R Session Set-up
    8.3.2 Minor Text Cleaning
    8.3.3 Using OpenNLP on a single email
    8.3.4 Using OpenNLP on multiple documents
    8.3.5 Revisiting the Text Mining Workflow
  8.4 Analyzing the Named Entities
    8.4.1 Worldwide Map of Hillary Clinton's Location Mentions
    8.4.2 Mapping Only European Locations
    8.4.3 Entities & Polarity: How does Hillary Clinton feel about an entity?
    8.4.4 Stock Charts for Entities
    8.4.5 Reach an Insight or Conclusion about Hillary Clinton's Emails
  8.5 Summary
Chapter 9: Text Sources
  9.1 Sourcing Text
  9.2 Web Sources
    9.2.1 Web Scraping a Single Page with rvest
    9.2.2 Web Scraping Multiple Pages with rvest
    9.2.3 Application Program Interfaces (APIs)
    9.2.4 Newspaper Articles from The Guardian Newspaper
    9.2.5 Tweets using the "twitteR" Package
    9.2.6 Calling an API without a dedicated R package
    9.2.7 Using jsonlite to access the New York Times
    9.2.8 Using RCurl & XML to Parse Google News Feeds
    9.2.9 The tm library Web-Mining Plugin
  9.3 Getting Text from File Sources
    9.3.1 Individual CSV, TXT and Microsoft Office Files
    9.3.2 Reading multiple files quickly
    9.3.2 Extracting Text from PDFs
    9.3.3 Optical Character Recognition: Extracting Text from Images
  9.4 Summary
Responsibility: Ted Kwartler.

Linked Data


Primary Entity

<http://www.worldcat.org/oclc/1016088008> # Text mining in practice with R
    a schema:CreativeWork, schema:Book ;
    library:oclcnum "1016088008" ;
    library:placeOfPublication <http://experiment.worldcat.org/entity/work/data/4094907933#Place/hoboken_nj> ; # Hoboken (NJ)
    library:placeOfPublication <http://id.loc.gov/vocabulary/countries/xxu> ;
    schema:about <http://experiment.worldcat.org/entity/work/data/4094907933#Thing/informatics> ; # informatics
    schema:about <http://experiment.worldcat.org/entity/work/data/4094907933#Thing/informatika> ; # informatika
    schema:author <http://experiment.worldcat.org/entity/work/data/4094907933#Person/kwartler_ted> ; # Ted Kwartler
    schema:bookEdition "1st ed." ;
    schema:bookFormat bgn:PrintBook ;
    schema:copyrightYear "op." ;
    schema:datePublished "2017" ;
    schema:exampleOfWork <http://worldcat.org/entity/work/id/4094907933> ;
    schema:inLanguage "en" ;
    schema:name "Text mining in practice with R" ;
    schema:productID "1016088008" ;
    schema:publication <http://www.worldcat.org/title/-/oclc/1016088008#PublicationEvent/hoboken_nj_john_wiley_&_sons_cop_2017> ;
    schema:publisher <http://experiment.worldcat.org/entity/work/data/4094907933#Agent/john_wiley_&_sons> ; # John Wiley & Sons
    schema:workExample <http://worldcat.org/isbn/9781119282082> ;
    schema:workExample <http://worldcat.org/isbn/9781119282099> ;
    schema:workExample <http://worldcat.org/isbn/9781119282013> ;
    wdrs:describedby <http://www.worldcat.org/title/-/oclc/1016088008> ;
    .

Related Entities

<http://experiment.worldcat.org/entity/work/data/4094907933#Agent/john_wiley_&_sons> # John Wiley & Sons
    a bgn:Agent ;
    schema:name "John Wiley & Sons" ;
    .

<http://experiment.worldcat.org/entity/work/data/4094907933#Person/kwartler_ted> # Ted Kwartler
    a schema:Person ;
    schema:familyName "Kwartler" ;
    schema:givenName "Ted" ;
    schema:name "Ted Kwartler" ;
    .

<http://experiment.worldcat.org/entity/work/data/4094907933#Place/hoboken_nj> # Hoboken (NJ)
    a schema:Place ;
    schema:name "Hoboken (NJ)" ;
    .

<http://experiment.worldcat.org/entity/work/data/4094907933#Thing/informatics> # informatics
    a schema:Thing ;
    schema:name "informatics" ;
    .

<http://experiment.worldcat.org/entity/work/data/4094907933#Thing/informatika> # informatika
    a schema:Thing ;
    schema:name "informatika" ;
    .

<http://id.loc.gov/vocabulary/countries/xxu>
    a schema:Place ;
    dcterms:identifier "xxu" ;
    .

<http://worldcat.org/isbn/9781119282013>
    a schema:ProductModel ;
    schema:isbn "1119282012" ;
    schema:isbn "9781119282013" ;
    .

<http://worldcat.org/isbn/9781119282082>
    a schema:ProductModel ;
    schema:isbn "111928208X" ;
    schema:isbn "9781119282082" ;
    .

<http://worldcat.org/isbn/9781119282099>
    a schema:ProductModel ;
    schema:isbn "1119282098" ;
    schema:isbn "9781119282099" ;
    .

<http://www.worldcat.org/title/-/oclc/1016088008>
    a genont:InformationResource, genont:ContentTypeGenericResource ;
    schema:about <http://www.worldcat.org/oclc/1016088008> ; # Text mining in practice with R
    schema:dateModified "2020-02-25" ;
    void:inDataset <http://purl.oclc.org/dataset/WorldCat> ;
    .

<http://www.worldcat.org/title/-/oclc/1016088008#PublicationEvent/hoboken_nj_john_wiley_&_sons_cop_2017>
    a schema:PublicationEvent ;
    schema:location <http://experiment.worldcat.org/entity/work/data/4094907933#Place/hoboken_nj> ; # Hoboken (NJ)
    schema:organizer <http://experiment.worldcat.org/entity/work/data/4094907933#Agent/john_wiley_&_sons> ; # John Wiley & Sons
    schema:startDate "co. 2017" ;
    .
