Data analytics with Hadoop : an introduction for data scientists

Author: Benjamin Bengfort; Jenny Kim
Publisher: Sebastopol, CA : O'Reilly Media, 2016.
Edition/Format: eBook : Document : English : First edition
Database: WorldCat
Summary:

Ready to use statistical and machine-learning techniques across large data sets? This practical guide shows you why the Hadoop ecosystem is perfect for the job.

Details

Genre/Form: Electronic books
Additional Physical Format: Print version: Bengfort, Benjamin. Data Analytics with Hadoop. [Place of publication not identified] : O'Reilly Media, Incorporated 2015 (OCoLC)948570730
Material Type: Document, Internet resource
Document Type: Internet Resource, Computer File
All Authors / Contributors: Benjamin Bengfort; Jenny Kim
ISBN: 9781491913765 1491913762 9781491913758 1491913754
OCLC Number: 952135791
Notes: Includes index.
Description: 1 online resource : illustrations
Contents: Copyright; Table of Contents; Preface; What to Expect from This Book; Who This Book Is For; How to Read This Book; Overview of Chapters; Programming and Code Examples; GitHub Repository; Executing Distributed Jobs; Permissions and Citation; Feedback and How to Contact Us; Safari® Books Online; How to Contact Us; Acknowledgments; Part I. Introduction to Distributed Computing; Chapter 1. The Age of the Data Product; What Is a Data Product?; Building Data Products at Scale with Hadoop; Leveraging Large Datasets; Hadoop for Data Products; The Data Science Pipeline and the Hadoop Ecosystem; Big Data Workflows; Conclusion; Chapter 2. An Operating System for Big Data; Basic Concepts; Hadoop Architecture; A Hadoop Cluster; HDFS; YARN; Working with a Distributed File System; Basic File System Operations; File Permissions in HDFS; Other HDFS Interfaces; Working with Distributed Computation; MapReduce: A Functional Programming Model; MapReduce: Implemented on a Cluster; Beyond a Map and Reduce: Job Chaining; Submitting a MapReduce Job to YARN; Conclusion; Chapter 3. A Framework for Python and Hadoop Streaming; Hadoop Streaming; Computing on CSV Data with Streaming; Executing Streaming Jobs; A Framework for MapReduce with Python; Counting Bigrams; Other Frameworks; Advanced MapReduce; Combiners; Partitioners; Job Chaining; Conclusion; Chapter 4. In-Memory Computing with Spark; Spark Basics; The Spark Stack; Resilient Distributed Datasets; Programming with RDDs; Interactive Spark Using PySpark; Writing Spark Applications; Visualizing Airline Delays with Spark; Conclusion; Chapter 5. Distributed Analysis and Patterns; Computing with Keys; Compound Keys; Keyspace Patterns; Pairs versus Stripes; Design Patterns; Summarization; Indexing; Filtering; Toward Last-Mile Analytics; Fitting a Model; Validating Models; Conclusion; Part II. Workflows and Tools for Big Data Science; Chapter 6. Data Mining and Warehousing; Structured Data Queries with Hive; The Hive Command-Line Interface (CLI); Hive Query Language (HQL); Data Analysis with Hive; HBase; NoSQL and Column-Oriented Databases; Real-Time Analytics with HBase; Conclusion; Chapter 7. Data Ingestion; Importing Relational Data with Sqoop; Importing from MySQL to HDFS; Importing from MySQL to Hive; Importing from MySQL to HBase; Ingesting Streaming Data with Flume; Flume Data Flows; Ingesting Product Impression Data with Flume; Conclusion; Chapter 8. Analytics with Higher-Level APIs; Pig; Pig Latin; Data Types; Relational Operators; User-Defined Functions; Wrapping Up; Spark's Higher-Level APIs; Spark SQL; DataFrames; Conclusion; Chapter 9. Machine Learning; Scalable Machine Learning with Spark; Collaborative Filtering; Classification; Clustering; Conclusion; Chapter 10. Summary: Doing Distributed Data Science; Data Product Lifecycle; Data Lakes; Data Ingestion; Computational Data Stores; Machine Learning Lifecycle; Conclusion
Responsibility: Benjamin Bengfort and Jenny Kim.

Linked Data


Primary Entity

<http://www.worldcat.org/oclc/952135791> # Data analytics with Hadoop : an introduction for data scientists
    a schema:Book, schema:MediaObject, schema:CreativeWork ;
    library:oclcnum "952135791" ;
    library:placeOfPublication <http://id.loc.gov/vocabulary/countries/cau> ;
    rdfs:comment "Warning: This malformed URI has been treated as a string - 'http://images.contentreserve.com/ImageType-100/2858-1/{254FAA77-97A5-4C0A-A386-54FCDBA59A5C}Img100.jpg'" ;
    schema:about <http://experiment.worldcat.org/entity/work/data/3304420658#Topic/computers_data_processing> ; # COMPUTERS / Data Processing
    schema:about <http://dewey.info/class/004.36/e23/> ;
    schema:about <http://experiment.worldcat.org/entity/work/data/3304420658#Topic/cluster_analysis_data_processing> ; # Cluster analysis--Data processing
    schema:about <http://experiment.worldcat.org/entity/work/data/3304420658#Topic/electronic_data_processing_distributed_processing> ; # Electronic data processing--Distributed processing
    schema:about <http://experiment.worldcat.org/entity/work/data/3304420658#CreativeWork/apache_hadoop> ; # Apache Hadoop.
    schema:author <http://experiment.worldcat.org/entity/work/data/3304420658#Person/bengfort_benjamin> ; # Benjamin Bengfort
    schema:author <http://experiment.worldcat.org/entity/work/data/3304420658#Person/kim_jenny> ; # Jenny Kim
    schema:bookEdition "First edition." ;
    schema:bookFormat schema:EBook ;
    schema:datePublished "2016" ;
    schema:description "Copyright; Table of Contents; Preface; What to Expect from This Book; Who This Book Is For; How to Read This Book; Overview of Chapters; Programming and Code Examples; GitHub Repository; Executing Distributed Jobs; Permissions and Citation; Feedback and How to Contact Us; Safari® Books Online; How to Contact Us; Acknowledgments; Part I. Introduction to Distributed Computing; Chapter 1. The Age of the Data Product; What Is a Data Product?; Building Data Products at Scale with Hadoop; Leveraging Large Datasets; Hadoop for Data Products; The Data Science Pipeline and the Hadoop Ecosystem"@en ;
    schema:exampleOfWork <http://worldcat.org/entity/work/id/3304420658> ;
    schema:genre "Electronic books"@en ;
    schema:inLanguage "en" ;
    schema:isSimilarTo <http://www.worldcat.org/oclc/948570730> ;
    schema:name "Data analytics with Hadoop : an introduction for data scientists"@en ;
    schema:productID "952135791" ;
    schema:url <http://lib.myilibrary.com?id=927555> ;
    schema:url <http://proquest.safaribooksonline.com/?fpi=9781491913734> ;
    schema:url <http://samples.overdrive.com/?crid=254faa77-97a5-4c0a-a386-54fcdba59a5c&.epub-sample.overdrive.com> ;
    schema:url <http://search.ebscohost.com/login.aspx?direct=true&scope=site&db=nlebk&db=nlabk&AN=1244937> ;
    schema:url <http://proquest.safaribooksonline.com/?uiCode=ohlink&xmlId=9781491913734> ;
    schema:url <http://public.eblib.com/choice/PublicFullRecord.aspx?p=4537258> ;
    schema:url <https://www.overdrive.com/search?q=254FAA77-97A5-4C0A-A386-54FCDBA59A5C> ;
    schema:url <http://proxy.ohiolink.edu:9099/login?url=http://proquest.safaribooksonline.com/?uiCode=ohlink&xmlId=9781491913734> ;
    schema:url "http://images.contentreserve.com/ImageType-100/2858-1/{254FAA77-97A5-4C0A-A386-54FCDBA59A5C}Img100.jpg" ;
    schema:url <http://site.ebrary.com/lib/iupui/docDetail.action?docID=11218023> ;
    schema:workExample <http://worldcat.org/isbn/9781491913765> ;
    schema:workExample <http://worldcat.org/isbn/9781491913758> ;
    wdrs:describedby <http://www.worldcat.org/title/-/oclc/952135791> ;
    .


Related Entities

<http://experiment.worldcat.org/entity/work/data/3304420658#Person/bengfort_benjamin> # Benjamin Bengfort
    a schema:Person ;
    schema:familyName "Bengfort" ;
    schema:givenName "Benjamin" ;
    schema:name "Benjamin Bengfort" ;
    .

<http://experiment.worldcat.org/entity/work/data/3304420658#Person/kim_jenny> # Jenny Kim
    a schema:Person ;
    schema:familyName "Kim" ;
    schema:givenName "Jenny" ;
    schema:name "Jenny Kim" ;
    .

<http://experiment.worldcat.org/entity/work/data/3304420658#Topic/cluster_analysis_data_processing> # Cluster analysis--Data processing
    a schema:Intangible ;
    schema:name "Cluster analysis--Data processing"@en ;
    .

<http://experiment.worldcat.org/entity/work/data/3304420658#Topic/computers_data_processing> # COMPUTERS / Data Processing
    a schema:Intangible ;
    schema:name "COMPUTERS / Data Processing"@en ;
    .

<http://experiment.worldcat.org/entity/work/data/3304420658#Topic/electronic_data_processing_distributed_processing> # Electronic data processing--Distributed processing
    a schema:Intangible ;
    schema:name "Electronic data processing--Distributed processing"@en ;
    .

<http://site.ebrary.com/lib/iupui/docDetail.action?docID=11218023>
    rdfs:comment "Available on campus and off-campus with authorized logon" ;
    .

<http://worldcat.org/isbn/9781491913758>
    a schema:ProductModel ;
    schema:isbn "1491913754" ;
    schema:isbn "9781491913758" ;
    .

<http://worldcat.org/isbn/9781491913765>
    a schema:ProductModel ;
    schema:isbn "1491913762" ;
    schema:isbn "9781491913765" ;
    .

<http://www.worldcat.org/oclc/948570730>
    a schema:CreativeWork ;
    rdfs:label "Data Analytics with Hadoop." ;
    schema:description "Print version:" ;
    schema:isSimilarTo <http://www.worldcat.org/oclc/952135791> ; # Data analytics with Hadoop : an introduction for data scientists
    .

