Practical Enterprise Data Lake Insights : handle data-driven challenges in an Enterprise Big Data Lake

Author: Saurabh Gupta; Venkata Giri
Publisher: [Berkeley, CA] : Apress, 2018.
Edition/Format: eBook : Document : English

Details

Genre/Form: Electronic books
Additional Physical Format: Printed edition:
Material Type: Document, Internet resource
Document Type: Internet Resource, Computer File
All Authors / Contributors: Saurabh Gupta; Venkata Giri
ISBN: 9781484235225 1484235223 1484235215 9781484235218
OCLC Number: 1042329316
Description: 1 online resource
Contents: Intro; Table of Contents; About the Authors; About the Technical Reviewer; Acknowledgments; Foreword.
Chapter 1: Introduction to Enterprise Data Lakes; Data explosion: the beginning; Big data ecosystem; Hadoop and MapReduce -- Early days; Evolution of Hadoop; History of Data Lake; Data Lake: the concept; Data lake architecture; Why Data Lake?; Data Lake Characteristics; Data lake vs. Data warehouse; How to achieve success with Data Lake?; Data governance and data operations; Data democratization with data lake; Fast Data -- Life beyond Big Data; Conclusion.
Chapter 2: Data lake ingestion strategies; What is data ingestion?; Understand the data sources; Structured vs. Semi-structured vs. Unstructured data; Data ingestion framework parameters; ETL vs. ELT; Big Data Integration with Data Lake; Hadoop Distributed File System (HDFS); Copy files directly into HDFS; Batched data ingestion; Challenges and design considerations; Design considerations; Commercial ETL tools; Real-time ingestion; CDC design considerations; Example of CDC pipeline: Databus, LinkedIn's open-source solution; Apache Sqoop; Sqoop 1; Sqoop 2; How Sqoop works?; Sqoop design considerations; Native ingestion utilities; Oracle copyToBDA; Greenplum gphdfs utility; Data transfer from Greenplum to using gpfdist; Ingest unstructured data into Hadoop; Apache Flume; Tiered architecture for convergent flow of events; Features and design considerations; Conclusion.
Chapter 3: Capture Streaming Data with Change-Data-Capture; Change Data Capture Concepts; Strategies for Data Capture; Retention and Replay; Retention Period; Types of CDC; Incremental; Bulk; Hybrid; CDC -- Trade-offs; CDC Tools; Challenges; Downstream Propagation; Use Case; Centralization of Change Data; Analyzing a Centralized Data Store; Metadata: Data about Data; Structure of Data; Privacy/Sensitivity Information; Special Fields; Data Formats; Delimited Format; Avro File Format; Consumption and Checkpointing; Simple Checkpoint Mechanism; Parallelism; Merging and Consolidation; Design Considerations for Merge and Consolidate; Data Quality; Challenges; Design Aspects; Operational Aspects; Publishing to Kafka; Schema and Data; Sample Schema; Schema Repository; Multiple Topics and Partitioning; Sizing and Scaling; Tools; Conclusion.
Chapter 4: Data Processing Strategies in Data Lakes; MapReduce Processing Framework; Motivation: Why MapReduce?; MapReduce V1 Refresher and Design Considerations; Yet Another Resource Negotiator -- YARN; YARN concepts; Hive; Hive -- Quick Refresher; Hive Components; Hive Metastore (a.k.a. HCatalog); Hive -- Design Considerations; Hive LLAP; Apache Pig; Pig Execution Architecture; Apache Spark; Why Spark?; Resilient Distributed Datasets (RDD); RDD Runtime Components; RDD Composition; Datasets and DataFrames; Bucketing, Sorting, and Partitioning; Deployment Modes of Spark Application.
Responsibility: Saurabh Gupta, Venkata Giri.

Abstract:

Use this practical guide to successfully handle the challenges encountered when designing an enterprise data lake and learn industry best practices to resolve issues. When designing an enterprise data lake you often hit a roadblock when you must leave the comfort of the relational world and learn the nuances of handling non-relational data. Starting from sourcing data into the Hadoop ecosystem, you will go through stages that can bring up tough questions such as data processing, data querying, and security. Concepts such as change data capture and data streaming are covered. The book takes an end-to-end solution approach in a data lake environment that includes data security, high availability, data processing, data streaming, and more. Each chapter includes application of a concept, code snippets, and use case demonstrations to provide you with a practical approach. You will learn the concept, scope, application, and starting point.

What You'll Learn: Get to know data lake architecture and design principles; implement data capture and streaming strategies; implement data processing strategies in Hadoop; understand the data lake security framework and availability model.

Linked Data


Primary Entity

<http://www.worldcat.org/oclc/1042329316> # Practical Enterprise Data Lake Insights : handle data-driven challenges in an Enterprise Big Data Lake
    a schema:Book, schema:CreativeWork, schema:MediaObject ;
    library:oclcnum "1042329316" ;
    library:placeOfPublication <http://id.loc.gov/vocabulary/countries/cau> ;
    schema:about <http://dewey.info/class/004.36/e23/> ;
    schema:about <http://experiment.worldcat.org/entity/work/data/5311859577#Topic/business_mathematics_&_systems> ; # Business mathematics & systems
    schema:about <http://experiment.worldcat.org/entity/work/data/5311859577#Topic/electronic_data_processing_distributed_processing_management> ; # Electronic data processing--Distributed processing--Management
    schema:about <http://experiment.worldcat.org/entity/work/data/5311859577#Topic/information_storage_and_retrieval_systems> ; # Information storage and retrieval systems
    schema:about <http://experiment.worldcat.org/entity/work/data/5311859577#Topic/information_technology_general_issues> ; # Information technology: general issues
    schema:about <http://experiment.worldcat.org/entity/work/data/5311859577#Topic/computers_data_processing> ; # COMPUTERS--Data Processing
    schema:about <http://experiment.worldcat.org/entity/work/data/5311859577#Topic/databases> ; # Databases
    schema:about <http://experiment.worldcat.org/entity/work/data/5311859577#Topic/big_data> ; # Big data
    schema:author <http://experiment.worldcat.org/entity/work/data/5311859577#Person/giri_venkata> ; # Venkata Giri
    schema:author <http://experiment.worldcat.org/entity/work/data/5311859577#Person/gupta_saurabh> ; # Saurabh Gupta
    schema:bookFormat schema:EBook ;
    schema:datePublished "2018" ;
    schema:description "Intro; Table of Contents; About the Authors; About the Technical Reviewer; Acknowledgments; Foreword; Chapter 1: Introduction to Enterprise Data Lakes; Data explosion: the beginning; Big data ecosystem; Hadoop and MapReduce -- Early days; Evolution of Hadoop; History of Data Lake; Data Lake: the concept; Data lake architecture; Why Data Lake?; Data Lake Characteristics; Data lake vs. Data warehouse; How to achieve success with Data Lake?; Data governance and data operations; Data democratization with data lake; Fast Data -- Life beyond Big Data; Conclusion."@en ;
    schema:description "Use this practical guide to successfully handle the challenges encountered when designing an enterprise data lake and learn industry best practices to resolve issues. When designing an enterprise data lake you often hit a roadblock when you must leave the comfort of the relational world and learn the nuances of handling non-relational data. Starting from sourcing data into the Hadoop ecosystem, you will go through stages that can bring up tough questions such as data processing, data querying, and security. Concepts such as change data capture and data streaming are covered. The book takes an end-to-end solution approach in a data lake environment that includes data security, high availability, data processing, data streaming, and more. Each chapter includes application of a concept, code snippets, and use case demonstrations to provide you with a practical approach. You will learn the concept, scope, application, and starting point. What You'll Learn: Get to know data lake architecture and design principles Implement data capture and streaming strategies Implement data processing strategies in Hadoop Understand the data lake security framework and availability model."@en ;
    schema:exampleOfWork <http://worldcat.org/entity/work/id/5311859577> ;
    schema:genre "Electronic books"@en ;
    schema:inLanguage "en" ;
    schema:isSimilarTo <http://worldcat.org/entity/work/data/5311859577#CreativeWork/> ;
    schema:name "Practical Enterprise Data Lake Insights : handle data-driven challenges in an Enterprise Big Data Lake"@en ;
    schema:productID "1042329316" ;
    schema:url <http://www.vlebooks.com/vleweb/product/openreader?id=none&isbn=9781484235225> ;
    schema:url <https://public.ebookcentral.proquest.com/choice/publicfullrecord.aspx?p=5438674> ;
    schema:url <https://link.springer.com/book/10.1007/978-1-4842-3521-8> ;
    schema:url <https://link.springer.com/book/10.1007/978-1-4842-3522-5> ;
    schema:url <https://doi.org/10.1007/978-1-4842-3522-5> ;
    schema:url <https://www.books24x7.com/marc.asp?bookid=142747> ;
    schema:url <https://search.ebscohost.com/login.aspx?direct=true&scope=site&db=nlebk&db=nlabk&AN=1840106> ;
    schema:url <http://dx.doi.org/10.1007/978-1-4842-3522-5> ;
    schema:url <http://ezproxy.library.yorku.ca/sso/skillport?context=142747> ;
    schema:url <https://proquest.safaribooksonline.com/9781484235225> ;
    schema:workExample <http://dx.doi.org/10.1007/978-1-4842-3522-5> ;
    schema:workExample <http://worldcat.org/isbn/9781484235225> ;
    schema:workExample <http://worldcat.org/isbn/9781484235218> ;
    umbel:isLike <http://bnb.data.bl.uk/id/resource/GBB8M4619> ;
    wdrs:describedby <http://www.worldcat.org/title/-/oclc/1042329316> ;
    .


Related Entities

<http://experiment.worldcat.org/entity/work/data/5311859577#Person/giri_venkata> # Venkata Giri
    a schema:Person ;
    schema:familyName "Giri" ;
    schema:givenName "Venkata" ;
    schema:name "Venkata Giri" ;
    .

<http://experiment.worldcat.org/entity/work/data/5311859577#Person/gupta_saurabh> # Saurabh Gupta
    a schema:Person ;
    schema:familyName "Gupta" ;
    schema:givenName "Saurabh" ;
    schema:name "Saurabh Gupta" ;
    .

<http://experiment.worldcat.org/entity/work/data/5311859577#Topic/business_mathematics_&_systems> # Business mathematics & systems
    a schema:Intangible ;
    schema:name "Business mathematics & systems"@en ;
    .

<http://experiment.worldcat.org/entity/work/data/5311859577#Topic/computers_data_processing> # COMPUTERS--Data Processing
    a schema:Intangible ;
    schema:name "COMPUTERS--Data Processing"@en ;
    .

<http://experiment.worldcat.org/entity/work/data/5311859577#Topic/electronic_data_processing_distributed_processing_management> # Electronic data processing--Distributed processing--Management
    a schema:Intangible ;
    schema:name "Electronic data processing--Distributed processing--Management"@en ;
    .

<http://experiment.worldcat.org/entity/work/data/5311859577#Topic/information_storage_and_retrieval_systems> # Information storage and retrieval systems
    a schema:Intangible ;
    schema:name "Information storage and retrieval systems"@en ;
    .

<http://experiment.worldcat.org/entity/work/data/5311859577#Topic/information_technology_general_issues> # Information technology: general issues
    a schema:Intangible ;
    schema:name "Information technology: general issues"@en ;
    .

<http://worldcat.org/entity/work/data/5311859577#CreativeWork/>
    a schema:CreativeWork ;
    schema:description "Printed edition:" ;
    schema:isSimilarTo <http://www.worldcat.org/oclc/1042329316> ; # Practical Enterprise Data Lake Insights : handle data-driven challenges in an Enterprise Big Data Lake
    .

<http://worldcat.org/isbn/9781484235218>
    a schema:ProductModel ;
    schema:isbn "1484235215" ;
    schema:isbn "9781484235218" ;
    .

<http://worldcat.org/isbn/9781484235225>
    a schema:ProductModel ;
    schema:isbn "1484235223" ;
    schema:isbn "9781484235225" ;
    .

<http://www.worldcat.org/title/-/oclc/1042329316>
    a genont:InformationResource, genont:ContentTypeGenericResource ;
    schema:about <http://www.worldcat.org/oclc/1042329316> ; # Practical Enterprise Data Lake Insights : handle data-driven challenges in an Enterprise Big Data Lake
    schema:dateModified "2019-10-18" ;
    void:inDataset <http://purl.oclc.org/dataset/WorldCat> ;
    .

