Data Science and Big Data Analytics : Discovering, Analyzing, Visualizing and Presenting Data.
Title:
Data Science and Big Data Analytics : Discovering, Analyzing, Visualizing and Presenting Data.
Author:
EMC Education Services.
ISBN:
9781118876220
Edition:
1st ed.
Physical Description:
1 online resource (435 pages)
Contents:
Cover -- Title Page -- Copyright -- Contents -- Introduction
Chapter 1 Introduction to Big Data Analytics -- 1.1 Big Data Overview -- 1.1.1 Data Structures -- 1.1.2 Analyst Perspective on Data Repositories -- 1.2 State of the Practice in Analytics -- 1.2.1 BI Versus Data Science -- 1.2.2 Current Analytical Architecture -- 1.2.3 Drivers of Big Data -- 1.2.4 Emerging Big Data Ecosystem and a New Approach to Analytics -- 1.3 Key Roles for the New Big Data Ecosystem -- 1.4 Examples of Big Data Analytics -- Summary -- Exercises -- Bibliography
Chapter 2 Data Analytics Lifecycle -- 2.1 Data Analytics Lifecycle Overview -- 2.1.1 Key Roles for a Successful Analytics Project -- 2.1.2 Background and Overview of Data Analytics Lifecycle -- 2.2 Phase 1: Discovery -- 2.2.1 Learning the Business Domain -- 2.2.2 Resources -- 2.2.3 Framing the Problem -- 2.2.4 Identifying Key Stakeholders -- 2.2.5 Interviewing the Analytics Sponsor -- 2.2.6 Developing Initial Hypotheses -- 2.2.7 Identifying Potential Data Sources -- 2.3 Phase 2: Data Preparation -- 2.3.1 Preparing the Analytic Sandbox -- 2.3.2 Performing ETLT -- 2.3.3 Learning About the Data -- 2.3.4 Data Conditioning -- 2.3.5 Survey and Visualize -- 2.3.6 Common Tools for the Data Preparation Phase -- 2.4 Phase 3: Model Planning -- 2.4.1 Data Exploration and Variable Selection -- 2.4.2 Model Selection -- 2.4.3 Common Tools for the Model Planning Phase -- 2.5 Phase 4: Model Building -- 2.5.1 Common Tools for the Model Building Phase -- 2.6 Phase 5: Communicate Results -- 2.7 Phase 6: Operationalize -- 2.8 Case Study: Global Innovation Network and Analysis (GINA) -- 2.8.1 Phase 1: Discovery -- 2.8.2 Phase 2: Data Preparation -- 2.8.3 Phase 3: Model Planning -- 2.8.4 Phase 4: Model Building -- 2.8.5 Phase 5: Communicate Results -- 2.8.6 Phase 6: Operationalize -- Summary -- Exercises -- Bibliography
Chapter 3 Review of Basic Data Analytic Methods Using R -- 3.1 Introduction to R -- 3.1.1 R Graphical User Interfaces -- 3.1.2 Data Import and Export -- 3.1.3 Attribute and Data Types -- 3.1.4 Descriptive Statistics -- 3.2 Exploratory Data Analysis -- 3.2.1 Visualization Before Analysis -- 3.2.2 Dirty Data -- 3.2.3 Visualizing a Single Variable -- 3.2.4 Examining Multiple Variables -- 3.2.5 Data Exploration Versus Presentation -- 3.3 Statistical Methods for Evaluation -- 3.3.1 Hypothesis Testing -- 3.3.2 Difference of Means -- 3.3.3 Wilcoxon Rank-Sum Test -- 3.3.4 Type I and Type II Errors -- 3.3.5 Power and Sample Size -- 3.3.6 ANOVA -- Summary -- Exercises -- Bibliography
Chapter 4 Advanced Analytical Theory and Methods: Clustering -- 4.1 Overview of Clustering -- 4.2 K-means -- 4.2.1 Use Cases -- 4.2.2 Overview of the Method -- 4.2.3 Determining the Number of Clusters -- 4.2.4 Diagnostics -- 4.2.5 Reasons to Choose and Cautions -- 4.3 Additional Algorithms -- Summary -- Exercises -- Bibliography
Chapter 5 Advanced Analytical Theory and Methods: Association Rules -- 5.1 Overview -- 5.2 Apriori Algorithm -- 5.3 Evaluation of Candidate Rules -- 5.4 Applications of Association Rules -- 5.5 An Example: Transactions in a Grocery Store -- 5.5.1 The Groceries Dataset -- 5.5.2 Frequent Itemset Generation -- 5.5.3 Rule Generation and Visualization -- 5.6 Validation and Testing -- 5.7 Diagnostics -- Summary -- Exercises -- Bibliography
Chapter 6 Advanced Analytical Theory and Methods: Regression -- 6.1 Linear Regression -- 6.1.1 Use Cases -- 6.1.2 Model Description -- 6.1.3 Diagnostics -- 6.2 Logistic Regression -- 6.2.1 Use Cases -- 6.2.2 Model Description -- 6.2.3 Diagnostics -- 6.3 Reasons to Choose and Cautions -- 6.4 Additional Regression Models -- Summary -- Exercises
Chapter 7 Advanced Analytical Theory and Methods: Classification -- 7.1 Decision Trees -- 7.1.1 Overview of a Decision Tree -- 7.1.2 The General Algorithm -- 7.1.3 Decision Tree Algorithms -- 7.1.4 Evaluating a Decision Tree -- 7.1.5 Decision Trees in R -- 7.2 Naïve Bayes -- 7.2.1 Bayes' Theorem -- 7.2.2 Naïve Bayes Classifier -- 7.2.3 Smoothing -- 7.2.4 Diagnostics -- 7.2.5 Naïve Bayes in R -- 7.3 Diagnostics of Classifiers -- 7.4 Additional Classification Methods -- Summary -- Exercises -- Bibliography
Chapter 8 Advanced Analytical Theory and Methods: Time Series Analysis -- 8.1 Overview of Time Series Analysis -- 8.1.1 Box-Jenkins Methodology -- 8.2 ARIMA Model -- 8.2.1 Autocorrelation Function (ACF) -- 8.2.2 Autoregressive Models -- 8.2.3 Moving Average Models -- 8.2.4 ARMA and ARIMA Models -- 8.2.5 Building and Evaluating an ARIMA Model -- 8.2.6 Reasons to Choose and Cautions -- 8.3 Additional Methods -- Summary -- Exercises
Chapter 9 Advanced Analytical Theory and Methods: Text Analysis -- 9.1 Text Analysis Steps -- 9.2 A Text Analysis Example -- 9.3 Collecting Raw Text -- 9.4 Representing Text -- 9.5 Term Frequency-Inverse Document Frequency (TFIDF) -- 9.6 Categorizing Documents by Topics -- 9.7 Determining Sentiments -- 9.8 Gaining Insights -- Summary -- Exercises -- Bibliography
Chapter 10 Advanced Analytics-Technology and Tools: MapReduce and Hadoop -- 10.1 Analytics for Unstructured Data -- 10.1.1 Use Cases -- 10.1.2 MapReduce -- 10.1.3 Apache Hadoop -- 10.2 The Hadoop Ecosystem -- 10.2.1 Pig -- 10.2.2 Hive -- 10.2.3 HBase -- 10.2.4 Mahout -- 10.3 NoSQL -- Summary -- Exercises -- Bibliography
Chapter 11 Advanced Analytics-Technology and Tools: In-Database Analytics -- 11.1 SQL Essentials -- 11.1.1 Joins -- 11.1.2 Set Operations -- 11.1.3 Grouping Extensions -- 11.2 In-Database Text Analysis -- 11.3 Advanced SQL -- 11.3.1 Window Functions -- 11.3.2 User-Defined Functions and Aggregates -- 11.3.3 Ordered Aggregates -- 11.3.4 MADlib -- Summary -- Exercises -- Bibliography
Chapter 12 The Endgame, or Putting It All Together -- 12.1 Communicating and Operationalizing an Analytics Project -- 12.2 Creating the Final Deliverables -- 12.2.1 Developing Core Material for Multiple Audiences -- 12.2.2 Project Goals -- 12.2.3 Main Findings -- 12.2.4 Approach -- 12.2.5 Model Description -- 12.2.6 Key Points Supported with Data -- 12.2.7 Model Details -- 12.2.8 Recommendations -- 12.2.9 Additional Tips on Final Presentation -- 12.2.10 Providing Technical Specifications and Code -- 12.3 Data Visualization Basics -- 12.3.1 Key Points Supported with Data -- 12.3.2 Evolution of a Graph -- 12.3.3 Common Representation Methods -- 12.3.4 How to Clean Up a Graphic -- 12.3.5 Additional Considerations -- Summary -- Exercises -- References and Further Reading -- Bibliography
Index -- EULA.
Abstract:
Data Science and Big Data Analytics is about harnessing the power of data for new insights. The book covers the breadth of activities, methods, and tools that data scientists use. The content focuses on concepts, principles, and practical applications that apply to any industry and technology environment, and the learning is supported and explained with examples that you can replicate using open-source software. This book will help you: become a contributor on a data science team; deploy a structured lifecycle approach to data analytics problems; apply appropriate analytic techniques and tools to analyzing big data; learn how to tell a compelling story with data to drive business action; and prepare for EMC Proven Professional Data Science Certification.
Local Note:
Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2017. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.