Handbook of Statistical Analysis and Data Mining Applications.
Title:
Handbook of Statistical Analysis and Data Mining Applications.
Author:
Nisbet, Robert.
ISBN:
9780080912035
Personal Author:
Physical Description:
1 online resource (859 pages)
Contents:
Front Cover -- Handbook of Statistical Analysis and Data Mining Applications -- Copyright Page -- Table of Contents -- Foreword 1 -- Foreword 2 -- Preface -- Introduction -- List of Tutorials by Guest Authors
Part 1: History of Phases of Data Analysis, Basic Theory, and the Data Mining Process
Chapter 1: The Background for Data Mining Practice -- Preamble -- A Short History of Statistics and Data Mining -- Modern Statistics: A Duality? -- Assumptions of the Parametric Model -- Two Views of Reality -- Aristotle -- Plato -- The Rise of Modern Statistical Analysis: The Second Generation -- Data, Data Everywhere -- Machine Learning Methods: The Third Generation -- Statistical Learning Theory: The Fourth Generation -- Postscript -- References
Chapter 2: Theoretical Considerations for Data Mining -- Preamble -- The Scientific Method -- What Is Data Mining? -- A Theoretical Framework for the Data Mining Process -- Microeconomic Approach -- Inductive Database Approach -- Strengths of the Data Mining Process -- Customer-Centric Versus Account-Centric: A New Way to Look at Your Data -- The Physical Data Mart -- The Virtual Data Mart -- Householded Databases -- The Data Paradigm Shift -- Creation of the CAR -- Major Activities of Data Mining -- Major Challenges of Data Mining -- Examples of Data Mining Applications -- Major Issues in Data Mining -- General Requirements for Success in a Data Mining Project -- Example of a Data Mining Project: Classify a Bat's Species by Its Sound -- The Importance of Domain Knowledge -- Postscript -- Why Did Data Mining Arise? -- Some Caveats with Data Mining Solutions -- References
Chapter 3: The Data Mining Process -- Preamble -- The Science of Data Mining -- The Approach to Understanding and Problem Solving -- CRISP-DM -- Business Understanding (Mostly Art) -- Define the Business Objectives of the Data Mining Model -- Assess the Business Environment for Data Mining -- Formulate the Data Mining Goals and Objectives -- Data Understanding (Mostly Science) -- Data Acquisition -- Data Integration -- Data Description -- Data Quality Assessment -- Data Preparation (A Mixture of Art and Science) -- Modeling (A Mixture of Art and Science) -- Steps in the Modeling Phase of CRISP-DM -- Deployment (Mostly Art) -- Closing the Information Loop (Art) -- The Art of Data Mining -- Artistic Steps in Data Mining -- Postscript -- References
Chapter 4: Data Understanding and Preparation -- Preamble -- Activities of Data Understanding and Preparation -- Definitions -- Issues That Should Be Resolved -- Basic Issues That Must Be Resolved in Data Understanding -- Basic Issues That Must Be Resolved in Data Preparation -- Data Understanding -- Data Acquisition -- Data Extraction -- Data Description -- Data Assessment -- Data Profiling -- Data Cleansing -- Data Transformation -- Data Imputation -- Data Weighting and Balancing -- Data Filtering and Smoothing -- Data Abstraction -- Data Reduction -- Data Sampling -- Data Discretization -- Data Derivation -- Postscript -- References
Chapter 5: Feature Selection -- Preamble -- Variables as Features -- Types of Feature Selections -- Feature Ranking Methods -- Gini Index -- Bi-variate Methods -- Multivariate Methods -- Complex Methods -- Subset Selection Methods -- The Other Two Ways of Using Feature Selection in STATISTICA: Interactive Workspace -- STATISTICA DMRecipe Method -- Postscript -- References
Chapter 6: Accessory Tools for Doing Data Mining -- Preamble -- Data Access Tools -- Structured Query Language (SQL) Tools -- Extract, Transform, and Load (ETL) Capabilities -- Data Exploration Tools -- Basic Descriptive Statistics -- Combining Groups (Classes) for Predictive Data Mining -- Slicing/Dicing and Drilling Down into Data Sets/Results Spreadsheets -- Modeling Management Tools -- Data Miner Workspace Templates -- Modeling Analysis Tools -- Feature Selection -- Importance Plots of Variables -- In-Place Data Processing (IDP) -- Example: The IDP Facility of STATISTICA Data Miner -- How to Use the SQL -- Rapid Deployment of Predictive Models -- Model Monitors -- Postscript -- Bibliography
Part 2: The Algorithms in Data Mining and Text Mining, the Organization of the Three most common Data Mining Tools, and Selected Speci...
Chapter 7: Basic Algorithms for Data Mining: A Brief Overview -- Preamble -- STATISTICA Data Miner Recipe (DMRecipe) -- KXEN -- Basic Data Mining Algorithms -- Association Rules -- Neural Networks -- Radial Basis Function (RBF) Networks -- Automated Neural Nets -- Generalized Additive Models (GAMs) -- Outputs of GAMs -- Interpreting Results of GAMs -- Classification and Regression Trees (CART) -- Recursive Partitioning -- Pruning Trees -- General Comments about CART for Statisticians -- Advantages of CART over Other Decision Trees -- Uses of CART -- General CHAID Models -- Advantages of CHAID -- Disadvantages of CHAID -- Generalized EM and k-Means Cluster Analysis: An Overview -- k-Means Clustering -- EM Cluster Analysis -- Processing Steps of the EM Algorithm -- V-fold Cross-Validation as Applied to Clustering -- Postscript -- References -- Bibliography
Chapter 8: Advanced Algorithms for Data Mining -- Preamble -- Advanced Data Mining Algorithms -- Interactive Trees -- Multivariate Adaptive Regression Splines (MARSplines) -- Statistical Learning Theory: Support Vector Machines -- Sequence, Association, and Link Analyses -- Independent Components Analysis (ICA) -- Kohonen Networks -- Characteristics of a Kohonen Network -- Quality Control Data Mining and Root Cause Analysis -- Image and Object Data Mining: Visualization and 3D-Medical and Other Scanning Imaging -- Postscript -- References
Chapter 9: Text Mining and Natural Language Processing -- Preamble -- The Development of Text Mining -- A Practical Example: NTSB -- Goals of Text Mining of NTSB Accident Reports -- Drilling into Words of Interest -- Means with Error Plots -- Feature Selection Tool -- A Conclusion: Losing Control of the Aircraft in Bad Weather Is Often Fatal -- Summary -- Text Mining Concepts Used in Conducting Text Mining Studies -- Postscript -- References
Chapter 10: The Three Most Common Data Mining Software Tools -- Preamble -- SPSS Clementine Overview -- Overall Organization of Clementine Components -- Organization of the Clementine Interface -- Clementine Interface Overview -- Setting the Default Directory -- SuperNodes -- Execution of Streams -- SAS-Enterprise Miner (SAS-EM) Overview -- Overall Organization of SAS-EM Version 5.3 Components -- Layout of the SAS-Enterprise Miner Window -- Various SAS-EM Menus, Dialogs, and Windows Useful During the Data Mining Process -- Software Requirements to Run SAS-EM 5.3 Software -- STATISTICA Data Miner, QC-Miner, and Text Miner Overview -- Overall Organization and Use of STATISTICA Data Miner -- Three Formats for Doing Data Mining in STATISTICA -- Postscript -- References
Chapter 11: Classification -- Preamble -- What Is Classification? -- Initial Operations in Classification -- Major Issues with Classification -- What Is the Nature of the Data Set to Be Classified? -- How Accurate Does the Classification Have to Be? -- How Understandable Do the Classes Have to Be? -- Assumptions of Classification Procedures -- Numerical Variables Operate Best -- No Missing Values -- Variables Are Linear and Independent in Their Effects on the Target Variable -- Methods for Classification -- Nearest-Neighbor Classifiers -- Analyzing Imbalanced Data Sets with Machine Learning Programs -- CHAID -- Random Forests and Boosted Trees -- Logistic Regression -- Neural Networks -- Naive Bayesian Classifiers -- What Is the Best Algorithm for Classification? -- Postscript -- References
Chapter 12: Numerical Prediction -- Preamble -- Linear Response Analysis and the Assumptions of the Parametric Model -- Parametric Statistical Analysis -- Assumptions of the Parametric Model -- The Assumption of Independency -- The Assumption of Normality -- Normality and the Central Limit Theorem -- The Assumption of Linearity -- Linear Regression -- Methods for Handling Variable Interactions in Linear Regression -- Collinearity among Variables in a Linear Regression -- The Concept of the Response Surface -- Generalized Linear Models (GLMs) -- Methods for Analyzing Nonlinear Relationships -- Nonlinear Regression and Estimation -- Logit and Probit Regression -- Poisson Regression -- Exponential Distributions -- Piecewise Linear Regression -- Data Mining and Machine Learning Algorithms Used in Numerical Prediction -- Numerical Prediction with C&RT -- Model Results Available in C&RT -- Advantages of Classification and Regression Trees (C&RT) Methods -- General Issues Related to C&RT -- Application to Mixed Models -- Neural Nets for Prediction -- Manual or Automated Operation? -- Structuring the Network for Manual Operation -- Modern Neural Nets Are "Gray Boxes" -- Example of Automated Neural Net Results -- Support Vector Machines (SVMs) and Other Kernel Learning Algorithms -- Postscript -- References
Chapter 13: Model Evaluation and Enhancement -- Preamble -- Introduction -- Model Evaluation -- Splitting Data -- Avoiding Overfit Through Complexity Regularization -- Error Metric: Estimation -- Error Metric: Classification -- Error Metric: Ranking.
Abstract:
The Handbook of Statistical Analysis and Data Mining Applications is a comprehensive professional reference book that guides business analysts, scientists, engineers, and researchers (both academic and industrial) through all stages of data analysis, model building, and implementation. The Handbook helps one discern the technical and business problem, understand the strengths and weaknesses of modern data mining algorithms, and employ the right statistical methods for practical application. Use this book to address massive and complex datasets with novel statistical approaches and to evaluate analyses and solutions objectively. It offers clear, intuitive explanations of the principles and tools for solving problems with modern analytic techniques, and discusses their application to real problems in ways accessible and beneficial to practitioners across industries, from science and engineering to medicine, academia, and commerce. This handbook brings together, in a single resource, all the information a beginner will need to understand the tools and issues in data mining and to build successful data mining solutions.

Written "By Practitioners for Practitioners".
Non-technical explanations build understanding without jargon and equations.
Tutorials in numerous fields of study provide step-by-step instruction on how to use the supplied tools to build models.
Practical advice from successful real-world implementations.
Includes extensive case studies, examples, MS PowerPoint slides, and datasets.
A CD-DVD with fully working 90-day software ("Complete Data Miner - QC-Miner - Text Miner") is bound with the book.
Local Note:
Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2017. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.