
Discovering Knowledge in Data : An Introduction to Data Mining.
Title:
Discovering Knowledge in Data : An Introduction to Data Mining.
Author:
Larose, Daniel T.
ISBN:
9780471687535
Edition:
1st ed.
Physical Description:
1 online resource (336 pages)
Contents:
DISCOVERING KNOWLEDGE IN DATA -- CONTENTS -- PREFACE -- 1 INTRODUCTION TO DATA MINING -- What Is Data Mining? -- Why Data Mining? -- Need for Human Direction of Data Mining -- Cross-Industry Standard Process: CRISP-DM -- Case Study 1: Analyzing Automobile Warranty Claims: Example of the CRISP-DM Industry Standard Process in Action -- Fallacies of Data Mining -- What Tasks Can Data Mining Accomplish? -- Description -- Estimation -- Prediction -- Classification -- Clustering -- Association -- Case Study 2: Predicting Abnormal Stock Market Returns Using Neural Networks -- Case Study 3: Mining Association Rules from Legal Databases -- Case Study 4: Predicting Corporate Bankruptcies Using Decision Trees -- Case Study 5: Profiling the Tourism Market Using k-Means Clustering Analysis -- References -- Exercises -- 2 DATA PREPROCESSING -- Why Do We Need to Preprocess the Data? -- Data Cleaning -- Handling Missing Data -- Identifying Misclassifications -- Graphical Methods for Identifying Outliers -- Data Transformation -- Min-Max Normalization -- Z-Score Standardization -- Numerical Methods for Identifying Outliers -- References -- Exercises -- 3 EXPLORATORY DATA ANALYSIS -- Hypothesis Testing versus Exploratory Data Analysis -- Getting to Know the Data Set -- Dealing with Correlated Variables -- Exploring Categorical Variables -- Using EDA to Uncover Anomalous Fields -- Exploring Numerical Variables -- Exploring Multivariate Relationships -- Selecting Interesting Subsets of the Data for Further Investigation -- Binning -- Summary -- References -- Exercises -- 4 STATISTICAL APPROACHES TO ESTIMATION AND PREDICTION -- Data Mining Tasks in Discovering Knowledge in Data -- Statistical Approaches to Estimation and Prediction -- Univariate Methods: Measures of Center and Spread -- Statistical Inference -- How Confident Are We in Our Estimates?.
Confidence Interval Estimation -- Bivariate Methods: Simple Linear Regression -- Dangers of Extrapolation -- Confidence Intervals for the Mean Value of y Given x -- Prediction Intervals for a Randomly Chosen Value of y Given x -- Multiple Regression -- Verifying Model Assumptions -- References -- Exercises -- 5 k-NEAREST NEIGHBOR ALGORITHM -- Supervised versus Unsupervised Methods -- Methodology for Supervised Modeling -- Bias-Variance Trade-Off -- Classification Task -- k-Nearest Neighbor Algorithm -- Distance Function -- Combination Function -- Simple Unweighted Voting -- Weighted Voting -- Quantifying Attribute Relevance: Stretching the Axes -- Database Considerations -- k-Nearest Neighbor Algorithm for Estimation and Prediction -- Choosing k -- Reference -- Exercises -- 6 DECISION TREES -- Classification and Regression Trees -- C4.5 Algorithm -- Decision Rules -- Comparison of the C5.0 and CART Algorithms Applied to Real Data -- References -- Exercises -- 7 NEURAL NETWORKS -- Input and Output Encoding -- Neural Networks for Estimation and Prediction -- Simple Example of a Neural Network -- Sigmoid Activation Function -- Back-Propagation -- Gradient Descent Method -- Back-Propagation Rules -- Example of Back-Propagation -- Termination Criteria -- Learning Rate -- Momentum Term -- Sensitivity Analysis -- Application of Neural Network Modeling -- References -- Exercises -- 8 HIERARCHICAL AND k-MEANS CLUSTERING -- Clustering Task -- Hierarchical Clustering Methods -- Single-Linkage Clustering -- Complete-Linkage Clustering -- k-Means Clustering -- Example of k-Means Clustering at Work -- Application of k-Means Clustering Using SAS Enterprise Miner -- Using Cluster Membership to Predict Churn -- References -- Exercises -- 9 KOHONEN NETWORKS -- Self-Organizing Maps -- Kohonen Networks -- Example of a Kohonen Network Study -- Cluster Validity.
Application of Clustering Using Kohonen Networks -- Interpreting the Clusters -- Cluster Profiles -- Using Cluster Membership as Input to Downstream Data Mining Models -- References -- Exercises -- 10 ASSOCIATION RULES -- Affinity Analysis and Market Basket Analysis -- Data Representation for Market Basket Analysis -- Support, Confidence, Frequent Itemsets, and the A Priori Property -- How Does the A Priori Algorithm Work (Part 1)? Generating Frequent Itemsets -- How Does the A Priori Algorithm Work (Part 2)? Generating Association Rules -- Extension from Flag Data to General Categorical Data -- Information-Theoretic Approach: Generalized Rule Induction Method -- J-Measure -- Application of Generalized Rule Induction -- When Not to Use Association Rules -- Do Association Rules Represent Supervised or Unsupervised Learning? -- Local Patterns versus Global Models -- References -- Exercises -- 11 MODEL EVALUATION TECHNIQUES -- Model Evaluation Techniques for the Description Task -- Model Evaluation Techniques for the Estimation and Prediction Tasks -- Model Evaluation Techniques for the Classification Task -- Error Rate, False Positives, and False Negatives -- Misclassification Cost Adjustment to Reflect Real-World Concerns -- Decision Cost/Benefit Analysis -- Lift Charts and Gains Charts -- Interweaving Model Evaluation with Model Building -- Confluence of Results: Applying a Suite of Models -- Reference -- Exercises -- EPILOGUE: "WE'VE ONLY JUST BEGUN" -- INDEX.
Abstract:
DANIEL T. LAROSE received his PhD in statistics from the University of Connecticut. An associate professor of statistics at Central Connecticut State University, he developed and directs Data Mining@CCSU, the world's first online master of science program in data mining. He has also worked as a data mining consultant for Connecticut-area companies. He is currently working on the next two books of his three-volume series on data mining: Data Mining Methods and Models and Data Mining the Web: Uncovering Patterns in Web Content, scheduled for publication in 2005 and 2006, respectively.
Local Note:
Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2017. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.