Data Preparation for Data Mining Using SAS.
Title:
Data Preparation for Data Mining Using SAS.
Author:
Refaat, Mamdouh.
ISBN:
9780080491004
Physical Description:
1 online resource (425 pages)
Series:
The Morgan Kaufmann Series in Data Management Systems
Contents:
Front Cover -- Data Preparation for Data Mining Using SAS -- Copyright Page -- Contents -- List of Figures -- List of Tables -- Preface -- CHAPTER 1. INTRODUCTION -- 1.1 The Data Mining Process -- 1.2 Methodologies of Data Mining -- 1.3 The Mining View -- 1.4 The Scoring View -- 1.5 Notes on Data Mining Software -- CHAPTER 2. TASKS AND DATA FLOW -- 2.1 Data Mining Tasks -- 2.2 Data Mining Competencies -- 2.3 The Data Flow -- 2.4 Types of Variables -- 2.5 The Mining View and the Scoring View -- 2.6 Steps of Data Preparation -- CHAPTER 3. REVIEW OF DATA MINING MODELING TECHNIQUES -- 3.1 Introduction -- 3.2 Regression Models -- 3.3 Decision Trees -- 3.4 Neural Networks -- 3.5 Cluster Analysis -- 3.6 Association Rules -- 3.7 Time Series Analysis -- 3.8 Support Vector Machines -- CHAPTER 4. SAS MACROS: A QUICK START -- 4.1 Introduction: Why Macros? -- 4.2 The Basics: The Macro and Its Variables -- 4.3 Doing Calculations -- 4.4 Programming Logic -- 4.5 Working with Strings -- 4.6 Macros That Call Other Macros -- 4.7 Common Macro Patterns and Caveats -- 4.8 Where to Go From Here -- CHAPTER 5. DATA ACQUISITION AND INTEGRATION -- 5.1 Introduction -- 5.2 Sources of Data -- 5.3 Variable Types -- 5.4 Data Rollup -- 5.5 Rollup with Sums, Averages, and Counts -- 5.6 Calculation of the Mode -- 5.7 Data Integration -- CHAPTER 6. INTEGRITY CHECKS -- 6.1 Introduction -- 6.2 Comparing Datasets -- 6.3 Dataset Schema Checks -- 6.4 Nominal Variables -- 6.5 Continuous Variables -- CHAPTER 7. EXPLORATORY DATA ANALYSIS -- 7.1 Introduction -- 7.2 Common EDA Procedures -- 7.3 Univariate Statistics -- 7.4 Variable Distribution -- 7.5 Detection of Outliers -- 7.6 Testing Normality -- 7.7 Cross-tabulation -- 7.8 Investigating Data Structures -- CHAPTER 8. SAMPLING AND PARTITIONING -- 8.1 Introduction -- 8.2 Contents of Samples -- 8.3 Random Sampling -- 8.4 Balanced Sampling.

8.5 Minimum Sample Size -- 8.6 Checking Validity of Sample -- CHAPTER 9. DATA TRANSFORMATIONS -- 9.1 Raw and Analytical Variables -- 9.2 Scope of Data Transformations -- 9.3 Creation of New Variables -- 9.4 Mapping of Nominal Variables -- 9.5 Normalization of Continuous Variables -- 9.6 Changing the Variable Distribution -- CHAPTER 10. BINNING AND REDUCTION OF CARDINALITY -- 10.1 Introduction -- 10.2 Cardinality Reduction -- 10.3 Binning of Continuous Variables -- CHAPTER 11. TREATMENT OF MISSING VALUES -- 11.1 Introduction -- 11.2 Simple Replacement -- 11.3 Imputing Missing Values -- 11.4 Imputation Methods and Strategy -- 11.5 SAS Macros for Multiple Imputation -- 11.6 Predicting Missing Values -- CHAPTER 12. PREDICTIVE POWER AND VARIABLE REDUCTION I -- 12.1 Introduction -- 12.2 Metrics of Predictive Power -- 12.3 Methods of Variable Reduction -- 12.4 Variable Reduction: Before or During Modeling -- CHAPTER 13. ANALYSIS OF NOMINAL AND ORDINAL VARIABLES -- 13.1 Introduction -- 13.2 Contingency Tables -- 13.3 Notation and Definitions -- 13.4 Contingency Tables for Binary Variables -- 13.5 Contingency Tables for Multicategory Variables -- 13.6 Analysis of Ordinal Variables -- 13.7 Implementation Scenarios -- CHAPTER 14. ANALYSIS OF CONTINUOUS VARIABLES -- 14.1 Introduction -- 14.2 When Is Binning Necessary? -- 14.3 Measures of Association -- 14.4 Correlation Coefficients -- CHAPTER 15. PRINCIPAL COMPONENT ANALYSIS -- 15.1 Introduction -- 15.2 Mathematical Formulations -- 15.3 Implementing and Using PCA -- 15.4 Comments on Using PCA -- CHAPTER 16. FACTOR ANALYSIS -- 16.1 Introduction -- 16.2 Relationship Between PCA and FA -- 16.3 Implementation of Factor Analysis -- CHAPTER 17. PREDICTIVE POWER AND VARIABLE REDUCTION II -- 17.1 Introduction -- 17.2 Data with Binary Dependent Variables -- 17.3 Data with Continuous Dependent Variables.

17.4 Variable Reduction Strategies -- CHAPTER 18. PUTTING IT ALL TOGETHER -- 18.1 Introduction -- 18.2 The Process of Data Preparation -- 18.3 Case Study: The Bookstore -- APPENDIX. LISTING OF SAS MACROS -- A.1 Copyright and Software License -- A.2 Dependencies between Macros -- A.3 Data Acquisition and Integration -- A.4 Integrity Checks -- A.5 Exploratory Data Analysis -- A.6 Sampling and Partitioning -- A.7 Data Transformations -- A.8 Binning and Reduction of Cardinality -- A.9 Treatment of Missing Values -- A.10 Analysis of Nominal and Ordinal Variables -- A.11 Analysis of Continuous Variables -- A.12 Principal Component Analysis -- A.13 Factor Analysis -- A.14 Predictive Power and Variable Reduction II -- A.15 Other Macros -- Bibliography -- Index -- About the Author.
Abstract:
Are you a data mining analyst who spends up to 80% of your time assuring data quality and then preparing that data for developing and deploying predictive models? Do you find plenty of literature on data mining theory and concepts, but little "how-to" information when it comes to practical advice on developing good mining views? And are you, like most analysts, preparing the data in SAS? This book is intended to fill this gap as your source of practical recipes. It introduces a framework for the process of data preparation for data mining and presents the detailed implementation of each step in SAS. In addition, business applications of data mining modeling require you to deal with a large number of variables, typically hundreds if not thousands. Therefore, the book devotes several chapters to the methods of data transformation and variable selection.
FEATURES
* A complete framework for the data preparation process, including implementation details for each step.
* The complete SAS implementation code, which is readily usable by professional analysts and data miners.
* A unique and comprehensive approach for the treatment of missing values, optimal binning, and cardinality reduction.
* Assumes minimal proficiency in SAS and includes a quick-start chapter on writing SAS macros.
* CD includes dozens of SAS macros plus the sample data and the program for the book's case study.
Local Note:
Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2017. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.