Title:

Data Mining Applications with R.

Author:

Zhao, Yanchang.

ISBN:

9780124115200

Personal Author:

Zhao, Yanchang.

Physical Description:

1 online resource (493 pages)

Contents:

Front Cover -- Data Mining Applications with R -- Copyright -- Contents -- Preface -- Background -- Objectives and Significance -- Target Audience -- Acknowledgments -- Review Committee -- Additional Reviewers -- Foreword -- References -- Chapter 1: Power Grid Data Analysis with R and Hadoop -- 1.1. Introduction -- 1.2. A Brief Overview of the Power Grid -- 1.3. Introduction to MapReduce, Hadoop, and RHIPE -- 1.3.1. MapReduce -- 1.3.1.1. An Example: The Iris Data -- 1.3.2. Hadoop -- 1.3.3. RHIPE: R with Hadoop -- 1.3.3.1. Installation -- 1.3.3.2. Iris MapReduce Example with RHIPE -- 1.3.3.2.1. The Map Expression -- 1.3.3.2.2. The Reduce Expression -- 1.3.3.2.3. Running the Job -- 1.3.3.2.4. Looking at Results -- 1.3.4. Other Parallel R Packages -- 1.4. Power Grid Analytical Approach -- 1.4.1. Data Preparation -- 1.4.2. Exploratory Analysis and Data Cleaning -- 1.4.2.1. 5-min Summaries -- 1.4.2.2. Quantile Plots of Frequency -- 1.4.2.3. Tabulating Frequency by Flag -- 1.4.2.4. Distribution of Repeated Values -- 1.4.2.5. White Noise -- 1.4.3. Event Extraction -- 1.4.3.1. OOS Frequency Events -- 1.4.3.2. Finding Generator Trip Features -- 1.4.3.3. Creating Overlapping Frequency Data -- 1.5. Discussion and Conclusions -- Appendix -- References -- Chapter 2: Picturing Bayesian Classifiers: A Visual Data Mining Approach to Parameters Optimization -- 2.1. Introduction -- 2.2. Related Works -- 2.3. Motivations and Requirements -- 2.3.1. R Packages Requirements -- 2.4. Probabilistic Framework of NB Classifiers -- 2.4.1. Choosing the Model -- 2.4.1.1. Multivariate Bernoulli model -- 2.4.1.2. Multinomial Model -- 2.4.1.3. Poisson Model -- 2.4.2. Estimating the Parameters -- 2.5. Two-Dimensional Visualization System -- 2.5.1. Design Choices -- 2.5.2. Visualization Design -- 2.6. A Case Study: Text Classification -- 2.6.1. Description of the Dataset.

2.6.2. Creating Document-Term Matrices -- 2.6.3. Loading Existing Term-Document Matrices -- 2.6.4. Running the Program -- 2.6.4.1. Comparing Models -- 2.7. Conclusions -- Acknowledgments -- References -- Chapter 3: Discovery of Emergent Issues and Controversies in Anthropology Using Text Mining, Topic Modeling, and Social Ne ... -- 3.1. Introduction -- 3.2. How Many Messages and How Many Twitter-Users in the Sample? -- 3.3. Who Is Writing All These Twitter Messages? -- 3.4. Who Are the Influential Twitter-Users in This Sample? -- 3.5. What Is the Community Structure of These Twitter-Users? -- 3.6. What Were Twitter-Users Writing About During the Meeting? -- 3.7. What Do the Twitter Messages Reveal About the Opinions of Their Authors? -- 3.8. What Can Be Discovered in the Less Frequently Used Words in the Sample? -- 3.9. What Are the Topics That Can Be Algorithmically Discovered in This Sample? -- 3.10. Conclusion -- References -- Chapter 4: Text Mining and Network Analysis of Digital Libraries in R -- 4.1. Introduction -- 4.2. Dataset Preparation -- 4.3. Manipulating the Document-Term Matrix -- 4.3.1. The Document-Term Matrix -- 4.3.2. Term Frequency-Inverse Document Frequency -- 4.3.3. Exploring the Document-Term Matrix -- 4.4. Clustering Content by Topics Using the LDA -- 4.4.1. The Latent Dirichlet Allocation -- 4.4.2. Learning the Various Distributions for LDA -- 4.4.3. Using the Log-Likelihood for Model Validation -- 4.4.4. Topics Representation -- 4.4.5. Plotting the Topics Associations -- 4.5. Using Similarity Between Documents to Explore Document Cohesion -- 4.5.1. Computing Similarities Between Documents -- 4.5.2. Using a Heatmap to Illustrate Clusters of Documents -- 4.6. Social Network Analysis of Authors -- 4.6.1. Constructing the Network as a Graph -- 4.6.2. Author Importance Using Centrality Measures -- 4.7. Conclusion -- References.

Chapter 5: Recommender Systems in R -- 5.1. Introduction -- 5.2. Business Case -- 5.3. Evaluation -- 5.4. Collaborative Filtering Methods -- 5.5. Latent Factor Collaborative Filtering -- 5.6. Simplified Approach -- 5.7. Roll Your Own -- 5.8. Final Thoughts -- References -- Chapter 6: Response Modeling in Direct Marketing: A Data Mining-Based Approach for Target Selection -- 6.1. Introduction/Background -- 6.2. Business Problem -- 6.3. Proposed Response Model -- 6.4. Modeling Detail -- 6.4.1. Data Collection -- 6.4.2. Data Preprocessing -- 6.4.2.1. Data Integration and Cleaning -- 6.4.2.2. Data Normalization -- 6.4.3. Feature Construction -- 6.4.3.1. Target Variable Construction -- 6.4.3.2. Predictor Variables -- 6.4.3.3. Interaction Variables -- 6.4.4. Feature Selection -- 6.4.4.1. F-Score -- 6.4.4.2. Step1: Selection of Interaction Features Using F-Score -- 6.4.4.3. Step2: Selection of Features Using F-Score -- 6.4.4.4. Step3: Selection of Best Subset of Features Using Random Forest -- 6.4.5. Data Sampling for Training and Test -- 6.4.6. Class Balancing -- 6.4.7. Classifier (SVM) -- 6.5. Prediction Result -- 6.6. Model Evaluation -- 6.7. Conclusion -- References -- Chapter 7: Caravan Insurance Customer Profile Modeling with R -- 7.1. Introduction -- 7.2. Data Description and Initial Exploratory Data Analysis -- 7.2.1. Variable Correlations and Logistic Regression Analysis -- 7.3. Classifier Models of Caravan Insurance Holders -- 7.3.1. Overview of Model Building and Validating -- 7.3.2. Review of Four Classifier Methods -- 7.3.3. RP Model -- 7.3.4. Bagging Ensemble -- 7.3.5. Support Vector Machine -- 7.3.6. LR Classification -- 7.3.7. Comparison of Four Classifier Models: ROC and AUC -- 7.3.8. Model Comparison: Recall-Precision, Accuracy-v-Cut-off, and Computation Times -- 7.4. Discussion of Results and Conclusion.

Appendix A. Details of the Full Data Set Variables -- Appendix B. Customer Profile Data-Frequency of Binary Values -- Appendix C. Proportion of Caravan Insurance Holders vis-à-vis other Customer Profile Variables -- Appendix D. LR Model Details -- Appendix E. R Commands for Computation of ROC Curves for Each Model Using Validation Dataset -- Appendix F. Commands for Cross-Validation Analysis of Classifier Models -- References -- Chapter 8: Selecting Best Features for Predicting Bank Loan Default -- 8.1. Introduction -- 8.2. Business Problem -- 8.3. Data Extraction -- 8.4. Data Exploration and Preparation -- 8.4.1. Null Value Detection -- 8.4.2. Outlier Detection -- 8.5. Missing Imputation -- 8.5.1. Relevance Analysis -- 8.5.2. Data Set Balancing -- 8.5.3. Feature Selection -- 8.6. Modeling -- 8.7. Model Evaluation -- 8.8. Finding and Model Deployment -- 8.9. Lessons and Discussions -- Appendix. Selecting Best Features for Predicting Bank Loan Default -- References -- Chapter 9: A Choquet Integral Toolbox and Its Application in Customer Preference Analysis -- 9.1. Introduction -- 9.2. Background -- 9.2.1. Aggregation Functions -- 9.2.2. Choquet Integral -- 9.2.3. Fuzzy Measure Representation -- 9.2.4. Shapley Value and Interaction Index -- 9.3. Rfmtool Package -- 9.3.1. Installation -- 9.3.2. Toolbox Description -- 9.3.3. Preference Analysis Example -- 9.4. Case Study -- 9.4.1. Traveler Preference Study and Hotel Management -- 9.4.2. Data Collection and Experiment Design -- 9.4.3. Model Evaluation -- 9.4.4. Result Analysis -- 9.4.4.1. Preference Profile Construction -- 9.4.4.2. Interaction Behavior Analysis -- 9.4.5. Discussion -- 9.5. Conclusions -- References -- Chapter 10: A Real-Time Property Value Index Based on Web Data -- 10.1. Introduction -- 10.2. Housing Prices and Indices -- 10.3. A Data Mining Approach -- 10.3.1. Data Capture.

10.3.2. Geocoding -- 10.3.3. Price Evolution -- 10.4. Real Estate Pricing Models -- 10.4.1. Model 1: Hedonic Model Plus Smooth Term -- 10.4.2. Model 2: GWR Plus a Smooth Term -- 10.4.3. Relationship to Other Work -- 10.5. Conclusion -- Acknowledgments -- References -- Chapter 11: Predicting Seabed Hardness Using Random Forest in R -- 11.1. Introduction -- 11.2. Study Region and Data Processing -- 11.2.1. Study Region -- 11.2.2. Data Processing of Seabed Hardness -- 11.2.3. Predictors -- 11.3. Dataset Manipulation and Exploratory Analyses -- 11.3.1. Features of the Dataset -- 11.3.2. Exploratory Data Analyses -- 11.4. Application of RF for Predicting Seabed Hardness -- 11.5. Model Validation Using rfcv -- 11.6. Optimal Predictive Model -- 11.7. Application of the Optimal Predictive Model -- 11.8. Discussion and Conclusions -- 11.8.1. Selection of Relevant Predictors and the Consequences of Missing the Most Important Predictors -- 11.8.2. Issues with Searching for the Most Accurate Predictive Model Using RF -- 11.8.3. Predictive Accuracy of RF and Prediction Maps of Seabed Hardness -- 11.8.4. Limitations -- Appendix AA. Dataset of Seabed Hardness and 15 Predictors -- Appendix BA. R Function, rf.cv, Shows the Cross-Validated Prediction Performance of a Predictive Model -- References -- Chapter 12: Supervised Classification of Images, Applied to Plankton Samples Using R and Zooimage -- 12.1. Background -- 12.2. Challenges -- 12.3. Data Extraction and Exploration -- 12.4. Data Preprocessing -- 12.5. Modeling -- 12.6. Model Evaluation -- 12.7. Model Deployment -- 12.8. Lessons, Discussion, and Conclusions -- Acknowledgments -- References -- Chapter 13: Crime Analyses Using R -- 13.1. Introduction -- 13.2. Problem Definition -- 13.3. Data Extraction -- 13.4. Data Exploration and Preprocessing -- 13.5. Visualizations -- 13.6. Modeling.

13.7. Model Evaluation.

Abstract:

Data Mining Applications with R is a resource for researchers and professionals who want to understand how R, a free software environment for statistical computing and graphics, is used to solve problems in industry. R is widely used to apply data mining techniques across many industries, including government, finance, insurance, medicine, and scientific research. This book presents 15 real-world case studies illustrating various techniques in rapidly growing areas. It is an ideal companion for data mining researchers in academia and industry looking for ways to turn this versatile software into a powerful analytic tool. R code, data, and color figures for the book are provided at the RDataMining.com website. The book helps data miners learn to use R in their specific area of work and see how R can be applied in different industries. It presents case studies of real-world applications that help readers apply the techniques in their own work. It provides code examples and sample data so that readers can learn the techniques by running the code themselves.

Local Note:

Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2017. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.

Subject Term:

Data mining -- Industrial applications -- Case studies.

R (Computer program language).
