Data Mining Algorithms : Explained Using R.
Title:
Data Mining Algorithms : Explained Using R.
Author:
Cichosz, Pawel.
ISBN:
9781118950807
Edition:
1st ed.
Physical Description:
1 online resource (718 pages)
Contents:
Cover -- Title Page -- Copyright -- Contents -- Acknowledgements -- Preface -- Part I Preliminaries -- Chapter 1 Tasks -- 1.1 Introduction -- 1.1.1 Knowledge -- 1.1.2 Inference -- 1.2 Inductive learning tasks -- 1.2.1 Domain -- 1.2.2 Instances -- 1.2.3 Attributes -- 1.2.4 Target attribute -- 1.2.5 Input attributes -- 1.2.6 Training set -- 1.2.7 Model -- 1.2.8 Performance -- 1.2.9 Generalization -- 1.2.10 Overfitting -- 1.2.11 Algorithms -- 1.2.12 Inductive learning as search -- 1.3 Classification -- 1.3.1 Concept -- 1.3.2 Training set -- 1.3.3 Model -- 1.3.4 Performance -- 1.3.5 Generalization -- 1.3.6 Overfitting -- 1.3.7 Algorithms -- 1.4 Regression -- 1.4.1 Target function -- 1.4.2 Training set -- 1.4.3 Model -- 1.4.4 Performance -- 1.4.5 Generalization -- 1.4.6 Overfitting -- 1.4.7 Algorithms -- 1.5 Clustering -- 1.5.1 Motivation -- 1.5.2 Training set -- 1.5.3 Model -- 1.5.4 Crisp vs. soft clustering -- 1.5.5 Hierarchical clustering -- 1.5.6 Performance -- 1.5.7 Generalization -- 1.5.8 Algorithms -- 1.5.9 Descriptive vs. predictive clustering -- 1.6 Practical issues -- 1.6.1 Incomplete data -- 1.6.2 Noisy data -- 1.7 Conclusion -- 1.8 Further readings -- References -- Chapter 2 Basic statistics -- 2.1 Introduction -- 2.2 Notational conventions -- 2.3 Basic statistics as modeling -- 2.4 Distribution description -- 2.4.1 Continuous attributes -- 2.4.2 Discrete attributes -- 2.4.3 Confidence intervals -- 2.4.4 m-Estimation -- 2.5 Relationship detection -- 2.5.1 Significance tests -- 2.5.2 Continuous attributes -- 2.5.3 Discrete attributes -- 2.5.4 Mixed attributes -- 2.5.5 Relationship detection caveats -- 2.6 Visualization -- 2.6.1 Boxplot -- 2.6.2 Histogram -- 2.6.3 Barplot -- 2.7 Conclusion -- 2.8 Further readings -- References.
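As a taste of how Chapter 2's material translates into code, here is a minimal base-R sketch of distribution description, a significance test, and a histogram. It is not taken from the book; it uses only base R and the built-in iris data.

    # Distribution description (Section 2.4) for a continuous attribute
    data(iris)
    summary(iris$Sepal.Length)   # five-number summary plus mean
    sd(iris$Sepal.Length)        # spread

    # Relationship detection via a significance test (Section 2.5.1):
    # do two species differ in mean sepal length?
    setosa     <- iris$Sepal.Length[iris$Species == "setosa"]
    versicolor <- iris$Sepal.Length[iris$Species == "versicolor"]
    t.test(setosa, versicolor)

    # Visualization with a histogram (Section 2.6.2)
    hist(iris$Sepal.Length, main = "Sepal length", xlab = "cm")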

Part II Classification -- Chapter 3 Decision trees -- 3.1 Introduction -- 3.2 Decision tree model -- 3.2.1 Nodes and branches -- 3.2.2 Leaves -- 3.2.3 Split types -- 3.3 Growing -- 3.3.1 Algorithm outline -- 3.3.2 Class distribution calculation -- 3.3.3 Class label assignment -- 3.3.4 Stop criteria -- 3.3.5 Split selection -- 3.3.6 Split application -- 3.3.7 Complete process -- 3.4 Pruning -- 3.4.1 Pruning operators -- 3.4.2 Pruning criterion -- 3.4.3 Pruning control strategy -- 3.4.4 Conversion to rule sets -- 3.5 Prediction -- 3.5.1 Class label prediction -- 3.5.2 Class probability prediction -- 3.6 Weighted instances -- 3.7 Missing value handling -- 3.7.1 Fractional instances -- 3.7.2 Surrogate splits -- 3.8 Conclusion -- 3.9 Further readings -- References -- Chapter 4 Naïve Bayes classifier -- 4.1 Introduction -- 4.2 Bayes rule -- 4.3 Classification by Bayesian inference -- 4.3.1 Conditional class probability -- 4.3.2 Prior class probability -- 4.3.3 Independence assumption -- 4.3.4 Conditional attribute value probabilities -- 4.3.5 Model construction -- 4.3.6 Prediction -- 4.4 Practical issues -- 4.4.1 Zero and small probabilities -- 4.4.2 Linear classification -- 4.4.3 Continuous attributes -- 4.4.4 Missing attribute values -- 4.4.5 Reducing naïvety -- 4.5 Conclusion -- 4.6 Further readings -- References -- Chapter 5 Linear classification -- 5.1 Introduction -- 5.2 Linear representation -- 5.2.1 Inner representation function -- 5.2.2 Outer representation function -- 5.2.3 Threshold representation -- 5.2.4 Logit representation -- 5.3 Parameter estimation -- 5.3.1 Delta rule -- 5.3.2 Gradient descent -- 5.3.3 Distance to decision boundary -- 5.3.4 Least squares -- 5.4 Discrete attributes -- 5.5 Conclusion -- 5.6 Further readings -- References.
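A compact illustration of the Chapter 3 pipeline (growing, pruning, prediction), using the recommended rpart package rather than the book's own implementations; a sketch, not the author's code.

    library(rpart)                       # CART-style trees, ships with R
    data(iris)
    tree <- rpart(Species ~ ., data = iris, method = "class")  # growing (Section 3.3)
    printcp(tree)                        # complexity table guiding pruning (Section 3.4)
    pruned <- prune(tree, cp = 0.05)     # cost-complexity pruning
    predict(pruned, head(iris), type = "class")  # class label prediction (3.5.1)
    predict(pruned, head(iris), type = "prob")   # class probability prediction (3.5.2)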

Chapter 6 Misclassification costs -- 6.1 Introduction -- 6.2 Cost representation -- 6.2.1 Cost matrix -- 6.2.2 Per-class cost vector -- 6.2.3 Instance-specific costs -- 6.3 Incorporating misclassification costs -- 6.3.1 Instance weighting -- 6.3.2 Instance resampling -- 6.3.3 Minimum-cost rule -- 6.3.4 Instance relabeling -- 6.4 Effects of cost incorporation -- 6.5 Experimental procedure -- 6.6 Conclusion -- 6.7 Further readings -- References -- Chapter 7 Classification model evaluation -- 7.1 Introduction -- 7.1.1 Dataset performance -- 7.1.2 Training performance -- 7.1.3 True performance -- 7.2 Performance measures -- 7.2.1 Misclassification error -- 7.2.2 Weighted misclassification error -- 7.2.3 Mean misclassification cost -- 7.2.4 Confusion matrix -- 7.2.5 ROC analysis -- 7.2.6 Probabilistic performance measures -- 7.3 Evaluation procedures -- 7.3.1 Model evaluation vs. modeling procedure evaluation -- 7.3.2 Evaluation caveats -- 7.3.3 Hold-out -- 7.3.4 Cross-validation -- 7.3.5 Leave-one-out -- 7.3.6 Bootstrapping -- 7.3.7 Choosing the right procedure -- 7.3.8 Evaluation procedures for temporal data -- 7.4 Conclusion -- 7.5 Further readings -- References -- Part III Regression -- Chapter 8 Linear regression -- 8.1 Introduction -- 8.2 Linear representation -- 8.2.1 Parametric representation -- 8.2.2 Linear representation function -- 8.2.3 Nonlinear representation functions -- 8.3 Parameter estimation -- 8.3.1 Mean square error minimization -- 8.3.2 Delta rule -- 8.3.3 Gradient descent -- 8.3.4 Least squares -- 8.4 Discrete attributes -- 8.5 Advantages of linear models -- 8.6 Beyond linearity -- 8.6.1 Generalized linear representation -- 8.6.2 Enhanced representation -- 8.6.3 Polynomial regression -- 8.6.4 Piecewise-linear regression -- 8.7 Conclusion -- 8.8 Further readings -- References.
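Chapter 7's hold-out procedure and confusion matrix, sketched in base R plus rpart on the built-in iris data; illustrative only, and the split ratio and seed are arbitrary choices, not the book's.

    library(rpart)
    data(iris)
    set.seed(42)
    train <- sample(nrow(iris), 0.7 * nrow(iris))           # hold-out split (7.3.3)
    fit   <- rpart(Species ~ ., data = iris[train, ], method = "class")
    pred  <- predict(fit, iris[-train, ], type = "class")
    table(predicted = pred, actual = iris$Species[-train])  # confusion matrix (7.2.4)
    mean(pred != iris$Species[-train])                      # misclassification error (7.2.1)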

Chapter 9 Regression trees -- 9.1 Introduction -- 9.2 Regression tree model -- 9.2.1 Nodes and branches -- 9.2.2 Leaves -- 9.2.3 Split types -- 9.2.4 Piecewise-constant regression -- 9.3 Growing -- 9.3.1 Algorithm outline -- 9.3.2 Target function summary statistics -- 9.3.3 Target value assignment -- 9.3.4 Stop criteria -- 9.3.5 Split selection -- 9.3.6 Split application -- 9.3.7 Complete process -- 9.4 Pruning -- 9.4.1 Pruning operators -- 9.4.2 Pruning criterion -- 9.4.3 Pruning control strategy -- 9.5 Prediction -- 9.6 Weighted instances -- 9.7 Missing value handling -- 9.7.1 Fractional instances -- 9.7.2 Surrogate splits -- 9.8 Piecewise linear regression -- 9.8.1 Growing -- 9.8.2 Pruning -- 9.8.3 Prediction -- 9.9 Conclusion -- 9.10 Further readings -- References -- Chapter 10 Regression model evaluation -- 10.1 Introduction -- 10.1.1 Dataset performance -- 10.1.2 Training performance -- 10.1.3 True performance -- 10.2 Performance measures -- 10.2.1 Residuals -- 10.2.2 Mean absolute error -- 10.2.3 Mean square error -- 10.2.4 Root mean square error -- 10.2.5 Relative absolute error -- 10.2.6 Coefficient of determination -- 10.2.7 Correlation -- 10.2.8 Weighted performance measures -- 10.2.9 Loss functions -- 10.3 Evaluation procedures -- 10.3.1 Hold-out -- 10.3.2 Cross-validation -- 10.3.3 Leave-one-out -- 10.3.4 Bootstrapping -- 10.3.5 Choosing the right procedure -- 10.4 Conclusion -- 10.5 Further readings -- References.
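Chapter 9's piecewise-constant regression trees together with Chapter 10's error measures, again via rpart, here on the built-in cars data; a sketch under these assumptions, not the book's implementation, and the measures below are training performance in the sense of Section 10.1.2.

    library(rpart)
    data(cars)                            # stopping distance vs. speed
    rt   <- rpart(dist ~ speed, data = cars)  # numeric target: "anova" trees (9.2.4)
    pred <- predict(rt, cars)
    mean(abs(cars$dist - pred))               # mean absolute error (10.2.2)
    sqrt(mean((cars$dist - pred)^2))          # root mean square error (10.2.4)
    cor(cars$dist, pred)^2                    # squared correlation of target and prediction (10.2.7)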

Part IV Clustering -- Chapter 11 (Dis)similarity measures -- 11.1 Introduction -- 11.2 Measuring dissimilarity and similarity -- 11.3 Difference-based dissimilarity -- 11.3.1 Euclidean distance -- 11.3.2 Minkowski distance -- 11.3.3 Manhattan distance -- 11.3.4 Canberra distance -- 11.3.5 Chebyshev distance -- 11.3.6 Hamming distance -- 11.3.7 Gower's coefficient -- 11.3.8 Attribute weighting -- 11.3.9 Attribute transformation -- 11.4 Correlation-based similarity -- 11.4.1 Discrete attributes -- 11.4.2 Pearson's correlation similarity -- 11.4.3 Spearman's correlation similarity -- 11.4.4 Cosine similarity -- 11.5 Missing attribute values -- 11.6 Conclusion -- 11.7 Further readings -- References -- Chapter 12 k-Centers clustering -- 12.1 Introduction -- 12.1.1 Basic principle -- 12.1.2 (Dis)similarity measures -- 12.2 Algorithm scheme -- 12.2.1 Initialization -- 12.2.2 Stop criteria -- 12.2.3 Cluster formation -- 12.2.4 Implicit cluster modeling -- 12.2.5 Instantiations -- 12.3 k-Means -- 12.3.1 Center adjustment -- 12.3.2 Minimizing dissimilarity to centers -- 12.4 Beyond means -- 12.4.1 k-Medians -- 12.4.2 k-Medoids -- 12.5 Beyond (fixed) k -- 12.5.1 Multiple runs -- 12.5.2 Adaptive k-centers -- 12.6 Explicit cluster modeling -- 12.7 Conclusion -- 12.8 Further readings -- References -- Chapter 13 Hierarchical clustering -- 13.1 Introduction -- 13.1.1 Basic approaches -- 13.1.2 (Dis)similarity measures -- 13.2 Cluster hierarchies -- 13.2.1 Motivation -- 13.2.2 Model representation -- 13.3 Agglomerative clustering -- 13.3.1 Algorithm scheme -- 13.3.2 Cluster linkage -- 13.4 Divisive clustering -- 13.4.1 Algorithm scheme -- 13.4.2 Wrapping a flat clustering algorithm -- 13.4.3 Stop criteria -- 13.5 Hierarchical clustering visualization -- 13.6 Hierarchical clustering prediction -- 13.6.1 Cutting cluster hierarchies -- 13.6.2 Cluster membership assignment -- 13.7 Conclusion -- 13.8 Further readings -- References.
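Chapters 11 through 13 in miniature: Euclidean dissimilarity, k-means with multiple runs, and average-linkage agglomerative clustering, all in base R. A minimal sketch assuming the built-in iris data; the choice of 3 clusters and 10 restarts is illustrative.

    data(iris)
    x  <- scale(iris[, 1:4])                   # attribute transformation (11.3.9)
    d  <- dist(x)                              # Euclidean distance (11.3.1)
    km <- kmeans(x, centers = 3, nstart = 10)  # k-means with multiple runs (12.3, 12.5.1)
    hc <- hclust(d, method = "average")        # average-linkage agglomeration (13.3.2)
    table(kmeans = km$cluster, hclust = cutree(hc, k = 3))  # cutting the hierarchy (13.6.1)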

Chapter 14 Clustering model evaluation -- 14.1 Introduction -- 14.1.1 Dataset performance -- 14.1.2 Training performance -- 14.1.3 True performance -- 14.2 Per-cluster quality measures -- 14.2.1 Diameter -- 14.2.2 Separation -- 14.2.3 Isolation -- 14.2.4 Silhouette width -- 14.2.5 Davies-Bouldin index -- 14.3 Overall quality measures -- 14.3.1 Dunn index -- 14.3.2 Average Davies-Bouldin index -- 14.3.3 C index -- 14.3.4 Average silhouette width.
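For Chapter 14, the average silhouette width (Section 14.3.4) can be computed with the recommended cluster package; a hedged sketch of clustering-quality evaluation, continuing the k-means example above rather than reproducing the book's code.

    library(cluster)                     # recommended package; provides silhouette()
    data(iris)
    x   <- scale(iris[, 1:4])
    km  <- kmeans(x, centers = 3, nstart = 10)
    sil <- silhouette(km$cluster, dist(x))  # per-instance silhouette widths (14.2.4)
    mean(sil[, "sil_width"])                # average silhouette width (14.3.4)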
Abstract:
Data Mining Algorithms is a practical, technically oriented guide that covers the most important algorithms for building classification, regression, and clustering models, as well as techniques used for attribute selection and transformation, model quality evaluation, and creating model ensembles. The author presents many of the important topics and methodologies widely used in data mining, whilst demonstrating the internal operation and usage of data mining algorithms through examples in R.
Local Note:
Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2017. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.