Learning Data Mining with R.
Title:
Learning Data Mining with R.
Author:
Makhabel, Bater.
ISBN:
9781783982110
Physical Description:
1 online resource (348 pages)
Contents:
Learning Data Mining with R -- Table of Contents -- Learning Data Mining with R -- Credits -- About the Author -- Acknowledgments -- About the Reviewers -- www.PacktPub.com -- Support files, eBooks, discount offers, and more -- Why subscribe? -- Free access for Packt account holders -- Preface -- What this book covers -- What you need for this book -- Who this book is for -- Conventions -- Reader feedback -- Customer support -- Downloading the example code -- Errata -- Piracy -- Questions -- 1. Warming Up -- Big data -- Scalability and efficiency -- Data source -- Data mining -- Feature extraction -- Summarization -- The data mining process -- CRISP-DM -- SEMMA -- Social network mining -- Social network -- Text mining -- Information retrieval and text mining -- Mining text for prediction -- Web data mining -- Why R? -- What are the disadvantages of R? -- Statistics -- Statistics and data mining -- Statistics and machine learning -- Statistics and R -- The limitations of statistics on data mining -- Machine learning -- Approaches to machine learning -- Machine learning architecture -- Data attributes and description -- Numeric attributes -- Categorical attributes -- Data description -- Data measuring -- Data cleaning -- Missing values -- Junk, noisy data, or outlier -- Data integration -- Data dimension reduction -- Eigenvalues and Eigenvectors -- Principal-Component Analysis -- Singular-value decomposition -- CUR decomposition -- Data transformation and discretization -- Data transformation -- Normalization data transformation methods -- Data discretization -- Visualization of results -- Visualization with R -- Time for action -- Summary -- 2. Mining Frequent Patterns, Associations, and Correlations -- An overview of associations and patterns -- Patterns and pattern discovery -- The frequent itemset -- The frequent subsequence -- The frequent substructures -- Relationship or rules discovery -- Association rules -- Correlation rules -- Market basket analysis -- The market basket model -- A-Priori algorithms -- Input data characteristics and data structure -- The A-Priori algorithm -- The R implementation -- A-Priori algorithm variants -- The Eclat algorithm -- The R implementation -- The FP-growth algorithm -- Input data characteristics and data structure -- The FP-growth algorithm -- The R implementation -- The GenMax algorithm with maximal frequent itemsets -- The R implementation -- The Charm algorithm with closed frequent itemsets -- The R implementation -- The algorithm to generate association rules -- The R implementation -- Hybrid association rules mining -- Mining multilevel and multidimensional association rules -- Constraint-based frequent pattern mining -- Mining sequence dataset -- Sequence dataset -- The GSP algorithm -- The R implementation -- The SPADE algorithm -- The R implementation -- Rule generation from sequential patterns -- High-performance algorithms -- Time for action -- Summary -- 3. Classification -- Classification -- Generic decision tree induction -- Attribute selection measures -- Tree pruning -- General algorithm for the decision tree generation -- The R implementation -- High-value credit card customers classification using ID3 -- The ID3 algorithm -- The R implementation -- Web attack detection -- High-value credit card customers classification -- Web spam detection using C4.5 -- The C4.5 algorithm -- The R implementation -- A parallel version with MapReduce -- Web spam detection -- Web key resource page judgment using CART -- The CART algorithm -- The R implementation -- Web key resource page judgment -- Trojan traffic identification method and Bayes classification -- Estimating -- Prior probability estimation -- Likelihood estimation -- The Bayes classification -- The R implementation -- Trojan traffic identification method -- Identify spam e-mail and Naïve Bayes classification -- The Naïve Bayes classification -- The R implementation -- Identify spam e-mail -- Rule-based classification of player types in computer games and rule-based classification -- Transformation from decision tree to decision rules -- Rule-based classification -- Sequential covering algorithm -- The RIPPER algorithm -- The R implementation -- Rule-based classification of player types in computer games -- Time for action -- Summary -- 4. Advanced Classification -- Ensemble (EM) methods -- The bagging algorithm -- The boosting and AdaBoost algorithms -- The Random forests algorithm -- The R implementation -- Parallel version with MapReduce -- Biological traits and the Bayesian belief network -- The Bayesian belief network (BBN) algorithm -- The R implementation -- Biological traits -- Protein classification and the k-Nearest Neighbors algorithm -- The kNN algorithm -- The R implementation -- Document retrieval and Support Vector Machine -- The SVM algorithm -- The R implementation -- Parallel version with MapReduce -- Document retrieval -- Classification using frequent patterns -- The associative classification -- CBA -- Discriminative frequent pattern-based classification -- The R implementation -- Text classification using sentential frequent itemsets -- Classification using the backpropagation algorithm -- The BP algorithm -- The R implementation -- Parallel version with MapReduce -- Time for action -- Summary -- 5. Cluster Analysis -- Search engines and the k-means algorithm -- The k-means clustering algorithm -- The kernel k-means algorithm -- The k-modes algorithm -- The R implementation -- Parallel version with MapReduce -- Search engine and web page clustering -- Automatic abstraction of document texts and the k-medoids algorithm -- The PAM algorithm -- The R implementation -- Automatic abstraction and summarization of document text -- The CLARA algorithm -- The CLARA algorithm -- The R implementation -- CLARANS -- The CLARANS algorithm -- The R implementation -- Unsupervised image categorization and affinity propagation clustering -- Affinity propagation clustering -- The R implementation -- Unsupervised image categorization -- The spectral clustering algorithm -- The R implementation -- News categorization and hierarchical clustering -- Agglomerative hierarchical clustering -- The BIRCH algorithm -- The chameleon algorithm -- The Bayesian hierarchical clustering algorithm -- The probabilistic hierarchical clustering algorithm -- The R implementation -- News categorization -- Time for action -- Summary -- 6. Advanced Cluster Analysis -- Customer categorization analysis of e-commerce and DBSCAN -- The DBSCAN algorithm -- Customer categorization analysis of e-commerce -- Clustering web pages and OPTICS -- The OPTICS algorithm -- The R implementation -- Clustering web pages -- Visitor analysis in the browser cache and DENCLUE -- The DENCLUE algorithm -- The R implementation -- Visitor analysis in the browser cache -- Recommendation system and STING -- The STING algorithm -- The R implementation -- Recommendation systems -- Web sentiment analysis and CLIQUE -- The CLIQUE algorithm -- The R implementation -- Web sentiment analysis -- Opinion mining and WAVE clustering -- The WAVE cluster algorithm -- The R implementation -- Opinion mining -- User search intent and the EM algorithm -- The EM algorithm -- The R implementation -- The user search intent -- Customer purchase data analysis and clustering high-dimensional data -- The MAFIA algorithm -- The SURFING algorithm -- The R implementation -- Customer purchase data analysis -- SNS and clustering graph and network data -- The SCAN algorithm -- The R implementation -- Social networking service (SNS) -- Time for action -- Summary -- 7. Outlier Detection -- Credit card fraud detection and statistical methods -- The likelihood-based outlier detection algorithm -- The R implementation -- Credit card fraud detection -- Activity monitoring - the detection of fraud involving mobile phones and proximity-based methods -- The NL algorithm -- The FindAllOutsM algorithm -- The FindAllOutsD algorithm -- The distance-based algorithm -- The Dolphin algorithm -- The R implementation -- Activity monitoring and the detection of mobile fraud -- Intrusion detection and density-based methods -- The OPTICS-OF algorithm -- The High Contrast Subspace algorithm -- The R implementation -- Intrusion detection -- Intrusion detection and clustering-based methods -- Hierarchical clustering to detect outliers -- The k-means-based algorithm -- The ODIN algorithm -- The R implementation -- Monitoring the performance of the web server and classification-based methods -- The OCSVM algorithm -- The one-class nearest neighbor algorithm -- The R implementation -- Monitoring the performance of the web server -- Detecting novelty in text, topic detection, and mining contextual outliers -- The conditional anomaly detection (CAD) algorithm -- The R implementation -- Detecting novelty in text and topic detection -- Collective outliers on spatial data -- The route outlier detection (ROD) algorithm -- The R implementation -- Characteristics of collective outliers -- Outlier detection in high-dimensional data -- The brute-force algorithm -- The HilOut algorithm -- The R implementation -- Time for action -- Summary -- 8. Mining Stream, Time-series, and Sequence Data -- The credit card transaction flow and STREAM algorithm -- The STREAM algorithm.
Abstract:
This book is intended for the budding data scientist or quantitative analyst with only a basic exposure to R and statistics. This book assumes familiarity with only the very basics of R, such as the main data types, simple functions, and how to move data around. No prior experience with data mining packages is necessary; however, you should have a basic understanding of data mining concepts and processes.
Local Note:
Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2017. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.