Data Mining : Practical Machine Learning Tools and Techniques.
Title:
Data Mining : Practical Machine Learning Tools and Techniques.
Author:
Witten, Ian H.
ISBN:
9780080890364
Edition:
3rd ed.
Physical Description:
1 online resource (665 pages)
Series:
Morgan Kaufmann Series in Data Management Systems
Contents:
Front cover -- Data Mining: Practical Machine Learning Tools and Techniques -- Copyright page -- Table of Contents -- List of Figures -- List of Tables -- Preface -- Updated and revised content -- Second Edition -- Third Edition -- Acknowledgments -- About the Authors -- Part I Introduction to Data Mining -- CHAPTER 1 What's It All About? -- 1.1 Data mining and machine learning -- Describing Structural Patterns -- Machine Learning -- Data Mining -- 1.2 Simple examples: the weather and other problems -- The Weather Problem -- Contact Lenses: An Idealized Problem -- Irises: A Classic Numeric Dataset -- CPU Performance: Introducing Numeric Prediction -- Labor Negotiations: A More Realistic Example -- Soybean Classification: A Classic Machine Learning Success -- 1.3 Fielded applications -- Web Mining -- Decisions Involving Judgment -- Screening Images -- Load Forecasting -- Diagnosis -- Marketing and Sales -- Other Applications -- 1.4 Machine learning and statistics -- 1.5 Generalization as search -- 1.6 Data mining and ethics -- Reidentification -- Using Personal Information -- Wider Issues -- 1.7 Further reading -- CHAPTER 2 Input: Concepts, Instances, and Attributes -- 2.1 What's a concept? -- 2.2 What's in an example? -- Relations -- Other Example Types -- 2.3 What's in an attribute? -- 2.4 Preparing the input -- Gathering the Data Together -- ARFF Format -- Sparse Data -- Attribute Types -- Missing Values -- Inaccurate Values -- Getting to Know Your Data -- 2.5 Further reading -- CHAPTER 3 Output: Knowledge Representation -- 3.1 Tables -- 3.2 Linear models -- 3.3 Trees -- 3.4 Rules -- Classification Rules -- Association Rules -- Rules with Exceptions -- More Expressive Rules -- 3.5 Instance-based representation -- 3.6 Clusters -- 3.7 Further reading -- CHAPTER 4 Algorithms: The Basic Methods -- 4.1 Inferring rudimentary rules -- Missing Values and Numeric Attributes -- Discussion -- 4.2 Statistical modeling -- Missing Values and Numeric Attributes -- Naïve Bayes for Document Classification -- Discussion -- 4.3 Divide-and-conquer: constructing decision trees -- Calculating Information -- Highly Branching Attributes -- Discussion -- 4.4 Covering algorithms: constructing rules -- Rules versus Trees -- A Simple Covering Algorithm -- Rules versus Decision Lists -- 4.5 Mining association rules -- Item Sets -- Association Rules -- Generating Rules Efficiently -- Discussion -- 4.6 Linear models -- Numeric Prediction: Linear Regression -- Linear Classification: Logistic Regression -- Linear Classification Using the Perceptron -- Linear Classification Using Winnow -- 4.7 Instance-based learning -- Distance Function -- Finding Nearest Neighbors Efficiently -- Discussion -- 4.8 Clustering -- Iterative Distance-Based Clustering -- Faster Distance Calculations -- Discussion -- 4.9 Multi-instance learning -- Aggregating the Input -- Aggregating the Output -- Discussion -- 4.10 Further reading -- 4.11 Weka implementations -- CHAPTER 5 Credibility: Evaluating What's Been Learned -- 5.1 Training and testing -- 5.2 Predicting performance -- 5.3 Cross-validation -- 5.4 Other estimates -- Leave-One-Out Cross-Validation -- The Bootstrap -- 5.5 Comparing data mining schemes -- 5.6 Predicting probabilities -- Quadratic Loss Function -- Informational Loss Function -- Discussion -- 5.7 Counting the cost -- Cost-Sensitive Classification -- Cost-Sensitive Learning -- Lift Charts -- ROC Curves -- Recall-Precision Curves -- Discussion -- Cost Curves -- 5.8 Evaluating numeric prediction -- 5.9 Minimum description length principle -- 5.10 Applying the MDL principle to clustering -- 5.11 Further reading.

Part II Advanced Data Mining -- CHAPTER 6 Implementations: Real Machine Learning Schemes -- 6.1 Decision trees -- Numeric Attributes -- Missing Values -- Pruning -- Estimating Error Rates -- Complexity of Decision Tree Induction -- From Trees to Rules -- C4.5: Choices and Options -- Cost-Complexity Pruning -- Discussion -- 6.2 Classification rules -- Criteria for Choosing Tests -- Missing Values, Numeric Attributes -- Generating Good Rules -- Using Global Optimization -- Obtaining Rules from Partial Decision Trees -- Rules with Exceptions -- Discussion -- 6.3 Association rules -- Building a Frequent-Pattern Tree -- Finding Large Item Sets -- Discussion -- 6.4 Extending linear models -- Maximum-Margin Hyperplane -- Nonlinear Class Boundaries -- Support Vector Regression -- Kernel Ridge Regression -- Kernel Perceptron -- Multilayer Perceptrons -- Backpropagation -- Radial Basis Function Networks -- Stochastic Gradient Descent -- Discussion -- 6.5 Instance-based learning -- Reducing the Number of Exemplars -- Pruning Noisy Exemplars -- Weighting Attributes -- Generalizing Exemplars -- Distance Functions for Generalized Exemplars -- Generalized Distance Functions -- Discussion -- 6.6 Numeric prediction with local linear models -- Model Trees -- Building the Tree -- Pruning the Tree -- Nominal Attributes -- Missing Values -- Pseudocode for Model Tree Induction -- Rules from Model Trees -- Locally Weighted Linear Regression -- Discussion -- 6.7 Bayesian networks -- Making Predictions -- Learning Bayesian Networks -- Specific Algorithms -- Data Structures for Fast Learning -- Discussion -- 6.8 Clustering -- Choosing the Number of Clusters -- Hierarchical Clustering -- Example of Hierarchical Clustering -- Incremental Clustering -- Probability-Based Clustering -- The EM Algorithm -- Extending the Mixture Model -- Bayesian Clustering -- Discussion -- 6.9 Semisupervised learning -- Clustering for Classification -- Co-training -- EM and Co-training -- Discussion -- 6.10 Multi-instance learning -- Converting to Single-Instance Learning -- Upgrading Learning Algorithms -- Dedicated Multi-Instance Methods -- Discussion -- 6.11 Weka implementations -- CHAPTER 7 Data Transformations -- 7.1 Attribute selection -- Scheme-Independent Selection -- Searching the Attribute Space -- Scheme-Specific Selection -- 7.2 Discretizing numeric attributes -- Unsupervised Discretization -- Entropy-Based Discretization -- Other Discretization Methods -- Entropy-Based versus Error-Based Discretization -- Converting Discrete Attributes to Numeric Attributes -- 7.3 Projections -- Principal Components Analysis -- Random Projections -- Partial Least-Squares Regression -- Text to Attribute Vectors -- Time Series -- 7.4 Sampling -- Reservoir Sampling -- 7.5 Cleansing -- Improving Decision Trees -- Robust Regression -- Detecting Anomalies -- One-Class Learning -- Outlier Detection -- Generating Artificial Data -- 7.6 Transforming multiple classes to binary ones -- Simple Methods -- Error-Correcting Output Codes -- Ensembles of Nested Dichotomies -- 7.7 Calibrating class probabilities -- 7.8 Further reading -- 7.9 Weka implementations -- CHAPTER 8 Ensemble Learning -- 8.1 Combining multiple models -- 8.2 Bagging -- Bias-Variance Decomposition -- Bagging with Costs -- 8.3 Randomization -- Randomization versus Bagging -- Rotation Forests -- 8.4 Boosting -- AdaBoost -- The Power of Boosting -- 8.5 Additive regression -- Numeric Prediction -- Additive Logistic Regression -- 8.6 Interpretable ensembles -- Option Trees -- Logistic Model Trees -- 8.7 Stacking -- 8.8 Further reading -- 8.9 Weka implementations -- CHAPTER 9 Moving On: Applications and Beyond -- 9.1 Applying data mining -- 9.2 Learning from massive datasets -- 9.3 Data stream learning -- 9.4 Incorporating domain knowledge -- 9.5 Text mining -- 9.6 Web mining -- 9.7 Adversarial situations -- 9.8 Ubiquitous data mining -- 9.9 Further reading.

Part III The Weka Data Mining Workbench -- CHAPTER 10 Introduction to Weka -- 10.1 What's in Weka? -- 10.2 How do you use it? -- 10.3 What else can you do? -- 10.4 How do you get it? -- CHAPTER 11 The Explorer -- 11.1 Getting started -- Preparing the Data -- Loading the Data into the Explorer -- Building a Decision Tree -- Examining the Output -- Doing It Again -- Working with Models -- When Things Go Wrong -- 11.2 Exploring the Explorer -- Loading and Filtering Files -- Converting Files to ARFF -- Using Filters -- Training and Testing Learning Schemes -- Do It Yourself: The User Classifier -- Using a Metalearner -- Clustering and Association Rules -- Attribute Selection -- Visualization -- 11.3 Filtering algorithms -- Unsupervised Attribute Filters -- Adding and Removing Attributes -- Changing Values -- Conversions -- String Conversion -- Multi-Instance Data -- Time Series -- Randomizing -- Unsupervised Instance Filters -- Randomizing and Subsampling -- Sparse Instances -- Supervised Filters -- Supervised Attribute Filters -- Supervised Instance Filters -- 11.4 Learning algorithms -- Bayesian Classifiers -- Trees -- Rules -- Functions -- Neural Networks -- Lazy Classifiers -- Multi-Instance Classifiers -- Miscellaneous Classifiers -- 11.5 Metalearning algorithms -- Bagging and Randomization -- Boosting -- Combining Classifiers -- Cost-Sensitive Learning -- Optimizing Performance -- Retargeting Classifiers for Different Tasks -- 11.6 Clustering algorithms -- 11.7 Association-rule learners -- 11.8 Attribute selection -- Attribute Subset Evaluators -- Single-Attribute Evaluators -- Search Methods -- CHAPTER 12 The Knowledge Flow Interface -- 12.1 Getting started -- 12.2 Components -- 12.3 Configuring and connecting the components -- 12.4 Incremental learning -- CHAPTER 13 The Experimenter -- 13.1 Getting started -- Running an Experiment -- Analyzing the Results -- 13.2 Simple setup -- 13.3 Advanced setup.
Abstract:
Data Mining: Practical Machine Learning Tools and Techniques offers a thorough grounding in machine learning concepts as well as practical advice on applying machine learning tools and techniques in real-world data mining situations. This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining. Thorough updates reflect the technical changes and modernizations that have taken place in the field since the last edition, including new material on data transformations, ensemble learning, massive datasets, and multi-instance learning, plus a new version of the popular Weka machine learning software developed by the authors. Witten, Frank, and Hall include both the tried-and-true techniques of today and methods at the leading edge of contemporary research.
* Provides a thorough grounding in machine learning concepts, as well as practical advice on applying the tools and techniques to your data mining projects.
* Offers concrete tips and techniques for performance improvement that work by transforming the input or output of machine learning methods.
* Includes the downloadable Weka software toolkit, a collection of machine learning algorithms for data mining tasks, in an updated, interactive interface. Algorithms in the toolkit cover data pre-processing, classification, regression, clustering, association rules, and visualization.
Local Note:
Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2017. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.