Text Mining : Applications and Theory.
Title:
Text Mining : Applications and Theory.
Author:
Berry, Michael W.
ISBN:
9780470689653
Edition:
2nd ed.
Physical Description:
1 online resource (223 pages)
Contents:
Text Mining -- Contents -- List of Contributors -- Preface -- PART I TEXT EXTRACTION, CLASSIFICATION, AND CLUSTERING -- 1 Automatic keyword extraction from individual documents -- 1.1 Introduction -- 1.1.1 Keyword extraction methods -- 1.2 Rapid automatic keyword extraction -- 1.2.1 Candidate keywords -- 1.2.2 Keyword scores -- 1.2.3 Adjoining keywords -- 1.2.4 Extracted keywords -- 1.3 Benchmark evaluation -- 1.3.1 Evaluating precision and recall -- 1.3.2 Evaluating efficiency -- 1.4 Stoplist generation -- 1.5 Evaluation on news articles -- 1.5.1 The MPQA Corpus -- 1.5.2 Extracting keywords from news articles -- 1.6 Summary -- 1.7 Acknowledgements -- References -- 2 Algebraic techniques for multilingual document clustering -- 2.1 Introduction -- 2.2 Background -- 2.3 Experimental setup -- 2.4 Multilingual LSA -- 2.5 Tucker1 method -- 2.6 PARAFAC2 method -- 2.7 LSA with term alignments -- 2.8 Latent morpho-semantic analysis (LMSA) -- 2.9 LMSA with term alignments -- 2.10 Discussion of results and techniques -- 2.11 Acknowledgements -- References -- 3 Content-based spam email classification using machine-learning algorithms -- 3.1 Introduction -- 3.2 Machine-learning algorithms -- 3.2.1 Naive Bayes -- 3.2.2 LogitBoost -- 3.2.3 Support vector machines -- 3.2.4 Augmented latent semantic indexing spaces -- 3.2.5 Radial basis function networks -- 3.3 Data preprocessing -- 3.3.1 Feature selection -- 3.3.2 Message representation -- 3.4 Evaluation of email classification -- 3.5 Experiments -- 3.5.1 Experiments with PU1 -- 3.5.2 Experiments with ZH1 -- 3.6 Characteristics of classifiers -- 3.7 Concluding remarks -- 3.8 Acknowledgements -- References -- 4 Utilizing nonnegative matrix factorization for email classification problems -- 4.1 Introduction -- 4.1.1 Related work -- 4.1.2 Synopsis -- 4.2 Background -- 4.2.1 Nonnegative matrix factorization -- 4.2.2 Algorithms for computing NMF -- 4.2.3 Datasets -- 4.2.4 Interpretation -- 4.3 NMF initialization based on feature ranking -- 4.3.1 Feature subset selection -- 4.3.2 FS initialization -- 4.4 NMF-based classification methods -- 4.4.1 Classification using basis features -- 4.4.2 Generalizing LSI based on NMF -- 4.5 Conclusions -- 4.6 Acknowledgements -- References -- 5 Constrained clustering with k-means type algorithms -- 5.1 Introduction -- 5.2 Notations and classical k-means -- 5.3 Constrained k-means with Bregman divergences -- 5.3.1 Quadratic k-means with cannot-link constraints -- 5.3.2 Elimination of must-link constraints -- 5.3.3 Clustering with Bregman divergences -- 5.4 Constrained smoka type clustering -- 5.5 Constrained spherical k-means -- 5.5.1 Spherical k-means with cannot-link constraints only -- 5.5.2 Spherical k-means with cannot-link and must-link constraints -- 5.6 Numerical experiments -- 5.6.1 Quadratic k-means -- 5.6.2 Spherical k-means -- 5.7 Conclusion -- References -- PART II ANOMALY AND TREND DETECTION -- 6 Survey of text visualization techniques -- 6.1 Visualization in text analysis -- 6.2 Tag clouds -- 6.3 Authorship and change tracking -- 6.4 Data exploration and the search for novel patterns -- 6.5 Sentiment tracking -- 6.6 Visual analytics and FutureLens -- 6.7 Scenario discovery -- 6.7.1 Scenarios -- 6.7.2 Evaluating solutions -- 6.8 Earlier prototype -- 6.9 Features of FutureLens -- 6.10 Scenario discovery example: bioterrorism -- 6.11 Scenario discovery example: drug trafficking -- 6.12 Future work -- References -- 7 Adaptive threshold setting for novelty mining -- 7.1 Introduction -- 7.2 Adaptive threshold setting in novelty mining -- 7.2.1 Background -- 7.2.2 Motivation -- 7.2.3 Gaussian-based adaptive threshold setting -- 7.2.4 Implementation issues -- 7.3 Experimental study -- 7.3.1 Datasets -- 7.3.2 Working example -- 7.3.3 Experiments and results -- 7.4 Conclusion -- References -- 8 Text mining and cybercrime -- 8.1 Introduction -- 8.2 Current research in Internet predation and cyberbullying -- 8.2.1 Capturing IM and IRC chat -- 8.2.2 Current collections for use in analysis -- 8.2.3 Analysis of IM and IRC chat -- 8.2.4 Internet predation detection -- 8.2.5 Cyberbullying detection -- 8.2.6 Legal issues -- 8.3 Commercial software for monitoring chat -- 8.4 Conclusions and future directions -- 8.5 Acknowledgements -- References -- PART III TEXT STREAMS -- 9 Events and trends in text streams -- 9.1 Introduction -- 9.2 Text streams -- 9.3 Feature extraction and data reduction -- 9.4 Event detection -- 9.5 Trend detection -- 9.6 Event and trend descriptions -- 9.7 Discussion -- 9.8 Summary -- 9.9 Acknowledgements -- References -- 10 Embedding semantics in LDA topic models -- 10.1 Introduction -- 10.2 Background -- 10.2.1 Vector space modeling -- 10.2.2 Latent semantic analysis -- 10.2.3 Probabilistic latent semantic analysis -- 10.3 Latent Dirichlet allocation -- 10.3.1 Graphical model and generative process -- 10.3.2 Posterior inference -- 10.3.3 Online latent Dirichlet allocation (OLDA) -- 10.3.4 Illustrative example -- 10.4 Embedding external semantics from Wikipedia -- 10.4.1 Related Wikipedia articles -- 10.4.2 Wikipedia-influenced topic model -- 10.5 Data-driven semantic embedding -- 10.5.1 Generative process with data-driven semantic embedding -- 10.5.2 OLDA algorithm with data-driven semantic embedding -- 10.5.3 Experimental design -- 10.5.4 Experimental results -- 10.6 Related work -- 10.7 Conclusion and future work -- References -- Index.
Abstract:
"It is extremely useful for practitioners and students in computer science, natural language processing, bioinformatics and engineering who wish to use text mining techniques." (Journal of Information Retrieval, 1 April 2011).
Local Note:
Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2017. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.