Title:
Textual Information Access : Statistical Models.
Author:
Gaussier, Eric.
ISBN:
9781118562802
Edition:
1st ed.
Physical Description:
1 online resource (399 pages)
Series:
ISTE
Contents:
Cover -- Textual Information Access -- Title Page -- Copyright Page -- Table of Contents -- Introduction -- PART 1: INFORMATION RETRIEVAL -- Chapter 1. Probabilistic Models for Information Retrieval -- 1.1. Introduction -- 1.1.1. Heuristic retrieval constraints -- 1.2. 2-Poisson models -- 1.3. Probability ranking principle (PRP) -- 1.3.1. Reformulation -- 1.3.2. BM25 -- 1.4. Language models -- 1.4.1. Smoothing methods -- 1.4.2. The Kullback-Leibler model -- 1.4.3. Noisy channel model -- 1.4.4. Some remarks -- 1.5. Informational approaches -- 1.5.1. DFR models -- 1.5.2. Information-based models -- 1.6. Experimental comparison -- 1.7. Tools for information retrieval -- 1.8. Conclusion -- 1.9. Bibliography -- Chapter 2. Learnable Ranking Models for Automatic Text Summarization and Information Retrieval -- 2.1. Introduction -- 2.1.1. Ranking of instances -- 2.1.2. Ranking of alternatives -- 2.1.3. Relation to existing frameworks -- 2.2. Application to automatic text summarization -- 2.2.1. Presentation of the application -- 2.2.2. Automatic summary and learning -- 2.3. Application to information retrieval -- 2.3.1. Application presentation -- 2.3.2. Search engines and learning -- 2.3.3. Experimental results -- 2.4. Conclusion -- 2.5. Bibliography -- PART 2: CLASSIFICATION AND CLUSTERING -- Chapter 3. Logistic Regression and Text Classification -- 3.1. Introduction -- 3.2. Generalized linear model -- 3.3. Parameter estimation -- 3.4. Logistic regression -- 3.4.1. Multinomial logistic regression -- 3.5. Model selection -- 3.5.1. Ridge regularization -- 3.5.2. LASSO regularization -- 3.5.3. Selected Ridge regularization -- 3.6. Logistic regression applied to text classification -- 3.6.1. Problem statement -- 3.6.2. Data pre-processing -- 3.6.3. Experimental results -- 3.7. Conclusion -- 3.8. Bibliography.

Chapter 4. Kernel Methods for Textual Information Access -- 4.1. Kernel methods: context and intuitions -- 4.2. General principles of kernel methods -- 4.3. General problems with kernel choices (kernel engineering) -- 4.4. Kernel versions of standard algorithms: examples of solvers -- 4.4.1. Kernel logistic regression -- 4.4.2. Support vector machines -- 4.4.3. Principal component analysis -- 4.4.4. Other methods -- 4.5. Kernels for text entities -- 4.5.1. "Bag-of-words" kernels -- 4.5.2. Semantic kernels -- 4.5.3. Diffusion kernels -- 4.5.4. Sequence kernels -- 4.5.5. Tree kernels -- 4.5.6. Graph kernels -- 4.5.7. Kernels derived from generative models -- 4.6. Summary -- 4.7. Bibliography -- Chapter 5. Topic-based Generative Models for Text Information Access -- 5.1. Introduction -- 5.1.1. Generative versus discriminative models -- 5.1.2. Text models -- 5.1.3. Estimation, prediction and smoothing -- 5.1.4. Terminology and notations -- 5.2. Topic-based models -- 5.2.1. Fundamental principles -- 5.2.2. Illustration -- 5.2.3. General framework -- 5.2.4. Geometric interpretation -- 5.2.5. Application to text categorization -- 5.3. Topic models -- 5.3.1. Probabilistic Latent Semantic Indexing -- 5.3.2. Latent Dirichlet Allocation -- 5.3.3. Conclusion -- 5.4. Term models -- 5.4.1. Limitations of the multinomial -- 5.4.2. Dirichlet compound multinomial -- 5.4.3. DCM-LDA -- 5.5. Similarity measures between documents -- 5.5.1. Language models -- 5.5.2. Similarity between topic distributions -- 5.5.3. Fisher kernels -- 5.6. Conclusion -- 5.7. Appendix: topic model software -- 5.8. Bibliography -- Chapter 6. Conditional Random Fields for Information Extraction -- 6.1. Introduction -- 6.2. Information extraction -- 6.2.1. The task -- 6.2.2. Variants -- 6.2.3. Evaluations -- 6.2.4. Approaches not based on machine learning.

6.3. Machine learning for information extraction -- 6.3.1. Usage and limitations -- 6.3.2. Some applicable machine learning methods -- 6.3.3. Annotating to extract -- 6.4. Introduction to conditional random fields -- 6.4.1. Formalization of a labelling problem -- 6.4.2. Maximum entropy model approach -- 6.4.3. Hidden Markov model approach -- 6.4.4. Graphical models -- 6.5. Conditional random fields -- 6.5.1. Definition -- 6.5.2. Factorization and graphical models -- 6.5.3. Junction tree -- 6.5.4. Inference in CRFs -- 6.5.5. Inference algorithms -- 6.5.6. Training CRFs -- 6.6. Conditional random fields and their applications -- 6.6.1. Linear conditional random fields -- 6.6.2. Links between linear CRFs and hidden Markov models -- 6.6.3. Interests and applications of CRFs -- 6.6.4. Beyond linear CRFs -- 6.6.5. Existing libraries -- 6.7. Conclusion -- 6.8. Bibliography -- PART 3: MULTILINGUALISM -- Chapter 7. Statistical Methods for Machine Translation -- 7.1. Introduction -- 7.1.1. Machine translation in the age of the Internet -- 7.1.2. Organization of the chapter -- 7.1.3. Terminological remarks -- 7.2. Probabilistic machine translation: an overview -- 7.2.1. Statistical machine translation: the standard model -- 7.2.2. Word-based models and their limitations -- 7.2.3. Phrase-based models -- 7.3. Phrase-based models -- 7.3.1. Building word alignments -- 7.3.2. Word alignment models: a summary -- 7.3.3. Extracting bisegments -- 7.4. Modeling reorderings -- 7.4.1. The space of possible reorderings -- 7.4.2. Evaluating permutations -- 7.5. Translation: a search problem -- 7.5.1. Combining models -- 7.5.2. The decoding problem -- 7.5.3. Exact search algorithms -- 7.5.4. Heuristic search algorithms -- 7.5.5. Decoding: a solved problem? -- 7.6. Evaluating machine translation -- 7.6.1. Subjective evaluations -- 7.6.2. The BLEU metric.

7.6.3. Alternatives to BLEU -- 7.6.4. Evaluating machine translation: an open problem -- 7.7. State-of-the-art and recent developments -- 7.7.1. Using source context -- 7.7.2. Hierarchical models -- 7.7.3. Translating with linguistic resources -- 7.8. Useful resources -- 7.8.1. Bibliographic data and online resources -- 7.8.2. Parallel corpora -- 7.8.3. Tools for statistical machine translation -- 7.9. Conclusion -- 7.10. Acknowledgments -- 7.11. Bibliography -- PART 4: EMERGING APPLICATIONS -- Chapter 8. Information Mining: Methods and Interfaces for Accessing Complex Information -- 8.1. Introduction -- 8.2. The multidimensional visualization of information -- 8.2.1. Accessing information based on the knowledge of the structured domain -- 8.2.2. Visualization of a set of documents via their content -- 8.2.3. OLAP principles applied to document sets -- 8.3. Domain mapping via social networks -- 8.4. Analyzing the variability of searches and data merging -- 8.4.1. Analysis of IR engine results -- 8.4.2. Use of data unification -- 8.5. The seven types of evaluation measures used in IR -- 8.6. Conclusion -- 8.7. Acknowledgments -- 8.8. Bibliography -- Chapter 9. Opinion Detection as a Topic Classification Problem -- 9.1. Introduction -- 9.2. The TREC and TAC evaluation campaigns -- 9.2.1. Opinion detection by question-answering -- 9.2.2. Automatic summarization of opinions -- 9.2.3. The text mining challenge of opinion classification (DEFT (DÉfi Fouille de Textes)) -- 9.3. Cosine weights - a second glance -- 9.4. Which components for a opinion vectors? -- 9.4.1. How to pass from words to terms? -- 9.5. Experiments -- 9.5.1. Performance, analysis, and visualization of the results on the IMDB corpus -- 9.6. Extracting opinions from speech: automatic analysis of phone polls -- 9.6.1. France Télécom opinion investigation corpus.

9.6.2. Automatic recognition of spontaneous speech in opinion corpora -- 9.6.3. Evaluation -- 9.7. Conclusion -- 9.8. Bibliography -- Appendix A. Probabilistic Models: An Introduction -- A.1. Introduction -- A.2. Supervised categorization -- A.2.1. Filtering documents -- A.2.2. The Bernoulli model -- A.2.3. The multinomial model -- A.2.4. Evaluating categorization systems -- A.2.5. Extensions -- A.2.6. A first summary -- A.3. Unsupervised learning: the multinomial mixture model -- A.3.1. Mixture models -- A.3.2. Parameter estimation -- A.3.3. Applications -- A.4. Markov models: statistical models for sequences -- A.4.1. Modeling sequences -- A.4.2. Estimating a Markov model -- A.4.3. Language models -- A.5. Hidden Markov models -- A.5.1. The model -- A.5.2. Algorithms for hidden Markov models -- A.6. Conclusion -- A.7. A primer of probability theory -- A.7.1. Probability space, event -- A.7.2. Conditional independence and probability -- A.7.3. Random variables, moments -- A.7.4. Some useful distributions -- A.8. Bibliography -- List of Authors -- Index.
Abstract:
This book presents statistical models that have recently been developed within several research communities to access information contained in text collections. The problems considered are linked to applications that aim to facilitate information access: information extraction and retrieval; text classification and clustering; opinion mining; and comprehension aids (automatic summarization, machine translation, visualization). To give the reader as complete a description as possible, the focus is placed on the probability models used in the applications concerned, highlighting the relationship between models and applications and illustrating the behavior of each model on real collections. Textual Information Access is organized around four themes: information retrieval and ranking models; classification and clustering (logistic regression, kernel methods, Markov fields, etc.); multilingualism and machine translation; and emerging applications such as information exploration.
Contents
Part 1: Information Retrieval. 1. Probabilistic Models for Information Retrieval, Stéphane Clinchant and Eric Gaussier. 2. Learnable Ranking Models for Automatic Text Summarization and Information Retrieval, Massih-Réza Amini, David Buffoni, Patrick Gallinari, Tuong Vinh Truong and Nicolas Usunier.
Part 2: Classification and Clustering. 3. Logistic Regression and Text Classification, Sujeevan Aseervatham, Eric Gaussier, Anestis Antoniadis, Michel Burlet and Yves Denneulin. 4. Kernel Methods for Textual Information Access, Jean-Michel Renders. 5. Topic-Based Generative Models for Text Information Access, Jean-Cédric Chappelier. 6. Conditional Random Fields for Information Extraction, Isabelle Tellier and Marc Tommasi.
Part 3: Multilingualism. 7. Statistical Methods for Machine Translation, Alexandre Allauzen and François Yvon.
Part 4: Emerging Applications. 8. Information Mining: Methods and Interfaces for Accessing Complex Information, Josiane Mothe, Kurt Englmeier and Fionn Murtagh. 9. Opinion Detection as a Topic Classification Problem, Juan-Manuel Torres-Moreno, Marc El-Bèze, Patrice Bellot and Frédéric Béchet.
Local Note:
Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2017. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.