Title:

Computational Text Analysis : for functional genomics and bioinformatics.

Author:

Raychaudhuri, Soumya.

ISBN:

9780191513770

Personal Author:

Raychaudhuri, Soumya.

Physical Description:

1 online resource (325 pages)

Contents:

Contents -- List of Figures -- List of Plates -- List of Tables -- 1 An introduction to text analysis in genomics -- 1.1 The genomics literature -- 1.2 Using text in genomics -- 1.2.1 Building databases of genetic knowledge -- 1.2.2 Analyzing experimental genomic data sets -- 1.2.3 Proposing new biological knowledge: identifying candidate genes -- 1.3 Publicly available text resources -- 1.3.1 Electronic text -- 1.3.2 Genome resources -- 1.3.3 Gene ontology -- 1.4 The advantage of text-based methods -- 1.5 Guide to this book -- 2 Functional genomics -- 2.1 Some molecular biology -- 2.1.1 Central dogma of molecular biology -- 2.1.2 Deoxyribonucleic acid -- 2.1.3 Ribonucleic acid -- 2.1.4 Genes -- 2.1.5 Proteins -- 2.1.6 Biological function -- 2.2 Probability theory and statistics -- 2.2.1 Probability -- 2.2.2 Conditional probability -- 2.2.3 Independence -- 2.2.4 Bayes' theorem -- 2.2.5 Probability distribution functions -- 2.2.6 Information theory -- 2.2.7 Population statistics -- 2.2.8 Measuring performance -- 2.3 Deriving and analyzing sequences -- 2.3.1 Sequencing -- 2.3.2 Homology -- 2.3.3 Sequence alignment -- 2.3.4 Pairwise sequence alignment and dynamic programming -- 2.3.5 Linear time pairwise alignment: BLAST -- 2.3.6 Multiple sequence alignment -- 2.3.7 Comparing sequences to profiles: weight matrices -- 2.3.8 Position specific iterative BLAST -- 2.3.9 Hidden Markov models -- 2.4 Gene expression profiling -- 2.4.1 Measuring gene expression with arrays -- 2.4.2 Measuring gene expression by sequencing and counting transcripts -- 2.4.3 Expression array analysis -- 2.4.4 Unsupervised grouping: clustering -- 2.4.5 k-means clustering -- 2.4.6 Self-organizing maps -- 2.4.7 Hierarchical clustering -- 2.4.8 Dimension reduction with principal components analysis.

2.4.9 Combining expression data with external information: supervised machine learning -- 2.4.10 Nearest neighbor classification -- 2.4.11 Linear discriminant analysis -- 3 Textual profiles of genes -- 3.1 Representing documents as word vectors -- 3.2 Metrics to compare documents -- 3.3 Some words are more important for document similarity -- 3.4 Building a vocabulary: feature selection -- 3.5 Weighting words -- 3.6 Latent semantic indexing -- 3.7 Defining textual profiles for genes -- 3.8 Using text like genomics data -- 3.9 A simple strategy to assigning keywords to groups of genes -- 3.10 Querying genes for biological function -- 4 Using text in sequence analysis -- 4.1 SWISS-PROT records as a textual resource -- 4.2 Using sequence similarity to extend literature references -- 4.3 Assigning keywords to summarize sequences hits -- 4.4 Using textual profiles to organize sequence hits -- 4.5 Using text to help identify remote homology -- 4.6 Modifying iterative sequence similarity searches to include text -- 4.7 Evaluating PSI-BLAST modified to include text -- 4.8 Combining sequence and text together -- 5 Text-based analysis of a single series of gene expression measurements -- 5.1 Pitfalls of gene expression analysis: noise -- 5.2 Phosphate metabolism: an example -- 5.3 The top fifteen genes -- 5.4 Distinguishing true positives from false positives with a literature-based approach -- 5.5 Neighbor expression information -- 5.6 Application to phosphate metabolism data set -- 5.7 Recognizing high induction false positives with literature-based scores -- 5.8 Recognizing low induction false positives -- 5.9 Assessing experiment quality with literature-based scoring -- 5.10 Improvements -- 5.11 Application to other assays -- 5.12 Assigning keywords that describe the broad biology of the experiment -- 6 Analyzing groups of genes.

6.1 Functional coherence of a group of genes -- 6.2 Overview of computational approach -- 6.3 Strategy to evaluate different algorithms -- 6.4 Word distribution divergence -- 6.5 Best article score -- 6.6 Neighbor divergence -- 6.6.1 Calculating a theoretical distribution of scores -- 6.6.2 Quantifying the difference between the empirical score distribution and the theoretical one -- 6.7 Neighbor divergence per gene -- 6.8 Corruption studies -- 6.9 Application of functional coherence scoring to screen gene expression clusters -- 6.10 Understanding the gene group's function -- 7 Analyzing large gene expression data sets -- 7.1 Groups of genes -- 7.2 Assigning keywords -- 7.3 Screening gene expression clusters -- 7.4 Optimizing cluster boundaries: hierarchical clustering -- 7.5 Application to other organisms besides yeast -- 7.6 Identifying and optimizing clusters in a Drosophila development data set -- 8 Using text classification for gene function annotation -- 8.1 Functional vocabularies and gene annotation -- 8.1.1 Gene Ontology -- 8.1.2 Enzyme Commission -- 8.1.3 Kyoto Encyclopedia of Genes and Genomes -- 8.2 Text classification -- 8.3 Nearest neighbor classification -- 8.4 Naive Bayes classification -- 8.5 Maximum entropy classification -- 8.6 Feature selection: choosing the best words for classification -- 8.7 Classifying documents into functional categories -- 8.8 Comparing classifiers -- 8.9 Annotating genes -- 9 Finding gene names -- 9.1 Strategies to identify gene names -- 9.2 Recognizing gene names with a dictionary -- 9.3 Using word structure and appearance to identify gene names -- 9.4 Using syntax to eliminate gene name candidates -- 9.5 Using context as a clue about gene names -- 9.6 Morphology -- 9.7 Identifying gene names and their abbreviations -- 9.8 A single unified gene name finding algorithm -- 10 Protein interaction networks.

10.1 Genetic networks -- 10.2 Experimental assays to identify protein networks -- 10.2.1 Yeast two hybrid -- 10.2.2 Affinity precipitation -- 10.3 Predicting interactions versus verifying interactions with scientific text -- 10.4 Networks of co-occurring genes -- 10.5 Protein interactions and gene name co-occurrence in text -- 10.6 Number of textual co-occurrences predicts likelihood of an experimentally predicted interaction -- 10.7 Information extraction and genetic networks: increasing specificity and identifying interaction type -- 10.8 Statistical machine learning -- 11 Conclusion -- Index.

Abstract:

This book brings together the two disparate worlds of computational text analysis and biology and presents some of the latest methods and applications to proteomics, sequence analysis and gene expression data. Modern genomics generates large and comprehensive data sets, but their interpretation requires an understanding of a vast number of genes, their complex functions, and their interactions. Keeping up with the literature on a single gene is a challenge in itself; for thousands of genes it is simply impossible. Here, Soumya Raychaudhuri presents the techniques and algorithms needed to access and utilize the vast scientific text, i.e. methods that automatically "read" the literature on all the genes. Including background chapters on the necessary biology, statistics and genomics, in addition to practical examples of interpreting many different types of modern experiments, this book is ideal for students and researchers in computational biology, bioinformatics, genomics, statistics and computer science.

Local Note:

Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2017. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.

Subject Term:

Bioinformatics.

Computational biology.

Electronic books. -- local.

Genomics -- Data processing.

Genre:

Electronic books.
