Practical Bioinformatician.
Title:
Practical Bioinformatician.
Author:
Wong, Limsoon.
ISBN:
9789812562340
Physical Description:
1 online resource (539 pages)
Contents:
DEDICATION -- PREFACE -- CONTENTS -- MOLECULAR BIOLOGY FOR THE PRACTICAL BIOINFORMATICIAN -- 1. Introduction -- 2. Our Molecular Selves -- 2.1. Cells, DNAs, and Chromosomes -- 2.2. Genes and Genetic Variations -- 2.2.1. Mutations and Genetic Diseases -- 3. Our Biological Machineries -- 3.1. From Gene to Protein: The Central Dogma -- 3.2. The Genetic Code -- 3.3. Gene Regulation and Expression -- 4. Tools of the Trade -- 4.1. Basic Operations -- 4.1.1. Cutting DNA -- 4.1.2. Copying DNA -- 4.1.3. Separating DNA -- 4.1.4. Matching DNA -- 4.2. Putting It Together -- 4.2.1. Genome Sequencing -- 4.2.2. Gene Expression Profiling -- 5. Conclusion -- STRATEGY AND PLANNING OF BIOINFORMATICS EXPERIMENTS -- 1. Multi-Dimensional Bioinformatics -- 2. Reasoning and Strategy -- 3. Planning -- DATA MINING TECHNIQUES FOR THE PRACTICAL BIOINFORMATICIAN -- 1. Feature Selection Methods -- 1.1. Curse of Dimensionality -- 1.2. Signal-to-Noise Measure -- 1.3. T-Test Statistical Measure -- 1.4. Fisher Criterion Score -- 1.5. Entropy Measure -- 1.6. χ² Measure -- 1.7. Information Gain -- 1.8. Information Gain Ratio -- 1.9. Wilcoxon Rank Sum Test -- 1.10. Correlation-Based Feature Selection -- 1.11. Principal Component Analysis -- 1.12. Use of Feature Selection in Bioinformatics -- 2. Classification Methods -- 2.1. Decision Trees -- 2.2. Bayesian Methods -- 2.3. Hidden Markov Models -- 2.4. Artificial Neural Networks -- 2.5. Support Vector Machines -- 2.6. Prediction by Collective Likelihood of Emerging Patterns -- 3. Association Rules Mining Algorithms -- 3.1. Association Rules -- 3.2. The Apriori Algorithm -- 3.3. The Max-Miner Algorithm -- 3.4. Use of Association Rules in Bioinformatics -- 4. Clustering Methods -- 4.1. Factors Affecting Clustering -- 4.2. Partitioning Methods -- 4.3. Hierarchical Methods -- 4.4. Use of Clustering in Bioinformatics -- 5. Remarks.

TECHNIQUES FOR RECOGNITION OF TRANSLATION INITIATION SITES -- 1. Translation Initiation Sites -- 2. Data Set and Evaluation -- 3. Recognition by Perceptrons -- 4. Recognition by Artificial Neural Networks -- 5. Recognition by Engineering of Support Vector Machine Kernels -- 6. Recognition by Feature Generation, Feature Selection, and Feature Integration -- 6.1. Feature Generation -- 6.2. Feature Selection -- 6.3. Feature Integration for Decision Making -- 7. Improved Recognition by Feature Generation, Feature Selection, and Feature Integration -- 8. Recognition by Linear Discriminant Function -- 9. Recognition by Ribosome Scanning -- 10. Remarks -- HOW NEURAL NETWORKS FIND PROMOTERS USING RECOGNITION OF MICRO-STRUCTURAL PROMOTER COMPONENTS -- 1. Motivation and Background -- 1.1. Problem Framework -- 1.2. Promoter Recognition -- 1.3. ANN-Based Promoter Recognition -- 2. Characteristic Motifs of Eukaryotic Promoters -- 3. Motif-Based Search for Promoters -- 3.1. Evaluation Study by Fickett and Hatzigeorgiou -- 3.2. Enhancers May Contribute to False Recognition -- 4. ANNs and Promoter Components -- 4.1. Description of Promoter Recognition Problem -- 4.2. Representation of Nucleotides for Network Processing -- 5. Structural Decomposition -- 5.1. Parallel Composition of Feature Detectors -- 5.2. First- and Second-Level ANNs -- 5.3. Cascade Composition of Feature Detectors -- 5.4. Structures Based on Multilayer Perceptrons -- 6. Time-Delay Neural Networks -- 6.1. Multistate TDNN -- 6.2. Pruning ANN Connections -- 7. Comments on Performance of ANN-Based Programs for Eukaryotic Promoter Prediction -- NEURAL-STATISTICAL MODEL OF TATA-BOX MOTIFS IN EUKARYOTES -- 1. Promoter Recognition via Recognition of TATA-Box -- 2. Position Weight Matrix and Statistical Analysis of the TATA-Box and Its Neighborhood.

2.1. TATA Motifs as One of the Targets in the Search for Eukaryotic Promoters -- 2.2. Data Sources and Data Sets -- 2.3. Recognition Quality -- 2.4. Statistical Analysis of TATA Motifs -- 2.4.1. PWM -- 2.4.2. Numerical Characterization of Segments Around TATA-Box -- 2.4.3. Position Analysis of TATA Motifs -- 2.4.4. Characteristics of Segments S1, S2, and S3 -- 2.5. Concluding Remarks -- 3. LVQ ANN for TATA-Box Recognition -- 3.1. Data Preprocessing: Phase 1 -- 3.2. Data Preprocessing: Phase 2 - Principal Component Analysis -- 3.3. Learning Vector Quantization ANN -- 3.4. The Structure of an LVQ Classifier -- 3.5. LVQ ANN Training -- 3.6. Initial Values for Network Parameters -- 3.6.1. Genetic Algorithm -- 3.6.2. Searching for Good Initial Weights -- 4. Final Model of TATA-Box Motif -- 4.1. Structure of Final System for TATA-Box Recognition -- 4.2. Training of the ANN Part of the Model -- 4.3. Performance of Complete System -- 4.3.1. Comparison of MLVQ and System with Single LVQ ANN -- 4.4. The Final Test -- 5. Summary -- TUNING THE DRAGON PROMOTER FINDER SYSTEM FOR HUMAN PROMOTER RECOGNITION -- 1. Promoter Recognition -- 2. Dragon Promoter Finder -- 3. Model -- 4. Tuning Data -- 4.1. Promoter Data -- 4.2. Non-Promoter Data -- 5. Tuning Process -- 6. Discussions and Conclusions -- RNA SECONDARY STRUCTURE PREDICTION -- 1. Introduction to RNA Secondary Structures -- 2. RNA Secondary Structure Determination Experiments -- 3. RNA Structure Prediction Based on Sequence -- 4. Structure Prediction in the Absence of Pseudoknot -- 4.1. Loop Energy -- 4.2. First RNA Secondary Structure Prediction Algorithm -- 4.3. Speeding up Multi-Loops -- 4.4. Speeding Up Internal Loops -- 5. Structure Prediction in the Presence of Pseudoknots -- 6. Akutsu's Algorithm -- 6.1. Definition of Simple Pseudoknot -- 6.2. RNA Secondary Structure Prediction with Simple Pseudoknots.

7. Approximation Algorithm for Predicting Secondary Structure with General Pseudoknots -- PROTEIN SUBCELLULAR LOCALIZATION PREDICTION -- 1. Motivation -- 2. Biology of Localization -- 3. Experimental Techniques for Determining Localization Sites -- 3.1. Traditional Methods -- 3.1.1. Immunofluorescence Microscopy -- 3.1.2. Gradient Centrifugation -- 3.2. Large-Scale Experiments -- 3.2.1. Immunofluorescent Microscopy -- 3.2.2. Green Fluorescent Protein -- 3.2.3. Comments on Large-Scale Experiments -- 4. Issues and Complications -- 4.1. How Many Sites are There and What are They? -- 4.2. Is One Site Per Protein an Adequate Model? -- 4.3. How Good are the Predictions? -- 4.3.1. Which Method Should I Use? -- 4.4. A Caveat Regarding Estimated Prediction Accuracies -- 4.5. Correlation and Causality -- 5. Localization and Machine Learning -- 5.1. Standard Classifiers Using (Generalized) Amino Acid Content -- 5.2. Localization Process Modeling Approach -- 5.3. Sequence Based Machine Learning Approaches with Architectures Designed to Reflect Localization Signals -- 5.4. Nearest Neighbor Algorithms -- 5.5. Feature Discovery -- 5.6. Extraction of Localization Information from the Literature and Experimental Data -- 6. Conclusion -- HOMOLOGY SEARCH METHODS -- 1. Overview -- 2. Edit Distance and Alignments -- 2.1. Edit Distance -- 2.2. Optimal Alignments -- 2.3. More Complicated Objectives -- 2.3.1. Score Matrices -- 2.3.2. Gap Penalties -- 3. Sequence Alignment: Dynamic Programming -- 3.1. Dynamic Programming Algorithm for Sequence Alignment -- 3.1.1. Reducing Memory Needs -- 3.2. Local Alignment -- 3.3. Sequence Alignment with Gap Open Penalty -- 4. Probabilistic Approaches to Sequence Alignment -- 4.1. Scoring Matrices -- 4.1.1. PAM Matrices -- 4.1.2. BLOSUM Matrices -- 4.1.3. Weaknesses of this Approach -- 4.2. Probabilistic Alignment Significance.

5. Second Generation Homology Search: Heuristics -- 5.1. FASTA and BLAST -- 5.2. Large-Scale Global Alignment -- 6. Next-Generation Homology Search Software -- 6.1. Improved Alignment Seeds -- 6.2. Optimized Spaced Seeds and Why They Are Better -- 6.3. Computing Optimal Spaced Seeds -- 6.4. Computing More Realistic Spaced Seeds -- 6.4.1. Optimal Seeds for Coding Regions -- 6.4.2. Optimal Seeds for Variable-Length Regions -- 6.5. Approaching Smith-Waterman Sensitivity Using Multiple Seed Models -- 6.6. Complexity of Computing Spaced Seeds -- 7. Experiments -- Acknowledgments -- ANALYSIS OF PHYLOGENY: A CASE STUDY ON SAURURACEAE -- 1. The What, Why, and How of Phylogeny -- 1.1. What is Phylogeny? -- 1.2. Why Study Phylogeny? -- 1.3. How to Study Phylogeny -- 2. Case Study on Phylogeny of Saururaceae -- 3. Materials and Methods -- 3.1. Plant Materials -- 3.2. DNA Extraction, PCR, and Sequencing -- 3.3. Alignment of Sequences -- 3.4. Parsimony Analysis of Separate DNA Sequences -- 3.5. Parsimony Analysis of Combined DNA Sequences -- 3.6. Parsimony Analysis of Morphological Data -- 3.7. Analysis of Each Morphological Characters -- 4. Results -- 4.1. Phylogeny of Saururaceae from 18S Nuclear Genes -- 4.2. Phylogeny of Saururaceae from trnL-F Chloroplast DNA Sequences -- 4.3. Phylogeny of Saururaceae from matR Mitochondrial Genes -- 4.4. Phylogeny of Saururaceae from Combined DNA Sequences -- 4.5. Phylogeny of Saururaceae from Morphological Data -- 5. Discussion -- 5.1. Phylogeny of Saururaceae -- 5.2. The Differences Among Topologies from 18S, trnL-F, and matR -- 5.3. Analysis of Important Morphological Characters -- 6. Suggestions -- 6.1. Sampling -- 6.2. Selecting Out-Group -- 6.3. Gaining Sequences -- 6.4. Aligning -- 6.5. Analyzing -- 6.6. Dealing with Morphological Data -- 6.7. Comparing Phylogenies Separately from Molecular Data and Morphological Data.

6.8. Doing Experiments.
Abstract:
Computer scientists have increasingly been enlisted as "bioinformaticians" to assist molecular biologists in their research. This book is a practical introduction to bioinformatics for these computer scientists. The chapters are in-depth discussions by expert bioinformaticians on both general techniques and specific approaches to a range of selected bioinformatics problems. The book is organized into clusters of chapters on the following topics: an overview of modern molecular biology and a broad spectrum of techniques from computer science (data mining, machine learning, mathematical modeling, sequence alignment, data integration, workflow development, etc.); in-depth discussion of computational recognition of functional and regulatory sites in DNA sequences; incisive discussion of computational prediction of secondary structure of RNA sequences; an overview of computational prediction of protein cellular localization, and selected discussions of inference of protein function; an overview of methods for discovering protein-protein interactions; detailed discussion of approaches to gene expression analysis for the diagnosis of diseases, the treatment of diseases, and the understanding of gene functions; and case studies on analysis of phylogenies, functional annotation of proteins, construction of purpose-built integrated biological databases, and development of workflows underlying large-scale gene discovery efforts. Sample Chapter(s). Chapter 4: Techniques for Recognition of Translation Initiation Sites (385 KB). Chapter 10: Homology Search Methods (483 KB). Contents: Molecular Biology for the Practical Bioinformatician; Strategy and Planning of Bioinformatics Experiments; Data Mining Techniques for the Practical Bioinformatician; Techniques for Recognition of Translation Initiation Sites; How Neural Networks Find Promoters Using Recognition of Micro-Structural Promoter Components; Neural-Statistical Model of TATA-Box Motifs in Eukaryotes; Tuning the Dragon Promoter Finder System for Human Promoter Recognition; RNA Secondary Structure Prediction; Protein Localization Prediction; Homology Search Methods; Analysis of Phylogeny: A Case Study on Saururaceae; Functional Annotation and Protein Families: From Theory to Practice; Discovering Protein-Protein Interactions; Techniques for Analysis of Gene Expression; Genome-Wide cDNA Oligo Probe Design and Its Applications in Schizosaccharomyces pombe; Mining New Motifs from cDNA Sequence Data; Technologies for Biological Data Integration; Construction of Biological Databases: A Case Study on the Protein Phosphatase DataBase (PPDB); A Family Classification Approach to Functional Annotation of Proteins; Informatics for Efficient EST-Based Gene Discovery in Normalized and Subtracted cDNA Libraries. Readership: Computer scientists planning to become bioinformaticians; computer science undergraduates in their sophomore and/or senior years.
Local Note:
Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2017. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.