Title:

Advances in Genomic Sequence Analysis and Pattern Discovery.

Author:

Elnitski, Laura.

ISBN:

9789814327732

Personal Author:

Elnitski, Laura.

Physical Description:

1 online resource (236 pages)

Series:

Science, Engineering, and Biology Informatics

Contents:

Contents -- Preface -- About the Editors -- Part I: Pattern Discovery Methods -- Chapter 1: Large-Scale Gene Regulatory Motif Discovery with NestedMICA Matias Piipari, Thomas A. Down and Tim J. P. Hubbard -- 1. Introduction -- 1.1. Assessment of motif inference tools -- 1.2. What is a motif? -- 1.3. Motif inference with additional supporting data -- 1.4. The NestedMICA algorithm -- 1.5. Nested sampling -- 1.6. Mosaic sequence background model -- 2. Results -- 2.1. Choice of sequence regions for motif inference -- 2.2. Finding significant matches -- 2.3. Comparison of NestedMICA Drosophila motifs against reference motifs -- 2.4. Sequence conservation analysis of motif matches -- 2.5. Positional bias analysis - finding motifs close to transcription start sites -- 2.6. Association of motifs with tissue-specific gene expression pattern -- 3. Motif Inference Tutorial -- 3.1. Sequence retrieval and preprocessing -- 3.2. Background model estimation -- 3.3. Motif inference -- 3.4. Visualizing NestedMICA motifs as sequence logos -- 3.5. Motif overrepresentation analysis -- 3.6. Comparison of sequence motifs with a reference motif set -- 4. Conclusions -- References -- Chapter 2: R'MES: A Tool to Find Motifs with a Significantly Unexpected Frequency in Biological Sequences Sophie Schbath and Mark Hoebeke -- 1. Introduction -- 2. User Guide -- 2.1. Getting exceptional frequency scores for words -- 2.2. Getting exceptional frequency scores for word families -- 2.3. Analyzing coding DNA sequences -- 2.4. Getting exceptional skew scores -- 2.5. Utilities -- 2.6. Graphical user interface -- 2.7. Implementation details -- 2.7.1. Main data structures and algorithms -- 2.7.2. Space and time complexity -- 2.7.3. Computation time and memory requirements -- 3. Discussion -- 4. Conclusion -- 5. Methods -- 5.1. Markov chain models -- 5.2. Estimated expected counts.

5.3. Gaussian approximation -- 5.4. Clumping occurrences -- 5.5. Compound Poisson approximation -- References -- Chapter 3: An Intricate Mosaic of Genomic Patterns at Mid-range Scale Alexei Fedorov and Larisa Fedorova -- 1. Introduction -- 2. Results and Discussion -- 2.1. DNA repeats - important elements at genomic mid-range scale -- 2.2. Genomic Mid-Range Inhomogeneity (MRI): Nucleotide compositional extremes and sequence nonrandomness -- 2.2.1. Genomic MRI toolkit -- 2.2.2. (G+C)-rich and (A+T)-rich MRI regions are associated with several unusual DNA structures -- 2.2.3. R-rich/Y-rich MRI regions are associated with H-DNA triplex -- 2.2.4. DNA and RNA properties of GT-rich/AC-rich MRI regions -- 2.2.5. Alternated R/Y MRI regions adopt Z-DNA conformation -- 2.3. Weak periodicities and loose patterns -- 2.3.1. Chromatin periodicities -- 2.3.2. Periodicities in protein-coding sequences -- 2.3.3. Transcription-associated mutational asymmetry in mammals -- 2.4. A complex mosaic of MRI patterns and their fundamental importance -- 2.4.1. Intricate arrangement of genomic MRI patterns -- 2.4.2. The purpose of MRI regions -- 3. Conclusions -- Acknowledgment -- References -- Chapter 4: Motif Finding from Chips to ChIPs Giulio Pavesi -- 1. Introduction -- 2. Profile-Based Methods - The Basics -- 3. Profile-Based Methods - Modeling the Background -- 4. Consensus-Based Methods - The Basics -- 5. Other Methods -- 6. Does Motif Finding Work? -- 7. Chips vs ChIPs -- 8. Conclusions -- References -- Chapter 5: A New Approach to the Discovery of RNA Structural Elements in the Human Genome Lei Hua, Miguel Cervantes-Cervantes and Jason T. L. Wang -- 1. Introduction -- 2. Related Work -- 3. Methods -- 4. Results -- 5. Conclusion -- References -- Part II: Performance and Paradigms.

Chapter 6: Benchmarking of Methods for Motif Discovery in DNA Kjetil Klepper, Geir Kjetil Sandve, Morten Beck Rye, Kjersti Hysing Bolstad and Finn Drabløs -- 1. Introduction -- 2. Score Functions -- 2.1. Scoring by known binding sites -- 2.2. The futility theorem -- 2.3. Alternative scoring -- 3. Benchmark Datasets -- 3.1. Criteria for good benchmark datasets -- 3.2. Substring-based datasets -- 3.2.1. Synthetic datasets -- 3.2.2. Single motifs -- 3.2.3. Regulatory modules -- 3.3. Genome-wide datasets -- 4. Benchmarking Without a Benchmark Dataset -- 5. Related Areas -- 6. Conclusion -- Abbreviations -- Acknowledgments -- References -- Chapter 7: Encyclopedias of DNA Elements for Plant Genomes Jens Lichtenberg, Alper Yilmaz, Kyle Kurz, Xiaoyu Liang, Chase Nelson, Thomas Bitterman, Eric Stockinger, Erich Grotewold and Lonnie R. Welch -- 1. Introduction -- 2. C-repeat Binding Factor Genes in Triticeae -- 3. Analysis of the Non-coding Segments in Arabidopsis thaliana -- 4. Enhancement of the Arabidopsis Gene Regulatory Information Server (AGRIS) -- 5. Methods -- 6. Conclusion -- Acknowledgments -- References -- Chapter 8: Manycore High-Performance Computing in Bioinformatics Jean-Stéphane Varré, Bertil Schmidt, Stéphane Janot and Mathieu Giraud -- 1. Introduction -- 1.1. A small history of processors -- 1.1.1. Moore's law -- 1.1.2. Frequencies and the "power wall" -- 1.1.3. Multicore processors -- 1.1.4. Data-parallelism and SIMD -- 1.2. Towards manycore processors -- 1.2.1. GPU processors -- 1.2.2. The CPU/GPU convergence -- 1.2.3. General purpose computation on GPU -- 2. Methods -- 2.1. From GPU tweaks to OpenCL -- 2.2. Programming SIMD work-items -- 2.3. Branches and divergence -- 2.4. Work-groups -- 3. Results -- 3.1. Smith-Waterman sequence alignments -- 3.1.1. Smith-Waterman algorithm -- 3.1.2. Mapping onto SIMD registers.

3.1.3. Implementation on Cell/BE -- 3.1.4. Intra-task or inter-task parallelization on GPUs -- 3.1.5. Memory optimization on GPUs -- 3.1.6. Performance comparison -- 3.2. Algorithms on sequence data -- 3.2.1. RNA folding -- 3.2.2. Generic dynamic programming -- 3.2.3. Position Weight Matrices algorithms -- 3.3. Other applications -- 3.3.1. Indexing structures -- 3.3.2. Phylogeny -- 3.3.3. Multiple sequence alignment -- 3.3.4. Motif finding -- 3.3.5. Hidden Markov Models profiles -- 3.3.6. Cell molecules simulation -- 4. Discussion -- 4.1. Challenges in parallel algorithmics -- 4.2. Challenges for bioinformatics analysis -- 5. Conclusion -- References -- Chapter 9: Natural Selection and the Genome Austin L. Hughes -- 1. Introduction -- 2. The Molecular Revolution -- 3. The Neutral Theory -- 4. Positive Selection: The MHC Case -- 5. Codon-Based Methods -- 6. The McDonald-Kreitman Test -- 7. Single Nucleotide Polymorphisms -- 8. Conclusions: The Importance of Purifying Selection -- References -- Index.

Abstract:

Mapping genomic landscapes is one of the most exciting frontiers of science. We have the opportunity to reverse engineer the blueprints and the control systems of living organisms. Computational tools are key enablers in the deciphering process. This book provides an in-depth presentation of some of the important computational biology approaches to genomic sequence analysis. The first section of the book discusses methods for discovering patterns in DNA and RNA. The second section reflects on these methods from several angles, including performance, usage, and paradigms.

Local Note:

Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2017. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.

Subject Term:

Gene mapping -- Data processing.

Gene mapping -- Methodology.
