Principles of Big Data : Preparing, Sharing, and Analyzing Complex Information.
Title:
Principles of Big Data : Preparing, Sharing, and Analyzing Complex Information.
Author:
Berman, Jules J.
ISBN:
9780124047242
Physical Description:
1 online resource (288 pages)
Contents:
Front Cover -- Principles of Big Data: Preparing, Sharing, and Analyzing Complex Information -- Copyright -- Dedication -- Contents -- Acknowledgments -- Author Biography -- Preface -- Introduction -- Definition of Big Data -- Big Data Versus Small Data -- Whence Comest Big Data? -- The Most Common Purpose of Big Data is to Produce Small Data -- Opportunities -- Big Data Moves to the Center of the Information Universe -- Chapter 1: Providing Structure to Unstructured Data -- Background -- Machine Translation -- Autocoding -- Indexing -- Term Extraction -- Chapter 2: Identification, Deidentification, and Reidentification -- Background -- Features of an Identifier System -- Registered Unique Object Identifiers -- Really Bad Identifier Methods -- Embedding Information in an Identifier: Not Recommended -- One-Way Hashes -- Use Case: Hospital Registration -- Deidentification -- Data Scrubbing -- Reidentification -- Lessons Learned -- Chapter 3: Ontologies and Semantics -- Background -- Classifications, the Simplest of Ontologies -- Ontologies, Classes with Multiple Parents -- Choosing a Class Model -- Introduction to Resource Description Framework Schema -- Common Pitfalls in Ontology Development -- Chapter 4: Introspection -- Background -- Knowledge of Self -- eXtensible Markup Language -- Introduction to Meaning -- Namespaces and the Aggregation of Meaningful Assertions -- Resource Description Framework Triples -- Reflection -- Use Case: Trusted Time Stamp -- Summary -- Chapter 5: Data Integration and Software Interoperability -- Background -- The Committee to Survey Standards -- Standard Trajectory -- Specifications and Standards -- Versioning -- Compliance Issues -- Interfaces to Big Data Resources -- Chapter 6: Immutability and Immortality -- Background -- Immutability and Identifiers -- Data Objects -- Legacy Data -- Data Born from Data.

Reconciling Identifiers across Institutions -- Zero-Knowledge Reconciliation -- The Curator's Burden -- Chapter 7: Measurement -- Background -- Counting -- Gene Counting -- Dealing with Negations -- Understanding Your Control -- Practical Significance of Measurements -- Obsessive-Compulsive Disorder: The Mark of a Great Data Manager -- Chapter 8: Simple but Powerful Big Data Techniques -- Background -- Look At the Data -- Data Range -- Denominator -- Frequency Distributions -- Mean and Standard Deviation -- Estimation-Only Analyses -- Use Case: Watching Data Trends with Google Ngrams -- Use Case: Estimating Movie Preferences -- Chapter 9: Analysis -- Background -- Analytic Tasks -- Clustering, Classifying, Recommending, and Modeling -- Clustering Algorithms -- Classifier Algorithms -- Recommender Algorithms -- Modeling Algorithms -- Data Reduction -- Normalizing and Adjusting Data -- Big Data Software: Speed and Scalability -- Find Relationships, Not Similarities -- Chapter 10: Special Considerations in Big Data Analysis -- Background -- Theory in Search of Data -- Data in Search of a Theory -- Overfitting -- Bigness Bias -- Too Much Data -- Fixing Data -- Data Subsets in Big Data: Neither Additive nor Transitive -- Additional Big Data Pitfalls -- Chapter 11: Stepwise Approach to Big Data Analysis -- Background -- Step 1. A Question Is Formulated -- Step 2. Resource Evaluation -- Step 3. A Question Is Reformulated -- Step 4. Query Output Adequacy -- Step 5. Data Description -- Step 6. Data Reduction -- Step 7. Algorithms Are Selected, if Absolutely Necessary -- Step 8. Results Are Reviewed and Conclusions Are Asserted -- Step 9. Conclusions Are Examined and Subjected to Validation -- Chapter 12: Failure -- Background -- Failure Is Common -- Failed Standards -- Complexity -- When Does Complexity Help? -- When Redundancy Fails -- Save Money.

Don't Protect Harmless Information -- After Failure -- Use Case: Cancer Biomedical Informatics Grid, a Bridge too Far -- Chapter 13: Legalities -- Background -- Responsibility for the Accuracy and Legitimacy of Contained Data -- Rights to Create, Use, and Share the Resource -- Copyright and Patent Infringements Incurred by Using Standards -- Protections for Individuals -- Consent -- Unconsented Data -- Good Policies Are a Good Policy -- Use Case: The Havasupai Story -- Chapter 14: Societal Issues -- Background -- How Big Data Is Perceived -- The Necessity of Data Sharing, Even When It Seems Irrelevant -- Reducing Costs and Increasing Productivity with Big Data -- Public Mistrust -- Saving Us from Ourselves -- Hubris and Hyperbole -- Chapter 15: The Future -- Background -- Will Big Data, Being Computationally Complex, Require a New Generation of Supercomputers? -- Will Big Data Achieve a Level of Complexity That Exceeds Our Ability to Fully Understand or Trust? -- Will We Need Armies of Computer Scientists Trained with the Most Advanced Techniques in Supercomputing? -- Will Big Data Create New Categories of Data Professionals for Which There Are Currently No Training Programs? -- Will Standardized Methods for Data Representation Be Uniformly Adopted, Thus Supporting Data Integration and Software Inter ... -- Will Big Data Be Accessible to the Public? -- Will Big Data Do More Harm Than Good? -- Can We Expect That Big Data Catastrophes Will Disrupt Vital Services, Cripple National Economies, and Destabilize World Pol ... -- Will Big Data Provide Answers to Important Questions That Could Not Otherwise Be Solved? -- Last Words -- Glossary -- References -- Index.
Abstract:
Principles of Big Data helps readers avoid the common mistakes that endanger all Big Data projects. By stressing simple, fundamental concepts, this book teaches readers how to organize large volumes of complex data, and how to achieve data permanence when the content of the data is constantly changing. General methods for data verification and validation, as specifically applied to Big Data resources, are stressed throughout the book. The book demonstrates how adept analysts can find relationships among data objects held in disparate Big Data resources, when the data objects are endowed with semantic support (i.e., organized in classes of uniquely identified data objects). Readers will learn how their data can be integrated with data from other resources, and how the data extracted from Big Data resources can be used for purposes beyond those imagined by the data creators. Learn general methods for specifying Big Data in a way that is understandable to humans and to computers. Avoid the pitfalls in Big Data design and analysis. Understand how to create and use Big Data safely and responsibly with a set of laws, regulations and ethical standards that apply to the acquisition, distribution and integration of Big Data resources.
Local Note:
Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2017. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.