Title:

Practical Data Analysis.

Author:

Cuesta, Hector.

ISBN:

9781783281008

Personal Author:

Cuesta, Hector.

Physical Description:

1 online resource (360 pages)

Contents:

Cover -- Copyright -- Credits -- Foreword -- About the Author -- Acknowledgments -- About the Reviewers -- www.PacktPub.com -- Table of Contents -- Preface -- Chapter 1: Getting Started -- Computer science -- Artificial intelligence (AI) -- Machine Learning (ML) -- Statistics -- Mathematics -- Knowledge domain -- Data, information, and knowledge -- The nature of data -- The data analysis process -- The problem -- Data preparation -- Data exploration -- Predictive modeling -- Visualization of results -- Quantitative versus qualitative data analysis -- Importance of data visualization -- What about big data? -- Sensors and cameras -- Social networks analysis -- Tools and toys for this book -- Why Python? -- Why mlpy? -- Why D3.js? -- Why MongoDB? -- Summary -- Chapter 2: Working with Data -- Data sources -- Open data -- Text files -- Excel files -- SQL databases -- NoSQL databases -- Multimedia -- Web scraping -- Data scrubbing -- Statistical methods -- Text parsing -- Data transformation -- Data formats -- CSV -- Parsing a CSV file with the csv module -- Parsing a CSV file using NumPy -- JSON -- Parsing a JSON file using json module -- XML -- Parsing an XML file in Python using xml module -- YAML -- Getting started with OpenRefine -- Text facet -- Clustering -- Text filters -- Numeric facets -- Transforming data -- Exporting data -- Operation history -- Summary -- Chapter 3: Data Visualization -- Data-Driven Documents (D3) -- HTML -- DOM -- CSS -- JavaScript -- SVG -- Getting started with D3.js -- Bar chart -- Pie chart -- Scatter plot -- Single line chart -- Multi-line chart -- Interaction and animation -- Summary -- Chapter 4: Text Classification -- Learning and classification -- Bayesian classification -- Naïve Bayes algorithm -- E-mail subject line tester -- The algorithm -- Classifier accuracy -- Summary.

Chapter 5: Similarity-based Image Retrieval -- Image similarity search -- Dynamic time warping (DTW) -- Processing the image dataset -- Implementing DTW -- Analyzing the results -- Summary -- Chapter 6: Simulation of Stock Prices -- Financial time series -- Random walk simulation -- Monte Carlo methods -- Generating random numbers -- Implementation in D3.js -- Summary -- Chapter 7: Predicting Gold Prices -- Working with the time series data -- Components of a time series -- Smoothing the time series -- The data - historical gold prices -- Nonlinear regression -- Kernel ridge regression -- Smoothing the gold prices time series -- Predicting in the smoothed time series -- Contrasting the predicted value -- Summary -- Chapter 8: Working with Support Vector Machines -- Understanding the multivariate dataset -- Dimensionality reduction -- Linear Discriminant Analysis -- Principal Component Analysis -- Getting started with support vector machine -- Kernel functions -- Double spiral problem -- SVM implemented on mlpy -- Summary -- Chapter 9: Modeling Infectious Disease with Cellular Automata -- Introduction to epidemiology -- The epidemiology triangle -- The epidemic models -- The SIR model -- Solving ordinary differential equation for the SIR model with SciPy -- The SIRS model -- Modelling with cellular automata -- Cell, state, grid, and neighborhood -- Global stochastic contact model -- Simulation of the SIRS model in CA with D3.js -- Summary -- Chapter 10: Working with Social Graphs -- Structure of a graph -- Undirected graph -- Directed graph -- Social Networks Analysis -- Acquiring my Facebook graph -- Using Netvizz -- Representing graphs with Gephi -- Statistical analysis -- Male to female ratio -- Degree distribution -- Histogram of a graph -- Centrality -- Transforming GDF to JSON -- Graph visualization with D3.js -- Summary.

Chapter 11: Sentiment Analysis of Twitter Data -- The anatomy of Twitter data -- Tweet -- Followers -- Trending topics -- Using OAuth to access Twitter API -- Getting started with Twython -- Simple search -- Working with timelines -- Working with followers -- Working with places and trends -- Sentiment classification -- Affective Norms for English Words -- Text corpus -- Getting started with Natural Language Toolkit (NLTK) -- Bag of words -- Naive Bayes -- Sentiment analysis of Tweets -- Summary -- Chapter 12: Data Processing and Aggregation with MongoDB -- Getting started with MongoDB -- Database -- Collection -- Document -- Mongo shell -- Insert/Update/Delete -- Queries -- Data preparation -- Data transformation with OpenRefine -- Inserting documents with PyMongo -- Group -- The aggregation framework -- Pipelines -- Expressions -- Summary -- Chapter 13: Working with MapReduce -- MapReduce overview -- Programming model -- Using MapReduce with MongoDB -- The map function -- The reduce function -- Using mongo shell -- Using UMongo -- Using PyMongo -- Filtering the input collection -- Grouping and aggregation -- Word cloud visualization of the most common positive words in tweets -- Summary -- Chapter 14: Online Data Analysis with IPython and Wakari -- Getting started with Wakari -- Creating an account in Wakari -- Getting started with IPython Notebook -- Data visualization -- Introduction to image processing with PIL -- Opening an image -- Image histogram -- Filtering -- Operations -- Transformations -- Getting started with Pandas -- Working with time series -- Working with multivariate dataset with DataFrame -- Grouping, aggregation, and correlation -- Multiprocessing with IPython -- Pool -- Sharing your Notebook -- The data -- Summary -- Appendix: Setting Up the Infrastructure -- Installing and running Python 3.

Installing and running Python 3.2 on Ubuntu -- Installing and running IDLE on Ubuntu -- Installing and running Python 3.2 on Windows -- Installing and running IDLE on Windows -- Installing and running NumPy -- Installing and running NumPy on Ubuntu -- Installing and running NumPy on Windows -- Installing and running SciPy -- Installing and running SciPy on Ubuntu -- Installing and running SciPy on Windows -- Installing and running mlpy -- Installing and running mlpy on Ubuntu -- Installing and running mlpy on Windows -- Installing and running OpenRefine -- Installing and running OpenRefine on Linux -- Installing and running OpenRefine on Windows -- Installing and running MongoDB -- Installing and running MongoDB on Ubuntu -- Installing and running MongoDB on Windows -- Connecting Python with MongoDB -- Installing and running UMongo -- Installing and running Umongo on Ubuntu -- Installing and running Umongo on Windows -- Installing and running Gephi -- Installing and running Gephi on Linux -- Installing and running Gephi on Windows -- Index.

Abstract:

Each chapter of the book quickly introduces a key 'theme' of Data Analysis, before immersing you in the practical aspects of each theme. You'll learn quickly how to perform all aspects of Data Analysis. Practical Data Analysis is a book ideal for home and small business users who want to slice & dice the data they have on hand with minimum hassle.

Local Note:

Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2017. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.

Subject Term:

Database management.

Electronic data processing.

Genre:

Electronic books.
