Data Smart : Using Data Science to Transform Information into Insight.
Title:
Data Smart : Using Data Science to Transform Information into Insight.
Author:
Foreman, John.
ISBN:
9781118839867
Edition:
1st ed.
Physical Description:
1 online resource (429 pages)
Contents:
Cover -- Title Page -- Copyright -- Contents -- Chapter 1 Everything You Ever Needed to Know about Spreadsheets but Were Too Afraid to Ask -- Some Sample Data -- Moving Quickly with the Control Button -- Copying Formulas and Data Quickly -- Formatting Cells -- Paste Special Values -- Inserting Charts -- Locating the Find and Replace Menus -- Formulas for Locating and Pulling Values -- Using VLOOKUP to Merge Data -- Filtering and Sorting -- Using PivotTables -- Using Array Formulas -- Solving Stuff with Solver -- OpenSolver: I Wish We Didn't Need This, but We Do -- Wrapping Up -- Chapter 2 Cluster Analysis Part I: Using K-Means to Segment Your Customer Base -- Girls Dance with Girls, Boys Scratch Their Elbows -- Getting Real: K-Means Clustering Subscribers in E-mail Marketing -- Joey Bag O' Donuts Wholesale Wine Emporium -- The Initial Dataset -- Determining What to Measure -- Start with Four Clusters -- Euclidean Distance: Measuring Distances as the Crow Flies -- Distances and Cluster Assignments for Everybody! -- Solving for the Cluster Centers -- Making Sense of the Results -- Getting the Top Deals by Cluster -- The Silhouette: A Good Way to Let Different K Values Duke It Out -- How about Five Clusters? -- Solving for Five Clusters -- Getting the Top Deals for All Five Clusters -- Computing the Silhouette for 5-Means Clustering -- K-Medians Clustering and Asymmetric Distance Measurements -- Using K-Medians Clustering -- Getting a More Appropriate Distance Metric -- Putting It All in Excel -- The Top Deals for the 5-Medians Clusters -- Wrapping Up -- Chapter 3 Naïve Bayes and the Incredible Lightness of Being an Idiot -- When You Name a Product Mandrill, You're Going to Get Some Signal and Some Noise -- The World's Fastest Intro to Probability Theory -- Totaling Conditional Probabilities.

Joint Probability, the Chain Rule, and Independence -- What Happens in a Dependent Situation? -- Bayes Rule -- Using Bayes Rule to Create an AI Model -- High-Level Class Probabilities Are Often Assumed to Be Equal -- A Couple More Odds and Ends -- Let's Get This Excel Party Started -- Removing Extraneous Punctuation -- Splitting on Spaces -- Counting Tokens and Calculating Probabilities -- And We Have a Model! Let's Use It -- Wrapping Up -- Chapter 4 Optimization Modeling: Because That "Fresh Squeezed" Orange Juice Ain't Gonna Blend Itself -- Why Should Data Scientists Know Optimization? -- Starting with a Simple Trade-Off -- Representing the Problem as a Polytope -- Solving by Sliding the Level Set -- The Simplex Method: Rooting around the Corners -- Working in Excel -- There's a Monster at the End of This Chapter -- Fresh from the Grove to Your Glass...with a Pit Stop Through a Blending Model -- You Use a Blending Model -- Let's Start with Some Specs -- Coming Back to Consistency -- Putting the Data into Excel -- Setting Up the Problem in Solver -- Lowering Your Standards -- Dead Squirrel Removal: The Minimax Formulation -- If-Then and the "Big M" Constraint -- Multiplying Variables: Cranking Up the Volume to 11 -- Modeling Risk -- Normally Distributed Data -- Wrapping Up -- Chapter 5 Cluster Analysis Part II: Network Graphs and Community Detection -- What Is a Network Graph? -- Visualizing a Simple Graph -- Brief Introduction to Gephi -- Gephi Installation and File Preparation -- Laying Out the Graph -- Node Degree -- Pretty Printing -- Touching the Graph Data -- Building a Graph from the Wholesale Wine Data -- Creating a Cosine Similarity Matrix -- Producing an r-Neighborhood Graph -- How Much Is an Edge Worth? Points and Penalties in Graph Modularity -- What's a Point and What's a Penalty?.

Setting Up the Score Sheet -- Let's Get Clustering! -- Split Number 1 -- Split 2: Electric Boogaloo -- And…Split 3: Split with a Vengeance -- Encoding and Analyzing the Communities -- There and Back Again: A Gephi Tale -- Wrapping Up -- Chapter 6 The Granddaddy of Supervised Artificial Intelligence-Regression -- Wait, What? You're Pregnant? -- Don't Kid Yourself -- Predicting Pregnant Customers at RetailMart Using Linear Regression -- The Feature Set -- Assembling the Training Data -- Creating Dummy Variables -- Let's Bake Our Own Linear Regression -- Linear Regression Statistics: R-Squared, F Tests, t Tests -- Making Predictions on Some New Data and Measuring Performance -- Predicting Pregnant Customers at RetailMart Using Logistic Regression -- First You Need a Link Function -- Hooking Up the Logistic Function and Reoptimizing -- Baking an Actual Logistic Regression -- Model Selection-Comparing the Performance of the Linear and Logistic Regressions -- For More Information -- Wrapping Up -- Chapter 7 Ensemble Models: A Whole Lot of Bad Pizza -- Using the Data from Chapter 6 -- Bagging: Randomize, Train, Repeat -- Decision Stump Is an Unsexy Term for a Stupid Predictor -- Doesn't Seem So Stupid to Me! -- You Need More Power! -- Let's Train It -- Evaluating the Bagged Model -- Boosting: If You Get It Wrong, Just Boost and Try Again -- Training the Model-Every Feature Gets a Shot -- Evaluating the Boosted Model -- Wrapping Up -- Chapter 8 Forecasting: Breathe Easy -- You Can't Win -- The Sword Trade Is Hopping -- Getting Acquainted with Time Series Data -- Starting Slow with Simple Exponential Smoothing -- Setting Up the Simple Exponential Smoothing Forecast -- You Might Have a Trend -- Holt's Trend-Corrected Exponential Smoothing.

Setting Up Holt's Trend-Corrected Smoothing in a Spreadsheet -- So Are You Done? Looking at Autocorrelations -- Multiplicative Holt-Winters Exponential Smoothing -- Setting the Initial Values for Level, Trend, and Seasonality -- Getting Rolling on the Forecast -- And...Optimize! -- Please Tell Me We're Done Now!!! -- Putting a Prediction Interval around the Forecast -- Creating a Fan Chart for Effect -- Wrapping Up -- Chapter 9 Outlier Detection: Just Because They're Odd Doesn't Mean They're Unimportant -- Outliers Are (Bad?) People, Too -- The Fascinating Case of Hadlum v. Hadlum -- Tukey Fences -- Applying Tukey Fences in a Spreadsheet -- The Limitations of This Simple Approach -- Terrible at Nothing, Bad at Everything -- Preparing Data for Graphing -- Creating a Graph -- Getting the k Nearest Neighbors -- Graph Outlier Detection Method 1: Just Use the Indegree -- Graph Outlier Detection Method 2: Getting Nuanced with k-Distance -- Graph Outlier Detection Method 3: Local Outlier Factors Are Where It's At -- Wrapping Up -- Chapter 10 Moving from Spreadsheets into R -- Getting Up and Running with R -- Some Simple Hand-Jamming -- Reading Data into R -- Doing Some Actual Data Science -- Spherical K-Means on Wine Data in Just a Few Lines -- Building AI Models on the Pregnancy Data -- Forecasting in R -- Looking at Outlier Detection -- Wrapping Up -- Conclusion -- Where Am I? What Just Happened? -- Before You Go-Go -- Get to Know the Problem -- We Need More Translators -- Beware the Three-Headed Geek-Monster: Tools, Performance, and Mathematical Perfection -- You Are Not the Most Important Function of Your Organization -- Get Creative and Keep in Touch! -- Index.
Abstract:
Data science gets thrown around in the press like it's magic. Major retailers are predicting everything from when their customers are pregnant to when they want a new pair of Chuck Taylors. It's a brave new world where seemingly meaningless data can be transformed into valuable insight to drive smart business decisions. But how does one exactly do data science? Do you have to hire one of these priests of the dark arts, the "data scientist," to extract this gold from your data? Nope. Data science is little more than using straightforward steps to process raw data into actionable insight. And in Data Smart, author and data scientist John Foreman will show you how that's done within the familiar environment of a spreadsheet. Why a spreadsheet? It's comfortable! You get to look at the data every step of the way, building confidence as you learn the tricks of the trade. Plus, spreadsheets are a vendor-neutral place to learn data science without the hype. But don't let the Excel sheets fool you. This is a book for those serious about learning the analytic techniques, the math and the magic, behind big data. Each chapter will cover a different technique in a spreadsheet so you can follow along: mathematical optimization, including non-linear programming and genetic algorithms; clustering via k-means, spherical k-means, and graph modularity; data mining in graphs, such as outlier detection; supervised AI through logistic regression, ensemble models, and bag-of-words models; forecasting, seasonal adjustments, and prediction intervals through Monte Carlo simulation; and moving from spreadsheets into the R programming language. You get your hands dirty as you work alongside John through each technique. But never fear, the topics are readily applicable and the author laces humor throughout. You'll even learn what a dead squirrel has to do with optimization modeling, which you no doubt are dying to know.
Local Note:
Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2017. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.