Getting Started with Data Science

Making Sense of Data with Analytics
Author: Murtaza Haider
Publisher: IBM Press
ISBN: 0133991237
Category: Business & Economics
Page: 400
View: 7658

Continue Reading →

Master Data Analytics Hands-On by Solving Fascinating Problems You’ll Actually Enjoy! Harvard Business Review recently called data science “The Sexiest Job of the 21st Century.” It’s not just sexy: For millions of managers, analysts, and students who need to solve real business problems, it’s indispensable. Unfortunately, there’s been nothing easy about learning data science–until now. Getting Started with Data Science takes its inspiration from worldwide best-sellers like Freakonomics and Malcolm Gladwell’s Outliers: It teaches through a powerful narrative packed with unforgettable stories. Murtaza Haider offers informative, jargon-free coverage of basic theory and technique, backed with plenty of vivid examples and hands-on practice opportunities. Everything’s software and platform agnostic, so you can learn data science whether you work with R, Stata, SPSS, or SAS. Best of all, Haider teaches a crucial skillset most data science books ignore: how to tell powerful stories using graphics and tables. Every chapter is built around real research challenges, so you’ll always know why you’re doing what you’re doing. You’ll master data science by answering fascinating questions, such as: • Are religious individuals more or less likely to have extramarital affairs? • Do attractive professors get better teaching evaluations? • Does the higher price of cigarettes deter smoking? • What determines housing prices more: lot size or the number of bedrooms? • How do teenagers and older people differ in the way they use social media? • Who is more likely to use online dating services? • Why do some purchase iPhones and others Blackberry devices? • Does the presence of children influence a family’s spending on alcohol? For each problem, you’ll walk through defining your question and the answers you’ll need; exploring how others have approached similar challenges; selecting your data and methods; generating your statistics; organizing your report; and telling your story. Throughout, the focus is squarely on what matters most: transforming data into insights that are clear, accurate, and can be acted upon.

Getting Started with Data Science

Making Sense of Data with Analytics
Author: Murtaza Haider
Publisher: IBM Press
ISBN: 9780133991024
Category: Computers
Page: 250
View: 8748

Continue Reading →

Master Data Analytics Hands-On by Solving Fascinating Problems You'll Actually Enjoy! Harvard Business Review recently called data science "The Sexiest Job of the 21st Century." It's not just sexy: For millions of managers, analysts, and students who need to solve real business problems, it's indispensable. Unfortunately, there's been nothing easy about learning data science-until now. Getting Started with Data Science takes its inspiration from worldwide best-sellers like Freakonomics and Malcolm Gladwell's Outliers: It teaches through a powerful narrative packed with unforgettable stories. Murtaza Haider offers informative, jargon-free coverage of basic theory and technique, backed with plenty of vivid examples and hands-on practice opportunities. Everything's software and platform agnostic, so you can learn data science whether you work with R, Stata, SPSS, or SAS. Best of all, Haider teaches a crucial skillset most data science books ignore: how to tell powerful stories using graphics and tables. Every chapter is built around real research challenges, so you'll always know why you're doing what you're doing. You'll master data science by answering fascinating questions, such as: * Are religious individuals more or less likely to have extramarital affairs? * Do attractive professors get better teaching evaluations? * Does the higher price of cigarettes deter smoking? * What determines housing prices more: lot size or the number of bedrooms? * How do teenagers and older people differ in the way they use social media? * Who is more likely to use online dating services? * Why do some purchase iPhones and others Blackberry devices? * Does the presence of children influence a family's spending on alcohol? For each problem, you'll walk through defining your question and the answers you'll need; exploring how others have approached similar challenges; selecting your data and methods; generating your statistics; organizing your report; and telling your story. Throughout, the focus is squarely on what matters most: transforming data into insights that are clear, accurate, and can be acted upon.

Getting Started with Python Data Analysis


Author: Phuong Vo.T.H,Martin Czygan
Publisher: Packt Publishing Ltd
ISBN: 1783988452
Category: Computers
Page: 188
View: 8196

Continue Reading →

Learn to use powerful Python libraries for effective data processing and analysis About This Book Learn the basic processing steps in data analysis and how to use Python in this area through supported packages, especially Numpy, Pandas, and Matplotlib Create, manipulate, and analyze your data to extract useful information to optimize your system A hands-on guide to help you learn data analysis using Python Who This Book Is For If you are a Python developer who wants to get started with data analysis and you need a quick introductory guide to the python data analysis libraries, then this book is for you. What You Will Learn Understand the importance of data analysis and get familiar with its processing steps Get acquainted with Numpy to use with arrays and array-oriented computing in data analysis Create effective visualizations to present your data using Matplotlib Process and analyze data using the time series capabilities of Pandas Interact with different kind of database systems, such as file, disk format, Mongo, and Redis Apply the supported Python package to data analysis applications through examples Explore predictive analytics and machine learning algorithms using Scikit-learn, a Python library In Detail Data analysis is the process of applying logical and analytical reasoning to study each component of data. Python is a multi-domain, high-level, programming language. It's often used as a scripting language because of its forgiving syntax and operability with a wide variety of different eco-systems. Python has powerful standard libraries or toolkits such as Pylearn2 and Hebel, which offers a fast, reliable, cross-platform environment for data analysis. With this book, we will get you started with Python data analysis and show you what its advantages are. The book starts by introducing the principles of data analysis and supported libraries, along with NumPy basics for statistic and data processing. Next it provides an overview of the Pandas package and uses its powerful features to solve data processing problems. Moving on, the book takes you through a brief overview of the Matplotlib API and some common plotting functions for DataFrame such as plot. Next, it will teach you to manipulate the time and data structure, and load and store data in a file or database using Python packages. The book will also teach you how to apply powerful packages in Python to process raw data into pure and helpful data using examples. Finally, the book gives you a brief overview of machine learning algorithms, that is, applying data analysis results to make decisions or build helpful products, such as recommendations and predictions using scikit-learn. Style and approach This is an easy-to-follow, step-by-step guide to get you familiar with data analysis and the libraries supported by Python. Topics are explained with real-world examples wherever required.

Data Science For Dummies


Author: Lillian Pierson
Publisher: John Wiley & Sons
ISBN: 1119327644
Category: Computers
Page: 384
View: 3519

Continue Reading →

Discover how data science can help you gain in-depth insight into your business - the easy way! Jobs in data science abound, but few people have the data science skills needed to fill these increasingly important roles. Data Science For Dummies is the perfect starting point for IT professionals and students who want a quick primer on all areas of the expansive data science space. With a focus on business cases, the book explores topics in big data, data science, and data engineering, and how these three areas are combined to produce tremendous value. If you want to pick-up the skills you need to begin a new career or initiate a new project, reading this book will help you understand what technologies, programming languages, and mathematical methods on which to focus. While this book serves as a wildly fantastic guide through the broad, sometimes intimidating field of big data and data science, it is not an instruction manual for hands-on implementation. Here’s what to expect: Provides a background in big data and data engineering before moving on to data science and how it's applied to generate value Includes coverage of big data frameworks like Hadoop, MapReduce, Spark, MPP platforms, and NoSQL Explains machine learning and many of its algorithms as well as artificial intelligence and the evolution of the Internet of Things Details data visualization techniques that can be used to showcase, summarize, and communicate the data insights you generate It's a big, big data world out there—let Data Science For Dummies help you harness its power and gain a competitive edge for your organization.

Data Science from Scratch

First Principles with Python
Author: Joel Grus
Publisher: "O'Reilly Media, Inc."
ISBN: 1491904402
Category: BUSINESS & ECONOMICS
Page: 330
View: 6915

Continue Reading →

Data science libraries, frameworks, modules, and toolkits are great for doing data science, but they’re also a good way to dive into the discipline without actually understanding data science. In this book, you’ll learn how many of the most fundamental data science tools and algorithms work by implementing them from scratch. If you have an aptitude for mathematics and some programming skills, author Joel Grus will help you get comfortable with the math and statistics at the core of data science, and with hacking skills you need to get started as a data scientist. Today’s messy glut of data holds answers to questions no one’s even thought to ask. This book provides you with the know-how to dig those answers out. Get a crash course in Python Learn the basics of linear algebra, statistics, and probability—and understand how and when they're used in data science Collect, explore, clean, munge, and manipulate data Dive into the fundamentals of machine learning Implement models such as k-nearest Neighbors, Naive Bayes, linear and logistic regression, decision trees, neural networks, and clustering Explore recommender systems, natural language processing, network analysis, MapReduce, and databases

Doing Data Science

Straight Talk from the Frontline
Author: Cathy O'Neil,Rachel Schutt
Publisher: "O'Reilly Media, Inc."
ISBN: 144936389X
Category: Computers
Page: 408
View: 9698

Continue Reading →

Now that people are aware that data can make the difference in an election or a business model, data science as an occupation is gaining ground. But how can you get started working in a wide-ranging, interdisciplinary field that’s so clouded in hype? This insightful book, based on Columbia University’s Introduction to Data Science class, tells you what you need to know. In many of these chapter-long lectures, data scientists from companies such as Google, Microsoft, and eBay share new algorithms, methods, and models by presenting case studies and the code they use. If you’re familiar with linear algebra, probability, and statistics, and have programming experience, this book is an ideal introduction to data science. Topics include: Statistical inference, exploratory data analysis, and the data science process Algorithms Spam filters, Naive Bayes, and data wrangling Logistic regression Financial modeling Recommendation engines and causality Data visualization Social networks and data journalism Data engineering, MapReduce, Pregel, and Hadoop Doing Data Science is collaboration between course instructor Rachel Schutt, Senior VP of Data Science at News Corp, and data science consultant Cathy O’Neil, a senior data scientist at Johnson Research Labs, who attended and blogged about the course.

Think Like a Data Scientist

Tackle the Data Science Process Step-by-step
Author: Brian Godsey
Publisher: Manning Publications
ISBN: 9781633430273
Category:
Page: 340
View: 1311

Continue Reading →

Data science is more than just a set of tools and techniques for extracting knowledge from data sets and data streams. Data science is also a process of getting from goals and questions to real, valuable outcomes by exploring, observing, and manipulating a world of data. Traversing this world can be difficult and confusing. Software developers and non-technical folks may struggle with the uncertainty and fuzzy answers that data invariably provide, and statisticians may have trouble working with any of the multitude of relevant software tools that lie outside of their expertise. Others may not even know where to begin. Think Like a Data Scientist presents a step-by-step approach to data science, combining analytic, programming, and business perspectives into easy-to-digest techniques and thought processes for solving real world data-centric problems. This book helps you fill in conceptual knowledge gaps in the daunting fields of statistics and software development, and relates those skills to the real concerns of data science in the business world. As you work though the many practical examples, you'll use your existing knowledge of statistics and programming to solve real problems in data science. Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.

Developing Analytic Talent

Becoming a Data Scientist
Author: Vincent Granville
Publisher: John Wiley & Sons
ISBN: 1118810090
Category: Computers
Page: 336
View: 4602

Continue Reading →

Learn what it takes to succeed in the the most in-demand tech job Harvard Business Review calls it the sexiest tech job of the 21st century. Data scientists are in demand, and this unique book shows you exactly what employers want and the skill set that separates the quality data scientist from other talented IT professionals. Data science involves extracting, creating, and processing data to turn it into business value. With over 15 years of big data, predictive modeling, and business analytics experience, author Vincent Granville is no stranger to data science. In this one-of-a-kind guide, he provides insight into the essential data science skills, such as statistics and visualization techniques, and covers everything from analytical recipes and data science tricks to common job interview questions, sample resumes, and source code. The applications are endless and varied: automatically detecting spam and plagiarism, optimizing bid prices in keyword advertising, identifying new molecules to fight cancer, assessing the risk of meteorite impact. Complete with case studies, this book is a must, whether you're looking to become a data scientist or to hire one. Explains the finer points of data science, the required skills, and how to acquire them, including analytical recipes, standard rules, source code, and a dictionary of terms Shows what companies are looking for and how the growing importance of big data has increased the demand for data scientists Features job interview questions, sample resumes, salary surveys, and examples of job ads Case studies explore how data science is used on Wall Street, in botnet detection, for online advertising, and in many other business-critical situations Developing Analytic Talent: Becoming a Data Scientist is essential reading for those aspiring to this hot career choice and for employers seeking the best candidates.

Python for Data Science For Dummies


Author: John Paul Mueller,Luca Massaron
Publisher: John Wiley & Sons
ISBN: 1118843983
Category: Computers
Page: 432
View: 5441

Continue Reading →

Unleash the power of Python for your data analysis projects with For Dummies! Python is the preferred programming language for data scientists and combines the best features of Matlab, Mathematica, and R into libraries specific to data analysis and visualization. Python for Data Science For Dummies shows you how to take advantage of Python programming to acquire, organize, process, and analyze large amounts of information and use basic statistics concepts to identify trends and patterns. You’ll get familiar with the Python development environment, manipulate data, design compelling visualizations, and solve scientific computing challenges as you work your way through this user-friendly guide. Covers the fundamentals of Python data analysis programming and statistics to help you build a solid foundation in data science concepts like probability, random distributions, hypothesis testing, and regression models Explains objects, functions, modules, and libraries and their role in data analysis Walks you through some of the most widely-used libraries, including NumPy, SciPy, BeautifulSoup, Pandas, and MatPlobLib Whether you’re new to data analysis or just new to Python, Python for Data Science For Dummies is your practical guide to getting a grip on data overload and doing interesting things with the oodles of information you uncover.

Data Smart

Using Data Science to Transform Information into Insight
Author: John W. Foreman
Publisher: John Wiley & Sons
ISBN: 1118839862
Category: Business & Economics
Page: 432
View: 6202

Continue Reading →

Data Science gets thrown around in the press like it's magic. Major retailers are predicting everything from when their customers are pregnant to when they want a new pair of Chuck Taylors. It's a brave new world where seemingly meaningless data can be transformed into valuable insight to drive smart business decisions. But how does one exactly do data science? Do you have to hire one of these priests of the dark arts, the "data scientist," to extract this gold from your data? Nope. Data science is little more than using straight-forward steps to process raw data into actionable insight. And in Data Smart, author and data scientist John Foreman will show you how that's done within the familiar environment of a spreadsheet. Why a spreadsheet? It's comfortable! You get to look at the data every step of the way, building confidence as you learn the tricks of the trade. Plus, spreadsheets are a vendor-neutral place to learn data science without the hype. But don't let the Excel sheets fool you. This is a book for those serious about learning the analytic techniques, the math and the magic, behind big data. Each chapter will cover a different technique in a spreadsheet so you can follow along: Mathematical optimization, including non-linear programming and genetic algorithms Clustering via k-means, spherical k-means, and graph modularity Data mining in graphs, such as outlier detection Supervised AI through logistic regression, ensemble models, and bag-of-words models Forecasting, seasonal adjustments, and prediction intervals through monte carlo simulation Moving from spreadsheets into the R programming language You get your hands dirty as you work alongside John through each technique. But never fear, the topics are readily applicable and the author laces humor throughout. You'll even learn what a dead squirrel has to do with optimization modeling, which you no doubt are dying to know.

Getting Started with Business Analytics

Insightful Decision-Making
Author: David Roi Hardoon,Galit Shmueli
Publisher: CRC Press
ISBN: 149875967X
Category: Business & Economics
Page: 190
View: 939

Continue Reading →

Assuming no prior knowledge or technical skills, Getting Started with Business Analytics: Insightful Decision-Making explores the contents, capabilities, and applications of business analytics. It bridges the worlds of business and statistics and describes business analytics from a non-commercial standpoint. The authors demystify the main concepts and terminologies and give many examples of real-world applications. The first part of the book introduces business data and recent technologies that have promoted fact-based decision-making. The authors look at how business intelligence differs from business analytics. They also discuss the main components of a business analytics application and the various requirements for integrating business with analytics. The second part presents the technologies underlying business analytics: data mining and data analytics. The book helps you understand the key concepts and ideas behind data mining and shows how data mining has expanded into data analytics when considering new types of data such as network and text data. The third part explores business analytics in depth, covering customer, social, and operational analytics. Each chapter in this part incorporates hands-on projects based on publicly available data. Helping you make sound decisions based on hard data, this self-contained guide provides an integrated framework for data mining in business analytics. It takes you on a journey through this data-rich world, showing you how to deploy business analytics solutions in your organization.

Python for Data Analysis

Data Wrangling with Pandas, NumPy, and IPython
Author: Wes McKinney
Publisher: "O'Reilly Media, Inc."
ISBN: 1491957611
Category: Computers
Page: 550
View: 6995

Continue Reading →

Get complete instructions for manipulating, processing, cleaning, and crunching datasets in Python. Updated for Python 3.6, the second edition of this hands-on guide is packed with practical case studies that show you how to solve a broad set of data analysis problems effectively. You’ll learn the latest versions of pandas, NumPy, IPython, and Jupyter in the process. Written by Wes McKinney, the creator of the Python pandas project, this book is a practical, modern introduction to data science tools in Python. It’s ideal for analysts new to Python and for Python programmers new to data science and scientific computing. Data files and related material are available on GitHub. Use the IPython shell and Jupyter notebook for exploratory computing Learn basic and advanced features in NumPy (Numerical Python) Get started with data analysis tools in the pandas library Use flexible tools to load, clean, transform, merge, and reshape data Create informative visualizations with matplotlib Apply the pandas groupby facility to slice, dice, and summarize datasets Analyze and manipulate regular and irregular time series data Learn how to solve real-world data analysis problems with thorough, detailed examples

Jupyter for Data Science

Exploratory analysis, statistical modeling, machine learning, and data visualization with Jupyter
Author: Dan Toomey
Publisher: Packt Publishing Ltd
ISBN: 1785883291
Category: Computers
Page: 242
View: 3916

Continue Reading →

Your one-stop guide to building an efficient data science pipeline using Jupyter About This Book Get the most out of your Jupyter notebook to complete the trickiest of tasks in Data Science Learn all the tasks in the data science pipeline—from data acquisition to visualization—and implement them using Jupyter Get ahead of the curve by mastering all the applications of Jupyter for data science with this unique and intuitive guide Who This Book Is For This book targets students and professionals who wish to master the use of Jupyter to perform a variety of data science tasks. Some programming experience with R or Python, and some basic understanding of Jupyter, is all you need to get started with this book. What You Will Learn Understand why Jupyter notebooks are a perfect fit for your data science tasks Perform scientific computing and data analysis tasks with Jupyter Interpret and explore different kinds of data visually with charts, histograms, and more Extend SQL's capabilities with Jupyter notebooks Combine the power of R and Python 3 with Jupyter to create dynamic notebooks Create interactive dashboards and dynamic presentations Master the best coding practices and deploy your Jupyter notebooks efficiently In Detail Jupyter Notebook is a web-based environment that enables interactive computing in notebook documents. It allows you to create documents that contain live code, equations, and visualizations. This book is a comprehensive guide to getting started with data science using the popular Jupyter notebook. If you are familiar with Jupyter notebook and want to learn how to use its capabilities to perform various data science tasks, this is the book for you! From data exploration to visualization, this book will take you through every step of the way in implementing an effective data science pipeline using Jupyter. You will also see how you can utilize Jupyter's features to share your documents and codes with your colleagues. The book also explains how Python 3, R, and Julia can be integrated with Jupyter for various data science tasks. By the end of this book, you will comfortably leverage the power of Jupyter to perform various tasks in data science successfully. Style and approach This book is a perfect blend of concepts and practical examples, written in a way that is very easy to understand and implement. It follows a logical flow where you will be able to build on your understanding of the different Jupyter features with every chapter.

Python Data Science Handbook

Essential Tools for Working with Data
Author: Jake VanderPlas
Publisher: "O'Reilly Media, Inc."
ISBN: 1491912138
Category: Computers
Page: 548
View: 9677

Continue Reading →

For many researchers, Python is a first-class tool mainly because of its libraries for storing, manipulating, and gaining insight from data. Several resources exist for individual pieces of this data science stack, but only with the Python Data Science Handbook do you get them all—IPython, NumPy, Pandas, Matplotlib, Scikit-Learn, and other related tools. Working scientists and data crunchers familiar with reading and writing Python code will find this comprehensive desk reference ideal for tackling day-to-day issues: manipulating, transforming, and cleaning data; visualizing different types of data; and using data to build statistical or machine learning models. Quite simply, this is the must-have reference for scientific computing in Python. With this handbook, you’ll learn how to use: IPython and Jupyter: provide computational environments for data scientists using Python NumPy: includes the ndarray for efficient storage and manipulation of dense data arrays in Python Pandas: features the DataFrame for efficient storage and manipulation of labeled/columnar data in Python Matplotlib: includes capabilities for a flexible range of data visualizations in Python Scikit-Learn: for efficient and clean Python implementations of the most important and established machine learning algorithms

Data Analysis with Open Source Tools

A Hands-On Guide for Programmers and Data Scientists
Author: Philipp K. Janert
Publisher: "O'Reilly Media, Inc."
ISBN: 1449396658
Category: Computers
Page: 540
View: 3779

Continue Reading →

Collecting data is relatively easy, but turning raw information into something useful requires that you know how to extract precisely what you need. With this insightful book, intermediate to experienced programmers interested in data analysis will learn techniques for working with data in a business environment. You'll learn how to look at data to discover what it contains, how to capture those ideas in conceptual models, and then feed your understanding back into the organization through business plans, metrics dashboards, and other applications. Along the way, you'll experiment with concepts through hands-on workshops at the end of each chapter. Above all, you'll learn how to think about the results you want to achieve -- rather than rely on tools to think for you. Use graphics to describe data with one, two, or dozens of variables Develop conceptual models using back-of-the-envelope calculations, as well asscaling and probability arguments Mine data with computationally intensive methods such as simulation and clustering Make your conclusions understandable through reports, dashboards, and other metrics programs Understand financial calculations, including the time-value of money Use dimensionality reduction techniques or predictive analytics to conquer challenging data analysis situations Become familiar with different open source programming environments for data analysis "Finally, a concise reference for understanding how to conquer piles of data."--Austin King, Senior Web Developer, Mozilla "An indispensable text for aspiring data scientists."--Michael E. Driscoll, CEO/Founder, Dataspora

R for Data Science

Import, Tidy, Transform, Visualize, and Model Data
Author: Hadley Wickham,Garrett Grolemund
Publisher: "O'Reilly Media, Inc."
ISBN: 1491910364
Category: Computers
Page: 520
View: 4348

Continue Reading →

Learn how to use R to turn raw data into insight, knowledge, and understanding. This book introduces you to R, RStudio, and the tidyverse, a collection of R packages designed to work together to make data science fast, fluent, and fun. Suitable for readers with no previous programming experience, R for Data Science is designed to get you doing data science as quickly as possible. Authors Hadley Wickham and Garrett Grolemund guide you through the steps of importing, wrangling, exploring, and modeling your data and communicating the results. You’ll get a complete, big-picture understanding of the data science cycle, along with basic tools you need to manage the details. Each section of the book is paired with exercises to help you practice what you’ve learned along the way. You’ll learn how to: Wrangle—transform your datasets into a form convenient for analysis Program—learn powerful R tools for solving data problems with greater clarity and ease Explore—examine your data, generate hypotheses, and quickly test them Model—provide a low-dimensional summary that captures true "signals" in your dataset Communicate—learn R Markdown for integrating prose, code, and results

Data Science for Business

What You Need to Know about Data Mining and Data-Analytic Thinking
Author: Foster Provost,Tom Fawcett
Publisher: "O'Reilly Media, Inc."
ISBN: 144937428X
Category: Computers
Page: 414
View: 2670

Continue Reading →

Written by renowned data science experts Foster Provost and Tom Fawcett, Data Science for Business introduces the fundamental principles of data science, and walks you through the "data-analytic thinking" necessary for extracting useful knowledge and business value from the data you collect. This guide also helps you understand the many data-mining techniques in use today. Based on an MBA course Provost has taught at New York University over the past ten years, Data Science for Business provides examples of real-world business problems to illustrate these principles. You’ll not only learn how to improve communication between business stakeholders and data scientists, but also how participate intelligently in your company’s data science projects. You’ll also discover how to think data-analytically, and fully appreciate how data science methods can support business decision-making. Understand how data science fits in your organization—and how you can use it for competitive advantage Treat data as a business asset that requires careful investment if you’re to gain real value Approach business problems data-analytically, using the data-mining process to gather good data in the most appropriate way Learn general concepts for actually extracting knowledge from data Apply data science principles when interviewing data science job candidates

Data Scientists at Work


Author: Sebastian Gutierrez
Publisher: Apress
ISBN: 143026599X
Category: Computers
Page: 364
View: 8409

Continue Reading →

Data Scientists at Work is a collection of interviews with sixteen of the world's most influential and innovative data scientists from across the spectrum of this hot new profession. "Data scientist is the sexiest job in the 21st century," according to the Harvard Business Review. By 2018, the United States will experience a shortage of 190,000 skilled data scientists, according to a McKinsey report. Through incisive in-depth interviews, this book mines the what, how, and why of the practice of data science from the stories, ideas, shop talk, and forecasts of its preeminent practitioners across diverse industries: social network (Yann LeCun, Facebook); professional network (Daniel Tunkelang, LinkedIn); venture capital (Roger Ehrenberg, IA Ventures); enterprise cloud computing and neuroscience (Eric Jonas, formerly Salesforce.com); newspaper and media (Chris Wiggins, The New York Times); streaming television (Caitlin Smallwood, Netflix); music forecast (Victor Hu, Next Big Sound); strategic intelligence (Amy Heineike, Quid); environmental big data (André Karpištšenko, Planet OS); geospatial marketing intelligence (Jonathan Lenaghan, PlaceIQ); advertising (Claudia Perlich, Dstillery); fashion e-commerce (Anna Smith, Rent the Runway); specialty retail (Erin Shellman, Nordstrom); email marketing (John Foreman, MailChimp); predictive sales intelligence (Kira Radinsky, SalesPredict); and humanitarian nonprofit (Jake Porway, DataKind). The book features a stimulating foreword by Google's Director of Research, Peter Norvig. Each of these data scientists shares how he or she tailors the torrent-taming techniques of big data, data visualization, search, and statistics to specific jobs by dint of ingenuity, imagination, patience, and passion. Data Scientists at Work parts the curtain on the interviewees’ earliest data projects, how they became data scientists, their discoveries and surprises in working with data, their thoughts on the past, present, and future of the profession, their experiences of team collaboration within their organizations, and the insights they have gained as they get their hands dirty refining mountains of raw data into objects of commercial, scientific, and educational value for their organizations and clients.

Mastering Python for Data Science


Author: Samir Madhavan
Publisher: Packt Publishing Ltd
ISBN: 1784392626
Category: Computers
Page: 294
View: 3846

Continue Reading →

Explore the world of data science through Python and learn how to make sense of data About This Book Master data science methods using Python and its libraries Create data visualizations and mine for patterns Advanced techniques for the four fundamentals of Data Science with Python - data mining, data analysis, data visualization, and machine learning Who This Book Is For If you are a Python developer who wants to master the world of data science then this book is for you. Some knowledge of data science is assumed. What You Will Learn Manage data and perform linear algebra in Python Derive inferences from the analysis by performing inferential statistics Solve data science problems in Python Create high-end visualizations using Python Evaluate and apply the linear regression technique to estimate the relationships among variables. Build recommendation engines with the various collaborative filtering algorithms Apply the ensemble methods to improve your predictions Work with big data technologies to handle data at scale In Detail Data science is a relatively new knowledge domain which is used by various organizations to make data driven decisions. Data scientists have to wear various hats to work with data and to derive value from it. The Python programming language, beyond having conquered the scientific community in the last decade, is now an indispensable tool for the data science practitioner and a must-know tool for every aspiring data scientist. Using Python will offer you a fast, reliable, cross-platform, and mature environment for data analysis, machine learning, and algorithmic problem solving. This comprehensive guide helps you move beyond the hype and transcend the theory by providing you with a hands-on, advanced study of data science. Beginning with the essentials of Python in data science, you will learn to manage data and perform linear algebra in Python. You will move on to deriving inferences from the analysis by performing inferential statistics, and mining data to reveal hidden patterns and trends. You will use the matplot library to create high-end visualizations in Python and uncover the fundamentals of machine learning. Next, you will apply the linear regression technique and also learn to apply the logistic regression technique to your applications, before creating recommendation engines with various collaborative filtering algorithms and improving your predictions by applying the ensemble methods. Finally, you will perform K-means clustering, along with an analysis of unstructured data with different text mining techniques and leveraging the power of Python in big data analytics. Style and approach This book is an easy-to-follow, comprehensive guide on data science using Python. The topics covered in the book can all be used in real world scenarios.