13 Essential Data Science Textbooks

Resources Available: Data Science Textbooks

Data science has grown rapidly as enterprises use data to drive their decisions. This approach follows a systematic framework in which data collection, cleaning, visualization, and mining are iterative and adaptive processes.

While this domain is still relatively new, many resources are available to practitioners and newcomers alike, such as video tutorials, blog posts, and online data science competitions. In this article, we go over some excellent textbooks that will help you get started or develop a new skill in data science.

Data Science Foundation

In this section, we will introduce books that highlight a skill that is crucial in data science: problem-solving with the R and Python programming languages. These books can be a useful resource for practitioners who want to review coding material that supports data visualization and modeling. For those who are starting out in this field, these books will solidify your understanding and build a strong foundation in various concepts and algorithms while using the most widely supported languages in data science.

1. Data Science Using Python and R

Authors: Chantal D. Larose & Daniel T. Larose
Publisher: Wiley, 1st Edition
Originally Published: April 2019 
Number of Pages: 256

This book introduces data science through the Data Science Methodology, a scientific framework that takes an iterative and adaptive approach to data analysis. Each chapter breaks down data mining algorithms, with exercises and code snippets in both Python and R. For example, it introduces decision trees and demonstrates step by step how to fit this model to a data set. An easy, digestible book to start coding in both R and Python right away!
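To give a feel for what "fitting a decision tree step by step" involves, here is a minimal sketch in plain Python. It is not taken from the book: the toy data set (age and income predicting a purchase), the function names, and the choice of Gini impurity as the splitting criterion are all assumptions made for illustration. In practice you would more likely use a library implementation in Python or R.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature index, threshold) that lowers weighted Gini most, or None."""
    best = None
    best_score = gini(labels)  # a split must improve on the parent's impurity
    for f in range(len(rows[0])):
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, threshold), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively split the data; leaves hold the majority class."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [i for i, row in enumerate(rows) if row[f] <= t]
    right = [i for i, row in enumerate(rows) if row[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([rows[i] for i in left], [labels[i] for i in left],
                           depth + 1, max_depth),
        "right": build_tree([rows[i] for i in right], [labels[i] for i in right],
                            depth + 1, max_depth),
    }


def predict(tree, row):
    """Walk from the root to a leaf and return its class label."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


# Hypothetical toy data: [age, income in thousands] -> did the customer buy?
rows = [[25, 30], [32, 45], [47, 80], [51, 95], [23, 28], [45, 70]]
labels = ["no", "no", "yes", "yes", "no", "yes"]

tree = build_tree(rows, labels)
print(predict(tree, [29, 40]))
print(predict(tree, [50, 90]))
```

On this toy data a single split on age is enough to separate the two classes, so the tree has one decision node and two leaves.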

2. Practical Statistics for Data Scientists: 50+ Essential Concepts Using R and Python

Authors: Peter Bruce, Andrew Bruce & Peter Gedeck
Publisher: O’Reilly Media, 2nd Edition
Originally Published: June 2020
Number of Pages: 368

This book organizes the key statistical concepts that are relevant to data science. It explains practical statistics well and is easy to navigate as a reference, with example outputs and plots in both Python and R. A great refresher for anyone interested in statistics for data science!

3. R for Everyone: Advanced Analytics and Graphics

Authors: Jared P. Lander
Publisher: Addison-Wesley Professional, 2nd Edition
Originally Published: June 2017
Number of Pages: 560

This book focuses primarily on working with R, helping readers without a programming or statistics background build powerful statistical models in research projects. R for Everyone strikes the right balance between analytics, communication, and computer science. The reader will be equipped to use R packages such as Tidyverse and Shiny. Additionally, the author explains complex statistical concepts in a clear and concise manner.

Data Visualizations and Storytelling

In this section, we will introduce books that highlight the skill of creating effective data visualizations and show different ways to tell a story with data. These books will help analysts build data visualizations and write insightful reports to present to business stakeholders.

4. Storytelling with Data: A Data Visualization Guide for Business Professionals

Authors: Cole Nussbaumer Knaflic
Publisher: Wiley, 1st Edition
Originally Published: November 2015
Number of Pages: 288

This book discusses guidelines for creating data visualizations and for telling a story with your data effectively. The reader will be able to apply this information to any data exploration task, supported by concepts from business marketing and information management. An excellent book for learning the core principles of designing data visualizations and the power of storytelling!

5. Storytelling with Data: Let’s Practice!

Authors: Cole Nussbaumer Knaflic
Publisher: Wiley, 1st Edition
Originally Published: October 2019
Number of Pages: 448

This textbook is an expanded follow-up to the previous book, with more exercises that give the reader an immersive experience of being a data storyteller. There are interesting supplements to hone the skills needed to communicate insights to an audience. Additionally, the reader will be able to practice critical thinking and problem-solving.

6. Effective Data Storytelling: How to Drive Change with Data, Narrative and Visuals

Authors: Brent Dykes
Publisher: Wiley, 1st Edition
Originally Published: December 2019 
Number of Pages: 336

This book is for data practitioners who are passionate about using data to tell an effective story and drive better business decisions. There are guides with detailed information on how to apply the right data visualizations and communication methods to make an impact on the organization. Effective Data Storytelling teaches readers how to communicate data insights effectively to business stakeholders.

Machine Learning

In this section, we will introduce books that highlight the concepts of predictive data analytics, applied data mining, and machine learning. These books will help analysts with data preparation for modeling, different types of models, and model evaluation.

7. Fundamentals of Machine Learning for Predictive Data Analytics

Authors: John D. Kelleher, Brian Mac Namee & Aoife D’Arcy
Publisher: The MIT Press, 2nd Edition
Originally Published: October 2020
Number of Pages: 856

This book provides a comprehensive introduction to predictive data analytics with a broad range of machine learning approaches. It is aimed at readers who are new to machine learning. It pairs theoretical concepts with practical applications of machine learning, such as price prediction, document classification, and customer segmentation. Each concept is explained through the underlying function and behavior of the models in a business context.

8. Introduction to Data Mining

Authors: Pang-Ning Tan, Michael Steinbach, Anuj Karpatne & Vipin Kumar
Publisher: Pearson, 2nd Edition
Originally Published: January 2018
Number of Pages: 864

This book provides a thorough and comprehensive overview of data mining algorithms, outlining the fundamental concepts clearly and concisely. The reader will gain a full understanding of the mathematical concepts and the pseudocode implementation of each data mining algorithm. Each algorithm is covered extensively with examples that emphasize the importance and functionality of the model.

9. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems

Authors: Aurélien Géron
Publisher: O’Reilly Media, 2nd Edition
Originally Published: October 2019
Number of Pages: 856

This book covers concepts in machine learning and assumes prior knowledge of Python and college-level mathematics. The reader will learn the concepts and tools needed to implement models ranging from linear regression to deep learning techniques. The tools taught in this book are production-ready Python frameworks: Scikit-Learn, TensorFlow, and Keras.

10. Building Machine Learning Powered Applications: Going from Idea to Product

Authors: Emmanuel Ameisen
Publisher: O’Reilly Media, 1st Edition
Originally Published: February 2020
Number of Pages: 260

This textbook teaches the reader about the deployment phase of a data science project. The reader will learn how to design, build, and deploy machine learning applications, going from idea to product. Building Machine Learning Powered Applications is best suited for data scientists, product designers, and software engineers. It gives detailed steps for turning the end results of a project into a deployed data science application, illustrated with practical examples from industry.

Data Science Preparation

In this section, we will talk about the books that highlight ways to jumpstart your career as a data scientist. These books will guide you through the necessary steps and teach you the data science tools in the industry.

11. Build a Career in Data Science

Authors: Emily Robinson & Jacqueline Nolis
Publisher: Manning Publications, 1st Edition
Originally Published: March 2020
Number of Pages: 354

This textbook is excellent for readers who are interested in becoming data scientists. Readers will learn how to create a data science portfolio from scratch. Build a Career in Data Science is a well-written, comprehensive guide to crafting a robust resume and acing data science interviews. It also outlines the skills needed to become a data scientist.

12. Learning SQL: Generate, Manipulate, and Retrieve Data

Authors: Alan Beaulieu
Publisher: O’Reilly Media, 3rd Edition
Originally Published: April 2020
Number of Pages: 384

Learning SQL is an introductory textbook for beginners with no background knowledge or experience working with relational databases. This textbook teaches SQL concepts via MySQL, balancing the basics with advanced features. It also provides very thorough explanations of abstract concepts in SQL and step-by-step syntax tutorials on querying, manipulating, and retrieving data from MySQL.
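As a taste of the generate, manipulate, and retrieve workflow the title refers to, here is a small sketch. It is not from the book: it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a database server, and the customer table, its columns, and the sample rows are hypothetical. The SQL statements themselves are basic enough that they should look much the same in MySQL, although the column type declarations may differ.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for a MySQL server (assumption).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Generate: create a hypothetical table and insert a few rows.
cur.execute(
    "CREATE TABLE customer ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT NOT NULL)"
)
cur.executemany(
    "INSERT INTO customer (name, city) VALUES (?, ?)",
    [("Ana", "San Diego"), ("Ben", "Austin"), ("Cara", "San Diego")],
)

# Manipulate: update one customer's city.
cur.execute("UPDATE customer SET city = ? WHERE name = ?", ("Denver", "Ben"))

# Retrieve: count customers per city, sorted by city name.
cur.execute("SELECT city, COUNT(*) FROM customer GROUP BY city ORDER BY city")
rows = cur.fetchall()
conn.close()

for city, n in rows:
    print(city, n)
```

The question-mark placeholders pass values separately from the SQL text, which is the usual way to avoid building queries by string concatenation.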

13. OpenIntro Statistics

Authors: David Diez, Mine Çetinkaya-Rundel & Christopher Barr
Publisher: OpenIntro, Inc., 4th Edition
Originally Published: May 2019
Number of Pages: 422

OpenIntro Statistics is a useful statistics textbook for working professionals. Additionally, it serves as an excellent introductory textbook for readers who are new to statistics. Statistics plays a crucial role in data science projects. It serves as the backbone for every data analysis report. The reader will gain a strong foundation in statistical analysis and modeling. This book features the necessary concepts in statistics to prepare the reader for data science interviews.

Conclusion

In summary, the textbooks listed above cover four themes: data science foundations, data visualization and effective storytelling, predictive analytics and machine learning, and preparation for a data science career. The data science foundation books highlight R and Python as the most widely supported programming languages in data science projects. Then, we introduced books that cover how to create effective data visualizations and tell stories with data. Next, we discussed textbooks that teach data mining algorithms and machine learning models from scratch through deployment. Lastly, we shared a handful of textbooks that can teach you how to jumpstart your data science career and prepare for data science interviews.

Thank you for reading this article. We hope you are able to find this information useful to help guide you in your data science education and career.

If you would like to learn more about data science, the Master of Science in Applied Data Science (MS-ADS) program at the University of San Diego will help you master these skills through its courses.
