About the Book:
Unlike the first edition, the new edition has been split into two books.
Thoroughly revised and updated, this is the first book of the second edition of
Introduction to Data Science: Data Wrangling and Visualization with R. It
introduces skills that can help you tackle real-world data analysis challenges.
These include R programming, data wrangling with dplyr, data visualization with
ggplot2, file organization with UNIX/Linux shell, version control with Git and
GitHub, and reproducible document preparation with Quarto and knitr. The new
edition includes additional material on data.table, locales, and accessing
data through APIs. The book is divided into four parts: R, Data Visualization,
Data Wrangling, and Productivity Tools. Each part has several chapters meant to
be presented as one lecture and includes dozens of exercises. The second book
will cover topics including probability, statistics and prediction algorithms
with R.
Throughout the book, we use motivating case studies. In each case study, we try to
realistically mimic a data scientist’s experience. For each of the skills
covered, we start by asking specific questions and answer these through data
analysis. Examples of the case studies included in the book are: US murder
rates by state, self-reported student heights, trends in world health and
economics, and the impact of vaccines on infectious disease rates.
This book is meant to
be a textbook for a first course in Data Science. No previous knowledge of R is
necessary, although some experience with programming may be helpful. Becoming a
successful data analyst who applies the skills covered in this book also requires
understanding advanced statistical concepts, such as those covered in the second
book. If you read and understand all the chapters and complete all the
exercises in this book, and understand statistical concepts, you will be
well-positioned to perform basic data analysis tasks and you will be prepared
to learn the more advanced concepts and skills needed to become an expert.
Contents:
Introduction
Part 1: R
1. Getting started
2. R basics
3. Programming basics
4. The tidyverse
5. data.table
6. Importing data
Part 2: Data Visualization
7. Visualizing data distributions
8. ggplot2
9. Data visualization principles
10. Data visualization in practice
Part 3: Data Wrangling
11. Reshaping data
12. Joining tables
13. Parsing dates and times
14. Locales
15. Extracting data from the web
16. String processing
17. Text analysis
Part 4: Productivity Tools
18. Organizing with Unix
19. Git and GitHub
20. Reproducible projects
About the Author:
Rafael A. Irizarry is professor and chair of Data Science at the
Dana-Farber Cancer Institute, professor of biostatistics at Harvard, and a
fellow of the American Statistical Association and the International Society of
Computational Biology. Prof. Irizarry is an applied statistician who, over the
last 25 years, has worked in diverse areas, including genomics, sound
engineering, and public health surveillance. He disseminates solutions to data
analysis challenges as open-source software, tools that are widely downloaded
and used. Prof. Irizarry has also developed and taught several data science
courses at Harvard as well as popular online courses.