9 Getting, cleaning and wrangling data

9.1 A Beginner’s Guide to Clean Data

Benjamin Greve

This book will help you become a better data scientist by showing you what can go wrong when working with data, particularly low-quality data. A key difference between a junior and a senior data scientist is awareness of potential pitfalls: the experienced data scientist expects them, navigates around them and avoids costly iteration cycles. After reading this book, you will be able to spot data quality problems and deal with them before they break your work, saving yourself a lot of time.

https://b-greve.gitbook.io/beginners-guide-to-clean-data/

9.2 21 Recipes for Mining Twitter Data with rtweet

Bob Rudis

The recipes contained in this book use the rtweet package by Michael W. Kearney.

https://rud.is/books/21-recipes/

9.3 Text Mining with R

Julia Silge and David Robinson

This book serves as an introduction to text mining using the tidytext package and other tidy tools in R. The functions provided by the tidytext package are relatively simple; what matters is how they can be applied. The book therefore works through compelling examples of real text mining problems.

https://www.tidytextmining.com/

9.4 Spreadsheet Munging Strategies

Duncan Garmonsway

This is a work-in-progress book about getting data out of spreadsheets, no matter how peculiar. It is designed primarily for R users who have to extract data from spreadsheets and who are already familiar with the tidyverse. It has a cookbook structure and can be used as a reference, but readers who begin in the middle might have to work backwards from time to time.

https://nacnudus.github.io/spreadsheet-munging-strategies/