11 Getting, cleaning and wrangling data
Benjamin Greve
This book will help you become a better data scientist by showing you the things that can go wrong when working with data, particularly low-quality data. A key difference between a junior and a senior data scientist is awareness of potential pitfalls. The experienced data scientist expects them, navigates around them, and avoids costly iteration cycles. After reading this book, you will be able to spot data quality problems and deal with them before they break your work, saving yourself a lot of time.
11.2 21 Recipes for Mining Twitter Data with rtweet
Bob Rudis
The recipes contained in this book use the rtweet package by Michael W. Kearney.
11.3 Text Mining with R
Julia Silge and David Robinson
This book serves as an introduction to text mining using the tidytext package and other tidy tools in R. The functions provided by the tidytext package are relatively simple; what matters is their possible applications. Thus, this book provides compelling examples of real text mining problems.
11.4 Spreadsheet Munging Strategies
Duncan Garmonsway
This is a work-in-progress book about getting data out of spreadsheets, no matter how peculiar. The book is designed primarily for R users who have to extract data from spreadsheets and who are already familiar with the tidyverse. It has a cookbook structure, and can be used as a reference, but readers who begin in the middle might have to work backwards from time to time.