27 Workflow

27.1 Agile Data Science with R

Edwin Thoen

I joined a Scrum team (frontend, backend, UX designer, product owner, second data scientist) to create a machine learning model that we brought to production using the Agile principles. It was an inspiring experience from which I learned a great deal. My colleagues patiently explained the principles of Agile software development and together we applied them to the data science context. All these experiences culminated in the workflow that we now adhere to at work and I think it is worthwhile to share it. It is heavily based on the principles of Agile software production, hence the title. We have explored which of the concepts from Agile did and did not work for data science and we got hands-on experience in working from these principles in an R project that actually got to production.

https://edwinth.github.io/ADSwR/

27.2 The Data Validation Cookbook

Mark P.J. van der Loo

The purposes of this book include demonstrating the main tools and workflows of the validate package, giving examples of common data validation tasks, and showing how to analyze data validation results.
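As a rough illustration of the kind of task the book covers, here is a minimal sketch using the validate package. The data frame and the two rules are invented for this example and are not taken from the book; it only assumes that validate is installed.

```r
library(validate)

# Hypothetical toy data, invented for illustration only
df <- data.frame(
  age    = c(25, -3, 40),
  income = c(30000, 52000, NA)
)

# Define the rules the data should satisfy
rules <- validator(
  age >= 0,        # ages cannot be negative
  !is.na(income)   # income must be recorded
)

# Confront the data with the rules and summarise the results
out <- confront(df, rules)
res <- summary(out)   # one row per rule, with counts of passes and fails
print(res)
```

In this toy data each rule is violated by exactly one record, so the summary reports one fail per rule.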

https://data-cleaning.github.io/validate/

27.3 How I Use R

David Keyes // R for the Rest of Us

There are many great learning resources at the beginner stage and some incredible tutorials to master complex tasks in R. But, drawing from a concept in urban planning, there are far fewer resources in the middle.

Stretching the metaphor perhaps to its breaking point, new R users at the “detached single-family home” stage can’t get to the advanced “mid-rise” level without going through the middle stage. The “missing middle” in the R neighborhood is the lack of resources that answer the types of nuts-and-bolts questions that new R users often have. Things like:

How should I organize my file structure when creating a new project? Should I do data cleaning in an RMarkdown file or an R script file? How do I find packages? How do I know if the packages I find are high quality?

This book is my attempt to provide answers to these types of questions.

https://howiuser.com/

27.4 Github actions with R

Chris Brown, Murray Cadzow, Paula A Martinez, Rhydwyn McGuire, David Neuzerling, David Wilkinson, Saras Windecker

GitHub Actions allow us to trigger automated steps in response to GitHub interactions, such as when we push, pull, submit a pull request, or write an issue.

https://ropenscilabs.github.io/actions_sandbox/

27.5 The targets R Package User Manual

Will Landau

The targets package is a Make-like pipeline toolkit for statistics and data science in R. With targets, you can maintain a reproducible workflow without repeating yourself. targets learns how your pipeline fits together, skips costly runtime for tasks that are already up to date, runs only the necessary computation, supports implicit parallel computing, abstracts files as R objects, and shows tangible evidence that the results match the underlying code and data.

This manual is a step-by-step written guide to targets.
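To give a feel for the "skips tasks that are already up to date" behaviour, here is a minimal sketch of a two-target pipeline. The target names `raw` and `total`, the toy data, and the scratch directory are assumptions made for this example and do not come from the manual; it only assumes that targets is installed.

```r
library(targets)

# Work in a scratch directory so the example leaves the project untouched
scratch <- file.path(tempdir(), "targets-demo")
dir.create(scratch, showWarnings = FALSE)
old <- setwd(scratch)

# Write a _targets.R script that defines the pipeline (hypothetical targets)
tar_script({
  list(
    tar_target(raw, data.frame(x = 1:10)),  # upstream target: toy data
    tar_target(total, sum(raw$x))           # downstream target: depends on raw
  )
}, ask = FALSE)

tar_make()   # first run: builds both targets
tar_make()   # second run: nothing changed, so both targets are skipped

total <- tar_read(total)   # read the stored result back as an R object
setwd(old)
```

Editing the command of `raw` inside the script and calling `tar_make()` again would rebuild `raw` and, if its value changed, `total` as well.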

https://wlandau.github.io/targets-manual/index.html