This work provides open-source content for an active learning curriculum in data science. The scope of the content is sufficient for a full-semester introduction to scientifically reproducible statistical computation, data wrangling, visualization, basic statistical literacy, and data-driven modeling. The content is broken into short exercises that introduce new concepts, and longer challenges that encourage students to develop those skills in an open-ended context.
Paid: Free (and open source)
by Jonathan Lin
This is the website for Audit Analytics in R. The audience of this book is:
Audit leaders who are looking to design their environment to cultivate collaboration and sustainability. Audit data analytics practitioners who are looking to leverage R in their data analytics tasks. You will learn what tools and technologies are well suited for a modern audit analytics toolkit, as well as learn skills with R to perform data analytics tasks. Consider this book to be your roadmap of practical items to implement and follow.
by Simon Rouchier
The topic of this book is statistical modelling and inference applied to building energy performance assessment. It has two target audiences: building energy researchers and practitioners who need a gentle introduction to statistical modelling; statisticians who may be interested in applications to energy performance.
by Daniel Kaplan
R is closely associated with statistics, but not with calculus. It turns out that R is an excellent language for doing calculus.
This book shows how to do common calculus calculations using R.
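As a small illustration of the kind of calculation meant here (a sketch using only base R, not an excerpt from the book; the example function x^2 * sin(x) is an arbitrary choice), `D()` differentiates an expression symbolically and `integrate()` evaluates a definite integral numerically:

```r
# Example function, chosen only for illustration: f(x) = x^2 * sin(x)
f_expr <- quote(x^2 * sin(x))

# Symbolic derivative with respect to x (base R, stats package)
df_expr <- D(f_expr, "x")

# Wrap both expressions as ordinary R functions of x
f  <- function(x) eval(f_expr)
df <- function(x) eval(df_expr)

# Slope of f at x = pi/2; analytically this is pi
slope <- df(pi / 2)

# Numerical definite integral of f over [0, pi]; analytically pi^2 - 4
area <- integrate(f, lower = 0, upper = pi)

print(df_expr)
print(slope)
print(area$value)
```

The book itself may rely on additional packages; the calls above are limited to functions that ship with a standard R installation.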
“Nobody can be a poet without feeling strong affection for words, at the same time nobody can be serious about data science without becoming close friend to matrices.”
This book is actually a cheat sheet about computing matrix algebra operations such as matrix multiplication, inversion and factorization.
It is written foR (aspiring) data scientists, where by “foR” (capital letter R) I mean the side of data science addicted to R and its gorgeous ecosystem, especially Rcpp, RcppArmadillo and RcppEigen.
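The three operations named above (multiplication, inversion and factorization) can be sketched in base R as follows. This is only an illustrative assumption of what such a cheat sheet covers: the matrices are made up, and the book's own examples may use Rcpp, RcppArmadillo or RcppEigen instead of the base functions shown here.

```r
# Two small example matrices (values chosen arbitrarily; A is symmetric
# positive definite so that a Cholesky factorization exists)
A <- matrix(c(4, 2,
              2, 3), nrow = 2, byrow = TRUE)
B <- matrix(c(1, 1,
              0, 1), nrow = 2, byrow = TRUE)

# Matrix multiplication uses %*% (plain * is element-wise)
AB <- A %*% B

# Inversion: solve(A) returns the inverse of A
A_inv <- solve(A)

# Cholesky factorization: chol() returns upper-triangular R with t(R) %*% R == A
R_chol <- chol(A)

# QR factorization: A == Q %*% R_qr
qr_A <- qr(A)
Q    <- qr.Q(qr_A)
R_qr <- qr.R(qr_A)

# Check each result by reconstructing a known matrix
print(all.equal(A %*% A_inv, diag(2)))
print(all.equal(t(R_chol) %*% R_chol, A))
print(all.equal(Q %*% R_qr, A))
```

When the goal is to solve a linear system rather than to obtain the inverse itself, `solve(A, b)` is usually preferable to `solve(A) %*% b`.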
by Jacob Kaplan
This book introduces the programming language R and is meant for undergrads or graduate students studying criminology. R is a programming language that is well-suited to the type of work frequently done in criminology - taking messy data and turning it into useful information. While R is a useful tool for many fields of study, this book focuses on the skills criminologists should know and uses crime data for the example data sets.
The tutorial is in R. For those without experience programming in R we have a high-level version to help you learn before attempting the full version. Scroll down for a breakdown of the individual sections for an overview of what you will learn throughout.
You will get more familiar with tools from the tidyverse, including dplyr, ggplot2, tibble and purrr. These tools provide an excellent complete ecosystem to do data science in R.
You will learn to create machine learning models and how to fairly assess their performance.
Cryptocurrency Data: You will learn these tools analyzing the latest cryptocurrency data. The tutorial automatically refreshes every 12 hours and the data is publicly available and refreshed hourly.
Dear Data Scientists, Educators, and Data Scientists who are Educators:
This book is a warm welcome and an invitation. If you’re a data scientist in education or an educator in data science, your role isn’t exactly straightforward. This book is our contribution to a growing movement to merge the paths of data analysis and education. We wrote this book to make your first step on that path a little clearer and a little less scary.
This course provides an overview of skills needed for reproducible research and open science using the statistical programming language R. Students will learn about data visualisation, data tidying and wrangling, archiving, iteration and functions, probability and data simulations, general linear models, and reproducible workflows. Learning is reinforced through weekly assignments that involve working with different types of data.
15.10 Discrete Data Analysis with R: Visualization and Modeling Techniques for Categorical and Count Data
by Michael Friendly, David Meyer
Presents an applied treatment of modern methods for the analysis of categorical data, both discrete response data and frequency data.
It explains how to use graphical methods for exploring data, spotting unusual features, visualizing fitted models, and presenting results.
The technology of graphs is all around us, and enables so many of the ways in which we live our lives today. That same technology is also available to us at no cost as an analytic tool to allow us to better understand network structures and dynamics in the fields of science, technology, economics, sociology and psychology to name just a few. It is available to academics and practitioners alike, and can be used on problems ranging from a very small network analysis which takes a few minutes on a laptop, to massive scale network mining requiring days or weeks of processing time.
But here’s the problem: few people really know how to do network analysis. It is still considered by many as a deep specialism or even a ‘dark art.’ It shouldn’t be.
This book aims to make the field of graph and network analysis more approachable to students and professionals by explaining the most important elements of theory and sharing common methodologies using open source programming languages like R and Python. It does so by explaining theory in as much detail as is necessary to support analytical curiosity and interpretation, and by using a wide array of example data sets and code snippets to demonstrate the specific implementation and interpretation of methodologies.
It is the author’s firm belief that all people analytics professionals should have a strong understanding of regression models and how to implement and interpret them in practice, and the aim of this book is to help those who need it get there.
For accompanying solutions to some of the questions: https://keithmcnulty.github.io/peopleanalytics-regression-book/solutions/
by Jeffrey Leek
A book about how to be a scientist the modern, open-source way. The face of academia is changing. It is no longer sufficient to just publish or perish. We are now in an era where Twitter, GitHub, Figshare, and Alt Metrics are regular parts of the scientific workflow. Here I give high-level advice about which tools to use, how to use them, and what to look out for. This book is appropriate for scientists at all levels who want to stay on top of the current technological developments affecting modern scientific careers.
Paid: Free or pay what you want $10
by Chester Ismay, Albert Y. Kim, Hendrik Feddersen
The intention of this book is to encourage more ‘data driven’ decisions in HR. HR analytics is no longer a nice-to-have add-on but rather the way HR practitioners should make decisions in the future. Where applicable, human judgement is ‘added’ onto a rigorous analysis of the data done in the first place.
To achieve this ideal world, I need to equip you with some fundamental knowledge of R and RStudio, which are open-source tools for data scientists. I am well aware that, on the one hand, you want to do something for your career in HR, but on the other, you are most likely completely new to coding.
by Christoph Hanck, Martin Arnold, Alexander Gerber, Martin Schmelzer
by Shaina Race Bennett
This course is meant to instill a working knowledge of linear algebra terminology and to lay the foundations of advanced data mining techniques like Principal Component Analysis, Factor Analysis, Collaborative Filtering, Correspondence Analysis, Network Analysis, Support Vector Machines and many more.
by Sam Tyner, Ph.D (editor)
This book is for anyone looking to do forensic science analysis in a data-driven and open way. Whether you are a student, teacher, or scientist, this book is for you. We take the latest research, primarily from the Center for Statistics and Applications in Forensic Evidence (CSAFE) and the National Institute of Standards and Technology (NIST) and show you how to solve forensic science problems in R.
by Ken Steif, Ph.D
The goal of this book is to make data science accessible to social scientists and city planners in particular. I hope to convince readers that someone with strong domain expertise plus intermediate data skills can have a greater impact in government than the sharpest computer scientist who has never studied economics, sociology, public health, political science, criminology, etc.
by Julie Lowndes, Allison Horst
This course is for Excel users who want to add or integrate R and RStudio into their existing data analysis toolkit. It is a friendly intro to becoming a modern R user, full of tidyverse, RMarkdown, GitHub, collaboration & reproducibility.
Even though R is a terrific option for SEO, there are simply not enough resources out there. This guide is not here to deliver a course about R; there are plenty already. This guide is meant to be as practical as possible. How things should be done in an “R-ish way” is not the purpose of this guide. Grab what you want to grab and feel free to submit your own solution.
Consists of 2 courses
Introductory: This course is most relevant and targeted at folks who work with data, from analysts and program staff to engineers and scientists. This course provides an introduction to the power and possibility of a reproducible programming language (R) by demonstrating how to import, explore, visualize, analyze, and communicate different types of data. Using water resources based examples, this course guides participants through basic data science skills and strategies for continued learning and use of R.
Intermediate: In this course, we will move more quickly, assume familiarity with basic R skills, and also assume that the participant has working experience with more complex workflows, operations, and code-bases. Each module in this course functions as a “stand-alone” lesson, and can be read linearly, or out of order according to your needs and interests. Each module doesn’t necessarily require familiarity with the previous module.
This course emphasizes intermediate scripting skills like iteration, functional programming, writing functions, and controlling project workflows for better reproducibility and efficiency. It also covers approaches to working with more complex data structures like lists and timeseries data, the fundamentals of building Shiny Apps, pulling water resources data from APIs, intermediate mapmaking and spatial data processing, and integrating version control in projects with git.
by Brooke Anderson, Karl Broman, Gergely Daróczi, Mario Inchiosa, David Smith, Ali Zaidi
Minecraft is awesome fun, especially in creative mode, where you can build all sorts of crazy stuff. But ambitious building projects can be really tedious to create by hand. With the miner R package, you can write R code to manipulate your Minecraft world and create even more awesome stuff.
Here’s an introductory talk on it from the Rstats NYC conference: https://www.youtube.com/watch?v=r_JgPF8MJpY
by Michael Freeman, Joel Ross
This book covers the foundational skills necessary to start writing computer programs to work with data using modern and reproducible techniques. It requires no technical background. These materials were developed for the INFO 201: Technical Foundations of Informatics course taught at the University of Washington Information School; however, they have been structured to be an online resource for anyone hoping to learn to work with information using programmatic approaches.
Created and maintained by Oscar Baruffa.