diff --git a/01-project-introduction.md b/01-project-introduction.md
new file mode 100644
index 0000000..556df93
--- /dev/null
+++ b/01-project-introduction.md
@@ -0,0 +1,276 @@
+---
+title: "Introduction to R and RStudio"
+teaching: 45
+exercises: 4
+---
+
+
+
+::::::::::::::::::::::::::::::::::::: questions 
+
+- How to find your way around RStudio?
+- How to interact with R?
+- How to organise your project files?
+- How to install packages?
+
+::::::::::::::::::::::::::::::::::::: 
+
+::::::::::::::::::::::::::::::::::::: objectives
+
+- Describe the purpose and use of each pane in the RStudio IDE
+- Locate buttons and options in the RStudio IDE
+- Create and use R-projects
+- Organise and access project files
+
+::::::::::::::::::::::::::::::::::::: 
+
+::::::::::::::::::::::::::::::::::::: keypoints 
+- Use RStudio to write and run R programs.
+- Create and start an R-project
+- Use `install.packages()` to install packages (libraries).
+- Use the `here` package to access project files
+::::::::::::::::::::::::::::::::::::: 
+
+
+## Motivation
+
+Working with data can often be challenging. 
+Data are rarely in the shape and format that is most convenient for your end product. 
+Everyone working with data knows that there is a lot of work that goes into altering the data to make sure you can explore and highlight the interesting aspects of it. 
+In this lesson, we will use the dataset from the
+[palmerpenguins](https://allisonhorst.github.io/palmerpenguins/) R-package, which contains observational data on Antarctic penguins.
+Data were collected and made available by [Dr. Kristen Gorman](https://www.uaf.edu/cfos/people/faculty/detail/kristen-gorman.php) and the [Palmer Station, Antarctica LTER](https://pal.lternet.edu/), a member of the [Long Term Ecological Research Network](https://lternet.edu/). 
+This lesson focuses on using the [tidyverse](https://www.tidyverse.org/) packages, an opinionated collection of packages that are tailored to the needs of data scientists.
+Can you organise your project in an orderly fashion and access all the files?
+Can you navigate a dataset in R?
+Can you add columns and change column names?
+Can you efficiently summarise the data?
+Can you create visualizations to show key aspects of the data?
+At the end of this lesson, you should be able to do all these things!
+
+## Before Starting The Workshop
+
+Please ensure you have the latest version of R and RStudio installed on your machine. 
+This is important, as some packages used in the workshop may not install correctly (or at all) if R is not up to date.
+
+* [Download and install the latest version of R here](https://www.r-project.org/)
+* [Download and install RStudio here](https://www.rstudio.com/)
+* If you are on a Windows computer, also download and install [Rtools](https://cran.r-project.org/bin/windows/Rtools/)
+
+## Introduction to RStudio
+
+Welcome to the R portion of the Software Carpentry workshop.
+
+Throughout this lesson, we're going to teach you some of the best-practice ways of working with data and projects using the tidyverse framework for R.
+
+We'll be using RStudio: a free, open source R Integrated Development Environment (IDE). 
+It provides a built-in editor, works on all platforms (including on servers), and offers advantages such as integration with version control and project management.
+
+
+**Basic layout**
+
+When you first open RStudio, you will be greeted by three panels:
+
+  * The interactive R console/Terminal (entire left)  
+  * Environment/History/Connections (tabbed in upper right)  
+  * Files/Plots/Packages/Help/Viewer (tabbed in lower right)  
+
+<img src="fig/01-rstudio.png" alt="RStudio layout with three default panes" style="display: block; margin: auto;" />
+
+
+Once you open files, such as R scripts, an editor panel will also open
+in the top left.
+
+<img src="fig/01-rstudio-script.png" alt="RStudio 4-pane layout with .R file open" style="display: block; margin: auto;" />
+
+## Workflow within RStudio
+There are two main ways one can work within RStudio:
+
+1. Test and play within the interactive R console then copy code into
+a .R file to run later.
+   *  This works well when doing small tests and initially starting off.
+   *  It quickly becomes laborious
+2. Start writing in a .R file and use RStudio's shortcut keys for the Run command to push the current line, selected lines or modified lines to the
+interactive R console.
+   * This is a great way to start; all your code is saved for later
+   * You will be able to run the file you create from within RStudio
+   or using R's `source()` function.
+
+::::::::::::::::::::::::::::::::::::: callout 
+
+## Tip: Running segments of your code
+
+RStudio offers you great flexibility in running code from within the editor window. 
+There are buttons, menu choices, and keyboard shortcuts. 
+To run the current line, you can:  
+1. click on the `Run` button above the editor panel, or  
+2. select "Run Lines" from the "Code" menu, or  
+3. hit `ctrl`+`return` on Windows or Linux
+or `cmd`+`return` on macOS.  
+(This shortcut can also be seen by hovering
+the mouse over the button). To run a block of code, select it and then `Run`.
+
+::::::::::::::::::::::::::::::::::::: 
+
+## Introduction to R
+
+Much of your time in R will be spent in the R interactive console. 
+This is where you will run all of your code, and can be a useful environment to try out ideas before adding them to an R script file. 
+This console in RStudio is the same as the one you would get if you typed in `R` in your command-line environment.
+
+The first thing you will see in the R interactive session is a bunch of information, followed by a ">" and a blinking cursor. 
+In many ways this is similar to the shell environment you learned about during the shell lessons: it operates on the same idea of a "Read, evaluate,
+print loop": you type in commands, R tries to execute them, and then
+returns a result.
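+
+As a small illustration (the numbers are arbitrary and the object name `total` is made up), typing an expression at the `>` prompt and pressing Enter makes R evaluate it and print the result:
+
+``` r
+# R reads the expression, evaluates it, and prints the value
+1 + 100
+
+# A result can also be stored in an object and printed by typing its name
+total <- 1 + 100
+total
+```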
+
+
+## Using R-projects
+
+Any data analysis process is naturally incremental, and many projects
+start life as random notes, some code, then a manuscript, and
+eventually everything is a bit mixed together.
+
+<blockquote class="twitter-tweet"><p>Managing your projects in a reproducible fashion doesn't just make your science reproducible, it makes your life easier.</p>— Vince Buffalo (@vsbuffalo) <a href="https://twitter.com/vsbuffalo/status/323638476153167872">April 15, 2013</a></blockquote>
+<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>
+
+Most people tend to organize their projects like this:
+
+<img src="fig/01_bad_project.png" alt="Image of a local folder structure with files. The file names do not easily make it possible to understand which files are similar in content or which is the newest version." style="display: block; margin: auto;" />
+
+
+There are many reasons why we should *ALWAYS* avoid this:
+
+1. It is really hard to tell which version of your data is
+the original and which is the modified;
+2. It gets really messy because it mixes files with various
+extensions together;
+3. It probably takes you a lot of time to actually find
+things, and relate the correct figures to the exact code
+that has been used to generate them.
+
+A good project layout will ultimately make your life easier:
+
+* It will help ensure the integrity of your data;
+* It makes it simpler to share your code with someone else
+(a lab-mate, collaborator, or supervisor);
+* It allows you to easily upload your code with your manuscript submission;
+* It makes it easier to pick the project back up after a break.
+
+## A possible solution
+
+Fortunately, there are tools and packages which can help you manage your work effectively.
+
+One of the most powerful and useful aspects of RStudio is its project management functionality. 
+We'll be using this today to create a self-contained, reproducible
+project.
+
+
+::::::::::::::::::::::::::::::::::::: challenge 
+
+## Challenge 1: Creating a self-contained project
+
+We're going to create a new project in RStudio:  
+
+1. Click the "File" menu button, then "New Project".  
+2. Click "New Directory".  
+3. Click "New Project".  
+4. Type in the name of the directory to store your project, e.g. "my_project".  
+5. If available, select the checkbox for "Create a git repository."  
+6. Click the "Create Project" button.  
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+The simplest way to open an RStudio project once it has been created is to click through your file system to get to the directory where it was saved and double click on the `.Rproj` file. 
+This will open RStudio and start your R session in the same directory as the `.Rproj` file.
+All your data, plots and scripts will now be relative to the project directory. RStudio projects have the added benefit of allowing you to open multiple projects at the same time each open to its own project directory. 
+This allows you to keep multiple projects open without them interfering with each other.
+
+::::::::::::::::::::::::::::::::::::: challenge 
+
+## Challenge 2: Opening an RStudio project through the file system
+
+1. Exit RStudio.  
+2. Navigate to the directory where you created a project in Challenge 1.   
+3. Double click on the `.Rproj` file in that directory.  
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+## Best practices for project organization
+
+Although there is no "best" way to lay out a project, there are some general principles to adhere to that will make project management easier:
+
+### Treat data as read only
+
+This is probably the most important goal of setting up a project. Data are
+typically time consuming and/or expensive to collect. 
+Working with them interactively (e.g., in Excel) where they can be modified means you are never sure of where the data came from, or how they have been modified since collection.
+It is therefore a good idea to treat your data as "read-only".
+
+### Data Cleaning
+
+In many cases your data will be "dirty": it will need significant preprocessing to get into a format R (or any other programming language) will find useful.
+This task is sometimes called "data munging". 
+Storing the cleaning scripts in a separate folder, and creating a second "read-only" data folder to hold the "cleaned" data sets, can prevent confusion between the two sets.
+
+### Treat generated output as disposable
+
+Anything generated by your scripts should be treated as disposable: it should all be able to be regenerated from your scripts.
+
+There are lots of different ways to manage this output. Having an output folder with different sub-directories for each separate analysis makes it easier later,
+since many analyses are exploratory and don't end up being used in the final project, and some of the analyses get shared between projects.
+
+::::::::::::::::::::::::::::::::::::: callout
+
+## Tip: Good Enough Practices for Scientific Computing
+
+[Good Enough Practices for Scientific computing](https://github.com/swcarpentry/good-enough-practices-in-scientific-computing/blob/gh-pages/good-enough-practices-for-scientific-computing.pdf) gives the following recommendations for project organization:
+
+1. Put each project in its own directory, which is named after the project.  
+2. Put text documents associated with the project in the `doc` directory.  
+3. Put raw data and metadata in the `data` directory, and files generated during cleanup and analysis in a `results` directory.  
+4. Put source for the project's scripts and programs in the `src` directory, and programs brought in from elsewhere or compiled locally in the `bin` directory.  
+5. Name all files to reflect their content or function.  
+
+:::::::::::::::::::::::::::::::::::::
+
+
+### Separate function definition and application
+
+One of the more effective ways to work with R is to start by writing the code you want to run directly in a .R script, and then running the selected lines (either using the keyboard shortcuts in RStudio or clicking the "Run" button) in the interactive R console.
+
+When your project is in its early stages, the initial `.R` script file usually contains many lines of directly executed code.
+Make sure to comment your code, so you know the intention of each bit, and once you have a clearer idea of what you want, tidy up your script so it only contains what is important.
+
+::::::::::::::::::::::::::::::::::::: challenge 
+
+## Challenge 3
+Set up your project folders. For this workshop we will need folders for data, results and scripts.
+
+1. In the bottom right pane of RStudio, click on "Files".
+2. Click on "New folder" and create a folder named `data`
+3. Repeat to create `results` and `scripts`
+:::::::::::::::::::::::::::::::::::::  
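+
+If you prefer the console, the same folders can be created with base R's `dir.create()`. This is only a sketch of an alternative to the point-and-click steps above; it assumes your working directory is the project directory, which is the case when the project was opened through its `.Rproj` file.
+
+``` r
+# Create the three workshop folders inside the current working directory
+for (folder in c("data", "results", "scripts")) {
+  # showWarnings = FALSE keeps R quiet if the folder already exists
+  dir.create(folder, showWarnings = FALSE)
+}
+
+# Check that all three folders now exist
+dir.exists(c("data", "results", "scripts"))
+```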
+
+::::::::::::::::::::::::::::::::::::: challenge 
+
+## Challenge 4
+Download the palmer penguins data and place it in your `data` folder, calling it `penguins.csv`
+
+1. Go to [the raw palmer penguins data](https://raw.githubusercontent.com/allisonhorst/palmerpenguins/master/inst/extdata/penguins.csv)
+2. Right click in the browser window
+3. Choose "save as..."
+4. Navigate to your project's data folder
+5. Save the file to this location
+:::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::: callout
+
+## Tip: command line in RStudio
+
+The Terminal tab in the console pane provides a convenient place directly
+within RStudio to interact directly with the command line.
+:::::::::::::::::::::::::::::::::::::
+
+### Version Control
+
+It is important to use version control with projects.  
+Go [here for a good lesson which describes using Git with RStudio](https://swcarpentry.github.io/git-novice/14-supplemental-rstudio/).
+
diff --git a/02-data-visualisation.md b/02-data-visualisation.md
new file mode 100644
index 0000000..93902a3
--- /dev/null
+++ b/02-data-visualisation.md
@@ -0,0 +1,574 @@
+---
+title: "Visualisation with ggplot2"
+teaching: 60
+exercises: 8
+---
+
+:::::::::::::::::::::::::::::::::::::: questions 
+
+- How do I access my data in R?
+- How do I visualise my data with ggplot2?
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::: objectives
+
+- Read data into R
+- To be able to use `ggplot2` to generate publication quality graphics.
+- To understand the basic grammar of graphics, including the aesthetics and geometry layers, adding statistics, transforming scales, and colouring or panelling by groups.
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::: keypoints
+
+- Read data into R
+- Use ggplot2 to create different types of plots
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+
+
+
+## Motivation
+
+Plotting the data is one of the best ways to quickly explore it and generate hypotheses about various relationships between variables.
+
+There are several plotting systems in R, but today we will focus on `ggplot2`, which implements the **grammar of graphics**: a coherent system for describing the components that constitute a visual representation of data.
+For more information on the principles and thinking behind the `ggplot2` graphics system, please refer to [Layered grammar of graphics](https://vita.had.co.nz/papers/layered-grammar.pdf) by Hadley Wickham (@hadleywickham). 
+
+The advantage of `ggplot2` is that it allows R users to create publication quality graphics with a few lines of code. `ggplot2` has a large user base and is constantly developed and extended by the community.
+
+## Getting data into R
+We will start by reading the data into R from the `data` folder, where you placed it in the last part of the introduction.
+
+
+``` r
+penguins <- read.csv("data/penguins.csv")
+```
+
+This is our first bit of R code to "assign" data to an object in our "R environment".
+The R environment can be seen in the upper right hand corner, and it lists everything R has access to at the moment.
+You should see an object called "penguins", which is a Dataset with 344 observations and 8 variables.
+We created this object with the line of code we just ran. 
+You can "read" the line, right to left as:
+"read the penguins.csv into R, and assign (<-) it to an object called penguins".
+The arrow, or assignment, is R's way of creating new objects to work on.
+
+**Note** a key difference between R and programs like SPSS or Excel is that when data are used in R, we do not automatically alter the data in the file we read them from. Everything we do with the penguins data in R from now on only happens in R, and does not change the originating file. This way we cannot easily alter our raw data by accident, which is a very good thing.
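+
+To make this concrete, here is a minimal sketch using a small made-up file rather than the penguins data. The names `toy_file` and `toy` and the values are invented for illustration.
+
+``` r
+# Write a tiny illustrative CSV to a temporary file
+toy_file <- tempfile(fileext = ".csv")
+write.csv(data.frame(id = 1:3, mass_g = c(3750, 3800, 3250)),
+          toy_file, row.names = FALSE)
+
+# Read the file into an object, then change the object in R only
+toy <- read.csv(toy_file)
+toy$mass_g <- toy$mass_g / 1000
+
+# The object in R has changed ...
+toy$mass_g
+
+# ... but the file on disk still holds the original values
+read.csv(toy_file)$mass_g
+```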
+
+::::::::::::::::::::::::::::::::::::: callout
+## Tip: We can inspect the data in several ways
+
+1. Click the data name in the Environment, and the data opens as a tab in the scripts pane.  
+2. Click the little arrow next to the data name in the Environment, and you'll see a short preview of the data.  
+3. Type `penguins` in the R console, and a preview will be shown of the data.  
+::::::::::::::::::::::::::::::::::::: 
+
+The dataset contains the following fields:
+
+- **species**:           penguin species
+- **island**:            island of observation
+- **bill_length_mm**:    bill length in millimetres
+- **bill_depth_mm**:     bill depth in millimetres
+- **flipper_length_mm**: flipper length in millimetres
+- **body_mass_g**:       body mass in grams
+- **sex**:               penguin sex
+- **year**:              year of observation
+
+
+## Introduction to ggplot2
+
+`ggplot2` is a core member of the `tidyverse` family of packages. Installing and loading the `tidyverse` package will load all of the packages we will need for this workshop. Let's get started!
+
+
+``` r
+# install.packages("tidyverse")
+library(tidyverse)
+── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
+✔ dplyr     1.1.4     ✔ readr     2.1.5
+✔ forcats   1.0.0     ✔ stringr   1.5.1
+✔ ggplot2   3.5.1     ✔ tibble    3.2.1
+✔ lubridate 1.9.3     ✔ tidyr     1.3.1
+✔ purrr     1.0.2     
+── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
+✖ dplyr::filter() masks stats::filter()
+✖ dplyr::lag()    masks stats::lag()
+ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
+```
+
+Here's a question that we would like to answer using `penguins` data: _Do penguins with deep beaks also have long beaks?_ This might seem like a silly question, but it gets us exploring our data.
+
+To plot `penguins`, run the following code in an R chunk or in the console. The following code will put `bill_depth_mm` on the x-axis and `bill_length_mm` on the y-axis:
+
+
+``` r
+ggplot(data = penguins) +
+  geom_point(
+    mapping = aes(x = bill_depth_mm,
+                  y = bill_length_mm)
+  )
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).
+```
+
+<img src="fig/02-data-visualisation-rendered-unnamed-chunk-4-1.png" style="display: block; margin: auto;" />
+
+Note that we split the function into several lines.
+In R, any function has a name and is followed by parentheses. Inside the parentheses we place any information the function needs to run.
+Here, we are using two main functions, `ggplot()` and `geom_point()`.
+To save screen space, we have placed each function on its own line, and also split up arguments into several lines.
+How this is done is up to you; there are no strict rules for this.
+We will use the tidyverse coding style throughout this course, to be consistent and also save space on the screen.
+The plus sign indicates that the plot is not finished yet and that the next line should be interpreted as an additional layer to the preceding `ggplot()` function. In other words, when writing a `ggplot()` call spanning several lines, the `+` sign goes at the end of the line, not at the beginning.
+
+**Note** that in order to create a plot using the `ggplot2` system, you should start your command with the `ggplot()` function. It creates an empty coordinate system and initializes the dataset to be used in the graph (which is supplied as the first argument to the `ggplot()` function). In order to create a graphical representation of the data, we can add one or more layers to our otherwise empty graph. Functions starting with the prefix `geom_` create a visual representation of data. In this case we added scattered points, using the `geom_point()` function. There are many `geoms` in `ggplot2`, some of which we will learn in this lesson.
+
+`geom_` functions create a _mapping_ of variables from the earlier defined dataset to certain aesthetic elements of the graph, such as axes, shapes or colours. The first argument of any `geom_` function expects the user to specify these mappings, wrapped in the `aes()` (short for _aesthetics_) function. In this case, we mapped the `bill_depth_mm` and `bill_length_mm` variables from the `penguins` dataset to the x- and y-axis, respectively (using the `x` and `y` arguments of the `aes()` function). 
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 1a
+How has bill length changed over time? What do you observe? 
+
+:::::::::::::::::::::::::::::::::::::::: hint 
+
+The `penguins` dataset has a column called `year`, which should appear on the x-axis.
+:::::::::::::::::::::::::::::::::::::::: 
+
+:::::::::::::::::::::::::::::::::::::::: solution 
+## Solution 
+
+
+``` r
+ggplot(data = penguins) +
+  geom_point(
+    mapping = aes(x = year, 
+                  y = bill_length_mm)
+  )
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).
+```
+
+<img src="fig/02-data-visualisation-rendered-unnamed-chunk-5-1.png" style="display: block; margin: auto;" />
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 1b
+Try a different `geom_` function called `geom_jitter`. How is that different from `geom_point`?
+
+:::::::::::::::::::::::::::::::::::::::: solution 
+## Solution
+
+
+``` r
+ggplot(data = penguins) +
+  geom_jitter(
+    mapping = aes(x = year, 
+                  y = bill_length_mm)
+  )
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).
+```
+
+<img src="fig/02-data-visualisation-rendered-unnamed-chunk-6-1.png" style="display: block; margin: auto;" />
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+
+## Mapping data
+
+What if we want to combine graphs from the previous two challenges and show the relationship between three variables in the same graph? It turns out we don't necessarily need a third geometrical dimension; we can employ colour.
+
+The following graph maps the `island` variable from the `penguins` dataset to the `colour` aesthetic of the plot. Let's take a look:
+
+
+``` r
+ggplot(data = penguins) + 
+  geom_jitter(
+    mapping = aes(x = bill_depth_mm, 
+                  y = bill_length_mm, 
+                  colour = island)
+  )
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).
+```
+
+<img src="fig/02-data-visualisation-rendered-unnamed-chunk-7-1.png" style="display: block; margin: auto;" />
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 2
+What will happen if you switch colour to also be by year? Is the graph still useful? Why or why not? What is the difference in the plot between when you colour by island and when you colour by year?
+
+:::::::::::::::::::::::::::::::::::::::: solution 
+## Solution
+
+
+``` r
+ggplot(data = penguins) +
+  geom_jitter(
+    mapping = aes(x = bill_depth_mm, 
+                  y = bill_length_mm,
+                  colour = year)
+  )
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).
+```
+
+<img src="fig/02-data-visualisation-rendered-unnamed-chunk-8-1.png" style="display: block; margin: auto;" />
+
+`island` is a categorical character variable with a discrete range of possible values. This, like the data type of factor, is represented with colours by assigning a specific colour to each member of the discrete set. `year` is a numeric variable, which ggplot2 treats as continuous: any number of potential values can exist between known values. To represent this, ggplot2 uses a colour bar with a continuous gradient.
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
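+
+If you would rather have one distinct colour per year, one common approach is to convert `year` to a factor inside `aes()`, so that ggplot2 treats it as discrete. The sketch below uses a small made-up data frame (`toy`) in place of `penguins` so that it runs on its own; the values are invented for illustration.
+
+``` r
+library(ggplot2)
+
+# Hypothetical toy data standing in for the penguins dataset
+toy <- data.frame(
+  bill_depth_mm  = c(18.7, 17.4, 18.0, 19.3, 20.6, 17.8),
+  bill_length_mm = c(39.1, 39.5, 40.3, 36.7, 39.3, 38.9),
+  year           = c(2007, 2007, 2008, 2008, 2009, 2009)
+)
+
+# factor(year) turns the numeric years into discrete categories,
+# so the legend shows separate colours instead of a gradient
+p_discrete <- ggplot(data = toy) +
+  geom_point(
+    mapping = aes(x = bill_depth_mm,
+                  y = bill_length_mm,
+                  colour = factor(year))
+  )
+p_discrete
+```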
+
+
+There are other aesthetics that can come in handy. One of them is `size`. The idea is that we can vary the size of data points to illustrate another numeric variable, such as the year of observation. Let's look at four dimensions at once! 
+
+
+``` r
+ggplot(data = penguins) + 
+  geom_jitter(
+    mapping = aes(x = bill_depth_mm, 
+                  y = bill_length_mm, 
+                  colour = species, 
+                  size = year)
+  )
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).
+```
+
+<img src="fig/02-data-visualisation-rendered-unnamed-chunk-9-1.png" style="display: block; margin: auto;" />
+
+It might be even better to try another type of aesthetic, like shape, for categorical data like species.
+
+
+``` r
+ggplot(data = penguins) + 
+  geom_jitter(
+    mapping = aes(x = bill_depth_mm, 
+                  y = bill_length_mm, 
+                  colour = species, 
+                  shape = species)
+  )
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).
+```
+
+<img src="fig/02-data-visualisation-rendered-unnamed-chunk-10-1.png" style="display: block; margin: auto;" />
+
+Playing around with different aesthetic mappings until you find something that really makes the data "pop" is a good idea. A plot is rarely made nice on the first try; we all try different configurations until we find the one we like.
+
+## Setting values
+
+Until now, we explored different aesthetic properties of a graph mapped to certain variables. What if you want to recolour or use a certain shape to plot all data points? Well, that means that such a colour or shape will no longer be *mapped* to any data, so you need to supply it to the `geom_` function as a separate argument (outside of the `mapping`). 
+This is called "setting" in the ggplot2-world. We "map" aesthetics to data columns, or we "set" single values outside aesthetics to apply to the entire geom or plot.
+Here's our initial graph with all points coloured blue.
+
+
+``` r
+ggplot(data = penguins) + 
+  geom_point(
+    mapping = aes(x = bill_depth_mm, 
+                  y = bill_length_mm),
+    colour = "blue"
+  )
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).
+```
+
+<img src="fig/02-data-visualisation-rendered-unnamed-chunk-11-1.png" style="display: block; margin: auto;" />
+
+Once more, observe that the colour is now not mapped to any particular variable from the `penguins` dataset and applies equally to all data points; therefore it is outside the `mapping` argument and is not wrapped in the `aes()` function. Note that set colours are supplied as characters (in quotes). 
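+
+A common slip is to put the fixed colour inside `aes()`. ggplot2 then treats `"blue"` as a one-value data variable, picks a colour from its default palette and adds a legend, so the points do not come out blue. A minimal sketch of the two forms, using a made-up data frame (`toy`) so that it runs on its own:
+
+``` r
+library(ggplot2)
+
+# Hypothetical toy data standing in for the penguins dataset
+toy <- data.frame(
+  bill_depth_mm  = c(18.7, 17.4, 18.0, 19.3),
+  bill_length_mm = c(39.1, 39.5, 40.3, 36.7)
+)
+
+# Setting: colour is outside aes(), so every point is drawn in blue
+p_set <- ggplot(data = toy) +
+  geom_point(
+    mapping = aes(x = bill_depth_mm, y = bill_length_mm),
+    colour = "blue"
+  )
+
+# Mapping by mistake: "blue" inside aes() is treated as data,
+# so the points get a default palette colour and a legend entry "blue"
+p_mapped <- ggplot(data = toy) +
+  geom_point(
+    mapping = aes(x = bill_depth_mm, y = bill_length_mm, colour = "blue")
+  )
+
+p_set
+p_mapped
+```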
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 3
+Change the transparency (alpha) of the data points by year. 
+
+:::::::::::::::::::::::::::::::::::::::: hint 
+`alpha` takes a value from 0 (transparent) to 1 (solid).
+:::::::::::::::::::::::::::::::::::::::: 
+
+:::::::::::::::::::::::::::::::::::::::: solution 
+## Solution
+
+
+``` r
+ggplot(data = penguins) + 
+  geom_point(
+    mapping = aes(x = bill_depth_mm, 
+                  y = bill_length_mm, 
+                  alpha = year)
+  )
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).
+```
+
+<img src="fig/02-data-visualisation-rendered-unnamed-chunk-12-1.png" style="display: block; margin: auto;" />
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 4
+Move the transparency outside the `aes()` and set it to `0.5`. What are the benefits of each of these methods?
+
+:::::::::::::::::::::::::::::::::::::::: solution 
+## Solution
+
+
+``` r
+ggplot(data = penguins) + 
+  geom_point(
+    mapping = aes(x = bill_depth_mm, 
+                  y = bill_length_mm),
+    alpha = 0.5)
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).
+```
+
+<img src="fig/02-data-visualisation-rendered-unnamed-chunk-13-1.png" style="display: block; margin: auto;" />
+Controlling the transparency can be a great way to "mute" the visual effect of certain data, while still keeping it visible. It's a great tool when you have many data points or if you have several geoms together, like we will see soon.
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+ 
+
+
+## Geometrical objects
+
+Next, we will consider different options for `geoms`. By using different `geom_` functions, you can highlight different aspects of the data. 
+
+A useful geom function is `geom_boxplot()`. It adds a layer with a "box and whiskers" plot illustrating the distribution of values within categories. The following chart breaks down bill length by species, where the box spans the first and third quartiles (the 25th and 75th percentiles), the middle bar marks the median, and the whiskers extend to the most extreme values within 1.5 times the interquartile range of the box. Points beyond the whiskers are treated as outliers and shown separately.
+
+
+``` r
+ggplot(data = penguins) + 
+  geom_boxplot(
+    mapping = aes(x = species, 
+                  y = bill_length_mm)
+  )
+Warning: Removed 2 rows containing non-finite outside the scale range
+(`stat_boxplot()`).
+```
+
+<img src="fig/02-data-visualisation-rendered-unnamed-chunk-14-1.png" style="display: block; margin: auto;" />
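+
+The whisker rule can be worked through by hand. By default `geom_boxplot()` ends each whisker at the most extreme observation within 1.5 times the interquartile range (IQR) of the box edge, and draws anything further out as an individual point. A minimal base R sketch on made-up numbers (the vector `x` is invented for illustration):
+
+``` r
+# Made-up values with one unusually large observation
+x <- c(2, 4, 4, 5, 6, 7, 8, 9, 30)
+
+# First quartile, median and third quartile (the box and its middle bar)
+q <- unname(quantile(x, probs = c(0.25, 0.5, 0.75)))
+iqr <- q[3] - q[1]
+
+# Whiskers reach the most extreme values within 1.5 * IQR of the box
+lower_whisker <- min(x[x >= q[1] - 1.5 * iqr])
+upper_whisker <- max(x[x <= q[3] + 1.5 * iqr])
+
+# Anything beyond the whiskers is drawn as a separate point
+outliers <- x[x < lower_whisker | x > upper_whisker]
+
+q
+c(lower_whisker, upper_whisker)
+outliers
+```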
+
+Layers can be added on top of each other. In the following graph we will place the boxplots **over** jittered points to see the distribution of outliers more clearly. We can map two aesthetic properties to the same variable. Here we will also use a different colour for each species.
+
+
+``` r
+ggplot(data = penguins) + 
+  geom_jitter(
+    mapping = aes(x = species, 
+                  y = bill_length_mm, 
+                  colour = species)
+  ) +
+  geom_boxplot(
+    mapping = aes(x = species,
+                  y = bill_length_mm)
+  )
+Warning: Removed 2 rows containing non-finite outside the scale range
+(`stat_boxplot()`).
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).
+```
+
+<img src="fig/02-data-visualisation-rendered-unnamed-chunk-15-1.png" style="display: block; margin: auto;" />
+
+Now, this was slightly inefficient due to duplication of code - we had to specify the same mappings for two layers. To avoid it, you can move common arguments of `geom_` functions to the main `ggplot()` function. In this case every layer will "inherit" the same arguments, specified in the "parent" function.
+
+
+``` r
+ggplot(data = penguins,
+       mapping = aes(x = island, 
+                     y = bill_length_mm)
+) + 
+  geom_jitter(aes(colour = island)) +
+  geom_boxplot(alpha = .6)
+Warning: Removed 2 rows containing non-finite outside the scale range
+(`stat_boxplot()`).
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).
+```
+
+<img src="fig/02-data-visualisation-rendered-unnamed-chunk-16-1.png" style="display: block; margin: auto;" />
+
+You can still add layer-specific mappings or other arguments by specifying them within individual geoms. Here, we've set the transparency of the boxplot to .6, so we can see the points behind it, and also mapped colour to island in the points. We would recommend building each layer separately and then moving common arguments up to the "parent" function.
+
+We can use linear models to highlight the relationship between bill depth and bill length. Notice that we added a separate argument to the `geom_smooth()` function to specify the type of model we want `ggplot2` to build using the data (linear model). The `geom_smooth()` function has also helpfully provided confidence intervals, indicating "goodness of fit" for the model (shaded gray area). For more information on the available models, please refer to the help page (by typing `?geom_smooth`).
+
+
+``` r
+ggplot(data = penguins, 
+       mapping = aes(x = bill_depth_mm, 
+                     y = bill_length_mm)
+) +
+  geom_point(alpha = 0.5) +
+  geom_smooth(method = "lm")
+`geom_smooth()` using formula = 'y ~ x'
+Warning: Removed 2 rows containing non-finite outside the scale range
+(`stat_smooth()`).
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).
+```
+
+<img src="fig/02-data-visualisation-rendered-unnamed-chunk-17-1.png" style="display: block; margin: auto;" />
+
+::::::::::::::::::::::::::::::::::::: challenge 
+
+## Challenge 5
+Modify the plot so that the points are coloured by species, but there is a single regression line.
+
+:::::::::::::::::::::::::::::::::::::::: solution 
+
+## Solution 
+
+
+``` r
+ggplot(data = penguins, 
+       mapping = aes(x = bill_depth_mm, 
+                     y = bill_length_mm)) +
+  geom_point(mapping = aes(colour = species),
+             alpha = 0.5) +
+  geom_smooth(method = "lm")
+`geom_smooth()` using formula = 'y ~ x'
+Warning: Removed 2 rows containing non-finite outside the scale range
+(`stat_smooth()`).
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).
+```
+
+<img src="fig/02-data-visualisation-rendered-unnamed-chunk-18-1.png" style="display: block; margin: auto;" />
+If `colour` had been mapped in the "parent" `ggplot()` function, each geom would have inherited all three mappings (x, y and colour), and `geom_smooth()` would have fitted a separate line for each species. Since we want only a single linear model to be built, we limit the effect of the `colour` aesthetic to `geom_point()` by specifying it in the layer where we want it to apply. Note, though, that because we want `colour` to be mapped to the `species` variable, it needs to be wrapped in the `aes()` function and supplied to the `mapping` argument.
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 6
+Add a regression line to the plot that plots one line for each species, while also plotting one across all species.
+
+:::::::::::::::::::::::::::::::::::::::: hint
+Add another geom!
+:::::::::::::::::::::::::::::::::::::::: 
+
+:::::::::::::::::::::::::::::::::::::::: solution 
+
+## Solution 
+
+
+``` r
+ggplot(penguins, 
+       aes(x = bill_depth_mm, 
+           y = bill_length_mm)) +
+  geom_point(aes(colour = species),
+             alpha = 0.5) +
+  geom_smooth(method = "lm", 
+              aes(colour = species)) +
+  geom_smooth(method = "lm", 
+              colour = "black")
+`geom_smooth()` using formula = 'y ~ x'
+Warning: Removed 2 rows containing non-finite outside the scale range
+(`stat_smooth()`).
+`geom_smooth()` using formula = 'y ~ x'
+Warning: Removed 2 rows containing non-finite outside the scale range
+(`stat_smooth()`).
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).
+```
+
+<img src="fig/02-data-visualisation-rendered-unnamed-chunk-19-1.png" style="display: block; margin: auto;" />
+Look at that! The data actually reveal something called "Simpson's paradox". It occurs when a relationship appears to go in one direction overall, but goes in the opposite direction within groups of the data. Here, the overall relationship between bill length and bill depth looks negative, but when we take into account that there are different species, the relationship is actually positive.
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+
+## Sub-plots (plot panels)
+
+The last thing we will cover for plots is creating sub-plots.
+Often, we'd like to create the same set of plots, but as distinctly different subplots.
+This way, we don't need to map so many aesthetics (it can end up being really messy).
+
+Let's say that, for the last plot we made, we want to understand if there are also differences between male and female penguins.
+In ggplot2, this is called a "facet", and the function we use is called either `facet_wrap()` or `facet_grid()`. 
+
+
+``` r
+ggplot(penguins, 
+      aes(x = bill_depth_mm, 
+          y = bill_length_mm,
+          colour = species)) +
+  geom_point(alpha = 0.5) +
+  geom_smooth(method = "lm") +
+  facet_wrap(~ sex)
+`geom_smooth()` using formula = 'y ~ x'
+Warning: Removed 2 rows containing non-finite outside the scale range
+(`stat_smooth()`).
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).
+```
+
+<img src="fig/02-data-visualisation-rendered-unnamed-chunk-20-1.png" style="display: block; margin: auto;" />
+
+Facets take formula arguments, meaning they contain the tilde (`~`).
+The way we often think about it, when trying to "read" the code, is that we facet "over" sex (in this case). 
+
+This plot looks a little crazy though, as we have penguins with missing sex information getting their own panel, and really, it makes more sense to compare the sexes within each species rather than the other way around.
+Let us swap the places of species and sex.
+
+
+``` r
+ggplot(penguins, 
+      aes(x = bill_depth_mm, 
+          y = bill_length_mm,
+          colour = sex)) +
+  geom_point(alpha = 0.5) +
+  geom_smooth(method = "lm") +
+  facet_wrap(~ species)
+`geom_smooth()` using formula = 'y ~ x'
+Warning: Removed 2 rows containing non-finite outside the scale range
+(`stat_smooth()`).
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).
+```
+
+<img src="fig/02-data-visualisation-rendered-unnamed-chunk-21-1.png" style="display: block; margin: auto;" />
+
+The NAs still look weird, but it's definitely better, I think. 
+
+::::::::::::::::::::::::::::::::::::: challenge 
+
+## Challenge 7
+To the plot we just made before, try adding another variable to facet by. For instance, facet by species and island.
+
+:::::::::::::::::::::::::::::::::::::::: hint
+Add another facet variable with the `+`
+:::::::::::::::::::::::::::::::::::::::: 
+
+:::::::::::::::::::::::::::::::::::::::: solution
+
+## Solution 
+
+
+``` r
+ggplot(penguins, 
+      aes(x = bill_depth_mm, 
+          y = bill_length_mm,
+          colour = sex)) +
+  geom_point(alpha = 0.5) +
+  geom_smooth(method = "lm") +
+  facet_wrap(~ species + island)
+`geom_smooth()` using formula = 'y ~ x'
+Warning: Removed 2 rows containing non-finite outside the scale range
+(`stat_smooth()`).
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).
+```
+
+<img src="fig/02-data-visualisation-rendered-unnamed-chunk-22-1.png" style="display: block; margin: auto;" />
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
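+
+The text above mentions `facet_grid()` but only demonstrates `facet_wrap()`. As a minimal sketch, the block below uses a small made-up data frame called `toy` (a hypothetical stand-in for `penguins`, with illustrative values only) to show how a two-sided formula puts one variable on the rows of panels and the other on the columns.
+
+``` r
+library(ggplot2)
+
+# Small made-up data set standing in for `penguins` (values are illustrative only)
+toy <- data.frame(
+  species = rep(c("Adelie", "Gentoo"), each = 4),
+  sex = rep(c("female", "male"), times = 4),
+  bill_depth_mm = c(17.4, 18.7, 18.0, 19.3, 14.2, 15.5, 13.9, 15.8),
+  bill_length_mm = c(36.7, 39.1, 37.8, 40.3, 45.2, 49.1, 44.8, 50.0)
+)
+
+# Left of the tilde: variable for the rows of panels (sex)
+# Right of the tilde: variable for the columns of panels (species)
+p <- ggplot(toy, aes(x = bill_depth_mm, y = bill_length_mm)) +
+  geom_point() +
+  facet_grid(sex ~ species)
+
+p
+```
+
+Compared with `facet_wrap(~ species + island)`, which lays the panels out one after another, `facet_grid()` always produces a full rows-by-columns grid, so combinations without data would show up as empty panels.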
+
+## Wrap-up
+
+We learned about different parameters of ggplot functions, and how to combine different geoms into more complex charts.
+
diff --git a/03-data-subsetting.md b/03-data-subsetting.md
new file mode 100644
index 0000000..40c3875
--- /dev/null
+++ b/03-data-subsetting.md
@@ -0,0 +1,876 @@
+---
+title: "Subsetting data with dplyr"
+teaching: 60
+exercises: 12
+---
+
+
+:::::::::::::::::::::::::::::::::::::: questions 
+
+- How can I subset the number of columns in my data set?
+- How can I reduce the number of rows in my data set?
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::: objectives
+
+- Use `select()` to reduce columns
+- Use tidyselectors like `starts_with()` within `select()` to reduce columns
+- Use `filter()` to reduce rows
+- Understand common logical operations using `filter()`
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::: keypoints
+
+- Subsetting rows and columns
+- Using tidyselectors
+- Understanding logical operations
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+## Motivation
+
+In many cases, we are working with data sets that contain more data than we need, or we want to inspect certain parts of the data set before we continue.
+Subsetting data sets can be challenging in base R, because there is a fair bit of repetition. 
+This can make code difficult to read and understand.
+
+## The {dplyr} package
+
+The [{dplyr}](https://cran.r-project.org/web/packages/dplyr/index.html) package provides a number of very useful functions for manipulating data sets in a way that will reduce the probability of making errors, and even save you some typing time. As an added bonus, you might even find the {dplyr} grammar easier to read.
+
+We're going to cover 6 of the most commonly used functions as well as using pipes (`|>`) to combine them.
+
+1. `select()` (covered in this session)
+2. `filter()` (covered in this session)
+3. `arrange()` (covered in this session)
+4. `mutate()` (covered in next session)
+5. `group_by()` (covered in Day 2 session)
+6. `summarize()` (covered in Day 2 session)
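+
+As a small preview of how these functions combine with the pipe, here is a minimal sketch. The data frame `toy` and its values are made up for illustration (they are not the real `penguins` data), and each function is explained properly later in the lesson.
+
+``` r
+library(dplyr)
+
+# Hypothetical mini data set, for illustration only
+toy <- tibble(
+  species = c("Adelie", "Adelie", "Gentoo", "Chinstrap"),
+  body_mass_g = c(3750, 2900, 5000, 2700),
+  year = c(2007, 2008, 2008, 2009)
+)
+
+# Read the pipe as "and then": take toy, and then select two columns,
+# and then keep the light penguins, and then sort them by body mass
+light <- toy |>
+  select(species, body_mass_g) |>
+  filter(body_mass_g < 3000) |>
+  arrange(body_mass_g)
+
+light
+```
+
+Each step receives the result of the previous step as its first argument, which is why the data set is only named once at the top.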
+
+
+## Selecting columns
+
+Let us first talk about selecting columns. In {dplyr}, the function for selecting columns is `select()`! Most {tidyverse} function names are inspired by English grammar, which will help us when we are writing our code.
+
+<img src="fig/03-selecting.gif" style="display: block; margin: auto;" />
+
+We first need to make sure we have the tidyverse loaded and the penguins data set at hand.
+
+``` r
+library(tidyverse)
+penguins <- read_csv("data/penguins.csv")
+```
+
+To select data, we must first tell `select()` which data set we are selecting from, and then give it our selection. Here, we are asking R to `select()` from the `penguins` data set the `island`, `species` and `sex` columns.
+
+
+``` r
+select(penguins, island, species, sex)
+```
+
+``` output
+# A tibble: 344 × 3
+   island    species sex   
+   <fct>     <fct>   <fct> 
+ 1 Torgersen Adelie  male  
+ 2 Torgersen Adelie  female
+ 3 Torgersen Adelie  female
+ 4 Torgersen Adelie  <NA>  
+ 5 Torgersen Adelie  female
+ 6 Torgersen Adelie  male  
+ 7 Torgersen Adelie  female
+ 8 Torgersen Adelie  male  
+ 9 Torgersen Adelie  <NA>  
+10 Torgersen Adelie  <NA>  
+# ℹ 334 more rows
+```
+
+When we use `select()` we don't need to use quotation marks; we write the names in directly. We can also use the numeric indexes of the columns, if we are 100% certain of their order:
+
+
+``` r
+select(penguins, 1:3, 6)
+```
+
+``` output
+# A tibble: 344 × 4
+   species island    bill_length_mm body_mass_g
+   <fct>   <fct>              <dbl>       <int>
+ 1 Adelie  Torgersen           39.1        3750
+ 2 Adelie  Torgersen           39.5        3800
+ 3 Adelie  Torgersen           40.3        3250
+ 4 Adelie  Torgersen           NA            NA
+ 5 Adelie  Torgersen           36.7        3450
+ 6 Adelie  Torgersen           39.3        3650
+ 7 Adelie  Torgersen           38.9        3625
+ 8 Adelie  Torgersen           39.2        4675
+ 9 Adelie  Torgersen           34.1        3475
+10 Adelie  Torgersen           42          4250
+# ℹ 334 more rows
+```
+
+In some cases, we want to remove columns, and not necessarily state all columns we want to keep. 
+`select()` also allows for this by adding a minus (`-`) sign in front of the column name you don't want.
+
+
+``` r
+select(penguins, -bill_length_mm, -bill_depth_mm)
+```
+
+``` output
+# A tibble: 344 × 6
+   species island    flipper_length_mm body_mass_g sex     year
+   <fct>   <fct>                 <int>       <int> <fct>  <int>
+ 1 Adelie  Torgersen               181        3750 male    2007
+ 2 Adelie  Torgersen               186        3800 female  2007
+ 3 Adelie  Torgersen               195        3250 female  2007
+ 4 Adelie  Torgersen                NA          NA <NA>    2007
+ 5 Adelie  Torgersen               193        3450 female  2007
+ 6 Adelie  Torgersen               190        3650 male    2007
+ 7 Adelie  Torgersen               181        3625 female  2007
+ 8 Adelie  Torgersen               195        4675 male    2007
+ 9 Adelie  Torgersen               193        3475 <NA>    2007
+10 Adelie  Torgersen               190        4250 <NA>    2007
+# ℹ 334 more rows
+```
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 1
+Select the columns sex, year, and species from the penguins dataset.
+
+:::::::::::::::::::::::::::::::::::::::: solution 
+## Solution
+
+
+``` r
+select(penguins, sex, year, species)
+```
+
+``` output
+# A tibble: 344 × 3
+   sex     year species
+   <fct>  <int> <fct>  
+ 1 male    2007 Adelie 
+ 2 female  2007 Adelie 
+ 3 female  2007 Adelie 
+ 4 <NA>    2007 Adelie 
+ 5 female  2007 Adelie 
+ 6 male    2007 Adelie 
+ 7 female  2007 Adelie 
+ 8 male    2007 Adelie 
+ 9 <NA>    2007 Adelie 
+10 <NA>    2007 Adelie 
+# ℹ 334 more rows
+```
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+::::::::::::::::::::::::::::::::::::: challenge 
+
+## Challenge 2
+Change your selection so that species comes before sex. What is the difference in the output?
+
+:::::::::::::::::::::::::::::::::::::::: solution 
+## Solution
+
+
+``` r
+select(penguins, species, sex, year)
+```
+
+``` output
+# A tibble: 344 × 3
+   species sex     year
+   <fct>   <fct>  <int>
+ 1 Adelie  male    2007
+ 2 Adelie  female  2007
+ 3 Adelie  female  2007
+ 4 Adelie  <NA>    2007
+ 5 Adelie  female  2007
+ 6 Adelie  male    2007
+ 7 Adelie  female  2007
+ 8 Adelie  male    2007
+ 9 Adelie  <NA>    2007
+10 Adelie  <NA>    2007
+# ℹ 334 more rows
+```
+`select()` not only subsets columns, it can also re-arrange them. The columns appear in the order in which your selection is specified.
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+### Tidy selections
+
+These selections are quite convenient and fast! But they can be even better. 
+
+For instance, what if we want to choose all the columns with millimeter measurements? That could be quite convenient, making sure the variables we are working with have the same measurement scale.
+
+We could of course type them all out, but the penguins data set has names that make it even easier for us, using something called tidy-selectors.
+
+Here, we use a tidy-selector `ends_with()`. Can you guess what it does? Yes, it looks for columns that end with the string you provide it, here `"mm"`.
+
+
+``` r
+select(penguins, ends_with("mm"))
+```
+
+``` output
+# A tibble: 344 × 3
+   bill_length_mm bill_depth_mm flipper_length_mm
+            <dbl>         <dbl>             <int>
+ 1           39.1          18.7               181
+ 2           39.5          17.4               186
+ 3           40.3          18                 195
+ 4           NA            NA                  NA
+ 5           36.7          19.3               193
+ 6           39.3          20.6               190
+ 7           38.9          17.8               181
+ 8           39.2          19.6               195
+ 9           34.1          18.1               193
+10           42            20.2               190
+# ℹ 334 more rows
+```
+
+So convenient! There are several other tidy-selectors you can choose, [which you can find here](https://dplyr.tidyverse.org/reference/select.html), but often people resort to three specific ones:
+
+- `ends_with()` - column names ending with a character string  
+- `starts_with()` - column names starting with a character string  
+- `contains()` - column names containing a character string 
+
+If you are working with a well named data set, these functions should make your data selecting much simpler. And if you are making your own data, you can think of such convenient naming for your data, so your work can be easier for you and others.
+
+Let's pick only the measurements of the bill, as we are not so interested in the flipper. Then we might want to change to `starts_with()` instead.
+
+
+``` r
+select(penguins, starts_with("bill"))
+```
+
+``` output
+# A tibble: 344 × 2
+   bill_length_mm bill_depth_mm
+            <dbl>         <dbl>
+ 1           39.1          18.7
+ 2           39.5          17.4
+ 3           40.3          18  
+ 4           NA            NA  
+ 5           36.7          19.3
+ 6           39.3          20.6
+ 7           38.9          17.8
+ 8           39.2          19.6
+ 9           34.1          18.1
+10           42            20.2
+# ℹ 334 more rows
+```
+
+Tidy selectors can be combined with each other and with other selectors, so you can build exactly the data you want!
+
+
+``` r
+select(penguins, island, species, year, starts_with("bill"))
+```
+
+``` output
+# A tibble: 344 × 5
+   island    species  year bill_length_mm bill_depth_mm
+   <fct>     <fct>   <int>          <dbl>         <dbl>
+ 1 Torgersen Adelie   2007           39.1          18.7
+ 2 Torgersen Adelie   2007           39.5          17.4
+ 3 Torgersen Adelie   2007           40.3          18  
+ 4 Torgersen Adelie   2007           NA            NA  
+ 5 Torgersen Adelie   2007           36.7          19.3
+ 6 Torgersen Adelie   2007           39.3          20.6
+ 7 Torgersen Adelie   2007           38.9          17.8
+ 8 Torgersen Adelie   2007           39.2          19.6
+ 9 Torgersen Adelie   2007           34.1          18.1
+10 Torgersen Adelie   2007           42            20.2
+# ℹ 334 more rows
+```
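+
+Besides listing selectors one after another, tidy selectors can also be combined with the Boolean operators `&` (and), `|` (or) and `!` (not). The sketch below illustrates this on a small made-up data frame `toy` (hypothetical values, with column names borrowed from `penguins`).
+
+``` r
+library(dplyr)
+
+# Hypothetical mini data set, for illustration only
+toy <- tibble(
+  species = c("Adelie", "Gentoo"),
+  bill_length_mm = c(39.1, 46.1),
+  bill_depth_mm = c(18.7, 13.2),
+  flipper_length_mm = c(181, 211),
+  body_mass_g = c(3750, 4500)
+)
+
+# `&` keeps only columns matched by both selectors
+mm_lengths <- select(toy, ends_with("mm") & contains("length"))
+names(mm_lengths)
+
+# `!` negates a selector: keep everything that does not end with "mm"
+not_mm <- select(toy, !ends_with("mm"))
+names(not_mm)
+```
+
+The first selection keeps `bill_length_mm` and `flipper_length_mm`; the second keeps `species` and `body_mass_g`.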
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 3
+Select all columns containing an underscore ("_").
+
+:::::::::::::::::::::::::::::::::::::::: solution
+## Solution 
+
+
+``` r
+select(penguins, contains("_"))
+```
+
+``` output
+# A tibble: 344 × 4
+   bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+            <dbl>         <dbl>             <int>       <int>
+ 1           39.1          18.7               181        3750
+ 2           39.5          17.4               186        3800
+ 3           40.3          18                 195        3250
+ 4           NA            NA                  NA          NA
+ 5           36.7          19.3               193        3450
+ 6           39.3          20.6               190        3650
+ 7           38.9          17.8               181        3625
+ 8           39.2          19.6               195        4675
+ 9           34.1          18.1               193        3475
+10           42            20.2               190        4250
+# ℹ 334 more rows
+```
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+::::::::::::::::::::::::::::::::::::: challenge 
+
+## Challenge 4
+Select the species and sex columns, in addition to all columns ending with "mm"
+
+:::::::::::::::::::::::::::::::::::::::: solution 
+
+## Solution 
+
+
+``` r
+select(penguins, species, sex, ends_with("mm"))
+```
+
+``` output
+# A tibble: 344 × 5
+   species sex    bill_length_mm bill_depth_mm flipper_length_mm
+   <fct>   <fct>           <dbl>         <dbl>             <int>
+ 1 Adelie  male             39.1          18.7               181
+ 2 Adelie  female           39.5          17.4               186
+ 3 Adelie  female           40.3          18                 195
+ 4 Adelie  <NA>             NA            NA                  NA
+ 5 Adelie  female           36.7          19.3               193
+ 6 Adelie  male             39.3          20.6               190
+ 7 Adelie  female           38.9          17.8               181
+ 8 Adelie  male             39.2          19.6               195
+ 9 Adelie  <NA>             34.1          18.1               193
+10 Adelie  <NA>             42            20.2               190
+# ℹ 334 more rows
+```
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 5
+De-select all the columns with bill measurements
+
+:::::::::::::::::::::::::::::::::::::::: solution 
+
+## Solution 
+
+
+``` r
+select(penguins, -starts_with("bill"))
+```
+
+``` output
+# A tibble: 344 × 6
+   species island    flipper_length_mm body_mass_g sex     year
+   <fct>   <fct>                 <int>       <int> <fct>  <int>
+ 1 Adelie  Torgersen               181        3750 male    2007
+ 2 Adelie  Torgersen               186        3800 female  2007
+ 3 Adelie  Torgersen               195        3250 female  2007
+ 4 Adelie  Torgersen                NA          NA <NA>    2007
+ 5 Adelie  Torgersen               193        3450 female  2007
+ 6 Adelie  Torgersen               190        3650 male    2007
+ 7 Adelie  Torgersen               181        3625 female  2007
+ 8 Adelie  Torgersen               195        4675 male    2007
+ 9 Adelie  Torgersen               193        3475 <NA>    2007
+10 Adelie  Torgersen               190        4250 <NA>    2007
+# ℹ 334 more rows
+```
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+### Tidy selections with `where`
+
+The last tidy-selector we'll mention is `where()`. `where()` is a very special tidy selector, that uses logical evaluations to select the data. Let's have a look at it in action, and see if we can explain it better that way.
+
+Say you are running a correlation analysis. For correlations, you need all the columns in your data to be numeric, as you cannot correlate strings or categories. Going through each individual column and seeing if it is numeric is a bit of a chore. That is where `where()` comes in!
+
+
+``` r
+select(penguins, where(is.numeric))
+```
+
+``` output
+# A tibble: 344 × 5
+   bill_length_mm bill_depth_mm flipper_length_mm body_mass_g  year
+            <dbl>         <dbl>             <int>       <int> <int>
+ 1           39.1          18.7               181        3750  2007
+ 2           39.5          17.4               186        3800  2007
+ 3           40.3          18                 195        3250  2007
+ 4           NA            NA                  NA          NA  2007
+ 5           36.7          19.3               193        3450  2007
+ 6           39.3          20.6               190        3650  2007
+ 7           38.9          17.8               181        3625  2007
+ 8           39.2          19.6               195        4675  2007
+ 9           34.1          18.1               193        3475  2007
+10           42            20.2               190        4250  2007
+# ℹ 334 more rows
+```
+
+Magic! Let's break that down. 
+`is.numeric()` is a function in R that checks if a vector is numeric. If the vector is numeric, it returns `TRUE`; if not, it returns `FALSE`.
+
+
+``` r
+is.numeric(5)
+```
+
+``` output
+[1] TRUE
+```
+
+``` r
+is.numeric("something")
+```
+
+``` output
+[1] FALSE
+```
+
+Let us look at the penguins data set again.
+
+``` r
+penguins
+```
+
+``` output
+# A tibble: 344 × 8
+   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
+ 1 Adelie  Torgersen           39.1          18.7               181        3750
+ 2 Adelie  Torgersen           39.5          17.4               186        3800
+ 3 Adelie  Torgersen           40.3          18                 195        3250
+ 4 Adelie  Torgersen           NA            NA                  NA          NA
+ 5 Adelie  Torgersen           36.7          19.3               193        3450
+ 6 Adelie  Torgersen           39.3          20.6               190        3650
+ 7 Adelie  Torgersen           38.9          17.8               181        3625
+ 8 Adelie  Torgersen           39.2          19.6               195        4675
+ 9 Adelie  Torgersen           34.1          18.1               193        3475
+10 Adelie  Torgersen           42            20.2               190        4250
+# ℹ 334 more rows
+# ℹ 2 more variables: sex <fct>, year <int>
+```
+
+The penguins data is stored as a `tibble`, which is a special kind of data set in R that gives a nice print out of the data.
+Notice, right below the column name, there is some information in `<>` marks. This tells us the class of the columns. 
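+`species` and `island` are factors (`<fct>`), while the bill columns are "double" (`<dbl>`), which is a decimal numeric class. Columns such as `body_mass_g` are integers (`<int>`), which are also numeric. 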
+
+`where()` goes through all the columns and checks if they are numeric, and returns the ones that are. 
+
+
+``` r
+select(penguins, where(is.numeric))
+```
+
+``` output
+# A tibble: 344 × 5
+   bill_length_mm bill_depth_mm flipper_length_mm body_mass_g  year
+            <dbl>         <dbl>             <int>       <int> <int>
+ 1           39.1          18.7               181        3750  2007
+ 2           39.5          17.4               186        3800  2007
+ 3           40.3          18                 195        3250  2007
+ 4           NA            NA                  NA          NA  2007
+ 5           36.7          19.3               193        3450  2007
+ 6           39.3          20.6               190        3650  2007
+ 7           38.9          17.8               181        3625  2007
+ 8           39.2          19.6               195        4675  2007
+ 9           34.1          18.1               193        3475  2007
+10           42            20.2               190        4250  2007
+# ℹ 334 more rows
+```
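+
+To make the "goes through all the columns" step more concrete, here is a minimal sketch that applies the check to each column by hand. The data frame `toy` is made up for illustration, and `sapply()` is a base R function used here only to mimic what `where()` does for us. This is not how `where()` is implemented internally.
+
+``` r
+library(dplyr)
+
+# Hypothetical mini data set, for illustration only
+toy <- tibble(
+  species = factor(c("Adelie", "Gentoo")),
+  bill_length_mm = c(39.1, 46.1),
+  year = c(2007L, 2008L)
+)
+
+# Apply is.numeric() to every column: one TRUE/FALSE per column
+is_num <- sapply(toy, is.numeric)
+is_num
+
+# Keeping the TRUE columns by hand gives the same column names
+# as letting where() do the work inside select()
+by_hand <- names(toy)[is_num]
+with_where <- names(select(toy, where(is.numeric)))
+
+by_hand
+with_where
+```
+
+Note that we pass `is.numeric` without parentheses: we are handing the function itself to `where()`, which then calls it on each column.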
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 6
+Select only the columns that are factors from the `penguins` data set.
+
+:::::::::::::::::::::::::::::::::::::::: solution 
+## Solution 
+
+
+``` r
+select(penguins, where(is.factor))
+```
+
+``` output
+# A tibble: 344 × 3
+   species island    sex   
+   <fct>   <fct>     <fct> 
+ 1 Adelie  Torgersen male  
+ 2 Adelie  Torgersen female
+ 3 Adelie  Torgersen female
+ 4 Adelie  Torgersen <NA>  
+ 5 Adelie  Torgersen female
+ 6 Adelie  Torgersen male  
+ 7 Adelie  Torgersen female
+ 8 Adelie  Torgersen male  
+ 9 Adelie  Torgersen <NA>  
+10 Adelie  Torgersen <NA>  
+# ℹ 334 more rows
+```
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+::::::::::::::::::::::::::::::::::::: challenge 
+
+## Challenge 7
+Select the columns `island`, `species`, as well as all numeric columns from the `penguins` data set.
+
+:::::::::::::::::::::::::::::::::::::::: solution
+## Solution 
+
+
+``` r
+select(penguins, island, species, where(is.numeric))
+```
+
+``` output
+# A tibble: 344 × 7
+   island    species bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+   <fct>     <fct>            <dbl>         <dbl>             <int>       <int>
+ 1 Torgersen Adelie            39.1          18.7               181        3750
+ 2 Torgersen Adelie            39.5          17.4               186        3800
+ 3 Torgersen Adelie            40.3          18                 195        3250
+ 4 Torgersen Adelie            NA            NA                  NA          NA
+ 5 Torgersen Adelie            36.7          19.3               193        3450
+ 6 Torgersen Adelie            39.3          20.6               190        3650
+ 7 Torgersen Adelie            38.9          17.8               181        3625
+ 8 Torgersen Adelie            39.2          19.6               195        4675
+ 9 Torgersen Adelie            34.1          18.1               193        3475
+10 Torgersen Adelie            42            20.2               190        4250
+# ℹ 334 more rows
+# ℹ 1 more variable: year <int>
+```
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+
+## Filtering rows
+
+Now that we know how to select the columns we want, we should take a look at how we filter the rows. 
+Row filtering is done with the function `filter()`, which takes statements that can be evaluated to `TRUE` or `FALSE`. 
+
+<img src="fig/03-filtering.gif" style="display: block; margin: auto;" />
+
+What do we mean with statements that can be evaluated to `TRUE` or `FALSE`?
+In the example with `where()` we used the `is.numeric` function to evaluate if the columns were numeric or not. We will be doing the same for rows!
+
+Now, using `is.numeric` on a row won't help, because every row-value in a column will be of the same type, that is how the data set works. All values in a column must be of the same type. 
+
+So what can we do? Well, we can check if the values meet certain criteria or not. For example, whether values are above 20, or whether a factor has a specific level. 
+
+
+``` r
+filter(penguins, body_mass_g < 3000)
+```
+
+``` output
+# A tibble: 9 × 8
+  species   island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+  <fct>     <fct>              <dbl>         <dbl>             <int>       <int>
+1 Adelie    Dream               37.5          18.9               179        2975
+2 Adelie    Biscoe              34.5          18.1               187        2900
+3 Adelie    Biscoe              36.5          16.6               181        2850
+4 Adelie    Biscoe              36.4          17.1               184        2850
+5 Adelie    Dream               33.1          16.1               178        2900
+6 Adelie    Biscoe              37.9          18.6               193        2925
+7 Adelie    Torgersen           38.6          17                 188        2900
+8 Chinstrap Dream               43.2          16.6               187        2900
+9 Chinstrap Dream               46.9          16.6               192        2700
+# ℹ 2 more variables: sex <fct>, year <int>
+```
+
+Here, we've filtered so that we only have observations where the body mass was less than 3 kilos (3000 grams). 
+We can also filter for specific values, but beware! You must use double equals (`==`) for comparisons, as a single equals (`=`) is for naming arguments in functions. 
+
+
+``` r
+filter(penguins, body_mass_g == 2900)
+```
+
+``` output
+# A tibble: 4 × 8
+  species   island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+  <fct>     <fct>              <dbl>         <dbl>             <int>       <int>
+1 Adelie    Biscoe              34.5          18.1               187        2900
+2 Adelie    Dream               33.1          16.1               178        2900
+3 Adelie    Torgersen           38.6          17                 188        2900
+4 Chinstrap Dream               43.2          16.6               187        2900
+# ℹ 2 more variables: sex <fct>, year <int>
+```
+
+What is happening, is that R will check if the values in `body_mass_g` are the same as 2900 (`TRUE`) or not (`FALSE`), and will do this for every row in the data set. Then at the end, it will discard all those that are `FALSE`, and keep those that are `TRUE`.
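+
+The sketch below makes this row-by-row check visible on a small made-up data frame `toy` (hypothetical values). It also shows what happens to missing values: a comparison with `NA` gives `NA`, and `filter()` only keeps rows where the result is `TRUE`.
+
+``` r
+library(dplyr)
+
+# Hypothetical mini data set, for illustration only
+toy <- tibble(
+  id = 1:4,
+  body_mass_g = c(2900, 3750, NA, 2900)
+)
+
+# The condition on its own is a logical vector with one value per row
+check <- toy$body_mass_g == 2900
+check
+
+# filter() keeps the rows where the condition is TRUE;
+# rows where it is FALSE or NA are dropped
+kept <- filter(toy, body_mass_g == 2900)
+kept
+```
+
+Here only the rows with `id` 1 and 4 remain.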
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 8
+Filter the data so you only have observations from the "Dream" island.
+
+:::::::::::::::::::::::::::::::::::::::: solution 
+
+## Solution 
+
+
+``` r
+filter(penguins, island == "Dream")
+```
+
+``` output
+# A tibble: 124 × 8
+   species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+   <fct>   <fct>           <dbl>         <dbl>             <int>       <int>
+ 1 Adelie  Dream            39.5          16.7               178        3250
+ 2 Adelie  Dream            37.2          18.1               178        3900
+ 3 Adelie  Dream            39.5          17.8               188        3300
+ 4 Adelie  Dream            40.9          18.9               184        3900
+ 5 Adelie  Dream            36.4          17                 195        3325
+ 6 Adelie  Dream            39.2          21.1               196        4150
+ 7 Adelie  Dream            38.8          20                 190        3950
+ 8 Adelie  Dream            42.2          18.5               180        3550
+ 9 Adelie  Dream            37.6          19.3               181        3300
+10 Adelie  Dream            39.8          19.1               184        4650
+# ℹ 114 more rows
+# ℹ 2 more variables: sex <fct>, year <int>
+```
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 9
+Filter the data so you only have observations from 2008 or later.
+
+:::::::::::::::::::::::::::::::::::::::: solution 
+## Solution 
+
+
+``` r
+filter(penguins, year >= 2008)
+```
+
+``` output
+# A tibble: 234 × 8
+   species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+   <fct>   <fct>           <dbl>         <dbl>             <int>       <int>
+ 1 Adelie  Biscoe           39.6          17.7               186        3500
+ 2 Adelie  Biscoe           40.1          18.9               188        4300
+ 3 Adelie  Biscoe           35            17.9               190        3450
+ 4 Adelie  Biscoe           42            19.5               200        4050
+ 5 Adelie  Biscoe           34.5          18.1               187        2900
+ 6 Adelie  Biscoe           41.4          18.6               191        3700
+ 7 Adelie  Biscoe           39            17.5               186        3550
+ 8 Adelie  Biscoe           40.6          18.8               193        3800
+ 9 Adelie  Biscoe           36.5          16.6               181        2850
+10 Adelie  Biscoe           37.6          19.1               194        3750
+# ℹ 224 more rows
+# ℹ 2 more variables: sex <fct>, year <int>
+```
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+### Multiple filters
+
+Many times, we will want to have several filters applied at once. What if you only want Chinstrap penguins that are below 3 kilos?
+`filter()` can take as many statements as you want! Combine them by adding commas (`,`) between each statement, and that will work as 'and'.
+
+
+``` r
+filter(penguins, 
+       species == "Chinstrap",
+       body_mass_g < 3000)
+```
+
+``` output
+# A tibble: 2 × 8
+  species   island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+  <fct>     <fct>           <dbl>         <dbl>             <int>       <int>
+1 Chinstrap Dream            43.2          16.6               187        2900
+2 Chinstrap Dream            46.9          16.6               192        2700
+# ℹ 2 more variables: sex <fct>, year <int>
+```
+
+You can also use the `&` sign, which in R is the logical operator for 'and', just as `==` is the comparison operator for 'equals'.
+
+``` r
+filter(penguins, 
+       species == "Chinstrap" &
+         body_mass_g < 3000)
+```
+
+``` output
+# A tibble: 2 × 8
+  species   island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+  <fct>     <fct>           <dbl>         <dbl>             <int>       <int>
+1 Chinstrap Dream            43.2          16.6               187        2900
+2 Chinstrap Dream            46.9          16.6               192        2700
+# ℹ 2 more variables: sex <fct>, year <int>
+```
+
+Here we are filtering the penguins data set keeping only the species "Chinstrap" **and** those below 3 kilos.
+And we can keep going!
+
+
+``` r
+filter(penguins, 
+       species == "Chinstrap",
+       body_mass_g < 3000,
+       sex == "male")
+```
+
+``` output
+# A tibble: 0 × 8
+# ℹ 8 variables: species <fct>, island <fct>, bill_length_mm <dbl>,
+#   bill_depth_mm <dbl>, flipper_length_mm <int>, body_mass_g <int>, sex <fct>,
+#   year <int>
+```
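+
+The comma form and the `&` form can also be compared directly. This is a small sketch, assuming {dplyr} and {palmerpenguins} are installed; `with_comma` and `with_and` are illustrative names. If the two forms are equivalent, `all.equal()` should return `TRUE`.
+
+``` r
+library(dplyr)
+library(palmerpenguins)
+
+# Same two conditions, once separated by a comma and once joined with &
+with_comma <- filter(penguins, species == "Chinstrap", body_mass_g < 3000)
+with_and   <- filter(penguins, species == "Chinstrap" & body_mass_g < 3000)
+
+# Compare the two results
+all.equal(with_comma, with_and)
+```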
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 10
+Filter the data so you only have observations from 2008 or later, and from "Biscoe" island.
+
+:::::::::::::::::::::::::::::::::::::::: solution 
+## Solution 
+
+
+``` r
+filter(penguins, 
+       year >= 2008,
+       island == "Biscoe")
+```
+
+``` output
+# A tibble: 124 × 8
+   species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+   <fct>   <fct>           <dbl>         <dbl>             <int>       <int>
+ 1 Adelie  Biscoe           39.6          17.7               186        3500
+ 2 Adelie  Biscoe           40.1          18.9               188        4300
+ 3 Adelie  Biscoe           35            17.9               190        3450
+ 4 Adelie  Biscoe           42            19.5               200        4050
+ 5 Adelie  Biscoe           34.5          18.1               187        2900
+ 6 Adelie  Biscoe           41.4          18.6               191        3700
+ 7 Adelie  Biscoe           39            17.5               186        3550
+ 8 Adelie  Biscoe           40.6          18.8               193        3800
+ 9 Adelie  Biscoe           36.5          16.6               181        2850
+10 Adelie  Biscoe           37.6          19.1               194        3750
+# ℹ 114 more rows
+# ℹ 2 more variables: sex <fct>, year <int>
+```
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 11
+Filter the data so you only have observations of male penguins of the Chinstrap species
+
+:::::::::::::::::::::::::::::::::::::::: solution 
+## Solution 
+
+
+``` r
+filter(penguins, 
+       sex == "male",
+       species == "Chinstrap")
+```
+
+``` output
+# A tibble: 34 × 8
+   species   island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+   <fct>     <fct>           <dbl>         <dbl>             <int>       <int>
+ 1 Chinstrap Dream            50            19.5               196        3900
+ 2 Chinstrap Dream            51.3          19.2               193        3650
+ 3 Chinstrap Dream            52.7          19.8               197        3725
+ 4 Chinstrap Dream            51.3          18.2               197        3750
+ 5 Chinstrap Dream            51.3          19.9               198        3700
+ 6 Chinstrap Dream            51.7          20.3               194        3775
+ 7 Chinstrap Dream            52            18.1               201        4050
+ 8 Chinstrap Dream            50.5          19.6               201        4050
+ 9 Chinstrap Dream            50.3          20                 197        3300
+10 Chinstrap Dream            49.2          18.2               195        4400
+# ℹ 24 more rows
+# ℹ 2 more variables: sex <fct>, year <int>
+```
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+### The difference between `&` (and) and `|` (or)
+
+But what if we want all the Chinstrap penguins **or** any penguin with a body mass below 3 kilos? When we use the comma (or the `&`), we require that all statements are `TRUE`. But what if we want to keep rows where _either_ statement is true? Then we can use the **or** operator `|`.
+
+
+``` r
+filter(penguins, 
+       species == "Chinstrap" | 
+         body_mass_g < 3000)
+```
+
+``` output
+# A tibble: 75 × 8
+   species   island   bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+   <fct>     <fct>             <dbl>         <dbl>             <int>       <int>
+ 1 Adelie    Dream              37.5          18.9               179        2975
+ 2 Adelie    Biscoe             34.5          18.1               187        2900
+ 3 Adelie    Biscoe             36.5          16.6               181        2850
+ 4 Adelie    Biscoe             36.4          17.1               184        2850
+ 5 Adelie    Dream              33.1          16.1               178        2900
+ 6 Adelie    Biscoe             37.9          18.6               193        2925
+ 7 Adelie    Torgers…           38.6          17                 188        2900
+ 8 Chinstrap Dream              46.5          17.9               192        3500
+ 9 Chinstrap Dream              50            19.5               196        3900
+10 Chinstrap Dream              51.3          19.2               193        3650
+# ℹ 65 more rows
+# ℹ 2 more variables: sex <fct>, year <int>
+```
+
+This now gives us both all Chinstrap penguins, and the smallest Adelie penguins!
+By combining AND and OR statements this way, we can slowly create the filtering we are after.
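+
+As a rough sketch of how the two operators differ and combine, assuming {dplyr} and {palmerpenguins} are installed (all object names are illustrative): for the same pair of statements, `|` can never keep fewer rows than `&`. When mixing the two, parentheses make it explicit which statements belong together, since R evaluates `&` before `|`.
+
+``` r
+library(dplyr)
+library(palmerpenguins)
+
+# 'or' keeps rows where at least one statement is TRUE
+either <- filter(penguins, species == "Chinstrap" | body_mass_g < 3000)
+# 'and' keeps rows where both statements are TRUE
+both   <- filter(penguins, species == "Chinstrap" & body_mass_g < 3000)
+
+nrow(either)
+nrow(both)
+
+# Mixing the two: the parentheses group the 'or' part before 'and' is applied
+small_or_chinstrap_females <- filter(
+  penguins,
+  (species == "Chinstrap" | body_mass_g < 3000) & sex == "female"
+)
+nrow(small_or_chinstrap_females)
+```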
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 12
+Filter the data so you only have observations of either male penguins or the Chinstrap species
+
+:::::::::::::::::::::::::::::::::::::::: solution 
+## Solution 
+
+
+``` r
+filter(penguins, 
+       sex == "male" |
+       species == "Chinstrap")
+```
+
+``` output
+# A tibble: 202 × 8
+   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
+ 1 Adelie  Torgersen           39.1          18.7               181        3750
+ 2 Adelie  Torgersen           39.3          20.6               190        3650
+ 3 Adelie  Torgersen           39.2          19.6               195        4675
+ 4 Adelie  Torgersen           38.6          21.2               191        3800
+ 5 Adelie  Torgersen           34.6          21.1               198        4400
+ 6 Adelie  Torgersen           42.5          20.7               197        4500
+ 7 Adelie  Torgersen           46            21.5               194        4200
+ 8 Adelie  Biscoe              37.7          18.7               180        3600
+ 9 Adelie  Biscoe              38.2          18.1               185        3950
+10 Adelie  Biscoe              38.8          17.2               180        3800
+# ℹ 192 more rows
+# ℹ 2 more variables: sex <fct>, year <int>
+```
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+
+# Wrap-up
+Now we've learned about subsetting our data, so we can create data sets that are suited to our needs.
diff --git a/04-data-sorting-pipes.md b/04-data-sorting-pipes.md
new file mode 100644
index 0000000..5bcf77a
--- /dev/null
+++ b/04-data-sorting-pipes.md
@@ -0,0 +1,636 @@
+---
+title: "Data sorting and pipes dplyr"
+teaching: 60
+exercises: 7
+---
+
+
+
+:::::::::::::::::::::::::::::::::::::: questions 
+
+- How can I sort the rows in my data?
+- How can I avoid storing intermediate data objects?
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::: objectives
+
+- Use `arrange()` to sort rows
+- Use the pipe `|>` to chain commands together
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+## Motivation
+
+Getting an overview of our data can be challenging. Breaking it up into smaller pieces can help us get a better understanding of its content.
+Being able to subset data is one part of that; another is being able to re-arrange rows to get a clearer idea of their content.
+
+
+## Creating subsetted objects
+
+So far, we have kept working on the penguins data set without actually altering it. All our actions have been executed, then forgotten by R, as if they never happened. This is actually quite smart, since it makes it harder to make mistakes that are difficult to undo.
+
+To store the changes, we have to "assign" the data to a new object in the R environment, just like the penguins data set, which is already an object in our environment called "penguins".
+
+We will now store a filtered version including only the chinstrap penguins, in an object we call `chinstraps`.
+
+
+``` r
+chinstraps <- filter(penguins, species == "Chinstrap")
+```
+
+You will likely notice that when we execute this command, nothing is output to the console. That is expected. When we assign the output of a function somewhere, and everything works (*i.e.*, no errors or warnings), nothing happens in the console.
+
+But you should be able to see the new chinstraps object in your environment, and when we type `chinstraps` in the R console, it prints our chinstraps data.
+
+
+``` r
+chinstraps
+```
+
+``` output
+# A tibble: 68 × 8
+   species   island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+   <fct>     <fct>           <dbl>         <dbl>             <int>       <int>
+ 1 Chinstrap Dream            46.5          17.9               192        3500
+ 2 Chinstrap Dream            50            19.5               196        3900
+ 3 Chinstrap Dream            51.3          19.2               193        3650
+ 4 Chinstrap Dream            45.4          18.7               188        3525
+ 5 Chinstrap Dream            52.7          19.8               197        3725
+ 6 Chinstrap Dream            45.2          17.8               198        3950
+ 7 Chinstrap Dream            46.1          18.2               178        3250
+ 8 Chinstrap Dream            51.3          18.2               197        3750
+ 9 Chinstrap Dream            46            18.9               195        4150
+10 Chinstrap Dream            51.3          19.9               198        3700
+# ℹ 58 more rows
+# ℹ 2 more variables: sex <fct>, year <int>
+```
+
+Maybe in this chinstrap data we are also not interested in the bill measurements, so we want to remove them.
+
+
+``` r
+chinstraps <- select(chinstraps, -starts_with("bill"))
+chinstraps
+```
+
+``` output
+# A tibble: 68 × 6
+   species   island flipper_length_mm body_mass_g sex     year
+   <fct>     <fct>              <int>       <int> <fct>  <int>
+ 1 Chinstrap Dream                192        3500 female  2007
+ 2 Chinstrap Dream                196        3900 male    2007
+ 3 Chinstrap Dream                193        3650 male    2007
+ 4 Chinstrap Dream                188        3525 female  2007
+ 5 Chinstrap Dream                197        3725 male    2007
+ 6 Chinstrap Dream                198        3950 female  2007
+ 7 Chinstrap Dream                178        3250 female  2007
+ 8 Chinstrap Dream                197        3750 male    2007
+ 9 Chinstrap Dream                195        4150 female  2007
+10 Chinstrap Dream                198        3700 male    2007
+# ℹ 58 more rows
+```
+Now our data has two fewer columns, and many fewer rows. This gives us a simpler data set to work with. But assigning the chinstrap data twice like this is a lot of typing, and there is a simpler way, using something we call the "pipe".
+
+::::::::::::::::::::::::::::::::::::: challenge 
+
+## Challenge 1
+Create a new data set called "biscoe", where you only have data from "Biscoe" island, and where you only have the first 4 columns of data.
+
+:::::::::::::::::::::::::::::::::::::::: solution 
+## Solution
+
+
+``` r
+biscoe <- filter(penguins, island == "Biscoe")
+biscoe <- select(biscoe, 1:4)
+```
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+## The pipe `|>`
+
+We often want to string together a series of functions. This is achieved using the pipe operator `|>`, which takes the value on the left and passes it as the first argument to the function call on the right.
+
+`|>` is not limited to {dplyr} functions. It's an alternative way of writing any R code.
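+
+To illustrate with base R functions only (no packages are assumed, but R 4.1.0 or newer is needed for `|>`), the two expressions below are two ways of writing the same calculation. Both take the square roots of 1, 4 and 9 and add them up.
+
+``` r
+# Nested form: read from the inside out
+sum(sqrt(c(1, 4, 9)))
+
+# Piped form: read from left to right, "and then"
+c(1, 4, 9) |> sqrt() |> sum()
+```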
+
+The native pipe `|>` is part of R itself from version 4.1.0. To make the RStudio keyboard shortcut insert it, go to Tools -> Global options -> Code -> Use native pipe operator.
+
+The shortcut to insert the pipe operator is `Ctrl`+`Shift`+`M` for Windows/Linux, and `Cmd`+`Shift`+`M` for Mac.
+
+In the `chinstraps` example, we had the following code to filter the rows and then select our columns.
+
+
+``` r
+chinstraps <- filter(penguins, species == "Chinstrap")
+chinstraps <- select(chinstraps, -starts_with("bill"))
+```
+
+Here we first create the chinstraps data from the filtered penguins data set. Then we use that chinstraps data to reduce the columns and write the result back to the same chinstraps object.
+It's a little messy. With the pipe, we can make it more streamlined.
+
+::::::::::::::::::: instructor
+When reading this part, read it as follows when typing:
+
+> assign to the "chinstraps" object, 
+> taking the penguins dataset, and then
+> filtering the species column so we only have Chinstraps, and then
+> selecting away all columns that start with the string "bill"
+
+::::::::::::::::::::::::::::::
+
+
+``` r
+chinstraps <- penguins |> 
+  filter(species == "Chinstrap") |> 
+  select(-starts_with("bill"))
+```
+
+
+The end result is the same, but there is less typing and we can "read" the pipeline of data subsetting more like language, if we know how. You can read the pipe operator as **"and then"**. 
+
+So if we translate the code above to human language we could read it as:
+
+> take the penguins data set, and then
+> keep only rows for the chinstrap penguins, and then
+> remove the columns starting with bill
+> and assign the end result to chinstraps.
+
+Learning to read pipes is a great skill. R is not the only programming language that can do this: the operator differs between languages, but the functionality exists in many.
+
+We can do the entire pipe chain step by step to see what is happening. 
+
+
+``` r
+penguins
+```
+
+``` output
+# A tibble: 344 × 8
+   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
+ 1 Adelie  Torgersen           39.1          18.7               181        3750
+ 2 Adelie  Torgersen           39.5          17.4               186        3800
+ 3 Adelie  Torgersen           40.3          18                 195        3250
+ 4 Adelie  Torgersen           NA            NA                  NA          NA
+ 5 Adelie  Torgersen           36.7          19.3               193        3450
+ 6 Adelie  Torgersen           39.3          20.6               190        3650
+ 7 Adelie  Torgersen           38.9          17.8               181        3625
+ 8 Adelie  Torgersen           39.2          19.6               195        4675
+ 9 Adelie  Torgersen           34.1          18.1               193        3475
+10 Adelie  Torgersen           42            20.2               190        4250
+# ℹ 334 more rows
+# ℹ 2 more variables: sex <fct>, year <int>
+```
+
+::::::::::::::::::: instructor
+When reading this part, read it as follows when typing:
+
+> taking the penguins dataset
+
+::::::::::::::::::::::::::::::
+
+
+``` r
+penguins |> 
+  filter(species == "Chinstrap")
+```
+
+``` output
+# A tibble: 68 × 8
+   species   island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+   <fct>     <fct>           <dbl>         <dbl>             <int>       <int>
+ 1 Chinstrap Dream            46.5          17.9               192        3500
+ 2 Chinstrap Dream            50            19.5               196        3900
+ 3 Chinstrap Dream            51.3          19.2               193        3650
+ 4 Chinstrap Dream            45.4          18.7               188        3525
+ 5 Chinstrap Dream            52.7          19.8               197        3725
+ 6 Chinstrap Dream            45.2          17.8               198        3950
+ 7 Chinstrap Dream            46.1          18.2               178        3250
+ 8 Chinstrap Dream            51.3          18.2               197        3750
+ 9 Chinstrap Dream            46            18.9               195        4150
+10 Chinstrap Dream            51.3          19.9               198        3700
+# ℹ 58 more rows
+# ℹ 2 more variables: sex <fct>, year <int>
+```
+
+::::::::::::::::::: instructor
+When reading this part, read it as follows when typing:
+
+> taking the penguins dataset, and then
+> filtering the species column so we only have Chinstraps
+
+::::::::::::::::::::::::::::::
+
+
+``` r
+penguins |> 
+  filter(species == "Chinstrap") |> 
+  select(-starts_with("bill"))
+```
+
+``` output
+# A tibble: 68 × 6
+   species   island flipper_length_mm body_mass_g sex     year
+   <fct>     <fct>              <int>       <int> <fct>  <int>
+ 1 Chinstrap Dream                192        3500 female  2007
+ 2 Chinstrap Dream                196        3900 male    2007
+ 3 Chinstrap Dream                193        3650 male    2007
+ 4 Chinstrap Dream                188        3525 female  2007
+ 5 Chinstrap Dream                197        3725 male    2007
+ 6 Chinstrap Dream                198        3950 female  2007
+ 7 Chinstrap Dream                178        3250 female  2007
+ 8 Chinstrap Dream                197        3750 male    2007
+ 9 Chinstrap Dream                195        4150 female  2007
+10 Chinstrap Dream                198        3700 male    2007
+# ℹ 58 more rows
+```
+
+::::::::::::::::::: instructor
+When reading this part, read it as follows when typing:
+
+> taking the penguins dataset, and then
+> filtering the species column so we only have Chinstraps, and then
+> selecting away all columns that start with the string "bill"
+
+::::::::::::::::::::::::::::::
+
+So, for each chain step, the output of the previous step is fed into the next step, and that way the commands build on each other until a final end result is made.
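+
+One way to convince yourself of this is to write the same chain as nested function calls, where the result of the inner call is passed to the outer one. A sketch, assuming {dplyr} and {palmerpenguins} are installed; `nested` and `piped` are illustrative names.
+
+``` r
+library(dplyr)
+library(palmerpenguins)
+
+# Nested: filter() runs first, and its output becomes select()'s first argument
+nested <- select(filter(penguins, species == "Chinstrap"), -starts_with("bill"))
+
+# Piped: the same two steps, written in the order they happen
+piped <- penguins |> 
+  filter(species == "Chinstrap") |> 
+  select(-starts_with("bill"))
+
+# Compare the two results
+all.equal(nested, piped)
+```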
+
+And as before, we are still seeing the output of the command chain in the console, meaning we are not storing it.
+Let us do that, again using the assignment.
+
+
+``` r
+chinstraps <- penguins |> 
+  filter(species == "Chinstrap") |> 
+  select(-starts_with("bill"))
+
+chinstraps
+```
+
+``` output
+# A tibble: 68 × 6
+   species   island flipper_length_mm body_mass_g sex     year
+   <fct>     <fct>              <int>       <int> <fct>  <int>
+ 1 Chinstrap Dream                192        3500 female  2007
+ 2 Chinstrap Dream                196        3900 male    2007
+ 3 Chinstrap Dream                193        3650 male    2007
+ 4 Chinstrap Dream                188        3525 female  2007
+ 5 Chinstrap Dream                197        3725 male    2007
+ 6 Chinstrap Dream                198        3950 female  2007
+ 7 Chinstrap Dream                178        3250 female  2007
+ 8 Chinstrap Dream                197        3750 male    2007
+ 9 Chinstrap Dream                195        4150 female  2007
+10 Chinstrap Dream                198        3700 male    2007
+# ℹ 58 more rows
+```
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 2
+Create a new data set called "biscoe", where you only have data from "Biscoe" island, and where you only have the first 4 columns of data. This time use the pipe.
+
+:::::::::::::::::::::::::::::::::::::::: solution
+## Solution
+
+
+``` r
+biscoe <- penguins |> 
+  filter(island == "Biscoe") |> 
+  select(1:4)
+
+biscoe
+```
+
+``` output
+# A tibble: 168 × 4
+   species island bill_length_mm bill_depth_mm
+   <fct>   <fct>           <dbl>         <dbl>
+ 1 Adelie  Biscoe           37.8          18.3
+ 2 Adelie  Biscoe           37.7          18.7
+ 3 Adelie  Biscoe           35.9          19.2
+ 4 Adelie  Biscoe           38.2          18.1
+ 5 Adelie  Biscoe           38.8          17.2
+ 6 Adelie  Biscoe           35.3          18.9
+ 7 Adelie  Biscoe           40.6          18.6
+ 8 Adelie  Biscoe           40.5          17.9
+ 9 Adelie  Biscoe           37.9          18.6
+10 Adelie  Biscoe           40.5          18.9
+# ℹ 158 more rows
+```
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+## Sorting rows
+So far, we have looked at subsetting the data. But sometimes, we want to reorganize the data without altering it. In tables, we are used to being able to sort columns in ascending or descending order.
+ 
+This can also be done with {dplyr}'s `arrange()` function. `arrange()` does not alter the data *per se*, just the order in which the rows are stored.
+
+::::::::::::::::::: instructor
+When reading this part, read it as follows when typing:
+
+> taking the penguins dataset, and then
+> arranging the rows by the island column
+
+::::::::::::::::::::::::::::::
+
+
+``` r
+penguins |> 
+  arrange(island)
+```
+
+``` output
+# A tibble: 344 × 8
+   species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+   <fct>   <fct>           <dbl>         <dbl>             <int>       <int>
+ 1 Adelie  Biscoe           37.8          18.3               174        3400
+ 2 Adelie  Biscoe           37.7          18.7               180        3600
+ 3 Adelie  Biscoe           35.9          19.2               189        3800
+ 4 Adelie  Biscoe           38.2          18.1               185        3950
+ 5 Adelie  Biscoe           38.8          17.2               180        3800
+ 6 Adelie  Biscoe           35.3          18.9               187        3800
+ 7 Adelie  Biscoe           40.6          18.6               183        3550
+ 8 Adelie  Biscoe           40.5          17.9               187        3200
+ 9 Adelie  Biscoe           37.9          18.6               172        3150
+10 Adelie  Biscoe           40.5          18.9               180        3950
+# ℹ 334 more rows
+# ℹ 2 more variables: sex <fct>, year <int>
+```
+
+Here we have sorted the data by the island column. Since island is a factor, it will order by the factor levels, which in this case has Biscoe island as the first category.
+If we sort a numeric column, it will sort by numeric value.
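+
+To see the order in which a factor column will be sorted, you can inspect its levels. A short sketch, assuming {palmerpenguins} is installed:
+
+``` r
+library(palmerpenguins)
+
+# Factor levels are stored in a fixed order, and sorting follows that order
+levels(penguins$island)
+```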
+
+By default, `arrange()` sorts in ascending order. If you want it sorted in descending order, wrap the column name in `desc()`.
+
+::::::::::::::::::: instructor
+When reading this part, read it as follows when typing:
+
+> taking the penguins dataset, and then
+> arranging the rows by the island column in descending order
+
+::::::::::::::::::::::::::::::
+
+
+``` r
+penguins |> 
+  arrange(desc(island))
+```
+
+``` output
+# A tibble: 344 × 8
+   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
+ 1 Adelie  Torgersen           39.1          18.7               181        3750
+ 2 Adelie  Torgersen           39.5          17.4               186        3800
+ 3 Adelie  Torgersen           40.3          18                 195        3250
+ 4 Adelie  Torgersen           NA            NA                  NA          NA
+ 5 Adelie  Torgersen           36.7          19.3               193        3450
+ 6 Adelie  Torgersen           39.3          20.6               190        3650
+ 7 Adelie  Torgersen           38.9          17.8               181        3625
+ 8 Adelie  Torgersen           39.2          19.6               195        4675
+ 9 Adelie  Torgersen           34.1          18.1               193        3475
+10 Adelie  Torgersen           42            20.2               190        4250
+# ℹ 334 more rows
+# ℹ 2 more variables: sex <fct>, year <int>
+```
+
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 3
+Arrange the penguins data set by `body_mass_g`.
+
+:::::::::::::::::::::::::::::::::::::::: solution
+## Solution
+
+
+``` r
+penguins |> 
+  arrange(body_mass_g)
+```
+
+``` output
+# A tibble: 344 × 8
+   species   island   bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+   <fct>     <fct>             <dbl>         <dbl>             <int>       <int>
+ 1 Chinstrap Dream              46.9          16.6               192        2700
+ 2 Adelie    Biscoe             36.5          16.6               181        2850
+ 3 Adelie    Biscoe             36.4          17.1               184        2850
+ 4 Adelie    Biscoe             34.5          18.1               187        2900
+ 5 Adelie    Dream              33.1          16.1               178        2900
+ 6 Adelie    Torgers…           38.6          17                 188        2900
+ 7 Chinstrap Dream              43.2          16.6               187        2900
+ 8 Adelie    Biscoe             37.9          18.6               193        2925
+ 9 Adelie    Dream              37.5          18.9               179        2975
+10 Adelie    Dream              37            16.9               185        3000
+# ℹ 334 more rows
+# ℹ 2 more variables: sex <fct>, year <int>
+```
+::::::::::::::::::::::::::::::::::::::::
+:::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 4
+Arrange the penguins data set by descending order of `flipper_length_mm`.
+ 
+:::::::::::::::::::::::::::::::::::::::: solution
+## Solution
+
+
+``` r
+penguins |> 
+  arrange(desc(flipper_length_mm))
+```
+
+``` output
+# A tibble: 344 × 8
+   species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+   <fct>   <fct>           <dbl>         <dbl>             <int>       <int>
+ 1 Gentoo  Biscoe           54.3          15.7               231        5650
+ 2 Gentoo  Biscoe           50            16.3               230        5700
+ 3 Gentoo  Biscoe           59.6          17                 230        6050
+ 4 Gentoo  Biscoe           49.8          16.8               230        5700
+ 5 Gentoo  Biscoe           48.6          16                 230        5800
+ 6 Gentoo  Biscoe           52.1          17                 230        5550
+ 7 Gentoo  Biscoe           51.5          16.3               230        5500
+ 8 Gentoo  Biscoe           55.1          16                 230        5850
+ 9 Gentoo  Biscoe           49.5          16.2               229        5800
+10 Gentoo  Biscoe           49.8          15.9               229        5950
+# ℹ 334 more rows
+# ℹ 2 more variables: sex <fct>, year <int>
+```
+::::::::::::::::::::::::::::::::::::::::
+:::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 5
+You can arrange on multiple columns! Try arranging the penguins data set by ascending `island` and descending `flipper_length_mm`, using a comma between the two arguments.
+
+:::::::::::::::::::::::::::::::::::::::: solution
+## Solution
+
+``` r
+penguins |> 
+  arrange(island, desc(flipper_length_mm))
+```
+
+``` output
+# A tibble: 344 × 8
+   species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+   <fct>   <fct>           <dbl>         <dbl>             <int>       <int>
+ 1 Gentoo  Biscoe           54.3          15.7               231        5650
+ 2 Gentoo  Biscoe           50            16.3               230        5700
+ 3 Gentoo  Biscoe           59.6          17                 230        6050
+ 4 Gentoo  Biscoe           49.8          16.8               230        5700
+ 5 Gentoo  Biscoe           48.6          16                 230        5800
+ 6 Gentoo  Biscoe           52.1          17                 230        5550
+ 7 Gentoo  Biscoe           51.5          16.3               230        5500
+ 8 Gentoo  Biscoe           55.1          16                 230        5850
+ 9 Gentoo  Biscoe           49.5          16.2               229        5800
+10 Gentoo  Biscoe           49.8          15.9               229        5950
+# ℹ 334 more rows
+# ℹ 2 more variables: sex <fct>, year <int>
+```
+::::::::::::::::::::::::::::::::::::::::
+:::::::::::::::::::::::::::::::::::::
+
+
+## Putting it all together
+Now that you have learned about ggplot, filter, select and arrange, we can have a look at how we can combine all these to get a better understanding of, and control over, the data.
+By piping commands together, we can slowly build a better understanding of the data in our minds.
+
+We can, for instance, explore the numeric columns arranged by island.
+
+::::::::::::::::::: instructor
+When reading this part, read it as follows when typing:
+
+> taking the penguins dataset, and then
+> arranging the rows by the island column, and then
+> selecting all columns that are numeric
+
+::::::::::::::::::::::::::::::
+
+
+``` r
+penguins |> 
+  arrange(island) |>
+  select(where(is.numeric)) 
+```
+
+``` output
+# A tibble: 344 × 5
+   bill_length_mm bill_depth_mm flipper_length_mm body_mass_g  year
+            <dbl>         <dbl>             <int>       <int> <int>
+ 1           37.8          18.3               174        3400  2007
+ 2           37.7          18.7               180        3600  2007
+ 3           35.9          19.2               189        3800  2007
+ 4           38.2          18.1               185        3950  2007
+ 5           38.8          17.2               180        3800  2007
+ 6           35.3          18.9               187        3800  2007
+ 7           40.6          18.6               183        3550  2007
+ 8           40.5          17.9               187        3200  2007
+ 9           37.9          18.6               172        3150  2007
+10           40.5          18.9               180        3950  2007
+# ℹ 334 more rows
+```
+
+And we can continue by looking at the data for only male penguins.
+
+::::::::::::::::::: instructor
+When reading this part, read it as follows when typing:
+
+> taking the penguins dataset, and then
+> arranging the rows by the island column, and then
+> selecting the island column and all columns that are numeric, and then
+> filtering the rows so that sex is equal to male
+
+::::::::::::::::::::::::::::::
+
+
+``` r
+penguins |> 
+  arrange(island) |>
+  select(island, where(is.numeric)) |>
+  filter(sex == "male")
+```
+
+``` error
+Error in `filter()`:
+ℹ In argument: `sex == "male"`.
+Caused by error:
+! object 'sex' not found
+```
+
+Whoops! What happened there?
+Try looking at the error message and see if you can understand it.
+
+It's telling us that there is no `sex` column. How can that be?
+Well, we took it away in our select! 
+Since we've only kept numeric data and the island column, the sex column is missing!
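+
+A quick way to confirm this is to ask for the column names at the point where the pipeline breaks. The sketch below assumes the dplyr and palmerpenguins packages are installed, and uses base R's `names()` to list what is left after the `select()`; `remaining_columns` is a name made up for this illustration.
+
+``` r
+library(dplyr)
+library(palmerpenguins)
+
+# Run the pipeline up to (but not including) the failing filter(),
+# then list the columns that survived the select()
+remaining_columns <- penguins |>
+  arrange(island) |>
+  select(island, where(is.numeric)) |>
+  names()
+
+remaining_columns
+
+# Check whether the column we want to filter on is still there
+"sex" %in% remaining_columns
+```
+
+The last line returns `FALSE`, since `sex` is a factor and was therefore not kept by `where(is.numeric)`.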
+
+The order in which you chain commands together matters. Since the pipe sends the output of the previous command into the next, we have two ways of being able to filter by sex: 
+
+1. by adding sex to our selection
+2. by filtering the data before our selection.
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 6
+Fix the previous code by applying one of the two solutions suggested.
+
+:::::::::::::::::::::::::::::::::::::::: solution 
+## Solution
+
+
+``` r
+penguins |> 
+  arrange(island) |>
+  select(sex, island, where(is.numeric)) |>
+  filter(sex == "male")
+```
+
+``` output
+# A tibble: 168 × 7
+   sex   island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g  year
+   <fct> <fct>           <dbl>         <dbl>             <int>       <int> <int>
+ 1 male  Biscoe           37.7          18.7               180        3600  2007
+ 2 male  Biscoe           38.2          18.1               185        3950  2007
+ 3 male  Biscoe           38.8          17.2               180        3800  2007
+ 4 male  Biscoe           40.6          18.6               183        3550  2007
+ 5 male  Biscoe           40.5          18.9               180        3950  2007
+ 6 male  Biscoe           40.1          18.9               188        4300  2008
+ 7 male  Biscoe           42            19.5               200        4050  2008
+ 8 male  Biscoe           41.4          18.6               191        3700  2008
+ 9 male  Biscoe           40.6          18.8               193        3800  2008
+10 male  Biscoe           37.6          19.1               194        3750  2008
+# ℹ 158 more rows
+```
+
+``` r
+penguins |> 
+  filter(sex == "male") |>
+  arrange(island) |>
+  select(island, where(is.numeric))
+```
+
+``` output
+# A tibble: 168 × 6
+   island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g  year
+   <fct>           <dbl>         <dbl>             <int>       <int> <int>
+ 1 Biscoe           37.7          18.7               180        3600  2007
+ 2 Biscoe           38.2          18.1               185        3950  2007
+ 3 Biscoe           38.8          17.2               180        3800  2007
+ 4 Biscoe           40.6          18.6               183        3550  2007
+ 5 Biscoe           40.5          18.9               180        3950  2007
+ 6 Biscoe           40.1          18.9               188        4300  2008
+ 7 Biscoe           42            19.5               200        4050  2008
+ 8 Biscoe           41.4          18.6               191        3700  2008
+ 9 Biscoe           40.6          18.8               193        3800  2008
+10 Biscoe           37.6          19.1               194        3750  2008
+# ℹ 158 more rows
+```
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+
+# Wrap-up
+
+Now we've learned about subsetting and sorting our data, so we can create data sets that are suited to our needs.
+We also learned about chaining commands: using the pipe to create a series of commands that build on each other to produce the output we want.
diff --git a/05-data-plotting-scales.md b/05-data-plotting-scales.md
new file mode 100644
index 0000000..d7e5439
--- /dev/null
+++ b/05-data-plotting-scales.md
@@ -0,0 +1,398 @@
+---
+title: "Data visualisation and scales"
+teaching: 60
+exercises: 5
+---
+
+
+
+:::::::::::::::::::::::::::::::::::::: questions 
+
+- How can I change the colour in my plots?
+- How can I change the general look of my plot?
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::: objectives
+
+- Use `scale_fill_xxx()` and `scale_colour_xxx()` to change colours in your plot.
+- Use the `theme()` functions to change the general look of your plot.
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+## Motivation
+
+Now that we know how to subset and re-arrange our data a little, it's time to explore the data again in plots. 
+
+Combining what we have learned so far with plotting can help us create more exciting and informative plots.
+Additionally, changing the colour and general look of the plot might be necessary to adapt to journal expectations or company branding.
+
+# Piping into ggplot
+
+Since we know about pipes, we should also explore how we can combine the pipes with 
+ggplot, to reduce the data solely for the purpose of a plot, without changing the actual data. 
+Perhaps you only want to plot the bill length of the males, to explore that data more directly.
+
+::::::::::::::::::: instructor
+When reading this part, read it as follows when typing:
+
+> taking the penguins dataset, and then
+> filter the rows so we only have male penguins, and then
+> plot the data with ggplot, with bill length on the x-axis, and add
+> a bar chart
+
+::::::::::::::::::::::::::::::
+
+
+``` r
+penguins |> 
+  filter(sex == "male") |>
+  ggplot(aes(bill_length_mm)) +
+  geom_bar()
+```
+
+<img src="fig/05-data-plotting-scales-rendered-unnamed-chunk-2-1.png" style="display: block; margin: auto;" />
+
+Now we only plot data from the male penguins, if we are particularly interested in those.
+This can be quite convenient if you have particularly large data and need to reduce it to get a proper idea of what the variables really look like.
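+
+To check that filtering in a pipe leaves the original data alone, you can compare row counts. This is a small sketch, assuming dplyr and palmerpenguins are installed; `penguins_male` is a name made up for this illustration.
+
+``` r
+library(dplyr)
+library(palmerpenguins)
+
+# Filtering returns a new, smaller data set ...
+penguins_male <- penguins |>
+  filter(sex == "male")
+nrow(penguins_male)
+
+# ... while the original penguins data still has all of its rows
+nrow(penguins)
+```
+
+Because the filtered data is never assigned back to `penguins`, the full data set is still available for the next plot.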
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 1
+Create a plot of only data from the Dream island, putting flipper length on the y-axis and species on the x-axis. Make it a box-plot.
+
+:::::::::::::::::::::::::::::::::::::::: hint
+Try geom_boxplot
+:::::::::::::::::::::::::::::::::::::::: 
+
+:::::::::::::::::::::::::::::::::::::::: solution
+## Solution
+
+
+``` r
+penguins |> 
+  filter(island == "Dream") |> 
+  ggplot(aes(x = species, y = flipper_length_mm)) + 
+  geom_boxplot()
+```
+
+<img src="fig/05-data-plotting-scales-rendered-unnamed-chunk-3-1.png" style="display: block; margin: auto;" />
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+# Adding colour
+
+This plot is a little boring, so let us spruce it up!
+How about adding colour to the boxplot? 
+We do this by using the `colour`/`color` argument in ggplot2.
+
+::::::::::::::::::: instructor
+When reading this part, read it as follows when typing:
+
+> taking the penguins dataset, and then
+> filter the rows so we only have penguins from the Dream island, and then
+> plot the data with ggplot, with species on the x-axis and flipper length on the y-axis, and add
+> a box plot, coloured by species
+
+::::::::::::::::::::::::::::::
+
+
+``` r
+penguins |> 
+  filter(island == "Dream") |> 
+  ggplot(aes(x = species, y = flipper_length_mm)) + 
+  geom_boxplot(aes(colour = species))
+```
+
+<img src="fig/05-data-plotting-scales-rendered-unnamed-chunk-4-1.png" style="display: block; margin: auto;" />
+
+Did that look as you expected? 
+Maybe you expected the rectangles of the boxes to be coloured, rather than the edges?
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 2
+Change the previous boxplot argument `colour` to `fill`.
+
+:::::::::::::::::::::::::::::::::::::::: solution
+## Solution
+
+Learning the difference between using `fill` and `colour`/`color` can take a little time,
+but in general colour gives colour to edges, while fill floods elements.
+
+
+``` r
+penguins |> 
+  filter(island == "Dream") |> 
+  ggplot(aes(x = species, y = flipper_length_mm)) + 
+  geom_boxplot(aes(fill = species))
+```
+
+<img src="fig/05-data-plotting-scales-rendered-unnamed-chunk-5-1.png" style="display: block; margin: auto;" />
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+
+# Changing colour
+
+Now, default colours are well and fine for quick plots and exploring data, but we usually all end up changing the colours when we start preparing for publication or reports. 
+In ggplot, we change the colours using the `scale_` functions.
+The scale functions actually cover much more than just colour/fill. 
+They can change the types of points in point plots, different types of scales for the axes (logarithmic, percent, currency), and lots more! 
+We will focus on colour/fill here, but once you start exploring these options, there are almost no limits to what you can do!
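+
+As a small illustration of a scale that has nothing to do with colour, the sketch below puts the y-axis of our boxplot on a logarithmic scale with `scale_y_log10()`. It assumes dplyr, ggplot2 and palmerpenguins are installed; `log_plot` is a name used only here. A log axis is not especially useful for flipper length; the point is just that axes are controlled by scales too.
+
+``` r
+library(dplyr)
+library(ggplot2)
+library(palmerpenguins)
+
+log_plot <- penguins |>
+  filter(island == "Dream") |>
+  ggplot(aes(x = species, y = flipper_length_mm)) +
+  geom_boxplot(aes(fill = species)) +
+  # scale_ functions also control the axes, here a log10 y-axis
+  scale_y_log10()
+
+log_plot
+```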
+
+Let's say you are publishing in a journal with a strict policy on black and white only.
+It's better to prepare your plot in black and white yourself, rather than relying on conversion from colour to black and white. You might be surprised at how little distinction there is between colours once the actual colour is stripped.
+
+Let us start with the plot we just made, and test what types of options we get when starting to add `scale_fill_` in the script. 
+We get lots of preview options, "brewer", "continuous", "gradient", too many options?
+
+There's one called `scale_fill_grey()`, so let us try that one for convenience!
+
+
+``` r
+penguins |> 
+  filter(island == "Dream") |> 
+  ggplot(aes(x = species, y = flipper_length_mm)) + 
+  geom_boxplot(aes(fill = species)) +
+  scale_fill_grey()
+```
+
+<img src="fig/05-data-plotting-scales-rendered-unnamed-chunk-6-1.png" style="display: block; margin: auto;" />
+
+Ok! The colours are now changed, and the legend with it, quite convenient.
+But, the grey used is the same as for the lines, masking the median line for the Adelie box.
+That won't do. Let us try something else.
+
+
+
+``` r
+penguins |> 
+  filter(island == "Dream") |> 
+  ggplot(aes(x = species, y = flipper_length_mm)) + 
+  geom_boxplot(aes(fill = species)) +
+  scale_fill_manual(values = c("black", "white"))
+```
+
+<img src="fig/05-data-plotting-scales-rendered-unnamed-chunk-7-1.png" style="display: block; margin: auto;" />
+
+This is maybe a little stark, but the difference is clear between the two, and that's what we are after right now.
+Using the `manual` version of scales means you manually add the colours you want to use. 
+You can specify colours by name or by hexadecimal code, whichever you find better to work with.
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 3
+Base your plot on the one we have used so far.
+Change the colours to coral and cyan.
+
+:::::::::::::::::::::::::::::::::::::::: solution
+## Solution
+
+"coral" and "cyan" are built-in colour names that you can use directly. 
+There are lots of these names, and [datanovia](https://www.datanovia.com/en/blog/awesome-list-of-657-r-color-names/) has a great list of them.
+
+
+``` r
+penguins |> 
+  filter(island == "Dream") |> 
+  ggplot(aes(x = species, y = flipper_length_mm)) + 
+  geom_boxplot(aes(fill = species)) +
+  scale_fill_manual(values = c("coral", "cyan"))
+```
+
+<img src="fig/05-data-plotting-scales-rendered-unnamed-chunk-8-1.png" style="display: block; margin: auto;" />
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 4
+Base your plot on the one we have used so far.
+Change the colours to the hexadecimal colours "#6597aa" and "#cc6882".
+
+:::::::::::::::::::::::::::::::::::::::: solution
+## Solution
+
+Hexadecimal colour codes are often used in web design, and are a way of coding 
+red, green and blue. To explore colours in hexadecimal, there are lots of web resources,
+like [color-hex.com](https://www.color-hex.com/).
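+
+If you are curious about what a hexadecimal code contains, base R can decode it. The sketch below uses `col2rgb()` from the grDevices package (loaded by default in R) on the two codes from this challenge; `hex_channels` is a name made up for this illustration.
+
+``` r
+# Each pair of hexadecimal digits holds one channel (red, green, blue),
+# with values from 0 to 255
+hex_channels <- col2rgb(c("#6597aa", "#cc6882"))
+hex_channels
+```
+
+For "#6597aa" this gives red 101, green 151 and blue 170.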
+
+
+``` r
+penguins |> 
+  filter(island == "Dream") |> 
+  ggplot(aes(x = species, y = flipper_length_mm)) + 
+  geom_boxplot(aes(fill = species)) +
+  scale_fill_manual(values = c("#6597aa", "#cc6882"))
+```
+
+<img src="fig/05-data-plotting-scales-rendered-unnamed-chunk-9-1.png" style="display: block; margin: auto;" />
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 5
+Base your plot on the one we have used so far.
+Change the order of the hexadecimal colours in the previous plot.
+What did that do?
+
+:::::::::::::::::::::::::::::::::::::::: solution
+## Solution
+
+The order in which you provide the manual colours dictates which category gets which colour.
+
+
+``` r
+penguins |> 
+  filter(island == "Dream") |> 
+  ggplot(aes(x = species, y = flipper_length_mm)) + 
+  geom_boxplot(aes(fill = species)) +
+  scale_fill_manual(values = c("#cc6882", "#6597aa"))
+```
+
+<img src="fig/05-data-plotting-scales-rendered-unnamed-chunk-10-1.png" style="display: block; margin: auto;" />
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
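+
+If you would rather not depend on the order, `scale_fill_manual()` also accepts a named vector, where each name is a value of the mapped variable. A minimal sketch, assuming dplyr, ggplot2 and palmerpenguins are installed (`named_plot` is just an illustrative name):
+
+``` r
+library(dplyr)
+library(ggplot2)
+library(palmerpenguins)
+
+named_plot <- penguins |>
+  filter(island == "Dream") |>
+  ggplot(aes(x = species, y = flipper_length_mm)) +
+  geom_boxplot(aes(fill = species)) +
+  # Names tie each colour to a species, so the order no longer matters
+  scale_fill_manual(values = c(Chinstrap = "#cc6882", Adelie = "#6597aa"))
+
+named_plot
+```
+
+Here Adelie is always drawn in "#6597aa", whichever order the colours are listed in.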
+
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 6
+Now, make an entirely different plot. Take the entire penguins dataset,
+and plot bill depth on the x-axis and bill length on the y.
+Create a point plot, with the points coloured by bill length.
+Try changing the colour of the points. What types of scales can you use?
+
+
+:::::::::::::::::::::::::::::::::::::::: solution
+## Solution
+
+There is no single answer here; there are many different options.
+The key difference between what we did before and this, is that the colouring scale
+is continuous, rather than categorical, so we need _slightly_ different versions.
+
+
+``` r
+penguins |> 
+  ggplot(aes(x = bill_depth_mm, y = bill_length_mm)) + 
+  geom_point(aes(colour = bill_length_mm)) +
+  scale_colour_viridis_c()
+```
+
+``` warning
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).
+```
+
+<img src="fig/05-data-plotting-scales-rendered-unnamed-chunk-11-1.png" style="display: block; margin: auto;" />
+
+
+``` r
+penguins |> 
+  ggplot(aes(x = bill_depth_mm, y = bill_length_mm)) + 
+  geom_point(aes(colour = bill_length_mm)) +
+  scale_colour_gradientn(colours = c("#6597aa", "#cc6882"))
+```
+
+``` warning
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).
+```
+
+<img src="fig/05-data-plotting-scales-rendered-unnamed-chunk-12-1.png" style="display: block; margin: auto;" />
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+
+# Changing the overall look
+
+Now that we know more about changing the colours, we might want something other than the default look with its grey background. Just like the default colours, the default look is fine for a quick look at the data, but we will likely want to change it.
+
+The `theme()` functions are there to help you control how a plot looks. 
+There are lots of different themes to choose from, and they form a great starting point for most needs.
+
+
+``` r
+penguins |> 
+  ggplot(aes(x = bill_depth_mm, y = bill_length_mm)) + 
+  geom_point(aes(colour = bill_length_mm)) +
+  scale_colour_gradientn(colours = c("#6597aa", "#cc6882")) +
+  theme_minimal()
+```
+
+``` warning
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).
+```
+
+<img src="fig/05-data-plotting-scales-rendered-unnamed-chunk-13-1.png" style="display: block; margin: auto;" />
+
+Here we have chosen `theme_minimal()`, which strips the axis lines and the grey background, giving a more minimal look. 
+Explore some different options by typing `theme_` and pressing the `tab` key to see what options there are.
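+
+A complete theme can be fine-tuned afterwards with `theme()`. The sketch below, assuming ggplot2 and palmerpenguins are installed, starts from `theme_minimal()` and then moves the legend below the plot with the `legend.position` argument; `themed_plot` is a name used only for this example.
+
+``` r
+library(ggplot2)
+library(palmerpenguins)
+
+themed_plot <- penguins |>
+  ggplot(aes(x = bill_depth_mm, y = bill_length_mm)) +
+  geom_point(aes(colour = bill_length_mm)) +
+  scale_colour_gradientn(colours = c("#6597aa", "#cc6882")) +
+  theme_minimal() +
+  # Adjustments go after the complete theme, otherwise it overwrites them
+  theme(legend.position = "bottom")
+
+themed_plot
+```
+
+As with the earlier plots, expect a warning about the two rows with missing bill measurements.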
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 7
+Use the same plot we have been working on, and change the theme to the "classic" theme.
+
+
+:::::::::::::::::::::::::::::::::::::::: solution
+## Solution
+
+The classic theme is one often wanted by strict and old-school journals. 
+It's very handy to have a short-cut to it.
+
+
+
+``` r
+penguins |> 
+  ggplot(aes(x = bill_depth_mm, y = bill_length_mm)) + 
+  geom_point(aes(colour = bill_length_mm)) +
+  scale_colour_gradientn(colours = c("#6597aa", "#cc6882")) +
+  theme_classic()
+```
+
+``` warning
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).
+```
+
+<img src="fig/05-data-plotting-scales-rendered-unnamed-chunk-14-1.png" style="display: block; margin: auto;" />
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 8
+Now try the void theme. Is this a meaningful theme to use for data plots?
+
+:::::::::::::::::::::::::::::::::::::::: solution
+## Solution
+
+The void theme strips all axes and the background, leaving only the plotted data.
+This is generally not a meaningful theme to use for publication, but could
+be good to use if you ever delve into the world of [generative art](https://blog.djnavarro.net/posts/2021-10-19_rtistry-posts/).
+
+
+
+``` r
+penguins |> 
+  ggplot(aes(x = bill_depth_mm, y = bill_length_mm)) + 
+  geom_point(aes(colour = bill_length_mm)) +
+  scale_colour_gradientn(colours = c("#6597aa", "#cc6882")) +
+  theme_void()
+```
+
+``` warning
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).
+```
+
+<img src="fig/05-data-plotting-scales-rendered-unnamed-chunk-15-1.png" style="display: block; margin: auto;" />
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+
+# Wrap up
+
+There is a lot more we could teach you about customising your plots to look how you want. 
+There are many web resources you can look at to help you along the way, like [The MockUp](https://themockup.blog/posts/2020-12-26-creating-and-using-custom-ggplot2-themes/).
+But if you don't want to deal with too many details, you can always install and use the [ggthemes](https://jrnold.github.io/ggthemes/reference/index.html) package, which can create
+plots that look like your old favourite tools made them (like SPSS, Stata, Excel, etc.).
diff --git a/06-data-manipulation.md b/06-data-manipulation.md
new file mode 100644
index 0000000..a24e586
--- /dev/null
+++ b/06-data-manipulation.md
@@ -0,0 +1,555 @@
+---
+title: "Data manipulation with dplyr"
+teaching: 60
+exercises: 5
+---
+
+
+
+:::::::::::::::::::::::::::::::::::::: questions 
+
+- How can I add variables to my data?
+- How can I alter the variables already in my data?
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::: objectives
+
+- Use `mutate()` to add and alter variables
+- Use `if_else()` where appropriate
+- Use `case_when()` where appropriate
+- Understand basic concepts of different data types
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+## Motivation
+
+Often, the data we have do not contain exactly what we need. We might need to change the order of factors, create new variables based on other columns in the data, or even variables conditional on specific values in other columns. 
+
+
+# Adding new variables
+
+In {tidyverse}, when we add new variables, we use the `mutate()` function. Just like the other {tidyverse} functions, `mutate()` works specifically with data sets, and provides a nice shorthand for working directly with the columns in the data set. 
+
+
+``` r
+penguins |> 
+  mutate(new_var = 1)
+```
+
+``` output
+# A tibble: 344 × 9
+   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
+ 1 Adelie  Torgersen           39.1          18.7               181        3750
+ 2 Adelie  Torgersen           39.5          17.4               186        3800
+ 3 Adelie  Torgersen           40.3          18                 195        3250
+ 4 Adelie  Torgersen           NA            NA                  NA          NA
+ 5 Adelie  Torgersen           36.7          19.3               193        3450
+ 6 Adelie  Torgersen           39.3          20.6               190        3650
+ 7 Adelie  Torgersen           38.9          17.8               181        3625
+ 8 Adelie  Torgersen           39.2          19.6               195        4675
+ 9 Adelie  Torgersen           34.1          18.1               193        3475
+10 Adelie  Torgersen           42            20.2               190        4250
+# ℹ 334 more rows
+# ℹ 3 more variables: sex <fct>, year <int>, new_var <dbl>
+```
+
+The new column can be hard to spot in the output, depending on the size of the screen.
+For convenience, let us create a subsetted data set to work on, so we can easily see what we are doing.
+
+
+``` r
+penguins_s <- penguins |> 
+  select(1:3, starts_with("bill"))
+```
+
+Let's try our command again on this new data.
+
+
+``` r
+penguins_s |> 
+  mutate(new_var = 1)
+```
+
+``` output
+# A tibble: 344 × 5
+   species island    bill_length_mm bill_depth_mm new_var
+   <fct>   <fct>              <dbl>         <dbl>   <dbl>
+ 1 Adelie  Torgersen           39.1          18.7       1
+ 2 Adelie  Torgersen           39.5          17.4       1
+ 3 Adelie  Torgersen           40.3          18         1
+ 4 Adelie  Torgersen           NA            NA         1
+ 5 Adelie  Torgersen           36.7          19.3       1
+ 6 Adelie  Torgersen           39.3          20.6       1
+ 7 Adelie  Torgersen           38.9          17.8       1
+ 8 Adelie  Torgersen           39.2          19.6       1
+ 9 Adelie  Torgersen           34.1          18.1       1
+10 Adelie  Torgersen           42            20.2       1
+# ℹ 334 more rows
+```
+
+There is now a new column in the data set called "new_var", and it has the value 1 for all rows!
+This is what we told `mutate()` to do! We specified a new column by name, and gave it a specific value, `1`. 
+
+This works because it's easy to assign a single value to all rows. What if we try to give it three values? What would we expect?
+
+
+``` r
+penguins_s |> 
+  mutate(var = 1:3)
+```
+
+``` error
+Error in `mutate()`:
+ℹ In argument: `var = 1:3`.
+Caused by error:
+! `var` must be size 344 or 1, not 3.
+```
+
+Here, it's failing with a message that might look mysterious at first. The error is telling us that the input must be of size 344 or 1. 344 is the number of rows in the data set, so it's telling us that the input we gave is not suitable because it is neither of length 344 nor of length 1. 
+
+So now we know the premise for `mutate()`: it takes inputs that are either of length 1 or of the same length as the number of rows in the data set. 
+
+``` r
+penguins_s |> 
+  mutate(var = 1:344)
+```
+
+``` output
+# A tibble: 344 × 5
+   species island    bill_length_mm bill_depth_mm   var
+   <fct>   <fct>              <dbl>         <dbl> <int>
+ 1 Adelie  Torgersen           39.1          18.7     1
+ 2 Adelie  Torgersen           39.5          17.4     2
+ 3 Adelie  Torgersen           40.3          18       3
+ 4 Adelie  Torgersen           NA            NA       4
+ 5 Adelie  Torgersen           36.7          19.3     5
+ 6 Adelie  Torgersen           39.3          20.6     6
+ 7 Adelie  Torgersen           38.9          17.8     7
+ 8 Adelie  Torgersen           39.2          19.6     8
+ 9 Adelie  Torgersen           34.1          18.1     9
+10 Adelie  Torgersen           42            20.2    10
+# ℹ 334 more rows
+```
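+
+Hard-coding `344` breaks as soon as the number of rows changes, for example after a `filter()`. One way around this, sketched below, is dplyr's `row_number()`, which always produces one value per row. The sketch assumes dplyr and palmerpenguins are installed and recreates `penguins_s` so that it runs on its own; `penguins_numbered` is a name made up for the example.
+
+``` r
+library(dplyr)
+library(palmerpenguins)
+
+penguins_s <- penguins |>
+  select(1:3, starts_with("bill"))
+
+penguins_numbered <- penguins_s |>
+  filter(species == "Gentoo") |>
+  # row_number() adapts to however many rows are left after the filter
+  mutate(var = row_number())
+
+penguins_numbered
+```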
+
+But generally, we create new columns based on other data in the data set. So let's do a more useful example. For instance, perhaps we want to use the ratio between the bill length and depth as a measurement for a model.
+
+
+``` r
+penguins_s |> 
+  mutate(bill_ratio = bill_length_mm / bill_depth_mm)
+```
+
+``` output
+# A tibble: 344 × 5
+   species island    bill_length_mm bill_depth_mm bill_ratio
+   <fct>   <fct>              <dbl>         <dbl>      <dbl>
+ 1 Adelie  Torgersen           39.1          18.7       2.09
+ 2 Adelie  Torgersen           39.5          17.4       2.27
+ 3 Adelie  Torgersen           40.3          18         2.24
+ 4 Adelie  Torgersen           NA            NA        NA   
+ 5 Adelie  Torgersen           36.7          19.3       1.90
+ 6 Adelie  Torgersen           39.3          20.6       1.91
+ 7 Adelie  Torgersen           38.9          17.8       2.19
+ 8 Adelie  Torgersen           39.2          19.6       2   
+ 9 Adelie  Torgersen           34.1          18.1       1.88
+10 Adelie  Torgersen           42            20.2       2.08
+# ℹ 334 more rows
+```
+
+So, here we have asked for the ratio between bill length and depth to be calculated and stored in a column named `bill_ratio`. Because we are working on the subsetted `penguins_s` data, which keeps only the `bill` columns next to species and island, we can have a peek at the output more directly. 
+
+We can do almost anything within a `mutate()` to get the values as we want them, including using functions that exist in R to transform the data. For instance, perhaps we want to scale the variables of interest to have a mean of 0 and standard deviation of 1, which is quite common to improve statistical modelling. We can do that with the `scale()` function.
+
+
+``` r
+penguins_s |> 
+  mutate(bill_ratio = bill_length_mm / bill_depth_mm,
+         bill_length_mm_z = scale(bill_length_mm))
+```
+
+``` output
+# A tibble: 344 × 6
+   species island   bill_length_mm bill_depth_mm bill_ratio bill_length_mm_z[,1]
+   <fct>   <fct>             <dbl>         <dbl>      <dbl>                <dbl>
+ 1 Adelie  Torgers…           39.1          18.7       2.09               -0.883
+ 2 Adelie  Torgers…           39.5          17.4       2.27               -0.810
+ 3 Adelie  Torgers…           40.3          18         2.24               -0.663
+ 4 Adelie  Torgers…           NA            NA        NA                  NA    
+ 5 Adelie  Torgers…           36.7          19.3       1.90               -1.32 
+ 6 Adelie  Torgers…           39.3          20.6       1.91               -0.847
+ 7 Adelie  Torgers…           38.9          17.8       2.19               -0.920
+ 8 Adelie  Torgers…           39.2          19.6       2                  -0.865
+ 9 Adelie  Torgers…           34.1          18.1       1.88               -1.80 
+10 Adelie  Torgers…           42            20.2       2.08               -0.352
+# ℹ 334 more rows
+```
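+
+The `[,1]` in the last column name appears because `scale()` returns a one-column matrix rather than a plain vector. The sketch below shows one way to get an ordinary numeric column, by wrapping the call in `as.numeric()`, and compares it with the z-score computed by hand. It assumes dplyr and palmerpenguins are installed; `penguins_z` and the `_manual` column are names made up for the comparison.
+
+``` r
+library(dplyr)
+library(palmerpenguins)
+
+penguins_s <- penguins |>
+  select(1:3, starts_with("bill"))
+
+penguins_z <- penguins_s |>
+  mutate(
+    # as.numeric() drops the matrix structure that scale() returns
+    bill_length_mm_z = as.numeric(scale(bill_length_mm)),
+    # The same z-score by hand: subtract the mean, divide by the standard deviation
+    bill_length_mm_z_manual = (bill_length_mm - mean(bill_length_mm, na.rm = TRUE)) /
+      sd(bill_length_mm, na.rm = TRUE)
+  )
+
+# Compare the two versions of the z-score
+all.equal(penguins_z$bill_length_mm_z, penguins_z$bill_length_mm_z_manual)
+```
+
+Note the `na.rm = TRUE` arguments: without them, the missing values would turn the mean and standard deviation, and with them the whole column, into `NA`.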
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 1
+Create a column where bill length is transformed to cm. To transform mm to cm, you must divide the mm value by 10. Name the column bill_length_cm.
+
+:::::::::::::::::::::::::::::::::::::::: solution 
+## Solution
+
+
+``` r
+penguins_s |>
+  mutate(bill_length_cm = bill_length_mm / 10)
+```
+
+``` output
+# A tibble: 344 × 5
+   species island    bill_length_mm bill_depth_mm bill_length_cm
+   <fct>   <fct>              <dbl>         <dbl>          <dbl>
+ 1 Adelie  Torgersen           39.1          18.7           3.91
+ 2 Adelie  Torgersen           39.5          17.4           3.95
+ 3 Adelie  Torgersen           40.3          18             4.03
+ 4 Adelie  Torgersen           NA            NA            NA   
+ 5 Adelie  Torgersen           36.7          19.3           3.67
+ 6 Adelie  Torgersen           39.3          20.6           3.93
+ 7 Adelie  Torgersen           38.9          17.8           3.89
+ 8 Adelie  Torgersen           39.2          19.6           3.92
+ 9 Adelie  Torgersen           34.1          18.1           3.41
+10 Adelie  Torgersen           42            20.2           4.2 
+# ℹ 334 more rows
+```
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 2
+Create a column for body mass in kilos, rather than grams, in the main penguins data set. Name the column body_mass_kg. To transform grams to kilograms, divide the grams by 1000. 
+
+:::::::::::::::::::::::::::::::::::::::: solution 
+## Solution
+
+
+``` r
+penguins |>
+  mutate(body_mass_kg = body_mass_g / 1000)
+```
+
+``` output
+# A tibble: 344 × 9
+   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
+ 1 Adelie  Torgersen           39.1          18.7               181        3750
+ 2 Adelie  Torgersen           39.5          17.4               186        3800
+ 3 Adelie  Torgersen           40.3          18                 195        3250
+ 4 Adelie  Torgersen           NA            NA                  NA          NA
+ 5 Adelie  Torgersen           36.7          19.3               193        3450
+ 6 Adelie  Torgersen           39.3          20.6               190        3650
+ 7 Adelie  Torgersen           38.9          17.8               181        3625
+ 8 Adelie  Torgersen           39.2          19.6               195        4675
+ 9 Adelie  Torgersen           34.1          18.1               193        3475
+10 Adelie  Torgersen           42            20.2               190        4250
+# ℹ 334 more rows
+# ℹ 3 more variables: sex <fct>, year <int>, body_mass_kg <dbl>
+```
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+
+
+## Adding conditional variables
+
+Sometimes, we want to assign certain data values based on other variables in the data set. 
+For instance, maybe we want to classify all penguins with body mass above 4.5 kg as "large" while the rest are "normal"?
+
+The `if_else()` function takes expressions, much like `filter()`.
+The first value after the expression is the value assigned if the expression is `TRUE`, while the second is the value assigned if the expression is `FALSE`.
+
+
+``` r
+penguin_weight <- penguins |> 
+  select(year, body_mass_g)
+
+penguin_weight |> 
+  mutate(size = if_else(condition = body_mass_g > 4500, 
+                        true = "large", 
+                        false = "normal"))
+```
+
+``` output
+# A tibble: 344 × 3
+    year body_mass_g size  
+   <int>       <int> <chr> 
+ 1  2007        3750 normal
+ 2  2007        3800 normal
+ 3  2007        3250 normal
+ 4  2007          NA <NA>  
+ 5  2007        3450 normal
+ 6  2007        3650 normal
+ 7  2007        3625 normal
+ 8  2007        4675 large 
+ 9  2007        3475 normal
+10  2007        4250 normal
+# ℹ 334 more rows
+```
+
+Now we have a column with two values, `large` and `normal` based on whether the penguins are above or below 4.5 kilos.
+
+We can for instance use that in a plot.
+
+
+``` r
+penguin_weight |> 
+  mutate(size = if_else(condition = body_mass_g > 4500, 
+                        true = "large", 
+                        false = "normal")) |> 
+  ggplot() +
+  geom_jitter(mapping = aes(x = year, y = body_mass_g, colour = size))
+```
+
+``` warning
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).
+```
+
+<img src="fig/06-data-manipulation-rendered-unnamed-chunk-12-1.png" style="display: block; margin: auto;" />
+
+That shows us clearly that we have grouped the penguins based on their size. But there is this strange `NA` in the plot legend. What is that? 
+
+In R, missing values are usually given the value `NA`, which stands for `Not Available`, *i.e.*, missing data. This is a special, reserved name in R. Like `TRUE` and `FALSE`, it is written in capital letters, and RStudio recognizes it and gives it a different colour than other values. In this case it means that there are some penguins whose body mass we do not have.
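+
+To find out how many values are missing, you can use `is.na()`, which returns `TRUE` for every `NA`. The sketch below assumes dplyr and palmerpenguins are installed and recreates `penguin_weight` so it runs on its own; `n_missing` and `penguin_weight_complete` are names chosen for this example.
+
+``` r
+library(dplyr)
+library(palmerpenguins)
+
+penguin_weight <- penguins |>
+  select(year, body_mass_g)
+
+# Count the rows where body mass is missing
+n_missing <- penguin_weight |>
+  filter(is.na(body_mass_g)) |>
+  nrow()
+n_missing
+
+# Keep only the rows where body mass is known
+penguin_weight_complete <- penguin_weight |>
+  filter(!is.na(body_mass_g))
+nrow(penguin_weight_complete)
+```
+
+The count of 2 matches the two rows that the plot warning said were removed.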
+
+Now we know how to create new variables, and even how to fill them with different values depending on a condition.
+
+But, we often want to add several columns of different types, and maybe even add new variables based on other new columns!
+Oh, it's starting to sound complicated, but it does not have to be!
+
+`mutate()` evaluates its arguments sequentially. This means that each new column is made in the order you write it, so a column created earlier in the call can be used by the columns that come after it. As long as you think about the order of your `mutate()` creations, you can do all of this in a single `mutate()` call.
+
+
+``` r
+penguins_s |> 
+  mutate(
+    bill_ratio = bill_depth_mm / bill_length_mm,
+    bill_type = if_else(condition = bill_ratio < 0.5, 
+                        true = "elongated", 
+                        false = "stumped")
+  )
+```
+
+``` output
+# A tibble: 344 × 6
+   species island    bill_length_mm bill_depth_mm bill_ratio bill_type
+   <fct>   <fct>              <dbl>         <dbl>      <dbl> <chr>    
+ 1 Adelie  Torgersen           39.1          18.7      0.478 elongated
+ 2 Adelie  Torgersen           39.5          17.4      0.441 elongated
+ 3 Adelie  Torgersen           40.3          18        0.447 elongated
+ 4 Adelie  Torgersen           NA            NA       NA     <NA>     
+ 5 Adelie  Torgersen           36.7          19.3      0.526 stumped  
+ 6 Adelie  Torgersen           39.3          20.6      0.524 stumped  
+ 7 Adelie  Torgersen           38.9          17.8      0.458 elongated
+ 8 Adelie  Torgersen           39.2          19.6      0.5   stumped  
+ 9 Adelie  Torgersen           34.1          18.1      0.531 stumped  
+10 Adelie  Torgersen           42            20.2      0.481 elongated
+# ℹ 334 more rows
+```
+
+Now you've created two variables. One for `bill_ratio`, and then another one conditional on the values of the `bill_ratio`.
+
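+If you switched the order of these two, R would produce an error, because `bill_ratio` would not exist yet when `bill_type` is created. The order matters, not how many times a column is defined: in the code below, `bill_ratio` is still created first, so `bill_type` can be made, and defining `bill_ratio` again afterwards simply recomputes the same column.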
+
+
+``` r
+penguins_s |> 
+  mutate(
+    bill_ratio = bill_depth_mm / bill_length_mm,
+    bill_type = if_else(condition = bill_ratio < 0.5, 
+                        true = "elongated", 
+                        false = "stumped"),
+    bill_ratio = bill_depth_mm / bill_length_mm
+  )
+```
+
+``` output
+# A tibble: 344 × 6
+   species island    bill_length_mm bill_depth_mm bill_ratio bill_type
+   <fct>   <fct>              <dbl>         <dbl>      <dbl> <chr>    
+ 1 Adelie  Torgersen           39.1          18.7      0.478 elongated
+ 2 Adelie  Torgersen           39.5          17.4      0.441 elongated
+ 3 Adelie  Torgersen           40.3          18        0.447 elongated
+ 4 Adelie  Torgersen           NA            NA       NA     <NA>     
+ 5 Adelie  Torgersen           36.7          19.3      0.526 stumped  
+ 6 Adelie  Torgersen           39.3          20.6      0.524 stumped  
+ 7 Adelie  Torgersen           38.9          17.8      0.458 elongated
+ 8 Adelie  Torgersen           39.2          19.6      0.5   stumped  
+ 9 Adelie  Torgersen           34.1          18.1      0.531 stumped  
+10 Adelie  Torgersen           42            20.2      0.481 elongated
+# ℹ 334 more rows
+```
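+
+To see the error that a swapped order would trigger, here is a minimal sketch on a made-up two-row tibble. The object `bills` and its values are assumptions for illustration, and `try()` is used only so that the error is captured rather than stopping the script.
+
+``` r
+library(dplyr)
+
+# Made-up measurements, only for illustration
+bills <- tibble(bill_length_mm = c(39.1, 36.7),
+                bill_depth_mm  = c(18.7, 19.3))
+
+# bill_type is requested before bill_ratio exists, so mutate() fails;
+# try() captures the error instead of halting
+wrong_order <- try(
+  bills |>
+    mutate(
+      bill_type  = if_else(bill_ratio < 0.5, "elongated", "stumped"),
+      bill_ratio = bill_depth_mm / bill_length_mm
+    ),
+  silent = TRUE
+)
+
+# TRUE when the call above produced an error
+inherits(wrong_order, "try-error")
+```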
+
+But what if we want to categorize based on more than one condition? Nested `if_else()`?
+
+
+``` r
+penguins_s |> 
+  mutate(
+    bill_ratio = bill_depth_mm / bill_length_mm,
+    bill_type = if_else(condition = bill_ratio < 0.35,
+                        true =  "elongated", 
+                        false = if_else(condition = bill_ratio < 0.45,
+                                        true = "normal",
+                                        false = "stumped")))
+```
+
+``` output
+# A tibble: 344 × 6
+   species island    bill_length_mm bill_depth_mm bill_ratio bill_type
+   <fct>   <fct>              <dbl>         <dbl>      <dbl> <chr>    
+ 1 Adelie  Torgersen           39.1          18.7      0.478 stumped  
+ 2 Adelie  Torgersen           39.5          17.4      0.441 normal   
+ 3 Adelie  Torgersen           40.3          18        0.447 normal   
+ 4 Adelie  Torgersen           NA            NA       NA     <NA>     
+ 5 Adelie  Torgersen           36.7          19.3      0.526 stumped  
+ 6 Adelie  Torgersen           39.3          20.6      0.524 stumped  
+ 7 Adelie  Torgersen           38.9          17.8      0.458 stumped  
+ 8 Adelie  Torgersen           39.2          19.6      0.5   stumped  
+ 9 Adelie  Torgersen           34.1          18.1      0.531 stumped  
+10 Adelie  Torgersen           42            20.2      0.481 stumped  
+# ℹ 334 more rows
+```
+
+What if you have even more conditionals? It can get pretty messy pretty fast.
+Thankfully, {dplyr} has a smarter way of doing this, called `case_when()`. This function is similar to `if_else()`, but you list each condition together with the value it should be assigned.
+On the left of the tilde (`~`) you have the logical expression, and on the right is the value to be assigned if that expression is `TRUE`.
+
+
+``` r
+penguins_s |> 
+  mutate(
+    bill_ratio = bill_depth_mm / bill_length_mm,
+    bill_type = case_when(
+      bill_ratio < 0.35 ~ "elongated",
+      bill_ratio < 0.45 ~ "normal",
+      TRUE              ~ "stumped")
+  ) |> 
+  ggplot(mapping = aes(x = bill_length_mm,
+                       y = bill_depth_mm,
+                       colour = bill_type)) +
+  geom_point()
+```
+
+``` warning
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).
+```
+
+<img src="fig/06-data-manipulation-rendered-unnamed-chunk-16-1.png" style="display: block; margin: auto;" />
+
+
+That looks almost the same. But the `NA`s are gone! That's not right. We cannot categorize values that are missing. It is our last statement that does this, as it just says "make the remainder this value", which is not what we want. We need the `NA`s to stay `NA`s. 
+
+`case_when()`, like `mutate()`, evaluates the expressions in sequence, and each row gets the value of the first expression that is `TRUE` for it. This is why we can have two statements evaluating the same column with similar expressions (below 0.35 and then below 0.45). All values that are below 0.35 are also below 0.45, but since we first assign everything below 0.35, and only then everything below 0.45, they do not collide. We can do the same for our last statement, saying that all remaining values that are not `NA` should be given this category.
+
+
+``` r
+penguins |> 
+  mutate(
+    bill_ratio = bill_depth_mm / bill_length_mm,
+    bill_type = case_when(
+      bill_ratio < 0.35  ~ "elongated",
+      bill_ratio < 0.45  ~ "normal",
+      !is.na(bill_ratio) ~ "stumped")
+  ) |> 
+  ggplot(mapping = aes(x = bill_length_mm,
+                       y = bill_depth_mm,
+                       colour = bill_type)) +
+  geom_point()
+```
+
+``` warning
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).
+```
+
+<img src="fig/06-data-manipulation-rendered-unnamed-chunk-17-1.png" style="display: block; margin: auto;" />
+
+Here, we use `is.na()`, a function in R that detects `NA` values. But it also has an `!` in front; what does that mean? In R's logical expressions, `!` is the negation operator. It flips the logical value, so `TRUE` becomes `FALSE`, and *vice versa*. So here, it means that `bill_ratio` is **not** `NA`.
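+
+The first-match behaviour and the effect of `!is.na()` can be checked on a tiny vector. This is a minimal sketch; `ratio` and its values are made up for illustration and are not taken from the penguins data.
+
+``` r
+library(dplyr)
+
+# Made-up ratios: one per category, plus a missing value
+ratio <- c(0.30, 0.40, 0.50, NA)
+
+# is.na() flags the missing value; ! flips each TRUE/FALSE
+is.na(ratio)
+!is.na(ratio)
+
+# Each element gets the value of the first condition that is TRUE.
+# 0.30 is below both 0.35 and 0.45, but only the first match counts.
+# The NA matches no condition and therefore stays NA.
+case_when(
+  ratio < 0.35  ~ "elongated",
+  ratio < 0.45  ~ "normal",
+  !is.na(ratio) ~ "stumped"
+)
+```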
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 3
+Create a column named `bill_ld_ratio_log` that is the natural logarithm (using the `log()` function) of `bill_length_mm` divided by `bill_depth_mm`.
+
+:::::::::::::::::::::::::::::::::::::::: solution 
+## Solution
+
+
+``` r
+penguins |>
+  mutate(bill_ld_ratio_log = log(bill_length_mm / bill_depth_mm))
+```
+
+``` output
+# A tibble: 344 × 9
+   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
+ 1 Adelie  Torgersen           39.1          18.7               181        3750
+ 2 Adelie  Torgersen           39.5          17.4               186        3800
+ 3 Adelie  Torgersen           40.3          18                 195        3250
+ 4 Adelie  Torgersen           NA            NA                  NA          NA
+ 5 Adelie  Torgersen           36.7          19.3               193        3450
+ 6 Adelie  Torgersen           39.3          20.6               190        3650
+ 7 Adelie  Torgersen           38.9          17.8               181        3625
+ 8 Adelie  Torgersen           39.2          19.6               195        4675
+ 9 Adelie  Torgersen           34.1          18.1               193        3475
+10 Adelie  Torgersen           42            20.2               190        4250
+# ℹ 334 more rows
+# ℹ 3 more variables: sex <fct>, year <int>, bill_ld_ratio_log <dbl>
+```
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 4
+Create a new column called `body_type`, where animals below 3 kg are `small`, animals between 3 and 4.5 kg are `normal`, and animals larger than 4.5 kg are `large`. In the same command, create a new column named `biscoe` and its content should be `TRUE` if the island is `Biscoe` and `FALSE` for everything else.
+
+:::::::::::::::::::::::::::::::::::::::: solution 
+## Solution
+
+
+``` r
+penguins |>
+  mutate(
+    body_type = case_when(
+       body_mass_g < 3000 ~ "small",
+       body_mass_g >= 3000 & body_mass_g < 4500 ~ "normal",
+       body_mass_g >= 4500 ~ "large"),
+    biscoe = if_else(island == "Biscoe", 
+                     true = TRUE,
+                     false = FALSE)
+  )
+```
+
+``` output
+# A tibble: 344 × 10
+   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
+ 1 Adelie  Torgersen           39.1          18.7               181        3750
+ 2 Adelie  Torgersen           39.5          17.4               186        3800
+ 3 Adelie  Torgersen           40.3          18                 195        3250
+ 4 Adelie  Torgersen           NA            NA                  NA          NA
+ 5 Adelie  Torgersen           36.7          19.3               193        3450
+ 6 Adelie  Torgersen           39.3          20.6               190        3650
+ 7 Adelie  Torgersen           38.9          17.8               181        3625
+ 8 Adelie  Torgersen           39.2          19.6               195        4675
+ 9 Adelie  Torgersen           34.1          18.1               193        3475
+10 Adelie  Torgersen           42            20.2               190        4250
+# ℹ 334 more rows
+# ℹ 4 more variables: sex <fct>, year <int>, body_type <chr>, biscoe <lgl>
+```
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+
+# Wrap up
+
+Now we've learned a little about adding and altering variables in data sets using {dplyr}'s `mutate()` function. 
+You should be able to play around with the examples provided and learn more about how things work through trial and error. 
diff --git a/07-data-reshaping.md b/07-data-reshaping.md
new file mode 100644
index 0000000..c18bd35
--- /dev/null
+++ b/07-data-reshaping.md
@@ -0,0 +1,676 @@
+---
+title: "Reshaping data with tidyr"
+teaching: 60
+exercises: 6
+---
+
+
+
+:::::::::::::::::::::::::::::::::::::: questions 
+
+- How can I make my data into a longer format?
+- How can I get my data into a wider format?
+
+::::::::::::::::::::::::::::::::::::::
+
+:::::::::::::::::::::::::::::::::::::: objectives
+
+- Use `pivot_longer()` to reshape data into a longer format
+- Use `pivot_wider()` to reshape data into a wider format
+
+::::::::::::::::::::::::::::::::::::::
+
+## Motivation
+
+Data come in a myriad of different shapes, and talking about data sets can often become confusing, as people are used to data being in different formats, and they call these formats different things.
+In the tidyverse, "tidy" data is a very opinionated term, defined so that we can all talk about data with more common ground.
+
+The goal of the tidyr package is to help you create tidy data. 
+
+Tidy data is data where:
+
+- Every column is a variable.  
+- Every row is an observation.  
+- Every cell is a single value.  
+
+Tidy data describes a standard way of storing data that is used wherever possible throughout the tidyverse. 
+If you ensure that your data is tidy, you'll spend less time fighting with the tools and more time working on your analysis. 
+Learn more about tidy data in `vignette("tidy-data")`.
+
+## Tall/long vs. wide data
+
+- Tall (or long) data are considered "tidy", in that they adhere to the three tidy-data principles  
+
+- Wide data are not necessarily "messy", but have a shape less ideal for easy handling in the tidyverse  
+
+Example in longitudinal data design:
+
+- wide data: each participant has a single row of data, with all longitudinal observations in separate columns  
+- tall data: a participant has as many rows as longitudinal time points, with measures in separate columns
+
+
+<img src="fig/06-tall_wide.gif" style="display: block; margin: auto;" />
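+
+As a minimal sketch of the longitudinal example, the code below builds a small wide tibble and reshapes it to tall with `pivot_longer()`, which is introduced in the next section. The object names (`wide`, `tall`), the column names and all values are made up for illustration.
+
+``` r
+library(tidyverse)
+
+# Wide: one row per participant, one column per time point
+wide <- tibble(
+  participant = c("p1", "p2"),
+  score_t1    = c(10, 12),
+  score_t2    = c(11, 15)
+)
+
+# Tall: one row per participant and time point
+tall <- wide |>
+  pivot_longer(starts_with("score"),
+               names_to = "timepoint",
+               values_to = "score")
+
+wide
+tall
+```
+
+The two participants with two time points each become four rows in the tall version, while the number of columns stays small no matter how many time points are added.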
+
+# Creating longer data
+
+Let us first talk about creating longer data.
+In most cases, you will encounter data that is in wide format. This is the format often taught in many disciplines, and it is also necessary for running certain analyses in statistical programs like SPSS. 
+In R, and specifically the tidyverse, working on long data has clear advantages, which we will be exploring here while we also do the transformations.
+
+As before, we need to start off by making sure we have the tidyverse package loaded, and the penguins dataset ready at hand.
+
+
+In the tidyverse, there is a single function to create longer data sets, called `pivot_longer()`. Those of you who have some prior experience with the tidyverse might have seen the `gather()` function, and you might also encounter it when googling for help. This is an older function with similar capabilities, which we will not cover here, as `pivot_longer()` supersedes it. 
+
+
+``` r
+penguins |> 
+  pivot_longer(contains("_")) 
+```
+
+``` output
+# A tibble: 1,376 × 6
+   species island    sex     year name               value
+   <fct>   <fct>     <fct>  <int> <chr>              <dbl>
+ 1 Adelie  Torgersen male    2007 bill_length_mm      39.1
+ 2 Adelie  Torgersen male    2007 bill_depth_mm       18.7
+ 3 Adelie  Torgersen male    2007 flipper_length_mm  181  
+ 4 Adelie  Torgersen male    2007 body_mass_g       3750  
+ 5 Adelie  Torgersen female  2007 bill_length_mm      39.5
+ 6 Adelie  Torgersen female  2007 bill_depth_mm       17.4
+ 7 Adelie  Torgersen female  2007 flipper_length_mm  186  
+ 8 Adelie  Torgersen female  2007 body_mass_g       3800  
+ 9 Adelie  Torgersen female  2007 bill_length_mm      40.3
+10 Adelie  Torgersen female  2007 bill_depth_mm       18  
+# ℹ 1,366 more rows
+```
+
+`pivot_longer()` takes tidy-select column arguments, so it is easy to grab all the columns you are after. Here, we are pivoting longer all columns that contain an underscore. And what happens? We now have fewer columns, but also two new columns we did not have before! The `name` column holds all our previous column names, one after the other, and the `value` column holds the cell values for the observations. 
+So before, the data was wider, in that each of the measurements with `_` in its name had its own column, while now, the four measurement columns are collected into two columns, and the data has four times as many rows (344 × 4 = 1,376).
+
+Why would we want to do that? Well, perhaps we want to plot all the variables in a single ggplot call? Now that the measurement types are collected in these two columns, we can facet over the `name` column to create a sub-plot per measurement type!
+
+
+``` r
+penguins |> 
+  pivot_longer(contains("_")) |> 
+  ggplot(aes(y = value, 
+             x = species,
+             fill = species)) +
+  geom_boxplot() +
+  facet_wrap(~name, scales = "free_y")
+```
+
+``` warning
+Warning: Removed 8 rows containing non-finite outside the scale range
+(`stat_boxplot()`).
+```
+
+<img src="fig/07-data-reshaping-rendered-unnamed-chunk-4-1.png" style="display: block; margin: auto;" />
+
+That's pretty neat. By pivoting the data into this longer shape we are able to create sub-plots for all measurements easily with the same ggplot call and have them consistent, and nicely aligned. This longer format is also great for summaries, which we will be covering tomorrow.
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 1
+Pivot longer all columns ending with "mm".
+
+:::::::::::::::::::::::::::::::::::::::: solution 
+## Solution
+
+
+``` r
+penguins |>
+  pivot_longer(ends_with("mm"))
+```
+
+``` output
+# A tibble: 1,032 × 7
+   species island    body_mass_g sex     year name              value
+   <fct>   <fct>           <int> <fct>  <int> <chr>             <dbl>
+ 1 Adelie  Torgersen        3750 male    2007 bill_length_mm     39.1
+ 2 Adelie  Torgersen        3750 male    2007 bill_depth_mm      18.7
+ 3 Adelie  Torgersen        3750 male    2007 flipper_length_mm 181  
+ 4 Adelie  Torgersen        3800 female  2007 bill_length_mm     39.5
+ 5 Adelie  Torgersen        3800 female  2007 bill_depth_mm      17.4
+ 6 Adelie  Torgersen        3800 female  2007 flipper_length_mm 186  
+ 7 Adelie  Torgersen        3250 female  2007 bill_length_mm     40.3
+ 8 Adelie  Torgersen        3250 female  2007 bill_depth_mm      18  
+ 9 Adelie  Torgersen        3250 female  2007 flipper_length_mm 195  
+10 Adelie  Torgersen          NA <NA>    2007 bill_length_mm     NA  
+# ℹ 1,022 more rows
+```
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 2
+Pivot the penguins data so that all the bill measurements are in the same column.
+
+:::::::::::::::::::::::::::::::::::::::: solution 
+## Solution
+
+
+``` r
+penguins |>
+  pivot_longer(starts_with("bill"))
+```
+
+``` output
+# A tibble: 688 × 8
+   species island    flipper_length_mm body_mass_g sex     year name       value
+   <fct>   <fct>                 <int>       <int> <fct>  <int> <chr>      <dbl>
+ 1 Adelie  Torgersen               181        3750 male    2007 bill_leng…  39.1
+ 2 Adelie  Torgersen               181        3750 male    2007 bill_dept…  18.7
+ 3 Adelie  Torgersen               186        3800 female  2007 bill_leng…  39.5
+ 4 Adelie  Torgersen               186        3800 female  2007 bill_dept…  17.4
+ 5 Adelie  Torgersen               195        3250 female  2007 bill_leng…  40.3
+ 6 Adelie  Torgersen               195        3250 female  2007 bill_dept…  18  
+ 7 Adelie  Torgersen                NA          NA <NA>    2007 bill_leng…  NA  
+ 8 Adelie  Torgersen                NA          NA <NA>    2007 bill_dept…  NA  
+ 9 Adelie  Torgersen               193        3450 female  2007 bill_leng…  36.7
+10 Adelie  Torgersen               193        3450 female  2007 bill_dept…  19.3
+# ℹ 678 more rows
+```
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 3
+As mentioned, `pivot_longer()` accepts tidy-selectors. Pivot longer all numerical columns.
+
+:::::::::::::::::::::::::::::::::::::::: solution 
+## Solution
+
+
+``` r
+penguins |>
+  pivot_longer(where(is.numeric))
+```
+
+``` output
+# A tibble: 1,720 × 5
+   species island    sex    name               value
+   <fct>   <fct>     <fct>  <chr>              <dbl>
+ 1 Adelie  Torgersen male   bill_length_mm      39.1
+ 2 Adelie  Torgersen male   bill_depth_mm       18.7
+ 3 Adelie  Torgersen male   flipper_length_mm  181  
+ 4 Adelie  Torgersen male   body_mass_g       3750  
+ 5 Adelie  Torgersen male   year              2007  
+ 6 Adelie  Torgersen female bill_length_mm      39.5
+ 7 Adelie  Torgersen female bill_depth_mm       17.4
+ 8 Adelie  Torgersen female flipper_length_mm  186  
+ 9 Adelie  Torgersen female body_mass_g       3800  
+10 Adelie  Torgersen female year              2007  
+# ℹ 1,710 more rows
+```
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+
+## Altering names during pivots
+
+While often you can get away with leaving the default naming of the two columns as is, especially if you are just doing something quick like making a plot, most times you will likely want to control the names of your two new columns.
+
+
+``` r
+penguins |> 
+  pivot_longer(contains("_"),
+               names_to = "columns",
+               values_to = "content")
+```
+
+``` output
+# A tibble: 1,376 × 6
+   species island    sex     year columns           content
+   <fct>   <fct>     <fct>  <int> <chr>               <dbl>
+ 1 Adelie  Torgersen male    2007 bill_length_mm       39.1
+ 2 Adelie  Torgersen male    2007 bill_depth_mm        18.7
+ 3 Adelie  Torgersen male    2007 flipper_length_mm   181  
+ 4 Adelie  Torgersen male    2007 body_mass_g        3750  
+ 5 Adelie  Torgersen female  2007 bill_length_mm       39.5
+ 6 Adelie  Torgersen female  2007 bill_depth_mm        17.4
+ 7 Adelie  Torgersen female  2007 flipper_length_mm   186  
+ 8 Adelie  Torgersen female  2007 body_mass_g        3800  
+ 9 Adelie  Torgersen female  2007 bill_length_mm       40.3
+10 Adelie  Torgersen female  2007 bill_depth_mm        18  
+# ℹ 1,366 more rows
+```
+
+Here, we change "name" to "columns" and "value" to "content". The pivot defaults are usually quite sensible, making it clear which column holds the column names and which holds the cell values. But English might not be your working language, or you might find other names more obvious for yourself. 
+
+But we have even more power in the renaming of columns. Pivots actually have quite a lot of options, making it possible for us to create outputs looking just like we want. Notice how the names of the columns we pivoted follow a specific structure. First is the name of the body part, then the type of measurement, then the unit of the measurement. This clear logic we can use to our advantage.
+
+
+``` r
+penguins |> 
+  pivot_longer(contains("_"),
+               names_to = c("part", "measure" , "unit"),
+               names_sep = "_")
+```
+
+``` output
+# A tibble: 1,376 × 8
+   species island    sex     year part    measure unit   value
+   <fct>   <fct>     <fct>  <int> <chr>   <chr>   <chr>  <dbl>
+ 1 Adelie  Torgersen male    2007 bill    length  mm      39.1
+ 2 Adelie  Torgersen male    2007 bill    depth   mm      18.7
+ 3 Adelie  Torgersen male    2007 flipper length  mm     181  
+ 4 Adelie  Torgersen male    2007 body    mass    g     3750  
+ 5 Adelie  Torgersen female  2007 bill    length  mm      39.5
+ 6 Adelie  Torgersen female  2007 bill    depth   mm      17.4
+ 7 Adelie  Torgersen female  2007 flipper length  mm     186  
+ 8 Adelie  Torgersen female  2007 body    mass    g     3800  
+ 9 Adelie  Torgersen female  2007 bill    length  mm      40.3
+10 Adelie  Torgersen female  2007 bill    depth   mm      18  
+# ℹ 1,366 more rows
+```
+
+Now, the pivot gave us four new columns instead of two! We told the pivot that the column names could be split into the columns "part", "measure" and "unit", and that these were separated by underscores. Again we see how consistent and logical naming of columns can be a great help when working with data!
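+
+One benefit of splitting the names is that each piece becomes an ordinary column that can be used in later steps. The sketch below shows this on a made-up tibble; `measurements`, its `id` column and its values are assumptions for illustration, with column names that follow the same part, measure and unit pattern as the penguins data.
+
+``` r
+library(tidyverse)
+
+# Made-up data with names structured as part_measure_unit
+measurements <- tibble(
+  id             = 1:2,
+  bill_length_mm = c(39.1, 39.5),
+  body_mass_g    = c(3750, 3800)
+)
+
+# Split the old column names into three columns,
+# then keep only the rows measured in millimetres
+measurements |>
+  pivot_longer(-id,
+               names_to = c("part", "measure", "unit"),
+               names_sep = "_") |>
+  filter(unit == "mm")
+```
+
+Without the split, the same filter would need string matching on the full column name.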
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 4
+Pivot longer all the bill measurements, and alter the names in one go, so that there are three columns named "part", "measure" and "unit" after the pivot.
+
+:::::::::::::::::::::::::::::::::::::::: solution 
+## Solution
+
+
+``` r
+penguins |>
+    pivot_longer(starts_with("bill"),
+               names_to = c("part", "measure" , "unit"),
+               names_sep = "_")
+```
+
+``` output
+# A tibble: 688 × 10
+   species island  flipper_length_mm body_mass_g sex    year part  measure unit 
+   <fct>   <fct>               <int>       <int> <fct> <int> <chr> <chr>   <chr>
+ 1 Adelie  Torger…               181        3750 male   2007 bill  length  mm   
+ 2 Adelie  Torger…               181        3750 male   2007 bill  depth   mm   
+ 3 Adelie  Torger…               186        3800 fema…  2007 bill  length  mm   
+ 4 Adelie  Torger…               186        3800 fema…  2007 bill  depth   mm   
+ 5 Adelie  Torger…               195        3250 fema…  2007 bill  length  mm   
+ 6 Adelie  Torger…               195        3250 fema…  2007 bill  depth   mm   
+ 7 Adelie  Torger…                NA          NA <NA>   2007 bill  length  mm   
+ 8 Adelie  Torger…                NA          NA <NA>   2007 bill  depth   mm   
+ 9 Adelie  Torger…               193        3450 fema…  2007 bill  length  mm   
+10 Adelie  Torger…               193        3450 fema…  2007 bill  depth   mm   
+# ℹ 678 more rows
+# ℹ 1 more variable: value <dbl>
+```
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 5
+Pivot longer all the bill measurements, and use the `names_prefix` argument. Give it the string "bill_". What did that do?
+
+:::::::::::::::::::::::::::::::::::::::: solution 
+## Solution
+
+
+``` r
+penguins |>
+    pivot_longer(starts_with("bill"),
+               names_prefix = "bill_")
+```
+
+``` output
+# A tibble: 688 × 8
+   species island    flipper_length_mm body_mass_g sex     year name      value
+   <fct>   <fct>                 <int>       <int> <fct>  <int> <chr>     <dbl>
+ 1 Adelie  Torgersen               181        3750 male    2007 length_mm  39.1
+ 2 Adelie  Torgersen               181        3750 male    2007 depth_mm   18.7
+ 3 Adelie  Torgersen               186        3800 female  2007 length_mm  39.5
+ 4 Adelie  Torgersen               186        3800 female  2007 depth_mm   17.4
+ 5 Adelie  Torgersen               195        3250 female  2007 length_mm  40.3
+ 6 Adelie  Torgersen               195        3250 female  2007 depth_mm   18  
+ 7 Adelie  Torgersen                NA          NA <NA>    2007 length_mm  NA  
+ 8 Adelie  Torgersen                NA          NA <NA>    2007 depth_mm   NA  
+ 9 Adelie  Torgersen               193        3450 female  2007 length_mm  36.7
+10 Adelie  Torgersen               193        3450 female  2007 depth_mm   19.3
+# ℹ 678 more rows
+```
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 6
+Pivot longer all the bill measurements, and use the `names_prefix`, `names_to` and `names_sep` arguments. What do you need to change in `names_to` from the previous example to make it work now that we also use `names_prefix`?
+
+:::::::::::::::::::::::::::::::::::::::: solution 
+## Solution
+
+
+``` r
+penguins |>
+    pivot_longer(starts_with("bill"),
+              names_prefix = "bill_",
+              names_to = c("bill_measure" , "unit"),
+              names_sep = "_")
+```
+
+``` output
+# A tibble: 688 × 9
+   species island   flipper_length_mm body_mass_g sex    year bill_measure unit 
+   <fct>   <fct>                <int>       <int> <fct> <int> <chr>        <chr>
+ 1 Adelie  Torgers…               181        3750 male   2007 length       mm   
+ 2 Adelie  Torgers…               181        3750 male   2007 depth        mm   
+ 3 Adelie  Torgers…               186        3800 fema…  2007 length       mm   
+ 4 Adelie  Torgers…               186        3800 fema…  2007 depth        mm   
+ 5 Adelie  Torgers…               195        3250 fema…  2007 length       mm   
+ 6 Adelie  Torgers…               195        3250 fema…  2007 depth        mm   
+ 7 Adelie  Torgers…                NA          NA <NA>   2007 length       mm   
+ 8 Adelie  Torgers…                NA          NA <NA>   2007 depth        mm   
+ 9 Adelie  Torgers…               193        3450 fema…  2007 length       mm   
+10 Adelie  Torgers…               193        3450 fema…  2007 depth        mm   
+# ℹ 678 more rows
+# ℹ 1 more variable: value <dbl>
+```
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+
+## Cleaning up values during pivots
+
+When pivoting, it is common for quite a few `NA` values to appear in the values column. 
+We can remove these immediately by setting the argument `values_drop_na` to `TRUE`.
+
+
+``` r
+penguins |> 
+  pivot_longer(starts_with("bill"),
+               values_drop_na = TRUE)
+```
+
+``` output
+# A tibble: 684 × 8
+   species island    flipper_length_mm body_mass_g sex     year name       value
+   <fct>   <fct>                 <int>       <int> <fct>  <int> <chr>      <dbl>
+ 1 Adelie  Torgersen               181        3750 male    2007 bill_leng…  39.1
+ 2 Adelie  Torgersen               181        3750 male    2007 bill_dept…  18.7
+ 3 Adelie  Torgersen               186        3800 female  2007 bill_leng…  39.5
+ 4 Adelie  Torgersen               186        3800 female  2007 bill_dept…  17.4
+ 5 Adelie  Torgersen               195        3250 female  2007 bill_leng…  40.3
+ 6 Adelie  Torgersen               195        3250 female  2007 bill_dept…  18  
+ 7 Adelie  Torgersen               193        3450 female  2007 bill_leng…  36.7
+ 8 Adelie  Torgersen               193        3450 female  2007 bill_dept…  19.3
+ 9 Adelie  Torgersen               190        3650 male    2007 bill_leng…  39.3
+10 Adelie  Torgersen               190        3650 male    2007 bill_dept…  20.6
+# ℹ 674 more rows
+```
+
+This extra argument ensures that all rows with `NA` in the `value` column are removed. This is sometimes convenient, as we might move on to analyses of the data, which are often made more complicated (or impossible) when there is missing data. 
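+
+The effect on the row count can be checked on a small example. This is a minimal sketch with a made-up tibble `toy` (names and values are assumptions for illustration) in which one of three rows has missing bill measurements.
+
+``` r
+library(tidyverse)
+
+# Made-up data: the second row has no bill measurements
+toy <- tibble(
+  id             = 1:3,
+  bill_length_mm = c(39.1, NA, 40.3),
+  bill_depth_mm  = c(18.7, NA, 18)
+)
+
+kept    <- toy |> pivot_longer(starts_with("bill"))
+dropped <- toy |> pivot_longer(starts_with("bill"),
+                               values_drop_na = TRUE)
+
+nrow(kept)     # 3 rows x 2 pivoted columns
+nrow(dropped)  # the two NA rows are gone
+```
+
+The same arithmetic explains the penguins output above: the pivot of the bill columns goes from 688 rows to 684 rows once the four rows with missing values are dropped.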
+
+We should put everything together and create a new object that is our long formatted penguin data set.
+
+
+``` r
+penguins_long <- penguins |> 
+  pivot_longer(contains("_"),
+               names_to = c("part", "measure" , "unit"),
+               names_sep = "_",
+               values_drop_na = TRUE)
+penguins_long
+```
+
+``` output
+# A tibble: 1,368 × 8
+   species island    sex     year part    measure unit   value
+   <fct>   <fct>     <fct>  <int> <chr>   <chr>   <chr>  <dbl>
+ 1 Adelie  Torgersen male    2007 bill    length  mm      39.1
+ 2 Adelie  Torgersen male    2007 bill    depth   mm      18.7
+ 3 Adelie  Torgersen male    2007 flipper length  mm     181  
+ 4 Adelie  Torgersen male    2007 body    mass    g     3750  
+ 5 Adelie  Torgersen female  2007 bill    length  mm      39.5
+ 6 Adelie  Torgersen female  2007 bill    depth   mm      17.4
+ 7 Adelie  Torgersen female  2007 flipper length  mm     186  
+ 8 Adelie  Torgersen female  2007 body    mass    g     3800  
+ 9 Adelie  Torgersen female  2007 bill    length  mm      40.3
+10 Adelie  Torgersen female  2007 bill    depth   mm      18  
+# ℹ 1,358 more rows
+```
+
+## Pivoting data wider
+
+While long data formats are ideal when you are working in the tidyverse, you might encounter packages or pipelines in R that require wide-format data. Knowing how to transform a long data set into a wide one is just as important as knowing how to go from wide to long. 
+You will also find this skill convenient when creating data summaries tomorrow.
+
+Before we start using the `penguins_long` dataset we made, let us make another, simpler long data set for a first look at the `pivot_wider()` function.
+
+
+``` r
+penguins_long_simple <- penguins |> 
+  pivot_longer(contains("_"))
+penguins_long_simple
+```
+
+``` output
+# A tibble: 1,376 × 6
+   species island    sex     year name               value
+   <fct>   <fct>     <fct>  <int> <chr>              <dbl>
+ 1 Adelie  Torgersen male    2007 bill_length_mm      39.1
+ 2 Adelie  Torgersen male    2007 bill_depth_mm       18.7
+ 3 Adelie  Torgersen male    2007 flipper_length_mm  181  
+ 4 Adelie  Torgersen male    2007 body_mass_g       3750  
+ 5 Adelie  Torgersen female  2007 bill_length_mm      39.5
+ 6 Adelie  Torgersen female  2007 bill_depth_mm       17.4
+ 7 Adelie  Torgersen female  2007 flipper_length_mm  186  
+ 8 Adelie  Torgersen female  2007 body_mass_g       3800  
+ 9 Adelie  Torgersen female  2007 bill_length_mm      40.3
+10 Adelie  Torgersen female  2007 bill_depth_mm       18  
+# ℹ 1,366 more rows
+```
+
+`penguins_long_simple` now contains the longer penguins data set, with the original column names in the "name" column and their values in the "value" column.
+
+If we want to make this wider again we can try the following:
+
+
+``` r
+penguins_long_simple |> 
+  pivot_wider(names_from = name, 
+              values_from = value)
+```
+
+``` warning
+Warning: Values from `value` are not uniquely identified; output will contain list-cols.
+• Use `values_fn = list` to suppress this warning.
+• Use `values_fn = {summary_fun}` to summarise duplicates.
+• Use the following dplyr code to identify duplicates.
+  {data} |>
+  dplyr::summarise(n = dplyr::n(), .by = c(species, island, sex, year, name))
+  |>
+  dplyr::filter(n > 1L)
+```
+
+``` output
+# A tibble: 35 × 8
+   species island    sex     year bill_length_mm bill_depth_mm flipper_length_mm
+   <fct>   <fct>     <fct>  <int> <list>         <list>        <list>           
+ 1 Adelie  Torgersen male    2007 <dbl [7]>      <dbl [7]>     <dbl [7]>        
+ 2 Adelie  Torgersen female  2007 <dbl [8]>      <dbl [8]>     <dbl [8]>        
+ 3 Adelie  Torgersen <NA>    2007 <dbl [5]>      <dbl [5]>     <dbl [5]>        
+ 4 Adelie  Biscoe    female  2007 <dbl [5]>      <dbl [5]>     <dbl [5]>        
+ 5 Adelie  Biscoe    male    2007 <dbl [5]>      <dbl [5]>     <dbl [5]>        
+ 6 Adelie  Dream     female  2007 <dbl [9]>      <dbl [9]>     <dbl [9]>        
+ 7 Adelie  Dream     male    2007 <dbl [10]>     <dbl [10]>    <dbl [10]>       
+ 8 Adelie  Dream     <NA>    2007 <dbl [1]>      <dbl [1]>     <dbl [1]>        
+ 9 Adelie  Biscoe    female  2008 <dbl [9]>      <dbl [9]>     <dbl [9]>        
+10 Adelie  Biscoe    male    2008 <dbl [9]>      <dbl [9]>     <dbl [9]>        
+# ℹ 25 more rows
+# ℹ 1 more variable: body_mass_g <list>
+```
+
+OK, what is happening here? This does not look at all like what we expected! Our columns have something very weird in them, this strange `<dbl [7]>` thing. What does that mean?
+Let's look at the warning message our code gave us and see if we can figure it out.
+**Values are not uniquely identified; output will contain list-cols**. We are being told that `pivot_wider()` cannot uniquely identify the observations, and so cannot place a single value into each cell. Instead, it is returning lists of values.
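+
+The warning also suggests some dplyr code for finding the duplicates. A minimal sketch of that suggestion, applied to the same long data, could look like the following (the object name `duplicates` is only chosen here for illustration):
+
+``` r
+library(dplyr)
+library(tidyr)
+
+penguins <- palmerpenguins::penguins
+
+# Count how many rows share the same combination of identifying columns
+# and measurement name; any count above 1 cannot fit into a single cell.
+duplicates <- penguins |>
+  pivot_longer(contains("_")) |>
+  summarise(n = n(), .by = c(species, island, sex, year, name)) |>
+  filter(n > 1L)
+duplicates
+```
+
+Every row in the result is a combination that occurs more than once, which is why `pivot_wider()` has to fall back on list-columns.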
+
+Yikes! That's super annoying. Let's go back to our penguins data set and see if we can do something to help.
+
+
+``` r
+penguins
+```
+
+``` output
+# A tibble: 344 × 8
+   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
+ 1 Adelie  Torgersen           39.1          18.7               181        3750
+ 2 Adelie  Torgersen           39.5          17.4               186        3800
+ 3 Adelie  Torgersen           40.3          18                 195        3250
+ 4 Adelie  Torgersen           NA            NA                  NA          NA
+ 5 Adelie  Torgersen           36.7          19.3               193        3450
+ 6 Adelie  Torgersen           39.3          20.6               190        3650
+ 7 Adelie  Torgersen           38.9          17.8               181        3625
+ 8 Adelie  Torgersen           39.2          19.6               195        4675
+ 9 Adelie  Torgersen           34.1          18.1               193        3475
+10 Adelie  Torgersen           42            20.2               190        4250
+# ℹ 334 more rows
+# ℹ 2 more variables: sex <fct>, year <int>
+```
+Have you noticed that there is no column that uniquely identifies an observation? Other than each observation being on its own row, we have nothing to make sure that we can identify which observations belong together once we make the data long. As long as they are in the original format, this is ok, but once we pivoted the data longer, we lost the ability to identify which rows of observations belong together. 
+
+We can remedy that by adding row numbers to the original data before we pivot. The `row_number()` function is great for this.
+By adding the row number to the data set with `mutate()`, we get a variable that clearly identifies each observation.
+
+
+``` r
+penguins_long_simple <- penguins |> 
+  mutate(sample = row_number()) |> 
+  pivot_longer(contains("_"))
+penguins_long_simple
+```
+
+``` output
+# A tibble: 1,376 × 7
+   species island    sex     year sample name               value
+   <fct>   <fct>     <fct>  <int>  <int> <chr>              <dbl>
+ 1 Adelie  Torgersen male    2007      1 bill_length_mm      39.1
+ 2 Adelie  Torgersen male    2007      1 bill_depth_mm       18.7
+ 3 Adelie  Torgersen male    2007      1 flipper_length_mm  181  
+ 4 Adelie  Torgersen male    2007      1 body_mass_g       3750  
+ 5 Adelie  Torgersen female  2007      2 bill_length_mm      39.5
+ 6 Adelie  Torgersen female  2007      2 bill_depth_mm       17.4
+ 7 Adelie  Torgersen female  2007      2 flipper_length_mm  186  
+ 8 Adelie  Torgersen female  2007      2 body_mass_g       3800  
+ 9 Adelie  Torgersen female  2007      3 bill_length_mm      40.3
+10 Adelie  Torgersen female  2007      3 bill_depth_mm       18  
+# ℹ 1,366 more rows
+```
+
+Notice now that in the sample column, the numbers repeat across several rows. All the rows where sample equals 1 are observations from the first row of the original penguins data set! Let us try to pivot that wider again.
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 6
+Turn the penguins_long_simple dataset back to its original state
+
+:::::::::::::::::::::::::::::::::::::::: solution 
+## Solution
+
+
+``` r
+penguins_long_simple |> 
+  pivot_wider(names_from = name,
+              values_from = value)
+```
+
+``` output
+# A tibble: 344 × 9
+   species island    sex     year sample bill_length_mm bill_depth_mm
+   <fct>   <fct>     <fct>  <int>  <int>          <dbl>         <dbl>
+ 1 Adelie  Torgersen male    2007      1           39.1          18.7
+ 2 Adelie  Torgersen female  2007      2           39.5          17.4
+ 3 Adelie  Torgersen female  2007      3           40.3          18  
+ 4 Adelie  Torgersen <NA>    2007      4           NA            NA  
+ 5 Adelie  Torgersen female  2007      5           36.7          19.3
+ 6 Adelie  Torgersen male    2007      6           39.3          20.6
+ 7 Adelie  Torgersen female  2007      7           38.9          17.8
+ 8 Adelie  Torgersen male    2007      8           39.2          19.6
+ 9 Adelie  Torgersen <NA>    2007      9           34.1          18.1
+10 Adelie  Torgersen <NA>    2007     10           42            20.2
+# ℹ 334 more rows
+# ℹ 2 more variables: flipper_length_mm <dbl>, body_mass_g <dbl>
+```
+And now it worked! This time, the remaining columns were able to uniquely identify which observations belong together. The data now look just like the original penguins data set, with the addition of the sample column, the columns slightly rearranged, and the integer columns stored as doubles.
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+
+## Pivoting wider with more arguments
+
+We should re-create our penguins long data set, to make sure we don't have this problem again.
+
+
+``` r
+penguins_long <- penguins |> 
+  mutate(sample = row_number()) |> 
+  pivot_longer(contains("_"),
+               names_to = c("part", "measure" , "unit"),
+               names_sep = "_",
+               values_drop_na = TRUE)
+penguins_long
+```
+
+``` output
+# A tibble: 1,368 × 9
+   species island    sex     year sample part    measure unit   value
+   <fct>   <fct>     <fct>  <int>  <int> <chr>   <chr>   <chr>  <dbl>
+ 1 Adelie  Torgersen male    2007      1 bill    length  mm      39.1
+ 2 Adelie  Torgersen male    2007      1 bill    depth   mm      18.7
+ 3 Adelie  Torgersen male    2007      1 flipper length  mm     181  
+ 4 Adelie  Torgersen male    2007      1 body    mass    g     3750  
+ 5 Adelie  Torgersen female  2007      2 bill    length  mm      39.5
+ 6 Adelie  Torgersen female  2007      2 bill    depth   mm      17.4
+ 7 Adelie  Torgersen female  2007      2 flipper length  mm     186  
+ 8 Adelie  Torgersen female  2007      2 body    mass    g     3800  
+ 9 Adelie  Torgersen female  2007      3 bill    length  mm      40.3
+10 Adelie  Torgersen female  2007      3 bill    depth   mm      18  
+# ℹ 1,358 more rows
+```
+
+Much like the first example of `pivot_longer()`, `pivot_wider()` in its simplest form is relatively straightforward. But our `penguins_long` data set is more complex. The column names are split into several columns, so how do we fix that?
+Like `pivot_longer()`, `pivot_wider()` has arguments that let us get back to the original state, with much of the same syntax as `pivot_longer()`!
+
+
+``` r
+penguins_long |> 
+  pivot_wider(names_from = c("part", "measure", "unit"),
+              names_sep = "_",
+              values_from = value)
+```
+
+``` output
+# A tibble: 342 × 9
+   species island    sex     year sample bill_length_mm bill_depth_mm
+   <fct>   <fct>     <fct>  <int>  <int>          <dbl>         <dbl>
+ 1 Adelie  Torgersen male    2007      1           39.1          18.7
+ 2 Adelie  Torgersen female  2007      2           39.5          17.4
+ 3 Adelie  Torgersen female  2007      3           40.3          18  
+ 4 Adelie  Torgersen female  2007      5           36.7          19.3
+ 5 Adelie  Torgersen male    2007      6           39.3          20.6
+ 6 Adelie  Torgersen female  2007      7           38.9          17.8
+ 7 Adelie  Torgersen male    2007      8           39.2          19.6
+ 8 Adelie  Torgersen <NA>    2007      9           34.1          18.1
+ 9 Adelie  Torgersen <NA>    2007     10           42            20.2
+10 Adelie  Torgersen <NA>    2007     11           37.8          17.1
+# ℹ 332 more rows
+# ℹ 2 more variables: flipper_length_mm <dbl>, body_mass_g <dbl>
+```
+
+Those arguments and inputs should be familiar from the call to `pivot_longer()`. Luckily, once you understand one of the two functions, it is easier to understand the other.
+
+# Wrap up
+We have been exploring how to pivot data into longer and wider shapes.
+Pivoting is a vital part of the "tidyverse" way, and a very powerful tool once you get used to it.
+We will see pivots in action more tomorrow as we create summaries and play around with combining all the things we have been exploring.
+
diff --git a/08-data-summaries.md b/08-data-summaries.md
new file mode 100644
index 0000000..eeaf0cd
--- /dev/null
+++ b/08-data-summaries.md
@@ -0,0 +1,739 @@
+---
+title: "Data summaries with dplyr"
+teaching: 60
+exercises: 8
+---
+
+
+
+:::::::::::::::::::::::::::::::::::::: questions 
+
+- How can I create summary tables of my data?
+- How can I create different types of summaries based on groups in my data?
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::: objectives
+
+- Use `summarise()` to create data summaries
+- Use `group_by()` to create summaries of groups
+- Use `tally()`/`count()` to create a quick frequency table
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+# Motivation
+
+Next to visualizing data, creating summaries of the data in tables is a quick way to get an idea of what type of data you have at hand. 
+It might help you spot incorrect data or extreme values, or whether specific analysis approaches are needed.
+To summarize data with the {tidyverse} efficiently, we need to utilize the tools we have learned in the previous days,
+like adding new variables, tidy-selections, pivots and grouping data. All these tools combine amazingly when we start making summaries.
+
+Let us start from the beginning with summaries, and work our way up to the more complex variations as we go.
+
+First, we must again prepare our workspace with our packages and data.
+
+
+``` r
+library(tidyverse)
+penguins <- palmerpenguins::penguins
+```
+
+We should start to feel quite familiar with our penguins by now. Let us start by finding the mean of the bill length.
+
+
+``` r
+penguins |> 
+  summarise(bill_length_mean = mean(bill_length_mm))
+```
+
+``` output
+# A tibble: 1 × 1
+  bill_length_mean
+             <dbl>
+1               NA
+```
+
+We get `NA`. As we remember, there are some `NA` values in our data.
+R is very strict about calculations that involve an `NA`.
+If there is an `NA`, i.e. a value we do not know, R cannot compute a correct result, so it returns `NA` again.
+This is a nice way of quickly seeing that you have missing values in your data.
+Right now, we will ignore those.
+We can omit them by adding the `na.rm = TRUE` argument, which removes all `NA`s before calculating the mean.
+
+
+``` r
+penguins |> 
+  summarise(bill_length_mean = mean(bill_length_mm, na.rm = TRUE))
+```
+
+``` output
+# A tibble: 1 × 1
+  bill_length_mean
+             <dbl>
+1             43.9
+```
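+
+The same behaviour can be seen on a small vector, outside of any data set. This is only an illustrative sketch; the vector `x` is made up for the example.
+
+``` r
+# A small vector with one missing value
+x <- c(1, NA, 3)
+
+mean(x)                # NA: the missing value makes the result unknown
+mean(x, na.rm = TRUE)  # 2: the NA is removed before calculating
+```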
+
+An alternative way to remove missing values from a column is to pass the column to {tidyr}'s `drop_na()` function. 
+
+
+``` r
+penguins |> 
+  drop_na(bill_length_mm) |> 
+  summarise(bill_length_mean = mean(bill_length_mm))
+```
+
+``` output
+# A tibble: 1 × 1
+  bill_length_mean
+             <dbl>
+1             43.9
+```
+
+We can also calculate several summaries in one go by adding more arguments to `summarise()`.
+
+
+``` r
+penguins |> 
+  drop_na(bill_length_mm) |> 
+  summarise(bill_length_mean = mean(bill_length_mm),
+            bill_length_min = min(bill_length_mm),
+            bill_length_max = max(bill_length_mm))
+```
+
+``` output
+# A tibble: 1 × 3
+  bill_length_mean bill_length_min bill_length_max
+             <dbl>           <dbl>           <dbl>
+1             43.9            32.1            59.6
+```
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 1
+ First start by trying to summarise a single column, `body_mass_g`, by calculating its mean in *kilograms*.
+ 
+ :::::::::::::::::::::::::::::::::::::::: solution 
+## Solution
+
+
+``` r
+penguins |> 
+  drop_na(body_mass_g) |> 
+  summarise(body_mass_kg_mean = mean(body_mass_g / 1000))
+```
+
+``` output
+# A tibble: 1 × 1
+  body_mass_kg_mean
+              <dbl>
+1              4.20
+```
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 2
+ Add a column with the standard deviation of `body_mass_g` on *kilogram* scale.
+ 
+:::::::::::::::::::::::::::::::::::::::: solution 
+## Solution
+
+``` r
+penguins |> 
+    drop_na(body_mass_g) |> 
+    summarise(
+        body_mass_kg_mean = mean(body_mass_g / 1000),
+        body_mass_kg_sd = sd(body_mass_g / 1000)
+    )
+```
+
+``` output
+# A tibble: 1 × 2
+  body_mass_kg_mean body_mass_kg_sd
+              <dbl>           <dbl>
+1              4.20           0.802
+```
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 3
+ Now add the same two metrics for `flipper_length_mm` on *centimeter* scale and 
+ give the columns clear names. Why could the `drop_na()` step give us wrong results? 
+ 
+:::::::::::::::::::::::::::::::::::::::: solution 
+## Solution
+
+
+``` r
+penguins |> 
+    drop_na(body_mass_g, flipper_length_mm) |> 
+    summarise(
+        body_mass_kg_mean      = mean(body_mass_g / 1000),
+        body_mass_kg_sd        = sd(body_mass_g / 1000),
+        flipper_length_cm_mean = mean(flipper_length_mm / 10),
+        flipper_length_cm_sd   = sd(flipper_length_mm / 10)
+    )
+```
+
+``` output
+# A tibble: 1 × 4
+  body_mass_kg_mean body_mass_kg_sd flipper_length_cm_mean flipper_length_cm_sd
+              <dbl>           <dbl>                  <dbl>                <dbl>
+1              4.20           0.802                   20.1                 1.41
+```
+When we use drop_na on multiple columns, it will drop the _entire row_ of data where there is `NA` in 
+any of the columns we specify. This means that we might be dropping valid data from body mass because
+flipper length is missing, and vice versa.
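+
+A small sketch with made-up numbers can make this visible. The tibble `toy` and its columns `mass` and `flipper` are hypothetical and only serve as an illustration; using `na.rm = TRUE` per column is one possible way to keep all valid values.
+
+``` r
+library(dplyr)
+library(tidyr)
+
+# Hypothetical data: each column is missing a value in a different row
+toy <- tibble(
+  mass    = c(1, 2, NA),
+  flipper = c(10, NA, 30)
+)
+
+# drop_na() on both columns keeps only the first row
+dropped <- toy |>
+  drop_na(mass, flipper) |>
+  summarise(mass_mean = mean(mass), flipper_mean = mean(flipper))
+dropped  # mass_mean = 1, flipper_mean = 10
+
+# na.rm = TRUE removes missing values separately for each column
+kept <- toy |>
+  summarise(
+    mass_mean    = mean(mass, na.rm = TRUE),
+    flipper_mean = mean(flipper, na.rm = TRUE)
+  )
+kept  # mass_mean = 1.5, flipper_mean = 20
+```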
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+
+## Summarising grouped data
+
+In all the examples we have gone through so far, we have summarized the entire data set.
+But most of the time, we want to look at groups in our data, and summarize based on these groups.
+How can we summarize while preserving the grouping information?
+
+We've already worked a little with the `group_by()` function, and we will use it again! 
+Because, once we know how to summarize data, summarizing data by groups is as simple as adding one more line to our code.
+
+Let us start with our first example of getting the mean of a single column.
+
+
+``` r
+penguins |> 
+  drop_na(body_mass_g) |> 
+  summarise(body_mass_g_mean = mean(body_mass_g))
+```
+
+``` output
+# A tibble: 1 × 1
+  body_mass_g_mean
+             <dbl>
+1            4202.
+```
+
+Here, we are getting a single mean for the entire data set. 
+To get, for instance, the mean for each of the species, we can group the data set by species before we summarize.
+
+
+``` r
+penguins |> 
+  drop_na(body_mass_g) |> 
+  group_by(species) |> 
+  summarise(body_mass_kg_mean = mean(body_mass_g / 1000))
+```
+
+``` output
+# A tibble: 3 × 2
+  species   body_mass_kg_mean
+  <fct>                 <dbl>
+1 Adelie                 3.70
+2 Chinstrap              3.73
+3 Gentoo                 5.08
+```
+
+And now we suddenly have three means! They are tidily collected, each in its own row.
+We can keep adding to this, as we did before.
+
+
+``` r
+penguins |> 
+    drop_na(body_mass_g) |> 
+    group_by(species) |>
+    summarise(
+        body_mass_kg_mean = mean(body_mass_g / 1000),
+        body_mass_kg_min = min(body_mass_g / 1000),
+        body_mass_kg_max = max(body_mass_g / 1000)
+    )
+```
+
+``` output
+# A tibble: 3 × 4
+  species   body_mass_kg_mean body_mass_kg_min body_mass_kg_max
+  <fct>                 <dbl>            <dbl>            <dbl>
+1 Adelie                 3.70             2.85             4.78
+2 Chinstrap              3.73             2.7              4.8 
+3 Gentoo                 5.08             3.95             6.3 
+```
+
+Now we are suddenly able to easily compare groups within our data, since they are so neatly summarized here. 
+
+## Simple frequency tables
+
+So far, we have created custom summary tables with means, standard deviations and so on.
+But what if you want a really quick count of all the records in different groups, i.e. a frequency table?
+
+One way would be to use the `summarise()` function together with the `n()` function, which counts the number of rows in each group.
+
+
+``` r
+penguins |> 
+  group_by(species) |> 
+  summarise(n = n())
+```
+
+``` output
+# A tibble: 3 × 2
+  species       n
+  <fct>     <int>
+1 Adelie      152
+2 Chinstrap    68
+3 Gentoo      124
+```
+
+This is super nice, and `n()` is a nice function to remember when you are making your own custom tables.
+But if all you want is the frequency table, we would suggest using the functions `count()` or `tally()`.
+They give the same counts, so you can choose the one that feels more appropriate. Notice, though, that the output of `count()` below is still grouped, while the output of `tally()` is not.
+
+
+``` r
+penguins |> 
+  group_by(species) |> 
+  tally()
+```
+
+``` output
+# A tibble: 3 × 2
+  species       n
+  <fct>     <int>
+1 Adelie      152
+2 Chinstrap    68
+3 Gentoo      124
+```
+
+``` r
+penguins |> 
+  group_by(species) |> 
+  count()
+```
+
+``` output
+# A tibble: 3 × 2
+# Groups:   species [3]
+  species       n
+  <fct>     <int>
+1 Adelie      152
+2 Chinstrap    68
+3 Gentoo      124
+```
+
+These are two really nice convenience functions for getting a quick frequency table of your data.
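+
+As a further shortcut, `count()` also accepts the columns to count by directly, so the `group_by()` step can be left out. A minimal sketch (the object name `species_n` is only chosen for illustration):
+
+``` r
+library(dplyr)
+
+penguins <- palmerpenguins::penguins
+
+# Passing the column to count() groups, counts and ungroups in one step
+species_n <- penguins |>
+  count(species)
+species_n
+```
+
+Because no grouping was set up beforehand, the result of this call is not grouped.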
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 4
+ Create a table that gives the mean and standard deviation of bill length, grouped by island
+ 
+:::::::::::::::::::::::::::::::::::::::: solution 
+## Solution
+
+
+``` r
+penguins |> 
+    drop_na(bill_length_mm) |> 
+    group_by(island) |>
+    summarise(
+        bill_length_mm_mean = mean(bill_length_mm),
+        bill_length_mm_sd   = sd(bill_length_mm )
+    )
+```
+
+``` output
+# A tibble: 3 × 3
+  island    bill_length_mm_mean bill_length_mm_sd
+  <fct>                   <dbl>             <dbl>
+1 Biscoe                   45.3              4.77
+2 Dream                    44.2              5.95
+3 Torgersen                39.0              3.03
+```
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 5
+ Create a table that gives the mean and standard deviation of bill length, grouped by island and sex.
+ 
+:::::::::::::::::::::::::::::::::::::::: solution 
+
+## Solution
+
+
+``` r
+penguins |> 
+    drop_na(bill_length_mm) |> 
+    group_by(island, sex) |>
+    summarise(
+        bill_length_mm_mean = mean(bill_length_mm),
+        bill_length_mm_sd   = sd(bill_length_mm )
+    )
+```
+
+``` output
+`summarise()` has grouped output by 'island'. You can override using the
+`.groups` argument.
+```
+
+``` output
+# A tibble: 9 × 4
+# Groups:   island [3]
+  island    sex    bill_length_mm_mean bill_length_mm_sd
+  <fct>     <fct>                <dbl>             <dbl>
+1 Biscoe    female                43.3              4.18
+2 Biscoe    male                  47.1              4.69
+3 Biscoe    <NA>                  45.6              1.37
+4 Dream     female                42.3              5.53
+5 Dream     male                  46.1              5.77
+6 Dream     <NA>                  37.5             NA   
+7 Torgersen female                37.6              2.21
+8 Torgersen male                  40.6              3.03
+9 Torgersen <NA>                  37.9              3.23
+```
+
+::::::::::::::::::::::::::::::::::::::::
+::::::::::::::::::::::::::::::::::::: 
+
+
+## Ungrouping for future control
+
+We've been grouping a lot and not ungrouping.
+That might seem fine now, because we have not really done anything more after the summarize.
+But in many cases we continue on our merry data-handling way and do lots more, and then the
+preserved grouping can give us some unexpected results. Let us explore that a little.
+
+
+``` r
+penguins |> 
+  group_by(species) |> 
+  count()
+```
+
+``` output
+# A tibble: 3 × 2
+# Groups:   species [3]
+  species       n
+  <fct>     <int>
+1 Adelie      152
+2 Chinstrap    68
+3 Gentoo      124
+```
+
+When we group by a single column and use `summarise()`, the output data is no longer grouped.
+In a way, `summarise()` uses up one level of grouping while summarizing, because based on species the data cannot be condensed any further than this.
+`count()` behaves differently: it keeps the grouping it was given, as the `# Groups:` line in the output above shows. It behaves the same way when we group by two columns.
+
+
+``` r
+penguins |> 
+  group_by(species, island) |> 
+  count()
+```
+
+``` output
+# A tibble: 5 × 3
+# Groups:   species, island [5]
+  species   island        n
+  <fct>     <fct>     <int>
+1 Adelie    Biscoe       44
+2 Adelie    Dream        56
+3 Adelie    Torgersen    52
+4 Chinstrap Dream        68
+5 Gentoo    Biscoe      124
+```
+
+Because `count()` keeps the grouping, the data are still grouped by both "species" and "island".
+Let's say we now want a column that holds the total number of penguin observations.
+That would be the sum of the "n" column.
+
+
+``` r
+penguins |> 
+  group_by(species, island) |> 
+  count() |> 
+  mutate(total = sum(n))
+```
+
+``` output
+# A tibble: 5 × 4
+# Groups:   species, island [5]
+  species   island        n total
+  <fct>     <fct>     <int> <int>
+1 Adelie    Biscoe       44    44
+2 Adelie    Dream        56    56
+3 Adelie    Torgersen    52    52
+4 Chinstrap Dream        68    68
+5 Gentoo    Biscoe      124   124
+```
+
+But that is not what we were expecting! Why? Because the data are still grouped by species and island, `sum()` is calculated within each group, and each group has only a single row, so `total` just repeats `n`. To get the sum for the whole data set, we first need to `ungroup()`, and then try again.
+
+
+``` r
+penguins |> 
+  group_by(species, island) |> 
+  count() |> 
+  ungroup() |> 
+  mutate(total = sum(n))
+```
+
+``` output
+# A tibble: 5 × 4
+  species   island        n total
+  <fct>     <fct>     <int> <int>
+1 Adelie    Biscoe       44   344
+2 Adelie    Dream        56   344
+3 Adelie    Torgersen    52   344
+4 Chinstrap Dream        68   344
+5 Gentoo    Biscoe      124   344
+```
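+
+When `summarise()` is used on data grouped by more than one column, it prints a message pointing to its `.groups` argument. As a sketch of an alternative to a separate `ungroup()` step, `.groups = "drop"` asks `summarise()` to return ungrouped data directly (the object name `island_sex` is only chosen for illustration):
+
+``` r
+library(dplyr)
+library(tidyr)
+
+penguins <- palmerpenguins::penguins
+
+# .groups = "drop" removes all grouping from the summarised output
+island_sex <- penguins |>
+  drop_na(bill_length_mm) |>
+  group_by(island, sex) |>
+  summarise(
+    bill_length_mm_mean = mean(bill_length_mm),
+    .groups = "drop"
+  )
+island_sex
+```
+
+Either approach works; an explicit `ungroup()` can be easier to spot when reading a long pipeline.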
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 6
+Create a table that gives the mean and standard deviation of bill length, grouped by island and sex,
+then add another column that has the mean for all the data 
+
+:::::::::::::::::::::::::::::::::::::::: solution 
+## Solution
+
+
+``` r
+penguins |> 
+    drop_na(bill_length_mm) |> 
+    group_by(island, sex) |>
+    summarise(
+        bill_length_mm_mean = mean(bill_length_mm),
+        bill_length_mm_sd   = sd(bill_length_mm )
+    ) |>
+    ungroup() |>
+    mutate(mean = mean(bill_length_mm_mean))
+```
+
+``` output
+`summarise()` has grouped output by 'island'. You can override using the
+`.groups` argument.
+```
+
+``` output
+# A tibble: 9 × 5
+  island    sex    bill_length_mm_mean bill_length_mm_sd  mean
+  <fct>     <fct>                <dbl>             <dbl> <dbl>
+1 Biscoe    female                43.3              4.18  42.0
+2 Biscoe    male                  47.1              4.69  42.0
+3 Biscoe    <NA>                  45.6              1.37  42.0
+4 Dream     female                42.3              5.53  42.0
+5 Dream     male                  46.1              5.77  42.0
+6 Dream     <NA>                  37.5             NA     42.0
+7 Torgersen female                37.6              2.21  42.0
+8 Torgersen male                  40.6              3.03  42.0
+9 Torgersen <NA>                  37.9              3.23  42.0
+```
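+
+One thing to keep in mind: the `mean` column above is the mean of the nine group means, which is why it shows 42.0 rather than the 43.9 we calculated for all penguins at the start of this episode. The two differ when the groups are of unequal size. A small sketch with made-up numbers (the tibble `toy` and its columns are hypothetical) shows the effect:
+
+``` r
+library(dplyr)
+
+# Hypothetical data: group "a" has three rows, group "b" only one
+toy <- tibble(
+  group = c("a", "a", "a", "b"),
+  value = c(1, 2, 3, 10)
+)
+
+# Mean over all individual rows
+overall_mean <- mean(toy$value)
+overall_mean  # 4
+
+# Mean of the two group means (2 and 10)
+mean_of_means <- toy |>
+  group_by(group) |>
+  summarise(group_mean = mean(value)) |>
+  summarise(mean_of_means = mean(group_mean)) |>
+  pull(mean_of_means)
+mean_of_means  # 6
+```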
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+
+## Grouped data manipulation
+
+You might have noticed that we managed to do some data manipulation (i.e. `mutate`) while the data were still grouped, 
+which in our example before produced unwanted results.
+But, often, grouping before data manipulation can unlock great new possibilities for working with our data.
+
+Let us use the data we made where we summarised the body mass of penguins in kilograms, and let us group by species and sex.
+
+
+``` r
+penguins |> 
+    drop_na(body_mass_g) |> 
+    group_by(species, sex) |>
+    summarise(
+        body_mass_kg_mean = mean(body_mass_g / 1000),
+        body_mass_kg_min = min(body_mass_g / 1000),
+        body_mass_kg_max = max(body_mass_g / 1000)
+    )
+```
+
+``` output
+`summarise()` has grouped output by 'species'. You can override using the
+`.groups` argument.
+```
+
+``` output
+# A tibble: 8 × 5
+# Groups:   species [3]
+  species   sex    body_mass_kg_mean body_mass_kg_min body_mass_kg_max
+  <fct>     <fct>              <dbl>            <dbl>            <dbl>
+1 Adelie    female              3.37             2.85             3.9 
+2 Adelie    male                4.04             3.32             4.78
+3 Adelie    <NA>                3.54             2.98             4.25
+4 Chinstrap female              3.53             2.7              4.15
+5 Chinstrap male                3.94             3.25             4.8 
+6 Gentoo    female              4.68             3.95             5.2 
+7 Gentoo    male                5.48             4.75             6.3 
+8 Gentoo    <NA>                4.59             4.1              4.88
+```
+
+The data we get out after that are still grouped by species.
+Let us say that we want to know the size of each sex's mean body mass relative to the species mean.
+We would need the species mean in addition to the sex means within each species.
+Because the data are still grouped by species, we can add this with a `mutate()` (here, the species mean is calculated as the mean of the sex-specific means).
+
+
+``` r
+penguins |> 
+    drop_na(body_mass_g) |> 
+    group_by(species, sex) |>
+    summarise(
+        body_mass_kg_mean = mean(body_mass_g / 1000),
+        body_mass_kg_min = min(body_mass_g / 1000),
+        body_mass_kg_max = max(body_mass_g / 1000)
+    ) |> 
+    mutate(
+        species_mean = mean(body_mass_kg_mean)
+    )
+```
+
+``` output
+`summarise()` has grouped output by 'species'. You can override using the
+`.groups` argument.
+```
+
+``` output
+# A tibble: 8 × 6
+# Groups:   species [3]
+  species sex   body_mass_kg_mean body_mass_kg_min body_mass_kg_max species_mean
+  <fct>   <fct>             <dbl>            <dbl>            <dbl>        <dbl>
+1 Adelie  fema…              3.37             2.85             3.9          3.65
+2 Adelie  male               4.04             3.32             4.78         3.65
+3 Adelie  <NA>               3.54             2.98             4.25         3.65
+4 Chinst… fema…              3.53             2.7              4.15         3.73
+5 Chinst… male               3.94             3.25             4.8          3.73
+6 Gentoo  fema…              4.68             3.95             5.2          4.92
+7 Gentoo  male               5.48             4.75             6.3          4.92
+8 Gentoo  <NA>               4.59             4.1              4.88         4.92
+```
+
+Notice that now, the same value is in the species_mean column for all the rows of each species.
+This means our calculation worked!
+So, in the same data set, we have everything we need to calculate the relative difference between the species mean of body mass and each of the sexes.
+
+
+
+``` r
+penguins |> 
+    drop_na(body_mass_g) |> 
+    group_by(species, sex) |>
+    summarise(
+        body_mass_kg_mean = mean(body_mass_g / 1000),
+        body_mass_kg_min = min(body_mass_g / 1000),
+        body_mass_kg_max = max(body_mass_g / 1000)
+    ) |> 
+    mutate(
+        species_mean = mean(body_mass_kg_mean),
+        rel_species = species_mean - body_mass_kg_mean
+    )
+```
+
+``` output
+`summarise()` has grouped output by 'species'. You can override using the
+`.groups` argument.
+```
+
+``` output
+# A tibble: 8 × 7
+# Groups:   species [3]
+  species sex   body_mass_kg_mean body_mass_kg_min body_mass_kg_max species_mean
+  <fct>   <fct>             <dbl>            <dbl>            <dbl>        <dbl>
+1 Adelie  fema…              3.37             2.85             3.9          3.65
+2 Adelie  male               4.04             3.32             4.78         3.65
+3 Adelie  <NA>               3.54             2.98             4.25         3.65
+4 Chinst… fema…              3.53             2.7              4.15         3.73
+5 Chinst… male               3.94             3.25             4.8          3.73
+6 Gentoo  fema…              4.68             3.95             5.2          4.92
+7 Gentoo  male               5.48             4.75             6.3          4.92
+8 Gentoo  <NA>               4.59             4.1              4.88         4.92
+# ℹ 1 more variable: rel_species <dbl>
+```
+
+Now we can see how much each sex's mean body mass differs from the species mean, and so how much more the male penguins usually weigh than the female ones.
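+
+The grouped `mutate()` pattern can be checked on a tiny hand-made tibble, where the answer is easy to verify by eye. This is only a sketch; `toy`, `group` and `value` are made-up names.
+
+``` r
+library(dplyr)
+
+# Hypothetical data with two groups of two rows each
+toy <- tibble(
+  group = c("a", "a", "b", "b"),
+  value = c(1, 3, 10, 20)
+)
+
+# mutate() on grouped data calculates the mean within each group
+toy_diff <- toy |>
+  group_by(group) |>
+  mutate(
+    group_mean = mean(value),
+    diff       = group_mean - value
+  ) |>
+  ungroup()
+toy_diff
+# group_mean is 2 for group "a" and 15 for group "b";
+# diff is 1, -1, 5, -5
+```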
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 7
+Calculate the difference in flipper length between the different species of penguin
+
+:::::::::::::::::::::::::::::::::::::::: solution 
+## Solution
+
+
+``` r
+penguins |> 
+    drop_na(flipper_length_mm) |> 
+    group_by(species) |> 
+    summarise(
+        flipper_mean = mean(flipper_length_mm),
+    ) |> 
+    mutate(
+        species_mean = mean(flipper_mean),
+        flipper_species_diff = species_mean - flipper_mean
+    )
+```
+
+``` output
+# A tibble: 3 × 4
+  species   flipper_mean species_mean flipper_species_diff
+  <fct>            <dbl>        <dbl>                <dbl>
+1 Adelie            190.         201.                11.0 
+2 Chinstrap         196.         201.                 5.16
+3 Gentoo            217.         201.               -16.2 
+```
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 8
+Calculate the difference in flipper length between the different species of penguin, split by the penguins' sex.
+
+:::::::::::::::::::::::::::::::::::::::: solution 
+## Solution
+
+
+``` r
+penguins |> 
+    drop_na(flipper_length_mm) |> 
+    group_by(species, sex) |> 
+    summarise(
+        flipper_mean = mean(flipper_length_mm),
+    ) |> 
+    mutate(
+        species_mean = mean(flipper_mean),
+        flipper_species_diff = species_mean - flipper_mean
+    )
+```
+
+``` output
+`summarise()` has grouped output by 'species'. You can override using the
+`.groups` argument.
+```
+
+``` output
+# A tibble: 8 × 5
+# Groups:   species [3]
+  species   sex    flipper_mean species_mean flipper_species_diff
+  <fct>     <fct>         <dbl>        <dbl>                <dbl>
+1 Adelie    female         188.         189.                0.807
+2 Adelie    male           192.         189.               -3.81 
+3 Adelie    <NA>           186.         189.                3.00 
+4 Chinstrap female         192.         196.                4.09 
+5 Chinstrap male           200.         196.               -4.09 
+6 Gentoo    female         213.         217.                3.96 
+7 Gentoo    male           222.         217.               -4.88 
+8 Gentoo    <NA>           216.         217.                0.916
+```
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+
diff --git a/09-data-complex-pipelines.md b/09-data-complex-pipelines.md
new file mode 100644
index 0000000..19a2d48
--- /dev/null
+++ b/09-data-complex-pipelines.md
@@ -0,0 +1,824 @@
+---
+title: "Complex data pipelines"
+teaching: 60
+exercises: 7
+---
+
+
+
+:::::::::::::::::::::::::::::::::::::: questions 
+
+- How can I combine everything I've learned so far?
+- How can I get my data into a wider format?
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::: objectives
+
+- To be able to combine the different functions we have covered in tandem to create seamless chains of data handling
+- Creating custom, complex data summaries
+- Creating complex plots with grids of subplots
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+## Motivation
+This session is going to be a little different from the others. 
+We will be working through more challenges and exploring different ways of combining the things we have learned these days.
+
+Before the break, and here and there throughout the sessions, we have been combining the things we have learned. 
+The tidyverse functions really become powerful when we start using them together, as a whole.
+In this last session, we will apply what we have learned in combination, in ways that uncover some of the cool things we can get done.
+
+Let's say we want to summarise _all_ the measurement variables, i.e. all the columns containing "_". 
+We've learned about summaries and grouped summaries. 
+Can you think of a way we can do that using the things we've learned?
+
+
+``` r
+penguins |> 
+  pivot_longer(contains("_"))
+```
+
+``` output
+# A tibble: 1,376 × 6
+   species island    sex     year name               value
+   <fct>   <fct>     <fct>  <int> <chr>              <dbl>
+ 1 Adelie  Torgersen male    2007 bill_length_mm      39.1
+ 2 Adelie  Torgersen male    2007 bill_depth_mm       18.7
+ 3 Adelie  Torgersen male    2007 flipper_length_mm  181  
+ 4 Adelie  Torgersen male    2007 body_mass_g       3750  
+ 5 Adelie  Torgersen female  2007 bill_length_mm      39.5
+ 6 Adelie  Torgersen female  2007 bill_depth_mm       17.4
+ 7 Adelie  Torgersen female  2007 flipper_length_mm  186  
+ 8 Adelie  Torgersen female  2007 body_mass_g       3800  
+ 9 Adelie  Torgersen female  2007 bill_length_mm      40.3
+10 Adelie  Torgersen female  2007 bill_depth_mm       18  
+# ℹ 1,366 more rows
+```
+
+We've done this before, so why is it a clue now? Now that we have learned grouping and summarising, 
+we can also group by the new `name` column and get a summary for each original column as its own row!
+
+
+``` r
+penguins |> 
+  pivot_longer(contains("_")) |> 
+  group_by(name) |> 
+  summarise(mean = mean(value, na.rm = TRUE))
+```
+
+``` output
+# A tibble: 4 × 2
+  name                mean
+  <chr>              <dbl>
+1 bill_depth_mm       17.2
+2 bill_length_mm      43.9
+3 body_mass_g       4202. 
+4 flipper_length_mm  201. 
+```
+
+Now we are talking! Now we have the mean of each of our observational columns! Let's add other common summary statistics.
+
+
+``` r
+penguins |> 
+  pivot_longer(contains("_")) |> 
+  group_by(name) |> 
+  summarise(
+    mean = mean(value, na.rm = TRUE),
+    sd = sd(value, na.rm = TRUE),
+    min = min(value, na.rm = TRUE),
+    max = max(value, na.rm = TRUE)
+  )
+```
+
+``` output
+# A tibble: 4 × 5
+  name                mean     sd    min    max
+  <chr>              <dbl>  <dbl>  <dbl>  <dbl>
+1 bill_depth_mm       17.2   1.97   13.1   21.5
+2 bill_length_mm      43.9   5.46   32.1   59.6
+3 body_mass_g       4202.  802.   2700   6300  
+4 flipper_length_mm  201.   14.1   172    231  
+```
+
+That's a pretty neat table! The repetition of `na.rm = TRUE` in every function is a little tedious, though. Let us add a `drop_na()` step after the pivot to remove the rows with `NA` in the value column.
+
+
+``` r
+penguins |> 
+  pivot_longer(contains("_")) |>
+  drop_na(value) |> 
+  group_by(name) |> 
+  summarise(
+    mean = mean(value),
+    sd = sd(value),
+    min = min(value),
+    max = max(value)
+  )
+```
+
+``` output
+# A tibble: 4 × 5
+  name                mean     sd    min    max
+  <chr>              <dbl>  <dbl>  <dbl>  <dbl>
+1 bill_depth_mm       17.2   1.97   13.1   21.5
+2 bill_length_mm      43.9   5.46   32.1   59.6
+3 body_mass_g       4202.  802.   2700   6300  
+4 flipper_length_mm  201.   14.1   172    231  
+```
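+
+As a side note, `pivot_longer()` also has a `values_drop_na` argument that drops the `NA` rows during the pivot itself. 
+The sketch below assumes the `dplyr`, `tidyr` and `palmerpenguins` packages are installed, and the object name `penguins_long_summary` is only made up for this example. 
+It should give the same table as the `drop_na(value)` version above.
+
+``` r
+library(dplyr)
+library(tidyr)
+library(palmerpenguins)
+
+# values_drop_na = TRUE drops rows whose pivoted value is NA,
+# doing the job of a separate drop_na(value) step
+penguins_long_summary <- penguins |> 
+  pivot_longer(contains("_"), values_drop_na = TRUE) |> 
+  group_by(name) |> 
+  summarise(
+    mean = mean(value),
+    sd   = sd(value),
+    min  = min(value),
+    max  = max(value)
+  )
+
+penguins_long_summary
+```
+
+Which version you use is mostly a matter of taste: a separate `drop_na()` step is easier to spot when reading the pipeline, while the argument saves a line.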
+
+Now we have a pretty decent summary table of our data. 
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 1
+In our code making the summary table, add another summary column for the number of records, giving it the name `n`.
+
+:::::::::::::::::::::::::::::::::::::::: hint
+Try the `n()` function.
+:::::::::::::::::::::::::::::::::::::::: 
+
+:::::::::::::::::::::::::::::::::::::::: solution 
+## Solution
+
+
+``` r
+penguins |> 
+  pivot_longer(contains("_")) |> 
+  drop_na(value) |> 
+  group_by(name) |> 
+  summarise(
+    mean = mean(value),
+    sd   = sd(value),
+    min  = min(value),
+    max  = max(value),
+    n = n()
+  )
+```
+
+``` output
+# A tibble: 4 × 6
+  name                mean     sd    min    max     n
+  <chr>              <dbl>  <dbl>  <dbl>  <dbl> <int>
+1 bill_depth_mm       17.2   1.97   13.1   21.5   342
+2 bill_length_mm      43.9   5.46   32.1   59.6   342
+3 body_mass_g       4202.  802.   2700   6300     342
+4 flipper_length_mm  201.   14.1   172    231     342
+```
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+::::::::::::::::::::::::::::::::::::: challenge 
+
+## Challenge 2
+Try grouping by more variables, like species and island. Is the output what you would expect it to be?
+
+:::::::::::::::::::::::::::::::::::::::: solution 
+## Solution
+
+
+``` r
+penguins |> 
+  pivot_longer(contains("_")) |> 
+  drop_na(value) |> 
+  group_by(name, species, island) |> 
+  summarise(
+    mean = mean(value),
+    sd   = sd(value),
+    min  = min(value),
+    max  = max(value),
+    n = n()
+  )
+```
+
+``` output
+`summarise()` has grouped output by 'name', 'species'. You can override using
+the `.groups` argument.
+```
+
+``` output
+# A tibble: 20 × 8
+# Groups:   name, species [12]
+   name              species   island      mean      sd    min    max     n
+   <chr>             <fct>     <fct>      <dbl>   <dbl>  <dbl>  <dbl> <int>
+ 1 bill_depth_mm     Adelie    Biscoe      18.4   1.19    16     21.1    44
+ 2 bill_depth_mm     Adelie    Dream       18.3   1.13    15.5   21.2    56
+ 3 bill_depth_mm     Adelie    Torgersen   18.4   1.34    15.9   21.5    51
+ 4 bill_depth_mm     Chinstrap Dream       18.4   1.14    16.4   20.8    68
+ 5 bill_depth_mm     Gentoo    Biscoe      15.0   0.981   13.1   17.3   123
+ 6 bill_length_mm    Adelie    Biscoe      39.0   2.48    34.5   45.6    44
+ 7 bill_length_mm    Adelie    Dream       38.5   2.47    32.1   44.1    56
+ 8 bill_length_mm    Adelie    Torgersen   39.0   3.03    33.5   46      51
+ 9 bill_length_mm    Chinstrap Dream       48.8   3.34    40.9   58      68
+10 bill_length_mm    Gentoo    Biscoe      47.5   3.08    40.9   59.6   123
+11 body_mass_g       Adelie    Biscoe    3710.  488.    2850   4775      44
+12 body_mass_g       Adelie    Dream     3688.  455.    2900   4650      56
+13 body_mass_g       Adelie    Torgersen 3706.  445.    2900   4700      51
+14 body_mass_g       Chinstrap Dream     3733.  384.    2700   4800      68
+15 body_mass_g       Gentoo    Biscoe    5076.  504.    3950   6300     123
+16 flipper_length_mm Adelie    Biscoe     189.    6.73   172    203      44
+17 flipper_length_mm Adelie    Dream      190.    6.59   178    208      56
+18 flipper_length_mm Adelie    Torgersen  191.    6.23   176    210      51
+19 flipper_length_mm Chinstrap Dream      196.    7.13   178    212      68
+20 flipper_length_mm Gentoo    Biscoe     217.    6.48   203    231     123
+```
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 3
+Create another summary table, with the same descriptive statistics (mean, sd, min, max and n), 
+but for all numerical variables, grouped only by the variable names.
+
+:::::::::::::::::::::::::::::::::::::::: solution 
+## Solution
+
+
+``` r
+penguins |> 
+  pivot_longer(where(is.numeric)) |> 
+  drop_na(value) |> 
+  group_by(name) |> 
+  summarise(
+    mean = mean(value),
+    sd   = sd(value),
+    min  = min(value),
+    max  = max(value),
+    n = n()
+  )
+```
+
+``` output
+# A tibble: 5 × 6
+  name                mean      sd    min    max     n
+  <chr>              <dbl>   <dbl>  <dbl>  <dbl> <int>
+1 bill_depth_mm       17.2   1.97    13.1   21.5   342
+2 bill_length_mm      43.9   5.46    32.1   59.6   342
+3 body_mass_g       4202.  802.    2700   6300     342
+4 flipper_length_mm  201.   14.1    172    231     342
+5 year              2008.    0.818 2007   2009     344
+```
+
+::::::::::::::::::::::::::::::::::::::::
+::::::::::::::::::::::::::::::::::::: 
+
+
+## Plotting summaries
+
+Now that we have the summaries, we can use them in plots too! But typing or copying the same code over and over is tedious. 
+So let us save the summary in its own object, and keep using that.
+
+
+``` r
+penguins_sum <- penguins |> 
+  pivot_longer(contains("_")) |> 
+  drop_na(value) |> 
+  group_by(name, species, island) |> 
+  summarise(
+    mean = mean(value),
+    sd   = sd(value),
+    min  = min(value),
+    max  = max(value),
+    n = n()
+  ) |> 
+  ungroup()
+```
+
+``` output
+`summarise()` has grouped output by 'name', 'species'. You can override using
+the `.groups` argument.
+```
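+
+The message printed by `summarise()` mentions a `.groups` argument. 
+As a minimal sketch, assuming the `dplyr`, `tidyr` and `palmerpenguins` packages are installed, setting `.groups = "drop"` returns an ungrouped table directly, so neither the message nor the `ungroup()` step is needed. 
+The object name `penguins_sum_alt` is only made up for this example.
+
+``` r
+library(dplyr)
+library(tidyr)
+library(palmerpenguins)
+
+penguins_sum_alt <- penguins |> 
+  pivot_longer(contains("_")) |> 
+  drop_na(value) |> 
+  group_by(name, species, island) |> 
+  summarise(
+    mean = mean(value),
+    sd   = sd(value),
+    n    = n(),
+    .groups = "drop"  # return an ungrouped tibble, without the message
+  )
+
+# Check whether the result still carries any grouping
+is_grouped_df(penguins_sum_alt)
+```
+
+Both approaches end with an ungrouped table. Using `ungroup()` is more explicit, while `.groups = "drop"` keeps everything inside the `summarise()` call.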
+
+We can, for instance, make a point plot with the values from the summary statistics.
+
+
+``` r
+penguins_sum |> 
+  ggplot(aes(x = island,
+             y = mean,
+             colour = species)) +
+  geom_point() +
+  facet_wrap(~ name, scales = "free_y")
+```
+
+<img src="fig/09-data-complex-pipelines-rendered-unnamed-chunk-10-1.png" style="display: block; margin: auto;" />
+
+Oh, but the points are stacking on top of each other and are hard to see.
+
+
+
+``` r
+penguins_sum |> 
+  ggplot(aes(x = island,
+             y = mean,
+             colour = species)) +
+  geom_point(position = position_dodge(width = 1)) +
+  facet_wrap(~ name, scales = "free_y")
+```
+
+<img src="fig/09-data-complex-pipelines-rendered-unnamed-chunk-11-1.png" style="display: block; margin: auto;" />
+
+That is starting to look like something nice.
+What `position_dodge()` does is move the dots a little to each side, so they are not directly on top of each other, but you can still clearly see them and which island they belong to.
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 4
+Create a point plot based on the penguins summary data, where the standard deviations are on the y axis and the islands are on the x axis, coloured by species. 
+Make sure to dodge the points for easier comparisons. 
+Create subplots for the different observational types.
+
+:::::::::::::::::::::::::::::::::::::::: hint
+Use facet_wrap()
+:::::::::::::::::::::::::::::::::::::::: 
+
+:::::::::::::::::::::::::::::::::::::::: solution 
+## Solution
+
+
+``` r
+penguins_sum |> 
+  ggplot(aes(x = island, 
+             y = sd,
+             colour = species)) +
+  geom_point(position = position_dodge(width = 1)) +
+  facet_wrap(~ name)
+```
+
+<img src="fig/09-data-complex-pipelines-rendered-unnamed-chunk-12-1.png" style="display: block; margin: auto;" />
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 5
+Change it so that species is both on the x-axis and the colour for the point plot, and remove the dodge. 
+What argument do you need to add to `facet_wrap()` to make the y-axis scale vary freely between the subplots? 
+Why is this plot misleading?
+
+:::::::::::::::::::::::::::::::::::::::: solution 
+## Solution
+
+
+``` r
+penguins_sum |> 
+  ggplot(aes(x = species, 
+             y = sd,
+             colour = species)) +
+  geom_point() +
+  facet_wrap(~ name, scales = "free_y")
+```
+
+<img src="fig/09-data-complex-pipelines-rendered-unnamed-chunk-13-1.png" style="display: block; margin: auto;" />
+
+The last plot is misleading because our summary data are grouped by both species and island. 
+Ignoring the island in the plot means that the values from the different islands cannot be distinguished from each other.
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+
+A common thing to add to this type of plot is confidence intervals, or error bars. These are usually calculated from the standard error, which we don't have, but for the sake of showing how to add error bars, we will use the standard deviation instead.
+
+To do that, we add the `geom_errorbar()` function to the ggplot call. `geom_errorbar()` is a little different from the other geoms we have seen: it takes very specific arguments, namely the minimum and maximum values the error bars should span.
+In our case, that is the mean - sd for the minimum, and the mean + sd for the maximum.
+
+
+``` r
+penguins_sum |> 
+  ggplot(aes(x = island,
+             y = mean,
+             colour = species)) +
+  geom_point(position = position_dodge(width = 1)) +
+  geom_errorbar(aes(
+    ymin = mean - sd,
+    ymax = mean + sd
+  )) +
+  facet_wrap(~ name, scales = "free_y")
+```
+
+<img src="fig/09-data-complex-pipelines-rendered-unnamed-chunk-14-1.png" style="display: block; margin: auto;" />
+
+Right, so now we have error bars, but they don't connect to the dots!
+Perhaps we can dodge those too?
+
+
+``` r
+penguins_sum |> 
+  ggplot(aes(x = island,
+             y = mean,
+             colour = species)) +
+  geom_point(position = position_dodge(width = 1)) +
+  geom_errorbar(aes(
+    ymin = mean - sd,
+    ymax = mean + sd
+  ),
+  position = position_dodge(width = 1)) +
+  facet_wrap(~ name, scales = "free_y")
+```
+
+<img src="fig/09-data-complex-pipelines-rendered-unnamed-chunk-15-1.png" style="display: block; margin: auto;" />
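+
+The error bars above use the standard deviation as a stand-in for the standard error. 
+Since the summary table holds both `sd` and `n`, one way to get the standard error of the mean is `sd / sqrt(n)`. 
+The sketch below assumes the `dplyr`, `tidyr`, `ggplot2` and `palmerpenguins` packages are installed, and the names `penguins_se` and `se_plot` are only made up for this example.
+
+``` r
+library(dplyr)
+library(tidyr)
+library(ggplot2)
+library(palmerpenguins)
+
+# Rebuild the species-by-island summary, then add a standard error column
+penguins_se <- penguins |> 
+  pivot_longer(contains("_")) |> 
+  drop_na(value) |> 
+  group_by(name, species, island) |> 
+  summarise(
+    mean = mean(value),
+    sd   = sd(value),
+    n    = n(),
+    .groups = "drop"  # return an ungrouped table
+  ) |> 
+  mutate(se = sd / sqrt(n))  # standard error of the mean
+
+# Same plot as before, but the error bars now span mean - se to mean + se
+se_plot <- penguins_se |> 
+  ggplot(aes(x = island,
+             y = mean,
+             colour = species)) +
+  geom_point(position = position_dodge(width = 1)) +
+  geom_errorbar(aes(
+    ymin = mean - se,
+    ymax = mean + se
+  ),
+  position = position_dodge(width = 1),
+  width = 0.2) +
+  facet_wrap(~ name, scales = "free_y")
+
+se_plot
+```
+
+Because the standard error shrinks as the number of records grows, these error bars will be much shorter than the ones based on the standard deviation.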
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 6
+The horizontal lines at the ends of the error bars are a little too wide.
+Try adjusting them by setting the `width` argument to 0.2.
+
+:::::::::::::::::::::::::::::::::::::::: solution 
+## Solution
+
+
+``` r
+penguins_sum |> 
+  ggplot(aes(x = island,
+             y = mean,
+             colour = species)) +
+  geom_point(position = position_dodge(width = 1)) +
+  geom_errorbar(aes(
+    ymin = mean - sd,
+    ymax = mean + sd
+  ),
+  position = position_dodge(width = 1),
+  width = .2) +
+  facet_wrap(~ name, scales = "free_y")
+```
+
+<img src="fig/09-data-complex-pipelines-rendered-unnamed-chunk-16-1.png" style="display: block; margin: auto;" />
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+## Facetting as a grid
+
+But we can get even more creative! 
+Let's recreate our summary table, and add year as a grouping, so we can get an idea of how the measurements change over time.
+
+
+``` r
+penguins_sum <- penguins |> 
+  pivot_longer(contains("_")) |> 
+  drop_na(value) |> 
+  group_by(name, species, island, year) |> 
+  summarise(
+    mean = mean(value),
+    sd   = sd(value),
+    min  = min(value),
+    max  = max(value),
+    n = n()
+  ) |> 
+  ungroup()
+```
+
+``` output
+`summarise()` has grouped output by 'name', 'species', 'island'. You can
+override using the `.groups` argument.
+```
+
+``` r
+penguins_sum
+```
+
+``` output
+# A tibble: 60 × 9
+   name          species   island     year  mean    sd   min   max     n
+   <chr>         <fct>     <fct>     <int> <dbl> <dbl> <dbl> <dbl> <int>
+ 1 bill_depth_mm Adelie    Biscoe     2007  18.4 0.585  17.2  19.2    10
+ 2 bill_depth_mm Adelie    Biscoe     2008  18.1 1.20   16.2  21.1    18
+ 3 bill_depth_mm Adelie    Biscoe     2009  18.6 1.44   16    20.7    16
+ 4 bill_depth_mm Adelie    Dream      2007  18.7 1.21   16.7  21.2    20
+ 5 bill_depth_mm Adelie    Dream      2008  18.3 0.993  16.1  20.3    16
+ 6 bill_depth_mm Adelie    Dream      2009  17.7 0.994  15.5  20.1    20
+ 7 bill_depth_mm Adelie    Torgersen  2007  19.0 1.47   17.1  21.5    19
+ 8 bill_depth_mm Adelie    Torgersen  2008  18.1 1.11   16.1  19.4    16
+ 9 bill_depth_mm Adelie    Torgersen  2009  18.0 1.20   15.9  20.5    16
+10 bill_depth_mm Chinstrap Dream      2007  18.5 1.00   16.6  20.3    26
+# ℹ 50 more rows
+```
+
+And then let us re-create our last plot with this new summary table.
+
+
+``` r
+penguins_sum |> 
+  ggplot(aes(x = island,
+             y = mean,
+             colour = species)) +
+  geom_point(position = position_dodge(width = 1)) +
+  geom_errorbar(aes(
+    ymin = mean - sd,
+    ymax = mean + sd
+  ),
+  width = 0.3,
+  position = position_dodge(width = 1)) +
+  facet_wrap(~ name, scales = "free_y")
+```
+
+<img src="fig/09-data-complex-pipelines-rendered-unnamed-chunk-18-1.png" style="display: block; margin: auto;" />
+
+What is happening here?
+Because we've now added year to the groups in the summary, we have multiple means per species and island, for each of the measurement years.
+So we need to add something to the plot so we can tease those apart.
+We have added two variables to the facet before. 
+Remember how we did that?
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 7
+The plot now mixes the means from all the measurement years.
+Try adding `year` as a second variable in `facet_wrap()`, so that each combination of measurement and year gets its own subplot.
+
+:::::::::::::::::::::::::::::::::::::::: solution 
+## Solution
+
+
+``` r
+penguins_sum |> 
+  ggplot(aes(x = island,
+             y = mean,
+             colour = species)) +
+  geom_point(position = position_dodge(width = 1)) +
+  geom_errorbar(aes(
+    ymin = mean - sd,
+    ymax = mean + sd
+  ),
+  position = position_dodge(width = 1),
+  width = .2) +
+  facet_wrap(~ name + year, scales = "free_y")
+```
+
+<img src="fig/09-data-complex-pipelines-rendered-unnamed-chunk-19-1.png" style="display: block; margin: auto;" />
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+OK, so now we have it all. But it's a little messy to compare over time, and what are we really looking at?
+I find it often makes more sense to plot time variables on the x-axis, and then facet over categories. 
+Let's switch that up.
+
+
+
+``` r
+penguins_sum |> 
+  ggplot(aes(x = year,
+             y = mean,
+             colour = species)) +
+  geom_point(position = position_dodge(width = 1)) +
+  geom_errorbar(aes(
+    ymin = mean - sd,
+    ymax = mean + sd
+  ),
+  position = position_dodge(width = 1),
+  width = .2) +
+  facet_wrap(~ name + island, scales = "free_y")
+```
+
+<img src="fig/09-data-complex-pipelines-rendered-unnamed-chunk-20-1.png" style="display: block; margin: auto;" />
+
+OK, so we got what we asked for, and the year part makes more sense, but it's a very "busy" plot.
+It's really quite hard to compare everything from Biscoe, or all the Adelies, to each other.
+How can we make it easier?
+
+We will switch `facet_wrap()` to `facet_grid()` which creates a grid of subplots. 
+The formula for the grid uses both sides of the `~` sign. 
+And you can think of it like `rows ~ columns`.
+So here we are saying we want the `island` values as rows, and `name` values as columns in the plot grid.
+
+
+``` r
+penguins_sum |> 
+  ggplot(aes(x = year,
+             y = mean,
+             colour = species)) +
+  geom_point(position = position_dodge(width = 1)) +
+  geom_errorbar(aes(
+    ymin = mean - sd,
+    ymax = mean + sd
+  ),
+  position = position_dodge(width = 1),
+  width = .2) +
+  facet_grid(island ~ name)
+```
+
+<img src="fig/09-data-complex-pipelines-rendered-unnamed-chunk-21-1.png" style="display: block; margin: auto;" />
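+
+If the `rows ~ columns` formula is hard to remember, `facet_grid()` also accepts the rows and columns as named arguments wrapped in `vars()`. 
+The sketch below assumes the `dplyr`, `tidyr`, `ggplot2` and `palmerpenguins` packages are installed. 
+The names `penguins_year` and `grid_plot` are only made up for this example, and the error bars are left out to keep it short.
+
+``` r
+library(dplyr)
+library(tidyr)
+library(ggplot2)
+library(palmerpenguins)
+
+# Mean of each measurement per species, island and year
+penguins_year <- penguins |> 
+  pivot_longer(contains("_")) |> 
+  drop_na(value) |> 
+  group_by(name, species, island, year) |> 
+  summarise(mean = mean(value), .groups = "drop")
+
+# Same layout as facet_grid(island ~ name), spelled out with named arguments
+grid_plot <- penguins_year |> 
+  ggplot(aes(x = year,
+             y = mean,
+             colour = species)) +
+  geom_point(position = position_dodge(width = 1)) +
+  facet_grid(rows = vars(island), cols = vars(name))
+
+grid_plot
+```
+
+Both ways of writing the facet should give the same grid, so use whichever you find easier to read.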
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 8
+It is hard to see the different metrics in the subplots, because they are all on such different scales. 
+Try setting the y-axis scales free to allow differences between the subplots. 
+Was this the effect you expected?
+
+:::::::::::::::::::::::::::::::::::::::: solution 
+## Solution
+
+
+``` r
+penguins_sum |> 
+  ggplot(aes(x = year,
+             y = mean,
+             colour = species)) +
+  geom_point(position = position_dodge(width = 1)) +
+  geom_errorbar(aes(
+    ymin = mean - sd,
+    ymax = mean + sd
+  ),
+  position = position_dodge(width = 1),
+  width = .2) +
+  facet_grid(island ~ name, scales = "free_y")
+```
+
+<img src="fig/09-data-complex-pipelines-rendered-unnamed-chunk-22-1.png" style="display: block; margin: auto;" />
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 9
+Try switching up what is plotted as rows and columns in the facet. Does this help the plot?
+
+:::::::::::::::::::::::::::::::::::::::: solution 
+## Solution
+
+``` r
+penguins_sum |> 
+  ggplot(aes(x = year,
+             y = mean,
+             colour = species)) +
+  geom_point(position = position_dodge(width = 1)) +
+  geom_errorbar(aes(
+    ymin = mean - sd,
+    ymax = mean + sd
+  ),
+  position = position_dodge(width = 1),
+  width = .2) +
+  facet_grid(name ~ island, scales = "free_y")
+```
+
+<img src="fig/09-data-complex-pipelines-rendered-unnamed-chunk-23-1.png" style="display: block; margin: auto;" />
+
+`facet_grid()` is less flexible than `facet_wrap()`, as it always forces the y-axis to stay the same within each row, and the x-axis within each column.
+So while setting scales to free will help a little, it will only do so within each row and column, not for each subplot. 
+When the results do not look the way you like, swapping what are rows and columns in the grid can often create better results. 
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+## Altering ggplot colours and theme
+
+We now have a plot that is quite nicely summarising the data we have.
+But we want to customise it more. 
+While the defaults in ggplot are fine, we usually want to improve on the default look. 
+
+Before we do that, let's save the plot as an object, so we don't have to keep track of the part of the code we are not changing.
+Saving a ggplot object is just like saving a dataset object.
+We have to assign it a name at the beginning.
+
+
+``` r
+penguins_plot <- penguins_sum |> 
+  ggplot(aes(x = year,
+             y = mean,
+             colour = species)) +
+  geom_point(position = position_dodge(width = 1)) +
+  geom_errorbar(aes(
+    ymin = mean - sd,
+    ymax = mean + sd
+  ),
+  position = position_dodge(width = 1),
+  width = .2) +
+  facet_grid(name ~ island, scales = "free_y")
+```
+
+Did you notice that it did not make a new plot?
+Just like a data set won't show in the console when you assign it, a plot won't show in the plot pane when you assign it.
+
+To show the plot in the plot pane, write its name in the console and press enter.
+
+
+``` r
+penguins_plot
+```
+
+<img src="fig/09-data-complex-pipelines-rendered-unnamed-chunk-25-1.png" style="display: block; margin: auto;" />
+
+From there, we can keep adding more ggplot geoms or facets etc.
+In this first version, we will add a "theme". A theme is a change of the overall "look" of the plot.
+
+
+``` r
+penguins_plot +
+  theme_classic()
+```
+
+<img src="fig/09-data-complex-pipelines-rendered-unnamed-chunk-26-1.png" style="display: block; margin: auto;" />
+
+The classic theme is preferred by many journals, but for a facet grid it's not super nice, since we lose the grid information.
+
+
+``` r
+penguins_plot +
+  theme_light()
+```
+
+<img src="fig/09-data-complex-pipelines-rendered-unnamed-chunk-27-1.png" style="display: block; margin: auto;" />
+
+Theme light could be a nice option, but the white text on light grey makes the panel text hard to read.
+
+
+``` r
+penguins_plot +
+  theme_dark()
+```
+
+<img src="fig/09-data-complex-pipelines-rendered-unnamed-chunk-28-1.png" style="display: block; margin: auto;" />
+
+Theme dark could theoretically be really nice, but then we'll need other colours for the points and error bars!
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 10
+Try different themes and find one you like. 
+
+:::::::::::::::::::::::::::::::::::::::: hint
+You can type "theme" and press the tab button, to look at all the possibilities.
+:::::::::::::::::::::::::::::::::::::::: 
+
+:::::::::::::::::::::::::::::::::::::::: solution 
+## Solution
+What themes did you find that you liked?
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+We are going to have a go at `theme_linedraw` which has a simple but clear design.
+
+
+``` r
+penguins_plot +
+  theme_linedraw()
+```
+
+<img src="fig/09-data-complex-pipelines-rendered-unnamed-chunk-29-1.png" style="display: block; margin: auto;" />
+
+Now that we have a theme, we can have a look at changing the colours of the points and error bars. 
+We do this through something called "scales".
+
+
+``` r
+penguins_plot +
+  theme_linedraw() +
+  scale_colour_brewer(palette = "Dark2")
+```
+
+<img src="fig/09-data-complex-pipelines-rendered-unnamed-chunk-30-1.png" style="display: block; margin: auto;" />
+
+So here, we are changing the colour aesthetic, using a "brewer" palette "Dark2".
+What is a brewer palette?
+The brewer palettes are a curated library of colour palettes to choose from in ggplot.
+You can have a peek at all possible brewer palettes by typing
+
+
+``` r
+RColorBrewer::display.brewer.all()
+```
+
+<img src="fig/09-data-complex-pipelines-rendered-unnamed-chunk-31-1.png" style="display: block; margin: auto;" />
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 11
+Try another brewer palette by replacing the palette name with another in the brewer list of palettes. 
+
+:::::::::::::::::::::::::::::::::::::::: solution 
+
+## Solution
+
+``` r
+penguins_plot +
+  theme_linedraw() +
+  scale_colour_brewer(palette = "Accent")
+```
+
+<img src="fig/09-data-complex-pipelines-rendered-unnamed-chunk-32-1.png" style="display: block; margin: auto;" />
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 12
+Apply the dark theme instead, and a pastel colour palette.
+
+:::::::::::::::::::::::::::::::::::::::: solution 
+## Solution
+
+``` r
+penguins_plot +
+  theme_dark() +
+  scale_colour_brewer(palette = "Pastel2")
+```
+
+<img src="fig/09-data-complex-pipelines-rendered-unnamed-chunk-33-1.png" style="display: block; margin: auto;" />
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
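+
+Scales are not limited to ready-made palettes. 
+As a minimal sketch, assuming the `dplyr`, `tidyr`, `ggplot2` and `palmerpenguins` packages are installed, `scale_colour_manual()` lets you pick one colour per species yourself. 
+The three colours and the names `penguins_plot_manual` and `species_colours` are only choices made for this example.
+
+``` r
+library(dplyr)
+library(tidyr)
+library(ggplot2)
+library(palmerpenguins)
+
+# Rebuild the plot with means per measurement, species, island and year
+penguins_plot_manual <- penguins |> 
+  pivot_longer(contains("_")) |> 
+  drop_na(value) |> 
+  group_by(name, species, island, year) |> 
+  summarise(mean = mean(value), .groups = "drop") |> 
+  ggplot(aes(x = year,
+             y = mean,
+             colour = species)) +
+  geom_point(position = position_dodge(width = 1)) +
+  facet_grid(name ~ island, scales = "free_y")
+
+# A named vector links each species to a colour of our choosing
+species_colours <- c(
+  Adelie    = "darkorange",
+  Chinstrap = "purple",
+  Gentoo    = "cyan4"
+)
+
+penguins_plot_manual +
+  theme_linedraw() +
+  scale_colour_manual(values = species_colours)
+```
+
+Naming the values in the vector makes sure each colour goes to the intended species, regardless of the order they are written in.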
+
+Amazing! 
+We have now adapted our plot to look nicer and more to our liking.
+There are plenty of packages out there with specialised themes and colour palettes to choose from. 
+Harry Potter colours, Wes Anderson colours, Ghibli movie colours. You can find almost anything you like! 
+
+## Wrap-up
+
+It's the end of day two, and we are all super tired. 
+We've been through so much material, and learned so many things.
+We hope you now have the tools in your belt to start working more confidently in the tidyverse with your data, and that you can get to where you need from here.
+
diff --git a/10-data-manipulation-across.md b/10-data-manipulation-across.md
new file mode 100644
index 0000000..427f2ca
--- /dev/null
+++ b/10-data-manipulation-across.md
@@ -0,0 +1,455 @@
+---
+title: "Data manipulation across columns"
+teaching: 45
+exercises: 6
+---
+
+:::::::::::::::::::::::::::::::::::::: questions 
+
+- How can I calculate the mean of several columns for every row of data?
+- How can I apply the same function across several related columns?
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::: objectives
+
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+
+
+## Motivation
+
+We have covered many topics so far, and changing (or mutating) variables has been a key concept. 
+The need to create new columns of data, often based on information in other columns of data, is a type of operation that we need very often.
+But sometimes, you also need to calculate something _per row_ for several columns. 
+For instance, you want the sum of all columns in a certain collection, or the mean of them, how can we do that?
+
+One way is to write it out entirely.
+Let us just pretend there is a good reason to get the sum of bill length and bill depth. 
+Let us also make a subsetted sample with just the bill measurements so we can easily see what we are doing.
+We can do that in the following way.
+
+
+``` r
+penguins_s <- penguins |>
+    select(species, starts_with("bill"))
+
+penguins_s |> 
+  mutate(
+    bill_sum = bill_depth_mm + bill_length_mm
+    )
+```
+
+``` output
+# A tibble: 344 × 4
+   species bill_length_mm bill_depth_mm bill_sum
+   <fct>            <dbl>         <dbl>    <dbl>
+ 1 Adelie            39.1          18.7     57.8
+ 2 Adelie            39.5          17.4     56.9
+ 3 Adelie            40.3          18       58.3
+ 4 Adelie            NA            NA       NA  
+ 5 Adelie            36.7          19.3     56  
+ 6 Adelie            39.3          20.6     59.9
+ 7 Adelie            38.9          17.8     56.7
+ 8 Adelie            39.2          19.6     58.8
+ 9 Adelie            34.1          18.1     52.2
+10 Adelie            42            20.2     62.2
+# ℹ 334 more rows
+```
+
+We've seen similar types of operations before.
+But what if you want to sum 20 columns? You would need to type out all 20 column names!
+Again, tedious. 
+We have a special type of operations we can do to get that easily. 
+We will use the function `sum()` together with `c_across()` to calculate the sum of several variables in the pipeline.
+
+
+``` r
+penguins_s |>
+    mutate(bill_sum = sum(c_across(starts_with("bill"))))
+```
+
+``` output
+# A tibble: 344 × 4
+   species bill_length_mm bill_depth_mm bill_sum
+   <fct>            <dbl>         <dbl>    <dbl>
+ 1 Adelie            39.1          18.7       NA
+ 2 Adelie            39.5          17.4       NA
+ 3 Adelie            40.3          18         NA
+ 4 Adelie            NA            NA         NA
+ 5 Adelie            36.7          19.3       NA
+ 6 Adelie            39.3          20.6       NA
+ 7 Adelie            38.9          17.8       NA
+ 8 Adelie            39.2          19.6       NA
+ 9 Adelie            34.1          18.1       NA
+10 Adelie            42            20.2       NA
+# ℹ 334 more rows
+```
+
+Hm, that is not what we expected.
+There is a reason for it, but it is not always easy to understand. 
+By default, `c_across()` gathers all the rows of all the bill columns, so `sum()` gives a _single_ value for the entire data set.
+There are some `NA`s in the data set, so it returns `NA`. 
+So how can we force it to work in a row-wise fashion?
+We can apply a function called `rowwise()`, which is a special type of `group_by()` that groups your data by each row, so each row is its own group.
+Then, `sum()` and `c_across()` will calculate the sum of the columns just for that group (i.e. row in this case).
+
+
+``` r
+penguins_s |>
+    rowwise() |>
+    mutate(bill_sum = sum(c_across(starts_with("bill"))))
+```
+
+``` output
+# A tibble: 344 × 4
+# Rowwise: 
+   species bill_length_mm bill_depth_mm bill_sum
+   <fct>            <dbl>         <dbl>    <dbl>
+ 1 Adelie            39.1          18.7     57.8
+ 2 Adelie            39.5          17.4     56.9
+ 3 Adelie            40.3          18       58.3
+ 4 Adelie            NA            NA       NA  
+ 5 Adelie            36.7          19.3     56  
+ 6 Adelie            39.3          20.6     59.9
+ 7 Adelie            38.9          17.8     56.7
+ 8 Adelie            39.2          19.6     58.8
+ 9 Adelie            34.1          18.1     52.2
+10 Adelie            42            20.2     62.2
+# ℹ 334 more rows
+```
+
+Now we can see that we get the row sum of all the bill columns for each row, and the tibble tells us it is "Rowwise".
+To stop the data set being rowwise, we can use the `ungroup()` function we learned before.
+
+
+``` r
+penguins_s |>
+    rowwise() |>
+    mutate(bill_sum = sum(c_across(starts_with("bill")))) |>
+    ungroup()
+```
+
+``` output
+# A tibble: 344 × 4
+   species bill_length_mm bill_depth_mm bill_sum
+   <fct>            <dbl>         <dbl>    <dbl>
+ 1 Adelie            39.1          18.7     57.8
+ 2 Adelie            39.5          17.4     56.9
+ 3 Adelie            40.3          18       58.3
+ 4 Adelie            NA            NA       NA  
+ 5 Adelie            36.7          19.3     56  
+ 6 Adelie            39.3          20.6     59.9
+ 7 Adelie            38.9          17.8     56.7
+ 8 Adelie            39.2          19.6     58.8
+ 9 Adelie            34.1          18.1     52.2
+10 Adelie            42            20.2     62.2
+# ℹ 334 more rows
+```
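+
+The difference between the pooled and the row-wise calculation can be easier to see on a tiny data set.
+The sketch below uses a made-up tibble called `toy` (its name, columns and values are invented for illustration and are not part of the penguins data) and adds `na.rm = TRUE` so that missing values are skipped rather than turning the result into `NA`.
+
+``` r
+library(dplyr)
+
+# A toy tibble, invented for illustration only
+toy <- tibble(
+  id     = c("a", "b", "c"),
+  bill_x = c(1, 2, NA),
+  bill_y = c(10, 20, 30)
+)
+
+# Without rowwise(): all values from both columns are pooled into one sum,
+# and that single number (63) is repeated on every row
+toy_pooled <- toy |>
+  mutate(pooled = sum(c_across(starts_with("bill")), na.rm = TRUE))
+
+# With rowwise(): one sum per row (11, 22 and 30);
+# na.rm = TRUE skips the missing value in the last row
+toy_sums <- toy |>
+  rowwise() |>
+  mutate(row_sum = sum(c_across(starts_with("bill")), na.rm = TRUE)) |>
+  ungroup()
+
+toy_pooled
+toy_sums
+```
+
+Whether skipping `NA`s is the right choice depends on your data; in the penguins examples above we kept the default, so rows with missing measurements stay `NA`.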
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 1
+Calculate the mean of all the columns with millimeter measurements, and call it `mm_mean`, for each row of data.
+
+:::::::::::::::::::::::::::::::::::::::: solution 
+## Solution 
+
+
+``` r
+penguins |> 
+  rowwise() |>
+  mutate(
+    mm_mean = mean(c_across(ends_with("mm")))
+  )
+```
+
+``` output
+# A tibble: 344 × 9
+# Rowwise: 
+   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
+ 1 Adelie  Torgersen           39.1          18.7               181        3750
+ 2 Adelie  Torgersen           39.5          17.4               186        3800
+ 3 Adelie  Torgersen           40.3          18                 195        3250
+ 4 Adelie  Torgersen           NA            NA                  NA          NA
+ 5 Adelie  Torgersen           36.7          19.3               193        3450
+ 6 Adelie  Torgersen           39.3          20.6               190        3650
+ 7 Adelie  Torgersen           38.9          17.8               181        3625
+ 8 Adelie  Torgersen           39.2          19.6               195        4675
+ 9 Adelie  Torgersen           34.1          18.1               193        3475
+10 Adelie  Torgersen           42            20.2               190        4250
+# ℹ 334 more rows
+# ℹ 3 more variables: sex <fct>, year <int>, mm_mean <dbl>
+```
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 2
+Calculate the mean of all the columns with millimeter measurements, and call it `mm_mean`, for each row of data.
+Then, group the data by species, and calculate the mean of `mm_mean` within each species and add it as a column named `mm_mean_species`.
+Ignore `NA`s in the last calculation.
+
+:::::::::::::::::::::::::::::::::::::::: solution 
+## Solution 
+
+
+``` r
+penguins |> 
+  rowwise() |>
+  mutate(
+    mm_mean = mean(c_across(ends_with("mm"))),
+  ) |>
+  group_by(species) |>
+  mutate(mm_mean_species = mean(mm_mean, na.rm = TRUE))
+```
+
+``` output
+# A tibble: 344 × 10
+# Groups:   species [3]
+   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
+ 1 Adelie  Torgersen           39.1          18.7               181        3750
+ 2 Adelie  Torgersen           39.5          17.4               186        3800
+ 3 Adelie  Torgersen           40.3          18                 195        3250
+ 4 Adelie  Torgersen           NA            NA                  NA          NA
+ 5 Adelie  Torgersen           36.7          19.3               193        3450
+ 6 Adelie  Torgersen           39.3          20.6               190        3650
+ 7 Adelie  Torgersen           38.9          17.8               181        3625
+ 8 Adelie  Torgersen           39.2          19.6               195        4675
+ 9 Adelie  Torgersen           34.1          18.1               193        3475
+10 Adelie  Torgersen           42            20.2               190        4250
+# ℹ 334 more rows
+# ℹ 4 more variables: sex <fct>, year <int>, mm_mean <dbl>,
+#   mm_mean_species <dbl>
+```
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+
+## Mutating several columns in one go
+
+So far, we've been looking at adding or summarising variables one by one, or in a pivoted fashion.
+This is of course something we do all the time, but sometimes we need to make the same change to multiple columns at once. 
+Imagine you have a data set with 20 columns and you want to scale them all to the same scale.
+Writing the same command with different columns 20 times is very tedious. 
+
+In our case, let us say we want to scale the three columns with millimeter measurements so that they have a mean of 0 and standard deviation of 1. 
+We've already used the `scale()` function once before, so we will do it again.
+
+In this simple example, we might have done it like this:
+
+
+``` r
+penguins |> 
+  mutate(
+    bill_depth_sc = scale(bill_depth_mm),
+    bill_length_sc = scale(bill_length_mm),
+    flipper_length_sc = scale(flipper_length_mm)
+)
+```
+
+``` output
+# A tibble: 344 × 11
+   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
+ 1 Adelie  Torgersen           39.1          18.7               181        3750
+ 2 Adelie  Torgersen           39.5          17.4               186        3800
+ 3 Adelie  Torgersen           40.3          18                 195        3250
+ 4 Adelie  Torgersen           NA            NA                  NA          NA
+ 5 Adelie  Torgersen           36.7          19.3               193        3450
+ 6 Adelie  Torgersen           39.3          20.6               190        3650
+ 7 Adelie  Torgersen           38.9          17.8               181        3625
+ 8 Adelie  Torgersen           39.2          19.6               195        4675
+ 9 Adelie  Torgersen           34.1          18.1               193        3475
+10 Adelie  Torgersen           42            20.2               190        4250
+# ℹ 334 more rows
+# ℹ 5 more variables: sex <fct>, year <int>, bill_depth_sc <dbl[,1]>,
+#   bill_length_sc <dbl[,1]>, flipper_length_sc <dbl[,1]>
+```
+
+It's just three columns, we can do that. 
+But let us imagine we have 20 of these; typing all that out is tedious and error prone. 
+You might forget to alter the name or to keep the same naming convention. 
+We are only human, and we easily make mistakes.
+With {dplyr}'s `across()` we can combine our knowledge of tidy-selectors and `mutate()` to create the entire transformation for these columns at once.
+
+
+``` r
+penguins |> 
+  mutate(across(.cols = ends_with("mm"), 
+                .fns = scale))
+```
+
+``` output
+# A tibble: 344 × 8
+   species island    bill_length_mm[,1] bill_depth_mm[,1] flipper_length_mm[,1]
+   <fct>   <fct>                  <dbl>             <dbl>                 <dbl>
+ 1 Adelie  Torgersen             -0.883             0.784                -1.42 
+ 2 Adelie  Torgersen             -0.810             0.126                -1.06 
+ 3 Adelie  Torgersen             -0.663             0.430                -0.421
+ 4 Adelie  Torgersen             NA                NA                    NA    
+ 5 Adelie  Torgersen             -1.32              1.09                 -0.563
+ 6 Adelie  Torgersen             -0.847             1.75                 -0.776
+ 7 Adelie  Torgersen             -0.920             0.329                -1.42 
+ 8 Adelie  Torgersen             -0.865             1.24                 -0.421
+ 9 Adelie  Torgersen             -1.80              0.480                -0.563
+10 Adelie  Torgersen             -0.352             1.54                 -0.776
+# ℹ 334 more rows
+# ℹ 3 more variables: body_mass_g <int>, sex <fct>, year <int>
+```
+
+Whoa! So fast!
+Now the three columns are scaled. 
+The `.cols` argument takes a tidy-selection of columns, and `.fns` is where you let it know which function to apply.
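+
+You may have noticed the `[,1]` in the column headers of the output.
+It shows up because `scale()` returns a one-column matrix rather than a plain vector.
+If you would rather have ordinary numeric columns, one option is to wrap the call in a small anonymous function, as in this sketch. The tibble `toy` and its columns are made up for illustration.
+
+``` r
+library(dplyr)
+
+# A toy tibble, invented for illustration only
+toy <- tibble(
+  len_mm = c(10, 20, 30),
+  wid_mm = c(1, 2, 3)
+)
+
+# as.numeric() drops the matrix structure that scale() returns,
+# so each scaled column is a plain numeric vector
+toy_sc <- toy |>
+  mutate(across(.cols = ends_with("mm"),
+                .fns = \(x) as.numeric(scale(x))))
+
+toy_sc
+```
+
+The values are the same as with plain `scale`; only the column type differs.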
+
+But oh no! The columns have been overwritten. Rather than creating new ones, we replaced the old ones.
+This might be your intention in some instances, or maybe you will just create a new data set with the scaled variables. 
+
+
+``` r
+penguins_mm_sc <- penguins |> 
+  mutate(across(.cols = ends_with("mm"),
+                .fns = scale))
+```
+
+But often, we'd like to keep the originals and add the new variants. We can do that too, within `across()`!
+
+
+``` r
+penguins |> 
+  mutate(across(.cols = ends_with("mm"),
+                .fns = scale, 
+                .names = "{.col}_sc")) |> 
+  select(contains("mm"))
+```
+
+``` output
+# A tibble: 344 × 6
+   bill_length_mm bill_depth_mm flipper_length_mm bill_length_mm_sc[,1]
+            <dbl>         <dbl>             <int>                 <dbl>
+ 1           39.1          18.7               181                -0.883
+ 2           39.5          17.4               186                -0.810
+ 3           40.3          18                 195                -0.663
+ 4           NA            NA                  NA                NA    
+ 5           36.7          19.3               193                -1.32 
+ 6           39.3          20.6               190                -0.847
+ 7           38.9          17.8               181                -0.920
+ 8           39.2          19.6               195                -0.865
+ 9           34.1          18.1               193                -1.80 
+10           42            20.2               190                -0.352
+# ℹ 334 more rows
+# ℹ 2 more variables: bill_depth_mm_sc <dbl[,1]>,
+#   flipper_length_mm_sc <dbl[,1]>
+```
+
+Now they are all there! Neat! But that `.names` argument is a little weird. What does it really mean?
+
+Internally, `across()` makes the name of the column it is currently working on available as `.col`.
+We can use this knowledge to tell the across function what to name our new columns. 
+In this case, we want to append `_sc` to the column name. 
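+
+The `.names` pattern also understands `{.fn}`, which stands for the name of the function being applied.
+This becomes useful when `.fns` is a named list of several functions.
+The sketch below assumes a made-up tibble called `toy` and uses `summarise()` instead of `mutate()`; both are illustrative choices, not part of the lesson data.
+
+``` r
+library(dplyr)
+
+# A toy tibble, invented for illustration only
+toy <- tibble(
+  len_mm = c(10, 20, 30),
+  wid_mm = c(1, 2, 3)
+)
+
+# Apply two functions to every selected column.
+# The list names ("mean", "sd") are what {.fn} is replaced with.
+toy_stats <- toy |>
+  summarise(across(.cols = ends_with("mm"),
+                   .fns = list(mean = mean, sd = sd),
+                   .names = "{.col}_{.fn}"))
+
+names(toy_stats)
+```
+
+The resulting columns are named `len_mm_mean`, `len_mm_sd`, `wid_mm_mean` and `wid_mm_sd`, one per combination of column and function.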
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 3
+Transform all the columns with an underscore in their name so they are scaled, and add the _prefix_ `sc_` to the column names.
+
+:::::::::::::::::::::::::::::::::::::::: solution
+## Solution 
+
+
+``` r
+penguins |> 
+  mutate(across(.cols = contains("_"),
+                .fns = scale, 
+                .names = "sc_{.col}"))
+```
+
+``` output
+# A tibble: 344 × 12
+   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
+ 1 Adelie  Torgersen           39.1          18.7               181        3750
+ 2 Adelie  Torgersen           39.5          17.4               186        3800
+ 3 Adelie  Torgersen           40.3          18                 195        3250
+ 4 Adelie  Torgersen           NA            NA                  NA          NA
+ 5 Adelie  Torgersen           36.7          19.3               193        3450
+ 6 Adelie  Torgersen           39.3          20.6               190        3650
+ 7 Adelie  Torgersen           38.9          17.8               181        3625
+ 8 Adelie  Torgersen           39.2          19.6               195        4675
+ 9 Adelie  Torgersen           34.1          18.1               193        3475
+10 Adelie  Torgersen           42            20.2               190        4250
+# ℹ 334 more rows
+# ℹ 6 more variables: sex <fct>, year <int>, sc_bill_length_mm <dbl[,1]>,
+#   sc_bill_depth_mm <dbl[,1]>, sc_flipper_length_mm <dbl[,1]>,
+#   sc_body_mass_g <dbl[,1]>
+```
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+::::::::::::::::::::::::::::::::::::: challenge 
+## Challenge 4
+Transform all the columns with an underscore in their name so they are scaled, and add the _prefix_ `sc_` to the column names. 
+Add another _standard_ change that converts the body mass column to kilograms.
+
+:::::::::::::::::::::::::::::::::::::::: hint
+You can add a standard mutate within the same mutate as across
+:::::::::::::::::::::::::::::::::::::::: 
+
+:::::::::::::::::::::::::::::::::::::::: solution 
+## Solution 
+
+
+``` r
+penguins |> 
+  mutate(
+    across(.cols = contains("_"),
+           .fns = scale, 
+           .names = "sc_{.col}"),
+    body_mass_kg = body_mass_g / 1000
+  )
+```
+
+``` output
+# A tibble: 344 × 13
+   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
+ 1 Adelie  Torgersen           39.1          18.7               181        3750
+ 2 Adelie  Torgersen           39.5          17.4               186        3800
+ 3 Adelie  Torgersen           40.3          18                 195        3250
+ 4 Adelie  Torgersen           NA            NA                  NA          NA
+ 5 Adelie  Torgersen           36.7          19.3               193        3450
+ 6 Adelie  Torgersen           39.3          20.6               190        3650
+ 7 Adelie  Torgersen           38.9          17.8               181        3625
+ 8 Adelie  Torgersen           39.2          19.6               195        4675
+ 9 Adelie  Torgersen           34.1          18.1               193        3475
+10 Adelie  Torgersen           42            20.2               190        4250
+# ℹ 334 more rows
+# ℹ 7 more variables: sex <fct>, year <int>, sc_bill_length_mm <dbl[,1]>,
+#   sc_bill_depth_mm <dbl[,1]>, sc_flipper_length_mm <dbl[,1]>,
+#   sc_body_mass_g <dbl[,1]>, body_mass_kg <dbl>
+```
+
+:::::::::::::::::::::::::::::::::::::::: 
+::::::::::::::::::::::::::::::::::::: 
+
+
+## Wrap-up
+
+We hope these sessions have given you a leg-up in starting your R and tidyverse journey. 
+Remember that learning to code is like learning a new language: the best way to learn is to keep trying. 
+We promise, your efforts will not be in vain as you uncover the power of R and the tidyverse.
+
+### Learning more
+As an end to this workshop, we would like to provide you with some learning materials that might aid you in further pursuits of learning R. 
+
+The [tidyverse webpage](https://www.tidyverse.org/) offers lots of resources on learning the tidyverse way of working, and information
+about what great things you can do with this collection of packages. 
+There is an [R for Data Science](https://www.rfordatasci.com/) learning community that is an excellent and 
+welcoming community of other learners navigating the tidyverse. We wholeheartedly recommend joining this community!
+The [RStudio community](https://community.rstudio.com/) is also a great place to ask questions or 
+look for solutions to questions you may have, and so is [Stack Overflow](https://stackoverflow.com/).
+
diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md
new file mode 100644
index 0000000..8b98edd
--- /dev/null
+++ b/CODE_OF_CONDUCT.md
@@ -0,0 +1,13 @@
+---
+title: "Contributor Code of Conduct"
+---
+
+As contributors and maintainers of this project,
+we pledge to follow the [Carpentry Code of Conduct][coc].
+
+Instances of abusive, harassing, or otherwise unacceptable behavior
+may be reported by following our [reporting guidelines][coc-reporting].
+
+
+[coc-reporting]: https://docs.carpentries.org/topic_folders/policies/incident-reporting.html
+[coc]: https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html
diff --git a/LICENSE.md b/LICENSE.md
new file mode 100644
index 0000000..b6f5f2c
--- /dev/null
+++ b/LICENSE.md
@@ -0,0 +1,82 @@
+---
+title: "Licenses"
+---
+
+## Instructional Material
+
+All Software Carpentry, Data Carpentry, and Library Carpentry instructional material is
+made available under the [Creative Commons Attribution
+license][cc-by-human]. The following is a human-readable summary of
+(and not a substitute for) the [full legal text of the CC BY 4.0
+license][cc-by-legal].
+
+You are free:
+
+* to **Share**---copy and redistribute the material in any medium or format
+* to **Adapt**---remix, transform, and build upon the material
+
+for any purpose, even commercially.
+
+The licensor cannot revoke these freedoms as long as you follow the
+license terms.
+
+Under the following terms:
+
+* **Attribution**---You must give appropriate credit (mentioning that
+  your work is derived from work that is Copyright © Software
+  Carpentry and, where practical, linking to
+  http://software-carpentry.org/), provide a [link to the
+  license][cc-by-human], and indicate if changes were made. You may do
+  so in any reasonable manner, but not in any way that suggests the
+  licensor endorses you or your use.
+
+**No additional restrictions**---You may not apply legal terms or
+technological measures that legally restrict others from doing
+anything the license permits.  With the understanding that:
+
+Notices:
+
+* You do not have to comply with the license for elements of the
+  material in the public domain or where your use is permitted by an
+  applicable exception or limitation.
+* No warranties are given. The license may not give you all of the
+  permissions necessary for your intended use. For example, other
+  rights such as publicity, privacy, or moral rights may limit how you
+  use the material.
+
+## Software
+
+Except where otherwise noted, the example programs and other software
+provided by Software Carpentry and Data Carpentry are made available under the
+[OSI][osi]-approved
+[MIT license][mit-license].
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
+## Trademark
+
+"Software Carpentry" and "Data Carpentry" and their respective logos
+are registered trademarks of [Community Initiatives][ci].
+
+[cc-by-human]: https://creativecommons.org/licenses/by/4.0/
+[cc-by-legal]: https://creativecommons.org/licenses/by/4.0/legalcode
+[mit-license]: https://opensource.org/licenses/mit-license.html
+[ci]: http://communityin.org/
+[osi]: https://opensource.org
diff --git a/config.yaml b/config.yaml
new file mode 100644
index 0000000..1d29389
--- /dev/null
+++ b/config.yaml
@@ -0,0 +1,84 @@
+#------------------------------------------------------------
+# Values for this lesson.
+#------------------------------------------------------------
+
+# Which carpentry is this (swc, dc, lc, or cp)?
+# swc: Software Carpentry
+# dc: Data Carpentry
+# lc: Library Carpentry
+# cp: Carpentries (to use for instructor training for instance)
+carpentry: incubator
+
+# Overall title for pages.
+title: R and the Tidyverse for working with data
+
+# Date the lesson was created (this is empty by default)
+created: ~
+
+# Comma-separated list of keywords for the lesson
+keywords: software, data, lesson, The Carpentries, R, tidyverse
+
+# Life cycle stage of the lesson
+# possible values: pre-alpha, alpha, beta, stable
+life_cycle: alpha
+
+# License of the lesson
+license: CC-BY 4.0
+
+# Link to the source repository for this lesson
+source: https://github.com/athanasiamo/r-tidyverse-4-datasets
+
+# Default branch of your lesson
+branch: main
+
+# Who to contact if there are any issues
+contact: team@carpentries.org
+
+# Navigation ------------------------------------------------
+#
+# Use the following menu items to specify the order of
+# individual pages in each dropdown section. Leave blank to
+# include all pages in the folder.
+#
+# Example -------------
+#
+# episodes:
+# - introduction.md
+# - first-steps.md
+#
+# learners:
+# - setup.md
+#
+# instructors:
+# - instructor-notes.md
+#
+# profiles:
+# - one-learner.md
+# - another-learner.md
+
+# Order of episodes in your lesson
+episodes:
+  - 01-project-introduction.Rmd
+  - 02-data-visualisation.Rmd
+  - 03-data-subsetting.Rmd
+  - 04-data-sorting-pipes.Rmd
+  - 05-data-plotting-scales.Rmd
+  - 06-data-manipulation.Rmd
+  - 07-data-reshaping.Rmd
+  - 08-data-summaries.Rmd
+  - 09-data-complex-pipelines.Rmd
+  - 10-data-manipulation-across.Rmd
+
+# Information for Learners
+learners:
+- setup.md
+
+# Information for Instructors
+instructors:
+- instructor-notes.md
+
+# Learner Profiles
+profiles:
+- learner-profiles.md
+
+
diff --git a/fig/01-rstudio-script.png b/fig/01-rstudio-script.png
new file mode 100644
index 0000000..a361016
Binary files /dev/null and b/fig/01-rstudio-script.png differ
diff --git a/fig/01-rstudio.png b/fig/01-rstudio.png
new file mode 100644
index 0000000..da85e08
Binary files /dev/null and b/fig/01-rstudio.png differ
diff --git a/fig/01_bad_project.png b/fig/01_bad_project.png
new file mode 100644
index 0000000..fcfda0c
Binary files /dev/null and b/fig/01_bad_project.png differ
diff --git a/fig/02-data-visualisation-rendered-unnamed-chunk-10-1.png b/fig/02-data-visualisation-rendered-unnamed-chunk-10-1.png
new file mode 100644
index 0000000..fbda139
Binary files /dev/null and b/fig/02-data-visualisation-rendered-unnamed-chunk-10-1.png differ
diff --git a/fig/02-data-visualisation-rendered-unnamed-chunk-11-1.png b/fig/02-data-visualisation-rendered-unnamed-chunk-11-1.png
new file mode 100644
index 0000000..428f631
Binary files /dev/null and b/fig/02-data-visualisation-rendered-unnamed-chunk-11-1.png differ
diff --git a/fig/02-data-visualisation-rendered-unnamed-chunk-12-1.png b/fig/02-data-visualisation-rendered-unnamed-chunk-12-1.png
new file mode 100644
index 0000000..70a03aa
Binary files /dev/null and b/fig/02-data-visualisation-rendered-unnamed-chunk-12-1.png differ
diff --git a/fig/02-data-visualisation-rendered-unnamed-chunk-13-1.png b/fig/02-data-visualisation-rendered-unnamed-chunk-13-1.png
new file mode 100644
index 0000000..f6655e9
Binary files /dev/null and b/fig/02-data-visualisation-rendered-unnamed-chunk-13-1.png differ
diff --git a/fig/02-data-visualisation-rendered-unnamed-chunk-14-1.png b/fig/02-data-visualisation-rendered-unnamed-chunk-14-1.png
new file mode 100644
index 0000000..22f3ac5
Binary files /dev/null and b/fig/02-data-visualisation-rendered-unnamed-chunk-14-1.png differ
diff --git a/fig/02-data-visualisation-rendered-unnamed-chunk-15-1.png b/fig/02-data-visualisation-rendered-unnamed-chunk-15-1.png
new file mode 100644
index 0000000..c1cdb4c
Binary files /dev/null and b/fig/02-data-visualisation-rendered-unnamed-chunk-15-1.png differ
diff --git a/fig/02-data-visualisation-rendered-unnamed-chunk-16-1.png b/fig/02-data-visualisation-rendered-unnamed-chunk-16-1.png
new file mode 100644
index 0000000..770063b
Binary files /dev/null and b/fig/02-data-visualisation-rendered-unnamed-chunk-16-1.png differ
diff --git a/fig/02-data-visualisation-rendered-unnamed-chunk-17-1.png b/fig/02-data-visualisation-rendered-unnamed-chunk-17-1.png
new file mode 100644
index 0000000..72b675d
Binary files /dev/null and b/fig/02-data-visualisation-rendered-unnamed-chunk-17-1.png differ
diff --git a/fig/02-data-visualisation-rendered-unnamed-chunk-18-1.png b/fig/02-data-visualisation-rendered-unnamed-chunk-18-1.png
new file mode 100644
index 0000000..1c7d6f2
Binary files /dev/null and b/fig/02-data-visualisation-rendered-unnamed-chunk-18-1.png differ
diff --git a/fig/02-data-visualisation-rendered-unnamed-chunk-19-1.png b/fig/02-data-visualisation-rendered-unnamed-chunk-19-1.png
new file mode 100644
index 0000000..92212d2
Binary files /dev/null and b/fig/02-data-visualisation-rendered-unnamed-chunk-19-1.png differ
diff --git a/fig/02-data-visualisation-rendered-unnamed-chunk-20-1.png b/fig/02-data-visualisation-rendered-unnamed-chunk-20-1.png
new file mode 100644
index 0000000..9d3451a
Binary files /dev/null and b/fig/02-data-visualisation-rendered-unnamed-chunk-20-1.png differ
diff --git a/fig/02-data-visualisation-rendered-unnamed-chunk-21-1.png b/fig/02-data-visualisation-rendered-unnamed-chunk-21-1.png
new file mode 100644
index 0000000..b2407d5
Binary files /dev/null and b/fig/02-data-visualisation-rendered-unnamed-chunk-21-1.png differ
diff --git a/fig/02-data-visualisation-rendered-unnamed-chunk-22-1.png b/fig/02-data-visualisation-rendered-unnamed-chunk-22-1.png
new file mode 100644
index 0000000..9b3359d
Binary files /dev/null and b/fig/02-data-visualisation-rendered-unnamed-chunk-22-1.png differ
diff --git a/fig/02-data-visualisation-rendered-unnamed-chunk-4-1.png b/fig/02-data-visualisation-rendered-unnamed-chunk-4-1.png
new file mode 100644
index 0000000..122dc97
Binary files /dev/null and b/fig/02-data-visualisation-rendered-unnamed-chunk-4-1.png differ
diff --git a/fig/02-data-visualisation-rendered-unnamed-chunk-5-1.png b/fig/02-data-visualisation-rendered-unnamed-chunk-5-1.png
new file mode 100644
index 0000000..774121d
Binary files /dev/null and b/fig/02-data-visualisation-rendered-unnamed-chunk-5-1.png differ
diff --git a/fig/02-data-visualisation-rendered-unnamed-chunk-6-1.png b/fig/02-data-visualisation-rendered-unnamed-chunk-6-1.png
new file mode 100644
index 0000000..d680bde
Binary files /dev/null and b/fig/02-data-visualisation-rendered-unnamed-chunk-6-1.png differ
diff --git a/fig/02-data-visualisation-rendered-unnamed-chunk-7-1.png b/fig/02-data-visualisation-rendered-unnamed-chunk-7-1.png
new file mode 100644
index 0000000..d5754b1
Binary files /dev/null and b/fig/02-data-visualisation-rendered-unnamed-chunk-7-1.png differ
diff --git a/fig/02-data-visualisation-rendered-unnamed-chunk-8-1.png b/fig/02-data-visualisation-rendered-unnamed-chunk-8-1.png
new file mode 100644
index 0000000..60f7008
Binary files /dev/null and b/fig/02-data-visualisation-rendered-unnamed-chunk-8-1.png differ
diff --git a/fig/02-data-visualisation-rendered-unnamed-chunk-9-1.png b/fig/02-data-visualisation-rendered-unnamed-chunk-9-1.png
new file mode 100644
index 0000000..3d3149e
Binary files /dev/null and b/fig/02-data-visualisation-rendered-unnamed-chunk-9-1.png differ
diff --git a/fig/03-filtering.gif b/fig/03-filtering.gif
new file mode 100644
index 0000000..7b5755a
Binary files /dev/null and b/fig/03-filtering.gif differ
diff --git a/fig/03-selecting.gif b/fig/03-selecting.gif
new file mode 100644
index 0000000..f656a56
Binary files /dev/null and b/fig/03-selecting.gif differ
diff --git a/fig/05-data-plotting-scales-rendered-unnamed-chunk-10-1.png b/fig/05-data-plotting-scales-rendered-unnamed-chunk-10-1.png
new file mode 100644
index 0000000..58bb782
Binary files /dev/null and b/fig/05-data-plotting-scales-rendered-unnamed-chunk-10-1.png differ
diff --git a/fig/05-data-plotting-scales-rendered-unnamed-chunk-11-1.png b/fig/05-data-plotting-scales-rendered-unnamed-chunk-11-1.png
new file mode 100644
index 0000000..67948de
Binary files /dev/null and b/fig/05-data-plotting-scales-rendered-unnamed-chunk-11-1.png differ
diff --git a/fig/05-data-plotting-scales-rendered-unnamed-chunk-12-1.png b/fig/05-data-plotting-scales-rendered-unnamed-chunk-12-1.png
new file mode 100644
index 0000000..bb38cf4
Binary files /dev/null and b/fig/05-data-plotting-scales-rendered-unnamed-chunk-12-1.png differ
diff --git a/fig/05-data-plotting-scales-rendered-unnamed-chunk-13-1.png b/fig/05-data-plotting-scales-rendered-unnamed-chunk-13-1.png
new file mode 100644
index 0000000..5173a21
Binary files /dev/null and b/fig/05-data-plotting-scales-rendered-unnamed-chunk-13-1.png differ
diff --git a/fig/05-data-plotting-scales-rendered-unnamed-chunk-14-1.png b/fig/05-data-plotting-scales-rendered-unnamed-chunk-14-1.png
new file mode 100644
index 0000000..da6fc71
Binary files /dev/null and b/fig/05-data-plotting-scales-rendered-unnamed-chunk-14-1.png differ
diff --git a/fig/05-data-plotting-scales-rendered-unnamed-chunk-15-1.png b/fig/05-data-plotting-scales-rendered-unnamed-chunk-15-1.png
new file mode 100644
index 0000000..8b9fc9f
Binary files /dev/null and b/fig/05-data-plotting-scales-rendered-unnamed-chunk-15-1.png differ
diff --git a/fig/05-data-plotting-scales-rendered-unnamed-chunk-2-1.png b/fig/05-data-plotting-scales-rendered-unnamed-chunk-2-1.png
new file mode 100644
index 0000000..23c0d48
Binary files /dev/null and b/fig/05-data-plotting-scales-rendered-unnamed-chunk-2-1.png differ
diff --git a/fig/05-data-plotting-scales-rendered-unnamed-chunk-3-1.png b/fig/05-data-plotting-scales-rendered-unnamed-chunk-3-1.png
new file mode 100644
index 0000000..7db9132
Binary files /dev/null and b/fig/05-data-plotting-scales-rendered-unnamed-chunk-3-1.png differ
diff --git a/fig/05-data-plotting-scales-rendered-unnamed-chunk-4-1.png b/fig/05-data-plotting-scales-rendered-unnamed-chunk-4-1.png
new file mode 100644
index 0000000..5f248a3
Binary files /dev/null and b/fig/05-data-plotting-scales-rendered-unnamed-chunk-4-1.png differ
diff --git a/fig/05-data-plotting-scales-rendered-unnamed-chunk-5-1.png b/fig/05-data-plotting-scales-rendered-unnamed-chunk-5-1.png
new file mode 100644
index 0000000..22e14d7
Binary files /dev/null and b/fig/05-data-plotting-scales-rendered-unnamed-chunk-5-1.png differ
diff --git a/fig/05-data-plotting-scales-rendered-unnamed-chunk-6-1.png b/fig/05-data-plotting-scales-rendered-unnamed-chunk-6-1.png
new file mode 100644
index 0000000..c7c2504
Binary files /dev/null and b/fig/05-data-plotting-scales-rendered-unnamed-chunk-6-1.png differ
diff --git a/fig/05-data-plotting-scales-rendered-unnamed-chunk-7-1.png b/fig/05-data-plotting-scales-rendered-unnamed-chunk-7-1.png
new file mode 100644
index 0000000..945ee22
Binary files /dev/null and b/fig/05-data-plotting-scales-rendered-unnamed-chunk-7-1.png differ
diff --git a/fig/05-data-plotting-scales-rendered-unnamed-chunk-8-1.png b/fig/05-data-plotting-scales-rendered-unnamed-chunk-8-1.png
new file mode 100644
index 0000000..9e21aa4
Binary files /dev/null and b/fig/05-data-plotting-scales-rendered-unnamed-chunk-8-1.png differ
diff --git a/fig/05-data-plotting-scales-rendered-unnamed-chunk-9-1.png b/fig/05-data-plotting-scales-rendered-unnamed-chunk-9-1.png
new file mode 100644
index 0000000..4c67fef
Binary files /dev/null and b/fig/05-data-plotting-scales-rendered-unnamed-chunk-9-1.png differ
diff --git a/fig/06-data-manipulation-rendered-unnamed-chunk-12-1.png b/fig/06-data-manipulation-rendered-unnamed-chunk-12-1.png
new file mode 100644
index 0000000..f834429
Binary files /dev/null and b/fig/06-data-manipulation-rendered-unnamed-chunk-12-1.png differ
diff --git a/fig/06-data-manipulation-rendered-unnamed-chunk-16-1.png b/fig/06-data-manipulation-rendered-unnamed-chunk-16-1.png
new file mode 100644
index 0000000..aae0e15
Binary files /dev/null and b/fig/06-data-manipulation-rendered-unnamed-chunk-16-1.png differ
diff --git a/fig/06-data-manipulation-rendered-unnamed-chunk-17-1.png b/fig/06-data-manipulation-rendered-unnamed-chunk-17-1.png
new file mode 100644
index 0000000..9040a0f
Binary files /dev/null and b/fig/06-data-manipulation-rendered-unnamed-chunk-17-1.png differ
diff --git a/fig/06-tall_wide.gif b/fig/06-tall_wide.gif
new file mode 100644
index 0000000..55f2469
Binary files /dev/null and b/fig/06-tall_wide.gif differ
diff --git a/fig/07-data-reshaping-rendered-unnamed-chunk-4-1.png b/fig/07-data-reshaping-rendered-unnamed-chunk-4-1.png
new file mode 100644
index 0000000..067c82a
Binary files /dev/null and b/fig/07-data-reshaping-rendered-unnamed-chunk-4-1.png differ
diff --git a/fig/09-data-complex-pipelines-rendered-unnamed-chunk-10-1.png b/fig/09-data-complex-pipelines-rendered-unnamed-chunk-10-1.png
new file mode 100644
index 0000000..8a7f784
Binary files /dev/null and b/fig/09-data-complex-pipelines-rendered-unnamed-chunk-10-1.png differ
diff --git a/fig/09-data-complex-pipelines-rendered-unnamed-chunk-11-1.png b/fig/09-data-complex-pipelines-rendered-unnamed-chunk-11-1.png
new file mode 100644
index 0000000..9062abf
Binary files /dev/null and b/fig/09-data-complex-pipelines-rendered-unnamed-chunk-11-1.png differ
diff --git a/fig/09-data-complex-pipelines-rendered-unnamed-chunk-12-1.png b/fig/09-data-complex-pipelines-rendered-unnamed-chunk-12-1.png
new file mode 100644
index 0000000..b024aa7
Binary files /dev/null and b/fig/09-data-complex-pipelines-rendered-unnamed-chunk-12-1.png differ
diff --git a/fig/09-data-complex-pipelines-rendered-unnamed-chunk-13-1.png b/fig/09-data-complex-pipelines-rendered-unnamed-chunk-13-1.png
new file mode 100644
index 0000000..9af4bc6
Binary files /dev/null and b/fig/09-data-complex-pipelines-rendered-unnamed-chunk-13-1.png differ
diff --git a/fig/09-data-complex-pipelines-rendered-unnamed-chunk-14-1.png b/fig/09-data-complex-pipelines-rendered-unnamed-chunk-14-1.png
new file mode 100644
index 0000000..797824c
Binary files /dev/null and b/fig/09-data-complex-pipelines-rendered-unnamed-chunk-14-1.png differ
diff --git a/fig/09-data-complex-pipelines-rendered-unnamed-chunk-15-1.png b/fig/09-data-complex-pipelines-rendered-unnamed-chunk-15-1.png
new file mode 100644
index 0000000..3efacbd
Binary files /dev/null and b/fig/09-data-complex-pipelines-rendered-unnamed-chunk-15-1.png differ
diff --git a/fig/09-data-complex-pipelines-rendered-unnamed-chunk-16-1.png b/fig/09-data-complex-pipelines-rendered-unnamed-chunk-16-1.png
new file mode 100644
index 0000000..db39664
Binary files /dev/null and b/fig/09-data-complex-pipelines-rendered-unnamed-chunk-16-1.png differ
diff --git a/fig/09-data-complex-pipelines-rendered-unnamed-chunk-18-1.png b/fig/09-data-complex-pipelines-rendered-unnamed-chunk-18-1.png
new file mode 100644
index 0000000..fc7e338
Binary files /dev/null and b/fig/09-data-complex-pipelines-rendered-unnamed-chunk-18-1.png differ
diff --git a/fig/09-data-complex-pipelines-rendered-unnamed-chunk-19-1.png b/fig/09-data-complex-pipelines-rendered-unnamed-chunk-19-1.png
new file mode 100644
index 0000000..30271b5
Binary files /dev/null and b/fig/09-data-complex-pipelines-rendered-unnamed-chunk-19-1.png differ
diff --git a/fig/09-data-complex-pipelines-rendered-unnamed-chunk-20-1.png b/fig/09-data-complex-pipelines-rendered-unnamed-chunk-20-1.png
new file mode 100644
index 0000000..121e155
Binary files /dev/null and b/fig/09-data-complex-pipelines-rendered-unnamed-chunk-20-1.png differ
diff --git a/fig/09-data-complex-pipelines-rendered-unnamed-chunk-21-1.png b/fig/09-data-complex-pipelines-rendered-unnamed-chunk-21-1.png
new file mode 100644
index 0000000..0e130d1
Binary files /dev/null and b/fig/09-data-complex-pipelines-rendered-unnamed-chunk-21-1.png differ
diff --git a/fig/09-data-complex-pipelines-rendered-unnamed-chunk-22-1.png b/fig/09-data-complex-pipelines-rendered-unnamed-chunk-22-1.png
new file mode 100644
index 0000000..8aa5eea
Binary files /dev/null and b/fig/09-data-complex-pipelines-rendered-unnamed-chunk-22-1.png differ
diff --git a/fig/09-data-complex-pipelines-rendered-unnamed-chunk-23-1.png b/fig/09-data-complex-pipelines-rendered-unnamed-chunk-23-1.png
new file mode 100644
index 0000000..6725523
Binary files /dev/null and b/fig/09-data-complex-pipelines-rendered-unnamed-chunk-23-1.png differ
diff --git a/fig/09-data-complex-pipelines-rendered-unnamed-chunk-25-1.png b/fig/09-data-complex-pipelines-rendered-unnamed-chunk-25-1.png
new file mode 100644
index 0000000..6725523
Binary files /dev/null and b/fig/09-data-complex-pipelines-rendered-unnamed-chunk-25-1.png differ
diff --git a/fig/09-data-complex-pipelines-rendered-unnamed-chunk-26-1.png b/fig/09-data-complex-pipelines-rendered-unnamed-chunk-26-1.png
new file mode 100644
index 0000000..73689fa
Binary files /dev/null and b/fig/09-data-complex-pipelines-rendered-unnamed-chunk-26-1.png differ
diff --git a/fig/09-data-complex-pipelines-rendered-unnamed-chunk-27-1.png b/fig/09-data-complex-pipelines-rendered-unnamed-chunk-27-1.png
new file mode 100644
index 0000000..5b24c20
Binary files /dev/null and b/fig/09-data-complex-pipelines-rendered-unnamed-chunk-27-1.png differ
diff --git a/fig/09-data-complex-pipelines-rendered-unnamed-chunk-28-1.png b/fig/09-data-complex-pipelines-rendered-unnamed-chunk-28-1.png
new file mode 100644
index 0000000..f72634a
Binary files /dev/null and b/fig/09-data-complex-pipelines-rendered-unnamed-chunk-28-1.png differ
diff --git a/fig/09-data-complex-pipelines-rendered-unnamed-chunk-29-1.png b/fig/09-data-complex-pipelines-rendered-unnamed-chunk-29-1.png
new file mode 100644
index 0000000..397f229
Binary files /dev/null and b/fig/09-data-complex-pipelines-rendered-unnamed-chunk-29-1.png differ
diff --git a/fig/09-data-complex-pipelines-rendered-unnamed-chunk-30-1.png b/fig/09-data-complex-pipelines-rendered-unnamed-chunk-30-1.png
new file mode 100644
index 0000000..674c6d6
Binary files /dev/null and b/fig/09-data-complex-pipelines-rendered-unnamed-chunk-30-1.png differ
diff --git a/fig/09-data-complex-pipelines-rendered-unnamed-chunk-31-1.png b/fig/09-data-complex-pipelines-rendered-unnamed-chunk-31-1.png
new file mode 100644
index 0000000..1035eeb
Binary files /dev/null and b/fig/09-data-complex-pipelines-rendered-unnamed-chunk-31-1.png differ
diff --git a/fig/09-data-complex-pipelines-rendered-unnamed-chunk-32-1.png b/fig/09-data-complex-pipelines-rendered-unnamed-chunk-32-1.png
new file mode 100644
index 0000000..8742798
Binary files /dev/null and b/fig/09-data-complex-pipelines-rendered-unnamed-chunk-32-1.png differ
diff --git a/fig/09-data-complex-pipelines-rendered-unnamed-chunk-33-1.png b/fig/09-data-complex-pipelines-rendered-unnamed-chunk-33-1.png
new file mode 100644
index 0000000..67687a7
Binary files /dev/null and b/fig/09-data-complex-pipelines-rendered-unnamed-chunk-33-1.png differ
diff --git a/index.md b/index.md
new file mode 100644
index 0000000..af66276
--- /dev/null
+++ b/index.md
@@ -0,0 +1,9 @@
+---
+site: sandpaper::sandpaper_site
+---
+
+This is a new lesson built with [The Carpentries Workbench][workbench]. 
+
+
+[workbench]: https://carpentries.github.io/sandpaper-docs
+
diff --git a/instructor-notes.md b/instructor-notes.md
new file mode 100644
index 0000000..434e335
--- /dev/null
+++ b/instructor-notes.md
@@ -0,0 +1,5 @@
+---
+title: FIXME
+---
+
+This is a placeholder file. Please add content here. 
diff --git a/learner-profiles.md b/learner-profiles.md
new file mode 100644
index 0000000..434e335
--- /dev/null
+++ b/learner-profiles.md
@@ -0,0 +1,5 @@
+---
+title: FIXME
+---
+
+This is a placeholder file. Please add content here. 
diff --git a/links.md b/links.md
new file mode 100644
index 0000000..4c5cd2f
--- /dev/null
+++ b/links.md
@@ -0,0 +1,10 @@
+<!-- 
+Place links that you need to refer to multiple times across pages here. Delete
+any links that you are not going to use. 
+ -->
+
+[pandoc]: https://pandoc.org/MANUAL.html
+[r-markdown]: https://rmarkdown.rstudio.com/
+[rstudio]: https://www.rstudio.com/
+[carpentries-workbench]: https://carpentries.github.io/sandpaper-docs/
+
diff --git a/md5sum.txt b/md5sum.txt
new file mode 100644
index 0000000..68488c6
--- /dev/null
+++ b/md5sum.txt
@@ -0,0 +1,20 @@
+"file" "checksum" "built" "date"
+"CODE_OF_CONDUCT.md" "8d9e44dd5c39f241b5e8b47ecfc802d1" "site/built/CODE_OF_CONDUCT.md" "2024-12-08"
+"LICENSE.md" "afaf427b4223952624dcb6d8ded53ec0" "site/built/LICENSE.md" "2024-12-08"
+"config.yaml" "8a792282bff9e898778b401920b6f9a8" "site/built/config.yaml" "2024-12-08"
+"index.md" "a02c9c785ed98ddd84fe3d34ddb12fcd" "site/built/index.md" "2024-12-08"
+"links.md" "8184cf4149eafbf03ce8da8ff0778c14" "site/built/links.md" "2024-12-08"
+"episodes/01-project-introduction.Rmd" "85a6bc9fb924dcaaad246c479be6ed42" "site/built/01-project-introduction.md" "2024-12-08"
+"episodes/02-data-visualisation.Rmd" "f01af1c13c02f0248a05ea6893aa0ba8" "site/built/02-data-visualisation.md" "2024-12-08"
+"episodes/03-data-subsetting.Rmd" "9380373f6f554a0f109a2e399c34e137" "site/built/03-data-subsetting.md" "2024-12-08"
+"episodes/04-data-sorting-pipes.Rmd" "0642d7c251442344e27f7c057e2b714a" "site/built/04-data-sorting-pipes.md" "2024-12-08"
+"episodes/05-data-plotting-scales.Rmd" "819480fb05a22b435f22d2b025f41cfa" "site/built/05-data-plotting-scales.md" "2024-12-08"
+"episodes/06-data-manipulation.Rmd" "5fad3f4fbcaa6c605f2eb1928c8a55f7" "site/built/06-data-manipulation.md" "2024-12-08"
+"episodes/07-data-reshaping.Rmd" "83a42d7216c1c0e502be56f57bc39abe" "site/built/07-data-reshaping.md" "2024-12-08"
+"episodes/08-data-summaries.Rmd" "a37d3fecd97bb93441a8ac64c1d7d2aa" "site/built/08-data-summaries.md" "2024-12-08"
+"episodes/09-data-complex-pipelines.Rmd" "b9c48d1e0c0000a46c8c1fd5e6c45a29" "site/built/09-data-complex-pipelines.md" "2024-12-08"
+"episodes/10-data-manipulation-across.Rmd" "90d2a60c36c2d30825e5e0d6ad63c0d9" "site/built/10-data-manipulation-across.md" "2024-12-08"
+"instructors/instructor-notes.md" "60b93493cf1da06dfd63255d73854461" "site/built/instructor-notes.md" "2024-12-08"
+"learners/setup.md" "969ce71ddf0e8ed639bd94df7feb8858" "site/built/setup.md" "2024-12-08"
+"profiles/learner-profiles.md" "60b93493cf1da06dfd63255d73854461" "site/built/learner-profiles.md" "2024-12-08"
+"renv/profiles/lesson-requirements/renv.lock" "21374fb340023ff47a842ce6181cff3c" "site/built/renv.lock" "2024-12-08"
diff --git a/renv.lock b/renv.lock
new file mode 100644
index 0000000..c423957
--- /dev/null
+++ b/renv.lock
@@ -0,0 +1,1527 @@
+{
+  "R": {
+    "Version": "4.4.2",
+    "Repositories": [
+      {
+        "Name": "carpentries",
+        "URL": "https://carpentries.r-universe.dev"
+      },
+      {
+        "Name": "carpentries_archive",
+        "URL": "https://carpentries.github.io/drat"
+      },
+      {
+        "Name": "CRAN",
+        "URL": "https://cran.rstudio.com"
+      }
+    ]
+  },
+  "Packages": {
+    "DBI": {
+      "Package": "DBI",
+      "Version": "1.2.3",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "methods"
+      ],
+      "Hash": "065ae649b05f1ff66bb0c793107508f5"
+    },
+    "MASS": {
+      "Package": "MASS",
+      "Version": "7.3-61",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "grDevices",
+        "graphics",
+        "methods",
+        "stats",
+        "utils"
+      ],
+      "Hash": "0cafd6f0500e5deba33be22c46bf6055"
+    },
+    "Matrix": {
+      "Package": "Matrix",
+      "Version": "1.7-1",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "grDevices",
+        "graphics",
+        "grid",
+        "lattice",
+        "methods",
+        "stats",
+        "utils"
+      ],
+      "Hash": "5122bb14d8736372411f955e1b16bc8a"
+    },
+    "R6": {
+      "Package": "R6",
+      "Version": "2.5.1",
+      "Source": "Repository",
+      "Repository": "RSPM",
+      "Requirements": [
+        "R"
+      ],
+      "Hash": "470851b6d5d0ac559e9d01bb352b4021"
+    },
+    "RColorBrewer": {
+      "Package": "RColorBrewer",
+      "Version": "1.1-3",
+      "Source": "Repository",
+      "Repository": "RSPM",
+      "Requirements": [
+        "R"
+      ],
+      "Hash": "45f0398006e83a5b10b72a90663d8d8c"
+    },
+    "askpass": {
+      "Package": "askpass",
+      "Version": "1.2.1",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "sys"
+      ],
+      "Hash": "c39f4155b3ceb1a9a2799d700fbd4b6a"
+    },
+    "backports": {
+      "Package": "backports",
+      "Version": "1.5.0",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R"
+      ],
+      "Hash": "e1e1b9d75c37401117b636b7ae50827a"
+    },
+    "base64enc": {
+      "Package": "base64enc",
+      "Version": "0.1-3",
+      "Source": "Repository",
+      "Repository": "RSPM",
+      "Requirements": [
+        "R"
+      ],
+      "Hash": "543776ae6848fde2f48ff3816d0628bc"
+    },
+    "bit": {
+      "Package": "bit",
+      "Version": "4.5.0",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R"
+      ],
+      "Hash": "5dc7b2677d65d0e874fc4aaf0e879987"
+    },
+    "bit64": {
+      "Package": "bit64",
+      "Version": "4.5.2",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "bit",
+        "methods",
+        "stats",
+        "utils"
+      ],
+      "Hash": "e84984bf5f12a18628d9a02322128dfd"
+    },
+    "blob": {
+      "Package": "blob",
+      "Version": "1.2.4",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "methods",
+        "rlang",
+        "vctrs"
+      ],
+      "Hash": "40415719b5a479b87949f3aa0aee737c"
+    },
+    "broom": {
+      "Package": "broom",
+      "Version": "1.0.7",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "backports",
+        "dplyr",
+        "generics",
+        "glue",
+        "lifecycle",
+        "purrr",
+        "rlang",
+        "stringr",
+        "tibble",
+        "tidyr"
+      ],
+      "Hash": "8fcc818f3b9887aebaf206f141437cc9"
+    },
+    "bslib": {
+      "Package": "bslib",
+      "Version": "0.8.0",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "base64enc",
+        "cachem",
+        "fastmap",
+        "grDevices",
+        "htmltools",
+        "jquerylib",
+        "jsonlite",
+        "lifecycle",
+        "memoise",
+        "mime",
+        "rlang",
+        "sass"
+      ],
+      "Hash": "b299c6741ca9746fb227debcb0f9fb6c"
+    },
+    "cachem": {
+      "Package": "cachem",
+      "Version": "1.1.0",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "fastmap",
+        "rlang"
+      ],
+      "Hash": "cd9a672193789068eb5a2aad65a0dedf"
+    },
+    "callr": {
+      "Package": "callr",
+      "Version": "3.7.6",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "R6",
+        "processx",
+        "utils"
+      ],
+      "Hash": "d7e13f49c19103ece9e58ad2d83a7354"
+    },
+    "cellranger": {
+      "Package": "cellranger",
+      "Version": "1.1.0",
+      "Source": "Repository",
+      "Repository": "RSPM",
+      "Requirements": [
+        "R",
+        "rematch",
+        "tibble"
+      ],
+      "Hash": "f61dbaec772ccd2e17705c1e872e9e7c"
+    },
+    "cli": {
+      "Package": "cli",
+      "Version": "3.6.3",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "utils"
+      ],
+      "Hash": "b21916dd77a27642b447374a5d30ecf3"
+    },
+    "clipr": {
+      "Package": "clipr",
+      "Version": "0.8.0",
+      "Source": "Repository",
+      "Repository": "RSPM",
+      "Requirements": [
+        "utils"
+      ],
+      "Hash": "3f038e5ac7f41d4ac41ce658c85e3042"
+    },
+    "colorspace": {
+      "Package": "colorspace",
+      "Version": "2.1-1",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "grDevices",
+        "graphics",
+        "methods",
+        "stats"
+      ],
+      "Hash": "d954cb1c57e8d8b756165d7ba18aa55a"
+    },
+    "conflicted": {
+      "Package": "conflicted",
+      "Version": "1.2.0",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "cli",
+        "memoise",
+        "rlang"
+      ],
+      "Hash": "bb097fccb22d156624fd07cd2894ddb6"
+    },
+    "cpp11": {
+      "Package": "cpp11",
+      "Version": "0.5.0",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R"
+      ],
+      "Hash": "91570bba75d0c9d3f1040c835cee8fba"
+    },
+    "crayon": {
+      "Package": "crayon",
+      "Version": "1.5.3",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "grDevices",
+        "methods",
+        "utils"
+      ],
+      "Hash": "859d96e65ef198fd43e82b9628d593ef"
+    },
+    "curl": {
+      "Package": "curl",
+      "Version": "6.0.1",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R"
+      ],
+      "Hash": "e8ba62486230951fcd2b881c5be23f96"
+    },
+    "data.table": {
+      "Package": "data.table",
+      "Version": "1.16.2",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "methods"
+      ],
+      "Hash": "2e00b378fc3be69c865120d9f313039a"
+    },
+    "dbplyr": {
+      "Package": "dbplyr",
+      "Version": "2.5.0",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "DBI",
+        "R",
+        "R6",
+        "blob",
+        "cli",
+        "dplyr",
+        "glue",
+        "lifecycle",
+        "magrittr",
+        "methods",
+        "pillar",
+        "purrr",
+        "rlang",
+        "tibble",
+        "tidyr",
+        "tidyselect",
+        "utils",
+        "vctrs",
+        "withr"
+      ],
+      "Hash": "39b2e002522bfd258039ee4e889e0fd1"
+    },
+    "digest": {
+      "Package": "digest",
+      "Version": "0.6.37",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "utils"
+      ],
+      "Hash": "33698c4b3127fc9f506654607fb73676"
+    },
+    "dplyr": {
+      "Package": "dplyr",
+      "Version": "1.1.4",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "R6",
+        "cli",
+        "generics",
+        "glue",
+        "lifecycle",
+        "magrittr",
+        "methods",
+        "pillar",
+        "rlang",
+        "tibble",
+        "tidyselect",
+        "utils",
+        "vctrs"
+      ],
+      "Hash": "fedd9d00c2944ff00a0e2696ccf048ec"
+    },
+    "dtplyr": {
+      "Package": "dtplyr",
+      "Version": "1.3.1",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "cli",
+        "data.table",
+        "dplyr",
+        "glue",
+        "lifecycle",
+        "rlang",
+        "tibble",
+        "tidyselect",
+        "vctrs"
+      ],
+      "Hash": "54ed3ea01b11e81a86544faaecfef8e2"
+    },
+    "evaluate": {
+      "Package": "evaluate",
+      "Version": "1.0.1",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R"
+      ],
+      "Hash": "3fd29944b231036ad67c3edb32e02201"
+    },
+    "fansi": {
+      "Package": "fansi",
+      "Version": "1.0.6",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "grDevices",
+        "utils"
+      ],
+      "Hash": "962174cf2aeb5b9eea581522286a911f"
+    },
+    "farver": {
+      "Package": "farver",
+      "Version": "2.1.2",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Hash": "680887028577f3fa2a81e410ed0d6e42"
+    },
+    "fastmap": {
+      "Package": "fastmap",
+      "Version": "1.2.0",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Hash": "aa5e1cd11c2d15497494c5292d7ffcc8"
+    },
+    "fontawesome": {
+      "Package": "fontawesome",
+      "Version": "0.5.3",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "htmltools",
+        "rlang"
+      ],
+      "Hash": "bd1297f9b5b1fc1372d19e2c4cd82215"
+    },
+    "forcats": {
+      "Package": "forcats",
+      "Version": "1.0.0",
+      "Source": "Repository",
+      "Repository": "RSPM",
+      "Requirements": [
+        "R",
+        "cli",
+        "glue",
+        "lifecycle",
+        "magrittr",
+        "rlang",
+        "tibble"
+      ],
+      "Hash": "1a0a9a3d5083d0d573c4214576f1e690"
+    },
+    "fs": {
+      "Package": "fs",
+      "Version": "1.6.5",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "methods"
+      ],
+      "Hash": "7f48af39fa27711ea5fbd183b399920d"
+    },
+    "gargle": {
+      "Package": "gargle",
+      "Version": "1.5.2",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "cli",
+        "fs",
+        "glue",
+        "httr",
+        "jsonlite",
+        "lifecycle",
+        "openssl",
+        "rappdirs",
+        "rlang",
+        "stats",
+        "utils",
+        "withr"
+      ],
+      "Hash": "fc0b272e5847c58cd5da9b20eedbd026"
+    },
+    "generics": {
+      "Package": "generics",
+      "Version": "0.1.3",
+      "Source": "Repository",
+      "Repository": "RSPM",
+      "Requirements": [
+        "R",
+        "methods"
+      ],
+      "Hash": "15e9634c0fcd294799e9b2e929ed1b86"
+    },
+    "ggplot2": {
+      "Package": "ggplot2",
+      "Version": "3.5.1",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "MASS",
+        "R",
+        "cli",
+        "glue",
+        "grDevices",
+        "grid",
+        "gtable",
+        "isoband",
+        "lifecycle",
+        "mgcv",
+        "rlang",
+        "scales",
+        "stats",
+        "tibble",
+        "vctrs",
+        "withr"
+      ],
+      "Hash": "44c6a2f8202d5b7e878ea274b1092426"
+    },
+    "glue": {
+      "Package": "glue",
+      "Version": "1.8.0",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "methods"
+      ],
+      "Hash": "5899f1eaa825580172bb56c08266f37c"
+    },
+    "googledrive": {
+      "Package": "googledrive",
+      "Version": "2.1.1",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "cli",
+        "gargle",
+        "glue",
+        "httr",
+        "jsonlite",
+        "lifecycle",
+        "magrittr",
+        "pillar",
+        "purrr",
+        "rlang",
+        "tibble",
+        "utils",
+        "uuid",
+        "vctrs",
+        "withr"
+      ],
+      "Hash": "e99641edef03e2a5e87f0a0b1fcc97f4"
+    },
+    "googlesheets4": {
+      "Package": "googlesheets4",
+      "Version": "1.1.1",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "cellranger",
+        "cli",
+        "curl",
+        "gargle",
+        "glue",
+        "googledrive",
+        "httr",
+        "ids",
+        "lifecycle",
+        "magrittr",
+        "methods",
+        "purrr",
+        "rematch2",
+        "rlang",
+        "tibble",
+        "utils",
+        "vctrs",
+        "withr"
+      ],
+      "Hash": "d6db1667059d027da730decdc214b959"
+    },
+    "gtable": {
+      "Package": "gtable",
+      "Version": "0.3.6",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "cli",
+        "glue",
+        "grid",
+        "lifecycle",
+        "rlang",
+        "stats"
+      ],
+      "Hash": "de949855009e2d4d0e52a844e30617ae"
+    },
+    "haven": {
+      "Package": "haven",
+      "Version": "2.5.4",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "cli",
+        "cpp11",
+        "forcats",
+        "hms",
+        "lifecycle",
+        "methods",
+        "readr",
+        "rlang",
+        "tibble",
+        "tidyselect",
+        "vctrs"
+      ],
+      "Hash": "9171f898db9d9c4c1b2c745adc2c1ef1"
+    },
+    "highr": {
+      "Package": "highr",
+      "Version": "0.11",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "xfun"
+      ],
+      "Hash": "d65ba49117ca223614f71b60d85b8ab7"
+    },
+    "hms": {
+      "Package": "hms",
+      "Version": "1.1.3",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "lifecycle",
+        "methods",
+        "pkgconfig",
+        "rlang",
+        "vctrs"
+      ],
+      "Hash": "b59377caa7ed00fa41808342002138f9"
+    },
+    "htmltools": {
+      "Package": "htmltools",
+      "Version": "0.5.8.1",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "base64enc",
+        "digest",
+        "fastmap",
+        "grDevices",
+        "rlang",
+        "utils"
+      ],
+      "Hash": "81d371a9cc60640e74e4ab6ac46dcedc"
+    },
+    "httr": {
+      "Package": "httr",
+      "Version": "1.4.7",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "R6",
+        "curl",
+        "jsonlite",
+        "mime",
+        "openssl"
+      ],
+      "Hash": "ac107251d9d9fd72f0ca8049988f1d7f"
+    },
+    "ids": {
+      "Package": "ids",
+      "Version": "1.0.1",
+      "Source": "Repository",
+      "Repository": "RSPM",
+      "Requirements": [
+        "openssl",
+        "uuid"
+      ],
+      "Hash": "99df65cfef20e525ed38c3d2577f7190"
+    },
+    "isoband": {
+      "Package": "isoband",
+      "Version": "0.2.7",
+      "Source": "Repository",
+      "Repository": "RSPM",
+      "Requirements": [
+        "grid",
+        "utils"
+      ],
+      "Hash": "0080607b4a1a7b28979aecef976d8bc2"
+    },
+    "jquerylib": {
+      "Package": "jquerylib",
+      "Version": "0.1.4",
+      "Source": "Repository",
+      "Repository": "RSPM",
+      "Requirements": [
+        "htmltools"
+      ],
+      "Hash": "5aab57a3bd297eee1c1d862735972182"
+    },
+    "jsonlite": {
+      "Package": "jsonlite",
+      "Version": "1.8.9",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "methods"
+      ],
+      "Hash": "4e993b65c2c3ffbffce7bb3e2c6f832b"
+    },
+    "knitr": {
+      "Package": "knitr",
+      "Version": "1.49",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "evaluate",
+        "highr",
+        "methods",
+        "tools",
+        "xfun",
+        "yaml"
+      ],
+      "Hash": "9fcb189926d93c636dea94fbe4f44480"
+    },
+    "labeling": {
+      "Package": "labeling",
+      "Version": "0.4.3",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "graphics",
+        "stats"
+      ],
+      "Hash": "b64ec208ac5bc1852b285f665d6368b3"
+    },
+    "lattice": {
+      "Package": "lattice",
+      "Version": "0.22-6",
+      "Source": "Repository",
+      "Repository": "RSPM",
+      "Requirements": [
+        "R",
+        "grDevices",
+        "graphics",
+        "grid",
+        "stats",
+        "utils"
+      ],
+      "Hash": "cc5ac1ba4c238c7ca9fa6a87ca11a7e2"
+    },
+    "lifecycle": {
+      "Package": "lifecycle",
+      "Version": "1.0.4",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "cli",
+        "glue",
+        "rlang"
+      ],
+      "Hash": "b8552d117e1b808b09a832f589b79035"
+    },
+    "lubridate": {
+      "Package": "lubridate",
+      "Version": "1.9.3",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "generics",
+        "methods",
+        "timechange"
+      ],
+      "Hash": "680ad542fbcf801442c83a6ac5a2126c"
+    },
+    "magrittr": {
+      "Package": "magrittr",
+      "Version": "2.0.3",
+      "Source": "Repository",
+      "Repository": "RSPM",
+      "Requirements": [
+        "R"
+      ],
+      "Hash": "7ce2733a9826b3aeb1775d56fd305472"
+    },
+    "memoise": {
+      "Package": "memoise",
+      "Version": "2.0.1",
+      "Source": "Repository",
+      "Repository": "RSPM",
+      "Requirements": [
+        "cachem",
+        "rlang"
+      ],
+      "Hash": "e2817ccf4a065c5d9d7f2cfbe7c1d78c"
+    },
+    "mgcv": {
+      "Package": "mgcv",
+      "Version": "1.9-1",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "Matrix",
+        "R",
+        "graphics",
+        "methods",
+        "nlme",
+        "splines",
+        "stats",
+        "utils"
+      ],
+      "Hash": "110ee9d83b496279960e162ac97764ce"
+    },
+    "mime": {
+      "Package": "mime",
+      "Version": "0.12",
+      "Source": "Repository",
+      "Repository": "RSPM",
+      "Requirements": [
+        "tools"
+      ],
+      "Hash": "18e9c28c1d3ca1560ce30658b22ce104"
+    },
+    "modelr": {
+      "Package": "modelr",
+      "Version": "0.1.11",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "broom",
+        "magrittr",
+        "purrr",
+        "rlang",
+        "tibble",
+        "tidyr",
+        "tidyselect",
+        "vctrs"
+      ],
+      "Hash": "4f50122dc256b1b6996a4703fecea821"
+    },
+    "munsell": {
+      "Package": "munsell",
+      "Version": "0.5.1",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "colorspace",
+        "methods"
+      ],
+      "Hash": "4fd8900853b746af55b81fda99da7695"
+    },
+    "nlme": {
+      "Package": "nlme",
+      "Version": "3.1-166",
+      "Source": "Repository",
+      "Repository": "RSPM",
+      "Requirements": [
+        "R",
+        "graphics",
+        "lattice",
+        "stats",
+        "utils"
+      ],
+      "Hash": "ccbb8846be320b627e6aa2b4616a2ded"
+    },
+    "openssl": {
+      "Package": "openssl",
+      "Version": "2.2.2",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "askpass"
+      ],
+      "Hash": "d413e0fef796c9401a4419485f709ca1"
+    },
+    "palmerpenguins": {
+      "Package": "palmerpenguins",
+      "Version": "0.1.1",
+      "Source": "Repository",
+      "Repository": "RSPM",
+      "Requirements": [
+        "R"
+      ],
+      "Hash": "6c6861efbc13c1d543749e9c7be4a592"
+    },
+    "pillar": {
+      "Package": "pillar",
+      "Version": "1.9.0",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "cli",
+        "fansi",
+        "glue",
+        "lifecycle",
+        "rlang",
+        "utf8",
+        "utils",
+        "vctrs"
+      ],
+      "Hash": "15da5a8412f317beeee6175fbc76f4bb"
+    },
+    "pkgconfig": {
+      "Package": "pkgconfig",
+      "Version": "2.0.3",
+      "Source": "Repository",
+      "Repository": "RSPM",
+      "Requirements": [
+        "utils"
+      ],
+      "Hash": "01f28d4278f15c76cddbea05899c5d6f"
+    },
+    "prettyunits": {
+      "Package": "prettyunits",
+      "Version": "1.2.0",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R"
+      ],
+      "Hash": "6b01fc98b1e86c4f705ce9dcfd2f57c7"
+    },
+    "processx": {
+      "Package": "processx",
+      "Version": "3.8.4",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "R6",
+        "ps",
+        "utils"
+      ],
+      "Hash": "0c90a7d71988856bad2a2a45dd871bb9"
+    },
+    "progress": {
+      "Package": "progress",
+      "Version": "1.2.3",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "R6",
+        "crayon",
+        "hms",
+        "prettyunits"
+      ],
+      "Hash": "f4625e061cb2865f111b47ff163a5ca6"
+    },
+    "ps": {
+      "Package": "ps",
+      "Version": "1.8.1",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "utils"
+      ],
+      "Hash": "b4404b1de13758dea1c0484ad0d48563"
+    },
+    "purrr": {
+      "Package": "purrr",
+      "Version": "1.0.2",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "cli",
+        "lifecycle",
+        "magrittr",
+        "rlang",
+        "vctrs"
+      ],
+      "Hash": "1cba04a4e9414bdefc9dcaa99649a8dc"
+    },
+    "ragg": {
+      "Package": "ragg",
+      "Version": "1.3.3",
+      "Source": "Repository",
+      "Repository": "https://carpentries.r-universe.dev",
+      "RemoteType": "repository",
+      "RemoteUrl": "https://github.com/r-lib/ragg",
+      "RemoteRef": "v1.3.3",
+      "RemoteSha": "6f2279ae8cd0e0d7e9d0e1ede2b742666f9f1d49",
+      "Requirements": [
+        "systemfonts",
+        "textshaping"
+      ],
+      "Hash": "d7ccdaa1187d3b5d49714774af6a6f29"
+    },
+    "rappdirs": {
+      "Package": "rappdirs",
+      "Version": "0.3.3",
+      "Source": "Repository",
+      "Repository": "RSPM",
+      "Requirements": [
+        "R"
+      ],
+      "Hash": "5e3c5dc0b071b21fa128676560dbe94d"
+    },
+    "readr": {
+      "Package": "readr",
+      "Version": "2.1.5",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "R6",
+        "cli",
+        "clipr",
+        "cpp11",
+        "crayon",
+        "hms",
+        "lifecycle",
+        "methods",
+        "rlang",
+        "tibble",
+        "tzdb",
+        "utils",
+        "vroom"
+      ],
+      "Hash": "9de96463d2117f6ac49980577939dfb3"
+    },
+    "readxl": {
+      "Package": "readxl",
+      "Version": "1.4.3",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "cellranger",
+        "cpp11",
+        "progress",
+        "tibble",
+        "utils"
+      ],
+      "Hash": "8cf9c239b96df1bbb133b74aef77ad0a"
+    },
+    "rematch": {
+      "Package": "rematch",
+      "Version": "2.0.0",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Hash": "cbff1b666c6fa6d21202f07e2318d4f1"
+    },
+    "rematch2": {
+      "Package": "rematch2",
+      "Version": "2.1.2",
+      "Source": "Repository",
+      "Repository": "RSPM",
+      "Requirements": [
+        "tibble"
+      ],
+      "Hash": "76c9e04c712a05848ae7a23d2f170a40"
+    },
+    "renv": {
+      "Package": "renv",
+      "Version": "1.0.11",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "utils"
+      ],
+      "Hash": "47623f66b4e80b3b0587bc5d7b309888"
+    },
+    "reprex": {
+      "Package": "reprex",
+      "Version": "2.1.1",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "callr",
+        "cli",
+        "clipr",
+        "fs",
+        "glue",
+        "knitr",
+        "lifecycle",
+        "rlang",
+        "rmarkdown",
+        "rstudioapi",
+        "utils",
+        "withr"
+      ],
+      "Hash": "97b1d5361a24d9fb588db7afe3e5bcbf"
+    },
+    "rlang": {
+      "Package": "rlang",
+      "Version": "1.1.4",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "utils"
+      ],
+      "Hash": "3eec01f8b1dee337674b2e34ab1f9bc1"
+    },
+    "rmarkdown": {
+      "Package": "rmarkdown",
+      "Version": "2.29",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "bslib",
+        "evaluate",
+        "fontawesome",
+        "htmltools",
+        "jquerylib",
+        "jsonlite",
+        "knitr",
+        "methods",
+        "tinytex",
+        "tools",
+        "utils",
+        "xfun",
+        "yaml"
+      ],
+      "Hash": "df99277f63d01c34e95e3d2f06a79736"
+    },
+    "rstudioapi": {
+      "Package": "rstudioapi",
+      "Version": "0.17.1",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Hash": "5f90cd73946d706cfe26024294236113"
+    },
+    "rvest": {
+      "Package": "rvest",
+      "Version": "1.0.4",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "cli",
+        "glue",
+        "httr",
+        "lifecycle",
+        "magrittr",
+        "rlang",
+        "selectr",
+        "tibble",
+        "xml2"
+      ],
+      "Hash": "0bcf0c6f274e90ea314b812a6d19a519"
+    },
+    "sass": {
+      "Package": "sass",
+      "Version": "0.4.9",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R6",
+        "fs",
+        "htmltools",
+        "rappdirs",
+        "rlang"
+      ],
+      "Hash": "d53dbfddf695303ea4ad66f86e99b95d"
+    },
+    "scales": {
+      "Package": "scales",
+      "Version": "1.3.0",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "R6",
+        "RColorBrewer",
+        "cli",
+        "farver",
+        "glue",
+        "labeling",
+        "lifecycle",
+        "munsell",
+        "rlang",
+        "viridisLite"
+      ],
+      "Hash": "c19df082ba346b0ffa6f833e92de34d1"
+    },
+    "selectr": {
+      "Package": "selectr",
+      "Version": "0.4-2",
+      "Source": "Repository",
+      "Repository": "RSPM",
+      "Requirements": [
+        "R",
+        "R6",
+        "methods",
+        "stringr"
+      ],
+      "Hash": "3838071b66e0c566d55cc26bd6e27bf4"
+    },
+    "stringi": {
+      "Package": "stringi",
+      "Version": "1.8.4",
+      "Source": "Repository",
+      "Repository": "RSPM",
+      "Requirements": [
+        "R",
+        "stats",
+        "tools",
+        "utils"
+      ],
+      "Hash": "39e1144fd75428983dc3f63aa53dfa91"
+    },
+    "stringr": {
+      "Package": "stringr",
+      "Version": "1.5.1",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "cli",
+        "glue",
+        "lifecycle",
+        "magrittr",
+        "rlang",
+        "stringi",
+        "vctrs"
+      ],
+      "Hash": "960e2ae9e09656611e0b8214ad543207"
+    },
+    "sys": {
+      "Package": "sys",
+      "Version": "3.4.3",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Hash": "de342ebfebdbf40477d0758d05426646"
+    },
+    "systemfonts": {
+      "Package": "systemfonts",
+      "Version": "1.1.0",
+      "Source": "Repository",
+      "Repository": "RSPM",
+      "Requirements": [
+        "R",
+        "cpp11",
+        "lifecycle"
+      ],
+      "Hash": "213b6b8ed5afbf934843e6c3b090d418"
+    },
+    "textshaping": {
+      "Package": "textshaping",
+      "Version": "0.4.0",
+      "Source": "Repository",
+      "Repository": "RSPM",
+      "Requirements": [
+        "R",
+        "cpp11",
+        "lifecycle",
+        "systemfonts"
+      ],
+      "Hash": "5142f8bc78ed3d819d26461b641627ce"
+    },
+    "tibble": {
+      "Package": "tibble",
+      "Version": "3.2.1",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "fansi",
+        "lifecycle",
+        "magrittr",
+        "methods",
+        "pillar",
+        "pkgconfig",
+        "rlang",
+        "utils",
+        "vctrs"
+      ],
+      "Hash": "a84e2cc86d07289b3b6f5069df7a004c"
+    },
+    "tidyr": {
+      "Package": "tidyr",
+      "Version": "1.3.1",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "cli",
+        "cpp11",
+        "dplyr",
+        "glue",
+        "lifecycle",
+        "magrittr",
+        "purrr",
+        "rlang",
+        "stringr",
+        "tibble",
+        "tidyselect",
+        "utils",
+        "vctrs"
+      ],
+      "Hash": "915fb7ce036c22a6a33b5a8adb712eb1"
+    },
+    "tidyselect": {
+      "Package": "tidyselect",
+      "Version": "1.2.1",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "cli",
+        "glue",
+        "lifecycle",
+        "rlang",
+        "vctrs",
+        "withr"
+      ],
+      "Hash": "829f27b9c4919c16b593794a6344d6c0"
+    },
+    "tidyverse": {
+      "Package": "tidyverse",
+      "Version": "2.0.0",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "broom",
+        "cli",
+        "conflicted",
+        "dbplyr",
+        "dplyr",
+        "dtplyr",
+        "forcats",
+        "ggplot2",
+        "googledrive",
+        "googlesheets4",
+        "haven",
+        "hms",
+        "httr",
+        "jsonlite",
+        "lubridate",
+        "magrittr",
+        "modelr",
+        "pillar",
+        "purrr",
+        "ragg",
+        "readr",
+        "readxl",
+        "reprex",
+        "rlang",
+        "rstudioapi",
+        "rvest",
+        "stringr",
+        "tibble",
+        "tidyr",
+        "xml2"
+      ],
+      "Hash": "c328568cd14ea89a83bd4ca7f54ae07e"
+    },
+    "timechange": {
+      "Package": "timechange",
+      "Version": "0.3.0",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "cpp11"
+      ],
+      "Hash": "c5f3c201b931cd6474d17d8700ccb1c8"
+    },
+    "tinytex": {
+      "Package": "tinytex",
+      "Version": "0.54",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "xfun"
+      ],
+      "Hash": "3ec7e3ddcacc2d34a9046941222bf94d"
+    },
+    "tzdb": {
+      "Package": "tzdb",
+      "Version": "0.4.0",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "cpp11"
+      ],
+      "Hash": "f561504ec2897f4d46f0c7657e488ae1"
+    },
+    "utf8": {
+      "Package": "utf8",
+      "Version": "1.2.4",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R"
+      ],
+      "Hash": "62b65c52671e6665f803ff02954446e9"
+    },
+    "uuid": {
+      "Package": "uuid",
+      "Version": "1.2-1",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R"
+      ],
+      "Hash": "34e965e62a41fcafb1ca60e9b142085b"
+    },
+    "vctrs": {
+      "Package": "vctrs",
+      "Version": "0.6.5",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "cli",
+        "glue",
+        "lifecycle",
+        "rlang"
+      ],
+      "Hash": "c03fa420630029418f7e6da3667aac4a"
+    },
+    "viridisLite": {
+      "Package": "viridisLite",
+      "Version": "0.4.2",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R"
+      ],
+      "Hash": "c826c7c4241b6fc89ff55aaea3fa7491"
+    },
+    "vroom": {
+      "Package": "vroom",
+      "Version": "1.6.5",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "bit64",
+        "cli",
+        "cpp11",
+        "crayon",
+        "glue",
+        "hms",
+        "lifecycle",
+        "methods",
+        "progress",
+        "rlang",
+        "stats",
+        "tibble",
+        "tidyselect",
+        "tzdb",
+        "vctrs",
+        "withr"
+      ],
+      "Hash": "390f9315bc0025be03012054103d227c"
+    },
+    "withr": {
+      "Package": "withr",
+      "Version": "3.0.2",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "grDevices",
+        "graphics"
+      ],
+      "Hash": "cc2d62c76458d425210d1eb1478b30b4"
+    },
+    "xfun": {
+      "Package": "xfun",
+      "Version": "0.49",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Requirements": [
+        "R",
+        "grDevices",
+        "stats",
+        "tools"
+      ],
+      "Hash": "8687398773806cfff9401a2feca96298"
+    },
+    "xml2": {
+      "Package": "xml2",
+      "Version": "1.3.6",
+      "Source": "Repository",
+      "Repository": "RSPM",
+      "Requirements": [
+        "R",
+        "cli",
+        "methods",
+        "rlang"
+      ],
+      "Hash": "1d0336142f4cd25d8d23cd3ba7a8fb61"
+    },
+    "yaml": {
+      "Package": "yaml",
+      "Version": "2.3.10",
+      "Source": "Repository",
+      "Repository": "CRAN",
+      "Hash": "51dab85c6c98e50a18d7551e9d49f76c"
+    }
+  }
+}
diff --git a/setup.md b/setup.md
new file mode 100644
index 0000000..26a3eff
--- /dev/null
+++ b/setup.md
@@ -0,0 +1,77 @@
+---
+title: Setup
+---
+
+
+## Software Setup
+
+:::::::::::::::: solution
+
+### Windows
+
+You can watch the [YouTube video tutorial](https://www.youtube.com/watch?v=q0PjTAylwoU) for complete instructions.
+
+- [ ] Install R by downloading and running [this .exe file](http://cran.r-project.org/bin/windows/base/release.htm) from [CRAN](http://cran.r-project.org/index.html).
+- [ ] Please also [install Rtools](https://cran.r-project.org/bin/windows/Rtools/rtools42/rtools.html).
+  - Note that if you have separate user and admin accounts, you should run the installers as administrator (right-click on the `.exe` file and select "Run as administrator" instead of double-clicking). Otherwise, problems may occur later, for example when installing R packages.
+
+:::::::::::::::::::::::::
+
+:::::::::::::::: solution
+
+### MacOS
+
+Follow the [video tutorial](https://www.youtube.com/watch?v=5-ly3kyxwEg) for detailed instructions.
+
+- [ ] Install R by downloading and running [this .pkg file](http://cran.r-project.org/bin/macosx/R-latest.pkg) from [CRAN](http://cran.r-project.org/index.html)
+
+:::::::::::::::::::::::::
+
+
+:::::::::::::::: solution
+
+### Linux
+
+- [ ] You can download the binary files for your distribution
+        from [CRAN](http://cran.r-project.org/index.html).
+
+**Or**
+
+- [ ] you can use your package manager, e.g. for Debian/Ubuntu run
+
+```
+sudo apt-get install r-base
+```
+
+- [ ] for Fedora run
+```
+sudo dnf install R
+```
+
+:::::::::::::::::::::::::
+
+
+
+**There are some extra things to install for all operating systems.**
+<br><br>
+
+::::::::::::::::::::::::: solution
+
+### RStudio
+
+Please install the [RStudio IDE](http://www.rstudio.com/ide/download/desktop). 
+It provides the user interface to R and is required for this workshop.
+
+::::::::::::::::::::::::: 
+
+::::::::::::::::::::::::: solution
+
+### R packages 
+Lastly, you will need to install two packages to join the workshop, namely the {tidyverse} and {palmerpenguins} packages.
+You can do this by opening RStudio, and in the panel labelled "console" (usually in the bottom left corner), type the following:
+
+```r
+install.packages(c("tidyverse", "palmerpenguins"))
+```
+
+:::::::::::::::::::::::::