diff --git a/01-project-introduction.md b/01-project-introduction.md
new file mode 100644
index 0000000..556df93
--- /dev/null
+++ b/01-project-introduction.md
@@ -0,0 +1,276 @@
+---
+title: "Introduction to R and RStudio"
+teaching: 45
+exercises: 4
+---
+
+
+
+::::::::::::::::::::::::::::::::::::: questions
+
+- How to find your way around RStudio?
+- How to interact with R?
+- How to organise your project files?
+- How to install packages?
+
+:::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::: objectives
+
+- Describe the purpose and use of each pane in the RStudio IDE
+- Locate buttons and options in the RStudio IDE
+- Create and use R-projects
+- How to organise and access project files
+
+:::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::: keypoints
+- Use RStudio to write and run R programs.
+- Create and start an R-project
+- Use `install.packages()` to install packages (libraries).
+- Use the `here` package to access project files
+:::::::::::::::::::::::::::::::::::::
+
+
+## Motivation
+
+Working with data can often be challenging.
+Data are rarely in the shape and format that is most convenient for your end product.
+Everyone working with data knows that there is a lot of work that goes into altering the data to make sure you can explore and highlight the interesting aspects of it.
+In this lesson, we will use the dataset from the
+[palmerpenguins](https://allisonhorst.github.io/palmerpenguins/) R-package, which contains observational data on arctic penguins.
+Data were collected and made available by [Dr. Kristen Gorman](https://www.uaf.edu/cfos/people/faculty/detail/kristen-gorman.php) and the [Palmer Station, Antarctica LTER](https://pal.lternet.edu/), a member of the [Long Term Ecological Research Network](https://lternet.edu/).
+This lesson focuses on using the [tidyverse](https://www.tidyverse.org/) packages, a opinionated collection of packages that are tailored to the needs of data scientists.
+Can you organise your project in an orderly fashion and access all the files?
+Can you navigate a dataset in R?
+Can you add columns and change column names?
+Can you efficiently summarise the data?
+Can you create visualizations to show key aspects of the data?
+At the end of this lesson, you should be able to do al these things!
+
+## Before Starting The Workshop
+
+Please ensure you have the latest version of R and RStudio installed on your machine.
+This is important, as some packages used in the workshop may not install correctly (or at all) if R is not up to date.
+
+* [Download and install the latest version of R here](https://www.r-project.org/)
+* [Download and install RStudio here](https://www.rstudio.com/)
+* If you are on a windows computer, also download and install [RTools](https://cran.r-project.org/bin/windows/Rtools/)
+
+## Introduction to RStudio
+
+Welcome to the R portion of the Software Carpentry workshop.
+
+Throughout this lesson, we're going to teach you some of the best-practice ways of working with data and projects using the tidyverse framework for R.
+
+We'll be using RStudio: a free, open source R Integrated Development Environment (IDE).
+It provides a built in editor, works on all platforms (including on servers) and provides many advantages such as integration with version control and project management.
+
+
+**Basic layout**
+
+When you first open RStudio, you will be greeted by three panels:
+
+ * The interactive R console/Terminal (entire left)
+ * Environment/History/Connections (tabbed in upper right)
+ * Files/Plots/Packages/Help/Viewer (tabbed in lower right)
+
+
+
+
+Once you open files, such as R scripts, an editor panel will also open
+in the top left.
+
+
+
+## Work flow within RStudio
+There are two main ways one can work within RStudio:
+
+1. Test and play within the interactive R console then copy code into
+a .R file to run later.
+ * This works well when doing small tests and initially starting off.
+ * It quickly becomes laborious
+2. Start writing in a .R file and use RStudio's short cut keys for the Run command to push the current line, selected lines or modified lines to the
+interactive R console.
+ * This is a great way to start; all your code is saved for later
+ * You will be able to run the file you create from within RStudio
+ or using R's `source()` function.
+
+::::::::::::::::::::::::::::::::::::: callout
+
+## Tip: Running segments of your code
+
+RStudio offers you great flexibility in running code from within the editor window.
+There are buttons, menu choices, and keyboard shortcuts.
+To run the current line, you can:
+1. click on the `Run` button above the editor panel, or
+2. select "Run Lines" from the "Code" menu, or
+3. hit `ctrl`+`return` in Windows or Linux
+or `cmd`+ `return` on OS X.
+(This shortcut can also be seen by hovering
+the mouse over the button). To run a block of code, select it and then `Run`.
+
+:::::::::::::::::::::::::::::::::::::
+
+## Introduction to R
+
+Much of your time in R will be spent in the R interactive console.
+This is where you will run all of your code, and can be a useful environment to try out ideas before adding them to an R script file.
+This console in RStudio is the same as the one you would get if you typed in `R` in your command-line environment.
+
+The first thing you will see in the R interactive session is a bunch of information, followed by a ">" and a blinking cursor.
+In many ways this is similar to the shell environment you learned about during the shell lessons: it operates on the same idea of a "Read, evaluate,
+print loop": you type in commands, R tries to execute them, and then
+returns a result.
+
+
+## Using R-projects
+
+Any data analysis process is naturally incremental, and many projects
+start life as random notes, some code, then a manuscript, and
+eventually everything is a bit mixed together.
+
+
+
+
+Most people tend to organize their projects like this:
+
+
+
+
+There are many reasons why we should *ALWAYS* avoid this:
+
+1. It is really hard to tell which version of your data is
+the original and which is the modified;
+2. It gets really messy because it mixes files with various
+extensions together;
+3. It probably takes you a lot of time to actually find
+things, and relate the correct figures to the exact code
+that has been used to generate it;
+
+A good project layout will ultimately make your life easier:
+
+* It will help ensure the integrity of your data;
+* It makes it simpler to share your code with someone else
+(a lab-mate, collaborator, or supervisor);
+* It allows you to easily upload your code with your manuscript submission;
+* It makes it easier to pick the project back up after a break.
+
+## A possible solution
+
+Fortunately, there are tools and packages which can help you manage your work effectively.
+
+One of the most powerful and useful aspects of RStudio is its project management functionality.
+We'll be using this today to create a self-contained, reproducible
+project.
+
+
+::::::::::::::::::::::::::::::::::::: challenge
+
+## Challenge 1: Creating a self-contained project
+
+We're going to create a new project in RStudio:
+
+1. Click the "File" menu button, then "New Project".
+2. Click "New Directory".
+3. Click "New Project".
+4. Type in the name of the directory to store your project, e.g. "my_project".
+5. If available, select the checkbox for "Create a git repository."
+6. Click the "Create Project" button.
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+The simplest way to open an RStudio project once it has been created is to click through your file system to get to the directory where it was saved and double click on the `.Rproj` file.
+This will open RStudio and start your R session in the same directory as the `.Rproj` file.
+All your data, plots and scripts will now be relative to the project directory. RStudio projects have the added benefit of allowing you to open multiple projects at the same time each open to its own project directory.
+This allows you to keep multiple projects open without them interfering with each other.
+
+::::::::::::::::::::::::::::::::::::: challenge
+
+## Challenge 2: Opening an RStudio project through the file system
+
+1. Exit RStudio.
+2. Navigate to the directory where you created a project in Challenge 1.
+3. Double click on the `.Rproj` file in that directory.
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+## Best practices for project organization
+
+Although there is no "best" way to lay out a project, there are some general principles to adhere to that will make project management easier:
+
+### Treat data as read only
+
+This is probably the most important goal of setting up a project. Data is
+typically time consuming and/or expensive to collect.
+Working with them interactively (e.g., in Excel) where they can be modified means you are never sure of where the data came from, or how it has been modified since collection.
+It is therefore a good idea to treat your data as "read-only".
+
+### Data Cleaning
+
+In many cases your data will be "dirty": it will need significant preprocessing to get into a format R (or any other programming language) will find useful.
+This task is sometimes called "data munging".
+Storing these scripts in a separate folder, and creating a second "read-only" data folder to hold the "cleaned" data sets can prevent confusion between the two sets.
+
+### Treat generated output as disposable
+
+Anything generated by your scripts should be treated as disposable: it should all be able to be regenerated from your scripts.
+
+There are lots of different ways to manage this output. Having an output folder with different sub-directories for each separate analysis makes it easier later.
+Since many analyses are exploratory and don't end up being used in the final project, and some of the analyses get shared between projects.
+
+::::::::::::::::::::::::::::::::::::: callout
+
+## Tip: Good Enough Practices for Scientific Computing
+
+[Good Enough Practices for Scientific computing](https://github.com/swcarpentry/good-enough-practices-in-scientific-computing/blob/gh-pages/good-enough-practices-for-scientific-computing.pdf) gives the following recommendations for project organization:
+
+1. Put each project in its own directory, which is named after the project.
+2. Put text documents associated with the project in the `doc` directory.
+3. Put raw data and metadata in the `data` directory, and files generated during cleanup and analysis in a `results` directory.
+4. Put source for the project's scripts and programs in the `src` directory, and programs brought in from elsewhere or compiled locally in the `bin` directory.
+5. Name all files to reflect their content or function.
+
+:::::::::::::::::::::::::::::::::::::
+
+
+### Separate function definition and application
+
+One of the more effective ways to work with R is to start by writing the code you want to run directly in a .R script, and then running the selected lines (either using the keyboard shortcuts in RStudio or clicking the "Run" button) in the interactive R console.
+
+When your project is in its early stages, the initial `.R` script file usually contains many lines of directly executed code.
+Make sure to comment your code, so you know the intention of each bit, and once you have a clearer idea of what you want, tidy up your script so it only contains what is important.
+
+::::::::::::::::::::::::::::::::::::: challenge
+
+## Challenge 3
+Set up your project folders. For this workshop we will need folders for data, results and scripts.
+
+1. In the bottom right pane of RStudio, click on "Files".
+2. Click on "New folder" and create a folder named `data`
+3. Repeat to create `results` and `scripts`
+:::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::: challenge
+
+## Challenge 4
+Download the palmer penguins data and place it in your `data` folder, calling it `penguins.csv`
+
+1. Go to [the raw palmer penguins data](https://raw.githubusercontent.com/allisonhorst/palmerpenguins/master/inst/extdata/penguins.csv)
+2. Right click in the browser window
+3. Choose "save as..."
+4. Navigate to your project's data folder
+5. Save the file to this location
+:::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::: callout
+
+## Tip: command line in RStudio
+
+The Terminal tab in the console pane provides a convenient place directly
+within RStudio to interact directly with the command line.
+:::::::::::::::::::::::::::::::::::::
+
+### Version Control
+
+It is important to use version control with projects.
+Go [here for a good lesson which describes using Git with RStudio](https://swcarpentry.github.io/git-novice/14-supplemental-rstudio/).
+
diff --git a/02-data-visualisation.md b/02-data-visualisation.md
new file mode 100644
index 0000000..93902a3
--- /dev/null
+++ b/02-data-visualisation.md
@@ -0,0 +1,574 @@
+---
+title: "Visualisation with ggplot2"
+teaching: 60
+exercises: 8
+---
+
+:::::::::::::::::::::::::::::::::::::: questions
+
+- How do I access my data in R?
+- How do I visualise my data with ggplot2?
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::: objectives
+
+- Read data into R
+- To be able to use `ggplot2` to generate publication quality graphics.
+- To understand the basic grammar of graphics, including the aesthetics and geometry layers, adding statistics, transforming scales, and colouring or panelling by groups.
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::: keypoints
+
+- Read data into R
+- Use ggplot2 to create different types of plots
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+
+
+
+## Motivation
+
+Plotting the data is one of the best ways to quickly explore it and generate hypotheses about various relationships between variables.
+
+There are several plotting systems in R, but today we will focus on `ggplot2` which implements **grammar of graphics** - a coherent system for describing components that constitute visual representation of data.
+For more information regarding principles and thinking behind `ggplot2` graphic system, please refer to [Layered grammar of graphics](https://vita.had.co.nz/papers/layered-grammar.pdf) by Hadley Wickham (@hadleywickham).
+
+The advantage of `ggplot2` is that it allows R users to create publication quality graphics with a few lines of code. `ggplot2` has a large user base and is constantly developed and extended by the community.
+
+## Getting data into R
+We will start by reading the data into R, from the `data` folder you placed them in the last part of the introduction.
+
+
+``` r
+penguins <- read.csv("data/penguins.csv")
+```
+
+This is our first bit of R code to "assign" data to an object in our "R environment".
+The R environment can be seen in the upper right hand corner, and it lists everything R has access to at the moment.
+You should see an object called "penguins", which is a Dataset with 344 observations and 8 variables.
+We created this object with the line of code we just ran.
+You can "read" the line, right to left as:
+"read the penguins.csv into R, and assign (<-) it to an object called penguins".
+The arrow, or assignment, is R's way of creating new objects to work on.
+
+**Note** a key difference from R and programs like SPSS or excel, is that when data is used in R, we do not automatically alter the data in the file we read it from. Everything we do with the penguins data in R from now on, only happens in R, and does not change the originating file. This way we cannot easily accidentally alter our raw data, which is a very good thing.
+
+::::::::::::::::::::::::::::::::::::: keypoints
+## Tip: We can inspect the data in several ways
+
+1. Click the data name in the Environment, and the data opens as a tab in the scripts pane.
+2. Click the little arrow next to the data name in the Evironment, and you'll see a short preview of the data.
+3. Type `penguins` in the R console, and a preview will be shown of the data.
+:::::::::::::::::::::::::::::::::::::
+
+The dataset contains the following fields:
+
+- **species**: penguin species
+- **island**: island of observation
+- **bill_length_mm**: bill length in millimetres
+- **bill_depth_mm**: bill depth in millimetres
+- **flipper_length_mm**: flipper length in millimetres
+- **body_mass_g**: body mass in grams
+- **sex**: penguin sex
+- **year**: year of observation
+
+
+## Introduction to ggplot2
+
+`ggplot2` is a core member of `tidyverse` family of packages. Installing and loading the package under the same name will load all of the packages we will need for this workshop. Lets get started!
+
+
+``` r
+# install.packages("tidyverse")
+library(tidyverse)
+── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
+✔ dplyr 1.1.4 ✔ readr 2.1.5
+✔ forcats 1.0.0 ✔ stringr 1.5.1
+✔ ggplot2 3.5.1 ✔ tibble 3.2.1
+✔ lubridate 1.9.3 ✔ tidyr 1.3.1
+✔ purrr 1.0.2
+── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
+✖ dplyr::filter() masks stats::filter()
+✖ dplyr::lag() masks stats::lag()
+ℹ Use the conflicted package () to force all conflicts to become errors
+```
+
+Here's a question that we would like to answer using `penguins` data: _Do penguins with deep beaks also have long beaks?_ This might seem like a silly question, but it gets us exploring our data.
+
+To plot `penguins`, run the following code in the R-chunk or in console. The following code will put `bill_depth_mm` on the x-axis and `bill_length_mm` on the y-axis:
+
+
+``` r
+ggplot(data = penguins) +
+ geom_point(
+ mapping = aes(x = bill_depth_mm,
+ y = bill_length_mm)
+ )
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).
+```
+
+
+
+Note that we split the function into several lines.
+In R, any function has a name and is followed by parentheses. Inside the parentheses we place any information the function needs to run.
+Here, we are using two main functions, `ggplot()` and `geom_point()`.
+To save screen space, we have placed each function on its own line, and also split up arguments into several lines.
+How this is done depends on you, there are no real rules for this.
+We will use the tidyverse coding style throughout this course, to be consistent and also save space on the screen.
+The plus sign indicates that the ggplot is not over yet and that the next line should be interpreted as additional layer to the preceding `ggplot()` function. In other words, when writing a `ggplot()` function spanning several lines, the `+` sign goes at the end of the line, not in the beginning.
+
+**Note** that in order to create a plot using `ggplot2` system, you should start your command with `ggplot()` function. It creates an empty coordinate system and initializes the dataset to be used in the graph (which is supplied as a first argument into the `ggplot()` function). In order to create graphical representation of the data, we can add one or more layers to our otherwise empty graph. Functions starting with the prefix `geom_` create a visual representation of data. In this case we added scattered points, using `geom_point()` function. There are many `geoms` in `ggplot2`, some of which we will learn in this lesson.
+
+`geom_` functions create _mapping_ of variables from the earlier defined dataset to certain aesthetic elements of the graph, such as axis, shapes or colours. The first argument of any `geom_` function expects the user to specify these mappings, wrapped in the `aes()` (short for _aesthetics_) function. In this case, we mapped `bill_depth_mm` and `bill_length_mm` variables from `penguins` dataset to x and y-axis, respectively (using `x` and `y` arguments of `aes()` function).
+
+::::::::::::::::::::::::::::::::::::: challenge
+## Challenge 1a
+How has bill length changed over time? What do you observe?
+
+:::::::::::::::::::::::::::::::::::::::: hint
+
+The* `penguins` *dataset has a column called `year`, which should appear on the x-axis.
+::::::::::::::::::::::::::::::::::::::::
+
+:::::::::::::::::::::::::::::::::::::::: solution
+## Solution
+
+
+``` r
+ggplot(data = penguins) +
+ geom_point(
+ mapping = aes(x = year,
+ y = bill_length_mm)
+ )
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).
+```
+
+
+::::::::::::::::::::::::::::::::::::::::
+:::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::: challenge
+## Challenge 1b
+Try a different `geom_` function called `geom_jitter`. How is that different from `geom_point`?
+
+:::::::::::::::::::::::::::::::::::::::: solution
+## Solution
+
+
+``` r
+ggplot(data = penguins) +
+ geom_jitter(
+ mapping = aes(x = year,
+ y = bill_length_mm)
+ )
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).
+```
+
+
+::::::::::::::::::::::::::::::::::::::::
+:::::::::::::::::::::::::::::::::::::
+
+
+## Mapping data
+
+What if we want to combine graphs from the previous two challenges and show the relationship between three variables in the same graph? Turns out, we don't necessarily need to use third geometrical dimension, we can employ colour.
+
+The following graph maps `island` variable from `penguins` dataset to the `colour` aesthetic of the plot. Let's take a look:
+
+
+``` r
+ggplot(data = penguins) +
+ geom_jitter(
+ mapping = aes(x = bill_depth_mm,
+ y = bill_length_mm,
+ colour = island)
+ )
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).
+```
+
+
+
+::::::::::::::::::::::::::::::::::::: challenge
+## Challenge 2
+What will happen if you switch colour to also be by year? Is the graph still useful? Why or why not? What is the difference in the plot between when you colour by island and when you colour by year?
+
+:::::::::::::::::::::::::::::::::::::::: solution
+## Solution
+
+
+``` r
+ggplot(data = penguins) +
+ geom_jitter(
+ mapping = aes(x = bill_depth_mm,
+ y = bill_length_mm,
+ colour = year)
+ )
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).
+```
+
+
+
+Island is categorical character variable with a discrete range of possible values. This, like the data type of factor, is represented with colours by assigning a specific colour to each member of the discrete set. `year` is a continuous numeric variable in which any number of potential values can exist between known values. To represent this, R uses a colour bar with a continuous gradient.
+::::::::::::::::::::::::::::::::::::::::
+:::::::::::::::::::::::::::::::::::::
+
+
+There are other aesthetics that can come handy. One of them is `size`. The idea is that we can vary the size of data points to illustrate another continuous variable, such as species bill depth. Lets look at four dimensions at once!
+
+
+``` r
+ggplot(data = penguins) +
+ geom_jitter(
+ mapping = aes(x = bill_depth_mm,
+ y = bill_length_mm,
+ colour = species,
+ size = year)
+ )
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).
+```
+
+
+
+It might be even better to try another type of aesthetic, like shape, for categorical data like species.
+
+
+``` r
+ggplot(data = penguins) +
+ geom_jitter(
+ mapping = aes(x = bill_depth_mm,
+ y = bill_length_mm,
+ colour = species,
+ shape = species)
+ )
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).
+```
+
+
+
+Playing around with different aesthetic mappings until you find something that really makes the data "pop" is a good idea. A plot is rarely made nice on the first try, we all try different configurations until we find the one we like.
+
+# Setting values
+
+Until now, we explored different aesthetic properties of a graph mapped to certain variables. What if you want to recolour or use a certain shape to plot all data points? Well, that means that such colour or shape will no longer be *mapped* to any data, so you need to supply it to `geom_` function as a separate argument (outside of the `mapping`).
+This is called "setting" in the ggplot2-world. We "map" aesthetics to data columns, or we "set" single values outside aesthetics to apply to the entire geom or plot.
+Here's our initial graph with all colours coloured in blue.
+
+
+``` r
+ggplot(data = penguins) +
+ geom_point(
+ mapping = aes(x = bill_depth_mm,
+ y = bill_length_mm),
+ colour = "blue"
+ )
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).
+```
+
+
+
+Once more, observe that the colour is now not mapped to any particular variable from the `penguins` dataset and applies equally to all data points, therefore it is outside the `mapping` argument and is not wrapped into `aes()` function. Note that set colours are supplied as characters (in quotes).
+
+::::::::::::::::::::::::::::::::::::: challenge
+## Challenge 3
+Change the transparency (alpha) of the data points by year.
+
+:::::::::::::::::::::::::::::::::::::::: hint
+`alpha` takes a value from 0 (transparent) to 1 (solid).
+::::::::::::::::::::::::::::::::::::::::
+
+:::::::::::::::::::::::::::::::::::::::: solution
+## Solution
+
+
+``` r
+ggplot(data = penguins) +
+ geom_point(
+ mapping = aes(x = bill_depth_mm,
+ y = bill_length_mm,
+ alpha = year)
+ )
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).
+```
+
+
+
+::::::::::::::::::::::::::::::::::::::::
+:::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::: challenge
+## Challenge 4
+Move the transparency outside the `aes()` and set it to `0.5`. What can we benefit of each one of these methods?
+
+:::::::::::::::::::::::::::::::::::::::: solution
+## Solution
+
+
+``` r
+ggplot(data = penguins) +
+ geom_point(
+ mapping = aes(x = bill_depth_mm,
+ y = bill_length_mm),
+ alpha = 0.5)
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).
+```
+
+
+Controlling the transparency can be a great way to "mute" the visual effect of certain data, while still keeping it visible. Its a great tool when you have many data points or if you have several geoms together, like we will see soon.
+::::::::::::::::::::::::::::::::::::::::
+:::::::::::::::::::::::::::::::::::::
+
+
+
+# Geometrical objects
+
+Next, we will consider different options for `geoms`. Using different `geom_` functions user can highlight different aspects of data.
+
+A useful geom function is `geom_boxplot()`. It adds a layer with the "box and whiskers" plot illustrating the distribution of values within categories. The following chart breaks down bill length by island, where the box represents first and third quartile (the 25th and 75th percentiles), the middle bar signifies the median value and the whiskers extent to cover 95% confidence interval. Outliers (outside of the 95% confidence interval range) are shown separately.
+
+
+``` r
+ggplot(data = penguins) +
+ geom_boxplot(
+ mapping = aes(x = species,
+ y = bill_length_mm)
+ )
+Warning: Removed 2 rows containing non-finite outside the scale range
+(`stat_boxplot()`).
+```
+
+
+
+Layers can be added on top of each other. In the following graph we will place the boxplots **over** jittered points to see the distribution of outliers more clearly. We can map two aesthetic properties to the same variable. Here we will also use different colour for each island.
+
+
+``` r
+ggplot(data = penguins) +
+ geom_jitter(
+ mapping = aes(x = species,
+ y = bill_length_mm,
+ colour = species)
+ ) +
+ geom_boxplot(
+ mapping = aes(x = species,
+ y = bill_length_mm)
+ )
+Warning: Removed 2 rows containing non-finite outside the scale range
+(`stat_boxplot()`).
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).
+```
+
+
+
+Now, this was slightly inefficient due to duplication of code - we had to specify the same mappings for two layers. To avoid it, you can move common arguments of `geom_` functions to the main `ggplot()` function. In this case every layer will "inherit" the same arguments, specified in the "parent" function.
+
+
+``` r
+ggplot(data = penguins,
+ mapping = aes(x = island,
+ y = bill_length_mm)
+) +
+ geom_jitter(aes(colour = island)) +
+ geom_boxplot(alpha = .6)
+Warning: Removed 2 rows containing non-finite outside the scale range
+(`stat_boxplot()`).
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).
+```
+
+
+
+You can still add layer-specific mappings or other arguments by specifying them within individual geoms. Here, we've set the transparency of the boxplot to .6, so we can see the points behind it, and also mapped colour to island in the points. We would recommend building each layer separately and then moving common arguments up to the "parent" function.
+
+We can use linear models to highlight differences in dependency between bill length and body mass by island. Notice that we added a separate argument to the `geom_smooth()` function to specify the type of model we want `ggplot2` to built using the data (linear model). The `geom_smooth()` function has also helpfully provided confidence intervals, indicating "goodness of fit" for each model (shaded gray area). For more information on statistical models, please refer to help (by typing `?geom_smooth`)
+
+
+``` r
+ggplot(data = penguins,
+ mapping = aes(x = bill_depth_mm,
+ y = bill_length_mm)
+) +
+ geom_point(alpha = 0.5) +
+ geom_smooth(method = "lm")
+`geom_smooth()` using formula = 'y ~ x'
+Warning: Removed 2 rows containing non-finite outside the scale range
+(`stat_smooth()`).
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).
+```
+
+
+
+::::::::::::::::::::::::::::::::::::: challenge
+
+## Challenge 5
+Modify the plot so the the points are coloured by island, but there is a single regression line.
+
+:::::::::::::::::::::::::::::::::::::::: solution
+
+## Solution
+
+
+``` r
+ggplot(data = penguins,
+ mapping = aes(x = bill_depth_mm,
+ y = bill_length_mm)) +
+ geom_point(mapping = aes(colour = species),
+ alpha = 0.5) +
+ geom_smooth(method = "lm")
+`geom_smooth()` using formula = 'y ~ x'
+Warning: Removed 2 rows containing non-finite outside the scale range
+(`stat_smooth()`).
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).
+```
+
+
+In the graph above, each geom inherited all three mappings: x, y and colour. If we want only single linear model to be built, we would need to limit the effect of `colour` aesthetic to only `geom_point()` function, by moving it from the "parent" function to the layer where we want it to apply. Note, though, that because we want the `colour` to be still mapped to the `island` variable, it needs to be wrapped into `aes()` function and supplied to `mapping` argument.
+::::::::::::::::::::::::::::::::::::::::
+:::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::: challenge
+## Challenge 6
+Add a regression line to the plot that plots one line for each species, while also plotting one across all species.
+
+:::::::::::::::::::::::::::::::::::::::: hint
+Add another geom!
+::::::::::::::::::::::::::::::::::::::::
+
+:::::::::::::::::::::::::::::::::::::::: solution
+
+## Solution
+
+
+``` r
+ggplot(penguins,
+ aes(x = bill_depth_mm,
+ y = bill_length_mm)) +
+ geom_point(aes(colour = species),
+ alpha = 0.5) +
+ geom_smooth(method = "lm",
+ aes(colour = species)) +
+ geom_smooth(method = "lm",
+ colour = "black")
+`geom_smooth()` using formula = 'y ~ x'
+Warning: Removed 2 rows containing non-finite outside the scale range
+(`stat_smooth()`).
+`geom_smooth()` using formula = 'y ~ x'
+Warning: Removed 2 rows containing non-finite outside the scale range
+(`stat_smooth()`).
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).
+```
+
+
+Look at that! The data actually reveals something called the "simpsons paradox". It's when a relationship looks to go in a specific direction, but when looking into groups within the data the relationship is the opposite. Here, the overall relationship between bill length and depths looks negative, but when we take into account that there are different species, the relationship is actually positive.
+
+::::::::::::::::::::::::::::::::::::::::
+:::::::::::::::::::::::::::::::::::::
+
+
+## Sub-plots (plot panels)
+
+The last thing we will cover for plots is creating sub-plots.
+Often, we'd like to create the same set of plots, but as distinctly different subplots.
+This way, we dont need to map soo many aesthetics (it can end up being really messy).
+
+Lets say, the last plot we made, we want to understand if there are also differences between male and female penguins.
+In ggplot2, this is called a "facet", and the function we use is called either `facet_wrap` or `facet_grid`.
+
+
+``` r
+ggplot(penguins,
+ aes(x = bill_depth_mm,
+ y = bill_length_mm,
+ colour = species)) +
+ geom_point(alpha = 0.5) +
+ geom_smooth(method = "lm") +
+ facet_wrap(~ sex)
+`geom_smooth()` using formula = 'y ~ x'
+Warning: Removed 2 rows containing non-finite outside the scale range
+(`stat_smooth()`).
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).
+```
+
+
+
+The facet's take formula arguments, meaning they contain the `tilde` (~).
+The way often we think about it, trying to "read" the code, is that we facet "over" sex (in this case).
+
+This plot looks a little crazy though, as we have penguins with missing sex information getting their own panel, and really, it makes more sense to compare the sexes within each species rather than the other way around.
+Let us swap the places of species and sex.
+
+
+``` r
+ggplot(penguins,
+ aes(x = bill_depth_mm,
+ y = bill_length_mm,
+ colour = sex)) +
+ geom_point(alpha = 0.5) +
+ geom_smooth(method = "lm") +
+ facet_wrap(~ species)
+`geom_smooth()` using formula = 'y ~ x'
+Warning: Removed 2 rows containing non-finite outside the scale range
+(`stat_smooth()`).
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).
+```
+
+
+
+The NA's still look weird, but its definitely better, I think.
+
+::::::::::::::::::::::::::::::::::::: challenge
+
+## Challenge 7
+To the plot we just made before, try adding another variable to facet by. For instance, facet by species and island.
+
+:::::::::::::::::::::::::::::::::::::::: hint
+Add another facet variable with the `+`
+::::::::::::::::::::::::::::::::::::::::
+
+:::::::::::::::::::::::::::::::::::::::: solution
+
+## Solution
+
+
+``` r
+ggplot(penguins,
+ aes(x = bill_depth_mm,
+ y = bill_length_mm,
+ colour = sex)) +
+ geom_point(alpha = 0.5) +
+ geom_smooth(method = "lm") +
+ facet_wrap(~ species + island)
+`geom_smooth()` using formula = 'y ~ x'
+Warning: Removed 2 rows containing non-finite outside the scale range
+(`stat_smooth()`).
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).
+```
+
+
+
+::::::::::::::::::::::::::::::::::::::::
+:::::::::::::::::::::::::::::::::::::
+
+## Wrap-up
+
+We learned about different parameters of ggplot functions, and how to combine different geoms into more complex charts.
+
diff --git a/03-data-subsetting.md b/03-data-subsetting.md
new file mode 100644
index 0000000..40c3875
--- /dev/null
+++ b/03-data-subsetting.md
@@ -0,0 +1,876 @@
+---
+title: "Subsetting data with dplyr"
+teaching: 60
+exercises: 12
+---
+
+
+:::::::::::::::::::::::::::::::::::::: questions
+
+- How can I subset the number of columns in my data set?
+- How can I reduce the number of rows in my data set?
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::: objectives
+
+- Use `select()` to reduce columns
+- Use tidyselectors like `starts_with()` within `select()` to reduce columns
+- Use `filter()` to reduce rows
+- Understand common logical operations using `filter()`
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::: keypoints
+
+- Subsetting rows and columns
+- Using tidyselectors
+- Understanding logical operations
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+## Motivation
+
+In many cases, we are working with data sets that contain more data than we need, or we want to inspect certain parts of the data set before we continue.
+Subsetting data sets can be challenging in base R, because there is a fair bit of repetition.
+This can make code difficult to readn and understand.
+
+## The {dplyr} package
+
+The [{dplyr}](https://cran.r-project.org/web/packages/dplyr/index.html) package provides a number of very useful functions for manipulating data sets in a way that will reduce the probability of making errors, and even save you some typing time. As an added bonus, you might even find the {dplyr} grammar easier to read.
+
+We're going to cover 6 of the most commonly used functions as well as using pipes (`|>`) to combine them.
+
+1. `select()` (covered in this session)
+2. `filter()` (covered in this session)
+3. `arrange()` (covered in this session)
+4. `mutate()` (covered in next session)
+5. `group_by()` (covered in Day 2 session)
+6. `summarize()` (covered in Day 2 session)
+
+
+## Selecting columns
+
+Let us first talk about selecting columns. In {dplyr}, the function name for selecting columns is `select()`! Most {tidyverse} function names for functions are inspired by English grammar, which will help us when we are writing our code.
+
+
+
+We first need to make sure we have the tidyverse loaded and the penguins data set at hand.
+
+``` r
+library(tidyverse)
+penguins <- read_csv("data/penguins.csv")
+```
+
+To select data, we must first tell select which data set we are selecting from, and then give it our selection. Here, we are asking R to `select()` from the `penguins` data set the `island`, `species` and `sex` columns
+
+
+``` r
+select(penguins, island, species, sex)
+```
+
+``` output
+# A tibble: 344 × 3
+ island species sex
+
+ 1 Torgersen Adelie male
+ 2 Torgersen Adelie female
+ 3 Torgersen Adelie female
+ 4 Torgersen Adelie
+ 5 Torgersen Adelie female
+ 6 Torgersen Adelie male
+ 7 Torgersen Adelie female
+ 8 Torgersen Adelie male
+ 9 Torgersen Adelie
+10 Torgersen Adelie
+# ℹ 334 more rows
+```
+
+When we use `select()` we don't need to use quotations, we write in the names directly. We can also use the numeric indexes for the column, if we are 100% certain of the order of the columns:
+
+
+``` r
+select(penguins, 1:3, 6)
+```
+
+``` output
+# A tibble: 344 × 4
+ species island bill_length_mm body_mass_g
+
+ 1 Adelie Torgersen 39.1 3750
+ 2 Adelie Torgersen 39.5 3800
+ 3 Adelie Torgersen 40.3 3250
+ 4 Adelie Torgersen NA NA
+ 5 Adelie Torgersen 36.7 3450
+ 6 Adelie Torgersen 39.3 3650
+ 7 Adelie Torgersen 38.9 3625
+ 8 Adelie Torgersen 39.2 4675
+ 9 Adelie Torgersen 34.1 3475
+10 Adelie Torgersen 42 4250
+# ℹ 334 more rows
+```
+
+In some cases, we want to remove columns, and not necessarily state all columns we want to keep.
+Select also allows for this by adding a minus (`-`) sign in front of the column name you don't want.
+
+
+``` r
+select(penguins, -bill_length_mm, -bill_depth_mm)
+```
+
+``` output
+# A tibble: 344 × 6
+ species island flipper_length_mm body_mass_g sex year
+
+ 1 Adelie Torgersen 181 3750 male 2007
+ 2 Adelie Torgersen 186 3800 female 2007
+ 3 Adelie Torgersen 195 3250 female 2007
+ 4 Adelie Torgersen NA NA 2007
+ 5 Adelie Torgersen 193 3450 female 2007
+ 6 Adelie Torgersen 190 3650 male 2007
+ 7 Adelie Torgersen 181 3625 female 2007
+ 8 Adelie Torgersen 195 4675 male 2007
+ 9 Adelie Torgersen 193 3475 2007
+10 Adelie Torgersen 190 4250 2007
+# ℹ 334 more rows
+```
+
+::::::::::::::::::::::::::::::::::::: challenge
+## Challenge 1
+Select the columns sex, year, and species from the penguins dataset.
+
+:::::::::::::::::::::::::::::::::::::::: solution
+## Solution
+
+
+``` r
+select(penguins, sex, year, species)
+```
+
+``` output
+# A tibble: 344 × 3
+ sex year species
+
+ 1 male 2007 Adelie
+ 2 female 2007 Adelie
+ 3 female 2007 Adelie
+ 4 2007 Adelie
+ 5 female 2007 Adelie
+ 6 male 2007 Adelie
+ 7 female 2007 Adelie
+ 8 male 2007 Adelie
+ 9 2007 Adelie
+10 2007 Adelie
+# ℹ 334 more rows
+```
+
+::::::::::::::::::::::::::::::::::::::::
+:::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::: challenge
+
+## Challenge 2
+Change your selection so that species comes before sex. What is the difference in the output?
+
+:::::::::::::::::::::::::::::::::::::::: solution
+## Solution
+
+
+``` r
+select(penguins, species, sex, year)
+```
+
+``` output
+# A tibble: 344 × 3
+ species sex year
+
+ 1 Adelie male 2007
+ 2 Adelie female 2007
+ 3 Adelie female 2007
+ 4 Adelie 2007
+ 5 Adelie female 2007
+ 6 Adelie male 2007
+ 7 Adelie female 2007
+ 8 Adelie male 2007
+ 9 Adelie 2007
+10 Adelie 2007
+# ℹ 334 more rows
+```
+select does not only subset columns, but it can also re-arrange them. The columns appear in the order your selection is specified.
+
+::::::::::::::::::::::::::::::::::::::::
+:::::::::::::::::::::::::::::::::::::
+
+### Tidy selections
+
+These selections are quite convenient and fast! But they can be even better.
+
+For instance, what if we want to choose all the columns with millimeter measurements? That could be quite convenient, making sure the variables we are working with have the same measurement scale.
+
+We could of course type them all out, but the penguins data set has names that make it even easier for us, using something called tidy-selectors.
+
+Here, we use a tidy-selector `ends_with()`, can you guess what it does? yes, it looks for columns that end with the string you provide it, here `"mm"`.
+
+
+``` r
+select(penguins, ends_with("mm"))
+```
+
+``` output
+# A tibble: 344 × 3
+ bill_length_mm bill_depth_mm flipper_length_mm
+
+ 1 39.1 18.7 181
+ 2 39.5 17.4 186
+ 3 40.3 18 195
+ 4 NA NA NA
+ 5 36.7 19.3 193
+ 6 39.3 20.6 190
+ 7 38.9 17.8 181
+ 8 39.2 19.6 195
+ 9 34.1 18.1 193
+10 42 20.2 190
+# ℹ 334 more rows
+```
+
+So convenient! There are several other tidy-selectors you can choose, [which you can find here](https://dplyr.tidyverse.org/reference/select.html), but often people resort to three specific ones:
+
+- `ends_with()` - column names ending with a character string
+- `starts_with()` - column names starting with a character string
+- `contains()` - column names containing a character string
+
+If you are working with a well named data set, these functions should make your data selecting much simpler. And if you are making your own data, you can think of such convenient naming for your data, so your work can be easier for you and others.
+
+Lets only pick the measurements of the bill, we are not so interested in the flipper. Then we might want to change to `starts_with()` in stead.
+
+
+``` r
+select(penguins, starts_with("bill"))
+```
+
+``` output
+# A tibble: 344 × 2
+ bill_length_mm bill_depth_mm
+
+ 1 39.1 18.7
+ 2 39.5 17.4
+ 3 40.3 18
+ 4 NA NA
+ 5 36.7 19.3
+ 6 39.3 20.6
+ 7 38.9 17.8
+ 8 39.2 19.6
+ 9 34.1 18.1
+10 42 20.2
+# ℹ 334 more rows
+```
+
+The tidy selector can be combined with each other and other selectors. So you can build exactly the data you want!
+
+
+``` r
+select(penguins, island, species, year, starts_with("bill"))
+```
+
+``` output
+# A tibble: 344 × 5
+ island species year bill_length_mm bill_depth_mm
+
+ 1 Torgersen Adelie 2007 39.1 18.7
+ 2 Torgersen Adelie 2007 39.5 17.4
+ 3 Torgersen Adelie 2007 40.3 18
+ 4 Torgersen Adelie 2007 NA NA
+ 5 Torgersen Adelie 2007 36.7 19.3
+ 6 Torgersen Adelie 2007 39.3 20.6
+ 7 Torgersen Adelie 2007 38.9 17.8
+ 8 Torgersen Adelie 2007 39.2 19.6
+ 9 Torgersen Adelie 2007 34.1 18.1
+10 Torgersen Adelie 2007 42 20.2
+# ℹ 334 more rows
+```
+
+::::::::::::::::::::::::::::::::::::: challenge
+## Challenge 3
+Select all columns containing an underscore ("_").
+
+:::::::::::::::::::::::::::::::::::::::: solution
+## Solution
+
+
+``` r
+select(penguins, contains("_"))
+```
+
+``` output
+# A tibble: 344 × 4
+ bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+
+ 1 39.1 18.7 181 3750
+ 2 39.5 17.4 186 3800
+ 3 40.3 18 195 3250
+ 4 NA NA NA NA
+ 5 36.7 19.3 193 3450
+ 6 39.3 20.6 190 3650
+ 7 38.9 17.8 181 3625
+ 8 39.2 19.6 195 4675
+ 9 34.1 18.1 193 3475
+10 42 20.2 190 4250
+# ℹ 334 more rows
+```
+
+::::::::::::::::::::::::::::::::::::::::
+:::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::: challenge
+
+## Challenge 4
+Select the species and sex columns, in addition to all columns ending with "mm"
+
+:::::::::::::::::::::::::::::::::::::::: solution
+
+## Solution
+
+
+``` r
+select(penguins, species, sex, ends_with("mm"))
+```
+
+``` output
+# A tibble: 344 × 5
+ species sex bill_length_mm bill_depth_mm flipper_length_mm
+
+ 1 Adelie male 39.1 18.7 181
+ 2 Adelie female 39.5 17.4 186
+ 3 Adelie female 40.3 18 195
+ 4 Adelie NA NA NA
+ 5 Adelie female 36.7 19.3 193
+ 6 Adelie male 39.3 20.6 190
+ 7 Adelie female 38.9 17.8 181
+ 8 Adelie male 39.2 19.6 195
+ 9 Adelie 34.1 18.1 193
+10 Adelie 42 20.2 190
+# ℹ 334 more rows
+```
+
+::::::::::::::::::::::::::::::::::::::::
+:::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::: challenge
+## Challenge 5
+De-select all the columns with bill measurements
+
+:::::::::::::::::::::::::::::::::::::::: solution
+
+## Solution
+
+
+``` r
+select(penguins, -starts_with("bill"))
+```
+
+``` output
+# A tibble: 344 × 6
+ species island flipper_length_mm body_mass_g sex year
+
+ 1 Adelie Torgersen 181 3750 male 2007
+ 2 Adelie Torgersen 186 3800 female 2007
+ 3 Adelie Torgersen 195 3250 female 2007
+ 4 Adelie Torgersen NA NA 2007
+ 5 Adelie Torgersen 193 3450 female 2007
+ 6 Adelie Torgersen 190 3650 male 2007
+ 7 Adelie Torgersen 181 3625 female 2007
+ 8 Adelie Torgersen 195 4675 male 2007
+ 9 Adelie Torgersen 193 3475 2007
+10 Adelie Torgersen 190 4250 2007
+# ℹ 334 more rows
+```
+
+::::::::::::::::::::::::::::::::::::::::
+:::::::::::::::::::::::::::::::::::::
+
+### Tidy selections with `where`
+
+The last tidy-selector we'll mention is `where()`. `where()` is a very special tidy selector, that uses logical evaluations to select the data. Let's have a look at it in action, and see if we can explain it better that way.
+
+Say you are running a correlation analysis. For correlations, you need all the columns in your data to be numeric, as you cannot correlate strings or categories. Going through each individual column and seeing if it is numeric is a bit of a chore. That is where `where()` comes in!
+
+
+``` r
+select(penguins, where(is.numeric))
+```
+
+``` output
+# A tibble: 344 × 5
+ bill_length_mm bill_depth_mm flipper_length_mm body_mass_g year
+
+ 1 39.1 18.7 181 3750 2007
+ 2 39.5 17.4 186 3800 2007
+ 3 40.3 18 195 3250 2007
+ 4 NA NA NA NA 2007
+ 5 36.7 19.3 193 3450 2007
+ 6 39.3 20.6 190 3650 2007
+ 7 38.9 17.8 181 3625 2007
+ 8 39.2 19.6 195 4675 2007
+ 9 34.1 18.1 193 3475 2007
+10 42 20.2 190 4250 2007
+# ℹ 334 more rows
+```
+
+Magic! Let's break that down.
+`is.numeric()` is a function in R that checks if a vector is numeric. If the vector is numeric, it returns `TRUE` if not it returns `FALSE`.
+
+
+``` r
+is.numeric(5)
+```
+
+``` output
+[1] TRUE
+```
+
+``` r
+is.numeric("something")
+```
+
+``` output
+[1] FALSE
+```
+
+Let us look at the penguins data set again
+
+``` r
+penguins
+```
+
+``` output
+# A tibble: 344 × 8
+ species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+
+ 1 Adelie Torgersen 39.1 18.7 181 3750
+ 2 Adelie Torgersen 39.5 17.4 186 3800
+ 3 Adelie Torgersen 40.3 18 195 3250
+ 4 Adelie Torgersen NA NA NA NA
+ 5 Adelie Torgersen 36.7 19.3 193 3450
+ 6 Adelie Torgersen 39.3 20.6 190 3650
+ 7 Adelie Torgersen 38.9 17.8 181 3625
+ 8 Adelie Torgersen 39.2 19.6 195 4675
+ 9 Adelie Torgersen 34.1 18.1 193 3475
+10 Adelie Torgersen 42 20.2 190 4250
+# ℹ 334 more rows
+# ℹ 2 more variables: sex , year
+```
+
+The penguins data is stored as a `tibble`, which is a special kind of data set in R that gives a nice print out of the data.
+Notice, right below the column name, there is some information in `<>` marks. This tells us the class of the columns.
+Species and island are factors, while bill columns are "double" which is a decimal numeric class.
+
+`where()` goes through all the columns and checks if they are numeric, and returns the ones that are.
+
+
+``` r
+select(penguins, where(is.numeric))
+```
+
+``` output
+# A tibble: 344 × 5
+ bill_length_mm bill_depth_mm flipper_length_mm body_mass_g year
+
+ 1 39.1 18.7 181 3750 2007
+ 2 39.5 17.4 186 3800 2007
+ 3 40.3 18 195 3250 2007
+ 4 NA NA NA NA 2007
+ 5 36.7 19.3 193 3450 2007
+ 6 39.3 20.6 190 3650 2007
+ 7 38.9 17.8 181 3625 2007
+ 8 39.2 19.6 195 4675 2007
+ 9 34.1 18.1 193 3475 2007
+10 42 20.2 190 4250 2007
+# ℹ 334 more rows
+```
+
+::::::::::::::::::::::::::::::::::::: challenge
+## Challenge 6
+Select only the columns that are factors from the `penguins` data set.
+
+:::::::::::::::::::::::::::::::::::::::: solution
+## Solution
+
+
+``` r
+select(penguins, where(is.factor))
+```
+
+``` output
+# A tibble: 344 × 3
+ species island sex
+
+ 1 Adelie Torgersen male
+ 2 Adelie Torgersen female
+ 3 Adelie Torgersen female
+ 4 Adelie Torgersen
+ 5 Adelie Torgersen female
+ 6 Adelie Torgersen male
+ 7 Adelie Torgersen female
+ 8 Adelie Torgersen male
+ 9 Adelie Torgersen
+10 Adelie Torgersen
+# ℹ 334 more rows
+```
+
+::::::::::::::::::::::::::::::::::::::::
+:::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::: challenge
+
+## Challenge 7
+Select the columns `island`, `species`, as well as all numeric columns from the `penguins` data set.
+
+:::::::::::::::::::::::::::::::::::::::: solution
+## Solution
+
+
+``` r
+select(penguins, island, species, where(is.numeric))
+```
+
+``` output
+# A tibble: 344 × 7
+ island species bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+
+ 1 Torgersen Adelie 39.1 18.7 181 3750
+ 2 Torgersen Adelie 39.5 17.4 186 3800
+ 3 Torgersen Adelie 40.3 18 195 3250
+ 4 Torgersen Adelie NA NA NA NA
+ 5 Torgersen Adelie 36.7 19.3 193 3450
+ 6 Torgersen Adelie 39.3 20.6 190 3650
+ 7 Torgersen Adelie 38.9 17.8 181 3625
+ 8 Torgersen Adelie 39.2 19.6 195 4675
+ 9 Torgersen Adelie 34.1 18.1 193 3475
+10 Torgersen Adelie 42 20.2 190 4250
+# ℹ 334 more rows
+# ℹ 1 more variable: year
+```
+
+::::::::::::::::::::::::::::::::::::::::
+:::::::::::::::::::::::::::::::::::::
+
+
+## Filtering rows
+
+Now that we know how to select the columns we want, we should take a look at how we filter the rows.
+Row filtering is done with the function `filter()`, which takes statements that can be evaluated to `TRUE` or `FALSE`.
+
+
+
+What do we mean with statements that can be evaluated to `TRUE` or `FALSE`?
+In the example with `where()` we used the `is.numeric` function to evaluate if the columns where numeric or not. We will be doing the same for rows!
+
+Now, using `is.numeric` on a row won't help, because every row-value in a column will be of the same type, that is how the data set works. All values in a column must be of the same type.
+
+So what can we do? Well, we can check if the values meet certain criteria or not. Like values being above 20, or factors being a specific factor.
+
+
+``` r
+filter(penguins, body_mass_g < 3000)
+```
+
+``` output
+# A tibble: 9 × 8
+ species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+
+1 Adelie Dream 37.5 18.9 179 2975
+2 Adelie Biscoe 34.5 18.1 187 2900
+3 Adelie Biscoe 36.5 16.6 181 2850
+4 Adelie Biscoe 36.4 17.1 184 2850
+5 Adelie Dream 33.1 16.1 178 2900
+6 Adelie Biscoe 37.9 18.6 193 2925
+7 Adelie Torgersen 38.6 17 188 2900
+8 Chinstrap Dream 43.2 16.6 187 2900
+9 Chinstrap Dream 46.9 16.6 192 2700
+# ℹ 2 more variables: sex , year
+```
+
+Here, we've filtered so that we only have observations where the body mass was less than 3 kilos.
+We can also filter for specific values, but beware! you must use double equals (`==`) for comparisons, as single equals (`=`) are for argument names in functions.
+
+
+``` r
+filter(penguins, body_mass_g == 2900)
+```
+
+``` output
+# A tibble: 4 × 8
+ species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+
+1 Adelie Biscoe 34.5 18.1 187 2900
+2 Adelie Dream 33.1 16.1 178 2900
+3 Adelie Torgersen 38.6 17 188 2900
+4 Chinstrap Dream 43.2 16.6 187 2900
+# ℹ 2 more variables: sex , year
+```
+
+What is happening, is that R will check if the values in `body_mass_g` are the same as 2900 (`TRUE`) or not (`FALSE`), and will do this for every row in the data set. Then at the end, it will discard all those that are `FALSE`, and keep those that are `TRUE`.
+
+::::::::::::::::::::::::::::::::::::: challenge
+## Challenge 8
+Filter the data so you only have observations from the "Dream" island.
+
+:::::::::::::::::::::::::::::::::::::::: solution
+
+## Solution
+
+
+``` r
+filter(penguins, island == "Dream")
+```
+
+``` output
+# A tibble: 124 × 8
+ species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+
+ 1 Adelie Dream 39.5 16.7 178 3250
+ 2 Adelie Dream 37.2 18.1 178 3900
+ 3 Adelie Dream 39.5 17.8 188 3300
+ 4 Adelie Dream 40.9 18.9 184 3900
+ 5 Adelie Dream 36.4 17 195 3325
+ 6 Adelie Dream 39.2 21.1 196 4150
+ 7 Adelie Dream 38.8 20 190 3950
+ 8 Adelie Dream 42.2 18.5 180 3550
+ 9 Adelie Dream 37.6 19.3 181 3300
+10 Adelie Dream 39.8 19.1 184 4650
+# ℹ 114 more rows
+# ℹ 2 more variables: sex , year
+```
+
+::::::::::::::::::::::::::::::::::::::::
+:::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::: challenge
+## Challenge 9
+Filter the data so you only have observations after 2008
+
+:::::::::::::::::::::::::::::::::::::::: solution
+## Solution
+
+
+``` r
+filter(penguins, year >= 2008)
+```
+
+``` output
+# A tibble: 234 × 8
+ species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+
+ 1 Adelie Biscoe 39.6 17.7 186 3500
+ 2 Adelie Biscoe 40.1 18.9 188 4300
+ 3 Adelie Biscoe 35 17.9 190 3450
+ 4 Adelie Biscoe 42 19.5 200 4050
+ 5 Adelie Biscoe 34.5 18.1 187 2900
+ 6 Adelie Biscoe 41.4 18.6 191 3700
+ 7 Adelie Biscoe 39 17.5 186 3550
+ 8 Adelie Biscoe 40.6 18.8 193 3800
+ 9 Adelie Biscoe 36.5 16.6 181 2850
+10 Adelie Biscoe 37.6 19.1 194 3750
+# ℹ 224 more rows
+# ℹ 2 more variables: sex , year
+```
+
+::::::::::::::::::::::::::::::::::::::::
+:::::::::::::::::::::::::::::::::::::
+
+### Multiple filters
+
+Many times, we will want to have several filters applied at once. What if you only want Adelie penguins that are below 3 kilos?
+`filter()` can take as many statements as you want! Combine them by adding commas (,) between each statement, and that will work as 'and'.
+
+
+``` r
+filter(penguins,
+ species == "Chinstrap",
+ body_mass_g < 3000)
+```
+
+``` output
+# A tibble: 2 × 8
+ species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+
+1 Chinstrap Dream 43.2 16.6 187 2900
+2 Chinstrap Dream 46.9 16.6 192 2700
+# ℹ 2 more variables: sex , year
+```
+
+You can also use the `&` sign, which in R is the comparison character for 'and', like `==` is for 'equals'.
+
+``` r
+filter(penguins,
+ species == "Chinstrap" &
+ body_mass_g < 3000)
+```
+
+``` output
+# A tibble: 2 × 8
+ species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+
+1 Chinstrap Dream 43.2 16.6 187 2900
+2 Chinstrap Dream 46.9 16.6 192 2700
+# ℹ 2 more variables: sex , year
+```
+
+Here we are filtering the penguins data set keeping only the species "Chinstrap" **and** those below 3.5 kilos.
+And we can keep going!
+
+
+``` r
+filter(penguins,
+ species == "Chinstrap",
+ body_mass_g < 3000,
+ sex == "male")
+```
+
+``` output
+# A tibble: 0 × 8
+# ℹ 8 variables: species , island , bill_length_mm ,
+# bill_depth_mm , flipper_length_mm , body_mass_g , sex ,
+# year
+```
+
+::::::::::::::::::::::::::::::::::::: challenge
+## Challenge 10
+Filter the data so you only have observations after 2008, and from "Biscoe" island
+
+:::::::::::::::::::::::::::::::::::::::: solution
+## Solution
+
+
+``` r
+filter(penguins,
+ year >= 2008,
+ island == "Biscoe")
+```
+
+``` output
+# A tibble: 124 × 8
+ species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+
+ 1 Adelie Biscoe 39.6 17.7 186 3500
+ 2 Adelie Biscoe 40.1 18.9 188 4300
+ 3 Adelie Biscoe 35 17.9 190 3450
+ 4 Adelie Biscoe 42 19.5 200 4050
+ 5 Adelie Biscoe 34.5 18.1 187 2900
+ 6 Adelie Biscoe 41.4 18.6 191 3700
+ 7 Adelie Biscoe 39 17.5 186 3550
+ 8 Adelie Biscoe 40.6 18.8 193 3800
+ 9 Adelie Biscoe 36.5 16.6 181 2850
+10 Adelie Biscoe 37.6 19.1 194 3750
+# ℹ 114 more rows
+# ℹ 2 more variables: sex , year
+```
+
+::::::::::::::::::::::::::::::::::::::::
+:::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::: challenge
+## Challenge 11
+Filter the data so you only have observations of male penguins of the Chinstrap species
+
+:::::::::::::::::::::::::::::::::::::::: solution
+## Solution
+
+
+``` r
+filter(penguins,
+ sex == "male",
+ species == "Chinstrap")
+```
+
+``` output
+# A tibble: 34 × 8
+ species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+
+ 1 Chinstrap Dream 50 19.5 196 3900
+ 2 Chinstrap Dream 51.3 19.2 193 3650
+ 3 Chinstrap Dream 52.7 19.8 197 3725
+ 4 Chinstrap Dream 51.3 18.2 197 3750
+ 5 Chinstrap Dream 51.3 19.9 198 3700
+ 6 Chinstrap Dream 51.7 20.3 194 3775
+ 7 Chinstrap Dream 52 18.1 201 4050
+ 8 Chinstrap Dream 50.5 19.6 201 4050
+ 9 Chinstrap Dream 50.3 20 197 3300
+10 Chinstrap Dream 49.2 18.2 195 4400
+# ℹ 24 more rows
+# ℹ 2 more variables: sex , year
+```
+
+::::::::::::::::::::::::::::::::::::::::
+:::::::::::::::::::::::::::::::::::::
+
+### The difference between `&` (and) and `|`(or)
+
+But what if we want all the Chinstrap penguins **or** if body mass is below 3 kilos? When we use the comma (or the &), we make sure that all statements are `TRUE`. But what if we want it so that _either_ statement is true? Then we can use the **or** character `|` .
+
+
+``` r
+filter(penguins,
+ species == "Chinstrap" |
+ body_mass_g < 3000)
+```
+
+``` output
+# A tibble: 75 × 8
+ species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+
+ 1 Adelie Dream 37.5 18.9 179 2975
+ 2 Adelie Biscoe 34.5 18.1 187 2900
+ 3 Adelie Biscoe 36.5 16.6 181 2850
+ 4 Adelie Biscoe 36.4 17.1 184 2850
+ 5 Adelie Dream 33.1 16.1 178 2900
+ 6 Adelie Biscoe 37.9 18.6 193 2925
+ 7 Adelie Torgers… 38.6 17 188 2900
+ 8 Chinstrap Dream 46.5 17.9 192 3500
+ 9 Chinstrap Dream 50 19.5 196 3900
+10 Chinstrap Dream 51.3 19.2 193 3650
+# ℹ 65 more rows
+# ℹ 2 more variables: sex , year
+```
+
+This now gives us both all chinstrap penguins, and the smallest Adelie penguins!
+By combining AND and OR statements this way, we can slowly create the filtering we are after.
+
+::::::::::::::::::::::::::::::::::::: challenge
+## Challenge 12
+Filter the data so you only have observations of either male penguins or the Chinstrap species
+
+:::::::::::::::::::::::::::::::::::::::: solution
+## Solution
+
+
+``` r
+filter(penguins,
+ sex == "male" |
+ species == "Chinstrap")
+```
+
+``` output
+# A tibble: 202 × 8
+ species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+
+ 1 Adelie Torgersen 39.1 18.7 181 3750
+ 2 Adelie Torgersen 39.3 20.6 190 3650
+ 3 Adelie Torgersen 39.2 19.6 195 4675
+ 4 Adelie Torgersen 38.6 21.2 191 3800
+ 5 Adelie Torgersen 34.6 21.1 198 4400
+ 6 Adelie Torgersen 42.5 20.7 197 4500
+ 7 Adelie Torgersen 46 21.5 194 4200
+ 8 Adelie Biscoe 37.7 18.7 180 3600
+ 9 Adelie Biscoe 38.2 18.1 185 3950
+10 Adelie Biscoe 38.8 17.2 180 3800
+# ℹ 192 more rows
+# ℹ 2 more variables: sex , year
+```
+
+::::::::::::::::::::::::::::::::::::::::
+:::::::::::::::::::::::::::::::::::::
+
+
+# Wrap-up
+Now we've learned about subsetting our data, so we can create data sets that are suited to our needs.
diff --git a/04-data-sorting-pipes.md b/04-data-sorting-pipes.md
new file mode 100644
index 0000000..5bcf77a
--- /dev/null
+++ b/04-data-sorting-pipes.md
@@ -0,0 +1,636 @@
+---
+title: "Data sorting and pipes dplyr"
+teaching: 60
+exercises: 7
+---
+
+
+
+:::::::::::::::::::::::::::::::::::::: questions
+
+- How can I sort the rows in my data?
+- How can I avoid storing intermediate data objects?
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::: objectives
+
+- Use `arrange()` to sort rows
+- Use the pipe `|>` to chain commands together
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+## Motivation
+
+Getting an overview of our data can be challenging. Breaking it up in smaller pieces can help us get a better understanding of its content.
+Being able to subset data is one part of that, another is to be able to re-arrange rows to get a clearer idea of their content.
+
+
+## Creating subsetted objects
+
+So far, we have kept working on the penguins data set, without actually altering it. So far, all our actions have been executed, then forgotten by R. Like it never happened. This is actually quite smart, since it makes it harder to do mistakes you can have difficulties changing.
+
+To store the changes, we have to "assign" the data to a new object in the R environment. Like the penguins data set, which already is an object in our environment we have called "penguins".
+
+We will now store a filtered version including only the chinstrap penguins, in an object we call `chinstraps`.
+
+
+``` r
+chinstraps <- filter(penguins, species == "Chinstrap")
+```
+
+You will likely notice that when we execute this command, nothing is output to the console. That is expected. When we assign the output of a function somewhere, and everything works (*i.e.*, no errors or warnings), nothing happens in the console.
+
+But you should be able to see the new chinstraps object in your environment, and when we type `chinstraps` in the R console, it prints our chinstraps data.
+
+
+``` r
+chinstraps
+```
+
+``` output
+# A tibble: 68 × 8
+ species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+
+ 1 Chinstrap Dream 46.5 17.9 192 3500
+ 2 Chinstrap Dream 50 19.5 196 3900
+ 3 Chinstrap Dream 51.3 19.2 193 3650
+ 4 Chinstrap Dream 45.4 18.7 188 3525
+ 5 Chinstrap Dream 52.7 19.8 197 3725
+ 6 Chinstrap Dream 45.2 17.8 198 3950
+ 7 Chinstrap Dream 46.1 18.2 178 3250
+ 8 Chinstrap Dream 51.3 18.2 197 3750
+ 9 Chinstrap Dream 46 18.9 195 4150
+10 Chinstrap Dream 51.3 19.9 198 3700
+# ℹ 58 more rows
+# ℹ 2 more variables: sex , year
+```
+
+Maybe in this chinstrap data we are also not interested in the bill measurements, so we want to remove them.
+
+
+``` r
+chinstraps <- select(chinstraps, -starts_with("bill"))
+chinstraps
+```
+
+``` output
+# A tibble: 68 × 6
+ species island flipper_length_mm body_mass_g sex year
+
+ 1 Chinstrap Dream 192 3500 female 2007
+ 2 Chinstrap Dream 196 3900 male 2007
+ 3 Chinstrap Dream 193 3650 male 2007
+ 4 Chinstrap Dream 188 3525 female 2007
+ 5 Chinstrap Dream 197 3725 male 2007
+ 6 Chinstrap Dream 198 3950 female 2007
+ 7 Chinstrap Dream 178 3250 female 2007
+ 8 Chinstrap Dream 197 3750 male 2007
+ 9 Chinstrap Dream 195 4150 female 2007
+10 Chinstrap Dream 198 3700 male 2007
+# ℹ 58 more rows
+```
+Now our data has two less columns, and many fewer rows. A simpler data set for us to work with. But assigning the chinstrap data twice like this is a lot of typing, and there is a simpler way, using something we call the "pipe".
+
+::::::::::::::::::::::::::::::::::::: challenge
+
+## Challenge 1
+Create a new data set called "biscoe", where you only have data from "Biscoe" island, and where you only have the first 4 columns of data.
+
+:::::::::::::::::::::::::::::::::::::::: solution
+## Solution
+
+
+``` r
+ biscoe <- filter(penguins, island == "Biscoe")
+ biscoe <- select(biscoe, 1:4)
+```
+::::::::::::::::::::::::::::::::::::::::
+:::::::::::::::::::::::::::::::::::::
+
+## The pipe `|>`
+
+We often want to string together series of functions. This is achieved using pipe operator `|>`. This takes the value on the left, and passes it as the first argument to the function call on the right.
+
+`|>` is not limited to {dplyr} functions. It's an alternative way of writing any R code.
+
+You can enable the pipe in RStudio by going to Tools -> Global options -> Code -> Use native pipe operator.
+
+The shortcut to insert the pipe operator is `Ctrl`+`Shift`+`M` for Windows/Linux, and `Cmd`+`Shift`+`M` for Mac.
+
+In the `chinstraps` example, we had the following code to filter the rows and then select our columns.
+
+
+``` r
+chinstraps <- filter(penguins, species == "Chinstrap")
+chinstraps <- select(chinstraps, -starts_with("bill"))
+```
+
+Here we first create the chinstraps data from the filtered penguins data set. Then use that chinstraps data to reduce the columns and write it again back to the same chinstraps object.
+It's a little messy. With the pipe, we can make it more streamlined.
+
+::::::::::::::::::: instructor
+When reading this part, read it as follows when typing:
+
+> assign to the "chinstraps" object,
+> taking the penguins dataset, and then
+> filtering the species column so we only have Chinstraps, and then
+> selecting away all columns that start with the string "bill"
+
+::::::::::::::::::::::::::::::
+
+
+``` r
+chinstraps <- penguins |>
+ filter(species == "Chinstrap") |>
+ select(-starts_with("bill"))
+```
+
+
+The end result is the same, but there is less typing and we can "read" the pipeline of data subsetting more like language, if we know how. You can read the pipe operator as **"and then"**.
+
+So if we translate the code above to human language we could read it as:
+
+take the penguins data set, and then
+keep only rows for the chinstrap penguins, and then
+remove the columns starting with bill
+and assign the end result to chinstraps.
+
+Learning to read pipes is a great skill, R is not the only programming language that can do this (though the operator is different between languages, the functionality exists in many).
+
+We can do the entire pipe chain step by step to see what is happening.
+
+
+``` r
+penguins
+```
+
+``` output
+# A tibble: 344 × 8
+ species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+
+ 1 Adelie Torgersen 39.1 18.7 181 3750
+ 2 Adelie Torgersen 39.5 17.4 186 3800
+ 3 Adelie Torgersen 40.3 18 195 3250
+ 4 Adelie Torgersen NA NA NA NA
+ 5 Adelie Torgersen 36.7 19.3 193 3450
+ 6 Adelie Torgersen 39.3 20.6 190 3650
+ 7 Adelie Torgersen 38.9 17.8 181 3625
+ 8 Adelie Torgersen 39.2 19.6 195 4675
+ 9 Adelie Torgersen 34.1 18.1 193 3475
+10 Adelie Torgersen 42 20.2 190 4250
+# ℹ 334 more rows
+# ℹ 2 more variables: sex , year
+```
+
+::::::::::::::::::: instructor
+When reading this part, read it as follows when typing:
+
+> taking the penguins dataset
+
+::::::::::::::::::::::::::::::
+
+
+``` r
+penguins |>
+ filter(species == "Chinstrap")
+```
+
+``` output
+# A tibble: 68 × 8
+ species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+
+ 1 Chinstrap Dream 46.5 17.9 192 3500
+ 2 Chinstrap Dream 50 19.5 196 3900
+ 3 Chinstrap Dream 51.3 19.2 193 3650
+ 4 Chinstrap Dream 45.4 18.7 188 3525
+ 5 Chinstrap Dream 52.7 19.8 197 3725
+ 6 Chinstrap Dream 45.2 17.8 198 3950
+ 7 Chinstrap Dream 46.1 18.2 178 3250
+ 8 Chinstrap Dream 51.3 18.2 197 3750
+ 9 Chinstrap Dream 46 18.9 195 4150
+10 Chinstrap Dream 51.3 19.9 198 3700
+# ℹ 58 more rows
+# ℹ 2 more variables: sex , year
+```
+
+::::::::::::::::::: instructor
+When reading this part, read it as follows when typing:
+
+> taking the penguins dataset, and then
+> filtering the species column so we only have Chinstraps
+
+::::::::::::::::::::::::::::::
+
+
+``` r
+penguins |>
+ filter(species == "Chinstrap") |>
+ select(-starts_with("bill"))
+```
+
+``` output
+# A tibble: 68 × 6
+ species island flipper_length_mm body_mass_g sex year
+
+ 1 Chinstrap Dream 192 3500 female 2007
+ 2 Chinstrap Dream 196 3900 male 2007
+ 3 Chinstrap Dream 193 3650 male 2007
+ 4 Chinstrap Dream 188 3525 female 2007
+ 5 Chinstrap Dream 197 3725 male 2007
+ 6 Chinstrap Dream 198 3950 female 2007
+ 7 Chinstrap Dream 178 3250 female 2007
+ 8 Chinstrap Dream 197 3750 male 2007
+ 9 Chinstrap Dream 195 4150 female 2007
+10 Chinstrap Dream 198 3700 male 2007
+# ℹ 58 more rows
+```
+
+::::::::::::::::::: instructor
+When reading this part, read it as follows when typing:
+
+> taking the penguins dataset, and then
+> filtering the species column so we only have Chinstraps, and then
+> selecting away all columns that start with the string "bill"
+
+::::::::::::::::::::::::::::::
+
+So, for each chain step, the output of the previous step is fed into the next step, and that way the commands build on each other until a final end result is made.
+
+And as before, we still are seeing the output of the command chain in the console, meaning we are not storing it.
+Let us do that, again using the assignment.
+
+
+``` r
+chinstraps <- penguins |>
+ filter(species == "Chinstrap") |>
+ select(-starts_with("bill"))
+
+chinstraps
+```
+
+``` output
+# A tibble: 68 × 6
+ species island flipper_length_mm body_mass_g sex year
+
+ 1 Chinstrap Dream 192 3500 female 2007
+ 2 Chinstrap Dream 196 3900 male 2007
+ 3 Chinstrap Dream 193 3650 male 2007
+ 4 Chinstrap Dream 188 3525 female 2007
+ 5 Chinstrap Dream 197 3725 male 2007
+ 6 Chinstrap Dream 198 3950 female 2007
+ 7 Chinstrap Dream 178 3250 female 2007
+ 8 Chinstrap Dream 197 3750 male 2007
+ 9 Chinstrap Dream 195 4150 female 2007
+10 Chinstrap Dream 198 3700 male 2007
+# ℹ 58 more rows
+```
+
+::::::::::::::::::::::::::::::::::::: challenge
+## Challenge 2
+Create a new data set called "biscoe", where you only have data from "Biscoe" island, and where you only have the first 4 columns of data. This time use the pipe.
+
+:::::::::::::::::::::::::::::::::::::::: solution
+## Solution
+
+
+``` r
+penguins |>
+ filter(island == "Biscoe") |>
+ select(1:4)
+```
+
+``` output
+# A tibble: 168 × 4
+ species island bill_length_mm bill_depth_mm
+
+ 1 Adelie Biscoe 37.8 18.3
+ 2 Adelie Biscoe 37.7 18.7
+ 3 Adelie Biscoe 35.9 19.2
+ 4 Adelie Biscoe 38.2 18.1
+ 5 Adelie Biscoe 38.8 17.2
+ 6 Adelie Biscoe 35.3 18.9
+ 7 Adelie Biscoe 40.6 18.6
+ 8 Adelie Biscoe 40.5 17.9
+ 9 Adelie Biscoe 37.9 18.6
+10 Adelie Biscoe 40.5 18.9
+# ℹ 158 more rows
+```
+::::::::::::::::::::::::::::::::::::::::
+:::::::::::::::::::::::::::::::::::::
+
+## Sorting rows
+So far, we have looked at subsetting the data. But some times, we want to reorganize the data without altering it. In tables, we are used to be able to sort columns in ascending or descending order.
+
+This can also be done with {dplyr}'s `arrange()` function. arrange does not alter the data *per se*, just the order in which the rows are stored.
+
+::::::::::::::::::: instructor
+When reading this part, read it as follows when typing:
+
+> taking the penguins dataset, and then
+> arrainging the rows by the island column
+
+::::::::::::::::::::::::::::::
+
+
+``` r
+penguins |>
+ arrange(island)
+```
+
+``` output
+# A tibble: 344 × 8
+ species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+
+ 1 Adelie Biscoe 37.8 18.3 174 3400
+ 2 Adelie Biscoe 37.7 18.7 180 3600
+ 3 Adelie Biscoe 35.9 19.2 189 3800
+ 4 Adelie Biscoe 38.2 18.1 185 3950
+ 5 Adelie Biscoe 38.8 17.2 180 3800
+ 6 Adelie Biscoe 35.3 18.9 187 3800
+ 7 Adelie Biscoe 40.6 18.6 183 3550
+ 8 Adelie Biscoe 40.5 17.9 187 3200
+ 9 Adelie Biscoe 37.9 18.6 172 3150
+10 Adelie Biscoe 40.5 18.9 180 3950
+# ℹ 334 more rows
+# ℹ 2 more variables: sex , year
+```
+
+Here we have sorted the data by the island column. Since island is a factor, it will order by the facor levels, which in this case has Biscoe island as the first category.
+If we sort a numeric column, it will sort by numeric value.
+
+By default, arrange sorts in ascending order. If you want it sorted by descending order, wrap the column name in `desc()`
+
+::::::::::::::::::: instructor
+When reading this part, read it as follows when typing:
+
+> taking the penguins dataset, and then
+> arrainging the rows by the island column in descending order
+
+::::::::::::::::::::::::::::::
+
+
+``` r
+penguins |>
+ arrange(desc(island))
+```
+
+``` output
+# A tibble: 344 × 8
+ species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+
+ 1 Adelie Torgersen 39.1 18.7 181 3750
+ 2 Adelie Torgersen 39.5 17.4 186 3800
+ 3 Adelie Torgersen 40.3 18 195 3250
+ 4 Adelie Torgersen NA NA NA NA
+ 5 Adelie Torgersen 36.7 19.3 193 3450
+ 6 Adelie Torgersen 39.3 20.6 190 3650
+ 7 Adelie Torgersen 38.9 17.8 181 3625
+ 8 Adelie Torgersen 39.2 19.6 195 4675
+ 9 Adelie Torgersen 34.1 18.1 193 3475
+10 Adelie Torgersen 42 20.2 190 4250
+# ℹ 334 more rows
+# ℹ 2 more variables: sex , year
+```
+
+
+::::::::::::::::::::::::::::::::::::: challenge
+## Challenge 3
+Arrange the penguins data set by `body_mass_g`.
+
+:::::::::::::::::::::::::::::::::::::::: solution
+## Solution
+
+
+``` r
+penguins |>
+ arrange(body_mass_g)
+```
+
+``` output
+# A tibble: 344 × 8
+ species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+
+ 1 Chinstrap Dream 46.9 16.6 192 2700
+ 2 Adelie Biscoe 36.5 16.6 181 2850
+ 3 Adelie Biscoe 36.4 17.1 184 2850
+ 4 Adelie Biscoe 34.5 18.1 187 2900
+ 5 Adelie Dream 33.1 16.1 178 2900
+ 6 Adelie Torgers… 38.6 17 188 2900
+ 7 Chinstrap Dream 43.2 16.6 187 2900
+ 8 Adelie Biscoe 37.9 18.6 193 2925
+ 9 Adelie Dream 37.5 18.9 179 2975
+10 Adelie Dream 37 16.9 185 3000
+# ℹ 334 more rows
+# ℹ 2 more variables: sex , year
+```
+::::::::::::::::::::::::::::::::::::::::
+:::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::: challenge
+## Challenge 4
+ Arrange the penguins data set by descending order of `flipper_length_mm`.
+
+:::::::::::::::::::::::::::::::::::::::: solution
+## Solution
+
+
+``` r
+penguins |>
+ arrange(desc(flipper_length_mm))
+```
+
+``` output
+# A tibble: 344 × 8
+ species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+