diff --git a/01-project-introduction.html b/01-project-introduction.html
index 775be499..be8203ce 100644
--- a/01-project-introduction.html
+++ b/01-project-introduction.html
@@ -309,7 +309,7 @@


Introduction to R and RStudio

-Last updated on 2024-03-12 |
+Last updated on 2024-03-19 |
Edit this page

@@ -736,8 +736,8 @@

Version Control


Visualisation with ggplot2

-Last updated on 2024-03-12 |
+Last updated on 2024-03-19 |
Edit this page

@@ -445,11 +445,11 @@

R
# install.packages("tidyverse")
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
-✔ dplyr     1.1.2     ✔ readr     2.1.4
-✔ forcats   1.0.0     ✔ stringr   1.5.0
-✔ ggplot2   3.4.2     ✔ tibble    3.2.1
-✔ lubridate 1.9.2     ✔ tidyr     1.3.0
-✔ purrr     1.0.1
+✔ dplyr     1.1.4     ✔ readr     2.1.5
+✔ forcats   1.0.0     ✔ stringr   1.5.1
+✔ ggplot2   3.5.0     ✔ tibble    3.2.1
+✔ lubridate 1.9.3     ✔ tidyr     1.3.1
+✔ purrr     1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
@@ -470,7 +470,8 @@

R mapping = aes(x = bill_depth_mm, y = bill_length_mm) )
-Warning: Removed 2 rows containing missing values (`geom_point()`).
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).
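The warning in this hunk reports rows that the point layer dropped because of missing values. As a minimal sketch of how one might count, or drop, such rows before plotting, the base R snippet below uses a small hypothetical data frame `toy` (not the real penguins data) in which two rows lack bill measurements.

```r
# Hypothetical stand-in for the penguins data: two rows lack bill measurements.
toy <- data.frame(
  bill_depth_mm  = c(18.7, 17.4, NA, 19.3, NA),
  bill_length_mm = c(39.1, 39.5, NA, 36.7, NA)
)

# Rows a point layer cannot draw: any NA in the plotted columns.
incomplete <- !complete.cases(toy[, c("bill_depth_mm", "bill_length_mm")])
n_dropped <- sum(incomplete)

# Keeping only complete rows leaves nothing for the layer to remove.
toy_complete <- toy[!incomplete, ]
```

Whether to drop the rows up front or simply accept the warning is a judgment call; the warning itself does not indicate an error.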

Note that we split the function into several lines. In R, any function has a name and is followed by parentheses. Inside the @@ -523,7 +524,7 @@

Challenge 1a

y = bill_length_mm) )
-Warning: Removed 2 rows containing missing values (`geom_point()`).
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).

@@ -571,7 +573,7 @@

Challenge 1b

mapping = aes(x = year, y = bill_length_mm) )
-Warning: Removed 2 rows containing missing values (`geom_point()`).
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).

@@ -605,7 +608,8 @@

R y = bill_length_mm, colour = island) )
-Warning: Removed 2 rows containing missing values (`geom_point()`).
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).

@@ -626,7 +630,7 @@

Challenge 2

y = bill_length_mm, colour = year) )
-Warning: Removed 2 rows containing missing values (`geom_point()`).
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).

Island is a categorical character variable with a discrete range of possible values. This, like the data type of factor, is represented with @@ -663,7 +668,8 @@

R colour = species, size = year) )
-Warning: Removed 2 rows containing missing values (`geom_point()`).
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).

It might be even better to try another type of aesthetic, like shape, for categorical data like species.

@@ -677,7 +683,8 @@

R colour = species, shape = species) )
-Warning: Removed 2 rows containing missing values (`geom_point()`).
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).

Playing around with different aesthetic mappings until you find something that really makes the data “pop” is a good idea. A plot is @@ -703,7 +710,8 @@

R y = bill_length_mm), colour = "blue" )
-Warning: Removed 2 rows containing missing values (`geom_point()`).
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).

Once more, observe that the colour is now not mapped to any particular variable from the penguins dataset and applies @@ -728,7 +736,7 @@
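The difference between setting a colour and mapping a colour can be sketched side by side. This is an illustrative example only: the data frame `toy` and the plot names `p_set` and `p_map` are hypothetical, and the values stand in for the penguins dataset.

```r
library(ggplot2)

# Hypothetical toy data standing in for the penguins dataset.
toy <- data.frame(
  bill_depth_mm  = c(18.7, 17.4, 18.0, 19.3),
  bill_length_mm = c(39.1, 39.5, 40.3, 36.7),
  island         = c("Torgersen", "Biscoe", "Dream", "Biscoe")
)

# Setting: colour sits outside aes(), so every point gets the same
# fixed colour and no legend is drawn.
p_set <- ggplot(data = toy) +
  geom_point(
    mapping = aes(x = bill_depth_mm, y = bill_length_mm),
    colour = "blue"
  )

# Mapping: colour sits inside aes(), so it varies with the island
# column and a legend is drawn.
p_map <- ggplot(data = toy) +
  geom_point(
    mapping = aes(x = bill_depth_mm, y = bill_length_mm, colour = island)
  )
```

Printing `p_set` and `p_map` in turn shows the two behaviours.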

Challenge 3

alpha = year) )
-Warning: Removed 2 rows containing missing values (`geom_point()`).
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).

@@ -776,7 +785,7 @@

Challenge 4

mapping = aes(x = bill_depth_mm, y = bill_length_mm), alpha = 0.5)
-Warning: Removed 2 rows containing missing values (`geom_point()`).
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).

Controlling the transparency can be a great way to “mute” the visual @@ -818,7 +828,8 @@

R mapping = aes(x = species, y = bill_length_mm) )
-Warning: Removed 2 rows containing non-finite values (`stat_boxplot()`).
+Warning: Removed 2 rows containing non-finite outside the scale range
+(`stat_boxplot()`).

Layers can be added on top of each other. In the following graph we will place the boxplots over jittered points to see the @@ -838,8 +849,10 @@

R mapping = aes(x = species, y = bill_length_mm) )
-Warning: Removed 2 rows containing non-finite values (`stat_boxplot()`).
-Warning: Removed 2 rows containing missing values (`geom_point()`).
+Warning: Removed 2 rows containing non-finite outside the scale range
+(`stat_boxplot()`).
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).

Now, this was slightly inefficient due to duplication of code - we had to specify the same mappings for two layers. To avoid it, you can @@ -855,8 +868,10 @@

R ) + geom_jitter(aes(colour = island)) + geom_boxplot(alpha = .6)
-Warning: Removed 2 rows containing non-finite values (`stat_boxplot()`).
-Warning: Removed 2 rows containing missing values (`geom_point()`).
+Warning: Removed 2 rows containing non-finite outside the scale range
+(`stat_boxplot()`).
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).

You can still add layer-specific mappings or other arguments by specifying them within individual geoms. Here, we’ve set the @@ -882,8 +897,10 @@
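A minimal sketch of shared mappings combined with layer-specific ones, assuming a small hypothetical data frame `toy` in place of the penguins data. The `x` and `y` mappings are declared once in `ggplot()` and reused by both layers, while `colour` is mapped for the jitter layer only and `alpha` is set for the boxplot layer only.

```r
library(ggplot2)

# Hypothetical toy data; the values are illustrative, not real measurements.
toy <- data.frame(
  species        = rep(c("Adelie", "Gentoo"), each = 4),
  island         = rep(c("Biscoe", "Dream"), times = 4),
  bill_length_mm = c(38.8, 39.5, 40.3, 36.7, 46.1, 50.0, 48.7, 47.6)
)

# x and y are declared once in ggplot() and inherited by every layer.
p <- ggplot(data = toy, mapping = aes(x = species, y = bill_length_mm)) +
  # colour is mapped for this layer only.
  geom_jitter(aes(colour = island)) +
  # alpha is a fixed setting for this layer only.
  geom_boxplot(alpha = .6)
```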

R geom_point(alpha = 0.5) + geom_smooth(method = "lm")
`geom_smooth()` using formula = 'y ~ x'
-Warning: Removed 2 rows containing non-finite values (`stat_smooth()`).
-Warning: Removed 2 rows containing missing values (`geom_point()`).
+Warning: Removed 2 rows containing non-finite outside the scale range
+(`stat_smooth()`).
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).

@@ -903,7 +920,7 @@

Challenge 5

alpha = 0.5) + geom_smooth(method = "lm")
`geom_smooth()` using formula = 'y ~ x'
-Warning: Removed 2 rows containing non-finite values (`stat_smooth()`).
-Warning: Removed 2 rows containing missing values (`geom_point()`).
+Warning: Removed 2 rows containing non-finite outside the scale range
+(`stat_smooth()`).
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).

In the graph above, each geom inherited all three mappings: x, y and @@ -950,7 +969,7 @@

Challenge 6

geom_smooth(method = "lm", colour = "black")
`geom_smooth()` using formula = 'y ~ x'
-Warning: Removed 2 rows containing non-finite values (`stat_smooth()`).
-`geom_smooth()` using formula = 'y ~ x'
-Warning: Removed 2 rows containing non-finite values (`stat_smooth()`).
-Warning: Removed 2 rows containing missing values (`geom_point()`).
+Warning: Removed 2 rows containing non-finite outside the scale range
+(`stat_smooth()`).
+`geom_smooth()` using formula = 'y ~ x'
+Warning: Removed 2 rows containing non-finite outside the scale range
+(`stat_smooth()`).
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).

Look at that! The data actually reveals something called the “Simpson’s @@ -1014,8 +1036,10 @@
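Simpson's paradox describes a trend that reverses once groups are combined. The base R sketch below reproduces the effect on hypothetical toy data (the `toy` data frame and its `group`, `x` and `y` columns are invented for illustration and are not the penguins data): a single linear fit across all points slopes downward, while a separate fit within each group slopes upward.

```r
# Hypothetical toy data: within each group y rises with x,
# but groups with larger x sit lower on the y axis.
toy <- data.frame(
  group = rep(c("A", "B", "C"), each = 5),
  x     = 1:15,
  y     = c(21:25, 11:15, 1:5)
)

# One straight line through all points, ignoring the grouping.
overall_slope <- coef(lm(y ~ x, data = toy))[["x"]]

# One straight line per group.
group_slopes <- sapply(
  split(toy, toy$group),
  function(d) coef(lm(y ~ x, data = d))[["x"]]
)
```

Comparing the sign of `overall_slope` with the signs in `group_slopes` shows the reversal.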

R geom_smooth(method = "lm") + facet_wrap(~ sex)
`geom_smooth()` using formula = 'y ~ x'
-Warning: Removed 2 rows containing non-finite values (`stat_smooth()`).
-Warning: Removed 2 rows containing missing values (`geom_point()`).
+Warning: Removed 2 rows containing non-finite outside the scale range
+(`stat_smooth()`).
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).

The facets take formula arguments, meaning they contain the tilde (~). The way we often think about it, trying to @@ -1035,8 +1059,10 @@
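As a small illustration of the formula interface, the sketch below facets a hypothetical data frame `toy` by its `sex` column. It also shows `vars()`, which ggplot2 accepts as an alternative to the tilde notation; the data and the plot names are assumptions made for the example.

```r
library(ggplot2)

# Hypothetical toy data; the values are illustrative only.
toy <- data.frame(
  sex            = rep(c("female", "male"), each = 3),
  bill_depth_mm  = c(17.4, 18.0, 17.8, 18.7, 19.3, 20.6),
  bill_length_mm = c(39.5, 40.3, 38.9, 39.1, 36.7, 39.3)
)

base <- ggplot(toy, aes(x = bill_depth_mm, y = bill_length_mm)) +
  geom_point()

# Formula notation: one panel per value of the variable after the tilde.
p_formula <- base + facet_wrap(~ sex)

# Equivalent specification using vars() instead of a formula.
p_vars <- base + facet_wrap(vars(sex))
```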

R geom_smooth(method = "lm") + facet_wrap(~ species)
`geom_smooth()` using formula = 'y ~ x'
-Warning: Removed 2 rows containing non-finite values (`stat_smooth()`).
-Warning: Removed 2 rows containing missing values (`geom_point()`).
+Warning: Removed 2 rows containing non-finite outside the scale range
+(`stat_smooth()`).
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).

The NAs still look weird, but it’s definitely better, I think.

@@ -1057,7 +1083,7 @@

Challenge 7

geom_smooth(method = "lm") + facet_wrap(~ species + island)
`geom_smooth()` using formula = 'y ~ x'
-Warning: Removed 2 rows containing non-finite values (`stat_smooth()`).
-Warning: Removed 2 rows containing missing values (`geom_point()`).
+Warning: Removed 2 rows containing non-finite outside the scale range
+(`stat_smooth()`).
+Warning: Removed 2 rows containing missing values or values outside the scale range
+(`geom_point()`).

@@ -1160,8 +1188,8 @@

Wrap-up




Data visualisation and scales

-Last updated on 2024-03-12 |
+Last updated on 2024-03-19 |
Edit this page

@@ -385,7 +385,7 @@

Challenge 1

datanovia @@ -573,7 +573,7 @@

Challenge 4


Data manipulation with dplyr

-Last updated on 2024-03-12 |
+Last updated on 2024-03-19 |
Edit this page

@@ -550,7 +550,7 @@

Challenge 1


Reshaping data with tidyr

-Last updated on 2024-03-12 |
+Last updated on 2024-03-19 |
Edit this page

@@ -437,7 +437,8 @@

R

WARNING

-Warning: Removed 8 rows containing non-finite values (`stat_boxplot()`).
+Warning: Removed 8 rows containing non-finite outside the scale range
+(`stat_boxplot()`).

That’s pretty neat. By pivoting the data into this longer shape we are able to create sub-plots for all measurements easily with the same @@ -950,9 +951,9 @@
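The pivot-then-facet pattern can be sketched on a small hypothetical data frame. The column names `measurement` and `value`, the `toy` data and the free y scale are assumptions chosen for this example, not necessarily what the lesson uses.

```r
library(tidyr)
library(ggplot2)

# Hypothetical wide data: one column per measurement.
toy <- data.frame(
  species           = c("Adelie", "Adelie", "Gentoo", "Gentoo"),
  bill_length_mm    = c(39.1, 39.5, 46.1, 50.0),
  bill_depth_mm     = c(18.7, 17.4, 13.2, 16.3),
  flipper_length_mm = c(181, 186, 211, 230)
)

# Longer shape: one row per species-measurement pair.
long <- pivot_longer(
  toy,
  cols      = ends_with("_mm"),
  names_to  = "measurement",
  values_to = "value"
)

# The same plotting code now yields one sub-plot per measurement.
p <- ggplot(long, aes(x = species, y = value)) +
  geom_boxplot() +
  facet_wrap(~ measurement, scales = "free_y")
```

With four rows and three measurement columns, `long` has twelve rows, one per original cell.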

WARNING