fix: repaired image alignment, added alt text, caught typos
njlyon0 committed Aug 22, 2024
1 parent d65e094 commit 95556e1
Showing 8 changed files with 22 additions and 12 deletions.
4 changes: 2 additions & 2 deletions _freeze/join/execute-results/html.json

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions _freeze/visualize/execute-results/html.json

Large diffs are not rendered by default.

Binary file modified _freeze/visualize/figure-html/geom-order-3-1.png
Binary file modified _freeze/visualize/figure-html/theme-plus-1-1.png
Binary file modified _freeze/visualize/figure-html/theme-plus-2-1.png
Binary file modified _freeze/visualize/figure-html/theme-plus-3-1.png
22 changes: 16 additions & 6 deletions join.qmd
@@ -15,7 +15,7 @@ library(tidyverse); library(palmerpenguins)

## Combining data

-Now that we know how to manipulate a single dataframe, how do we manipulate multiple dataframes? If we have multiple sources of data and we want to combine them together into one dataframe or table, we can **join** them through any shared column(s)! Data you'll be joining can be called "relational data", because there is some kind of relationship between the dataframes that you’ll be leveraging. In the `tidyverse`, combining data that has a relationship is called "joining". Let's look at some of `dplyr`'s many `join` functions!
+Now that we know how to manipulate a single dataframe, how do we manipulate multiple dataframes? If we have multiple sources of data and we want to combine them together into one dataframe or table, we can **join** them through any shared column(s)! Data you'll be joining can be called "relational data", because there is some kind of relationship between the dataframes that you’ll be leveraging. In the Tidyverse, combining data that has a relationship is called "joining". Let's look at some of `dplyr`'s many `join` functions!

In each of the following `join` functions, you provide two dataframes, the one you arbitrarily provide first is called the "left" dataframe while the other is called the "right" dataframe. This is important because each of the different `join` functions brings the columns from one of the dataframes into the other depending on (1) which dataframe is left and which is right and (2) what type of `join` you specify.

@@ -56,7 +56,9 @@ Also, note that if column names include spaces (as in `Individual ID` and `Date

In a `left_join`, we bring the columns from the right dataframe that match rows found in the specified column(s) of the left dataframe.

-<img src="images/join-left.png" align="center" width="50%" />
+<p align="center">
+<img src="images/join-left.png" alt="Graphic showing a left join" width="50%" />
+</p>

We can specify the column that we want to join based on with `by = ...`. If we don't provide this argument, then `dplyr` will automatically join on **all** matching columns between the left and right dataframes. In our case, we want to `left_join` by `record_number`.

@@ -81,7 +83,9 @@ What we have in the end is `penguins_left_joined`, a dataframe with information

In a `right_join`, we bring rows from the left dataframe into the right dataframe based on the values in the specified column(s) of the right dataframe.

-<img src="images/join-right.png" align="center" width="50%" />
+<p align="center">
+<img src="images/join-right.png" alt="Graphic showing a right join" width="50%" />
+</p>

As the names imply, a `right_join` is the opposite of a `left_join`.
:::
@@ -93,7 +97,9 @@ As the names imply, a `right_join` is the opposite of a `left_join`.

In an `inner_join`, we keep only the rows where the values in the column we are joining `by` are found in both dataframes.

-<img src="images/join-inner.png" align="center" width="50%" />
+<p align="center">
+<img src="images/join-inner.png" alt="Graphic showing an inner join" width="50%" />
+</p>

This can be really useful when one of the dataframes includes supplementary data that has incomplete coverage on the other dataframe and you want to simultaneously combine the dataframes and remove the inevitable `NA`s that will be created.

@@ -111,7 +117,9 @@ Note that in an `inner_join` it doesn't matter which dataframe is "left" and whi

In a `full_join`, we keep all values and all rows.

-<img src="images/join-full.png" align="center" width="50%" />
+<p align="center">
+<img src="images/join-full.png" alt="Graphic showing a full join" width="50%" />
+</p>

A `full_join` is "smart" enough to fill with `NA`s in all rows that don't match between the two dataframes. Also, just like an `inner_join`, a `full_join` doesn't care about which dataframe is "left" and which is "right" because all columns are getting combined regardless of which is left vs. right.
:::
@@ -123,7 +131,9 @@ A `full_join` is "smart" enough to fill with `NA`s in all rows that don't match

In an `anti_join`, we return rows of the left dataframe that do not have a match in the right dataframe. This can be used to see what will **not** be included in a join.

-<img src="images/join-anti.png" align="center" width="50%" />
+<p align="center">
+<img src="images/join-anti.png" alt="Graphic showing an anti join" width="50%" />
+</p>

One case where an `anti_join` is particularly useful is that of "text mining" where you have one dataframe with a column of individual words that you've split apart from a larger block of free text. If you also have a dataframe of one column that contains words that you want to remove from your "actual" data (e.g., "and", "not", "I", "me", etc.), you can `anti_join` the two dataframes to quickly remove all of those unwanted words from your text mining dataframe.
:::
4 changes: 2 additions & 2 deletions visualize.qmd
@@ -20,7 +20,7 @@ library(tidyverse); library(palmerpenguins)

## `ggplot2` Overview

-While the bulk of the `tidyverse` is focused on modifying a given data object, `ggplot2` is also a package in the `tidyverse` that is more concerned with--intuitively enough--*plotting* tidy data. `ggplot2` does share some syntax with the functions and packages that we've discussed so far but it also introduces some new elements that we'll discuss as we encounter them.
+While the bulk of the Tidyverse is focused on modifying a given data object, `ggplot2` is also a package in the Tidyverse that is more concerned with--intuitively enough--*plotting* tidy data. `ggplot2` does share some syntax with the functions and packages that we've discussed so far but it also introduces some new elements that we'll discuss as we encounter them.

## Creating a Plot

@@ -46,7 +46,7 @@ Now that we have a baseline plot, we can add desired geometries using the `geom_

### Geometry Aside No. 1 - Adding Plot Elements

-You may have noticed that the core plot is built with `ggplot` and `aes` but each subsequent component is added with one of the `geom_...` functions and realized the gap we haven't talked about yet: how do we combine these separate lines of code? The answer is part of what makes `ggplot` different from the rest of the `tidyverse`. In the rest of the `tidyverse` we chain together multiple lines of code with the `%>%` operator, however, **in `ggplot2` we use `+` to combine separate lines of code.**
+You may have noticed that the core plot is built with `ggplot` and `aes` but each subsequent component is added with one of the `geom_...` functions and realized the gap we haven't talked about yet: how do we combine these separate lines of code? The answer is part of what makes `ggplot` different from the rest of the Tidyverse. In the rest of the Tidyverse we chain together multiple lines of code with the `%>%` operator, however, **in `ggplot2` we use `+` to combine separate lines of code.**

This has a distinct advantage that we'll discuss later but we'll use the `+` in the following example to show its use.
