Update questions for research and static communication

RohanAlexander · Oct 12, 2024 · 4290281
1 parent 6ca8c94
commit 4290281

Showing 18 changed files with 763 additions and 649 deletions.
diff --git a/03-workflow.qmd b/03-workflow.qmd
@@ -4,6 +4,10 @@ engine: knitr
 
 # Reproducible workflows {#sec-reproducible-workflows}
 
+::: {.callout-note}
+Chapman and Hall/CRC published this book in July 2023. You can purchase that [here](https://www.routledge.com/Telling-Stories-with-Data-With-Applications-in-R/Alexander/p/book/9781032134772). This online version has some updates to what was printed.
+:::
+
 **Prerequisites**
 
 - Read *What has happened down here is the winds have changed*, [@Gelman2016]

diff --git a/04-writing_research.qmd b/04-writing_research.qmd
@@ -4,6 +4,10 @@ engine: knitr
 
 # Writing research {#sec-on-writing}
 
+::: {.callout-note}
+Chapman and Hall/CRC published this book in July 2023. You can purchase that [here](https://www.routledge.com/Telling-Stories-with-Data-With-Applications-in-R/Alexander/p/book/9781032134772). This online version has some updates to what was printed.
+:::
+
 **Prerequisites**
 
 - Read *By Design: Planning Research on Higher Education*, [@bydesignplanningresearch]
@@ -672,85 +676,100 @@ A variety of authors have established rules\index{writing!rules} for writing. Th
     d.  Reading it aloud.
     e.  Exchanging it with others.
 8. What are three features of a good research question (write a paragraph or two)?
-9. In the context of research approaches, what does data-first mean (pick one)?
+9. How do @bydesignplanningresearch recommend going from a broad theme to actually planning a study in detail (pick one)?
+    a.  Articulate a set of specific research questions.
+    b. Identify available data.
+    c. Talk to experts.
+10. Why do @bydesignplanningresearch believe research questions are so important (select all that apply)?
+    a.  Research questions are the only basis for making sensible planning decisions.
+    b.  Research questions identify the target population from which you will draw a sample.
+    c.  Research questions determine the appropriate level of aggregation.
+    d.  Research questions identify the outcome variable.
+    e.  Research questions identify the key predictors.
+    f.  Research questions raise challenges for measurement and data collection.
+11. In the context of research approaches, what does data-first mean (pick one)?
     a. Developing research questions without considering data availability.
     b. Collecting new data specifically designed to answer a predefined question.
     c. Prioritizing theoretical frameworks over empirical evidence.
     d.  Starting with available data and then determining the questions that can be answered.
-10. What is a counterfactual (include examples and references and write at least three paragraphs)?
-11. What is a counterfactual (pick one)?
+12. What is a counterfactual (include examples and references and write at least three paragraphs)?
+13. What is a counterfactual (pick one)?
     a. An alternative hypothesis that contradicts the main theory.
     b. A fact that counters the main argument of the paper.
     c.  An if-then statement in which the if is false.
     d. A statistical method used to adjust for confounding variables.
-12. What is an estimate (pick one)?
+14. What is an estimate (pick one)?
     a. A rule for calculating an estimate of a given quantity based on observed data.
     b. The object of inquiry.
     c.  A result given a particular dataset and approach.
-13. What is an estimator (pick one)? 
+15. What is an estimator (pick one)? 
     a.  A rule for calculating an estimate of a given quantity based on observed data.
     b. The object of inquiry.
     c. A result given a particular dataset and approach.
-14. What is an estimand (pick one)? 
+16. What is an estimand (pick one)? 
     a. A rule for calculating an estimate of a given quantity based on observed data.
     b.  The object of inquiry.
     c. A result given a particular dataset and approach.
-15. What is an estimand (pick one)?
+17. What is an estimand (pick one)?
     a. A variable that is measured with error.
     b.  The true effect or quantity of interest that we aim to estimate.
     c. The process of using data to calculate an estimate.
     d. A biased estimator.
-16. What is the purpose of Directed Acyclic Graphs (DAGs) (pick one)?
+18. What is the purpose of Directed Acyclic Graphs (DAGs) (pick one)?
     a.  To visually represent causal relationships between variables.
     b. To perform statistical tests on non-linear data.
     c. To automatically generate statistical models.
     d. To create random samples from complex populations.
-17. In the context of DAGs, what is a "confounder" (pick one)?
+19. In the context of DAGs, what is a "confounder" (pick one)?
     a. A variable that is caused by both the predictor and outcome variables.
     b. A variable that lies on the causal path between the predictor and outcome variables.
     c.  A variable that affects both the predictor and outcome variables, potentially biasing the estimated effect.
     d. A variable that is unrelated to both the predictor and outcome variables.
-18. Which of the following is a key benefit of writing for the writer, even when the main focus is on the reader (pick one)?
+20. Which of the following is a key benefit of writing for the writer, even when the main focus is on the reader (pick one)?
     a.  It helps the writer to work out what they believe and how they came to believe it.
     b. It allows the writer to avoid rewriting the paper.
     c. It reduces the amount of feedback required from peers.
     d. It ensures that the writer’s work will be published.
-19. What is the main reason for removing unnecessary words, typos, and grammatical issues from a paper (pick one)?
+21. What is the main reason for removing unnecessary words, typos, and grammatical issues from a paper (pick one)?
     a. To meet word count limits.
     b. To impress reviewers with advanced vocabulary.
     c.  To enhance the credibility of the claims.
     d. To make the paper longer.
-20. Which of the following is the best title (pick one)?
+22. Which of the following is the best title (pick one)?
     a. "Problem Set 2"
     b. "Standard errors"
     c.  "Standard errors of estimates from small samples"
-21. Please write a new title for @fourcade2017seeing.
-22. What is a common structure for writing an abstract (pick one)?
+23. Please write a new title for @fourcade2017seeing.
+24. What is a common structure for writing an abstract (pick one)?
     a. Start with implications, then methods, and end with context.
     b.  First sentence about the general area, second about methods, third about main result, fourth about implications.
     c. Begin with limitations, followed by data sources, then results.
     d. A series of questions that the paper will answer.
-23. Using only the 1,000 most popular words in the English language, according to the [XKCD Simple Writer](https://xkcd.com/simplewriter/), rewrite the abstract of @chambliss1989mundanity so that it retains its original meaning.
-24. What do you want to achieve by from the data section (pick one)?
+25. Using only the 1,000 most popular words in the English language, according to the [XKCD Simple Writer](https://xkcd.com/simplewriter/), rewrite the abstract of @chambliss1989mundanity so that it retains its original meaning.
+26. What do you want to achieve from the data section (pick one)?
     a. To demonstrate the complexity of the data to impress the reader.
     b.  To create a sense of place by thoroughly describing the data.
     c. To include as many graphs and tables as possible.
     d. To hide any weaknesses in the data.
-25. What should be in the model section (pick one)?
+27. What should be in the model section (pick one)?
     a.  The equations, explanations, and definitions of all components.
     b. Only the final results without any equations.
     c. A general description without mathematical notation.
     d. A summary of other models used in the literature.
-26. What is purpose of the results section (pick one)?
+28. What is the purpose of the results section (pick one)?
     a. Interpreting the results and discussing their implications.
     b. Critiquing other researchers’ findings.
     c.  Presenting the outcomes of the analysis clearly, without extensive interpretation.
     d. Proposing future research directions.
-27. What is the purpose of the discussion section (pick one)?
+29. What is the purpose of the discussion section (pick one)?
     a. To repeat the results in more detail.
     b. To provide a detailed methodology.
     c.  To interpret the results, discuss implications, and acknowledge weaknesses.
     d. To list all the limitations of the study without offering solutions.
+30. Why does @Savage2019 recommend asking yourself if it is possible to preserve your original message without that punctuation mark, that word, that sentence, that paragraph or that section (pick one)?
+    a. To reduce the possibility of errors.
+    b. To keep the paper short.
+    c.  To achieve clarity.
 
 ### Activity {.unnumbered}
 

diff --git a/05-static_communication.qmd b/05-static_communication.qmd
@@ -4,6 +4,10 @@ engine: knitr
 
 # Static communication {#sec-static-communication}
 
+::: {.callout-note}
+Chapman and Hall/CRC published this book in July 2023. You can purchase that [here](https://www.routledge.com/Telling-Stories-with-Data-With-Applications-in-R/Alexander/p/book/9781032134772). This online version has some updates to what was printed.
+:::
+
 **Prerequisites**
 
 - Read *R for Data Science*, [@r4ds]
@@ -1497,10 +1501,10 @@ In this chapter we considered many ways of communicating data. We spent substant
 ### Quiz {.unnumbered}
 
 1. Assume the `tidyverse` and `datasauRus` are installed and loaded. What would be the outcome of the following code?
-    a.  Four vertical lines
-    b. Five vertical lines
-    c. Three vertical lines
-    d. Two vertical lines
+    a.  Four vertical lines.
+    b. Five vertical lines.
+    c. Three vertical lines.
+    d. Two vertical lines.
 
 ```{r}
 #| eval: false
@@ -1553,102 +1557,110 @@ beps |>
     b.  "RdBu"
     c. "GnBu"
     d. "Set1"
-6. Which geom should be used to make a scatter plot?
+6. Based on @vanderplas2020testing, which cognitive principle should be considered when creating graphs (pick one)?
+    a.  Proximity.
+    b. Volume estimation.
+    c. Axial positioning.
+    d. Relative motion.
+7. Based on @vanderplas2020testing, what can color be used to do (pick one)?
+    a. Identify magnitude.
+    b. Improve chart design aesthetics.
+    c.  Encode categorical and continuous variables and group plot elements.
+8. Which geom should be used to make a scatter plot?
     a. `geom_smooth()`
     b.  `geom_point()`
     c. `geom_bar()`
     d. `geom_dotplot()`
-7. Which of these would result in the largest number of bins?
+9. Which of these would result in the largest number of bins?
     a. `geom_histogram(binwidth = 5)`
     b.  `geom_histogram(binwidth = 2)`
-8. Suppose there is a dataset that contains the heights of 100 birds, each from one of three different species. If we are interested in understanding the distribution of these heights, then in a paragraph or two, please explain which type of graph should be used and why.
-9. Would this code `data |> ggplot(aes(x = col_one)) |> geom_point()` work if we assume the dataset and columns exist (pick one)?
-    a. Yes
-    b.  No
-10. Which of the following, if any, are elements of the layered grammar of graphics of @wickham2010layered (select all that apply)?
+10. Suppose there is a dataset that contains the heights of 100 birds, each from one of three different species. If we are interested in understanding the distribution of these heights, then in a paragraph or two, please explain which type of graph should be used and why.
+11. Would this code `data |> ggplot(aes(x = col_one)) |> geom_point()` work if we assume the dataset and columns exist (pick one)?
+    a. Yes.
+    b.  No.
+12. Which of the following, if any, are elements of the layered grammar of graphics of @wickham2010layered (select all that apply)?
     a. A default dataset and set of mappings from variables to aesthetics.
     b. One or more layers, with each layer having one geometric object, one statistical transformation, one position adjustment, and optionally, one dataset and set of aesthetic mappings.
     c. Colors that enable the reader to understand the main point.
     d. A coordinate system.
     e. The facet specification.
     f. One scale for each aesthetic mapping used.
-11. Which function from `modelsummary` could we use to create a table of descriptive statistics?
+13. Which function from `modelsummary` could we use to create a table of descriptive statistics?
     a. `datasummary_descriptive()`
     b. `datasummary_skim()`
     c. `datasummary_crosstab()`
     d.  `datasummary_balance()`
-12. What is the primary reason for always plotting your data before analysis (pick one)?
-    a. To check for missing values
-    b. To ensure the data meets normality assumptions
-    c.  To reveal underlying patterns and structures
-    d. To test the speed of your plotting software
-13. Which `ggplot2` geom is primarily used to create bar charts when you have already computed counts or frequencies (pick one)?
+14. What is the primary reason for always plotting data (pick one)?
+    a. To check for missing values.
+    b. To ensure the data are normal.
+    c.  To reveal underlying patterns and structures.
+15. Which `ggplot2` geom is primarily used to create bar charts when you have already computed counts or frequencies (pick one)?
     a. `geom_bar()`
     b.  `geom_col()`
     c. `geom_histogram()`
     d. `geom_line()`
-14. In `ggplot2`, what is the purpose of using facets in a plot (pick one)?
-    a. To change the color scheme of the plot
-    b. To add labels to data points
-    c.  To create multiple plots split by the values of one or more variables
-    d. To adjust the transparency of the points
-15. When creating a bar chart with `ggplot2`, which aesthetic is typically mapped to a categorical variable to fill bars with different colors (pick one)?
-    a. x
-    b. y
-    c.  fill
-    d. size
-16. What is the effect of adding `position = "dodge"` or `position = "dodge2"` to a `geom_bar()` in `ggplot2` (pick one)?
-    a. It stacks the bars on top of each other
-    b.  It places the bars side by side for each group
-    c. It changes the bar colors to grayscale
-    d. It adds transparency to the bars
-17. In the context of `ggplot2`, what is the primary difference between `geom_point()` and `geom_jitter()` (pick one)?
-    a.  `geom_jitter()` adds random noise to points to reduce overplotting
-    b. `geom_point()` plots points, `geom_jitter()` plots lines
-    c. `geom_point()` adds transparency, `geom_jitter()` does not
-    d. `geom_jitter()` is used for continuous data, `geom_point()` for categorical data
-18. Which `ggplot2` geom would you use to add a line of best fit to a scatterplot (pick one)?
+16. In `ggplot2`, what is the purpose of using facets in a plot (pick one)?
+    a. To change the color scheme of the plot.
+    b. To add labels to data points.
+    c.  To create multiple plots split by the values of one or more variables.
+    d. To adjust the transparency of the points.
+17. When creating a bar chart with `ggplot2`, which aesthetic is typically mapped to a categorical variable to fill bars with different colors (pick one)?
+    a. `x`
+    b. `y`
+    c.  `fill`
+    d. `size`
+18. What is the effect of adding `position = "dodge"` or `position = "dodge2"` to a `geom_bar()` in `ggplot2` (pick one)?
+    a. It stacks the bars on top of each other.
+    b.  It places the bars side by side for each group.
+    c. It changes the bar colors to grayscale.
+    d. It adds transparency to the bars.
+19. In the context of `ggplot2`, what is the primary difference between `geom_point()` and `geom_jitter()` (pick one)?
+    a.  `geom_jitter()` adds random noise to points to reduce overplotting.
+    b. `geom_point()` plots points, `geom_jitter()` plots lines.
+    c. `geom_point()` adds transparency, `geom_jitter()` does not.
+    d. `geom_jitter()` is used for continuous data, `geom_point()` for categorical data.
+20. Which `ggplot2` geom would you use to add a line of best fit to a scatterplot (pick one)?
     a. `geom_line()`
     b. `geom_bar()`
     c. `geom_histogram()`
     d.  `geom_smooth()`
-19. What argument would you use in geom_smooth() to specify a linear model without standard errors (pick one)?
+21. Which arguments would you use in `geom_smooth()` to specify a linear model without standard errors (pick one)?
     a.  `method = lm, se = FALSE`
     b. `model = linear, error = FALSE`
     c. `type = "linear", ci = FALSE`
     d. `fit = lm, show_se = FALSE`
-20. In `ggplot2`, which geom is designed for creating histograms (pick one)?
+22. In `ggplot2`, which geom is designed for creating histograms (pick one)?
     a.  `geom_histogram()`
     b. `geom_bar()`
     c. `geom_col()`
     d. `geom_density()`
-21. When creating a histogram, adjusting the number of bins or binwidth affects:
-    a.  The overall shape of the underlying data distribution
-    b. The size of the data points
-    c. The colors used in the plot
-    d. The labels on the x-axis
-22. What is one disadvantage of using boxplots (pick one)?
-    a. They are too colorful
-    b.  They hide the underlying distribution of the data
-    c. They take too long to compute
-    d. They cannot show outliers
-23. Which of the following is a recommended way to enhance a boxplot to show more information about the data (pick one)?
-    a. Increase the box width
-    b.  Overlay the actual data points using `geom_jitter()`
-    c. Change the box color to gradient
-    d. Remove the whiskers from the boxplot
-24. In the context of `ggplot2`, what does `stat_ecdf()` compute (pick one)?
-    a. A histogram
-    b. A scatterplot with error bars
-    c. A boxplot
-    d.  A cumulative distribution function
-25. What is geocoding in the context of mapping data (pick one)?
-    a. The process of converting latitude and longitude into place names
-    b. The process of selecting a map projection
-    c. The process of drawing map boundaries
-    d.  The process of converting place names into latitude and longitude coordinates
-
-### Tutorial {.unnumbered}
+23. When creating a histogram, what does adjusting the number of bins or the binwidth affect (pick one)?
+    a.  The apparent overall shape of the underlying data distribution.
+    b. The size of the data points.
+    c. The colors used in the plot.
+    d. The labels on the x-axis.
+24. What is one disadvantage of using boxplots (pick one)?
+    a. They are too colorful.
+    b.  They hide the underlying distribution of the data.
+    c. They take too long to compute.
+    d. They cannot show outliers.
+25. Which of the following is a recommended way to enhance a boxplot to show more information about the data (pick one)?
+    a. Increase the box width.
+    b.  Overlay the actual data points using `geom_jitter()`.
+    c. Change the box color to a gradient.
+    d. Remove the whiskers from the boxplot.
+26. In the context of `ggplot2`, what does `stat_ecdf()` compute (pick one)?
+    a. A histogram.
+    b. A scatterplot with error bars.
+    c. A boxplot.
+    d.  A cumulative distribution function.
+27. What is geocoding in the context of mapping data (pick one)?
+    a. The process of converting latitude and longitude into place names.
+    b. The process of selecting a map projection.
+    c. The process of drawing map boundaries.
+    d.  The process of converting place names into latitude and longitude coordinates.
+
+### Activity {.unnumbered}
 
 Please create a graph using `ggplot2` and a map using `ggmap` and add explanatory text to accompany both. Be sure to include cross-references and captions, etc. Each of these should take about pages.