From d2c6be3d9c2d8ea8bd01e54e48a61e0d0386ef2e Mon Sep 17 00:00:00 2001
From: Rohan Alexander
Date: Sun, 13 Oct 2024 09:55:58 -0400
Subject: [PATCH] Finalize chapter 1 mcq

---
 00-errata.qmd | 2 +-
 01-introduction.qmd | 110 ++-
 05-static_communication.qmd | 18 +-
 docs/00-errata.html | 2 +-
 docs/01-introduction.html | 83 +-
 docs/02-drinking_from_a_fire_hose.html | 18 +-
 docs/05-static_communication.html | 18 +-
 .../figure-html/fig-readioovertime-1.png | Bin 110392 -> 110316 bytes
 docs/09-clean_and_prepare.html | 6 +-
 docs/11-eda.html | 16 +-
 docs/23-assessment.html | 880 +++++++++---------
 docs/24-interaction.html | 12 +-
 docs/search.json | 10 +-
 docs/sitemap.xml | 6 +-
 ..._cd8148f91ef9529ea3c54ed1dc68fee4.rda.meta | Bin 345 -> 345 bytes
 15 files changed, 603 insertions(+), 578 deletions(-)

diff --git a/00-errata.qmd b/00-errata.qmd
index cccf3974..25ec6ba9 100644
--- a/00-errata.qmd
+++ b/00-errata.qmd
@@ -10,7 +10,7 @@ Chapman and Hall/CRC published this book in July 2023. You can purchase that [he
 This online version has some updates to what was printed. An online version that matches the print version is available [here](https://rohanalexander.github.io/telling_stories-published/).
 :::
 
-*Last updated: 12 October 2024.*
+*Last updated: 13 October 2024.*
 
 The book was reviewed by Piotr Fryzlewicz in *The American Statistician* [@Fryzlewicz2024] and Nick Cox on [Amazon](https://www.amazon.com/gp/customer-reviews/R3S602G9RUDOF/ref=cm_cr_dp_d_rvw_ttl?ie=UTF8&ASIN=1032134771). I am grateful they gave such a lot of their time to provide the review, as well as their corrections and suggestions.
diff --git a/01-introduction.qmd b/01-introduction.qmd
index 15c519b6..2ad02bee 100644
--- a/01-introduction.qmd
+++ b/01-introduction.qmd
@@ -4,6 +4,10 @@ engine: knitr
 
 # Telling stories with data {#sec-introduction}
 
+::: {.callout-note}
+Chapman and Hall/CRC published this book in July 2023. You can purchase that [here](https://www.routledge.com/Telling-Stories-with-Data-With-Applications-in-R/Alexander/p/book/9781032134772). This online version has some updates to what was printed.
+:::
+
 **Prerequisites**
 
 - Read *Counting the Countless*, [@keyes2019]
@@ -194,90 +198,94 @@ Ultimately, we are all just telling stories with data, but these stories are inc
 ### Quiz {.unnumbered}
 
 1. What is data science (in your own words)?
-2. Based on @register2020, data decisions impact (pick one)?
+2. From @register2020, data decisions impact (pick one)?
     a. Real people.
     b. No one.
     c. Those in the training set.
     d. Those in the test set.
-3. Based on @keyes2019, what is data science (pick one)?
-    a. The inhumane reduction of humanity down to what can be counted.
+3. From @keyes2019, what is data science (pick one)?
+    a. Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from many structured and unstructured data.
     b. The quantitative analysis of large amounts of data for the purpose of decision-making.
-    c. Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from many structured and unstructured data.
-4. Based on @keyes2019, what is one consequence of data systems that require standardized categories?
-    a. Improved user experience
-    b. Enhanced security measures
-    c. Erasure of individual identities and experiences
-    d. Increased innovation in technology
-5. Based on @kieranskitchen, what criticism about working with quantitative data is addressed?
+    c. The inhumane reduction of humanity down to what can be counted.
+4. From @keyes2019, what is one consequence of data systems that require standardized categories?
+    a. Worse user experience.
+    b. Compromised security measures.
+    c. Increased innovation in technology.
+    d. Erasure of individual identities and experiences.
+5. From @kieranskitchen, what is a common criticism about working with data?
     a. That it is too time-consuming and inefficient.
-    b. That it requires expensive equipment.
-    c. That it distances one from the reality of human lives behind the numbers.
-    d. That it is only suitable for scientists.
-6. Based on @kieranskitchen, what response to the criticism that quantitative data inures people to human realities is made?
-    a. We should stop analyzing data.
-    b. Working with data forces a confrontation with questions of meaning.
+    b. That it distances one from the reality of human lives behind the numbers.
+    c. That it requires expensive software and extensive training to analyse.
+6. From @kieranskitchen, what is a response to that criticism?
+    a. Working with data forces a confrontation with questions of meaning.
+    b. Data analysis should not be done.
     c. Data should only be analyzed by automated processes.
     d. Qualitative approaches should be the predominate approach.
 7. How can you reconcile @keyes2019 and @kieranskitchen?
-8. Why is ethics a key element of *Telling Stories with Data* (pick one)?
+8. Why is ethics a key element of data science (pick one)?
     a. Because data science always involves sensitive personal information.
-    b. Because datasets likely concern humans and require careful consideration of their context.
-    c. Because ethical considerations make the analysis more complex.
+    b. Because ethical considerations make the analysis easier to do.
+    c. Because datasets likely concern humans and require consideration of context.
     d. Because regulations require ethics approval for any data analysis.
 9. According to @crawford, as described in this chapter, which of the following forces shape our world, and hence our data (select all that apply)?
-    a. Political.
-    b. Historical.
-    c. Cultural.
-    d. Social.
-10. Consider the results of a survey that asked about gender. It finds the following counts: "man: 879", "woman: 912", "non-binary: 10" "prefer not to say: 3", and "other: 1". What is the appropriate way to consider "prefer not to say" (pick one)?
+    a. Political.
+    b. Physical.
+    c. Historical.
+    d. Cultural.
+    e. Social.
+10. From @nottomford, what is a compiler (pick one)?
+    a. Software that takes the symbols you typed into a file and transforms them into lower-level instructions.
+    b. A sequence of symbols (using typical keyboard characters, saved to a file of some kind) that someone typed in, or copied, or pasted from elsewhere.
+    c. A clock with benefits.
+    d. Putting holes in punch cards, then into a box, then loading them, then the computer flips through the cards, identifies where the holes were, and updates parts of its memory.
+11. Consider the results of a survey that asked about gender. It finds the following counts: "man: 879", "woman: 912", "non-binary: 10" "prefer not to say: 3", and "other: 1". What is the appropriate way to consider "prefer not to say" (pick one)?
     a. Drop them.
-    b. Merge it into "other".
+    b. It depends.
     c. Include them.
-    d. It depends.
-11. Imagine that you have a job in which including race and/or sexuality as predictors improves the performance of your model. When deciding whether to include these in your analysis, what factors would you consider (in your own words)?
-12. In *Telling Stories with Data* what is meant by reproducibility in data science (pick one)?
+    d. Merge it into "other".
+12. Imagine that you have a job in which including race and/or sexuality as predictors improves the performance of your model. When deciding whether to include these in your analysis, what factors would you consider (in your own words)?
+13. What is meant by reproducibility in data science (pick one)?
     a. Being able to produce similar results with different datasets.
     b. Ensuring that all steps of the analysis can be independently redone by others.
     c. Publishing results in peer-reviewed journals.
     d. Using proprietary software to protect data.
-13. What challenge is associated with the measurement and data collection stage (pick one)?
+14. What is a challenge associated with measurement (pick one)?
     a. It is usually straightforward and requires little attention.
-    b. Measurements are always accurate and consistent over time.
-    c. Data collection is entirely objective and free from bias.
-    d. Deciding what to measure and how to measure it is complex and context-dependent.
-14. In the analogy to the sculptor, what does the act of sculpting represent in the data workflow (pick one)?
+    b. Deciding what and how to measure is complex and context-dependent.
+    c. Data collection is objective and free from bias.
+    d. Measurements are always accurate and consistent over time.
+15. In the analogy to the sculptor, what does the act of sculpting represent in the data workflow (pick one)?
     a. Creating complex models to fit the data.
     b. Acquiring raw data.
     c. Cleaning and preparing the data to reveal the needed dataset.
-    d. Visualizing the final results.
-15. Why is exploratory data analysis (EDA) considered an open-ended process (pick one)?
+    d. Visualizing the results.
+16. Why is exploratory data analysis (EDA) an open-ended process (pick one)?
     a. Because it has a fixed set of steps to follow.
-    b. Because it involves testing hypotheses in a structured way.
-    c. Because it requires ongoing iteration to understand the data's shape and patterns.
-    d. Because it can be automated with modern software.
-16. Why should statistical models be used carefully (pick one)?
+    b. Because it requires ongoing iteration to understand the data's shape and patterns.
+    c. Because it involves testing hypotheses in a structured way.
+    d. Because it can be automated.
+17. Why should statistical models be used carefully (pick one)?
     a. Because they always provide definitive results.
     b. Because they can reflect the decisions made in earlier stages.
     c. Because they are too complicated for most audiences.
     d. Because they are unnecessary if the data are well-presented.
-17. What is one of the key messages from measuring height (pick one)?
+18. What is one lesson from thinking about the challenges of measuring height (pick one)?
     a. Height is a straightforward measurement with little variability.
-    b. All measurements are accurate if done with modern tools.
+    b. All measurements are accurate if done with the right instrument.
     c. Even simple measurements can have complexities that affect data quality.
     d. Height is not a useful variable in data analysis.
-18. What is the danger of not considering who is missing from a dataset (pick one)?
+19. What is the danger of not considering who is missing from a dataset (pick one)?
     a. It has no significant impact on the analysis.
-    b. It can lead to conclusions that do not represent the full context.
-    c. It simplifies the analysis by reducing the amount of data.
-19. What is the primary purpose of statistical modeling (pick one)?
-    a. To prove hypotheses.
-    b. As a tool to help explore and understand the data.
+    b. It simplifies the analysis by reducing the amount of data.
+    c. It can lead to conclusions that do not represent the full context.
+20. What is a purpose of statistical modeling (pick one)?
+    a. As a tool to help explore and understand the data.
+    b. To prove hypotheses.
     c. To replace exploratory data analysis.
-20. What is meant by "our data are a simplification of the messy, complex world" (pick one)?
+21. What is meant by "our data are a simplification of the messy, complex world" (pick one)?
     a. Data perfectly capture all aspects of reality.
-    b. Data are always inaccurate and useless.
-    c. Data simplify reality to make analysis possible, but they cannot capture every detail.
-
+    b. Data simplify reality to make analysis possible, but they cannot capture every detail.
+    c. Data are always inaccurate and useless.
 ### Activity {.unnumbered}
diff --git a/05-static_communication.qmd b/05-static_communication.qmd
index 40536683..8cef031f 100644
--- a/05-static_communication.qmd
+++ b/05-static_communication.qmd
@@ -1634,7 +1634,7 @@ beps |>
     b. `geom_bar()`
     c. `geom_col()`
     d. `geom_density()`
-23. When creating a histogram, adjusting the number of bins or binwidth affects:
+23. What does adjusting the number of bins, or changing the binwidth, affect for a histogram (pick one)?
     a. The overall shape of the underlying data distribution.
     b. The size of the data points.
     c. The colors used in the plot.
@@ -1644,21 +1644,21 @@ beps |>
     b. They hide the underlying distribution of the data.
     c. They take too long to compute.
     d. They cannot show outliers.
-25. Which of the following is a recommended way to enhance a boxplot to show more information about the data (pick one)?
+25. How can you deal with that disadvantage (pick one)?
     a. Increase the box width.
     b. Overlay the actual data points using `geom_jitter()`.
-    c. Change the box color to gradient.
+    c. Add colors for each category.
     d. Remove the whiskers from the boxplot.
-26. In the context of `ggplot2`, what does `stat_ecdf()` compute (pick one)?
+26. What does `stat_ecdf()` compute (pick one)?
     a. A histogram.
     b. A scatterplot with error bars.
     c. A boxplot.
     d. A cumulative distribution function.
-27. What is geocoding in the context of mapping data (pick one)?
-    a. The process of converting latitude and longitude into place names.
-    b. The process of selecting a map projection.
-    c. The process of drawing map boundaries.
-    d. The process of converting place names into latitude and longitude coordinates.
+27. What is geocoding (pick one)?
+    a. Converting latitude and longitude into place names.
+    b. Picking a map projection.
+    c. Drawing map boundaries.
+    d. Converting place names into latitude and longitude.
 ### Activity {.unnumbered}
diff --git a/docs/00-errata.html b/docs/00-errata.html
index 1541cd6a..e71e4e61 100644
--- a/docs/00-errata.html
+++ b/docs/00-errata.html
@@ -478,7 +478,7 @@

 Errors and updates
-Last updated: 12 October 2024.
+Last updated: 13 October 2024.
 The book was reviewed by Piotr Fryzlewicz in The American Statistician (Fryzlewicz 2024) and Nick Cox on Amazon. I am grateful they gave such a lot of their time to provide the review, as well as their corrections and suggestions.
 Since the publication of this book in July 2023, there have been a variety of changes in the world. The rise of generative AI has changed the way that people code, Python has become easier to integrate alongside R because of Quarto, and packages continue to update (not to mention a new cohort of students has started going through the book). One advantage of having an online version is that I can make improvements.
 I am grateful for the corrections and suggestions of: Andrew Black, Clay Ford, Crystal Lewis, David Jankoski, Donna Mulkern, Emi Tanaka, Emily Su, Inessa De Angelis, James Wade, Julia Kim, Krishiv Jain, Seamus Ross, Tino Kanngiesser, and Zak Varty.

diff --git a/docs/01-introduction.html b/docs/01-introduction.html
index 70b61504..9de0c6e8 100644
--- a/docs/01-introduction.html
+++ b/docs/01-introduction.html
@@ -471,6 +471,16 @@

+Chapman and Hall/CRC published this book in July 2023. You can purchase that here. This online version has some updates to what was printed.
 Prerequisites