Update hunt and gather mcq
RohanAlexander committed Oct 13, 2024
1 parent 4290281 commit 7cf316b
Showing 16 changed files with 807 additions and 566 deletions.
42 changes: 38 additions & 4 deletions 07-gather.qmd
@@ -1492,15 +1492,15 @@ In general the result is not too bad. OCR is a useful tool but is not perfect an

## Exercises

### Scales {.unnumbered}
### Practice {.unnumbered}

1. *(Plan)* Consider the following scenario: *A group of five undergraduates---Matt, Ash, Jacki, Rol, and Mike---each read some number of pages from a book each day for 100 days. Two of the undergraduates are a couple and so their number of pages is positively correlated, however all the others are independent.* Please sketch what that dataset could look like and then sketch a graph that you could build to show all observations.
2. *(Simulate)* Please further consider the scenario described and simulate the situation. Please include five tests based on the simulated data. Submit a link to a GitHub Gist that contains your code.
3. *(Acquire)* Please describe a possible source of such a dataset.
4. *(Explore)* Please use `ggplot2` to build the graph that you sketched using the data that you simulated. Submit a link to a GitHub Gist that contains your code.
5. *(Communicate)* Please write two paragraphs about what you did.

### Questions {.unnumbered}
### Quiz {.unnumbered}

1. Which of the following best describes an API in the context of data gathering (pick one)?
a. A standardized set of functions to process data locally.
@@ -1628,8 +1628,42 @@ un_data |>
b. The programming language used.
c. The quality and resolution of the scanned image.
d. The number of pages in the document.

### Tutorial {.unnumbered}
20. Based on @cirone, which of the following is NOT a common threat to inference when working with historical data (pick one)?
a. Selection bias.
b. Confirmation bias.
c. Time decay.
d. Over-representation of marginalized groups.
21. Based on @cirone, what is the "drunkard's search" problem in historical political economy (and more generally) (pick one)?
a. Selecting data that is easiest to access without considering representativeness.
b. Searching for data only from elite sources.
c. Over-relying on digital archives for research.
d. Misinterpreting historical texts due to modern biases.
22. Based on @cirone, what role do Directed Acyclic Graphs (DAGs) play in historical data analysis (pick one)?
a. They improve the accuracy of OCR for historical data.
b. They generate machine-readable text from historical sources.
c. They help researchers visualize and address causal relationships.
d. They serve as metadata to organize historical archives.
23. Based on @Johnson2021Two, what was the focus of early prison data collection by the U.S. Census Bureau in the 19th century (pick one)?
a. Documenting health conditions.
b. Investigating racial differences in sentencing.
c. Recording socioeconomic background and employment.
d. Counting the number of incarcerated people and their demographics.
24. Based on @Johnson2021Two, how does community-sourced prison data differ from state-sourced prison data (pick one)?
a. Community data is collected by government officials.
b. Community data emphasizes lived experiences and prison conditions.
c. State data is less reliable than community data.
d. State data is more reliable than community data.
25. Based on @Johnson2021Two, which of the following limitations does reliance on state-sourced data impose (pick one)?
a. State-sourced data is less reliable than academic studies.
b. It under-represents the prison population.
c. It may reproduce the biases and assumptions of earlier data collections and preclude much of our ability to see the whole picture.
d. It focuses on nonviolent offenders only.
26. Based on @Johnson2021Two, what question should we ask when looking at prison data collection (pick one)?
a. "Who established the data infrastructure and why?".
b. "How do the economic factors affect prison management?".
c. "Is the data being used to create public policy?".

### Activity {.unnumbered}

Please redo the web scraping example, but for one of: [Australia](https://en.wikipedia.org/wiki/List_of_prime_ministers_of_Australia), [Canada](https://en.wikipedia.org/wiki/List_of_prime_ministers_of_Canada), [India](https://en.wikipedia.org/wiki/List_of_prime_ministers_of_India), or [New Zealand](https://en.wikipedia.org/wiki/List_of_prime_ministers_of_New_Zealand).

145 changes: 111 additions & 34 deletions 08-hunt.qmd

Large diffs are not rendered by default.

24 changes: 12 additions & 12 deletions docs/02-drinking_from_a_fire_hose.html
@@ -834,18 +834,18 @@ <h3 data-number="2.2.2" class="anchored" data-anchor-id="simulate"><span class="
<span id="cb5-13"><a href="#cb5-13" aria-hidden="true" tabindex="-1"></a>simulated_data</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 151 × 2
Division Party
&lt;int&gt; &lt;chr&gt;
1 1 National
2 2 National
3 3 Other
4 4 Liberal
5 5 Green
6 6 National
7 7 Labor
8 8 Liberal
9 9 Other
10 10 Other
Division Party
&lt;int&gt; &lt;chr&gt;
1 1 Green
2 2 Green
3 3 Labor
4 4 Other
5 5 Labor
6 6 Liberal
7 7 Green
8 8 Other
9 9 Labor
10 10 Labor
# ℹ 141 more rows</code></pre>
</div>
</div>
2 changes: 1 addition & 1 deletion docs/03-workflow.html
@@ -1540,7 +1540,7 @@ <h3 data-number="3.6.5" class="anchored" data-anchor-id="parallel-processing"><s
</div>
<div class="sourceCode cell-code" id="cb39"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb39-1"><a href="#cb39-1" aria-hidden="true" tabindex="-1"></a><span class="fu">toc</span>()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Second bit of code: 3.007 sec elapsed</code></pre>
<pre><code>Second bit of code: 3.008 sec elapsed</code></pre>
</div>
</div>
<p>And so we know that there is something slowing down the code. (In this artificial case it is <code>Sys.sleep()</code> causing a delay of three seconds.)</p>
66 changes: 57 additions & 9 deletions docs/07-gather.html
@@ -492,9 +492,9 @@ <h2 id="toc-title">Table of contents</h2>
</ul></li>
<li><a href="#exercises" id="toc-exercises" class="nav-link" data-scroll-target="#exercises"><span class="header-section-number">7.5</span> Exercises</a>
<ul class="collapse">
<li><a href="#scales" id="toc-scales" class="nav-link" data-scroll-target="#scales">Scales</a></li>
<li><a href="#questions" id="toc-questions" class="nav-link" data-scroll-target="#questions">Questions</a></li>
<li><a href="#tutorial" id="toc-tutorial" class="nav-link" data-scroll-target="#tutorial">Tutorial</a></li>
<li><a href="#practice" id="toc-practice" class="nav-link" data-scroll-target="#practice">Practice</a></li>
<li><a href="#quiz" id="toc-quiz" class="nav-link" data-scroll-target="#quiz">Quiz</a></li>
<li><a href="#activity" id="toc-activity" class="nav-link" data-scroll-target="#activity">Activity</a></li>
</ul></li>
</ul>
<div class="toc-actions"><ul><li><a href="https://github.com/RohanAlexander/telling_stories/edit/main/07-gather.qmd" class="toc-action"><i class="bi bi-github"></i>Edit this page</a></li></ul></div></nav>
@@ -2191,8 +2191,8 @@ <h3 data-number="7.4.3" class="anchored" data-anchor-id="optical-character-recog
</section>
<section id="exercises" class="level2" data-number="7.5">
<h2 data-number="7.5" class="anchored" data-anchor-id="exercises"><span class="header-section-number">7.5</span> Exercises</h2>
<section id="scales" class="level3 unnumbered">
<h3 class="unnumbered anchored" data-anchor-id="scales">Scales</h3>
<section id="practice" class="level3 unnumbered">
<h3 class="unnumbered anchored" data-anchor-id="practice">Practice</h3>
<ol type="1">
<li><em>(Plan)</em> Consider the following scenario: <em>A group of five undergraduates—Matt, Ash, Jacki, Rol, and Mike—each read some number of pages from a book each day for 100 days. Two of the undergraduates are a couple and so their number of pages is positively correlated, however all the others are independent.</em> Please sketch what that dataset could look like and then sketch a graph that you could build to show all observations.</li>
<li><em>(Simulate)</em> Please further consider the scenario described and simulate the situation. Please include five tests based on the simulated data. Submit a link to a GitHub Gist that contains your code.</li>
@@ -2201,8 +2201,8 @@ <h3 class="unnumbered anchored" data-anchor-id="scales">Scales</h3>
<li><em>(Communicate)</em> Please write two paragraphs about what you did.</li>
</ol>
</section>
<section id="questions" class="level3 unnumbered">
<h3 class="unnumbered anchored" data-anchor-id="questions">Questions</h3>
<section id="quiz" class="level3 unnumbered">
<h3 class="unnumbered anchored" data-anchor-id="quiz">Quiz</h3>
<ol type="1">
<li>Which of the following best describes an API in the context of data gathering (pick one)?
<ol type="a">
@@ -2363,10 +2363,58 @@ <h3 class="unnumbered anchored" data-anchor-id="questions">Questions</h3>
<li>The quality and resolution of the scanned image.</li>
<li>The number of pages in the document.</li>
</ol></li>
<li>Based on <span class="citation" data-cites="cirone">Cirone and Spirling (<a href="99-references.html#ref-cirone" role="doc-biblioref">2021</a>)</span>, which of the following is NOT a common threat to inference when working with historical data (pick one)?
<ol type="a">
<li>Selection bias.</li>
<li>Confirmation bias.</li>
<li>Time decay.</li>
<li>Over-representation of marginalized groups.</li>
</ol></li>
<li>Based on <span class="citation" data-cites="cirone">Cirone and Spirling (<a href="99-references.html#ref-cirone" role="doc-biblioref">2021</a>)</span>, what is the “drunkard’s search” problem in historical political economy (and more generally) (pick one)?
<ol type="a">
<li>Selecting data that is easiest to access without considering representativeness.</li>
<li>Searching for data only from elite sources.</li>
<li>Over-relying on digital archives for research.</li>
<li>Misinterpreting historical texts due to modern biases.</li>
</ol></li>
<li>Based on <span class="citation" data-cites="cirone">Cirone and Spirling (<a href="99-references.html#ref-cirone" role="doc-biblioref">2021</a>)</span>, what role do Directed Acyclic Graphs (DAGs) play in historical data analysis (pick one)?
<ol type="a">
<li>They improve the accuracy of OCR for historical data.</li>
<li>They generate machine-readable text from historical sources.</li>
<li>They help researchers visualize and address causal relationships.</li>
<li>They serve as metadata to organize historical archives.</li>
</ol></li>
<li>Based on <span class="citation" data-cites="Johnson2021Two">Johnson (<a href="99-references.html#ref-Johnson2021Two" role="doc-biblioref">2021</a>)</span>, what was the focus of early prison data collection by the U.S. Census Bureau in the 19th century (pick one)?
<ol type="a">
<li>Documenting health conditions.</li>
<li>Investigating racial differences in sentencing.</li>
<li>Recording socioeconomic background and employment.</li>
<li>Counting the number of incarcerated people and their demographics.</li>
</ol></li>
<li>Based on <span class="citation" data-cites="Johnson2021Two">Johnson (<a href="99-references.html#ref-Johnson2021Two" role="doc-biblioref">2021</a>)</span>, how does community-sourced prison data differ from state-sourced prison data (pick one)?
<ol type="a">
<li>Community data is collected by government officials.</li>
<li>Community data emphasizes lived experiences and prison conditions.</li>
<li>State data is less reliable than community data.</li>
<li>State data is more reliable than community data.</li>
</ol></li>
<li>Based on <span class="citation" data-cites="Johnson2021Two">Johnson (<a href="99-references.html#ref-Johnson2021Two" role="doc-biblioref">2021</a>)</span>, which of the following limitations does reliance on state-sourced data impose (pick one)?
<ol type="a">
<li>State-sourced data is less reliable than academic studies.</li>
<li>It under-represents the prison population.</li>
<li>It may reproduce the biases and assumptions of earlier data collections and preclude much of our ability to see the whole picture.</li>
<li>It focuses on nonviolent offenders only.</li>
</ol></li>
<li>Based on <span class="citation" data-cites="Johnson2021Two">Johnson (<a href="99-references.html#ref-Johnson2021Two" role="doc-biblioref">2021</a>)</span>, what question should we ask when looking at prison data collection (pick one)?
<ol type="a">
<li>“Who established the data infrastructure and why?”.</li>
<li>“How do the economic factors affect prison management?”.</li>
<li>“Is the data being used to create public policy?”.</li>
</ol></li>
</ol>
</section>
<section id="tutorial" class="level3 unnumbered">
<h3 class="unnumbered anchored" data-anchor-id="tutorial">Tutorial</h3>
<section id="activity" class="level3 unnumbered">
<h3 class="unnumbered anchored" data-anchor-id="activity">Activity</h3>
<p>Please redo the web scraping example, but for one of: <a href="https://en.wikipedia.org/wiki/List_of_prime_ministers_of_Australia">Australia</a>, <a href="https://en.wikipedia.org/wiki/List_of_prime_ministers_of_Canada">Canada</a>, <a href="https://en.wikipedia.org/wiki/List_of_prime_ministers_of_India">India</a>, or <a href="https://en.wikipedia.org/wiki/List_of_prime_ministers_of_New_Zealand">New Zealand</a>.</p>
<p>Plan, gather, and clean the data, and then use it to create a similar table to the one created above. Write a few paragraphs about your findings. Then write a few paragraphs about the data source, what you gathered, and how you went about it. What took longer than you expected? When did it become fun? What would you do differently next time you do this? Your submission should be at least two pages, but likely more.</p>
<p>Use Quarto, and include an appropriate title, author, date, link to a GitHub repo, and citations. Submit a PDF.</p>
Binary file modified docs/07-gather_files/figure-html/fig-readioovertime-1.png