-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathshuffle.html
200 lines (195 loc) · 10 KB
/
shuffle.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
<!DOCTYPE html>
<html>
<head>
<link rel="stylesheet" href="style.css" >
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Shuffle Data</title>
<script type="text/javascript" id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-svg.js">
</script>
<script>
MathJax = {
tex: {
inlineMath: [['$', '$'], ['\\(', '\\)']]
},
svg: {
fontCache:'global'
}
};
</script>
</head>
<body>
<header class="shuffle-header">
<nav class="nav main-nav">
<ul>
<li><a href="index.html" >Home</a></li>
<!--
<li><a href="blog.html">Blog</a></li>
--!>
</ul>
</nav>
<h1 class="section-header">Card Shuffling</h1>
</header>
<section>
<h2>Motivation</h2>
</section>
<section>
<h2>Methods</h2>
One hundred Magic: The Gathering cards were "single-sleeved" with a
set of <a href="https://www.dragonshield.com/webshop/us/solid-color-sleeves/70-dragon-shield-matte-black.html" target="_blank">Dragonshield Matte Black sleeves.</a>
Each sleeve was then given a label from 1 to 100 corresponding and
the deck was arranged so that the top card was labelled #1, the
second card #2, etc., and finally the bottom card was labeled #100.
<i>Note: The "bottom card" refers to the card in contact with the table
when the deck is placed facedown onto the table.</i>
Then the deck was shuffled using one of the investigated shuffling methods, namely, "side-shuffling" or riffle shuffling.
In later studies, the effect of deck size was also investigated: the total number of cards in the deck was decreased from 100 to 80, 60, or 40.
</section>
<section>
<h2>Analysis</h2>
<p>
The goal of this exercise was to assess how many shuffles it takes
to "sufficiently randomize" a deck of cards.
Unfortunately, "sufficiently randomized" is not an easily quantifiable
metric.
One way this was assessed here, was by exploiting the <em>Shannon Entropy</em>.
The Shannon Entropy can be described as "how little information do I have about a situation."
Mathematically, the Shannon Entropy, $S$, can be described via,
\[
S \propto - \sum_{i=1}^{N_{\text{outcomes}}}p_i \log p_i
\]
where $p_i$ is the probability of outcome $i$.
In a classic example of a coin flip, one observes that the Shannon Entropy is maximized when a coin is fair, that is the probability of heads is equal to that of tails as follows.
In a coin flip, there are two possible outcomes, so $S$ becomes,
\[
S = -k p_i \log p_i + -k(1-p_i)\log (1-p_i).
\]
where k is a positive constant.
To find the value of p_i that maximizes $S$, take the derivative with respect to $p_i$ and set it equal to zero,
\[
\frac{d S}{d p_i} = -p_i \frac{1}{p_i} - \log p_i (1) -
(1-p_i)\frac{-1}{1-p_i} + \log (1-p_i) = 0
\]
Simplifying,
\[
\log \frac{p_i}{1-pi} = 0
\]
\[
p_i = 1 - p_i
\]
\[
p_i = \frac{1}{2}
\]
Then we can find a value for $k$ such that the maximal value of $S$ is $1$.
\[
S(p_i=0.5) = 1 \implies k\log{2} = 1 \implies k = \frac{1}{\log 2}
\]
so for the case of two outcomes, with the indicated maximal value for $S$ and applying the change of base formula,
\[
\boxed{
S = -\sum_{i=1}^{N_{\text{outcomes}}}p_i \log_2 p_i.
}
\]
</p>
<p>
In the general case of N possible outcomes, we can solve for the relationship between the probabilities by optimizing $S$ subject to the constraint that the sum of the probabilities is 1.
\[
\vec{\nabla} \left[ S(\{p_i\}) + \lambda \left(1-\sum_i^N p_i \right)
\right]= \vec{0}
\]
where $\vec{\nabla}$ indicates
\[
\vec{\nabla} =
\begin{pmatrix}
\frac{\partial}{\partial p_1} \\
\frac{\partial}{\partial p_2} \\
\vdots \\
\frac{\partial}{\partial p_N} \\
\frac{\partial}{\partial \lambda}
\end{pmatrix}.
\]
Resulting in $N$ expressions of the form,
\[
-\log p_i - 1 = \lambda.
\]
Setting $\lambda = \lambda$,
\[
-\log p_i -1 = -\log p_j -1 \implies p_i=p_j=p \quad \forall \quad i, j
\]
and the original constraint,
\[
\sum_i^N p_i = 1 = p \sum_i^N 1 = 1 \implies p = 1/N
\]
The Shannon Entropy is visualized in Figure 1 wherein contours of constant $S$ in a three outcome system are shaded. In this figure, $p_3$ is implicit upon specification of $p_1$ and $p_2$.
</p>
<figure>
<img class="image" src="Images/trinomialEntropy.svg">
<figcaption>
Figure 1. Shannon Entropy for a System with 3 Possible Outcomes. The maximum occurs, as expected, when $p_1 = p_2 = p_3 = \frac{1}{3}$
</figcaption>
</figure>
<p>
In the context of shuffling a deck of cards, $S$ was used to assess the probability mass function (PMF) of the location of a card within the deck after $x$ shuffles.
Say, for instance, we know a certain card is on the bottom of the deck. When we shuffle the deck once, we can usually make a decent guess as to where the card is in the new arrangment of cards. That is, the probability that the card is in one location, say in the bottom 10-20 cards is greater than it suddenly having jumped to the top of the deck.
This non-uniform probability mass function (PMF) is precisely what is captured by $S$.
</p>
<p>
As we continue to shuffle, we intuitively "lose track" of where a particular card went, and would be equally likely to find it at any location within the deck.
Figure 2 illustrates this loss of information for a card initially known to located in the middle of the deck, position 50, after shuffling with a side shuffle the indicate number of times.
After the first shuffle, the PMF contains two peaks corresponding to the top and bottom of the deck in accordance with whether the deck was cut just before or just after the 50th card before beginning the side shuffle, respectively.
<i>
Note: in frequency trail plots, each curve is offset by a set amount to give an impression of the progression (or in this case regression) of a distribution with respect to some dependent variable. This means that the absolute "y"-values for two curves with different shading ought be compared with caution.
</i>
<figure>
<img class="image" src="Images/ProbabilityRedistributionPos50.svg">
<figcaption>
Figure 2. Frequency trail of the evolution of the PMF of card initially located at the 50th position in the deck for 167 side-shuffles. After 5 shuffles, the PMF is relatively uniform.
</figcaption>
</figure>
Figures 3 through 6 show much the same trend, with a marked decay in information on the whereabouts of the initial card.
<figure>
<img class="image" src="Images/ProbabilityRedistributionPos1.svg">
<figcaption>
Figure 3. Frequency trail for a card initially at the 1st position.
</figcaption>
</figure>
<figure>
<img class="image" src="Images/ProbabilityRedistributionPos25.svg">
<figcaption>
Figure 4. Frequency trail for a card initially at the 25th position.
</figcaption>
</figure>
<figure>
<img class="image" src="Images/ProbabilityRedistributionPos75.svg">
<figcaption>
Figure 5. Frequency trail for a card initially at the 75th position.
</figcaption>
</figure>
<figure>
<img class="image" src="Images/ProbabilityRedistributionPos100.svg">
<figcaption>
Figure 6. Frequency trail for a card initially at the 100th position.
</figcaption>
</figure>
</p>
<p>
Additionally, the PMF associated with how many cards were in the top pile after cutting the deck in two halves was compared to the established model of Gilber-Shannon-Reeds (Figure 7.). This model predicts the probability of having $k$ cards in the top pile after cutting an $n$-card deck to be,
\[
p(k) =
\begin{pmatrix}
n\\
k
\end{pmatrix}
2^{-n}.
\]
Curiously, my data are suggestive of a significantly sharper peaked distribution. To assess the confidence in the distribution of my data, my next step will be to perform bootstrap sampling and obtain error bars for each $p(k)$.
<figure>
<img class="image" src="Images/cut.svg">
<figcaption>
Figure 7. Probability of having N cards in the top half of the deck after cutting.
</figcaption>
</figure>
</p>
</section>
</body>
</html>