diff --git a/2-keras.md b/2-keras.md
index 084d0291..1a4a66c9 100644
--- a/2-keras.md
+++ b/2-keras.md
@@ -479,7 +479,7 @@ For the one-hot encoding that we selected before a fitting loss function is the
 In Keras this is implemented in the `keras.losses.CategoricalCrossentropy` class.
 This loss function works well in combination with the `softmax` activation function we chose earlier.
-The Categorical Crossentropy works by comparing the probabilities that the
+The *categorical cross-entropy* works by comparing the probabilities that the
 neural network predicts with 'true' probabilities that we generated using the
 one-hot encoding. This is a measure for how close the distribution of the three
 neural network outputs corresponds to the distribution of the three values in
 the one-hot encoding. It is lower if the distributions are more similar.
@@ -519,6 +519,14 @@ history = model.fit(X_train, y_train, epochs=100)
 
 The fit method returns a history object that has a history attribute with the
 training loss and potentially other metrics per training epoch.
+
+### Setting baseline expectations
+What might be a good value for the loss here? In a classification context, we can establish a baseline by determining the loss we would get from simply guessing each class at random. If predictions are completely random, the categorical cross-entropy loss is approximately log(n), where n is the number of classes. Any useful model should outperform this baseline, i.e. achieve a lower loss.
+```python
+import numpy as np
+np.log(3)
+```
+
 It can be very insightful to plot the training loss to see how the training progresses.
 Using seaborn we can do this as follow:
 ```python
diff --git a/fig/03_tensorboard.png b/fig/03_tensorboard.png
old mode 100755
new mode 100644
diff --git a/fig/04_conv_image.png b/fig/04_conv_image.png
old mode 100755
new mode 100644
diff --git a/md5sum.txt b/md5sum.txt
index a50325f0..99aafea6 100644
--- a/md5sum.txt
+++ b/md5sum.txt
@@ -6,7 +6,7 @@
 "links.md" "8184cf4149eafbf03ce8da8ff0778c14" "site/built/links.md" "2024-06-04"
 "paper.md" "9a65da74572113c228e0c225f62e7b78" "site/built/paper.md" "2024-06-04"
 "episodes/1-introduction.Rmd" "70b9b736d80b1f243d82bfae89537b0e" "site/built/1-introduction.md" "2024-06-04"
-"episodes/2-keras.Rmd" "88fb676ec63d72886c544e305e66cd4a" "site/built/2-keras.md" "2024-06-04"
+"episodes/2-keras.Rmd" "6c31e04a20e52d4a156f629cd2f6c5af" "site/built/2-keras.md" "2024-06-04"
 "episodes/3-monitor-the-model.Rmd" "d9a73639b67c9c4a149a28d1df067dd2" "site/built/3-monitor-the-model.md" "2024-06-04"
 "episodes/4-advanced-layer-types.Rmd" "a62388b287760267459c61e677d3379d" "site/built/4-advanced-layer-types.md" "2024-06-04"
 "episodes/5-transfer-learning.Rmd" "03f95721c1981d0fdc19e3d1f5da35ec" "site/built/5-transfer-learning.md" "2024-06-04"
@@ -19,4 +19,4 @@
 "learners/reference.md" "ae95aeca6d28f5f0f994d053dc10d67c" "site/built/reference.md" "2024-06-04"
 "learners/setup.md" "8833a7eec970679022fca980623323ff" "site/built/setup.md" "2024-06-04"
 "profiles/learner-profiles.md" "698c27136a1a320b0c04303403859bdc" "site/built/learner-profiles.md" "2024-06-04"
-"renv/profiles/lesson-requirements/renv.lock" "645b9b8534f309dd234b877a28ce71e8" "site/built/renv.lock" "2024-06-04"
+"renv/profiles/lesson-requirements/renv.lock" "38d3eab909262adbaf1bf0ffe6f2fce7" "site/built/renv.lock" "2024-06-04"
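
The baseline section added by this diff claims that random guessing yields a categorical cross-entropy of about log(n). A minimal sketch (plain NumPy, variable names hypothetical) confirms this for the three-class case: a model that always predicts the uniform distribution incurs a loss of -log(1/3) = log(3) on every sample, whichever class is true.

```python
import numpy as np

n_classes = 3
# Uniform prediction: probability 1/n for every class
uniform_prediction = np.full(n_classes, 1.0 / n_classes)
# One-hot 'true' label for an arbitrary sample (hypothetical)
one_hot_truth = np.array([0.0, 1.0, 0.0])

# Categorical cross-entropy: -sum(true * log(predicted))
loss = -np.sum(one_hot_truth * np.log(uniform_prediction))
print(loss)        # ~1.0986
print(np.log(3))   # same value: the random-guessing baseline
```

Because the prediction is uniform, the result is independent of which entry of `one_hot_truth` is 1, so log(3) is the expected loss for any class distribution under uniform guessing.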