Simply:
categorical_crossentropy
(cce
) produces a one-hot array containing the probable match for each category,sparse_categorical_crossentropy
(scce
) produces a category index of the most likely matching category.
Consider a classification problem with 5 categories (or classes).
- In the case of
cce
, the one-hot target may be[0, 1, 0, 0, 0]
and the model may predict[.2, .5, .1, .1, .1]
(probably right) - In the case of
scce
, the target index may be [1] and the model may predict: [.5].
Consider now a classification problem with 3 classes.
- In the case of
cce
, the one-hot target might be[0, 0, 1]
and the model may predict[.5, .1, .4]
(probably inaccurate, given that it gives more probability to the first class) - In the case of
scce
, the target index might be[0]
, and the model may predict[.5]
Many categorical models produce scce
output because you save space, but lose A LOT of information (for example, in the 2nd example, index 2 was also very close.) I generally prefer cce
output for model reliability.
There are a number of situations to use scce
, including:
- when your classes are mutually exclusive, i.e. you don’t care at all about other close-enough predictions,
- the number of categories is large to the prediction output becomes overwhelming.