In the following section, we will use spam filtering as a running example, where the challenge is to detect spam messages among normal messages (also called ham).
For a binary classification, Negative means that the data we have (raw or predicted) answers no to the question we would like to answer, and Positive corresponds to a yes.
For instance, for a spam filter, the question is "Is it spam?". In that case, negative stands for normal emails, and positive for spam messages.
True and False correspond to the prediction we have made: if the classification is correct, it is true, otherwise false. If an email is classified as spam and really is spam, the prediction for this message is true; the same goes for a normal email classified as normal.
When talking about a true positive, the positive/negative term is relative to the classification (and not to the original class). A true positive is a spam message detected as spam, whereas a false positive is a normal message that has wrongly been labeled as spam.
We can only compute these values if we have labelled data, so mostly in supervised learning, using a training and a test set.
To summarize, considering the spam question:

| | False | True |
|---|---|---|
| Negative | Spam received | Email received |
| Positive | Email blocked | Spam blocked |
NB: it is unacceptable to lose normal emails, so False Positives is the value to reduce first.
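As a quick sketch, the four cells of this table can be counted directly from labelled data. The labels below are hypothetical, with "spam" as the positive class:

```python
# Hypothetical labelled data: "spam" is the positive class, "ham" the negative one.
y_true = ["spam", "spam", "ham", "ham", "ham", "spam"]   # real classes
y_pred = ["spam", "ham",  "ham", "spam", "ham", "spam"]  # filter's predictions

pairs = list(zip(y_true, y_pred))
tp = sum(t == "spam" and p == "spam" for t, p in pairs)  # spam blocked
fn = sum(t == "spam" and p == "ham" for t, p in pairs)   # spam received
fp = sum(t == "ham" and p == "spam" for t, p in pairs)   # email blocked
tn = sum(t == "ham" and p == "ham" for t, p in pairs)    # email received

print(tp, fn, fp, tn)
```

All the metrics below are simple ratios of these four counts.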
Sensitivity / True Positive rate / Recall
Probability of detecting all spam messages as spam (you miss none of them). 1: all spam messages have been found. 0: none of them has been correctly classified.
NB: it does not take normal messages classified as spam into account. You can reach a value of 1 by classifying everything as spam!
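A minimal sketch of the computation, with hypothetical counts (2 spam caught, 1 missed), including the degenerate everything-is-spam case:

```python
# Sensitivity / recall = TP / (TP + FN): fraction of real spam that was caught.
def recall(tp, fn):
    return tp / (tp + fn)

print(recall(2, 1))  # one spam message out of three was missed

# Degenerate case: classify *everything* as spam -> no false negatives,
# so recall is 1.0 even though every normal email gets blocked too.
print(recall(3, 0))
```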
Specificity / True Negative Rate/ Negative Recall
Same as sensitivity, but for ham. 1: all real ham messages have been detected as ham. 0: none of the ham messages has been detected.
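The same kind of sketch for specificity, again with hypothetical counts:

```python
# Specificity / true negative rate = TN / (TN + FP):
# fraction of real ham that is correctly let through.
def specificity(tn, fp):
    return tn / (tn + fp)

print(specificity(2, 1))
```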
Precision
Capacity to detect only spam. 1: everything you detected is spam. 0: nothing you classified as spam is real spam, only normal messages.
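A sketch with hypothetical counts (2 real spam flagged, 1 normal email wrongly flagged):

```python
# Precision = TP / (TP + FP): among the messages flagged as spam,
# the fraction that really is spam.
def precision(tp, fp):
    return tp / (tp + fp)

print(precision(2, 1))
```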
Fall-out:
Probability of classifying a normal message (ham) as spam. 0: good, no ham blocked. 1: very bad.
NB: Fall-out + Specificity = 1.
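A sketch checking, on hypothetical counts, that fall-out is the complement of specificity:

```python
# Fall-out / false positive rate = FP / (FP + TN):
# fraction of real ham that gets wrongly flagged as spam.
def fall_out(fp, tn):
    return fp / (fp + tn)

def specificity(tn, fp):
    return tn / (tn + fp)

# Fall-out is the complement of specificity.
assert abs(fall_out(1, 2) + specificity(2, 1) - 1.0) < 1e-9
print(fall_out(1, 2))
```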
Miss Rate:
Probability of classifying spam as ham. 0: good, no spam classified as ham. 1: very bad.
NB: Miss rate + Sensitivity (recall) = 1.
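A sketch checking, on hypothetical counts, that the miss rate is the complement of recall:

```python
# Miss rate / false negative rate = FN / (FN + TP):
# fraction of real spam that slips through as ham.
def miss_rate(fn, tp):
    return fn / (fn + tp)

def recall(tp, fn):
    return tp / (tp + fn)

# Miss rate is the complement of recall.
assert abs(miss_rate(1, 2) + recall(2, 1) - 1.0) < 1e-9
print(miss_rate(1, 2))
```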
Accuracy:
Probability of making a correct classification. 1: good. 0: bad.
Error Rate:
Proportion of classification errors. 0: good. 1: bad.
Error Rate + Accuracy = 1
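A sketch of both, on hypothetical counts, verifying that they sum to 1:

```python
# Accuracy = (TP + TN) / total; the error rate is its complement.
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def error_rate(tp, tn, fp, fn):
    return (fp + fn) / (tp + tn + fp + fn)

acc = accuracy(2, 2, 1, 1)   # hypothetical counts
err = error_rate(2, 2, 1, 1)
print(acc, err)
assert abs(acc + err - 1.0) < 1e-9
```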
F-score
It can be useful to have a single value instead of several to summarize, if we are just interested in a global score.
F (or F1) is the harmonic mean of precision and recall.
More generally, the F-beta score weights recall beta times as much as precision, to take recall more (or less) into account.
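A sketch of F1 and the more general F-beta score; the precision and recall values are hypothetical:

```python
# F_beta = (1 + beta^2) * P * R / (beta^2 * P + R).
# beta = 1 gives the usual F1 (harmonic mean of precision and recall);
# beta > 1 weights recall more heavily.
def f_beta(precision, recall, beta=1.0):
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

p, r = 0.5, 0.8  # hypothetical precision and recall
print(f_beta(p, r))           # F1
print(f_beta(p, r, beta=2))   # F2: higher here because recall > precision
```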
G-score
G stands for geometric mean.
[TODO]
For multi-class classification (c stands for class), you can summarize the above values into a single F-score, for instance by averaging the per-class scores.
In the case of: