In the following section, we take spam filtering as an example, where the challenge is to detect spam messages among normal messages (also called ham).

1 What true / false and negative / positive mean

For a binary classification, negative means that the data we have (raw or predicted) says no to the question we would like to answer, and positive corresponds to a yes.

For instance, for spam filters, the question is "Is it spam?". In that case, negative stands for normal emails, and positive for spam messages.

True and false describe the prediction we have made. If the classification is correct, it is true; otherwise it is false. If an email is classified as spam and really is spam, the result for this message is true; the same holds for a normal email classified as normal.

When talking about a true positive, the positive/negative term refers to the predicted class (and not to the original class). A true positive is a spam message detected as spam, whereas a false positive is a message classified as spam that is in fact a normal email.

We can only know these values if we have labelled data, so mostly in supervised learning, using a training and a test set.

To summarize, considering the spam question:

            False            True
Negative    Spam received    Email received
Positive    Email blocked    Spam blocked
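The four outcomes can be counted directly from labelled data. The following is a minimal sketch, assuming labels are plain strings and that "spam" is the positive class; the function name `confusion_counts` and the sample labels are hypothetical and only serve as an illustration.

```python
def confusion_counts(y_true, y_pred, positive="spam"):
    """Count TP, FP, TN, FN for a binary problem."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # true positive: spam blocked
            else:
                fp += 1  # false positive: normal email blocked
        else:
            if actual == positive:
                fn += 1  # false negative: spam received
            else:
                tn += 1  # true negative: normal email received
    return tp, fp, tn, fn


# Hypothetical labels, for illustration only.
y_true = ["spam", "ham", "spam", "ham", "ham", "spam"]
y_pred = ["spam", "ham", "ham", "spam", "ham", "spam"]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(tp, fp, tn, fn)  # 2 1 2 1
```

The positive/negative part of each name follows the prediction, and the true/false part says whether that prediction matches the label.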

NB: Failing to receive a normal email is unacceptable, so the number of false positives is the value to reduce the most.

2 Derived ranking values

Sensitivity / True Positive rate / Recall

Proportion of spam messages that are detected as spam. 1: all the spam messages have been found (you miss none of them). 0: none of them has been correctly classified.

NB: It doesn't take into account normal messages classified as spam. You can get a value of 1 by classifying everything as spam!

\frac{True Positives}{True Positives + False Negatives}
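As a small illustration of the recall formula and of the caveat about flagging everything, here is a sketch in Python. The function name, the counts, and the choice to return 0.0 when there is no spam at all are assumptions made for this example.

```python
def recall(tp, fn):
    """TP / (TP + FN). Returns 0.0 when there is no actual spam (assumed convention)."""
    total = tp + fn
    return tp / total if total else 0.0


# Hypothetical counts: 10 spam messages, 8 detected and 2 missed.
print(recall(tp=8, fn=2))   # 0.8

# A filter that flags every message as spam misses none of them,
# so its recall is perfect even though it blocks all normal email.
print(recall(tp=10, fn=0))  # 1.0
```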

Specificity / True Negative Rate / Negative Recall

Same as sensitivity, but for ham. 1: all ham messages have been classified as ham. 0: none of the ham messages has been classified as ham.

\frac{True Negatives}{True Negatives + False Positives}
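The specificity formula can be sketched the same way. The function name and the counts below are hypothetical, and returning 0.0 when there is no ham is an assumed convention.

```python
def specificity(tn, fp):
    """TN / (TN + FP). Returns 0.0 when there is no actual ham (assumed convention)."""
    total = tn + fp
    return tn / total if total else 0.0


# Hypothetical counts: 50 ham messages, 45 delivered and 5 wrongly blocked.
print(specificity(tn=45, fp=5))  # 0.9

# A filter that flags everything as spam delivers no ham at all.
print(specificity(tn=0, fp=50))  # 0.0
```

Looking at specificity next to sensitivity exposes the "classify everything as spam" strategy, which scores 1 on one and 0 on the other.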


Precision / Positive Predictive Value

Capacity to detect only spam. 1: everything you have classified as spam is really spam. 0: nothing you have classified as spam is real spam, only normal messages.

\frac{True Positives}{True Positives + False Positives}
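A short sketch of precision, with hypothetical counts, shows why it complements recall: flagging everything gives perfect recall but poor precision. The function name and the 0.0 fallback when nothing is flagged are assumptions for this example.

```python
def precision(tp, fp):
    """TP / (TP + FP). Returns 0.0 when nothing was flagged as spam (assumed convention)."""
    total = tp + fp
    return tp / total if total else 0.0


# Hypothetical counts: 10 messages flagged, 8 of them really spam.
print(precision(tp=8, fp=2))    # 0.8

# Flagging all 100 messages of a mailbox that contains 10 spam messages:
# every spam is caught, but most of what was flagged is normal email.
print(precision(tp=10, fp=90))  # 0.1
```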

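Fall-out:

Probability of classifying ham as spam (false positive rate). 0: good, no normal message is blocked. 1: very bad, every normal message is blocked.

\frac{False Positives}{False Positives + True Negatives}

NB: Fall-out + Specificity = 1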

Miss Rate:

Probability of classifying spam as ham. 0: good, no spam classified as ham. 1: very bad, every spam message is classified as ham.

\frac{False Negatives}{False Negatives + True Positives}

NB: Miss Rate + Recall = 1
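Using the standard definitions, miss rate is FN / (FN + TP) and fall-out is FP / (FP + TN), so each one is the complement of a success rate computed over the same messages. The sketch below checks this numerically; the function names and counts are hypothetical, and the counts are assumed to be non-zero so that no division by zero occurs.

```python
def miss_rate(fn, tp):
    """False negative rate: FN / (FN + TP)."""
    return fn / (fn + tp)


def fall_out(fp, tn):
    """False positive rate: FP / (FP + TN)."""
    return fp / (fp + tn)


# Hypothetical counts, for illustration only.
tp, fn = 8, 2    # 10 actual spam messages
tn, fp = 85, 5   # 90 actual ham messages

recall = tp / (tp + fn)
specificity = tn / (tn + fp)

# Each error rate is the complement of the matching success rate.
print(abs(miss_rate(fn, tp) + recall - 1.0) < 1e-12)      # True
print(abs(fall_out(fp, tn) + specificity - 1.0) < 1e-12)  # True
```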

Accuracy:

Probability of making a correct classification. 1: good. 0: bad.

\frac{True Positives + True Negatives}{True Positives + True Negatives + False Positives + False Negatives}

Error Rate:

Proportion of classification errors. 0: good. 1: bad.

\frac{False Positives + False Negatives}{True Positives + True Negatives + False Positives + False Negatives}

NB: Error Rate + Accuracy = 1
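Accuracy and error rate share the same denominator (all classified messages), which is why they sum to 1. A minimal sketch with hypothetical counts and function names:

```python
def accuracy(tp, tn, fp, fn):
    """Share of messages classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def error_rate(tp, tn, fp, fn):
    """Share of messages classified incorrectly."""
    return (fp + fn) / (tp + tn + fp + fn)


# Hypothetical counts for a mailbox of 100 messages.
counts = dict(tp=8, tn=85, fp=5, fn=2)

acc = accuracy(**counts)
err = error_rate(**counts)
print(acc, err)  # 0.93 0.07
```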


It can be useful to have a single value instead of several, if we are just interested in a global score.

F-score:

F_1 is the harmonic mean of precision and recall.

F_1 = 2 \cdot \frac{Recall \cdot Precision}{Recall + Precision}

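More generally, F_{\beta} weights recall \beta times as much as precision: \beta > 1 takes recall more into account, \beta < 1 takes precision more into account.

F_{\beta} = (1 + \beta^2) \cdot \frac{Recall \cdot Precision}{\beta^2 \cdot Precision + Recall}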
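The effect of \beta is easier to see on numbers. The sketch below implements the F_{\beta} formula as written; the function name, the sample precision and recall, and the 0.0 fallback for a zero denominator are assumptions for this example.

```python
def f_beta(precision, recall, beta=1.0):
    """(1 + beta^2) * P * R / (beta^2 * P + R); 0.0 if the denominator is zero."""
    b2 = beta ** 2
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0


# Hypothetical filter: catches every spam (recall 1.0),
# but only half of what it flags is spam (precision 0.5).
p, r = 0.5, 1.0

f1 = f_beta(p, r)                 # about 0.667
f2 = f_beta(p, r, beta=2.0)       # about 0.833, recall counts more
f_half = f_beta(p, r, beta=0.5)   # about 0.556, precision counts more
print(round(f1, 3), round(f2, 3), round(f_half, 3))
```

For a spam filter, where blocking a normal email is the costly error, a \beta below 1 is the more natural choice since it rewards precision.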


G-measure:

G stands for geometric: it is the geometric mean of precision and recall.

G = \sqrt{Precision \cdot Recall}
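For comparison, here is a small sketch that computes G next to F_1 on the same hypothetical precision and recall. The function names and values are assumptions for illustration. Since a geometric mean is never smaller than the harmonic mean of the same two numbers, G is at least as large as F_1.

```python
import math


def g_measure(precision, recall):
    """Geometric mean of precision and recall."""
    return math.sqrt(precision * recall)


def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Hypothetical values, for illustration only.
p, r = 0.5, 1.0
g = g_measure(p, r)   # about 0.707
f = f1_score(p, r)    # about 0.667
print(round(g, 3), round(f, 3))
```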

3 Multi-class classification


For multi-class classification, you can summarize the above values into a single F score. First compute F_1 for each class:

F_{1,c} = \frac{2 \cdot precision_c \cdot recall_c}{precision_c + recall_c}

The index c stands for class c.

The per-class scores are then combined into a weighted sum:

\mu F_1 = \sum_c w_c \cdot F_{1,c}

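The weights w_c depend on the kind of average:

  • macro: every class has the same weight, where L is the set of classes
    w_c = \frac{1}{|L|}
  • μ (text classification): each class is weighted by its share of the documents in the dataset D
    w_c = \frac{|\{ u \in D \mid u = c \}|}{|D|}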
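The two weighting schemes can be compared on a toy example. This is a sketch under stated assumptions: the function names and labels are hypothetical, each class is scored one-vs-rest, only classes present in the true labels are considered, and a class with no correct prediction gets an F_1 of 0.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Return {class: F1}, treating each class in turn as the positive one."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for c in sorted(set(y_true)):
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        total = precision + recall
        scores[c] = 2 * precision * recall / total if total else 0.0
    return scores


def averaged_f1(scores, weights):
    """Weighted sum of the per-class F1 scores."""
    return sum(weights[c] * scores[c] for c in scores)


# Hypothetical labels: class "a" is frequent, "b" and "c" are rare.
y_true = ["a", "a", "a", "a", "b", "c"]
y_pred = ["a", "a", "a", "b", "b", "a"]
scores = f1_per_class(y_true, y_pred)

# Macro: same weight 1 / |L| for every class.
macro_w = {c: 1 / len(scores) for c in scores}
# Class-frequency weights: share of the documents in each class.
freq = Counter(y_true)
freq_w = {c: freq[c] / len(y_true) for c in scores}

macro_f1 = averaged_f1(scores, macro_w)     # about 0.472
weighted_f1 = averaged_f1(scores, freq_w)   # about 0.611
print(round(macro_f1, 3), round(weighted_f1, 3))
```

The macro score is pulled down by the rare class "c", which is never predicted, while the frequency-weighted score is dominated by the well-classified frequent class "a".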