Model Evaluation Metrics
Evaluation metrics are key tools for assessing and comparing model performance. Each metric is computed by comparing the model's predictions on the validation set with the true labels of the validation set.
Classification
Log Loss
This metric, also known as logarithmic loss or cross-entropy loss, is an important metric for evaluating classification models in both binary and multi-class tasks. Rather than checking only whether the predicted class is correct, it measures how well the model's predicted probabilities match the true labels: confident but wrong predictions are penalized heavily, and lower values are better.
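For binary classification:
$$\text{Log Loss} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log(p_i) + (1 - y_i)\log(1 - p_i)\right]$$
- $N$: Sample size.
- $y_i$: The actual label for sample $i$ (0 or 1).
- $p_i$: The probability that sample $i$ is predicted to be class 1.
- $\log$: Natural logarithm.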
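The binary formula can be sketched in plain Python as follows. This is a minimal illustration, not the implementation used by any particular library; the function name `log_loss` and the clipping constant `eps` are assumptions added here, the latter to keep the logarithm finite when a predicted probability is exactly 0 or 1.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Binary log loss: -(1/N) * sum(y*log(p) + (1-y)*log(1-p)).

    y_true: labels in {0, 1}; p_pred: predicted probabilities of class 1.
    eps is an assumed clipping constant, not part of the formula itself.
    """
    if not y_true or len(y_true) != len(p_pred):
        raise ValueError("y_true and p_pred must be non-empty and equally long")
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clip p into [eps, 1 - eps] so math.log never receives 0
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


print(round(log_loss([1, 0, 1], [0.9, 0.1, 0.8]), 4))  # 0.1446
```

Without clipping, a single prediction of exactly 0 for a positive sample would make the loss infinite.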
Accuracy
This metric is called accuracy, and it measures the proportion of all samples that the model predicts correctly.
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
- $TP$ (True Positives): The number of positive samples that the model correctly predicts as positive.
- $TN$ (True Negatives): The number of negative samples that the model correctly predicts as negative.
- $FP$ (False Positives): The number of negative samples that the model incorrectly predicts as positive.
- $FN$ (False Negatives): The number of positive samples that the model incorrectly predicts as negative.
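A minimal sketch of accuracy computed from the four confusion-matrix counts, assuming binary labels where `1` marks the positive class. The helper names `confusion_counts` and `accuracy` are hypothetical and chosen only for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for binary labels."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if t == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(confusion_counts(y_true, y_pred))  # (2, 2, 1, 1)
print(accuracy(y_true, y_pred))          # 0.6666666666666666
```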
Precision
This metric is called precision, and it measures the proportion of samples predicted as positive by the model that are actually positive.
In simple terms, it answers the question: "Out of all the samples predicted to be positive, how many are correct?"
$$\text{Precision} = \frac{TP}{TP + FP}$$
- $TP$ (True Positives): The number of positive samples that the model correctly predicts as positive.
- $FP$ (False Positives): The number of negative samples that the model incorrectly predicts as positive.
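A minimal sketch of precision in plain Python, assuming binary labels with `1` as the positive class. The function name is hypothetical, and returning 0.0 when the model makes no positive predictions (so that $TP + FP = 0$) is a convention assumed here; the formula itself is undefined in that case.

```python
def precision(y_true, y_pred, positive=1):
    # TP: predicted positive and actually positive
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    # FP: predicted positive but actually negative
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    if tp + fp == 0:
        return 0.0  # assumed convention: no positive predictions -> 0.0
    return tp / (tp + fp)


y_true = [1, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1]
print(precision(y_true, y_pred))  # 0.5
```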
Recall
This metric is called recall (also known as sensitivity or the true positive rate), and it measures the proportion of actually positive samples that the model correctly predicts as positive.
In simple terms, it answers the question: "Of all the truly positive samples, how many are correctly identified by the model?"
$$\text{Recall} = \frac{TP}{TP + FN}$$
- $TP$ (True Positives): The number of positive samples that the model correctly predicts as positive.
- $FN$ (False Negatives): The number of positive samples that the model incorrectly predicts as negative.
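A matching sketch for recall, under the same assumptions (binary labels, `1` as the positive class, hypothetical function name). Returning 0.0 when there are no actual positives ($TP + FN = 0$) is again a convention chosen here.

```python
def recall(y_true, y_pred, positive=1):
    # TP: actually positive and predicted positive
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    # FN: actually positive but predicted negative
    fn = sum(1 for t, p in zip(y_true, y_pred) if p != positive and t == positive)
    if tp + fn == 0:
        return 0.0  # assumed convention: no actual positives -> 0.0
    return tp / (tp + fn)


y_true = [1, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1]
print(recall(y_true, y_pred))  # 0.3333333333333333
```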
F1 Score
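This metric is the harmonic mean of precision and recall, so it takes into account both the exactness (precision) and the completeness (recall) of the model's positive predictions. Because the harmonic mean is pulled toward the smaller of the two values, the F1 score is high only when both precision and recall are high.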
$$F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} = \frac{2TP}{2TP + FP + FN}$$
- $TP$ (True Positives): The number of positive samples that the model correctly predicts as positive.
- $FP$ (False Positives): The number of negative samples that the model incorrectly predicts as positive.
- $FN$ (False Negatives): The number of positive samples that the model incorrectly predicts as negative.
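A minimal sketch of the F1 score computed directly from the counts via $2TP / (2TP + FP + FN)$, which equals the harmonic mean of precision and recall whenever both are nonzero. The function name and the 0.0 fallback for an empty denominator are assumptions for illustration.

```python
def f1_score(y_true, y_pred, positive=1):
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if p != positive and t == positive)
    denom = 2 * tp + fp + fn
    if denom == 0:
        return 0.0  # assumed convention: no positives at all -> 0.0
    # Counts-based form avoids computing precision and recall separately
    return 2 * tp / denom


y_true = [1, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1]
print(f1_score(y_true, y_pred))  # 0.4  (precision 0.5, recall 1/3)
```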
AUC-ROC
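ROC stands for Receiver Operating Characteristic, and AUC stands for Area Under the Curve.
This metric measures the model's ability to distinguish between classes (usually "positive" and "negative"). An AUC of 1.0 means the model ranks every positive sample above every negative one, while 0.5 corresponds to random guessing.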
(1) Calculate TPR and FPR: By varying the classification threshold, calculate the true positive rate (TPR) and false positive rate (FPR) at each threshold.
- Horizontal axis: False Positive Rate, $FPR = \frac{FP}{FP + TN}$.
- Vertical axis: True Positive Rate, $TPR = \frac{TP}{TP + FN}$.
- Each threshold yields one (FPR, TPR) pair; sweeping the threshold from high to low traces out the curve.
(2) Draw the ROC curve: Plot FPR on the horizontal axis and TPR on the vertical axis.
(3) Calculate the AUC: Compute the area under the ROC curve, usually numerically, for example with the trapezoidal rule.
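The three steps can be sketched as follows. This minimal version assumes binary labels (`1` = positive), real-valued scores where higher means "more likely positive", and that a sample is predicted positive when its score is at or above the threshold. Using every distinct score as a threshold and the function names `roc_points` and `auc_trapezoid` are choices made for illustration, not taken from the source.

```python
def roc_points(y_true, scores):
    """Sweep the threshold over each distinct score (high to low) and
    return the (FPR, TPR) pairs, starting at (0, 0)."""
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both positive and negative samples")
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        # Predict positive when score >= thr
        tp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y != 1)
        points.append((fp / neg, tp / pos))  # (FPR, TPR)
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


points = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(points)                 # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc_trapezoid(points))  # 0.75
```

Tied scores share one threshold and so produce a diagonal segment, which the trapezoidal rule credits with half its area. This version rescans the data at every threshold for clarity; a more efficient implementation would sort the scores once.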
Regression
R-square
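This metric, called the coefficient of determination, is one of the most commonly used evaluation metrics in regression tasks. It measures the proportion of the variance in the actual values (the target variable) that is explained by the model's predictions. A value of 1 indicates perfect predictions, a value of 0 means the model does no better than always predicting the mean of the actual values, and the value can be negative for models that do worse than that.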
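A minimal sketch using the standard definition $R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$, where $SS_{res} = \sum_i (y_i - \hat{y}_i)^2$ is the residual sum of squares and $SS_{tot} = \sum_i (y_i - \bar{y})^2$ is the total sum of squares around the mean. The function name is hypothetical, and raising an error for a constant target (where $SS_{tot} = 0$) is a choice made here.

```python
def r_squared(y_true, y_pred):
    n = len(y_true)
    mean_y = sum(y_true) / n
    # Residual sum of squares: error left over after the model's predictions
    ss_res = sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred))
    # Total sum of squares: variability of the actual values around their mean
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1 - ss_res / ss_tot


print(round(r_squared([3, -0.5, 2, 7], [2.5, 0.0, 2, 8]), 4))  # 0.9486
```

Predicting the mean for every sample gives $SS_{res} = SS_{tot}$ and therefore $R^2 = 0$.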