Model Evaluation Metrics

tip

Evaluation metrics are the key tools for assessing and comparing model performance. Each metric is computed from the difference between the model's predictions on the validation set and the validation set's true labels.

Classification

Log Loss

info

This metric, also known as logarithmic loss, is an important metric for evaluating classification models, especially in binary and multi-class classification tasks; it measures the accuracy of the model's predicted probabilities.

Log Loss = -\frac{1}{N} \sum_{i=1}^{N} [y_i \log(p_i) + (1 - y_i) \log(1 - p_i)]
  • N: Sample size.

  • y_i: The actual label (0 or 1) for sample i.

  • p_i: The predicted probability that sample i belongs to class 1.

  • \log: The natural logarithm.
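The formula above can be sanity-checked in plain Python with no ML library; the labels and probabilities below are made-up example values:

```python
import math

def log_loss(y_true, y_prob):
    """Binary log loss: average negative log-likelihood of the true labels."""
    n = len(y_true)
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, y_prob)) / n

# Confident, correct predictions give a small loss...
print(log_loss([1, 0], [0.9, 0.1]))  # ≈ 0.105
# ...while confident wrong predictions are punished heavily.
print(log_loss([1, 0], [0.1, 0.9]))  # ≈ 2.303
```

Note how the loss grows without bound as a predicted probability for the true class approaches zero; this is why log loss rewards well-calibrated probabilities, not just correct labels.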

Accuracy

info

This metric measures the proportion of all samples that the model predicts correctly.

Accuracy = \frac{TP+TN}{TP+TN+FP+FN}
  • TP (True Positives): The number of samples the model correctly predicts as positive.

  • TN (True Negatives): The number of samples the model correctly predicts as negative.

  • FP (False Positives): The number of negative samples the model incorrectly predicts as positive.

  • FN (False Negatives): The number of positive samples the model incorrectly predicts as negative.
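Given the four confusion-matrix counts, accuracy is a one-line computation; the counts below are hypothetical example values:

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all samples the model classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical confusion-matrix counts: 50 TP, 35 TN, 10 FP, 5 FN.
print(accuracy(50, 35, 10, 5))  # 0.85
```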

Precision

info

This metric, precision, measures the proportion of the samples that the model predicts as positive that actually are positive.

In simple terms, it answers the question: "Of all the samples predicted to be positive, how many are correct?"

Precision = \frac{TP}{TP+FP}
  • TP (True Positives): The number of samples the model correctly predicts as positive.

  • FP (False Positives): The number of negative samples the model incorrectly predicts as positive.
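Precision can likewise be computed directly from the counts; the numbers below are hypothetical:

```python
def precision(tp, fp):
    """Of all samples predicted positive, the fraction that truly are."""
    return tp / (tp + fp)

# Hypothetical counts: 50 true positives, 10 false positives.
print(precision(50, 10))  # ≈ 0.833
```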

Recall

info

This metric, recall, measures the proportion of all actually positive samples that the model correctly predicts as positive.

In simple terms, it answers the question: "Of all the truly positive samples, how many are correctly identified by the model?"

Recall = \frac{TP}{TP+FN}
  • TP (True Positives): The number of samples the model correctly predicts as positive.

  • FN (False Negatives): The number of positive samples the model incorrectly predicts as negative.
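Recall swaps the denominator: false negatives instead of false positives. Again with hypothetical counts:

```python
def recall(tp, fn):
    """Of all truly positive samples, the fraction the model found."""
    return tp / (tp + fn)

# Hypothetical counts: 50 true positives, 5 false negatives.
print(recall(50, 5))  # ≈ 0.909
```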

F1 Score

info

This metric is the harmonic mean of Precision and Recall, so it accounts for both the accuracy (precision) and the completeness (recall) of the model's predictions.

F_1 = \frac{2TP}{2TP+FP+FN}
  • TP (True Positives): The number of samples the model correctly predicts as positive.

  • FP (False Positives): The number of negative samples the model incorrectly predicts as positive.

  • FN (False Negatives): The number of positive samples the model incorrectly predicts as negative.
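The count-based formula above is algebraically identical to the harmonic mean 2PR/(P+R) of precision and recall; the counts below are hypothetical, and the example checks both forms agree:

```python
def f1(tp, fp, fn):
    """F1 score computed directly from confusion-matrix counts."""
    return 2 * tp / (2 * tp + fp + fn)

# Same hypothetical counts as the precision/recall examples.
p, r = 50 / 60, 50 / 55                 # precision and recall
harmonic = 2 * p * r / (p + r)          # harmonic-mean form
print(f1(50, 10, 5), harmonic)          # both ≈ 0.870
```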

AUC-ROC

info

ROC stands for Receiver Operating Characteristic, and AUC stands for Area Under the Curve.

This metric measures the model's ability to distinguish between categories (usually "positive" and "negative").

(1) Calculate TPR and FPR: for each classification threshold, compute the true positive rate (TPR) and the false positive rate (FPR).

  • Horizontal axis: False Positive Rate, FPR = \frac{FP}{FP + TN}.

  • Vertical axis: True Positive Rate, TPR = \frac{TP}{TP + FN}.

  • Varying the classification threshold yields different (FPR, TPR) pairs, which are plotted as a curve.

(2) Draw the ROC curve: FPR on the horizontal axis, TPR on the vertical axis.

(3) Calculate the AUC: compute the area under the ROC curve, usually with a numerical method such as the trapezoidal rule.
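The three steps can be sketched in plain Python: sweep each distinct score as a threshold, collect (FPR, TPR) points, and integrate with the trapezoidal rule. The labels and scores below are made-up example values, and the code assumes both classes are present:

```python
def roc_auc(y_true, y_score):
    """AUC: sweep thresholds, build the (FPR, TPR) curve, integrate it."""
    thresholds = sorted(set(y_score), reverse=True)
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]              # threshold above every score
    for t in thresholds:
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    points.append((1.0, 1.0))          # threshold below every score
    # Trapezoidal rule over the (FPR, TPR) curve.
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# A model that ranks both positives above both negatives: perfect AUC of 1.0.
print(roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))  # 1.0
```

An AUC of 1.0 means the model ranks every positive above every negative; 0.5 corresponds to random ranking.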

Regression

R-square

info

This metric, called the coefficient of determination, is one of the most commonly used evaluation metrics in regression tasks. It measures the proportion of the variance in the actual values (the target variable) that is explained by the model's predictions.

R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
  • n: Sample size.

  • y_i: Actual value.

  • \hat{y}_i: Predicted value.

  • \bar{y}: The average of the actual values.
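The formula can be sanity-checked in plain Python; the actual and predicted values below are made-up examples:

```python
def r2_score(y_true, y_pred):
    """1 minus the ratio of residual sum of squares to total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot

# Near-perfect predictions on made-up data give an R^2 close to 1.
print(r2_score([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))  # ≈ 0.97
```

Always predicting the mean of the actual values gives R^2 = 0; a perfect model gives R^2 = 1, and a model worse than the mean can go negative.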

MSE

info

This metric, called Mean Squared Error, is one of the most commonly used evaluation metrics in regression tasks. It measures the average squared difference between the model's predictions and the actual values.

MSE does this by calculating the average of the square of the difference between the predicted value and the actual value. This metric is very sensitive to outliers (i.e., those predicted values that deviate far from the actual value) because the square of the difference amplifies the effect of these values.

MSE = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2
  • n: Sample size.

  • y_i: Actual value.

  • \hat{y}_i: Predicted value.
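The outlier sensitivity described above is easy to see in a small example; the values below are made up:

```python
def mse(y_true, y_pred):
    """Average squared difference between predictions and actual values."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)

# Two perfect predictions and one outlier error of 4: squaring makes
# the single outlier contribute 16, dominating the average.
print(mse([3.0, 5.0, 4.0], [3.0, 5.0, 8.0]))  # ≈ 5.33
```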

RMSE

info

This metric, called Root Mean Squared Error (RMSE), is one of the most commonly used evaluation metrics in regression tasks.

RMSE is the square root of the Mean Squared Error (MSE): it measures the typical magnitude of the prediction error, expressed in the same units as the target variable.

RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
  • n: Sample size.

  • y_i: Actual value.

  • \hat{y}_i: Predicted value.
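Since RMSE is just the square root of MSE, the same made-up data from the MSE example works here too:

```python
import math

def rmse(y_true, y_pred):
    """Square root of the MSE, in the same units as the target."""
    n = len(y_true)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n)

# Same made-up data as the MSE example (MSE ≈ 5.33).
print(rmse([3.0, 5.0, 4.0], [3.0, 5.0, 8.0]))  # ≈ 2.31
```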

MAE

info

This metric, called Mean Absolute Error, is one of the most commonly used evaluation metrics in regression tasks.

It measures the average absolute value of the difference between the predicted value of the model and the actual observed value. MAE provides an intuitive error measure that represents the average absolute deviation between the predicted value and the actual value.

MAE = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|
  • n: Sample size.

  • y_i: Actual value.

  • \hat{y}_i: Predicted value.
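Unlike MSE, MAE does not square the errors, so outliers are not amplified; the made-up data below is the same as in the MSE example for comparison:

```python
def mae(y_true, y_pred):
    """Average absolute difference between predictions and actual values."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

# Same made-up data as the MSE example: the outlier contributes
# only |8 - 4| = 4 here, instead of 16 under squaring.
print(mae([3.0, 5.0, 4.0], [3.0, 5.0, 8.0]))  # ≈ 1.33
```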