
Model Report

How to obtain the model report

In addition to generating an ML model, the platform gathers notable insights and findings about the dataset and the ML task during the ML process and compiles them into a model report, which helps you better understand the data and the task.

You can click “Download report” to obtain the model report, as shown in the following figure:

image

After the download finishes, you obtain a report_*.zip file. Decompress it with an archive tool to obtain the model report.

info

For an example of a downloaded model report, please refer to Appendix 2: Example Model Report.

Model Report Interpretation

The platform automatically generates a compressed model report. After decompression, its contents are roughly as follows:

image

This includes the original image folder and report files in different formats (Word, PDF, and Markdown are supported).

info

Below we show how to navigate the report, using the model report of an example classification task.

The report provided by this platform covers three aspects: the feature distribution of the training data, model visualization, and various metrics on the validation data.

Feature Spatial Distribution

Note that the following metrics, including the various "importance" metrics, do not imply that features are interchangeable, because advanced features are interpretations of the data from different perspectives.

We provide various feature-validity analysis methods, which can be broadly divided into before-training and after-training methods. For example, before training we can select highly relevant features to reduce model complexity; after training, if we use a tree model, we obtain an importance value for each feature, which can be used to iteratively adjust the trained model. (To keep the report brief, we analyze and output only the top portion of the data.)

1.Data Analysis Before Training

1.1 Heat Map

The heat map shows the correlation between features. Shown here are the correlations among the top 10 features by model feature importance, together with the label.

The correlation coefficient calculated here is the Pearson correlation coefficient, and its formula is as follows:

\rho_{X,Y} = \frac{\text{cov}(X,Y)}{\sigma_X\sigma_Y}

The Pearson correlation coefficient measures the linear correlation between two variables and ranges from -1 to 1. Specifically:

  • 1 indicates a perfect positive correlation: when one variable increases, the other increases in equal proportion.

  • 0 means no correlation: there is no linear relationship between the two variables.

  • -1 indicates a perfect negative correlation: when one variable increases, the other decreases in equal proportion.

image
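As a sketch, the coefficient above can be computed directly from its definition in pure Python (the sample data below is made up for illustration):

```python
import math

def pearson(xs, ys):
    """Pearson correlation: cov(X, Y) / (sigma_X * sigma_Y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

# A perfectly linear relationship gives rho close to 1.
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))   # ~1.0
# A perfect negative relationship gives rho close to -1.
print(pearson([1, 2, 3, 4], [8, 6, 4, 2]))   # ~-1.0
```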

1.2 Kernel Density Estimation (KDE)

info

Kernel Density Estimation (KDE) is a nonparametric method for estimating a probability density function. It places a kernel (usually a Gaussian kernel) around each data point and then sums the kernels to form a smooth estimate of the probability density function.

At present, we choose the advanced features or original features with the largest feature contribution for kernel density estimation.

Kernel Density Estimation maps can help you understand the distribution of individual variables, including peaks and the shape of the distribution. The data can be analyzed from the following two aspects:

1.Peak value: The peak represents the height of the highest point in the probability density estimate plot.

  • In KDE, the higher the peak, the higher the density of data points at that location, that is, the more concentrated the data points near that location.

  • The height of the peak does not directly give a probability value, but can be used to compare data density at different locations.

2.Overall shape: The overall shape of the probability density estimate map reflects the trend of the data distribution.

  • The smoothness and volatility of the graph can be used to judge the variability of the data.

  • For example, a flat KDE plot indicates that the data is relatively evenly distributed, while a graph with peaks and fluctuations may indicate that the data is more dense in some areas.

image
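The estimation described above can be sketched in a few lines of Python: each data point contributes one Gaussian kernel, and the kernels are averaged (the data and bandwidth below are made up for illustration):

```python
import math

def gaussian_kde(data, bandwidth):
    """Return an estimated density function: one Gaussian kernel per point, summed and normalized."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2 * math.pi))
    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)
    return density

data = [1.0, 1.2, 1.1, 3.0, 3.1]
f = gaussian_kde(data, bandwidth=0.3)
# Density is higher near the cluster around 1.1 than in the gap around 2.0.
print(f(1.1) > f(2.0))   # True
```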

1.3 Box Plot

info

Box plot is an effective tool for visualizing data distribution and outliers.

At present, we choose the advanced feature or the original feature with the greatest feature contribution to draw the boxplot.

When analyzing a boxplot, you can focus on the following key elements:

(1)Box: The box shows the quartile range of the data, i.e. the middle 50% of the data.

  • The bottom and top of the box indicate the 1st quartile (Q1, lower quartile) and the 3rd quartile (Q3, upper quartile), respectively, while the line inside the box indicates the median (Q2).

(2)Whiskers: Whiskers extend from both ends of the box to the maximum and minimum values of the data, excluding outliers.

  • The length of the whiskers is usually based on the distribution of the data, and the specific calculation method may vary.

(3)Outliers: In a boxplot, data points that fall more than 1.5 times the interquartile range (IQR) beyond the quartiles are usually defined as outliers and are represented by individual points.

  • An outlier may indicate an anomaly in the data set.

(4)Overall shape: Looking at the overall shape of the boxplot can provide information about the distribution of the data.

  • For example, the length and position of the box, the extension of the whiskers, and the distribution of outliers.

image

1.4 Feature and Label Curves

info

Checking whether trends in features and labels are consistent can help build more efficient, interpretable, and generalizable machine learning models.

At present, we select the advanced features or original features with the largest feature contribution for curve drawing, and display the first 100 samples.

We normalize features to the range [1, 2] and labels to the range [0, 1].

image
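The normalization above is a simple min-max rescaling; a sketch (the target ranges [1, 2] and [0, 1] follow the report, the data is made up):

```python
def rescale(values, lo, hi):
    """Min-max rescale `values` into the range [lo, hi]."""
    vmin, vmax = min(values), max(values)
    span = vmax - vmin
    return [lo + (hi - lo) * (v - vmin) / span for v in values]

feature = [10.0, 15.0, 20.0]
label = [0, 2, 1]

print(rescale(feature, 1.0, 2.0))   # [1.0, 1.5, 2.0]
print(rescale(label, 0.0, 1.0))     # [0.0, 1.0, 0.5]
```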

2.Feature Analysis After Training

2.1 Feature importance

Feature importance measures the degree to which each feature contributes to the tree model. The following is an overview of the feature-importance scores for common tree models.

(1)LightGBM: LightGBM uses a tree-based learning algorithm, and feature importance is mainly based on Split Gain.

(2)CatBoost: CatBoost is also a tree-based learning algorithm, and its feature importance calculation is similar to LightGBM, mainly based on split gain.

(3)XGBoost: XGBoost assesses the importance of a feature by calculating its Gain.

 XGBoost also provides a way to calculate feature importance based on Coverage, which measures the relative number of samples affected by splits on each feature.

(4)Random Forest: Random Forest uses indicators such as Gini impurity or information gain to select the best split feature.

 The sum or average of these impurity decreases across all trees can be used to measure the importance of the feature.

(5)Extra Trees: Extremely randomized trees are a variant of random forests that use more randomness when splitting nodes. Feature importance is calculated in a manner similar to that of a random forest.

info

When a tree model splits, it may use the same feature several times; the feature's importance is then a weighted aggregate over these nodes. Different models use different strategies.

image
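The split-gain idea behind these importance scores can be illustrated by computing the Gini impurity decrease of a single split; a toy sketch on made-up data (real models aggregate such gains over all splits that use each feature):

```python
def gini(labels):
    """Gini impurity of a set of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def split_gain(feature, labels, threshold):
    """Impurity decrease when splitting on `feature <= threshold`."""
    left = [y for x, y in zip(feature, labels) if x <= threshold]
    right = [y for x, y in zip(feature, labels) if x > threshold]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted

feature = [1, 2, 3, 10, 11, 12]
labels = [0, 0, 0, 1, 1, 1]
# A perfect split removes all impurity: the gain equals the parent Gini (0.5).
print(split_gain(feature, labels, threshold=5))   # 0.5
```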

3.How to Analyze the Data

The general manual build process for machine learning is as follows:

(1)Data analysis to understand the distribution of data, such as:

  • Check for redundancy: In general, if the absolute value of the correlation coefficient between two features in the heat map is greater than 0.8, feature selection should be considered carefully, since one of the features may be redundant.

  • View the correlation between features and labels: In general, the higher the correlation coefficient between a feature and the label in the heat map, the more important the feature is likely to be.

  • Look at the probability density distribution of features: Determine whether the data is concentrated, which may result in low discriminative power.

  • Other methods.

(2)Feature engineering

 For example, if the correlation between feature A and feature B is very low, a simple combination of A and B (such as addition, subtraction, multiplication, or division) can produce a feature C, and C may be of great help to the model.

 To take a simple example, let's say we measure obesity. We have height (h) and weight (w); w ÷ h² is a feature discovered through feature combination.

 This feature contributes significantly to the classification: it is BMI (Body Mass Index), which gives it a physical meaning as a measure of obesity.
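A minimal sketch of this kind of feature combination (the height and weight values are made up):

```python
heights = [1.60, 1.75, 1.90]   # metres
weights = [55.0, 80.0, 95.0]   # kilograms

# Combine two raw features into one meaningful derived feature: BMI = w / h^2.
bmi = [w / h ** 2 for h, w in zip(heights, weights)]
print([round(b, 1) for b in bmi])   # [21.5, 26.1, 26.3]
```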

(3)Model tuning

Model tuning is mainly divided into two parts: selecting the most suitable model and selecting the best model parameters.

Feature engineering and model tuning generally go hand in hand; simply put, different data sets correspond to different optimal models, so these two steps take a lot of time and effort.

info

Data analysis using ChangTianML:

1.Once you have a valid generated feature combination, you can reason backward about the physical meaning of the new feature. Taking BMI (w ÷ h²) as an example: to account for obesity at different heights, we divide by height; introducing the squared term makes the index more sensitive to changes in height, so it better reflects the ratio of weight to height.

2.If you need more detailed data analysis, you can use the changtianml tool.

Model Visualization

warning

Model visualization currently supports only LightGBM. If the final model is not LightGBM, these images may not be generated. Visualizations of other models are under development, so stay tuned!

1.Structure Visualization of Tree Model

info

Importance of tree model visualization:

(1) As an interpretable model, you can clearly see the split conditions of each decision node and the predicted results of the leaf nodes. This helps to understand how the model makes predictions based on input features, making the working process of the model more transparent and explainable.

(2) You can understand the impact of features on the prediction result, such as whether feature A places the data into the correct category at a certain threshold.

(3) A visual tree model helps explain how the model works to non-specialists, stakeholders, or team members.

The following describes the display of various tree models.

Interpretation of tree model diagrams

  • Select a feature X_i; the node splits according to a threshold S_i. Samples with X_i less than or equal to S_i go to the left branch, and the rest go to the right.

  • If the split yields a sufficiently good result, splitting stops; if not, another feature is selected and splitting continues.

  • Repeat the two steps above until all samples have been assigned.
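The splitting rules above can be sketched as a tiny hand-built tree: each internal node stores a feature index and a threshold, and prediction follows the left branch when the feature value is at most the threshold. The tree and sample below are illustrative, not learned by the platform:

```python
class Node:
    def __init__(self, feature=None, threshold=None, left=None, right=None, label=None):
        self.feature, self.threshold = feature, threshold
        self.left, self.right = left, right
        self.label = label   # set only on leaf nodes

def predict(node, sample, path=None):
    """Follow the split rules down to a leaf; also record the prediction path."""
    path = path if path is not None else []
    if node.label is not None:
        return node.label, path
    branch = "left" if sample[node.feature] <= node.threshold else "right"
    path.append((node.feature, node.threshold, branch))
    child = node.left if branch == "left" else node.right
    return predict(child, sample, path)

# x0 <= 5 ? -> class 0 ; else: x1 <= 2 ? -> class 1 : class 2
tree = Node(feature=0, threshold=5,
            left=Node(label=0),
            right=Node(feature=1, threshold=2,
                       left=Node(label=1),
                       right=Node(label=2)))

label, path = predict(tree, [7, 1])
print(label, path)   # 1 [(0, 5, 'right'), (1, 2, 'left')]
```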

Vertical expansion of the tree model

image

Horizontal expansion of the tree model

image

Simplified expansion of the tree model

image

A shallow expansion of the tree model

image

2.Prediction Path of Decision Tree

info

The prediction path of the decision tree visualizes the process by which the model determines which category a particular sample belongs to.

The full prediction path of the decision tree

The prediction path is marked with orange boxes. A random sample from the test set is selected, and its details are listed below.

image

Prediction path simplification of decision tree

image

3.Leaf Node Sample Statistics

info

Statistics of how many samples fall into each leaf node of the tree model during training.

image
info

Statistics of how many samples fall into each leaf node during training, and how many samples in each leaf node belong to each category.

image

Validation Set Metrics

info

The ROC curve and confusion matrix currently support only classification tasks with at most 5 output categories.

1.ROC Curve

The Receiver Operating Characteristic (ROC) curve is a graphical tool used to evaluate the performance of a classifier. It focuses on two important performance metrics: True Positive Rate (TPR) and False Positive Rate (FPR).

info

True Positive Rate (sensitivity or recall): TPR is the proportion of actual positive cases that the model correctly identifies as positive. On the ROC curve, TPR corresponds to the vertical axis, ranging from 0 to 1.

False Positive Rate: FPR is the proportion of actual negative cases that the model incorrectly identifies as positive. On the ROC curve, FPR corresponds to the horizontal axis, again ranging from 0 to 1.

Meaning and use:

  • Model comparison: ROC curves provide a visual tool for comparing the performance of different models. The area under the curve (AUC) is used to quantify the performance of different models in the entire ROC space, and the larger the AUC, the better the model performance.

  • Threshold selection: The ROC curve helps select a classification threshold. Different application scenarios may pay different attention to False Positive Rate and True Positive Rate. You can balance the false positive rate and true positive rate by adjusting the classification threshold.

  • Diagnostic performance: ROC curves show the performance of the model at different operating points to help you understand the diagnostic performance of the model, especially in binary classification problems.

  • Not affected by class imbalance: The ROC curve is relatively robust to class imbalance, because TPR and FPR are each computed within a single class (positives and negatives, respectively).

image
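TPR, FPR, and AUC can be computed directly from scored predictions; a pure-Python sketch on made-up labels and scores:

```python
def roc_points(labels, scores):
    """(FPR, TPR) pairs, one per distinct score threshold, from highest threshold down."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        points.append((fp / neg, tp / pos))
    return [(0.0, 0.0)] + points

def auc(points):
    """Trapezoidal area under the ROC curve."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

labels = [1, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.2]
pts = roc_points(labels, scores)
print(auc(pts))   # ~0.833
```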

2.Confusion Matrix

info

Validation set metrics (Accuracy, Recall, Precision, F1) can be obtained in the AutoML section of the task log.

  • True Positives(TP): The model correctly predicts positive examples as positive examples.

  • True Negatives(TN): The model correctly predicts negative examples as negative examples.

  • False Positives(FP): The model incorrectly predicts negative examples as positive examples.

  • False Negatives(FN): The model incorrectly predicts positive examples as negative examples.

The meaning and use of confusion matrix:

  • Performance evaluation: The Confusion Matrix provides a comprehensive performance evaluation that provides an intuitive understanding of the model's predictive accuracy across different categories.

  • Accuracy calculation: Accuracy is the proportion of samples correctly predicted by the classifier out of the total number of samples, and can be computed from the confusion matrix: Accuracy = (True Positives + True Negatives) / Total Samples. The accuracy of this task is 0.7847533632286996.

  • Recall calculation: Recall (also known as sensitivity or true positive rate) is the proportion of actual positive cases correctly identified by the model: Recall = True Positives / (True Positives + False Negatives). The recall of this task is 0.7847533632286996.

  • Precision calculation: Precision is the proportion of samples predicted as positive that actually are positive: Precision = True Positives / (True Positives + False Positives). The precision of this task is 0.7839107317605237.

  • F1 Score calculation: The F1 score is the harmonic mean of precision and recall, balancing the model's precision and coverage: F1 = 2 × Precision × Recall / (Precision + Recall). The F1 of this task is 0.7842670856605463.

image
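All four metrics follow directly from the confusion-matrix counts; a sketch (the counts below are made up and do not reproduce the report's values):

```python
def metrics(tp, tn, fp, fn):
    """Accuracy, recall, precision, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, recall, precision, f1

acc, rec, prec, f1 = metrics(tp=40, tn=45, fp=10, fn=5)
# acc = 85/100 = 0.85, rec = 40/45, prec = 40/50 = 0.8, f1 = harmonic mean of the two
print(acc, rec, prec, f1)
```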

Appendix 1: Mapping Table of Advanced Features and Feature Contribution Degree

info

When the model is visualized, the generated features may not be self-explanatory. You can look up the corresponding advanced features and their feature contribution degree in the following table.

| Feature Name | Advanced Feature | Contribution Degree |
| --- | --- | --- |
| AdvFeat1 | (Ticket÷Fare)÷(cos(Ticket)) | 35.55555555555556 |
| AdvFeat2 | (cos(Fare))÷(Ticket÷Age) | 62.22222222222222 |
| AdvFeat3 | ((cos(Fare))÷(sin(Ticket)))÷(Min((Fare÷Age)(AggMax(Age,Cabin)) | 68.88888888888889 |
| AdvFeat4 | (cos(Fare))×(Ticket÷Age) | 31.11111111111111 |
| AdvFeat5 | Min(((sin(Ticket))−(Age+Ticket)))((cos(Fare))÷(sin(Ticket)))) | 11.11111111111111 |
| AdvFeat6 | ((sin(Ticket))×(cos(Fare)))×(Max((Age×Ticket)(Ticket÷Fare))) | 20.0 |
| AdvFeat7 | (cos(Fare))÷(Fare×Ticket) | 17.77777777777778 |
| AdvFeat8 | (cos(Ticket))÷(Fare+Ticket) | 40.0 |
| AdvFeat9 | (cos(Ticket))÷(Fare×Ticket) | 35.55555555555556 |
| AdvFeat10 | (Ticket÷Age)÷(cos(Fare)) | 24.444444444444443 |
| AdvFeat11 | (Fare×Ticket)÷(cos(Ticket)) | 26.666666666666668 |
| AdvFeat12 | (Fare+Ticket)−(sin(Ticket))) | 55.55555555555556 |
| AdvFeat13 | (Ticket÷Age)+(Fare+Ticket)) | 22.22222222222222 |
| AdvFeat14 | (cos(Fare))−(Fare+Ticket)) | 26.666666666666668 |
| AdvFeat15 | (Ticket÷Fare)÷(cos(Fare)) | 31.11111111111111 |
| AdvFeat16 | (sin(Ticket))÷(Ticket÷Age) | 37.77777777777778 |
| AdvFeat17 | (cos(Fare))×(Fare×Ticket) | 15.555555555555555 |
| AdvFeat18 | (Fare+Ticket)÷(cos(Ticket)) | 44.44444444444444 |
| AdvFeat19 | (Fare+Ticket)÷(cos(Fare)) | 17.77777777777778 |
| AdvFeat20 | Max(((cos(Fare))+(cos(Ticket))))(Min((Fare×Ticket)(Fare÷Age)))) | 15.55555555555555 |
| AdvFeat21 | ((cos(Ticket))−(sin(Ticket))))−(Max((Age×Ticket)(Ticket÷Fare)))) | 44.44444444444444 |
| AdvFeat22 | Max(((cos(Fare))−(cos(Ticket))))((cos(Ticket))÷(sin(Ticket)))) | 46.666666666666664 |
| AdvFeat23 | (cos(Fare))÷(Fare+Ticket) | 22.22222222222222 |
| AdvFeat24 | ((cos(Fare))×(cos(Ticket)))×(sin(sin(Ticket))) | 28.888888888888886 |
| AdvFeat25 | (sin(Ticket))÷(Fare×Ticket) | 40.0 |
| AdvFeat26 | (Fare×Ticket)÷(cos(Fare)) | 13.333333333333334 |
| AdvFeat27 | (cos(Fare))+(Fare+Ticket)) | 20.0 |
| AdvFeat28 | Max((cos(Fare))(cos(Ticket))) | 53.333333333333336 |
| AdvFeat29 | Count(Sex) | 11.11111111111111 |

Appendix 2: Example Model Report