changtianml

info

The tabular_incrml class provides a range of methods for working with tabular data, model training, and prediction. The following is the public API of this class, with instructions for its use.

Constructor Function

`init`

init(path: str)

【Instructions】

Initializes an instance of the tabular_incrml class.

【Arguments】

path (str): The path to the directory where the ChangTianML training results are located.

【Return】

None

Prediction Function

`predict`

predict(test_path: Union[str, pd.DataFrame]) -> np.ndarray

【Instructions】

Uses the trained model to make predictions on test data.

【Arguments】

test_path (str or pd.DataFrame): The following three loading modes are available:

（1）The path to a CSV dataset.

（2）The path to a folder that contains only CSV files.

（3）A pd.DataFrame passed directly.

【Return】

np.ndarray: One-dimensional numpy array of prediction results.
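
The three loading modes accepted by test_path can be pictured as a small dispatch step. The sketch below is a standard-library illustration only: the helper name `resolve_csv_sources` and the sorted file order are assumptions made here, not part of ChangTianML, and the source does not say how the files of a folder are combined.

```python
from pathlib import Path
from typing import Any, List, Union


def resolve_csv_sources(source: Union[str, Any]) -> Union[List[Path], Any]:
    """Hypothetical helper: map a test_path-style argument to CSV files.

    Returns a list of CSV paths for string input, or the object itself
    when it is not a string (mode 3: data already loaded in memory).
    """
    if not isinstance(source, str):
        # Mode 3: already-loaded data, for example a pd.DataFrame.
        return source

    path = Path(source)
    if path.is_file():
        # Mode 1: a single CSV file.
        return [path]
    if path.is_dir():
        # Mode 2: a folder expected to contain only CSV files.
        # Sorting is an assumption made here for reproducibility.
        return sorted(path.glob("*.csv"))
    raise FileNotFoundError(f"No such file or folder: {source}")
```

A string is checked against the file system first, so a mistyped path fails early instead of producing an empty result.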

`predict_proba`

predict_proba(test_path: Union[str, pd.DataFrame]) -> np.ndarray

【Instructions】

Returns the model's predicted probability for each class in a classification task.

【Arguments】

test_path (str or pd.DataFrame): The following three loading modes are available:

（1）The path to a CSV dataset.

（2）The path to a folder that contains only CSV files.

（3）A pd.DataFrame passed directly.

【Return】

np.ndarray: A two-dimensional numpy array of predicted probabilities.
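
A two-dimensional probability array is usually read as one row per sample and one column per class. The sketch below uses plain nested lists in place of np.ndarray, with made-up numbers, to show how such a result can be inspected. That the most probable class matches the output of `predict` is a common convention assumed here, not something the source states.

```python
from typing import List, Sequence


def most_probable_class(proba: Sequence[Sequence[float]]) -> List[int]:
    """Return, for each row, the column index with the highest probability."""
    return [max(range(len(row)), key=row.__getitem__) for row in proba]


# Illustrative values only: 3 samples, 2 classes.
proba = [
    [0.9, 0.1],
    [0.2, 0.8],
    [0.6, 0.4],
]

# One class index per sample, taken from the largest entry in each row.
labels = most_probable_class(proba)
```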

Data Processing Function

`preprocess`

preprocess(input: Union[str, pd.DataFrame], return_y: bool = False) -> pd.DataFrame

【Instructions】

Preprocesses the input data, including feature combination and other operations.

【Arguments】

input (Union[str, pd.DataFrame]): The input data, either the path to a CSV file or an already loaded DataFrame.
return_y (bool, optional): Whether to return the label column. Defaults to False. If True, the returned DataFrame contains the label. If the data is unlabeled, no label information is returned.

【Return】

pd.DataFrame: The pre-processed data.

`labelencode`

labelencode(X: Union[pd.Series, pd.DataFrame], feature_name: Optional[str] = None) -> pd.Series

【Instructions】

Encodes categorical features into a format that the model can handle.

【Arguments】

X (Union[pd.Series, pd.DataFrame]): The data to encode.
feature_name (Optional[str], optional): The feature name. This parameter is mandatory when X is a DataFrame.

【Return】

pd.Series: The encoded data.

`labeldecode`

labeldecode(X: Union[pd.Series, pd.DataFrame], feature_name: Optional[str] = None) -> pd.Series

【Instructions】

Decodes an encoded feature and restores it to its original format.

【Arguments】

X (Union[pd.Series, pd.DataFrame]): The data to decode.
feature_name (Optional[str], optional): The feature name. This parameter is mandatory when X is a DataFrame.

【Return】

pd.Series: Decoded data.
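
The relationship between labelencode and labeldecode is a round trip: categories become integer codes, and the codes map back to the original categories. The following is a minimal sketch of that idea using plain lists and a dictionary. The function names and the sorted-order code assignment are assumptions for illustration; the actual encoding used by ChangTianML may differ.

```python
from typing import Dict, List, Sequence


def fit_mapping(values: Sequence[str]) -> Dict[str, int]:
    """Assign an integer code to each distinct category, in sorted order."""
    return {category: code for code, category in enumerate(sorted(set(values)))}


def encode(values: Sequence[str], mapping: Dict[str, int]) -> List[int]:
    """Replace each category with its integer code."""
    return [mapping[v] for v in values]


def decode(codes: Sequence[int], mapping: Dict[str, int]) -> List[str]:
    """Invert the mapping to recover the original categories."""
    inverse = {code: category for category, code in mapping.items()}
    return [inverse[c] for c in codes]


colors = ["red", "green", "blue", "green"]
mapping = fit_mapping(colors)
codes = encode(colors, mapping)
restored = decode(codes, mapping)
```

Decoding only works with the same mapping that produced the codes, which is presumably why both methods take a feature_name when X is a DataFrame.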

Fetch Function

`get_train_validation_data`

get_train_validation_data(path: Optional[str] = None) -> Tuple[pd.DataFrame, pd.DataFrame, pd.Series, pd.Series]

【Instructions】

Gets the training set and the validation set.

【Arguments】

path (str or pd.DataFrame, optional): The following four loading modes are available:

（1）The path to a CSV dataset.

（2）The path to a folder that contains only CSV files.

（3）None, provided that load_training_data was called beforehand to load the data.

（4）A pd.DataFrame passed directly.

【Return】

Tuple[pd.DataFrame, pd.DataFrame, pd.Series, pd.Series]: A tuple containing X_train, X_val, y_train, and y_val, in that order.

`obtain_model_configuration`

obtain_model_configuration()

【Instructions】

Gets and returns the configuration information for the ChangTianML model. This function extracts the model type and its optimal configuration parameters so that the parameter information is accurate and readily available.

【Return】

dict: A dictionary containing the following key-value pairs:
- model_type: The best model type selected by AutoML.
- model_params: The optimal configuration parameters of the model. Some parameters have been renamed and reformatted as appropriate.
- task: The task type of the model, mainly classification or regression.
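
The three documented keys can be checked before the dictionary is used elsewhere. The sketch below is illustrative only: `summarize_configuration` is a hypothetical helper, and the dictionary holds placeholder values rather than real output from ChangTianML.

```python
from typing import Any, Dict

# Keys listed in the documentation above.
EXPECTED_KEYS = ("model_type", "model_params", "task")


def summarize_configuration(config: Dict[str, Any]) -> str:
    """Hypothetical helper: validate the documented keys and describe them."""
    missing = [key for key in EXPECTED_KEYS if key not in config]
    if missing:
        raise KeyError(f"Missing configuration keys: {missing}")
    n_params = len(config["model_params"])
    return f"{config['task']} task, model {config['model_type']}, {n_params} parameter(s)"


# Placeholder values for illustration only; real values come from training.
example_config = {
    "model_type": "example_model",
    "model_params": {"example_param": 1},
    "task": "classification",
}
summary = summarize_configuration(example_config)
```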

`get_feature_importance`

【Instructions】

Obtains the feature importance of the trained model.

Drawing Function

`plot_box`

plot_box(X: pd.DataFrame, y: pd.DataFrame, feature_name: str, target_name: str, path: Optional[str] = None)

【Instructions】

Draws a box plot to analyze the relationship between a feature and the target.

【Arguments】

X (pd.DataFrame): The DataFrame that contains the feature.
y (pd.Series): The target variable.
feature_name (str): The name of the feature to be analyzed.
target_name (str): The name of the target variable.
path (Optional[str], optional): The path to save the chart. If None, the chart is not saved.

【Return】

None
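
A box plot summarizes a distribution by its quartiles. As a rough, dependency-free sketch of the numbers behind such a chart, the code below computes a five-number summary of a feature for each target value. Grouping the feature by target value is an assumption about how the plot is organized, and `box_stats` is a hypothetical helper, not part of the library. Typical box plots also draw whiskers at 1.5 times the interquartile range, which this sketch omits.

```python
from collections import defaultdict
from statistics import quantiles
from typing import Dict, Hashable, List, Sequence, Tuple

Summary = Tuple[float, float, float, float, float]


def box_stats(feature: Sequence[float], target: Sequence[Hashable]) -> Dict[Hashable, Summary]:
    """Return (min, Q1, median, Q3, max) of the feature for each target value."""
    groups: Dict[Hashable, List[float]] = defaultdict(list)
    for x, y in zip(feature, target):
        groups[y].append(x)

    stats: Dict[Hashable, Summary] = {}
    for label, values in groups.items():
        # "inclusive" treats the data as the whole population of interest.
        q1, median, q3 = quantiles(values, n=4, method="inclusive")
        stats[label] = (min(values), q1, median, q3, max(values))
    return stats
```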

`plot_kde`

plot_kde(X: pd.DataFrame, feature_name: str, path: Optional[str] = None)

【Instructions】

Plots a kernel density estimate (KDE) to show the distribution of a feature.

【Arguments】

X (pd.DataFrame): The DataFrame that contains the feature.
feature_name (str): The name of the feature to be used for kernel density estimation.
path (Optional[str], optional): The path to save the chart. If None, the chart is not saved.

【Return】

None
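
For reference, a kernel density estimate places a smooth kernel on every sample and averages them. The sketch below evaluates a Gaussian KDE at a single point using only the standard library. The Gaussian kernel and the explicit `bandwidth` argument are assumptions for illustration; the source does not say which kernel or bandwidth rule plot_kde uses.

```python
import math
from typing import Sequence


def gaussian_kde(samples: Sequence[float], x: float, bandwidth: float) -> float:
    """Evaluate a Gaussian kernel density estimate at point x.

    Implements f(x) = 1 / (n * h) * sum(K((x - x_i) / h)),
    where K is the standard normal density and h is the bandwidth.
    """
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
```

A smaller bandwidth follows individual samples more closely, while a larger one gives a smoother curve.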

`plot_trend_comparison`

plot_trend_comparison(X: pd.DataFrame, feature_name: str, target_name: str, path: Optional[str] = None)

【Instructions】

Plots the trend of a feature against the target.

【Arguments】

X (pd.DataFrame): The DataFrame containing the data.
feature_name (str): The name of the feature whose trend is analyzed.
target_name (str): The name of the target variable.
path (Optional[str], optional): The path to save the chart. If None, the chart is not saved.

【Return】

None

`plot_roc_curve`

plot_roc_curve(y_true: Union[pd.Series, np.ndarray], y_score: Union[pd.Series, np.ndarray], n_classes: Optional[int] = None, roc_type: str = 'both', path: Optional[str] = None)

【Instructions】

Receiver operating characteristic (ROC) curves are drawn to evaluate classification model performance.

【Arguments】

y_true (pd.Series, np.ndarray): The true label value.
y_score (pd.Series, np.ndarray): The probability value or score predicted by the model.
n_classes (int, optional): The number of categories in a classification task. If it is not specified, it is automatically calculated based on the data.
roc_type (str, optional): ROC curve type. Options include 'both' (plotting macro and micro ROC curves at the same time), 'macro', 'micro'. The default is 'both'.
path (Optional[str], optional): The path to save the chart. If None, the chart is not saved.

【Return】

None
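
The difference between the 'macro' and 'micro' options can be seen in how the area under the curve is averaged. Macro averaging computes a one-vs-rest result per class and averages them, so every class counts equally. Micro averaging pools every (sample, class) pair into one binary problem. The sketch below is a standard-library illustration under those general definitions, with hypothetical function names and made-up scores. It is not the library's implementation.

```python
from typing import List, Sequence


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def one_hot(y_true: Sequence[int], n_classes: int) -> List[List[int]]:
    """Turn class indices into one indicator column per class."""
    return [[1 if y == c else 0 for c in range(n_classes)] for y in y_true]


def macro_auc(y_true: Sequence[int], y_score: Sequence[Sequence[float]], n_classes: int) -> float:
    """Average the one-vs-rest AUC of each class."""
    onehot = one_hot(y_true, n_classes)
    per_class = [
        binary_auc([row[c] for row in onehot], [row[c] for row in y_score])
        for c in range(n_classes)
    ]
    return sum(per_class) / n_classes


def micro_auc(y_true: Sequence[int], y_score: Sequence[Sequence[float]], n_classes: int) -> float:
    """Pool all (sample, class) pairs into a single binary AUC."""
    onehot = one_hot(y_true, n_classes)
    flat_labels = [v for row in onehot for v in row]
    flat_scores = [v for row in y_score for v in row]
    return binary_auc(flat_labels, flat_scores)


# Made-up scores: 4 samples, 3 classes.
y_true = [0, 1, 2, 1]
y_score = [
    [0.7, 0.2, 0.1],
    [0.3, 0.5, 0.2],
    [0.2, 0.3, 0.5],
    [0.4, 0.4, 0.2],
]
macro = macro_auc(y_true, y_score, 3)
micro = micro_auc(y_true, y_score, 3)
```

On this data the two averages differ, because the pooled comparison also ranks scores across different classes.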

`plot_confusion_matrix`

plot_confusion_matrix(y_true: Union[pd.Series, np.ndarray], y_pred: Union[pd.Series, np.ndarray], normalize: bool = False, title: Optional[str] = None, cmap = plt.cm.Blues, path: Optional[str] = None)

【Instructions】

Draws a confusion matrix to show the prediction accuracy and error types of the classification model.

【Arguments】

y_true (pd.Series, np.ndarray): The true label value.
y_pred (pd.Series, np.ndarray): The prediction results of the model.
n_classes (int, optional): The number of categories in a classification task. If it is not specified, it is automatically calculated based on the data.
normalize (bool, optional): Whether to normalize the confusion matrix. Default is False.
title (str, optional): The title of the chart. Default is None.
cmap (Colormap, optional): The colormap to use. Default is plt.cm.Blues.
path (Optional[str], optional): The path to save the chart. If None, the chart is not saved.

【Return】

None
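
To make the effect of normalize concrete, the sketch below counts a confusion matrix from integer labels and optionally divides each row by its total. Row-wise normalization (by true class) is the common convention and is assumed here; the source does not state which normalization plot_confusion_matrix applies. The function name and inputs are hypothetical.

```python
from typing import List, Sequence


def confusion_matrix(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    n_classes: int,
    normalize: bool = False,
) -> List[List[float]]:
    """Count predictions per (true class, predicted class) cell.

    Rows are true classes and columns are predicted classes. With
    normalize=True, each non-empty row is scaled to sum to 1.
    """
    matrix: List[List[float]] = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1

    if normalize:
        for row in matrix:
            total = sum(row)
            if total:  # leave rows for unseen classes as zeros
                row[:] = [v / total for v in row]
    return matrix
```

With normalization, the diagonal of each row reads as the recall of that class, which makes classes of different sizes comparable.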

`plot_regression_scatter`

plot_regression_scatter(y_true: Union[pd.Series, np.ndarray], y_pred: Union[pd.Series, np.ndarray], model_name: str = 'Model', path: Optional[str] = None)

【Instructions】

Draws a scatter plot for a regression task, comparing the model's predicted values with the actual values.

【Arguments】

y_true (pd.Series, np.ndarray): The true target value.
y_pred (pd.Series, np.ndarray): The predicted value of the model.
model_name (str, optional): The name of the model used in the legend. Default is 'Model'.
path (Optional[str], optional): The path to save the chart. If None, the chart is not saved.

【Return】

None

`plot_shap`

plot_shap(X_test: Union[pd.DataFrame, None] = None, save_path: str = None, plot_type: str = 'dot')

【Instructions】

Draws a SHAP plot to show how important the features are.

【Arguments】

X_test (pd.DataFrame, None): The dataset used to compute the SHAP values. Defaults to None, in which case the validation data from training is used.
save_path (str, optional): The path to save the chart. If None, the chart is not saved.
plot_type (str, optional): The type of SHAP plot. Options are "dot", "bar", "violin", and "compact_dot".

【Return】

None
