changtianml
The tabular_incrml class provides methods for working with tabular data, model training, and prediction. The public API of the class and its usage instructions follow.
Constructor Function
__init__
__init__(path: str)
【Instructions】
Initializes an instance of the tabular_incrml class.
【Arguments】
path (str): The path to the directory where the ChangTianML training results are located.
【Return】
None
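A minimal sketch of constructing the class, assuming it can be imported as "from changtianml import tabular_incrml" (the import path is not stated above and is an assumption). open_results is a hypothetical helper; it takes the class as an argument so the snippet runs without the package installed.

```python
from pathlib import Path


def open_results(model_cls, results_dir):
    """Instantiate the model wrapper from a training-results directory.

    model_cls is expected to be tabular_incrml. It is passed in rather than
    imported because the import path is an assumption, e.g.:
        from changtianml import tabular_incrml
    """
    results_dir = Path(results_dir)
    if not results_dir.is_dir():
        raise FileNotFoundError(f"Training results directory not found: {results_dir}")
    # The documented constructor takes the path as a str.
    return model_cls(str(results_dir))
```

Checking the directory first gives a clear error before any model files are read; whether the constructor performs its own check is not documented here.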
Prediction Function
predict
predict(test_path: Union[str, pd.DataFrame]) -> np.ndarray
【Instructions】
Uses the trained models to make predictions on test data.
【Arguments】
test_path (str or pd.DataFrame): The following three loading modes are available:
(1) Pass the path to a CSV data set.
(2) Pass a folder that contains only CSV files.
(3) Pass a pd.DataFrame directly.
【Return】
np.ndarray: A one-dimensional numpy array of prediction results.
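The three loading modes can be pictured with the sketch below. It is not the library's implementation: load_tabular_input is a hypothetical helper, and stacking the CSV files found in a folder row-wise is an assumption about how mode (2) behaves.

```python
import os
from glob import glob
from typing import Union

import pandas as pd


def load_tabular_input(source: Union[str, pd.DataFrame]) -> pd.DataFrame:
    """Resolve a CSV path, a folder of CSV files, or a DataFrame to a DataFrame."""
    if isinstance(source, pd.DataFrame):
        # Mode (3): already loaded; copy so the caller's frame is not modified.
        return source.copy()
    if os.path.isdir(source):
        # Mode (2): read every CSV in the folder (assumed: stacked row-wise).
        files = sorted(glob(os.path.join(source, "*.csv")))
        if not files:
            raise ValueError(f"No CSV files found in {source}")
        return pd.concat([pd.read_csv(f) for f in files], ignore_index=True)
    # Mode (1): a single CSV file.
    return pd.read_csv(source)
```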
predict_proba
predict_proba(test_path: Union[str, pd.DataFrame]) -> np.ndarray
【Instructions】
Returns the predicted probability of each category in a classification task.
【Arguments】
test_path (str or pd.DataFrame): The following three loading modes are available:
(1) Pass the path to a CSV data set.
(2) Pass a folder that contains only CSV files.
(3) Pass a pd.DataFrame directly.
【Return】
np.ndarray: A two-dimensional numpy array of predicted probabilities.
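For a classifier, the predicted label is usually the class with the highest probability. The sketch below shows that relationship on a hand-written probability matrix. The values in proba and classes are made up for illustration, and the assumption that columns follow a fixed class order is not confirmed by the text above.

```python
import numpy as np

# Hypothetical predict_proba output for 3 samples and 3 classes
# (rows: samples, columns: classes; each row sums to 1).
proba = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
])
classes = np.array(["cat", "dog", "bird"])  # assumed column order

# Index of the most probable class per row, mapped back to its label.
top_idx = proba.argmax(axis=1)
labels = classes[top_idx]
confidence = proba.max(axis=1)

print(labels)      # ['cat' 'bird' 'dog']
print(confidence)  # [0.7 0.6 0.5]
```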
Data Processing Function
preprocess
preprocess(input: Union[str, pd.DataFrame], return_y: bool = False) -> pd.DataFrame
【Instructions】
Preprocesses the input data, including feature combination and other operations.
【Arguments】
- input (Union[str, pd.DataFrame]): The input data, either the path to a CSV file or a loaded DataFrame.
- return_y (bool, optional): Whether to return the label column. Defaults to False. If True, the returned DataFrame contains the label. If the data is unlabeled, no label information is returned regardless of this setting.
【Return】
pd.DataFrame: The pre-processed data.
labelencode
labelencode(X: Union[pd.Series, pd.DataFrame], feature_name: Optional[str] = None) -> pd.Series
【Instructions】
Encodes categorical features into a format that the model can handle.
【Arguments】
- X (Union[pd.Series, pd.DataFrame]): The data to be encoded.
- feature_name (Optional[str], optional): The feature name. This parameter is mandatory when X is a DataFrame.
【Return】
pd.Series: The encoded data.
labeldecode
labeldecode(X: Union[pd.Series, pd.DataFrame], feature_name: Optional[str] = None) -> pd.Series
【Instructions】
Decodes an encoded feature and restores it to its original format.
【Arguments】
- X (Union[pd.Series, pd.DataFrame]): The data to be decoded.
- feature_name (Optional[str], optional): The feature name. This parameter is mandatory when X is a DataFrame.
【Return】
pd.Series: Decoded data.
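Encoding followed by decoding is meant to round-trip. The sketch below illustrates the idea with plain pandas. It is a stand-in, not the library's encoder, and the integer codes it produces need not match those used by labelencode.

```python
import pandas as pd

colors = pd.Series(["red", "green", "red", "blue"], name="color")

# Encode: map each distinct category to an integer code
# (codes are assigned in order of first appearance).
codes, categories = pd.factorize(colors)
encoded = pd.Series(codes, index=colors.index, name=colors.name)

# Decode: look the codes up in the category table to recover the originals.
decoded = pd.Series(
    categories.take(encoded.to_numpy()),
    index=encoded.index,
    name=encoded.name,
)

print(encoded.tolist())  # [0, 1, 0, 2]
print(decoded.tolist())  # ['red', 'green', 'red', 'blue']
```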
Fetch Function
get_train_validation_data
get_train_validation_data(path: Optional[str] = None) -> tuple(pd.DataFrame, pd.DataFrame, pd.Series, pd.Series)
【Instructions】
Returns the training set and the validation set.
【Arguments】
path (str or pd.DataFrame, optional): The following four loading modes are available:
(1) Pass the path to a CSV data set.
(2) Pass a folder that contains only CSV files.
(3) Pass None to use data already loaded by an earlier call to load_training_data.
(4) Pass a DataFrame directly.
【Return】
tuple(pd.DataFrame, pd.DataFrame, pd.Series, pd.Series): A tuple containing X_train, X_val, y_train, and y_val, in that order.
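A short sketch of unpacking the result, assuming the tuple order given above. Here model stands for an already constructed tabular_incrml instance, and split_summary is a hypothetical helper, not part of the library.

```python
def split_summary(model, path=None):
    """Return the row counts of the training and validation splits."""
    # Documented order: X_train, X_val, y_train, y_val.
    X_train, X_val, y_train, y_val = model.get_train_validation_data(path)
    # Features and labels of each split should line up row for row.
    if len(X_train) != len(y_train) or len(X_val) != len(y_val):
        raise ValueError("Feature and label row counts do not match")
    return {"train_rows": len(X_train), "val_rows": len(X_val)}
```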
obtain_model_configuration
obtain_model_configuration()
【Instructions】
Returns the configuration information for the ChangTianML model: the model type and its optimal configuration parameters. Some parameters are renamed and reformatted to keep the parameter information accurate and usable.
【Return】
dict: A dictionary containing the following key-value pairs:
- model_type: The best model type selected by AutoML.
- model_params: The optimal configuration parameters of the model. Some parameters have been appropriately named and formatted.
- task: The task type of the model, mainly classification or regression.
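The returned dictionary can be inspected like any other dict. In the sketch below, describe_configuration is a hypothetical helper, and the example dictionary (including the "lgbm" model type and its parameter names) is made up for illustration. Only the three top-level keys come from the description above.

```python
def describe_configuration(config: dict) -> str:
    """Format the dictionary returned by obtain_model_configuration()."""
    lines = [
        f"task: {config['task']}",
        f"model_type: {config['model_type']}",
        "model_params:",
    ]
    # Sort parameter names so the output is stable between runs.
    for name, value in sorted(config["model_params"].items()):
        lines.append(f"  {name} = {value!r}")
    return "\n".join(lines)


# Hypothetical configuration; the real contents of model_params depend on the model.
example = {
    "model_type": "lgbm",
    "model_params": {"n_estimators": 200, "learning_rate": 0.05},
    "task": "classification",
}
print(describe_configuration(example))
```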
get_feature_importance
【Instructions】
Returns the feature importance of the trained model.
Drawing Function
plot_box
plot_box(X: pd.DataFrame, y: pd.DataFrame, feature_name: str, target_name: str, path: Optional[str] = None)
【Instructions】
A box plot is drawn to analyze the relationship between features and targets.
【Arguments】
- X (pd.DataFrame): The DataFrame that contains the feature.
- y (pd.Series): The target variable.
- feature_name (str): The name of the feature to be analyzed.
- target_name (str): The name of the target variable.
- path (Optional[str], optional): The path to save the chart. If None, the chart is not saved.
【Return】
None
plot_kde
plot_kde(X: pd.DataFrame, feature_name: str, path: Optional[str] = None)
【Instructions】
Draws a kernel density estimate (KDE) plot to estimate the distribution of a feature.
【Arguments】
- X (pd.DataFrame): The DataFrame that contains the feature.
- feature_name (str): The name of the feature to be used for kernel density estimation.
- path (Optional[str], optional): The path to save the chart. If None, the chart is not saved.
【Return】
None
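For readers unfamiliar with kernel density estimation, the sketch below shows what such a plot displays: a smooth density obtained by averaging Gaussian bumps centred on each sample. gaussian_kde_1d is a hypothetical helper, and the normal-reference bandwidth rule it uses is an assumption. The kernel and bandwidth that plot_kde actually uses are not documented here.

```python
import numpy as np


def gaussian_kde_1d(samples, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate of `samples` on `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = samples.size
    if bandwidth is None:
        # Normal-reference rule of thumb: 1.06 * sigma * n^(-1/5).
        bandwidth = 1.06 * samples.std(ddof=1) * n ** (-1 / 5)
    # Standardised distance from every grid point to every sample.
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    # Average the kernels and rescale by the bandwidth.
    return kernel.sum(axis=1) / (n * bandwidth)
```

Evaluating the function on an evenly spaced grid that covers the data gives the curve a KDE plot would draw; the area under it is approximately 1.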
plot_trend_comparison
plot_trend_comparison(X: pd.DataFrame, feature_name: str, target_name: str, path: Optional[str] = None)
【Instructions】
Plot trends between features and targets.
【Arguments】
- X (pd.DataFrame): The DataFrame containing the data.
- feature_name (str): The name of the feature whose trend is analyzed.
- target_name (str): The name of the target variable.
- path (Optional[str], optional): The path to save the chart. If None, the chart is not saved.
【Return】
None
plot_roc_curve
plot_roc_curve(y_true: Union[pd.Series, np.ndarray], y_score: Union[pd.Series, np.ndarray], n_classes: Optional[int] = None, roc_type: str = 'both', path: Optional[str] = None)
【Instructions】
Receiver operating characteristic (ROC) curves are drawn to evaluate classification model performance.
【Arguments】
- y_true (pd.Series, np.ndarray): The true label values.
- y_score (pd.Series, np.ndarray): The probability values or scores predicted by the model.
- n_classes (int, optional): The number of categories in the classification task. If not specified, it is calculated automatically from the data.
- roc_type (str, optional): The ROC curve type. Options are 'both' (plot the macro and micro ROC curves together), 'macro', and 'micro'. The default is 'both'.
- path (Optional[str], optional): The path to save the chart. If None, the chart is not saved.
【Return】
None
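The difference between the 'macro' and 'micro' options can be illustrated numerically. The sketch below computes one-vs-rest AUC values with plain numpy: macro averages the per-class results, while micro pools every (sample, class) decision into a single binary problem. It assumes integer labels 0..n_classes-1 and a score matrix of shape (n_samples, n_classes). Both helper functions are hypothetical, and whether plot_roc_curve computes its averages this way is not confirmed by the text above.

```python
import numpy as np


def binary_auc(y_true, y_score):
    """Area under the ROC curve via pairwise ranking (ties count as half)."""
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    pos, neg = y_score[y_true], y_score[~y_true]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUC needs at least one positive and one negative sample")
    # Compare every positive score with every negative score.
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)


def macro_micro_auc(y_true, y_score):
    """One-vs-rest AUC, averaged per class (macro) and pooled (micro)."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    n_classes = y_score.shape[1]
    # One-hot encode integer labels 0..n_classes-1.
    onehot = np.eye(n_classes, dtype=int)[y_true]
    per_class = [binary_auc(onehot[:, k], y_score[:, k]) for k in range(n_classes)]
    macro = float(np.mean(per_class))
    micro = binary_auc(onehot.ravel(), y_score.ravel())
    return macro, micro
```

Macro weights every class equally, so rare classes influence it as much as common ones; micro weights every decision equally, so it is dominated by the frequent classes.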
plot_confusion_matrix
plot_confusion_matrix(y_true: Union[pd.Series, np.ndarray], y_pred: Union[pd.Series, np.ndarray], normalize: bool = False, title: Optional[str] = None, cmap = plt.cm.Blues, path: Optional[str] = None)
【Instructions】
Draw a confusion matrix to show the prediction accuracy and error types of the classification model.
【Arguments】
- y_true (pd.Series, np.ndarray): The true label values.
- y_pred (pd.Series, np.ndarray): The prediction results of the model.
- n_classes (int, optional): The number of categories in the classification task. If not specified, it is calculated automatically from the data.
- normalize (bool, optional): Whether to normalize the confusion matrix. The default is False.
- title (str, optional): The title of the chart. The default is None.
- cmap (Colormap, optional): The color map to use. The default is plt.cm.Blues.
- path (Optional[str], optional): The path to save the chart. If None, the chart is not saved.
【Return】
None
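To make the effect of normalize concrete, the sketch below builds a confusion matrix with numpy. The confusion_matrix helper is hypothetical, and row-wise normalization (dividing by the number of true samples per class) is an assumed convention. The text above does not say which axis plot_confusion_matrix normalizes over.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, normalize=False):
    """Count (true, predicted) label pairs; rows are true classes."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    labels = np.unique(np.concatenate([y_true, y_pred]))
    index = {label: i for i, label in enumerate(labels)}
    cm = np.zeros((labels.size, labels.size), dtype=float)
    for t, p in zip(y_true, y_pred):
        cm[index[t], index[p]] += 1
    if normalize:
        # Assumed convention: divide each row by the number of true samples
        # in that class, so every non-empty row sums to 1.
        row_sums = cm.sum(axis=1, keepdims=True)
        cm = np.divide(cm, row_sums, out=np.zeros_like(cm), where=row_sums > 0)
    return labels, cm
```

With row-wise normalization the diagonal holds the per-class recall, which makes classes of different sizes directly comparable.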
plot_regression_scatter
plot_regression_scatter(y_true: Union[pd.Series, np.ndarray], y_pred: Union[pd.Series, np.ndarray], model_name: str = 'Model', path: Optional[str] = None)
【Instructions】
Plot a scatter plot of the regression tasks and compare the predicted and actual values of the model.
【Arguments】
- y_true (pd.Series, np.ndarray): The true target values.
- y_pred (pd.Series, np.ndarray): The values predicted by the model.
- model_name (str, optional): The name of the model shown in the legend. The default is 'Model'.
- path (Optional[str], optional): The path to save the chart. If None, the chart is not saved.
【Return】
None
plot_shap
plot_shap(X_test:Union[pd.DataFrame, None]=None, save_path:str=None, plot_type:str='dot')
【Instructions】
Draws a SHAP plot to show how important the features are.
【Arguments】
- X_test (pd.DataFrame or None): The data set used to compute the SHAP values. The default is None, in which case the validation data from training is used.
- save_path (str, optional): The path to save the chart. If None, the chart is not saved.
- plot_type (str, optional): The type of SHAP plot. Options are 'dot', 'bar', 'violin', and 'compact_dot'. The default is 'dot'.
【Return】
None