changtianml

The tabular_incrml class provides methods for working with tabular data, model training, and prediction. The public API of the class and instructions for its use follow.

Constructor Function

init

init(path: str)

Instructions

 Initializes an instance of the tabular_incrml class.

Arguments

  • path (str): The path to the directory where the ChangTianML training results are located.

Return

 None

Prediction Function

predict

predict(test_path: Union[str, pd.DataFrame]) -> np.ndarray

Instructions

 Uses the trained model to make predictions on test data.

Arguments

  • test_path (str or pd.DataFrame): The test data. Three loading modes are available:

 (1) The path to a CSV dataset file.

 (2) The path to a folder that contains only CSV files.

 (3) A pd.DataFrame passed directly.

Return

  • np.ndarray: One-dimensional numpy array of prediction results.
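A minimal usage sketch follows. It assumes the class can be imported as `from changtianml import tabular_incrml`, which this page does not confirm, and the directory and file names are placeholders. The helper `count_predictions` is hypothetical: it accepts any object that exposes the `predict` method documented above, so it can also be tried with a stand-in object.

```python
from collections import Counter


def count_predictions(model, test_path):
    """Call model.predict and tally how often each predicted value occurs.

    `model` is any object with the predict(test_path) method described above.
    """
    preds = model.predict(test_path)
    # np.ndarray offers tolist(); fall back to list() for other sequences.
    values = preds.tolist() if hasattr(preds, "tolist") else list(preds)
    return Counter(values)


# Intended use (assumed import path and placeholder paths):
#   from changtianml import tabular_incrml
#   model = tabular_incrml("path/to/training_results")
#   print(count_predictions(model, "path/to/test.csv"))
```

Tallying the predictions is a quick sanity check for a classification task; for regression the counts are usually less informative than summary statistics.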

predict_proba

predict_proba(test_path: Union[str, pd.DataFrame]) -> np.ndarray

Instructions

 Returns the model's predicted probability for each class in a classification task.

Arguments

  • test_path (str or pd.DataFrame): The test data. Three loading modes are available:

 (1) The path to a CSV dataset file.

 (2) The path to a folder that contains only CSV files.

 (3) A pd.DataFrame passed directly.

Return

  • np.ndarray: A two-dimensional numpy array of predicted probabilities.
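To illustrate how a two-dimensional probability result is typically read, the sketch below picks the highest-probability column in each row. It uses plain Python lists in place of the returned array, the probability values are made up, and the idea that column order follows class order is an assumption that this page does not state.

```python
def most_likely_class(proba_rows):
    """Return the index of the largest probability in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in proba_rows]


# Made-up probabilities for three samples and three classes.
example = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
]
print(most_likely_class(example))
```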

Data Processing Function

preprocess

preprocess(input: Union[str, pd.DataFrame], return_y: bool = False) -> pd.DataFrame

Instructions

 Preprocesses the input data, including feature combination and other operations.

Arguments

  • input (Union[str, pd.DataFrame]): The input data, either the path to a CSV file or an already loaded DataFrame.

  • return_y (bool, optional): Whether to return the label column. Defaults to False. If True, the returned DataFrame includes the label column. If the data is unlabeled, no label column is returned regardless of this setting.

Return

  • pd.DataFrame: The pre-processed data.

labelencode

labelencode(X: Union[pd.Series, pd.DataFrame], feature_name: Optional[str] = None) -> pd.Series

Instructions

 Encodes categorical features into a format that the model can handle.

Arguments

  • X (Union[pd.Series, pd.DataFrame]): Data that needs to be encoded.

  • feature_name (Optional[str], optional): The feature name. This parameter is required when X is a DataFrame.

Return

  • pd.Series: The encoded data.

labeldecode

labeldecode(X: Union[pd.Series, pd.DataFrame], feature_name: Optional[str] = None) -> pd.Series

Instructions

 Decodes an encoded feature, restoring it to its original format.

Arguments

  • X (Union[pd.Series, pd.DataFrame]): Data that needs to be decoded.

  • feature_name (Optional[str], optional): The feature name. This parameter is required when X is a DataFrame.

Return

  • pd.Series: Decoded data.
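The relationship between labelencode and labeldecode can be pictured as a round trip through an integer mapping. The sketch below is a generic illustration in plain Python, not the library's implementation: the function names, the sorted ordering of codes, and the sample values are all assumptions made for the example.

```python
def fit_label_mapping(values):
    """Map each distinct category to an integer code (sorted for stability)."""
    return {category: code for code, category in enumerate(sorted(set(values)))}


def encode(values, mapping):
    """Replace each category with its integer code."""
    return [mapping[v] for v in values]


def decode(codes, mapping):
    """Invert the mapping to recover the original categories."""
    inverse = {code: category for category, code in mapping.items()}
    return [inverse[c] for c in codes]


colors = ["red", "green", "blue", "green"]
mapping = fit_label_mapping(colors)
codes = encode(colors, mapping)
restored = decode(codes, mapping)  # round trip restores the originals
```

Decoding only works with the same mapping that was used for encoding, which is presumably why both methods take the feature name when a whole DataFrame is passed.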

Fetch Function

get_train_validation_data

get_train_validation_data(path: Optional[str] = None) -> tuple(pd.DataFrame, pd.DataFrame, pd.Series, pd.Series)

Instructions

 Gets the training set and the validation set.

Arguments

  • path (str, optional): The data source. Four loading modes are available:

 (1) The path to a CSV dataset file.

 (2) The path to a folder that contains only CSV files.

 (3) None, provided that load_training_data has been called beforehand to load the data.

 (4) A DataFrame passed directly.

Return

  • tuple(pd.DataFrame, pd.DataFrame, pd.Series, pd.Series): A tuple containing X_train, X_val, y_train, and y_val, in that order.

obtain_model_configuration

obtain_model_configuration()

Instructions

 Gets and returns the configuration information for the ChangTianML model. This function extracts the model type and its optimal configuration parameters, and presents the parameter information in a form intended to be accurate and usable.

Return

  • dict: A dictionary containing the following key-value pairs:

    • model_type: The best model type selected by AutoML.

    • model_params: The optimal configuration parameters of the model. Some parameters have been renamed and reformatted for readability.

    • task: The task type of the model, mainly classification or regression.
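As a sketch of how the returned dictionary might be consumed, the helper below builds a one-line summary from the three documented keys. The helper name `describe_configuration` and every value in `sample` are hypothetical; the actual model types and parameter names depend on the trained model.

```python
def describe_configuration(config):
    """Build a one-line summary from the three documented keys."""
    params = ", ".join(
        f"{name}={value}" for name, value in sorted(config["model_params"].items())
    )
    return f"{config['task']} / {config['model_type']} ({params})"


# Hypothetical values; real output depends on the trained model.
sample = {
    "model_type": "example_model",
    "model_params": {"n_estimators": 100, "learning_rate": 0.1},
    "task": "classification",
}
print(describe_configuration(sample))
```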

get_feature_importance

Instructions

 Gets the feature importance of the trained model.

Drawing Function

plot_box

plot_box(X: pd.DataFrame, y: pd.DataFrame, feature_name: str, target_name: str, path: Optional[str] = None)

Instructions

 Draws a box plot to analyze the relationship between a feature and the target.

Arguments

  • X (pd.DataFrame): The DataFrame that contains the feature.

  • y (pd.Series): The target variable.

  • feature_name (str): The name of the feature to be analyzed.

  • target_name (str): The name of the target variable.

  • path (Optional[str], optional): The path to save the chart. If None, the chart is not saved.

Return

 None

plot_kde

plot_kde(X: pd.DataFrame, feature_name: str, path: Optional[str] = None)

Instructions

 Draws a kernel density estimate (KDE) plot to show the estimated distribution of a feature.

Arguments

  • X (pd.DataFrame): The DataFrame that contains the feature.

  • feature_name (str): The name of the feature to be used for kernel density estimation.

  • path (Optional[str], optional): The path to save the chart. If None, the chart is not saved.

Return

 None

plot_trend_comparison

plot_trend_comparison(X: pd.DataFrame, feature_name: str, target_name: str, path: Optional[str] = None)

Instructions

 Plots the trend of a feature against the target.

Arguments

  • X (pd.DataFrame): The DataFrame containing the data.

  • feature_name (str): The name of the feature whose trend is to be analyzed.

  • target_name (str): The name of the target variable.

  • path (Optional[str], optional): The path to save the chart. If None, the chart is not saved.

Return

 None

plot_roc_curve

plot_roc_curve(y_true: Union[pd.Series, np.ndarray], y_score: Union[pd.Series, np.ndarray], n_classes: Optional[int] = None, roc_type: str = 'both', path: Optional[str] = None)

Instructions

 Draws receiver operating characteristic (ROC) curves to evaluate the performance of a classification model.

Arguments

  • y_true (pd.Series, np.ndarray): The true label value.

  • y_score (pd.Series, np.ndarray): The probability value or score predicted by the model.

  • n_classes (int, optional): The number of classes in the classification task. If not specified, it is inferred from the data.

  • roc_type (str, optional): The ROC curve type. Options are 'both' (plots the macro-averaged and micro-averaged ROC curves together), 'macro', and 'micro'. Default is 'both'.

  • path (Optional[str], optional): The path to save the chart. If None, the chart is not saved.

Return

 None
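The difference between the 'micro' and 'macro' options comes down to when the averaging happens. The sketch below illustrates this for the true positive rate at a single threshold rather than for a full curve. It is a generic illustration, not the library's code: the function name, the threshold, the sample data, and the requirement that labels are integer class indices from 0 to n_classes - 1 are all assumptions.

```python
def one_vs_rest_tpr(y_true, y_score, n_classes, threshold=0.5):
    """Return (micro_tpr, macro_tpr) at a single score threshold.

    Micro pools every positive decision across classes before computing the
    rate; macro computes the rate per class and then averages the rates.
    """
    per_class_tp = [0] * n_classes
    per_class_pos = [0] * n_classes
    for label, scores in zip(y_true, y_score):
        per_class_pos[label] += 1
        # A true positive for class `label` means its own score clears the threshold.
        if scores[label] >= threshold:
            per_class_tp[label] += 1
    micro = sum(per_class_tp) / sum(per_class_pos)
    rates = [tp / pos for tp, pos in zip(per_class_tp, per_class_pos) if pos]
    macro = sum(rates) / len(rates)
    return micro, macro


# Made-up data: three samples of class 0 and one sample of class 1.
y_true = [0, 0, 0, 1]
y_score = [[0.8, 0.2], [0.6, 0.4], [0.4, 0.6], [0.3, 0.7]]
micro_tpr, macro_tpr = one_vs_rest_tpr(y_true, y_score, n_classes=2)
```

With imbalanced classes the two values differ, because micro averaging weights every sample equally while macro averaging weights every class equally.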

plot_confusion_matrix

plot_confusion_matrix(y_true: Union[pd.Series, np.ndarray], y_pred: Union[pd.Series, np.ndarray], normalize: bool = False, title: Optional[str] = None, cmap = plt.cm.Blues, path: Optional[str] = None)

Instructions

 Draws a confusion matrix to show the prediction accuracy and error types of the classification model.

Arguments

  • y_true (pd.Series, np.ndarray): The true label value.

  • y_pred (pd.Series, np.ndarray): The prediction results of the model.

  • n_classes (int, optional): The number of categories in a classification task. If it is not specified, it is automatically calculated based on the data.

  • normalize (bool, optional): Whether to normalize the confusion matrix. Default is False.

  • title (str, optional): The title of the chart. Default is None.

  • cmap (Colormap, optional): The colormap to use. Default is plt.cm.Blues.

  • path (Optional[str], optional): The path to save the chart. If None, the chart is not saved.

Return

 None
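The effect of the normalize option can be sketched with a small pure-Python confusion matrix. This assumes row-wise normalization (each row divided by the number of true samples of that class), which is the common convention but is not stated on this page; the function name and sample labels are made up for the example.

```python
def confusion_matrix(y_true, y_pred, normalize=False):
    """Return (labels, matrix); rows are true labels, columns are predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    if normalize:
        # Divide each row by its total; rows with no true samples stay zero.
        matrix = [
            [count / sum(row) if sum(row) else 0.0 for count in row]
            for row in matrix
        ]
    return labels, matrix


y_true = ["cat", "cat", "dog", "dog"]
y_pred = ["cat", "dog", "dog", "dog"]
labels, counts = confusion_matrix(y_true, y_pred)
_, fractions = confusion_matrix(y_true, y_pred, normalize=True)
```

Raw counts show how many samples fall in each cell, while the normalized form makes per-class error rates comparable when class sizes differ.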

plot_regression_scatter

plot_regression_scatter(y_true: Union[pd.Series, np.ndarray], y_pred: Union[pd.Series, np.ndarray], model_name: str = 'Model', path: Optional[str] = None)

Instructions

 Draws a scatter plot for a regression task, comparing the model's predicted values with the actual values.

Arguments

  • y_true (pd.Series, np.ndarray): The true target value.

  • y_pred (pd.Series, np.ndarray): The predicted value of the model.

  • model_name (str, optional): The name of the model used in the legend. Default is 'Model'.

  • path (Optional[str], optional): The path to save the chart. If None, the chart is not saved.

Return

 None

plot_shap

plot_shap(X_test:Union[pd.DataFrame, None]=None, save_path:str=None, plot_type:str='dot')

Instructions

 Draws a SHAP plot to show how important each feature is.

Arguments

  • X_test (pd.DataFrame, None): The dataset for which SHAP values are computed. If None (the default), the validation data from training is used.

  • save_path (str, optional): The path to save the chart. If None, the chart is not saved.

  • plot_type (str, optional): The type of SHAP plot. Options are "dot", "bar", "violin", and "compact_dot". Default is "dot".

Return

 None