[Big Data Analysis Engineer Practical Exam] What to Do When You Can't Remember a Module Name or How to Use a Function

by 코듀킹 2024. 11. 16.

If you can't remember the name of the module you need to import from sklearn, you can list what the package offers like this.

import sklearn
print(sklearn.__all__)

 

['calibration', 'cluster', 'covariance', 'cross_decomposition', 'datasets', 'decomposition', 'dummy', 'ensemble', 'exceptions', 'experimental', 'externals', 'feature_extraction', 'feature_selection', 'gaussian_process', 'inspection', 'isotonic', 'kernel_approximation', 'kernel_ridge', 'linear_model', 'manifold', 'metrics', 'mixture', 'model_selection', 'multiclass', 'multioutput', 'naive_bayes', 'neighbors', 'neural_network', 'pipeline', 'preprocessing', 'random_projection', 'semi_supervised', 'svm', 'tree', 'discriminant_analysis', 'impute', 'compose', 'clone', 'get_config', 'set_config', 'config_context', 'show_versions']
 

 

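This prints everything sklearn exposes at the top level: mostly submodules such as sklearn.metrics and sklearn.model_selection, plus a few functions such as clone and show_versions.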
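
A related situation is remembering a function name such as train_test_split but not the submodule it lives in. The sketch below is one possible way to handle that, not an official sklearn feature: it walks the package's __all__, tries to import each entry as a submodule, and collects the ones that expose the name. find_submodules_with is a hypothetical helper name introduced here, and the code assumes the package defines __all__ as in the output above.

```python
import importlib.util


def find_submodules_with(package_name, attr_name):
    """Return dotted names of submodules of `package_name` that expose `attr_name`.

    Hypothetical helper for illustration; it is not part of scikit-learn.
    """
    package = importlib.import_module(package_name)
    hits = []
    for name in getattr(package, "__all__", []):
        try:
            submodule = importlib.import_module(f"{package_name}.{name}")
        except ImportError:
            # Entries such as clone or show_versions are functions, not submodules.
            continue
        if hasattr(submodule, attr_name):
            hits.append(f"{package_name}.{name}")
    return hits


# Run the demo only when scikit-learn is installed.
if importlib.util.find_spec("sklearn") is not None:
    print(find_submodules_with("sklearn", "train_test_split"))
```

Importing every submodule can take a few seconds. The result may also include submodules that merely import the name for their own use, so treat it as a list of candidates rather than a single answer.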

Once you have found the submodule, the next step is to look up the functions it contains. As an example, let's import metrics.

from sklearn import metrics
# or: import sklearn.metrics (then refer to it as sklearn.metrics)
print(",".join(dir(metrics)))
ConfusionMatrixDisplay,DetCurveDisplay,DistanceMetric,PrecisionRecallDisplay,PredictionErrorDisplay,RocCurveDisplay,__all__,__builtins__,__cached__,__doc__,__file__,__loader__,__name__,__package__,__path__,__spec__,_base,_classification,_dist_metrics,_pairwise_distances_reduction,_pairwise_fast,_plot,_ranking,_regression,_scorer,accuracy_score,adjusted_mutual_info_score,adjusted_rand_score,auc,average_precision_score,balanced_accuracy_score,brier_score_loss,calinski_harabasz_score,check_scoring,class_likelihood_ratios,classification_report,cluster,cohen_kappa_score,completeness_score,confusion_matrix,consensus_score,coverage_error,d2_absolute_error_score,d2_log_loss_score,d2_pinball_score,d2_tweedie_score,davies_bouldin_score,dcg_score,det_curve,euclidean_distances,explained_variance_score,f1_score,fbeta_score,fowlkes_mallows_score,get_scorer,get_scorer_names,hamming_loss,hinge_loss,homogeneity_completeness_v_measure,homogeneity_score,jaccard_score,label_ranking_average_precision_score
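
The raw dir() output mixes dunder attributes and private helpers (names starting with an underscore) with the public functions, which makes it hard to scan. Here is a minimal sketch of one way to narrow it down, assuming you remember part of the name: drop the underscore-prefixed entries and keep only the names that contain a keyword. public_names is a hypothetical helper name, not an sklearn function.

```python
import importlib.util


def public_names(module, keyword=""):
    """Return the sorted public names in `module` that contain `keyword`.

    Hypothetical helper for illustration. The match is case-insensitive,
    and an empty keyword keeps every public name.
    """
    keyword = keyword.lower()
    return sorted(
        name
        for name in dir(module)
        if not name.startswith("_") and keyword in name.lower()
    )


# Run the demo only when scikit-learn is installed.
if importlib.util.find_spec("sklearn") is not None:
    from sklearn import metrics

    # One name per line is easier to read than a comma-joined string.
    print("\n".join(public_names(metrics, "f1")))
```

Printing one name per line also avoids the single very long line that the comma-joined version produces.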


dir(metrics) shows which functions metrics makes available. But what if you then can't remember how to use one of them, for example which arguments it takes?

help(metrics.f1_score)
 -------
    f1_score : float or array of float, shape = [n_unique_labels]
        F1 score of the positive class in binary classification or weighted
        average of the F1 scores of each class for the multiclass task.
    
    See Also
    --------
    fbeta_score : Compute the F-beta score.
    precision_recall_fscore_support : Compute the precision, recall, F-score,
        and support.
    jaccard_score : Compute the Jaccard similarity coefficient score.
    multilabel_confusion_matrix : Compute a confusion matrix for each class or
        sample.
    
    Notes
    -----
    When ``true positive + false positive + false negative == 0`` (i.e. a class
    is completely absent from both ``y_true`` or ``y_pred``), f-score is
    undefined. In such cases, by default f-score will be set to 0.0, and
    ``UndefinedMetricWarning`` will be raised. This behavior can be modified by
    setting the ``zero_division`` parameter.
    
    References
    ----------
    .. [1] `Wikipedia entry for the F1-score
           <https://en.wikipedia.org/wiki/F1_score>`_.
    
    Examples
    --------
    >>> import numpy as np
    >>> from sklearn.metrics import f1_score
    >>> y_true = [0, 1, 2, 0, 1, 2]
    >>> y_pred = [0, 2, 1, 0, 0, 1]
    >>> f1_score(y_true, y_pred, average='macro')
    np.float64(0.26...)
    >>> f1_score(y_true, y_pred, average='micro')
    np.float64(0.33...)
    >>> f1_score(y_true, y_pred, average='weighted')
    np.float64(0.26...)
    >>> f1_score(y_true, y_pred, average=None)
    array([0.8, 0. , 0. ])
    
    >>> # binary classification
    >>> y_true_empty = [0, 0, 0, 0, 0, 0]
    >>> y_pred_empty = [0, 0, 0, 0, 0, 0]
    >>> f1_score(y_true_empty, y_pred_empty)
    np.float64(0.0...)
    >>> f1_score(y_true_empty, y_pred_empty, zero_division=1.0)
    np.float64(1.0...)
    >>> f1_score(y_true_empty, y_pred_empty, zero_division=np.nan)
    nan...
    
    >>> # multilabel classification
    >>> y_true = [[0, 0, 0], [1, 1, 1], [0, 1, 1]]
    >>> y_pred = [[0, 0, 0], [1, 1, 1], [1, 1, 0]]
    >>> f1_score(y_true, y_pred, average=None)
    array([0.66666667, 1.        , 0.66666667])

 

Calling help() like this prints the function's usage documentation.
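
The help() output can run to several screens, and often all you need is the parameter list. As a rough sketch of a shorter alternative, the standard-library inspect module can return just the signature and the first few docstring lines. inspect.signature and inspect.getdoc are real standard-library functions; quick_usage and its doc_lines parameter are hypothetical names introduced here for illustration.

```python
import importlib.util
import inspect


def quick_usage(func, doc_lines=8):
    """Return the signature of `func` followed by the start of its docstring.

    Hypothetical helper for illustration; it is not part of scikit-learn.
    """
    try:
        signature = str(inspect.signature(func))
    except (TypeError, ValueError):
        # Some callables implemented in C do not expose a signature.
        signature = "(signature unavailable)"
    doc = inspect.getdoc(func) or ""
    head = "\n".join(doc.splitlines()[:doc_lines])
    return f"{func.__name__}{signature}\n{head}"


# Run the demo only when scikit-learn is installed.
if importlib.util.find_spec("sklearn") is not None:
    from sklearn.metrics import f1_score

    print(quick_usage(f1_score))
```

The signature line shows the argument names and their default values at a glance. When you need the full parameter descriptions or the examples, help() is still the complete reference.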
