If you can't remember the name of the module you need to import from sklearn, you can look up the available modules like this.
import sklearn
print(sklearn.__all__)
['calibration', 'cluster', 'covariance', 'cross_decomposition', 'datasets', 'decomposition', 'dummy', 'ensemble', 'exceptions', 'experimental', 'externals', 'feature_extraction', 'feature_selection', 'gaussian_process', 'inspection', 'isotonic', 'kernel_approximation', 'kernel_ridge', 'linear_model', 'manifold', 'metrics', 'mixture', 'model_selection', 'multiclass', 'multioutput', 'naive_bayes', 'neighbors', 'neural_network', 'pipeline', 'preprocessing', 'random_projection', 'semi_supervised', 'svm', 'tree', 'discriminant_analysis', 'impute', 'compose', 'clone', 'get_config', 'set_config', 'config_context', 'show_versions']
This prints the list of sklearn's submodules, such as sklearn.metrics and sklearn.model_selection.
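Note that __all__ is a list the package maintainers fill in by hand, so not every package defines it. As a fallback, here is a small sketch using only the standard library: pkgutil can enumerate a package's submodules directly from its directory.

import pkgutil
import sklearn

# Walk the package directory and collect the names of its submodules.
submodules = [mod.name for mod in pkgutil.iter_modules(sklearn.__path__)]
print(submodules)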
Once you know which submodule you need, the next step is to list the functions it contains. As an example, let's import metrics.
from sklearn import metrics
# 또는 import sklearn.metrics
",".join(dir(metrics))
ConfusionMatrixDisplay,DetCurveDisplay,DistanceMetric,PrecisionRecallDisplay,PredictionErrorDisplay,RocCurveDisplay,__all__,__builtins__,__cached__,__doc__,__file__,__loader__,__name__,__package__,__path__,__spec__,_base,_classification,_dist_metrics,_pairwise_distances_reduction,_pairwise_fast,_plot,_ranking,_regression,_scorer,accuracy_score,adjusted_mutual_info_score,adjusted_rand_score,auc,average_precision_score,balanced_accuracy_score,brier_score_loss,calinski_harabasz_score,check_scoring,class_likelihood_ratios,classification_report,cluster,cohen_kappa_score,completeness_score,confusion_matrix,consensus_score,coverage_error,d2_absolute_error_score,d2_log_loss_score,d2_pinball_score,d2_tweedie_score,davies_bouldin_score,dcg_score,det_curve,euclidean_distances,explained_variance_score,f1_score,fbeta_score,fowlkes_mallows_score,get_scorer,get_scorer_names,hamming_loss,hinge_loss,homogeneity_completeness_v_measure,homogeneity_score,jaccard_score,label_ranking_average_precision_score
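As the output shows, dir() returns everything bound in the module, including private helpers (names starting with an underscore) and dunder attributes like __file__. A small filter on top of the same dir() call, just a sketch, narrows the list to the public API:

# Keep only public names: drop anything starting with an underscore.
public_names = [name for name in dir(metrics) if not name.startswith("_")]
print(",".join(public_names))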
This lets you see which functions are available in metrics. And what if, at this point, you can't remember how to use a particular function, for example which arguments it takes?
help(metrics.f1_score)
Returns
-------
f1_score : float or array of float, shape = [n_unique_labels]
    F1 score of the positive class in binary classification or weighted
    average of the F1 scores of each class for the multiclass task.

See Also
--------
fbeta_score : Compute the F-beta score.
precision_recall_fscore_support : Compute the precision, recall, F-score,
    and support.
jaccard_score : Compute the Jaccard similarity coefficient score.
multilabel_confusion_matrix : Compute a confusion matrix for each class or
    sample.

Notes
-----
When ``true positive + false positive + false negative == 0`` (i.e. a class
is completely absent from both ``y_true`` or ``y_pred``), f-score is
undefined. In such cases, by default f-score will be set to 0.0, and
``UndefinedMetricWarning`` will be raised. This behavior can be modified by
setting the ``zero_division`` parameter.

References
----------
.. [1] `Wikipedia entry for the F1-score
       <https://en.wikipedia.org/wiki/F1_score>`_.
Examples
--------
>>> import numpy as np
>>> from sklearn.metrics import f1_score
>>> y_true = [0, 1, 2, 0, 1, 2]
>>> y_pred = [0, 2, 1, 0, 0, 1]
>>> f1_score(y_true, y_pred, average='macro')
np.float64(0.26...)
>>> f1_score(y_true, y_pred, average='micro')
np.float64(0.33...)
>>> f1_score(y_true, y_pred, average='weighted')
np.float64(0.26...)
>>> f1_score(y_true, y_pred, average=None)
array([0.8, 0. , 0. ])
>>> # binary classification
>>> y_true_empty = [0, 0, 0, 0, 0, 0]
>>> y_pred_empty = [0, 0, 0, 0, 0, 0]
>>> f1_score(y_true_empty, y_pred_empty)
np.float64(0.0...)
>>> f1_score(y_true_empty, y_pred_empty, zero_division=1.0)
np.float64(1.0...)
>>> f1_score(y_true_empty, y_pred_empty, zero_division=np.nan)
nan...
>>> # multilabel classification
>>> y_true = [[0, 0, 0], [1, 1, 1], [0, 1, 1]]
>>> y_pred = [[0, 0, 0], [1, 1, 1], [1, 1, 0]]
>>> f1_score(y_true, y_pred, average=None)
array([0.66666667, 1. , 0.66666667])
Running this prints the function's documentation, including its arguments and usage examples.
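help() prints the full docstring, which can get long. If you only want the parameter list, the standard-library inspect module can print just the call signature. A sketch, assuming the function's signature is introspectable (sklearn's decorators preserve it via functools.wraps, so this works on f1_score):

import inspect
from sklearn.metrics import f1_score

# Prints only the signature: the parameter names and their defaults.
print(inspect.signature(f1_score))

In Jupyter or IPython, appending a question mark (metrics.f1_score?) shows the same documentation inline.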