MLP Practice: Random Forest (Classification)
Loading data/libraries
In [1]:
# Widen the notebook display area
from IPython.display import display, HTML
display(HTML("<style>.container {width:90% !important;}</style>"))
In [1]:
import warnings
warnings.filterwarnings("ignore")
import pandas as pd
data=pd.read_csv("breast-cancer-wisconsin.csv")
x=data[data.columns[1:10]]
y=data[['Class']]
In [2]:
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test= train_test_split(x,y,stratify=y,random_state=42)
from sklearn.preprocessing import MinMaxScaler
scaler=MinMaxScaler()
scaler.fit(x_train)
x_scaled_train=scaler.transform(x_train)
x_scaled_test=scaler.transform(x_test)
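Note the order above: the scaler is fit on the training split only, and the same fitted scaler is then applied to the test split, so no test-set statistics leak into preprocessing. A minimal sketch with toy numbers (not the notebook's data) to make that concrete:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Fit on the training split only, then reuse the same scaler on the
# test split so test statistics never leak into preprocessing.
train = np.array([[0.0], [5.0], [10.0]])
test = np.array([[5.0], [20.0]])

scaler = MinMaxScaler()
scaler.fit(train)                       # learns min=0, max=10 from train only
train_scaled = scaler.transform(train)  # -> [0.0, 0.5, 1.0]
test_scaled = scaler.transform(test)    # values outside [0, 1] are possible
print(train_scaled.ravel())
print(test_scaled.ravel())              # -> [0.5, 2.0]
```

A test value above the training maximum scales past 1.0, which is expected behavior, not a bug.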
Training the model
In [5]:
from sklearn.ensemble import RandomForestClassifier
model=RandomForestClassifier()
model.fit(x_scaled_train,y_train)
Out[5]:
RandomForestClassifier()
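A fitted random forest also exposes `feature_importances_`, which is often worth inspecting alongside the score. A small sketch on synthetic data (the breast-cancer CSV is not bundled here, so `make_classification` stands in for it):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data: 5 features, only 2 of them informative.
X, y = make_classification(n_samples=200, n_features=5, n_informative=2,
                           random_state=0)
rf = RandomForestClassifier(random_state=0).fit(X, y)

# One importance value per input column; the values sum to 1.
print(rf.feature_importances_)
```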
Checking the results
In [6]:
pred_train=model.predict(x_scaled_train)
model.score(x_scaled_train,y_train)
Out[6]:
1.0
In [7]:
pred_test=model.predict(x_scaled_test)
model.score(x_scaled_test,y_test)
Out[7]:
0.9707602339181286
In [8]:
from sklearn.metrics import confusion_matrix
confusion_train=confusion_matrix(y_train,pred_train)
print(confusion_train)
[[333   0]
 [  0 179]]
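The precision and recall printed by `classification_report` below come straight from these four confusion-matrix cells. A toy sketch (illustrative numbers, not the notebook's data) of that derivation:

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()          # row = actual, column = predicted

precision = tp / (tp + fp)           # of predicted positives, how many are right
recall = tp / (tp + fn)              # of actual positives, how many were found
print(precision, recall)             # -> 0.666... for both here
```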
In [9]:
from sklearn.metrics import classification_report
cfreport_train=classification_report(y_train,pred_train)
print(cfreport_train)
              precision    recall  f1-score   support

           0       1.00      1.00      1.00       333
           1       1.00      1.00      1.00       179

    accuracy                           1.00       512
   macro avg       1.00      1.00      1.00       512
weighted avg       1.00      1.00      1.00       512
In [10]:
from sklearn.metrics import classification_report
cfreport_test=classification_report(y_test,pred_test)
print(cfreport_test)
              precision    recall  f1-score   support

           0       1.00      0.95      0.98       111
           1       0.92      1.00      0.96        60

    accuracy                           0.97       171
   macro avg       0.96      0.98      0.97       171
weighted avg       0.97      0.97      0.97       171
Hyperparameter optimization
GridSearch
In [16]:
# note: 'auto' was removed from RandomForest in scikit-learn 1.3; use 'sqrt' on newer versions
param_grid={'n_estimators':range(100,1000,100),"max_features":['auto','sqrt','log2']}
In [17]:
from sklearn.model_selection import GridSearchCV
grid_search=GridSearchCV(RandomForestClassifier(),param_grid,cv=5)
# search for the best parameters
grid_search.fit(x_scaled_train,y_train)
Out[17]:
GridSearchCV(cv=5, estimator=RandomForestClassifier(),
param_grid={'max_features': ['auto', 'sqrt', 'log2'],
'n_estimators': range(100, 1000, 100)})
In [18]:
print("Best Parameter : {}".format(grid_search.best_params_))
print("Best Cross-Validation Score : {:.4f}".format(grid_search.best_score_))
print('Test set Score : {:.4f}'.format(grid_search.score(x_scaled_test,y_test)))
Best Parameter : {'max_features': 'log2', 'n_estimators': 300}
Best Cross-Validation Score : 0.9746
Test set Score : 0.9649
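It helps to know how much work this grid implies: every combination of the two lists is fitted once per fold. A quick sketch of that count using `ParameterGrid`:

```python
from sklearn.model_selection import ParameterGrid

param_grid = {'n_estimators': range(100, 1000, 100),
              'max_features': ['auto', 'sqrt', 'log2']}

combos = list(ParameterGrid(param_grid))
print(len(combos))       # 9 n_estimators values x 3 max_features = 27 candidates
print(len(combos) * 5)   # with cv=5, GridSearchCV trains 135 forests
```

This is why RandomizedSearchCV below, with a fixed `n_iter=20`, can be cheaper while covering a wider `n_estimators` range.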
RandomSearch
In [19]:
from scipy.stats import randint
from sklearn.model_selection import RandomizedSearchCV
param_distribs={'n_estimators':randint(low=100,high=1000),"max_features":['auto','sqrt','log2']}
random_search=RandomizedSearchCV(RandomForestClassifier(),param_distributions=param_distribs,n_iter=20,cv=5)
# search for the best parameters
random_search.fit(x_scaled_train,y_train)
Out[19]:
RandomizedSearchCV(cv=5, estimator=RandomForestClassifier(), n_iter=20,
param_distributions={'max_features': ['auto', 'sqrt',
'log2'],
'n_estimators': <scipy.stats._distn_infrastructure.rv_frozen object at 0x00000217DE403FD0>})
In [20]:
print("Best Parameter : {}".format(random_search.best_params_))
print("Best Cross-Validation Score : {:.4f}".format(random_search.best_score_))
print('Test set Score : {:.4f}'.format(random_search.score(x_scaled_test,y_test)))
Best Parameter : {'max_features': 'auto', 'n_estimators': 285}
Best Cross-Validation Score : 0.9765
Test set Score : 0.9649
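Unlike the grid above, RandomizedSearchCV draws each candidate's `n_estimators` from the `randint` distribution, which is why best values like 285 can appear even though they were never listed explicitly. A sketch of what that frozen distribution produces:

```python
from scipy.stats import randint

# Uniform over the integers 100..999 (high is exclusive).
dist = randint(low=100, high=1000)
samples = dist.rvs(size=20, random_state=0)
print(samples)
```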
MLP Practice: Random Forest (Regression)
Loading data/libraries
In [21]:
data=pd.read_csv("house_price.csv")
x=data[data.columns[1:5]]
y=data[['house_value']]
x_train,x_test,y_train,y_test= train_test_split(x,y,random_state=42)
In [22]:
scaler=MinMaxScaler()
scaler.fit(x_train)
x_scaled_train=scaler.transform(x_train)
x_scaled_test=scaler.transform(x_test)
Training the model
In [26]:
from sklearn.ensemble import RandomForestRegressor
model=RandomForestRegressor()
model.fit(x_scaled_train,y_train)
Out[26]:
RandomForestRegressor()
Checking the results
In [27]:
pred_train=model.predict(x_scaled_train)
model.score(x_scaled_train,y_train)
Out[27]:
0.9377495088643856
In [28]:
pred_test=model.predict(x_scaled_test)
model.score(x_scaled_test,y_test)
Out[28]:
0.5830608080355251
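For a regressor, `score` returns R² (the coefficient of determination), not accuracy, so 0.58 means the model explains about 58% of the variance in house values on the test set. A toy sketch (illustrative numbers) of how R² is computed:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 7.5, 9.0])

ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r2 = 1 - ss_res / ss_tot
print(r2)                                        # -> 0.975
```

An R² of 1.0 means perfect prediction; predicting the mean everywhere gives 0, and worse-than-mean models go negative.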
Checking the RMSE
In [29]:
import numpy as np
from sklearn.metrics import mean_squared_error
MSE_train=mean_squared_error(y_train,pred_train)
MSE_test=mean_squared_error(y_test,pred_test)
print(np.sqrt(MSE_train))
print(np.sqrt(MSE_test))
23813.45637095561
61730.36346797927
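Taking the square root puts the error back into the units of the target (here, currency units of house value), which is why RMSE is easier to interpret than MSE. A toy sketch of the computation (newer scikit-learn versions also offer a `root_mean_squared_error` function, but `np.sqrt` of the MSE works everywhere):

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]

mse = mean_squared_error(y_true, y_pred)  # (1 + 0 + 4) / 3
rmse = np.sqrt(mse)                       # back in the units of y
print(rmse)
```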
Hyperparameter optimization
GridSearch
In [30]:
param_grid={'n_estimators':range(100,500,100),"max_features":['auto','sqrt','log2']}
In [31]:
from sklearn.model_selection import GridSearchCV
grid_search=GridSearchCV(RandomForestRegressor(),param_grid,cv=5)
# search for the best parameters
grid_search.fit(x_scaled_train,y_train)
Out[31]:
GridSearchCV(cv=5, estimator=RandomForestRegressor(),
param_grid={'max_features': ['auto', 'sqrt', 'log2'],
'n_estimators': range(100, 500, 100)})
In [32]:
print("Best Parameter : {}".format(grid_search.best_params_))
print("Best Cross-Validation Score : {:.4f}".format(grid_search.best_score_))
print('Test set Score : {:.4f}'.format(grid_search.score(x_scaled_test,y_test)))
Best Parameter : {'max_features': 'sqrt', 'n_estimators': 300}
Best Cross-Validation Score : 0.5689
Test set Score : 0.5921
RandomSearch
In [34]:
from scipy.stats import randint
from sklearn.model_selection import RandomizedSearchCV
param_distribs={'n_estimators':randint(low=100,high=500),"max_features":['auto','sqrt','log2']}
random_search=RandomizedSearchCV(RandomForestRegressor(),param_distributions=param_distribs,n_iter=20,cv=5)
# search for the best parameters
random_search.fit(x_scaled_train,y_train)
Out[34]:
RandomizedSearchCV(cv=5, estimator=RandomForestRegressor(), n_iter=20,
param_distributions={'max_features': ['auto', 'sqrt',
'log2'],
'n_estimators': <scipy.stats._distn_infrastructure.rv_frozen object at 0x00000217DE4C47F0>})
In [35]:
print("Best Parameter : {}".format(random_search.best_params_))
print("Best Cross-Validation Score : {:.4f}".format(random_search.best_score_))
print('Test set Score : {:.4f}'.format(random_search.score(x_scaled_test,y_test)))
Best Parameter : {'max_features': 'sqrt', 'n_estimators': 310}
Best Cross-Validation Score : 0.5692
Test set Score : 0.5928