TEAM EDA

Instant Gratification

김현우 2019. 6. 21. 09:30

Preprocessing

    1. Feature selection with variance
    2. Kernel PCA
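
The two preprocessing steps can be sketched as below. This is a minimal illustration, not the team's exact pipeline: the data, the variance threshold, and the Kernel PCA parameters are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold
from sklearn.decomposition import KernelPCA

# Hypothetical stand-in data: 512 rows, 255 columns
rng = np.random.default_rng(0)
X = rng.normal(size=(512, 255))
X[:, :50] *= 0.1  # make the first 50 columns low-variance

# Step 1: drop columns whose variance falls below a threshold
# (the threshold value is an assumption, not the authors' exact setting)
selector = VarianceThreshold(threshold=0.5)
X_sel = selector.fit_transform(X)
print(X_sel.shape)  # (512, 205)

# Step 2: Kernel PCA on the surviving columns
# (kernel and n_components are illustrative choices)
kpca = KernelPCA(n_components=10, kernel="rbf", random_state=0)
X_kpca = kpca.fit_transform(X_sel)
print(X_kpca.shape)  # (512, 10)
```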

Tried and failed: PCA, SVD, AutoEncoder, DAE, etc.

Feature Engineering

We used the following three methods to build features that capture the distribution of the dataset.

    1. GMM_PRED
    2. GMM_SCORE
    3. HIST_PRED
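
A sketch of how such distribution features can be built with a Gaussian mixture model. The data is synthetic, the component count is illustrative, and my reading of HIST_PRED as a binned-density feature is an assumption.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Hypothetical two-cluster data standing in for one subset of the dataset
X = np.vstack([rng.normal(-2.0, 1.0, size=(100, 5)),
               rng.normal(2.0, 1.0, size=(100, 5))])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

gmm_pred = gmm.predict_proba(X)   # GMM_PRED: per-component membership probabilities
gmm_score = gmm.score_samples(X)  # GMM_SCORE: per-sample log-likelihood
# HIST_PRED interpreted here as a binned-density feature (an assumption)
bin_ids = np.digitize(gmm_score, np.histogram_bin_edges(gmm_score, bins=10))

print(gmm_pred.shape, gmm_score.shape, bin_ids.shape)
```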


Incidentally, one surprising finding was that duplicating the GMM_PRED and GMM_SCORE columns increased the leaderboard score. In the list below, the numbers denote how many copies of each feature block were included. The results are still a mystery to our team.

  • Gmm_pred 1 + Gmm_score 0 : LB 0.97465
  • Gmm_pred 2 + Gmm_score 0 : LB 0.97475
  • Gmm_pred 3 + Gmm_score 0 : LB 0.97479
  • Gmm_pred 4 + Gmm_score 0 : LB 0.97480
  • Gmm_pred 5 + Gmm_score 0 : LB 0.97481
  • Gmm_pred 5 + Gmm_score 1 : LB 0.97482
  • Gmm_pred 5 + Gmm_score 2 : LB 0.97482
  • Gmm_pred 5 + Gmm_score 3 : LB 0.97473
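
Concretely, "Gmm_pred 3 + Gmm_score 0" means stacking three copies of the GMM_PRED block next to the other features, as sketched below with hypothetical stand-in arrays. One plausible (but unconfirmed) explanation is that duplicating a block effectively up-weights it for scale-sensitive models such as KNN and SVC.

```python
import numpy as np

rng = np.random.default_rng(0)
base = rng.normal(size=(100, 10))  # stand-in for the other features
gmm_pred = rng.random((100, 2))    # stand-in for the GMM_PRED block

# "Gmm_pred 3 + Gmm_score 0": include three copies of GMM_PRED, none of GMM_SCORE
X = np.hstack([base] + [gmm_pred] * 3)
print(X.shape)  # (100, 16)
```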

Model

  • 1st layer : Nusvc + Nusvc2 + qda + svc + knn + lr / Stratified with GMM_LABEL
  • 2nd layer : Lgbm + mlp / Stratified
  • 3rd layer : 1st layer + 2nd layer / Average
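
The three-layer stack can be sketched as follows. This is a simplified illustration: the base models are a subset of those listed, LGBM is replaced by LogisticRegression to keep the sketch dependency-free, and the GMM_LABEL stratification is replaced by plain stratified folds.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import NuSVC
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# 1st layer: out-of-fold predictions from several base models
first_layer = [NuSVC(probability=True, random_state=0),
               QuadraticDiscriminantAnalysis(),
               KNeighborsClassifier()]
oof = np.zeros((len(X), len(first_layer)))
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for tr, va in skf.split(X, y):
    for j, model in enumerate(first_layer):
        model.fit(X[tr], y[tr])
        oof[va, j] = model.predict_proba(X[va])[:, 1]

# 2nd layer: a meta-model trained on the first-layer predictions
meta = LogisticRegression().fit(oof, y)
second = meta.predict_proba(oof)[:, 1]

# 3rd layer: average the first- and second-layer predictions
final = (oof.mean(axis=1) + second) / 2
print(final.shape)  # (300,)
```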
Private Test with make_classification

The process is as follows:

    1. Estimate, for each private magic subset n (0~511), the number of informative features, using the linear relationship between that number and AUC.
       (e.g. if private magic 0 has AUC 0.97xx, then its magic number is 250)
    2. Create public and private data for each seed using make_classification with the estimated magic number.
    3. Test the model on the generated public and private datasets and check the CV, public (PB), and private (PV) scores.
    4. Finally, choose a model whose CV + PB + PV is close to 0.975; we judged the others to be overfitting.

But in practice, it did not work out.
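
A sketch of step 2, simulating a "private" dataset for one magic subset. The estimated informative-feature count, the label-noise rate, and the dataset dimensions are all hypothetical stand-ins, not the authors' actual estimates.

```python
from sklearn.datasets import make_classification

# Hypothetical estimate for one magic subset, derived (per the text)
# from the linear relationship between magic number and AUC
estimated_informative = 40

X_sim, y_sim = make_classification(
    n_samples=1024,                      # public + private rows for one subset
    n_features=255,                      # illustrative feature count
    n_informative=estimated_informative,
    n_redundant=0,
    flip_y=0.05,                         # label noise, an assumption
    random_state=42,                     # vary the seed to simulate many splits
)
X_pub, y_pub = X_sim[:512], y_sim[:512]  # simulated "public" half
X_pri, y_pri = X_sim[512:], y_sim[512:]  # simulated "private" half
print(X_pub.shape, X_pri.shape)
```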

Resource

The code is below.