We used the following three methods to create features describing the distribution of the dataset:
- GMM_PRED
- GMM_SCORE
- HIST_PRED
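The writeup does not give exact definitions of these features, but a plausible sketch with scikit-learn is below; the component count, bin count, and toy data are all assumptions, not the team's actual settings.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Toy stand-in for one "magic" subset: two Gaussian blobs in 4 dimensions.
X = np.vstack([rng.normal(-2.0, 1.0, (100, 4)), rng.normal(2.0, 1.0, (100, 4))])

# GMM_PRED: soft cluster-membership probabilities from a Gaussian mixture.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
gmm_pred = gmm.predict_proba(X)

# GMM_SCORE: per-sample log-likelihood under the fitted mixture.
gmm_score = gmm.score_samples(X)

# HIST_PRED: histogram-bin index of each sample along one feature.
edges = np.histogram_bin_edges(X[:, 0], bins=10)
hist_pred = np.digitize(X[:, 0], edges)

print(gmm_pred.shape, gmm_score.shape, hist_pred.shape)
```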
Interestingly, one thing we found is that duplicating the GMM_PRED and GMM_SCORE features increased the score (the numbers below are the copy counts of each feature). The results are still a mystery to our team.
GMM_PRED 1 + GMM_SCORE 0 : LB 0.97465
GMM_PRED 2 + GMM_SCORE 0 : LB 0.97475
GMM_PRED 3 + GMM_SCORE 0 : LB 0.97479
GMM_PRED 4 + GMM_SCORE 0 : LB 0.97480
GMM_PRED 5 + GMM_SCORE 0 : LB 0.97481
GMM_PRED 5 + GMM_SCORE 1 : LB 0.97482
GMM_PRED 5 + GMM_SCORE 2 : LB 0.97482
GMM_PRED 5 + GMM_SCORE 3 : LB 0.97473
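One way to replay this kind of duplication experiment locally is sketched below; the toy dataset, the LogisticRegression probe, and the choice of which columns stand in for GMM_PRED are all assumptions, and local CV will not reproduce the LB numbers above. A plausible mechanism is that, for scale-sensitive models, repeating a column effectively up-weights it relative to the regularization.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Toy data; the first two columns stand in for the GMM_PRED features.
X, y = make_classification(n_samples=500, n_features=10, n_informative=6,
                           random_state=0)
gmm_pred_cols, rest = X[:, :2], X[:, 2:]

def score_with_copies(n_copies):
    """CV accuracy with the GMM_PRED stand-in columns repeated n_copies times."""
    feats = np.hstack([rest] + [gmm_pred_cols] * n_copies)
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, feats, y, cv=5).mean()

scores = [score_with_copies(k) for k in (1, 2, 3)]
print([round(s, 4) for s in scores])
```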
Model
1st layer : NuSVC + NuSVC2 + QDA + SVC + KNN + LR / stratified with GMM_LABEL
2nd layer : LGBM + MLP / stratified
3rd layer : 1st layer + 2nd layer / average
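A minimal out-of-fold sketch of this three-layer stack is below. It simplifies the original setup: the second NuSVC and LGBM are omitted, GMM_LABEL stratification is replaced by plain stratified folds, and the toy data and hyperparameters are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC, NuSVC

X, y = make_classification(n_samples=400, n_features=8, n_informative=5,
                           random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# 1st layer: out-of-fold class-1 probabilities from several base models.
layer1 = [
    NuSVC(probability=True, random_state=0),
    SVC(probability=True, random_state=0),
    QuadraticDiscriminantAnalysis(),
    KNeighborsClassifier(),
    LogisticRegression(max_iter=1000),
]
oof1 = np.column_stack([
    cross_val_predict(m, X, y, cv=cv, method="predict_proba")[:, 1]
    for m in layer1
])

# 2nd layer: a model trained on the 1st-layer out-of-fold predictions.
oof2 = cross_val_predict(
    MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0),
    oof1, y, cv=cv, method="predict_proba",
)[:, 1]

# 3rd layer: average the 1st- and 2nd-layer predictions.
final_pred = (oof1.mean(axis=1) + oof2) / 2.0
print(final_pred.shape)
```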
Private Test with make_classification
The process is as follows.
1. Estimate the private magic number n (0–511) using the linear relationship between the magic number and AUC (e.g., if private magic 0 gives AUC 0.97xx, then the magic number is 250).
2. Create public and private data by seed using make_classification with the estimated magic number.
3. Test the model on the generated public and private datasets and check the CV, public, and private scores.
4. Finally, choose a model whose CV + public + private score is close to 0.975. Others thought this was overfitting, but in the end it did not work out.
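The simulation step above can be sketched as follows; the sample counts, make_classification parameters, and the QDA probe model are assumptions, not the team's actual configuration.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Simulate one magic subset, then split off stand-ins for the public and
# private test sets.
X, y = make_classification(n_samples=2000, n_features=40, n_informative=10,
                           flip_y=0.05, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)
X_pub, X_prv, y_pub, y_prv = train_test_split(
    X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=0)

# Score the model on both simulated splits and compare against CV.
model = QuadraticDiscriminantAnalysis().fit(X_train, y_train)
pub_auc = roc_auc_score(y_pub, model.predict_proba(X_pub)[:, 1])
prv_auc = roc_auc_score(y_prv, model.predict_proba(X_prv)[:, 1])
print(round(pub_auc, 4), round(prv_auc, 4))
```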