[Kaggle] Passenger Screening Algorithm Challenge Silver Medal Award

This article shares my method in a Kaggle contest (total prize $1,500,000). I ended up in 36th place. The contest information is linked below:


Contest Introduction

The competition is held by the U.S. Department of Homeland Security to identify threats at airports. A millimeter wave scanner set up at the airport scans a person's entire body, and the task is to identify whether unusual objects are present in each of 17 body zones. The data comes in four formats; I used the two listed below:
1. .a3daps: 512x660 images taken from 64 angles. The following demo picture shows 16 of those angles; the task here is to tell whether unusual items are hidden around the upper arm area.


2. .a3d: 512x512x660 3D data

The training set contains only 1147 samples. Submissions are scored by log loss (the smaller the better). The contest has two phases. The first phase has a partly public and partly private leaderboard. The second phase is entirely private and runs for 4 days.
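As a rough illustration of the scoring metric, here is a minimal sketch that assumes the standard binary log loss averaged over all predictions. The function name `log_loss` and the clipping constant `eps` are my own choices for illustration, not contest code.

```python
import numpy as np

def log_loss(y_true, y_pred, eps=1e-15):
    """Mean binary log loss; lower is better."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities so log(0) never occurs (eps is an assumed value).
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    losses = y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)
    return float(-np.mean(losses))

# An uninformative prediction of 0.5 everywhere scores ln(2).
uninformed = log_loss([1, 0], [0.5, 0.5])
# Confident and correct predictions score much lower.
confident = log_loss([1, 0], [0.9, 0.1])
print(uninformed, confident)
```

Because the loss grows without bound as a wrong prediction approaches 0 or 1, overconfident mistakes are punished heavily, which is one reason overfitting hurts so much under this metric.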

Contest Methods

My approach is a 3D CNN. I made two attempts, one with the a3daps format and one with the a3d format. Because of GPU constraints, the 3D data is downsampled with stride 4 to 128x128x165. Three networks, VGG16, ResNet and DenseNet-121, were adapted into 3D CNNs in Keras. Augmentation consists of a 3D shift of ±5 pixels and a 3D rotation of ±5. The 3D operations use scipy.ndimage. 3D scaling was originally in the plan, but it required too much time. Training took 50 hours on 16 CPU cores (32 threads) and two 1080 Ti GPUs.
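The downsampling and augmentation steps could look roughly like the sketch below. It is not my original code. The function names, the linear interpolation (`order=1`), the zero padding, the choice of a random rotation plane, and the reading of the rotation range as degrees are all assumptions made for illustration.

```python
import numpy as np
from scipy import ndimage

def downsample_stride(volume, stride=4):
    """Keep every `stride`-th voxel per axis (512x512x660 -> 128x128x165 for stride 4)."""
    return volume[::stride, ::stride, ::stride]

def augment_3d(volume, rng, max_shift=5, max_angle=5.0):
    """Apply a random 3D shift and a random small rotation; the shape is unchanged."""
    # Random shift in voxels along each of the three axes.
    shift = rng.uniform(-max_shift, max_shift, size=3)
    out = ndimage.shift(volume, shift, order=1, mode="constant", cval=0.0)
    # Random rotation (angle in degrees, an assumption) in a randomly chosen plane.
    planes = [(0, 1), (0, 2), (1, 2)]
    axes = planes[rng.integers(len(planes))]
    angle = rng.uniform(-max_angle, max_angle)
    return ndimage.rotate(out, angle, axes=axes, reshape=False,
                          order=1, mode="constant", cval=0.0)

# Toy volume, far smaller than a real 512x512x660 scan, to keep the demo cheap.
rng = np.random.default_rng(0)
toy = rng.random((32, 32, 40)).astype(np.float32)
small = downsample_stride(toy)
augmented = augment_3d(small, rng)
print(small.shape, augmented.shape)
```

Interpolating a full-size volume on the CPU for every training sample is slow, which is consistent with augmentation dominating the training time.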

3D CNN does not support transfer learning. With only about 1000 samples at hand and numerous adjustable parameters, a 3D CNN tends to overfit. Compared with ResNet and DenseNet, VGG16 plus Batch Normalization produced better results. In addition, using Leaky ReLU or ELU as the activation function increased overfitting. To overcome the overfitting issue I relied on augmentation, which took much time. Batch Normalization was added to VGG16, and dropout was added only to the fully connected layers (dropout and Batch Normalization should not be added at the same time, or else performance suffers).
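To give a rough sense of the parameter growth when a 2D network is converted to 3D, the sketch below counts the trainable parameters of a single convolution layer. The helper `conv_params` and the 64-to-128 channel example are illustrative assumptions, not figures taken from my actual networks.

```python
def conv_params(kernel_size, in_channels, out_channels):
    """Trainable parameters of one convolution layer: weights plus one bias per filter."""
    weights = in_channels * out_channels
    for k in kernel_size:
        weights *= k
    return weights + out_channels

# Same layer shape, 2D kernel versus 3D kernel.
params_2d = conv_params((3, 3), 64, 128)
params_3d = conv_params((3, 3, 3), 64, 128)
print(params_2d, params_3d)  # 73856 221312
```

Each 3x3x3 layer carries about three times the weights of its 3x3 counterpart, and none of them can start from pretrained values, so the small training set has to fit all of them from scratch.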

I ended up in 36th place (top 7%, 35/518) with a log loss of 0.15917; the first-place log loss was 0.02417.



Some participants shared that they adopted a Multi-view CNN (MVCNN) for the a3daps format. That 3D CNN does not support transfer learning is a big issue in a contest like this one, which offers only about a thousand samples. Transfer learning, on the other hand, is applicable with MVCNN. Based on what others shared, I would suggest that MVCNN is a good technique when the data set has around a thousand samples or fewer.
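The core idea of MVCNN, as I understand it, is to run one shared 2D feature extractor over every view and then pool the per-view features element-wise. The sketch below shows only that pooling step. `extract_view_features` is a hypothetical stand-in (a random linear projection with ReLU) for a pretrained 2D CNN, and the toy view size is an assumption chosen to keep the example small.

```python
import numpy as np

def extract_view_features(view, weights):
    """Hypothetical stand-in for a pretrained 2D CNN: flatten one view, project, apply ReLU."""
    return np.maximum(view.reshape(-1) @ weights, 0.0)

def mvcnn_pool(views, weights):
    """Apply the same extractor to every view, then max-pool element-wise across views."""
    features = np.stack([extract_view_features(v, weights) for v in views])
    return features.max(axis=0)  # one feature vector for the whole scan

rng = np.random.default_rng(0)
views = rng.random((16, 32, 40))             # 16 toy views, much smaller than 512x660
weights = rng.standard_normal((32 * 40, 8))  # shared by all views
pooled = mvcnn_pool(views, weights)
print(pooled.shape)  # (8,)
```

Because the extractor works on ordinary 2D images, it can be initialized from weights pretrained on a large image data set, which is exactly what a 3D CNN cannot do here.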