Machine Learns from a Cardiologist (2)
Understanding the literature and analyzing the results
Deep learning and classification
Pattern recognition with deep convolutional neural networks is indisputably strong. They have proven it on a variety of complicated image recognition tasks, and even on sound recognition. It seems clear they will perform at least as well as human beings on such tasks.
What matters is whether we have enough data, and how we can preprocess the data properly for a machine to learn effectively. Collecting serviceable ECG data was not easy for me. All I could obtain was the data provided by PhysioNet and iRhythm Technologies.
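As a concrete illustration, here is a minimal sketch of how one might load a single record from the PhysioNet/CinC 2017 training set, assuming the archive has been unpacked into a local folder (the directory name and record name below are placeholders):

```python
# Minimal sketch: load one PhysioNet/CinC 2017 record and its label.
# Assumes the training archive sits in ./training2017/, where each record
# is a .mat file storing the ECG under the key 'val' (300 Hz, single lead)
# and REFERENCE.csv maps record names to labels (N, A, O, ~).
import csv
import scipy.io

def load_record(record_name, data_dir="training2017"):
    mat = scipy.io.loadmat(f"{data_dir}/{record_name}.mat")
    return mat["val"].squeeze()        # 1-D ECG signal

def load_labels(data_dir="training2017"):
    with open(f"{data_dir}/REFERENCE.csv") as f:
        return dict(csv.reader(f))     # {'A00001': 'N', ...}

labels = load_labels()
ecg = load_record("A00001")
print(ecg.shape, labels["A00001"])
```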
Online ECG data and the research paper in Nature
Look at the table. Compared with the MNIST data (handwritten digit recognition), the CinC data are about 5 times smaller in total array size, and the other ECG datasets are even smaller than the CinC data. Besides, most of the ECG data are normal, so the abnormal examples needed to train on diseased heart signals are in seriously short supply.
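One common way to mitigate such imbalance is to weight the loss so rare classes count more during training. A minimal sketch with scikit-learn and Keras, where the label array y_train is a toy placeholder:

```python
# Minimal sketch: counteract class imbalance with per-class loss weights.
# y_train is a hypothetical integer label array (0 = normal, 1 = AF, ...).
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y_train = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2])   # toy, mostly "normal"
classes = np.unique(y_train)
weights = compute_class_weight(class_weight="balanced",
                               classes=classes, y=y_train)
class_weight = dict(zip(classes, weights))
print(class_weight)   # rare classes get weights > 1

# Keras can then take this directly:
# model.fit(x_train, y_train, class_weight=class_weight, ...)
```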
The highest score in the CinC 2017 competition was about 83 percent, and the F1 on the minority classes would be even worse, since that 83 must have been helped by the normal signals that make up the majority of the data.
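To see why a single overall score can hide poor minority-class performance, compare averaging schemes on a toy imbalanced prediction (all numbers here are invented for illustration):

```python
# Toy illustration: accuracy and micro-F1 look fine when the majority
# class dominates, while the macro (per-class) F1 exposes the weakness.
from sklearn.metrics import accuracy_score, f1_score

y_true = ["N"] * 90 + ["AF"] * 10             # 90% normal, 10% AF
y_pred = ["N"] * 90 + ["N"] * 8 + ["AF"] * 2  # misses most AF cases

print(accuracy_score(y_true, y_pred))             # 0.92, looks good
print(f1_score(y_true, y_pred, average="micro"))  # 0.92 as well
print(f1_score(y_true, y_pred, average="macro"))  # ~0.65, much lower
```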
Consequently, the research paper in Nature by the Stanford ML group is surprising [1]. It says the frequency-weighted average F1 of their trained model is superior to that of the average cardiologist who is trained in ECG classification and earns money from it. The Stanford group presents a data analysis in the paper to back this up. Really? How? Did they use a ton of data? Where did they get it?
They did not change the CNN model from the one they published in 2017. ECG data are very subtle, so if we want to distinguish such tiny differences in features, we need a very deep network. The research history of CNNs reads as the story of how to make networks deeper while avoiding the troubles, such as overfitting and vanishing gradients, that can occur in deep networks. The Stanford group found a specific model two years ago, with a delicate construction of convolution filters, ResNet blocks, dropout, and hyperparameters. If they have not changed the model in two years, I do not want to spend much time optimizing it further. Thus, I will just use their graph model.
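As a rough sketch of the kind of building block such a network stacks, here is a 1-D residual block in Keras. This is in the spirit of the architecture, not their exact model; the filter count, kernel size, and dropout rate are placeholders, not the published hyperparameters:

```python
# Minimal sketch of a 1-D residual block: the skip connection lets
# gradients flow through very deep stacks, fighting vanishing gradients.
from tensorflow.keras import layers

def residual_block(x, filters=64, kernel_size=16, dropout=0.2):
    # Assumes x already has `filters` channels so the shapes match.
    shortcut = x
    y = layers.BatchNormalization()(x)
    y = layers.Activation("relu")(y)
    y = layers.Conv1D(filters, kernel_size, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Dropout(dropout)(y)
    y = layers.Conv1D(filters, kernel_size, padding="same")(y)
    return layers.Add()([shortcut, y])   # skip connection
```

Stacking a few dozen blocks like this, with occasional subsampling, is what makes the network deep enough to pick up the subtle ECG features mentioned above.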
After going through their IPython notebook for the data analysis, I realized they used about 80k records (probably of signal length around 3k samples). The total array size of that should be about 5 times bigger than the MNIST training data. When you read an image of a handwritten digit, you do not need high resolution; an MNIST image is only 28 x 28 pixels. Besides, most of the heartbeat data should still be normal sinus signals (from people who are not sick), so I still think the dataset is not that big for the quality we want.

Moreover, they used a test set of only 550 records. An 80k training set for only 550 test records? And iRhythm Technologies has released only 1/250 of the data used for the research. Did they slice the 328 released records into 80k? Then each example would hold only a dozen or so signals, which does not make sense. Maybe they used data augmentation by adding a bit of random noise? I do not know the details. I just think they have not released the whole dataset yet. As described above, if everyone already knows the model, then the mined dataset is the most crucial asset, and this research has great financial potential, so they could not open the whole dataset.
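If they did augment with noise, it might look something like the sketch below. This is pure speculation on my part, not their documented pipeline, and the sigma value is a guess:

```python
# Speculative sketch of noise-based augmentation: jitter each ECG slice
# with small Gaussian noise so one record yields many training examples.
import numpy as np

def augment_with_noise(signal, n_copies=10, sigma=0.01, rng=None):
    rng = rng or np.random.default_rng()
    scale = sigma * np.std(signal)        # noise scaled to the signal
    return [signal + rng.normal(0.0, scale, signal.shape)
            for _ in range(n_copies)]

ecg = np.sin(np.linspace(0, 20 * np.pi, 3000))  # stand-in for a real record
augmented = augment_with_noise(ecg)
print(len(augmented), augmented[0].shape)       # 10 copies, same length
```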
Thus, I will stop trying to train my model to metric scores as good as the Stanford group's. After cleaning the data, our metric scores are still quite good. Many companies have now released, or are preparing, wearable devices for analyzing clients' heartbeats, such as the Apple Watch Series 4 and its ECG app. We should see much more precise trained models soon, built on that data.
I will introduce the Python (Keras) code next time.