[MUSIC PLAYING]

BRETT KUPREL: I'm Brett Kuprel. I'd like to tell you about some of the work we're doing at Stanford on skin cancer image classification. This project has been a collaboration between the Artificial Intelligence Lab and the medical school.

So, let me warm up with some facts that motivate the threat of skin cancer. It's the most common cancer in the United States. One in five Americans will develop skin cancer at some point in their lifetime. And in 2017, it's estimated that there will be 87,000 new cases of melanoma, which is the deadliest form of skin cancer, and almost 10,000 deaths from it. Fortunately, there is good news, there's hope. The survival rate for melanoma is 98% if you can detect it early. Also, by 2020, it's estimated that there will be 6.1 billion smartphones in circulation globally.

This collaboration began a couple of years ago, when a dermatologist at Stanford saw the recent breakthroughs in computer vision and emailed the director of the Artificial Intelligence Lab. He said, if your program can differentiate between hundreds of dog breeds, I believe it can make a great contribution to dermatology.

So, the first step to making a contribution is acquiring a data set. We have acquired a data set of almost 130,000 images spread across more than 2,000 diseases. We worked with the medical school to clean it up and organize it into a taxonomy, a subset of which is shown here. In this taxonomy, you can see green nodes and red nodes. Green is safe, red is dangerous, black is deadly. There are also some orange nodes, which could go either way, so it's not cleanly a binary task.

On the next couple of slides I'll show some benign and malignant lesions. Here are some malignant lesions. Here are some benign lesions. Flipping back and forth, you can kind of see a visual distinction. But there are also lesions that look very similar between the two, some of which I've highlighted in green.

OK, so now we have a data set; next up is training a classifier on it. We find that training on finer classes results in better performance. Consider the figure shown: we actually train on even finer classes, around 700 of them, but imagine we trained on the green nodes. At inference time, we sum the probabilities up the tree to the red nodes, the classes of interest. And then further, if we're interested in the binary task of distinguishing malignant melanoma from its benign look-alike, melanocytic benign lesions, we renormalize those two probabilities to sum to 1. This is consistent with Bayes' rule, conditioning on the information that the disease is one of those two things, not one of all nine red nodes.
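As a concrete illustration of that inference step (this is a minimal sketch, not the actual pipeline from the talk; the class names and parent mapping are hypothetical stand-ins for the real taxonomy):

```python
# Hypothetical mapping from fine (training) classes to the coarse
# (inference) classes of interest -- stand-ins for the real taxonomy.
fine_to_coarse = {
    "melanoma-superficial": "melanoma",
    "melanoma-nodular": "melanoma",
    "blue-nevus": "melanocytic-benign",
    "halo-nevus": "melanocytic-benign",
    "dermatofibroma": "other-benign",
}

def coarse_probs(fine_probs):
    """Sum fine-class probabilities up the tree to their parent nodes."""
    coarse = {}
    for fine, p in fine_probs.items():
        parent = fine_to_coarse[fine]
        coarse[parent] = coarse.get(parent, 0.0) + p
    return coarse

def binary_prob(coarse, a="melanoma", b="melanocytic-benign"):
    """Renormalize two coarse classes to sum to 1: Bayes' rule,
    conditioned on the disease being one of these two."""
    return coarse[a] / (coarse[a] + coarse[b])

# Example: softmax outputs over the fine classes.
fine_probs = {"melanoma-superficial": 0.30, "melanoma-nodular": 0.15,
              "blue-nevus": 0.25, "halo-nevus": 0.20,
              "dermatofibroma": 0.10}
coarse = coarse_probs(fine_probs)  # melanoma: 0.45, melanocytic-benign: 0.45
print(binary_prob(coarse))         # 0.5 after renormalizing away other-benign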
We found that we got the best performance by fine-tuning an ImageNet-pretrained model. We tried a few different architectures: AlexNet, VGG, and Inception versions one and three. We found Inception version three worked the best. We also tried adding a spatial transformer network in front, because many of our images show lesions photographed from far away. We didn't find improved performance, so following Occam's razor, we just didn't use it.

So, the next step is evaluation. We used two metrics to compare with dermatologists: sensitivity, which is known as the true positive rate, and specificity, known as the true negative rate. We show the dermatologists a set of images of benign and malignant lesions. We can then calculate their sensitivity as the percent of the malignant lesions they were shown that they ordered a biopsy for. Similarly, their specificity is the percent of the benign lesions they were shown that they did not order a biopsy for. And if they order a biopsy, the sample goes to a pathologist, which results in a near-perfect diagnosis.

The classifier, in contrast, outputs a malignant probability. So, imagine we fed the same images through and got the probabilities shown. The sensitivity would be the percent of malignant lesions that fall to the right of some chosen threshold. Similarly, the specificity would be the percent of benign lesions that lie to the left of the threshold. At this particular threshold, it would be a sensitive classifier, because almost all malignant lesions lie to the right of the threshold. And we can vary this threshold to get a whole range of sensitivity and specificity pairs.
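Here's a minimal NumPy sketch of that threshold sweep; the probabilities and labels below are made up for illustration:

```python
import numpy as np

def sens_spec_curve(probs, labels, thresholds):
    """Sensitivity/specificity at each threshold.

    probs:  predicted malignant probability per lesion
    labels: 1 for malignant, 0 for benign (ground truth)
    """
    probs, labels = np.asarray(probs), np.asarray(labels)
    pairs = []
    for t in thresholds:
        predicted_malignant = probs >= t
        # Sensitivity: fraction of malignant lesions right of the threshold.
        sensitivity = predicted_malignant[labels == 1].mean()
        # Specificity: fraction of benign lesions left of the threshold.
        specificity = (~predicted_malignant)[labels == 0].mean()
        pairs.append((t, sensitivity, specificity))
    return pairs

# Toy example with made-up malignant probabilities.
probs  = [0.05, 0.20, 0.35, 0.60, 0.80, 0.95]
labels = [0,    0,    1,    0,    1,    1]
for t, sens, spec in sens_spec_curve(probs, labels, np.linspace(0.1, 0.9, 5)):
    print(f"threshold={t:.1f}  sensitivity={sens:.2f}  specificity={spec:.2f}")
```

Each threshold gives one operating point; sweeping it traces out the whole sensitivity-specificity curve.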
Doing that, we get these results on three different tasks. The first task is the most prevalent type of skin cancer. The second task is the most dangerous type of skin cancer versus its look-alike. And the third task is the same as the second, except it uses dermoscopy images, which are taken with a dermoscope, a special device that shines polarized light at the lesion to expose underlying layers of tissue. You might think that dermoscopy is harder, but no, it's just a completely different set of lesions. And we also see that the dermatologists don't do as well on that data set.

So, let's see how the dermatologists do. This is the most important slide. For one, we can see the curve is more jagged than the previous ones, and that's because this is a subset of our test set. Dermatologists have important things to do; they can't just classify thousands of images, whereas we can do it really fast. Another thing you'll notice is that dermatologists actually vary widely in their performance. From these plots, we conclude that we have achieved dermatologist-level performance at classifying skin cancer from lesion images.

So, here are some confusion matrices on a nine-way classification task. It's interesting to look at these to see the similarity. One thing you might notice is that we often mispredict the inflammatory category; it's such a broad category. Another thing you can notice is that dermatologists err on the side of guessing that a benign lesion is malignant rather than that a malignant lesion is benign, because the latter would be a deadly mistake. And you can see that from this box.

OK, so I brought a demo of this classifier. So this one-- I can't actually read the text, but-- yeah, you can see it's malignant pigmented. Let's do a couple of these. This one's epidermal malignant. There, that's probably good. Don't want to embarrass myself.

So, I just want to say I'm honored at how well received our work has been. It's remarkable that Andre and I didn't know anything about dermatology going into this, and I think that's kind of the spirit of deep learning. You can just get a large enough data set, feed it through Google's latest image classifier, fine-tune it, let them do all the hard work, take the credit. And it's just a really amazing time to be an AI researcher, as evidenced by three covers of "Nature" dedicated to breakthroughs in artificial intelligence, two of them here at Google.

I just wanted to comment a little bit on the future of AI applied to dermatology, and I think this could also apply to other skin diseases. There will be increased access, because a lot of people who don't have access to a dermatologist do have access to a smartphone. It will also be more convenient to classify your lesion, and because it's more convenient, it will hopefully lead to earlier detection. And the survival rate is much better when you detect it early. Thank you.