
  • ♪ (music) ♪

  • Alright.

  • So yes, I'm Patrick. I'm a Solutions Strategist from Coca-Cola.

  • Today I'm going to share with you how we're using TensorFlow

  • to support some of our largest, most popular

  • digital marketing programs in North America.

  • So, we're going to take the TensorFlow Dev Summit

  • off to a marketing tangent for a minute,

  • before we come back.

  • Alright, so as a background:

  • What is proof of purchase and what is its relationship to marketing?

  • As an example, back in the day,

  • folks could clip the barcodes off their cereal boxes

  • and then mail these barcodes back into the cereal company

  • to receive a reward--

  • some kind of coupon or prize back through the mail.

  • And this is basic loyalty marketing.

  • Brands--in this case, the cereal company--

  • rewarding consumers who purchase,

  • and at the same time

  • opening up a line of communication between the brand and the consumer.

  • Over the last 15-odd years of marketing digitization,

  • this concept has evolved into digital engagement marketing--

  • rewarding consumers in the moment, in real time,

  • through web and mobile channels.

  • But often, proof of purchase is still an important component of that experience.

  • We have a very active digital engagement marketing program at Coca-Cola.

  • Through proof of purchase

  • our consumers can earn a magazine subscription,

  • or the chance to win a cruise,

  • or a vintage vending machine.

  • And this is what proof of purchase looks like at Coke.

  • Underneath our bottle caps and inside those cardboard fridge packs

  • that you can buy at the grocery store

  • we've printed these 14-character product pincodes.

  • These are unique to every product

  • and these are what our consumers enter into our promotions.

  • You can enter these in by hand,

  • but on your mobile device, you can scan them.

  • This had been the holy grail of marketing IT at Coke for a long time.

  • We looked at both commercial

  • and open source optical character recognition software, OCR,

  • but it could never read these codes very well.

  • The problem has to do with the fidelity of the code.

  • So, these are 4 x 7 dot matrix printed.

  • The printer head is about an inch off the surface

  • of the cap and fridge pack,

  • and these things are flying underneath that printer head at a very rapid rate.

  • So this creates a lot of visual artifacts--

  • things like character skew and pixel drift--

  • things that normal OCR can't handle very well.

  • We knew that if we wanted to unlock this experience for our consumers,

  • we were going to have to build something from scratch.

  • When I look at these codes,

  • a couple of characteristics jump out at me.

  • We're using a small alphabet-- let's say, ten characters--

  • and there's a decent amount of variability in the presentation of these characters.

  • This reminds me of MNIST--

  • the online database of 60,000 handwritten digit images.

  • And Convolutional Neural Networks, or ConvNets,

  • are particularly good at extracting text from these images.

  • I'll probably tell you all something you already know,

  • but here we go.

  • ConvNets work by taking an image and initially breaking it down

  • into many smaller pieces,

  • and then detecting very granular features within these pieces--

  • things like edges and textures and colors.

  • And these very granular feature activations are pooled up

  • into a more general feature layer,

  • and that's filtered, and those feature activations are pooled up, and so on,

  • until the output of the neural net

  • is run through a softmax function,

  • which creates a probability distribution

  • of the likelihood that a set of objects exists within the image.

  • But ConvNets have a really nice property

  • in that they handle translation invariance

  • within images very well.

  • That means, from our perspective,

  • they can handle the tilt and twist of a bottle cap held in someone's hand.

  • It's perfect.
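A rough sketch of the kind of network described above, for readers who want to see it in code. This is not Coca-Cola's actual architecture; the input size, filter counts, and ten-character alphabet are illustrative assumptions.

```python
# Minimal sketch (assumptions noted above): a small ConvNet that reads a
# fixed-size cap crop and emits a softmax distribution over the alphabet
# for each of the 14 code positions.
import tensorflow as tf

NUM_POSITIONS = 14   # characters per pincode (from the talk)
ALPHABET_SIZE = 10   # illustrative small alphabet

inputs = tf.keras.Input(shape=(64, 256, 1))                    # grayscale crop (assumed size)
x = tf.keras.layers.Conv2D(32, 3, activation="relu")(inputs)   # granular features: edges, textures
x = tf.keras.layers.MaxPooling2D()(x)                          # pool activations into a coarser layer
x = tf.keras.layers.Conv2D(64, 3, activation="relu")(x)        # more general features
x = tf.keras.layers.MaxPooling2D()(x)
x = tf.keras.layers.Flatten()(x)
x = tf.keras.layers.Dense(NUM_POSITIONS * ALPHABET_SIZE)(x)
x = tf.keras.layers.Reshape((NUM_POSITIONS, ALPHABET_SIZE))(x)
outputs = tf.keras.layers.Softmax(axis=-1)(x)                  # probability distribution per position
model = tf.keras.Model(inputs, outputs)
```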

  • So, this is what we're going to use, we're going to move forward.

  • So, now we need to build our platform, and that begins with training--

  • the beating heart of any applied AI solution.

  • And we knew that we needed high-quality images

  • with accurate labels of the codes within those images,

  • and we likely needed a lot of them.

  • We started by generating a synthetic data set

  • of randomly generated strings

  • that were superimposed over blank bottle cap images,

  • which were then, in turn, superimposed over random backgrounds.

  • This gave us a base for transfer learning in the future,

  • once we created our real-world data set.
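The synthetic-data step might look something like the following sketch. The cap image, background list, and dot-matrix font are placeholders; the talk does not describe the real pipeline at this level of detail.

```python
# Sketch of the synthetic-data idea: a random code string drawn onto a
# blank cap image, which is then pasted at a random offset over a random
# background. All paths and the font are hypothetical placeholders.
import random
from PIL import Image, ImageDraw, ImageFont

ALPHABET = "ABCDEFGHJK"                          # illustrative 10-character alphabet
font = ImageFont.truetype("dotmatrix.ttf", 24)   # placeholder dot-matrix font

def synth_example(cap_path, background_paths):
    code = "".join(random.choice(ALPHABET) for _ in range(14))
    cap = Image.open(cap_path).convert("RGBA")
    ImageDraw.Draw(cap).text((20, 40), code, font=font, fill="black")
    bg = Image.open(random.choice(background_paths)).convert("RGBA")
    # Assumes the background is larger than the cap image.
    x = random.randint(0, bg.width - cap.width)
    y = random.randint(0, bg.height - cap.height)
    bg.paste(cap, (x, y), cap)                   # cap's alpha channel as the paste mask
    return bg.convert("RGB"), code               # image plus its known-accurate label
```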

  • And we did that by doing a production run of caps and fridge packs

  • out of printing facilities,

  • and then distributing those to multiple third-party suppliers,

  • along with some custom tools that we created

  • to allow them to scan a cap and then label it with a pincode.

  • But a really important component to this process

  • was an existing pincode validation service

  • that we've had in production for a long time to support our programs.

  • So, any time a trainer labeled an image,

  • we'd send that label through our validation service,

  • and if it was a valid pin code,

  • we knew we had an accurate label.
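In code, that label check is conceptually a one-liner against the service. The URL and response shape below are hypothetical; the real validation service is internal to Coke.

```python
# Sketch of the label-validation gate: keep a trainer's label only if the
# existing pincode service confirms it is a real code.
import requests

VALIDATION_URL = "https://example.com/pincode/validate"   # hypothetical endpoint

def label_is_valid(pincode: str) -> bool:
    resp = requests.post(VALIDATION_URL, json={"pincode": pincode}, timeout=5)
    return resp.ok and resp.json().get("valid", False)

# Usage: labeled = [(img, code) for img, code in raw_labels if label_is_valid(code)]
```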

  • So, this gets our model trained, and now we need to release it to the wild.

  • We had some pretty aggressive performance requirements.

  • We wanted one second average processing time,

  • we wanted 95% accuracy at launch,

  • but we also wanted to host the model remotely for the Web,

  • and embed it natively on mobile devices to support mobile apps.

  • So, this means that our model has to be small--

  • small enough to support over-the-air updates

  • as the model improves over time.
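The talk does not say which on-device format was used, but one common way to get a Keras model small enough for over-the-air updates is a TensorFlow Lite conversion with weight quantization, sketched here as an assumption:

```python
# Assumed approach, not confirmed by the talk: convert the trained model
# to TensorFlow Lite and let the default optimizations quantize weights.
import tensorflow as tf

model = tf.keras.models.load_model("pincode_model.keras")   # placeholder path
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]        # shrinks weights via quantization
with open("pincode_model.tflite", "wb") as f:
    f.write(converter.convert())
```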

  • And to help us improve that model over time

  • we created an active learning UI (user interface)

  • that allows our consumers to train the model

  • once it's in production.

  • And that's what this looks like.

  • So, if I, as a consumer, scan a cap,

  • and the model cannot infer a valid pincode,

  • it sends down to the UI a per-character confidence

  • of every character at every position.

  • And this can be used to render a screen

  • much like what you see here.

  • So I, as a user,

  • am only directed to address those particularly low-confidence characters.

  • I see a couple of red characters there-- I tap them, it brings up a keyboard,

  • I correct them, then I'm entered into my promotion.

  • It's a good user experience for me.

  • I scan a code and I'm only a few taps away from being entered into a promotion,

  • but on the back end, we now have extremely valuable data for training,

  • because we have the image that created the invalid inference to begin with,

  • as well as the user-corrected label

  • that they needed to provide to get into the promotion.

  • So, we can throw this into the hopper

  • for future rounds of training to improve the model.
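The UI logic the speaker describes can be sketched as a small function over the model's per-position confidences. The 0.5 threshold and the response shape are illustrative assumptions.

```python
# Sketch of the active-learning screen: flag only low-confidence positions
# for the user to correct; everything else is pre-filled.
import numpy as np

def chars_to_correct(probs: np.ndarray, alphabet: str, threshold: float = 0.5):
    """probs: (14, alphabet_size) softmax output for one scanned code."""
    best = probs.argmax(axis=-1)          # most likely character per position
    confidence = probs.max(axis=-1)       # its probability
    return [
        {"position": i,
         "char": alphabet[best[i]],
         "needs_fix": bool(confidence[i] < threshold)}   # rendered red in the UI
        for i in range(probs.shape[0])
    ]
```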

  • When you put it all together this is what it looks like.

  • The user takes a picture of a cap,

  • the region of interest is found, the image is normalized.

  • It's then sent into our ConvNet model,

  • the output of which is a character probability matrix.

  • This is the per-character confidence of every character at every position.

  • That is then further analyzed to create a top-ten prediction.

  • Each one of those predictions is vetted against our pincode validation service.

  • The first one that is valid-- which is often the first one on the list--

  • is entered into the promotion,

  • and if none of them are valid,

  • our user sees the active learning experience.
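The talk does not spell out how the top-ten list is produced; a simple beam search over the character probability matrix is one plausible reading, sketched below (ALPHABET and label_is_valid are the hypothetical helpers from the earlier sketches).

```python
# Sketch: beam search over the (14 x alphabet) probability matrix to get
# the k most likely code strings, then validate each in order.
import numpy as np

def top_k_codes(probs: np.ndarray, alphabet: str, k: int = 10):
    beams = [("", 0.0)]                       # (prefix, cumulative log-probability)
    for row in probs:                         # one row per code position
        logp = np.log(row + 1e-12)
        beams = sorted(
            ((prefix + alphabet[c], score + logp[c])
             for prefix, score in beams
             for c in range(len(alphabet))),
            key=lambda b: b[1],
            reverse=True,
        )[:k]                                 # keep only the k best prefixes
    return [code for code, _ in beams]

# Usage: first_valid = next((c for c in top_k_codes(probs, ALPHABET)
#                            if label_is_valid(c)), None)
```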

  • So, our model development effort went through three major iterations.

  • Initially, in an effort to keep the model size small upfront,

  • our data science team used binarization to normalize the image.

  • But this was too lossy.

  • It didn't preserve enough information to create an accurate model.

  • So, they switched to best channel conversion,

  • which got the accuracy up,

  • but then the model size grew too large to support over-the-air updates.
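"Best channel conversion" is not a standard library call; one plausible interpretation, used in this sketch, is keeping the single color channel with the most contrast instead of thresholding the image to pure black and white.

```python
# Illustrative contrast between the two normalization approaches mentioned
# above. Binarization keeps 1 bit per pixel; best-channel keeps a full
# 8-bit channel, so far more information survives for the model.
import numpy as np

def binarize(img: np.ndarray, thresh: int = 128) -> np.ndarray:
    gray = img.mean(axis=-1)
    return (gray > thresh).astype(np.uint8) * 255        # lossy: black or white only

def best_channel(img: np.ndarray) -> np.ndarray:
    variances = [img[..., c].var() for c in range(img.shape[-1])]
    return img[..., int(np.argmax(variances))]           # highest-contrast channel, intact
```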

  • So, at this point, our team starts over. (chuckles)

  • They just completely re-architect the ConvNet using SqueezeNet,

  • which is designed to reduce model size

  • by reducing the number of learnable parameters within the model.

  • But, after making this move, we had a problem.

  • We started to experience internal covariate shift,

  • which is the result of reducing the number of learnable parameters.

  • And that means that very small changes to upstream parameter values

  • cascade into huge gyrations in downstream parameter values.

  • So, this slowed our training process considerably,

  • because we had to grind through this covariate shift

  • in order to get the model to converge,

  • if it would converge at all.

  • So, to solve this problem,

  • our team introduced batch normalization,

  • which sped up training, it got the model to converge,

  • and now we're exactly where we want to be.
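A SqueezeNet-style "fire" module with batch normalization added, the combination the talk lands on, can be sketched like this. Filter counts and layer ordering are illustrative, not the production model's.

```python
# Sketch of a fire module: 1x1 "squeeze" convs cut the channel count
# (fewer learnable parameters), parallel 1x1/3x3 "expand" convs restore
# capacity, and batch normalization keeps activations stable so training
# converges despite the smaller parameter budget.
import tensorflow as tf

def fire_module(x, squeeze_filters=16, expand_filters=64):
    s = tf.keras.layers.Conv2D(squeeze_filters, 1, activation="relu")(x)
    s = tf.keras.layers.BatchNormalization()(s)
    e1 = tf.keras.layers.Conv2D(expand_filters, 1, activation="relu")(s)
    e3 = tf.keras.layers.Conv2D(expand_filters, 3, padding="same", activation="relu")(s)
    out = tf.keras.layers.Concatenate()([e1, e3])
    return tf.keras.layers.BatchNormalization()(out)
```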

  • We have a 5MB model,

  • it's a 25-fold decrease from where we started,

  • with accuracy greater than 95%.

  • And the results are impressive.

  • These are some screen grabs from a test site that I built,

  • and you can see across the top row

  • how the model handles different types of occlusion.

  • It also handles translation-- tilting the cap,

  • rotation-- twisting the cap,

  • and camera focus issues.

  • So, you can try this out for yourself.

  • I'm going to pitch the newly-launched Coca-Cola USA app.

  • It hit Android and iPhone app stores a couple of days ago.

  • It does many things, but you can use it to scan a code.

  • You can also go online with your mobile browser

  • to coke.com/rewards,

  • and take a picture of a code to be entered into a promotion.

  • Alright, so some quick shout-outs-- I can't not mention these folks.

  • Quantiphi is the data science team that built our image processing pipeline

  • and the pincode recognition model.

  • Digital Platforms & Innovation at Coke, led by Ellen Duncan.

  • She spearheaded this from the marketing side.

  • And then, my people in IT.

  • My colleague, Andy Donaldson, shepherded this into production.

  • So, thank you.

  • It's been a privilege to speak with you.

  • I covered a lot of ground in ten short minutes.

  • There's a lot of stuff I didn't talk about.

  • So, if you have any questions or any follow-up,

  • please feel free to reach out to me on Twitter @patrickbrandt.

  • You can also hit me up on LinkedIn.

  • That short URL, wpb.is/linkedin, will get you to my profile page.

  • You can also read an article I published last year on this solution.

  • It's on the Google Developers blog

  • and you can get there at wpb.is/tensorflow.

  • Alright. So, thank you.

  • Next up is Alex.

  • And Alex is going to talk to us about Applied ML in Robotics.

  • (applause)

  • ♪ (music) ♪

