Also, a deep neural network can take actions that it's never seen before. So with Q-learning, if a certain scenario presented itself and it was outside of any of the discrete combinations that we've ever seen, well, it's gonna take a random action that got initialized randomly. A deep network, on the other hand, can generalize.
We're gonna add an activation, and the activation here will be rectified linear (ReLU), and then, uh, model.add(MaxPooling2D). We'll use a two-by-two window again. If you don't know what max pooling is or convolutions, uh, check out that basics tutorial, because I cover it, and I also have beautifully drawn photos. If you really liked my other photos, you'll love those. Uh, then after the max pooling, we're just gonna model.add, and we're gonna add a dropout layer, and we'll drop out 20%, and then we're just gonna do the same thing again. And then we'll say model.add, we'll throw in a Dense(64) layer, and then finally model.add a dense layer, and it'll be env.ACTION_SPACE_SIZE, and the activation will be linear. And then model.compile: we'll say the loss is 'mse', for mean squared error; the optimizer will be the Adam optimizer with a learning rate of 0.1; uh, and then for metrics we will track accuracy.
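Putting those pieces together, a sketch of the model in tf.keras might look like the following. The Conv2D filter count (256), the (10, 10, 3) input shape, and ACTION_SPACE_SIZE = 9 are assumptions; this section of the walkthrough doesn't state them.

```python
# Sketch of the model described above (tf.keras).
# Assumptions: 256-filter Conv2D layers, a (10, 10, 3) input,
# and ACTION_SPACE_SIZE = 9; none of these are given in this section.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, MaxPooling2D, Dropout,
                                     Flatten, Dense, Activation)
from tensorflow.keras.optimizers import Adam

ACTION_SPACE_SIZE = 9  # assumed; would come from env.ACTION_SPACE_SIZE

model = Sequential()
model.add(Conv2D(256, (3, 3), input_shape=(10, 10, 3)))
model.add(Activation('relu'))                 # rectified linear activation
model.add(MaxPooling2D(pool_size=(2, 2)))     # two-by-two window
model.add(Dropout(0.2))                       # drop out 20%

model.add(Conv2D(256, (3, 3)))                # "do the same thing again"
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.2))

model.add(Flatten())
model.add(Dense(64))
model.add(Dense(ACTION_SPACE_SIZE, activation='linear'))

# Loss is mean squared error, optimizer is Adam, tracking accuracy,
# matching the compile call described above.
model.compile(loss='mse',
              optimizer=Adam(learning_rate=0.1),
              metrics=['accuracy'])
```

The linear activation on the output layer matters here: the network is regressing Q-values, which can be any real number, not class probabilities.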
Okay, so that is our model. Again, there will be a link in the description to the sample code, so if you missed anything, you can check that out.
Okay, so, um, and then Adam: we don't actually have Adam imported, so let's go ahead and grab that as well.
But the problem is, we're doing a .predict for every single step this agent takes, and what we wanna have is some sort of consistency in those .predicts that we're doing, because besides doing a .predict every single step, we're also doing a .fit every single step. And then we have self.target_model: this is the one that we do a .predict against every step. And then the other one, the main model, is the one that gets trained every step. Make note of that, because you'll forget.
So, um, so then what happens is, after every some number of steps or episodes or whatever, you'll re-update your target model. So you'll just set its weights to be equal to the main model's weights again. So this just keeps a little bit of sanity in your predictions, right? This is how you'll have a little bit of stability and consistency in the predictions that you're making, so your model can actually learn something, because otherwise there's just going to be so much randomness.
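The target-model pattern just described can be sketched without any framework at all. In the real agent these would be two Keras models synced with get_weights()/set_weights(); here plain lists stand in so the update logic is easy to see, and the UPDATE_TARGET_EVERY interval of 5 episodes is an assumption ("after every some number of steps or episodes").

```python
# Framework-free sketch of the target-network idea described above.
# Assumption: weights sync every UPDATE_TARGET_EVERY episodes; the real
# agent would call target_model.set_weights(model.get_weights()).
UPDATE_TARGET_EVERY = 5  # assumed sync interval, in episodes

class Agent:
    def __init__(self):
        self.model_weights = [0.0]                       # trained every step
        self.target_weights = list(self.model_weights)   # predicted against; held fixed
        self.target_update_counter = 0

    def train_step(self, end_of_episode):
        self.model_weights[0] += 0.1                     # stand-in for a .fit call
        if end_of_episode:
            self.target_update_counter += 1
        if self.target_update_counter >= UPDATE_TARGET_EVERY:
            # Re-update the target model: copy the main model's weights over.
            self.target_weights = list(self.model_weights)
            self.target_update_counter = 0
```

Between syncs the two sets of weights drift apart, which is exactly the point: the .predict targets stay stable for a while instead of chasing every single .fit.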
Next, we're gonna have self.replay_memory, and that is going to be a deque. A deque... I always forget how to pronounce that. To use that, we're gonna say from collections import deque. Um, and if you don't know, a deque is a set-length... think of it as, like, an array or a list where you can say: I want this list to have a max size. So we're gonna say maxlen equals REPLAY_MEMORY_SIZE.
Let's go ahead and just set that real quick. We'll just say, boom, and we'll set this to, um, 50,000. Also, a cool trick I recently learned was you can use underscores in place of commas, so Python still sees this as 50,000, but it's a little more human-readable, since the plain number is a little harder to read. And especially once you have, like, a much bigger number, the underscore suddenly becomes very useful.
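The replay memory as just described, a fixed-length deque with the underscore number literal, looks like this. Appending plain step numbers instead of real transition tuples is just for illustration.

```python
# Replay memory as a fixed-length deque: once it's full, appending on
# the right silently drops the oldest item from the left.
from collections import deque

REPLAY_MEMORY_SIZE = 50_000  # underscores: Python reads this as 50000

replay_memory = deque(maxlen=REPLAY_MEMORY_SIZE)
for step in range(60_000):
    # A real agent would append (state, action, reward, new_state, done).
    replay_memory.append(step)

print(len(replay_memory))  # capped at 50000
print(replay_memory[0])    # oldest surviving entry is 10000
```

This is why a deque beats a plain list here: there's no manual trimming, and old experiences age out automatically.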
I think right now, like I said, I could not find a tutorial, uh, or information, to actually do this kind of stuff that actually made sense or actually explained everything.