Thismotionisactuallyindependentofwhereweplacetheglassandthepicture, sowecanhave a muchmoreadaptivesystemthatdoesnotneedtohaveeverythingspecifiedprecisely.
And I supposetheinteractionbetweendevicesisreallyimportantforreal-worldpurposeslikeself-drivingcars, thattheybeabletoseeeachotherandunderstandhowtointeractwithwhat's aroundthem.
It's neverseen a redbagor a trafficconeinitsdataset, butcombiningitsunderstandingoflanguageandvision, itisabletoreason, soshouldbeabletounderstandandcarryoutthiscommandandsomanymore.