Hi, I'm Adriene Hill, and welcome back to Crash Course Statistics. In the last episode we talked about some areas in which we still struggle to make consistently accurate predictions. But there are also many areas in which we have done really well. Companies have steadily improved their use of both customer and outside data to make sure they have the right items in stock. And our ability to predict the weather has improved a ton since the days when people believed that deities used weather to punish us. Statistics has also transformed sports, from football fans who use state-of-the-art analytics to come out on top of their fantasy football leagues, to soccer, where players shooting penalty kicks have figured out where to aim the ball for the highest chance of scoring. Baseball even has a name for its analytic field: Sabermetrics. Pretty much everything we've done in this series, from data visualization, to chi-square tests, to Machine Learning and Bayesian hypothesis testing, has led up to this last episode. Whether we're doing inferential tests or creating predictive models, we want to make informed decisions, from which medication to take to which colleges to apply to. And statistics allows us to use inference and prediction to make those decisions. [INTRO] Let's start with how prediction helps companies and their customers. Walmart has accumulated data on customer demand for different items. And their team discovered some surprising trends, like the fact that wind conditions may have an impact on whether or not customers want to eat berries. They found that people like to eat berries when it's cooler than 80°F (26.7°C) and there's very little wind. So they advertise berries more at times like that, when demand is high. They also know that if it's warm and not raining, people are likely to buy steaks. If it gets hot, over 90°F (32.2°C), people buy hamburger.
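As a quick check on the two temperature thresholds quoted above, here is a minimal Python sketch of the standard Fahrenheit-to-Celsius conversion. The function name is an illustrative choice, not something from the episode or from Walmart.

```python
def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


# The two demand thresholds mentioned in the episode (berries, hamburger).
for temp_f in (80, 90):
    temp_c = fahrenheit_to_celsius(temp_f)
    print(f"{temp_f} F is about {temp_c:.1f} C")
```

Rounded to one decimal place, the results agree with the Celsius figures given in the episode.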
Big and small stores alike want to predict exactly when people will want to buy things. If they get it right, they save money by not having unwanted merchandise taking up warehouse space, and they make money by selling stuff. They also won't LOSE money because they didn't have enough stock of a popular item. And customers are happy if there are NY strip steaks available when they want to eat them. One company that has shared a bit about its algorithms is StitchFix. It's a style subscription service that sends you clothes to try on and potentially buy. StitchFix uses data and statistics to make sure that they choose clothes you're more likely to wanna keep. And their model has a lot of moving parts. It uses algorithms not just to stock its warehouse or match me with a blouse, but also to help DESIGN clothes. Each dress or pair of pants has a set of attributes. Gold. Lamé. Flared. StitchFix also has data on what subscribers like. Gold. Lamé. To create new styles, they recombine the attributes of existing styles and alter them slightly. Then they bring in the human designers to help out. At least for now. Alright, gold lamé pants probably aren't the best example of successful use of statistics and algorithms, but the success of statistics and analytics in baseball will not come as a surprise to anyone who has seen or read “Moneyball”. Stats like batting average (the number of hits divided by the number of times at bat) have been around for a long time. But many of these simpler stats were missing a lot of information about what really makes a good baseball player. In Moneyball, Michael Lewis writes about Bill James, the father of sabermetrics, who believed “The statistics were not merely inadequate; they lied.
And the lies they told led the people who ran major league baseball teams to misjudge their players, and mismanage their games.” So in 2001, when the Oakland A's lost 3 of their best players and found themselves without the funds to replace them, general manager Billy Beane decided to use statistics to find the best players for the team. Beane and his assistant, the stats-savvy Paul DePodesta, looked at how adding individual players to the team could increase the probability of winning games. They calculated more complicated statistics, such as how many walks players had and their on-base average (a measure of how often a player reaches base, whether from a hit, a walk, or being hit by a pitch). They used data that other teams weren't paying attention to, and as a result, they recruited players that other teams had overlooked. Beane's attention to statistical details paid off. In the 2002 season, the A's won 20 straight games, a record at the time for their league. This spurred on the popularity of Sabermetrics, which is the statistical analysis of players and gameplay in baseball. Sabermetricians use statistics to figure out who to hire, who to trade, and when to pull pitchers from the mound. Major League Baseball teams use high-def cameras and radar to measure pitch release and velocity. They track a baseball's spin rate. They gather data on the angle of the ball when it leaves the bat after it's been hit. And data shows that a ball hit a little higher is more likely to become a hit or home run. So baseball players are now trying to hit the ball higher in the air. According to the Washington Post, the average launch angle went up from 10.5 degrees in 2015 to 11.5 degrees in 2016. Or as Dodger Justin Turner put it: “You can't slug by hitting balls on the ground.
You have to get the ball in the air if you want to slug, and guys who slug stick around, and guys who don't, don't.” Managers sometimes use statistics when they're deciding when and where players should stand on defense. Kind of like when I was at bat as a kid and everyone ran in 5 steps. It was embarrassing. Whatever. Since managers have access to data on every player, they can gauge where a ball hit by an opposing batter is most likely to go. Traditionally, the baseball players stand about here. But managers can move them, based on the past behavior of the batter. If a player has a tendency to hit the ball to the left side of the field, like data from the Cubs' third baseman Kris Bryant showed in 2017 and 2018, managers can move their fielders so that they're more concentrated in that area. This gives the team on defense a better chance of getting the out. And it turns out defenses shift against Bryant specifically over half the time he's at bat! A lot of teams do this. Defensive shifting has gone up 5% in the last year. The Houston Astros and the Kansas City Royals shift more than most. The Astros shifted their defense about 37% of the time in 2018. And the Royals shifted 27% of the time, which meant they shifted 1304 more times than they did in 2017. Sabermetricians aren't the only ones predicting what's going to happen on the field. Meteorologists are using statistics to predict the weather, so they can have that big tarp ready when it rains. I love that big tarp. [tarp-spreading noise] Weather has historically seemed unpredictable to humans. In ancient Greek mythology, Zeus controlled the sky, as well as the thunder, rain, and lightning. But we've come a long way since then. In 1870, President Ulysses S. Grant established the Weather Bureau (now called the National Weather Service) in the United States. At first, forecasts were vague and uncertain, and had very little precision compared to the hour-by-hour forecasts we have today.
They were also limited in their reach, perhaps only forecasting a day or two ahead, compared to today's 10-day forecasts. Over the years, our predictive abilities have improved. According to Nate Silver, “In 1972, the [National Weather Service's] high-temperature forecast missed by an average of six degrees when made three days in advance. Now it's down to three degrees.” Silver also cites the current odds of an American being killed by lightning, 1 in 11 million, compared with 1 in 400,000 in 1940. Some of that not-being-struck-by-lightning can be attributed to better weather prediction. U.S. meteorologists and weather researchers use a combination of Doppler radar, data from satellites orbiting the planet and facing the sun, radiosondes in weather balloons in the upper stratosphere, and regular old weather stations. And then they crunch all that data with NOAA's Weather and Climate Operational Supercomputing System, which is 6 million times faster than your computer or mine. And that allows them to more accurately predict weather events, like rainfall, drought, and hurricane paths. About 25 years ago, hurricane path predictions would be off by about 563 km (350 miles). Now we're off by only about 161 km (100 miles), and scientists will likely keep improving on that. Nate Silver notes in his book “The Signal and the Noise” that the advance notice we had that Hurricane Katrina was going to hit New Orleans likely saved a lot of people. Even though Katrina was still devastating, a few decades ago we may not have known to evacuate as many people as we did. With better weather prediction, we also have more time to get out of the way of tornadoes and flash floods and severe thunderstorms. We can avoid getting stuck in extreme heat or extreme cold. And stay off icy roads. It's important that we have continued improvement on a global scale.
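To put the improvements quoted above side by side, here is a small Python sketch of the arithmetic. It uses only the figures stated in the episode; the helper names and the idea of summarizing each change as a relative reduction are illustrative assumptions, not part of the original.

```python
KM_PER_MILE = 1.609344  # exact definition of the international mile


def miles_to_km(miles: float) -> float:
    """Convert a distance in miles to kilometers."""
    return miles * KM_PER_MILE


def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a quantity shrank going from `before` to `after`."""
    return (before - after) / before


# Hurricane track error: about 350 miles then, about 100 miles now.
print(f"350 miles is about {miles_to_km(350):.0f} km")
print(f"100 miles is about {miles_to_km(100):.0f} km")
print(f"Track error reduced by {relative_reduction(350, 100):.0%}")

# Three-day high-temperature forecast: average miss of 6 degrees, now 3.
print(f"Temperature error reduced by {relative_reduction(6, 3):.0%}")

# Lightning deaths: odds of 1 in 400,000 (1940) vs 1 in 11 million (now).
lightning_risk_ratio = (1 / 400_000) / (1 / 11_000_000)
print(f"1940 lightning risk was about {lightning_risk_ratio:.1f}x today's")
```

The mile-to-kilometer conversions round to the 563 km and 161 km figures given in the episode.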
Being able to predict rainfall and get that data to the right people will be crucial, particularly as temperatures change and the climate shifts. In recent years, climate scientists have been able to more accurately forecast rainfall in sub-Saharan Africa, which impacts food from farms that use rain as a water source. But for weather predictions to be useful to as many people as possible, experts recommend that investments be made in data management systems, satellites, and means to distribute the information to the right people, like rural farmers. The complexity of the weather data can also make it hard to create a “best” model by hand. Some researchers have begun to use Machine Learning to help handle all that data. One team at Chapman University used a Recurrent Neural Network to predict droughts in California. They predicted how severe droughts would be, and their model did pretty well. Weather is an incredibly noisy phenomenon. There are many factors that affect the temperature, humidity, and other weather events. And the more complex a phenomenon is, the more data we need to accurately predict it. As we've discussed before, Neural Networks are often better than humans at figuring out patterns in huge amounts of complex data. Statistics helps us see how the world works, and hints at how the world could work. It helps us see through uncertainty, but doesn't get rid of that uncertainty. It can show us our biases; it can also paper over them. Statistics helps us update our beliefs and come up with new ones. Even if you don't come away from this series remembering what ANOVA stands for, we hope you take away that the world isn't binary, that it's complicated, sometimes requiring complicated solutions. If you don't remember specifics about p-values, take away the importance of reading further: anytime you see a study that you might base a life decision on, see if it makes sense to you. And remember, improbable things are likely to happen. Just not to you. Or to me.
Most of us are right in the middle of most of the curves that describe us. And that's OK. Statistics can show us where we are outliers too. Thanks for watching! DFTBA-Q. Don't Forget to be Asking Questions.
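For reference, the two baseball rates described earlier in the episode can be sketched in a few lines of Python. Batting average follows the definition given in the episode (hits divided by at-bats). The on-base formula below is the standard on-base percentage, which also counts sacrifice flies in the denominator, a detail the episode does not mention. The season totals are hypothetical numbers chosen purely for illustration.

```python
def batting_average(hits: int, at_bats: int) -> float:
    """Hits divided by official at-bats."""
    return hits / at_bats


def on_base_percentage(hits: int, walks: int, hit_by_pitch: int,
                       at_bats: int, sacrifice_flies: int) -> float:
    """Times on base (hit, walk, hit by pitch) per qualifying plate appearance."""
    times_on_base = hits + walks + hit_by_pitch
    opportunities = at_bats + walks + hit_by_pitch + sacrifice_flies
    return times_on_base / opportunities


# Hypothetical season line, not a real player's statistics.
avg = batting_average(hits=150, at_bats=500)
obp = on_base_percentage(hits=150, walks=70, hit_by_pitch=5,
                         at_bats=500, sacrifice_flies=5)
print(f"AVG {avg:.3f}  OBP {obp:.3f}")
```

Two players with the same batting average can have quite different on-base numbers if one of them walks far more often, which is the kind of information the simpler stat leaves out.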
When Predictions Succeed: Crash Course Statistics #44 (CrashCourse; posted by 林宜悉 on 2020/03/30)