  • Hi, and welcome back.

  • This is the main section of this course.

  • It is based on the knowledge that you acquired previously, so if you haven’t been through

  • it, you may have a hard time keeping up.

  • Make sure you have seen all the videos about confidence intervals, distributions, z-tables

  • and t-tables, and have done all the exercises.

  • If you’ve completed them already, you are good to go.

  • Confidence intervals provide us with an estimation of where the parameters are located.

  • However, when you are making a decision, you need a yes/no answer.

  • The correct approach in this case is to use a test.

  • In this section, we will learn how to perform one of the fundamental tasks in statistics

  • - hypothesis testing!

  • Okay.

  • There are four steps in data-driven decision-making.

  • First, you must formulate a hypothesis.

  • Second, once you have formulated a hypothesis, you will have to find the right test for your

  • hypothesis.

  • Third, you execute the test.

  • And fourth, you make a decision based on the result.

  • Let’s start from the beginning.

  • What is a hypothesis?

  • Though there are many ways to define it, the most intuitive I’ve seen is:

  • “A hypothesis is an idea that can be tested.”

  • This is not the formal definition, but it explains the point very well.

  • So, if I tell you that apples in New York are expensive, this is an idea, or a statement,

  • but is not testable, until I have something to compare it with.

  • For instance, if I define expensive as: any price higher than $1.75 per pound,

  • then it immediately becomes a hypothesis.

  • Alright, what’s something that cannot be a hypothesis?

  • An example may be: would the USA do better or worse under a Clinton administration, compared

  • to a Trump administration?

  • Statistically speaking, this is an idea, but there is no data to test it, therefore it

  • cannot be a hypothesis of a statistical test.

  • Actually, it is more likely to be a topic of another discipline.

  • Conversely, in statistics, we may compare different US presidencies that have already

  • been completed, such as the Obama administration and the Bush administration, as we have data

  • on both.

  • Alright, let’s get out of politics and get into hypotheses.

  • Here’s a simple topic that can be tested.

  • According to Glassdoor (the popular salary information website), the mean data scientist

  • salary in the US is 113,000 dollars.

  • So, we want to test if their estimate is correct.

  • There are two hypotheses that are made: the null hypothesis, denoted H zero, and the alternative

  • hypothesis, denoted H one or H A. The null hypothesis is the one to be tested and the

  • alternative is everything else.

  • In our example, the null hypothesis would be: the mean data

  • scientist salary is 113,000 dollars, while the alternative would be: the mean data scientist

  • salary is not 113,000 dollars.

  • Now, you would want to check if 113,000 is close enough to the true mean, as estimated from

  • our sample.

  • If it is, you would accept the null hypothesis or, more precisely, fail to reject it.

  • Otherwise, you would reject the null hypothesis.
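
As a rough illustration, the two-sided check on the 113,000 dollar figure could be sketched in Python as a z-test. This is only a sketch under stated assumptions: the salaries are invented for illustration, the population standard deviation of 15,000 and the 5% significance level are assumed values, and the function name `two_sided_z_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean


def two_sided_z_test(sample, hypothesized_mean, population_std, alpha=0.05):
    """Return (z, p_value, reject_null) for H0: mean == hypothesized_mean.

    Assumes the population standard deviation is known (z-test).
    """
    standard_error = population_std / sqrt(len(sample))
    z = (mean(sample) - hypothesized_mean) / standard_error
    # Two-sided: a sample mean far above OR far below counts against H0.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha


# Hypothetical salaries, invented purely for illustration.
salaries = [118_000, 104_500, 112_300, 99_800, 121_700,
            108_400, 95_600, 110_900, 102_300, 115_100]

# population_std=15_000 is an assumed value, not taken from the lecture.
z, p_value, reject_null = two_sided_z_test(salaries, 113_000, population_std=15_000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject_null}")
```

When `reject_null` is true, the sample mean is too far from 113,000 to be explained by chance at the chosen significance level. When the standard deviation has to be estimated from the sample, a t-test would be the usual substitute.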

  • The concept of the null hypothesis is similar to: innocent until proven guilty.

  • We assume that the mean salary is 113,000 dollars and we try to prove otherwise.

  • Alright.

  • This was an example of a two-sided, or two-tailed, test.

  • You can also form one-sided, or one-tailed, tests.

  • Say your friend, Paul, told you that he thinks data scientists earn more than 125,000 dollars

  • per year.

  • You doubt him, so you design a test to see who’s right.

  • The null hypothesis of this test would be: The mean data scientist salary is greater than or equal to

  • 125,000 dollars.

  • The alternative will cover everything else, thus: The mean data scientist salary is less

  • than 125,000 dollars.
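
The one-sided test of Paul's claim could be sketched the same way. Here only sample means well below 125,000 count as evidence against his claim, so the whole rejection region sits in the left tail. Again, this is a sketch under assumptions: the salaries are invented, the population standard deviation of 15,000 and the 5% significance level are assumed values, and `left_tailed_z_test` is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist, mean


def left_tailed_z_test(sample, boundary_mean, population_std, alpha=0.05):
    """Return (z, p_value, reject_null) for a left-tailed z-test.

    The statistic is computed at the boundary value of the null hypothesis.
    Assumes the population standard deviation is known.
    """
    standard_error = population_std / sqrt(len(sample))
    z = (mean(sample) - boundary_mean) / standard_error
    # One-sided: only unusually LOW sample means count against the null.
    p_value = NormalDist().cdf(z)
    return z, p_value, p_value < alpha


# Hypothetical salaries, invented purely for illustration.
salaries = [118_000, 104_500, 112_300, 99_800, 121_700,
            108_400, 95_600, 110_900, 102_300, 115_100]

# population_std=15_000 is an assumed value, not taken from the lecture.
z, p_value, reject_null = left_tailed_z_test(salaries, 125_000, population_std=15_000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject_null}")
```

Note the difference from the two-sided version: a sample mean far above 125,000 gives a p-value close to 1 here, because it agrees with Paul's claim rather than contradicting it.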

  • It is important to note that outcomes of tests refer to the population parameter rather than

  • the sample statistic!

  • As such, the result that we get is for the population.

  • Another crucial consideration is that, generally, the researcher is trying to reject the null

  • hypothesis.

  • Think about the null hypothesis as the status quo and the alternative as the change or innovation

  • that challenges that status quo.

  • In our example, Paul was representing the status quo, which we were challenging.

  • Alright.

  • That’s all for now.

  • In the next lectures, we will see some examples and learn how to make data-driven decisions.
