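Hypothesis testing is a statistical method for deciding whether observed data provide enough evidence to support a claim, or whether the result could plausibly be explained by random chance. It does not prove a fact with certainty; it quantifies how surprising the data would be if there were no real effect. Example: Is drug A better than drug B?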

  • Alternative hypothesis (H1): the claim you want to find evidence for (example: drug A is more effective than drug B).
  • Null hypothesis (H0): the opposite of the alternative hypothesis, usually a statement of no effect or no difference (example: drug A is not more effective than drug B). Why do we need a null hypothesis? It is useful when there is an established position (drug B is the most effective drug so far) and we want to play devil’s advocate (drug A is not more effective). We test the null hypothesis: if the observed data would be very unlikely were it true, we reject it and conclude that the evidence supports drug A being more effective than drug B. If the data are not unlikely under H0, we fail to reject it, which is not the same as proving it true.

In machine learning, we can use hypothesis testing to assess whether a new model B performs significantly better than an established model A, rather than just appearing better by chance on a particular test set.

P-value

The p-value measures the probability of obtaining results at least as extreme as the observed results, assuming that the null hypothesis is true.
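To make the definition concrete, here is a minimal sketch of an exact permutation test, one of several ways to compute a p-value (the technique is not named in these notes; the function name `permutation_p_value` and the toy numbers are hypothetical). It assumes that under H0 the group labels are interchangeable, so the p-value is the fraction of all possible relabelings whose mean difference is at least as extreme as the observed one.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(a, b):
    """Exact one-sided permutation p-value for H1: mean(a) > mean(b)."""
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    count = 0
    total = 0
    # Under H0 every split of the pooled data into groups of the original
    # sizes is equally likely, so enumerate all of them.
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = mean(group_a) - mean(group_b)
        total += 1
        # Count splits at least as extreme as the observed one.
        if diff >= observed:
            count += 1
    return count / total


print(permutation_p_value([5, 6, 7], [1, 2, 3]))  # 0.05
```

With only three observations per group there are 20 possible splits, so 0.05 (1 out of 20) is the smallest p-value this test can produce on such a tiny sample.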

  • Null Hypothesis (H0): This is a statement of no effect or no difference, which you are testing against.
  • Alternative Hypothesis (H1): This represents the opposite of the null, indicating that there is an effect or a difference.

Significance of the p-value

  1. Decision-Making Tool:
    • A low p-value (at or below a significance level α chosen in advance, typically 0.05) suggests that the observed data are unlikely under the null hypothesis, leading researchers to reject the null hypothesis in favor of the alternative hypothesis.
    • A high p-value (> α) indicates that the data do not provide enough evidence to reject the null hypothesis.
  2. Interpretation:
    • p ≤ 0.05: Statistically significant; results this extreme would be unlikely if the null hypothesis were true.
    • p > 0.05: Not statistically significant; the results are consistent with chance variation under the null hypothesis (this does not prove the null hypothesis).
  3. Limitations:
    • The p-value does not measure the size or importance of an effect; it only measures the strength of evidence against the null hypothesis. With a large enough sample, even a negligible effect can produce a very small p-value.
    • Misinterpretation is common: a p-value is not the probability that either hypothesis is true.
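The first limitation can be illustrated with a two-sample z-test computed from summary statistics. This is a minimal sketch under stated assumptions: known and equal standard deviation `sigma`, equal group sizes, and a hypothetical helper name `two_sample_z_p_value`.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_p_value(mean_diff, sigma, n_per_group):
    """Two-sided p-value for a two-sample z-test, assuming a known, equal
    sigma and equal group sizes (illustrative assumptions)."""
    standard_error = sigma * sqrt(2 / n_per_group)
    z = mean_diff / standard_error
    # Probability of a |Z| at least this large if H0 (no difference) holds.
    return 2 * (1 - NormalDist().cdf(abs(z)))


# The same tiny effect (0.02 standard deviations) at two sample sizes.
for n in (100, 100_000):
    print(n, two_sample_z_p_value(0.02, 1.0, n))
```

With 100 observations per group the p-value is far above 0.05; with 100,000 per group it is far below 0.001, even though the effect is equally small in both cases.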

Example

Suppose you are testing a new drug’s effectiveness:

  • H0: The drug has no effect on patients.
  • H1: The drug has a positive effect on patients.

After conducting an experiment, you calculate a p-value of 0.03. Since this is below the 0.05 significance level, you would reject the null hypothesis: the data provide statistically significant evidence that the drug has a positive effect.
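The decision step in this example can be written as a small helper. The name `decide` is hypothetical; the sketch only compares the p-value with α, which must be chosen before looking at the data.

```python
def decide(p_value, alpha=0.05):
    """Compare a p-value with a significance level fixed in advance."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if p_value <= alpha:
        return "reject H0"
    return "fail to reject H0"


print(decide(0.03))              # reject H0
print(decide(0.03, alpha=0.01))  # fail to reject H0
```

The second call shows that the same p-value of 0.03 would not be significant under a stricter α of 0.01, which is why α should be fixed before the experiment.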

Methods for Hypothesis Testing

Common techniques for hypothesis testing include:

  • t-test: tests whether the means of two populations are equal;
  • z-test: similar to the t-test, but used for large samples or when the population standard deviation is known;
  • ANOVA (Analysis of Variance): tests whether the means of three or more populations are equal;
  • Chi-square test: tests whether two categorical variables are independent.
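As an example of one method from this list, here is a minimal sketch of a chi-square test of independence for a 2x2 table, using only the standard library. The helper name `chi_square_2x2` and the counts are hypothetical. For one degree of freedom it relies on the fact that a chi-square variable is the square of a standard normal variable.

```python
from math import sqrt
from statistics import NormalDist


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts,
    without continuity correction. Returns (statistic, p_value)."""
    (a, b), (c, d) = table
    total = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one count")
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(X >= statistic) = 2 * (1 - Phi(sqrt(statistic))).
    p_value = 2 * (1 - NormalDist().cdf(sqrt(statistic)))
    return statistic, p_value


# Hypothetical counts: rows = drug A / drug B, columns = improved / not improved.
statistic, p_value = chi_square_2x2([[30, 20], [20, 30]])
print(statistic, round(p_value, 4))  # 4.0 0.0455
```

For general tables, `scipy.stats.chi2_contingency` is the usual library routine; note that it applies Yates’ continuity correction to 2x2 tables by default, so its result differs from this sketch unless `correction=False` is passed.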

Type I and Type II Errors

Type I Error (False Positive)

  • Definition: Rejecting the null hypothesis when it is actually true.
  • Consequence: You think there’s an effect or difference when there isn’t.
  • Probability: Denoted by α (alpha), typically set at 0.05.

Example: A pregnancy test says you’re pregnant (positive), but you’re not.
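A short Monte Carlo simulation shows what α means in practice: when H0 is true, a test run at level α rejects it in roughly a fraction α of repeated experiments. This is a sketch under assumptions not in the notes: normally distributed data with known σ = 1, a two-sided one-sample z-test of H0: μ = 0, and a hypothetical function name `type_one_error_rate`.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def type_one_error_rate(n=30, alpha=0.05, trials=10_000, seed=0):
    """Estimate how often a two-sided one-sample z-test (known sigma = 1)
    rejects H0: mu = 0 when H0 is actually true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    rejections = 0
    for _ in range(trials):
        # Data generated with H0 true: mean 0, standard deviation 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = fmean(sample) / (1.0 / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1  # a false positive (Type I error)
    return rejections / trials


print(type_one_error_rate())
```

The printed rate should land close to 0.05; lowering `alpha` reduces false positives at the cost of more Type II errors.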


Type II Error (False Negative)

  • Definition: Failing to reject the null hypothesis when it is actually false.
  • Consequence: You miss a real effect or difference.
  • Probability: Denoted by β (beta). Power = 1 - β is the probability of correctly rejecting a false null hypothesis.

Example: A pregnancy test says you’re not pregnant (negative), but you are.
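β and power can be computed analytically for simple tests. Here is a minimal sketch for a two-sided one-sample z-test, assuming normal data with known σ; the function name `z_test_power` and the example numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(effect, sigma, n, alpha=0.05):
    """Power (1 - beta) of a two-sided one-sample z-test when the true mean
    differs from the H0 value by `effect` (known sigma assumed)."""
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)
    # Under H1 the test statistic is centered at this shift instead of 0.
    shift = effect * sqrt(n) / sigma
    # Probability that the statistic lands in either rejection region under H1.
    return (1 - std.cdf(z_crit - shift)) + std.cdf(-z_crit - shift)


power = z_test_power(effect=0.5, sigma=1.0, n=32)
beta = 1 - power  # probability of a Type II error (false negative)
print(round(power, 3), round(beta, 3))
```

Increasing the sample size or the true effect raises power and lowers β; with no true effect, the "power" reduces to α, the Type I error rate.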


Error Type      Actual Truth    Decision Made        Description
Type I Error    H₀ is true      Reject H₀            False Positive
Type II Error   H₀ is false     Fail to reject H₀    False Negative

Statistics resources: