# A/B Testing

## Outcomes

A/B testing has limitations. A test can end in one of three outcomes:

- Control > Solution
- Control = Solution
- Control < Solution

In other words, an A/B test can tell you whether the solution is better or worse than the control, but it does not tell you **how much** better or worse it is.

## Null hypothesis

In an A/B test, the default assumption is that the solution and control will have the same win rates.

The default assumption is called the null hypothesis.

The goal of an A/B test is to disprove the null hypothesis by gathering enough evidence that the solution is better or worse than the control.
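As a sketch of how this plays out in practice, a two-proportion z-test (one common choice, not prescribed by these notes) compares the win rates of the control and the solution and produces a p-value; the counts below are made up for illustration:

```python
import math
from statistics import NormalDist

def two_proportion_z_test(wins_a, n_a, wins_b, n_b):
    """Two-sided z-test for the difference between two win rates."""
    p_a = wins_a / n_a
    p_b = wins_b / n_b
    # Pooled win rate under the null hypothesis (both variants equal).
    p_pool = (wins_a + wins_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical data: 10,000 users per variant.
z, p = two_proportion_z_test(480, 10_000, 560, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value here says the observed difference is unlikely under the null hypothesis; it still does not tell you the size of the true difference.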

## Measuring confidence

Confidence is measured using statistical significance.

Statistical significance means keeping the risk that the observed win rate differs from the true win rate below an agreed threshold. Some risk remains, but it is low enough to be acceptable.

Statistical significance has two parameters: the p-value and the statistical power.

P-value: the probability that your test result is a false positive. A p-value threshold of 5% means there is a 5% chance of falsely calling a positive or negative impact (a Type 1 error).

Statistical power: the probability of avoiding a false null. A false null occurs when there is a real difference between the control and the solution, but the data suggests they are the same. A statistical power of 80% means there is a 20% chance of falsely calling a null impact (a Type 2 error).

- Type 1 error: calling a false positive or negative.
- Type 2 error: accepting the null hypothesis when one variation is actually different from the other.
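The two error types can be made concrete with a small simulation (an illustrative sketch; the win rates, sample sizes, and z-test below are assumptions, not taken from these notes). When the null is true, the rejection rate estimates the Type 1 error; when there is a real difference, the rejection rate estimates the power, and its complement the Type 2 error:

```python
import math
import random
from statistics import NormalDist

def p_value(wins_a, n_a, wins_b, n_b):
    """Two-sided p-value for the difference between two win rates."""
    p_pool = (wins_a + wins_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (wins_b / n_b - wins_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def reject_rate(rate_a, rate_b, n=2000, trials=300, alpha=0.05):
    """Fraction of simulated A/B tests that reject the null hypothesis."""
    rejections = 0
    for _ in range(trials):
        wins_a = sum(random.random() < rate_a for _ in range(n))
        wins_b = sum(random.random() < rate_b for _ in range(n))
        if p_value(wins_a, n, wins_b, n) < alpha:
            rejections += 1
    return rejections / trials

random.seed(0)
# Null is true: the rejection rate estimates the Type 1 error (~5%).
print("Type 1 error:", reject_rate(0.10, 0.10))
# Real 3-point lift: the rejection rate estimates the power (1 - Type 2 error).
print("Power:", reject_rate(0.10, 0.13))
```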

### P-value

The widely adopted principle is that the maximum p-value should be 0.05, or 5%.

### False error rate

This applies to false nulls.

The false error rate measures the probability of a false null (a Type 2 error). It is the complement of statistical power: false error rate = 1 − statistical power. Lower is better.

The maximum acceptable false error rate is usually 0.2, or 20%.

## Sample size

Evan Miller's A/B testing tools can be used to calculate the required sample size.
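As a rough stand-in for such a calculator, the standard two-proportion sample-size formula (a normal-approximation sketch; the baseline win rate and minimum detectable effect below are assumed for illustration, and results may differ slightly from any particular tool) looks like:

```python
import math
from statistics import NormalDist

def sample_size_per_variant(p1, mde, alpha=0.05, power=0.80):
    """Visitors needed per variant to detect an absolute lift `mde`
    over a baseline win rate p1 (two-sided test, normal approximation)."""
    p2 = p1 + mde
    p_bar = (p1 + p2) / 2
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_b = NormalDist().inv_cdf(power)          # ~0.84 for power = 0.80
    n = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
         + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / mde ** 2
    return math.ceil(n)

# Baseline 20% win rate, detect a 2-point absolute lift.
print(sample_size_per_variant(0.20, 0.02))
```

The formula bakes in both thresholds from above: `alpha` bounds the Type 1 error and `power` bounds the Type 2 error, so a smaller detectable effect or stricter thresholds demand a larger sample.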