A data set can have no mode (each value is unique), 1 mode, or more than 1 mode (if there are 2 or more equal number of most common values) — if there are two such numbers, we say the data is bi-modal.
When you have even number of data points (say 10), we don’t have an exact middle number — what we do is to take the two middle numbers (5th and 6th in this case) and take the mean of them as the median.
Variability of Data
Range
Range looks at 2 extreme data points and compute the statistic.
Standard Deviation
Standard Deviation measure the variability of data points with respect to the mean of the dataset. If the data points are dispersed, it has a high standard deviation. If the data points are close to the mean, it has a low standard deviation.
Standard Normal Distribution
When our data fit a bell shaped like curve, we say our data is normal distributed and it has a normal distribution. Normal distribution is the most common distribution we see in the world. Height of the people in the world are normally distributed. Grades of large enough set of students (usually 30 or more) are normally distributed.
When the mean = 0 and SD = 1, we say the normal distribution is a standard normal distribution (SND).
SND gives us a way to meaningfully compare two distributions.
Z-Score
Calculating Standard Deviation
Empirical Rule
Outliers
How do we measure outliers?
What should we do with outliers?
We sometimes ignore them as they are rare events and are likely to not occur.
We often time use them as a learning opportunity. We study why it happened, how it happened, what can we do to produce more outliers like that (if it is something good — e.g. extreme performer)
In this lesson, we learned about the very basics of statistics. Now you have the understanding to conquer more advanced topics. Khan Academy’s statistics course is a great next step.