Understanding Student's t-Distribution: Symmetry and Applications
Dive into the world of Student's t-distribution, exploring its symmetric nature, formula, and crucial role in statistical analysis. Learn how to apply this powerful tool in various data scenarios.


    Student's t-distribution


    Continuing our statistics course, in today's lesson we will learn what a t-distribution is and how to use it when constructing confidence intervals for a population mean. Before getting to the full explanation, we need a short review of what a normal probability distribution is and how it is converted into a standard normal distribution, so that we can compare the two distributions and how each one is used.

    A little review on Normal Distributions


    A normal distribution, also called a Gaussian distribution, is the most common (and probably the most important) type of continuous probability distribution. Because of that, many academic texts and study materials define the normal distribution simply as a continuous probability distribution.

    The normal distribution gives a statistician a good approximation for the behavior of many quantities in real-life scenarios, and the central limit theorem explains why: as long as the sample is sufficiently large, the distribution of the sample mean will be nearly normal, whatever the shape of the underlying population. The normal distribution curve looks like:

    Figure 1: Normal distribution curve

    The main characteristics of a normal probability distribution are:

    • It has a bell-shaped curve (which is why it is often simply called a bell curve).

    • The normal curve is symmetric with the mean of the distribution as its symmetry axis and this mean has a value that is equal to the median and mode of the distribution (so, median = mode = mean in a normal distribution!).

    • The total area under the bell curve (also called a Gaussian curve) is equal to 1, so half of it lies on one side of the mean (the axis of symmetry) and half on the other side.

    • The left and right tails of the normal distribution graph never touch the horizontal axis. They extend indefinitely because the curve is asymptotic to that axis.

    • The shape of the normal distribution and its position on the horizontal axis are determined by the standard deviation and the mean. The mean sets the center point, while the bigger the standard deviation, the wider the bell curve will be.


    Figure 2: How the mean and standard deviation define a normal distribution
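
    The characteristics listed above can be checked numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the means and standard deviations are hypothetical values chosen only for illustration.

```python
from statistics import NormalDist

# Hypothetical parameters, chosen only for illustration.
narrow = NormalDist(mu=50, sigma=5)
wide = NormalDist(mu=50, sigma=15)

# Half of the total area lies on each side of the mean.
area_left_of_mean = narrow.cdf(50)

def area_within(dist, half_width):
    """Area under the curve within half_width of the mean."""
    return dist.cdf(dist.mean + half_width) - dist.cdf(dist.mean - half_width)

# A larger standard deviation widens the bell, so less of the
# area stays close to the center.
near_narrow = area_within(narrow, 10)
near_wide = area_within(wide, 10)

print(area_left_of_mean)
print(round(near_narrow, 4), round(near_wide, 4))
```

    Both curves share the same center, but the one with the larger standard deviation keeps a smaller share of its area within 10 units of the mean.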

    The standard normal distribution, also called the z-distribution, is a normal distribution with a mean of zero and a standard deviation of one, so its horizontal axis is measured in units of standard deviations. The values of any normal distribution can be translated into the standard normal distribution, meaning that any normal curve can be re-scaled into a curve centered at zero. This process is called standardization, and the resulting value for each point is called its z-score.
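
    Standardization amounts to subtracting the mean and dividing by the standard deviation. A minimal sketch, where the function name `z_score` and the example distribution (mean 100, standard deviation 15) are assumptions made for illustration:

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")
    return (x - mean) / std_dev

# Hypothetical distribution with mean 100 and standard deviation 15.
values = [70, 100, 130]
z_values = [z_score(v, 100, 15) for v in values]
print(z_values)  # [-2.0, 0.0, 2.0]
```

    The mean itself always maps to a z-score of zero, and values two standard deviations away map to -2 and 2.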

    What is a t distribution


    What is the t distribution in statistics? The Student's t-distribution is a continuous probability distribution that comes up when our data sample is small and the population standard deviation (and therefore the population variance) is unknown.

    The t distribution is a probability distribution similar to the standard normal distribution, but it can be used without knowing the standard deviation of the population. When comparing the normal distribution versus the t distribution, we see that the graph of the t distribution has tails which are bigger or "fatter" than those of the standard normal distribution, which means that there is a higher probability of finding a data point in one of the tails than when working with a normal distribution. That is why t-distribution graphs look shorter than the normal distribution: more of their area has gone to the tails.

    Figure 3: Comparison between a z distribution and a t distribution

    Remember that the area under a t-distribution or z-distribution curve represents probability, and the total area under each curve is equal to 1. The two curves therefore contain the same total probability, but distribute it differently between the center and the tails.

    The reason for the shape of a t distribution comes from the fact that t-distributions are used for smaller samples, where the sample standard deviation is only a rough estimate of the population standard deviation. When studying characteristics of a population, there are different sampling methods that need to be followed in order for the sample to be representative of the population. When data points are graphed in a probability distribution, the bigger the sample, the more data points will be found close or equal to the mean, and so in a smaller sample a higher proportion of the points is spread out.

    And so, what are t distributions when graphed? They are still bell shaped distributions but since your sample is smaller, one single individual point being far away from the mean is a bigger proportion of the total sample than when your sample contains many more individual points, and so this characteristic makes its tails thicker.
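
    To see the thicker tails in numbers, the sketch below compares the height of the two curves at a few points. It assumes the standard density formula for the t-distribution (not given in this lesson) and a hypothetical case with 4 degrees of freedom; `t_pdf` is a helper name introduced here.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coeff = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_coeff - (df + 1) / 2 * math.log1p(x * x / df))

z = NormalDist()  # standard normal: mean 0, standard deviation 1
df = 4            # hypothetical small-sample case (n = 5)

# Columns: x, height of the z curve, height of the t curve.
for x in (0, 1, 2, 3):
    print(x, round(z.pdf(x), 4), round(t_pdf(x, df), 4))
```

    At the center the t curve is lower than the z curve, while out in the tails it is higher, which matches the shorter, fatter-tailed shape described above.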


    Derivation of the t distribution


    In order to find the t distribution values for a specific problem, we use the significance level and the degrees of freedom to locate the t-score that divides the rejection region from the failure-to-reject region. We do this by finding the intersection of the degrees of freedom and the significance level in a Student's t distribution table.
    Notice that these values are critical values for a t distribution graph: locations on the horizontal axis that separate different regions, specifically the zone denoting the rejection of the null hypothesis and the zone denoting the failure to reject it.

    An example of these t distribution tables is given in the section for the step-by-step instructions to solve a hypothesis test for a population mean.

    Besides the tables, we can calculate the test statistic t-score using the following formula:

    $t = \frac{\overline{x} - \mu}{\frac{s}{\sqrt{n}}}$
    Equation 1: T distribution formula

    $t$ = the t-score (test statistic), which follows a Student's t-distribution
    $n$ = the sample size
    $\mu$ = the population mean
    $\overline{x}$ = the sample mean
    $s$ = the sample standard deviation
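
    As a quick illustration of Equation 1, here is a minimal sketch. The function name `t_statistic` and the numbers passed to it are hypothetical and not taken from the lesson.

```python
import math

def t_statistic(sample_mean, population_mean, sample_sd, n):
    """t = (x_bar - mu) / (s / sqrt(n)), as in Equation 1."""
    if n < 2:
        raise ValueError("need at least two observations")
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - population_mean) / standard_error

# Hypothetical numbers: 16 observations with mean 52 and s = 4,
# compared against a population mean of 50.
t = t_statistic(52, 50, 4, 16)
print(t)  # 2.0
```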

    When to use t distribution


    In the previous lesson we discovered how to make a confidence interval for estimating a population mean. In that case we knew the population standard deviation ($\sigma$), but $\sigma$ is not always known.
    When building a confidence interval, remember that we use the formula:

    $\overline{x} - E < \mu < \overline{x} + E$
    Equation 2: Formula for a confidence interval

    Where the margin of error $E$ is defined as:

    $E = Z_{\frac{\alpha}{2}} \cdot \frac{\sigma}{\sqrt{n}}$
    Equation 3: Formula for the margin of Error defined in terms of z-scores

    If the population standard deviation ($\sigma$) is unknown, then we cannot use our old formula for the margin of error to build a confidence interval for the population mean, because that formula requires knowing $\sigma$. Instead we use t-scores ($t_{\frac{\alpha}{2}}$).

    Stop for a moment and check the notation for the t-scores: $t_{\frac{\alpha}{2}}$

    The $t$ comes from the Student's t distribution, and the subscript $\frac{\alpha}{2}$ comes from the significance level ($\alpha$) and how it is spread over the tails of the distribution. Remember that in a one-tailed case the whole significance level is located on either the right or the left tail. When the significance level is divided by two, half of it is found on the left tail and the other half on the right tail, as shown below:

    Figure 4: Distribution graph showing 2 tails, each having an area of $\alpha/2$

    To find the t-scores we need the value of the significance level, the number of degrees of freedom for the specific Student's t distribution, and a full t distribution table such as the one shown below:

    Figure 5: T-table

    The degrees of freedom are obtained by subtracting 1 from the sample size $n$:

    degrees of freedom $= d.f. = n - 1$
    Equation 4: Degrees of freedom for a t-distribution

    You may be wondering how to use a Student's t distribution table. It is easy!
    Notice that the top row shows the value of the significance level and differentiates between the one-tailed and two-tailed cases, while the left column contains the number of degrees of freedom. All you need to do is find the row for your degrees of freedom and the column for your significance level: the value where that row and column meet is your t-score.

    If you still have any questions on how to read a Student's t distribution table, take a look at example problem 2 (part a) below, where it is explained step by step.
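
    A table lookup can also be reproduced numerically. The sketch below is one possible approach and not the method used in this lesson: it integrates the standard t density with Simpson's rule and then searches by bisection for the two-tailed critical value. The helper names `t_pdf`, `t_cdf` and `t_critical` are introduced here for illustration, and only Python's standard library is used.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coeff = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_coeff - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(t, df, steps=1000):
    """P(T <= t) for t >= 0: 0.5 plus Simpson's rule over [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(alpha, df):
    """Two-tailed critical value t_{alpha/2}, found by bisection.

    Assumes the answer lies below 100, which covers typical table entries.
    """
    target = 1 - alpha / 2
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

t_14 = t_critical(0.05, 14)  # two tails, alpha = 0.05, d.f. = 14
t_24 = t_critical(0.10, 24)  # two tails, alpha = 0.10, d.f. = 24
by_df = {df: t_critical(0.05, df) for df in (5, 30, 1000)}

print(round(t_14, 3), round(t_24, 3))
for df, value in by_df.items():
    print(df, round(value, 3))
```

    The first two values should agree with the table entries used in the examples below (2.145 and 1.711). The loop shows the t-score shrinking toward the z-score of about 1.96 as the degrees of freedom grow.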

    Once we find the t-scores for particular values (this is done in a similar way to finding z-scores) we have a new formula for the margin of error:

    $E = t_{\frac{\alpha}{2}} \cdot \frac{s}{\sqrt{n}}$
    Equation 5: Formula for the margin of Error defined in terms of t-scores


    t-distribution example problems


    Example 1

    Determining a Confidence Interval for a Population Mean using t-distributions
    The "Vendee Globe" is an around-the-world solo yacht race. In a particular year 31 sailors did the race and finished with an average time of 123 days, with a standard deviation of 11 days. With a t-score of $t_{\frac{\alpha}{2}} = 2.45$, construct a confidence interval for the average amount of time it takes a Vendee Globe sailor to circumnavigate the world (sail around the world).

    We have the following information for this problem:

    $n$ = 31 = sample size
    $s$ = 11 days = sample standard deviation
    $\overline{x}$ = 123 days = sample mean
    $t_{\frac{\alpha}{2}}$ = 2.45 = t-score

    To find the confidence interval we start from the sample mean and consider the values that lie within one margin of error below and above it. The population mean is then estimated to lie in the interval:

    $\overline{x} - E < \mu < \overline{x} + E$
    Equation 6: Formula for a confidence interval

    Where:

    $\overline{x}$ = sample mean = 123 days
    $E$ = margin of error
    $\mu$ = population mean

    Using the formula for the margin of error, we calculate the lower and upper limits of the interval:

    $\overline{x} - E = \overline{x} - t_{\frac{\alpha}{2}} \cdot \frac{s}{\sqrt{n}}$

    $\overline{x} - E = 123 \text{ days} - (2.45) \frac{11 \text{ days}}{\sqrt{31}}$

    $\overline{x} - E = 118.16 \text{ days}$

    And

    $\overline{x} + E = \overline{x} + t_{\frac{\alpha}{2}} \cdot \frac{s}{\sqrt{n}}$

    $\overline{x} + E = 123 \text{ days} + (2.45) \frac{11 \text{ days}}{\sqrt{31}}$

    $\overline{x} + E = 127.84 \text{ days}$
    Equation 7: Solving for the confidence interval limits

    And so the confidence interval is as follows:

    $118.16 \text{ days} < \mu < 127.84 \text{ days}$
    Equation 8: Final confidence interval

    Therefore, Vendee Globe sailors take between 118.16 and 127.84 days on average to circumnavigate the world.
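
    The arithmetic of Example 1 can be double-checked with a few lines of code. This is a minimal sketch; the function name `t_confidence_interval` is an assumption made for illustration, and the t-score of 2.45 is the one given in the problem statement.

```python
import math

def t_confidence_interval(sample_mean, sample_sd, n, t_score):
    """Return (lower, upper) for x_bar - E < mu < x_bar + E."""
    margin = t_score * sample_sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Example 1: x_bar = 123 days, s = 11 days, n = 31, t-score = 2.45.
lower, upper = t_confidence_interval(123, 11, 31, 2.45)
print(round(lower, 2), round(upper, 2))  # 118.16 127.84
```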

    Example 2

    In "Anchiles", a small made-up town near the equator, 15 random days were sampled and found to have an average temperature of 28°C, with a standard deviation of 4°C. Assume that the average daily temperature of this town is normally distributed.

    a. With 95% confidence, where does the average daily temperature of Anchiles lie?

    We have the following information for this problem:

    $n$ = 15 = sample size
    $s$ = 4°C = sample standard deviation
    $\overline{x}$ = 28°C = sample mean

    We are looking for the confidence interval $\overline{x} - E < \mu < \overline{x} + E$, where the margin of error is:

    $E = t_{\frac{\alpha}{2}} \cdot \frac{s}{\sqrt{n}}$
    Equation 9: Formula for the margin of Error defined in terms of t-scores

    To calculate this, we need to obtain the t-score. Remember that we can obtain a t-score from a t-table of values using the significance level ($\alpha$) and the degrees of freedom of the t distribution for this case.
    The significance level can be easily obtained from the confidence level of 95%, which means that:

    Confidence level $= 0.95 = 1 - \alpha$

    Therefore:

    $\alpha = 1 - 0.95 = 0.05$
    Equation 10: Significance level $\alpha$

    The degrees of freedom are calculated from the subtraction:

    $n - 1 = 15 - 1 = 14$
    Equation 11: Degrees of freedom

    And with this information we use the t table to find the t-value:

    Figure 6: Finding a t-score on the t-table

    Notice that we used the two-tailed significance level because the t-score we are looking for is $t_{\frac{\alpha}{2}}$, where $\frac{\alpha}{2}$ means that the significance level (rejection area) is divided into two equal parts, one in each tail of the distribution curve.

    And so, using the t-value found, $t_{\frac{\alpha}{2}} = 2.145$, we can calculate the margin of error for this problem:

    $E = t_{\frac{\alpha}{2}} \cdot \frac{s}{\sqrt{n}} = (2.145) \left( \frac{4^{\circ}C}{\sqrt{15}} \right) = 2.215^{\circ}C$
    Equation 12: Finding the margin of error value

    And finally we find the confidence interval for the average daily temperature:

    $\overline{x} - E < \mu < \overline{x} + E$

    Where:

    $\overline{x} - E = 28^{\circ}C - 2.215^{\circ}C = 25.785^{\circ}C$

    And:

    $\overline{x} + E = 28^{\circ}C + 2.215^{\circ}C = 30.215^{\circ}C$

    And so:

    $25.785^{\circ}C < \mu < 30.215^{\circ}C$
    Equation 13: Finding the confidence interval

    And so we can say with 95% confidence that the average daily temperature of Anchiles falls somewhere between 25.785°C and 30.215°C.
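
    The steps of part a can be retraced in code. In this sketch the t-score is entered by hand as the value read from the t-table (2.145), and the variable names are chosen for illustration only.

```python
import math

n, sample_mean, sample_sd = 15, 28.0, 4.0
confidence = 0.95

alpha = 1 - confidence   # significance level
df = n - 1               # degrees of freedom
t_score = 2.145          # from a t-table: two tails, alpha = 0.05, d.f. = 14

margin = t_score * sample_sd / math.sqrt(n)
lower, upper = sample_mean - margin, sample_mean + margin

print(round(alpha, 2), df)
print(round(margin, 3), round(lower, 3), round(upper, 3))
```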

    b. What if we knew that the standard deviation of temperature was in fact 4°C for the entire population? Then, with 95% confidence, where does the average daily temperature of Anchiles lie?

    For this question we actually have the population standard deviation, so we can use z-scores instead of t-scores! Let us gather all of the information we have:

    $n$ = 15 = sample size
    $\sigma$ = 4°C = population standard deviation
    $\overline{x}$ = 28°C = sample mean
    $1 - \alpha$ = 95% = 0.95
    $\alpha$ = 0.05

    And we are looking for the confidence interval $\overline{x} - E < \mu < \overline{x} + E$
    Where:

    $E = Z_{\frac{\alpha}{2}} \cdot \frac{\sigma}{\sqrt{n}}$
    Equation 14: Formula for the margin of error

    In order to calculate the margin of error we need to find the z-score $Z_{\frac{\alpha}{2}}$, which comes from the two-tailed case. Since the confidence level is 95%, to find the point on the horizontal axis that cuts off the right tail we add the area of the confidence level (0.95) and the area of the left tail (0.025). This gives an area of 0.975 to look up in the z-table to find the value of the z-score:

    Figure 7: Finding a z-score on the z-table

    And so:

    $E = Z_{\frac{\alpha}{2}} \cdot \frac{\sigma}{\sqrt{n}} = (1.96) \left( \frac{4^{\circ}C}{\sqrt{15}} \right) = 2.0243^{\circ}C$
    Equation 15: Finding the margin of error value

    Finally we find the confidence interval for the average daily temperature:

    $\overline{x} - E < \mu < \overline{x} + E$

    Where:

    $\overline{x} - E = 28^{\circ}C - 2.0243^{\circ}C = 25.9757^{\circ}C$

    And:

    $\overline{x} + E = 28^{\circ}C + 2.0243^{\circ}C = 30.0243^{\circ}C$

    And so:

    $25.9757^{\circ}C < \mu < 30.0243^{\circ}C$
    Equation 16: Finding the confidence interval

    Therefore the average daily temperature of Anchiles lies between 25.9757°C and 30.0243°C.
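
    For part b the z-score does not have to come from a printed table. The sketch below uses `statistics.NormalDist` from Python's standard library and then rounds the result to two decimals, an assumption made here so that it matches the table value of 1.96 used above.

```python
import math
from statistics import NormalDist

n, sample_mean, sigma = 15, 28.0, 4.0
alpha = 0.05

# Area to the left of the critical value: 0.95 + 0.025 = 0.975.
z_exact = NormalDist().inv_cdf(1 - alpha / 2)
z_score = round(z_exact, 2)   # 1.96, as read from a z-table

margin = z_score * sigma / math.sqrt(n)
lower, upper = sample_mean - margin, sample_mean + margin

print(z_score, round(margin, 4))
print(round(lower, 4), round(upper, 4))
```

    Using the unrounded z-score instead would change the margin of error only in the fourth decimal place.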

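    c. From the previous two questions, which has the larger confidence interval? Why might that be the case? Look at the t-scores as the sample gets larger and larger.
    The first case has the larger confidence interval (a margin of error of 2.215°C against 2.0243°C). Both parts use the same sample size and the same value for the standard deviation, so the difference comes entirely from the critical value: the t-score of 2.145 is larger than the z-score of 1.96, because estimating the standard deviation from a small sample adds uncertainty. As the sample gets larger, the t-scores get closer to the z-score. Take a look at our video lesson for a graphic description of how the size of the sample affects t-scores.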

    Example 3

    Determining the Sample Standard Deviation with a given Margin of Error
    From a sample of 25 new drivers it was found that the average age at which a young adult in British Columbia receives their driver's license lies, with 90% confidence, in the interval $16.72 < \mu < 23.28$ years old. Assume that the age at which new drivers receive their license is normally distributed. What was the standard deviation of this sample?

    Since we are looking for a sample standard deviation, we know that we need to use the formulas for a Student's t distribution to find the value of $s$. But first, we need to gather all the information for the problem:

    $n$ = 25 = sample size
    $1 - \alpha$ = 90% = 0.90 = confidence level
    $\alpha$ = 0.10 = significance level
    $16.72 < \mu < 23.28$ years old = confidence interval

    Comparing the given interval with the formula for the confidence interval, we have that:

    $16.72 < \mu < 23.28 \quad \Longleftrightarrow \quad \overline{x} - E < \mu < \overline{x} + E$

    Therefore:

    $\overline{x} - E = 16.72$

    $\overline{x} + E = 23.28$
    Equation 17: Declaring a system of equations with two unknowns to find the margin of error and the sample mean

    Solving the system of equations:

    $\overline{x} = 16.72 + E$

    $\overline{x} + E = (16.72 + E) + E = 16.72 + 2E = 23.28$

    $16.72 + 2E - 16.72 = 23.28 - 16.72$

    $2E = 6.56 \quad \Rightarrow \quad E = \frac{6.56}{2} = 3.28$

    $\overline{x} - E = \overline{x} - 3.28 = 16.72 \quad \Rightarrow \quad \overline{x} = 16.72 + 3.28 = 20$
    Equation 18: Finding the margin of error and the sample mean values

    So the margin of error is equal to 3.28 and the sample mean is 20.
    Now we use the equation for the margin of error to solve for the sample standard deviation:

    $E = t_{\frac{\alpha}{2}} \cdot \frac{s}{\sqrt{n}} \quad \Rightarrow \quad s = \frac{E \cdot \sqrt{n}}{t_{\frac{\alpha}{2}}}$
    Equation 19: Solving for the sample standard deviation from the margin of error formula

    For this we need the t-score $t_{\frac{\alpha}{2}}$, which comes from the two-tailed case.
    To find the t-score we need the degrees of freedom, found by the subtraction $n - 1 = 25 - 1 = 24$, and the significance level, which is equal to 0.10.
    With that, we use a Student's t-distribution table:

    Figure 8: Finding a t-score on the t-table

    And so:

    $s = \frac{E \cdot \sqrt{n}}{t_{\frac{\alpha}{2}}} = \frac{(3.28)(\sqrt{25})}{1.711} = 9.585$
    Equation 20: Result for the Sample standard deviation

    Therefore the sample standard deviation is 9.585 years.
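
    Example 3 works backwards from the interval to the standard deviation, and the same steps can be sketched in code. The t-score of 1.711 is entered as the t-table value, and the variable names are chosen for illustration.

```python
import math

n = 25
lower, upper = 16.72, 23.28
t_score = 1.711   # from a t-table: two tails, alpha = 0.10, d.f. = 24

sample_mean = (lower + upper) / 2   # midpoint of the interval
margin = (upper - lower) / 2        # half the interval width

# Rearranged margin of error formula: s = E * sqrt(n) / t.
sample_sd = margin * math.sqrt(n) / t_score

print(round(sample_mean, 2), round(margin, 2), round(sample_sd, 3))
```

    Taking the midpoint and half-width of the interval gives the same sample mean and margin of error as solving the system of two equations.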

    **********

    To continue working on examples of building confidence intervals using a t-distribution, we recommend using a t distribution calculator, where you can find t-scores faster. Practice at home finding values for t-scores using the t distribution table and checking them with the calculator.