Hypothesis Test of a Proportion (Small Sample)
This lesson explains how to test a hypothesis about a proportion when a simple random sample has fewer than 10 successes or fewer than 10 failures, a situation that often occurs with small samples. (In a previous lesson, we showed how to conduct a hypothesis test for a proportion when a simple random sample includes at least 10 successes and 10 failures.)
The approach described in this lesson is appropriate as long as the sample includes at least one success and one failure. The key steps are:
- Formulate the hypotheses to be tested. This means stating the null hypothesis and the alternative hypothesis.
- Determine the sampling distribution of the proportion. If the sample proportion is the outcome of a binomial experiment, the sampling distribution will be binomial. If it is the outcome of a hypergeometric experiment, the sampling distribution will be hypergeometric.
- Specify the significance level. (Researchers often set the significance level equal to 0.05 or 0.01, although other values may be used.)
- Based on the hypotheses, the sampling distribution, and the significance level, define the region of acceptance.
- Test the null hypothesis. If the sample proportion falls within the region of acceptance, do not reject the null hypothesis; otherwise, reject the null hypothesis.
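The steps above can be sketched in Python for a lower-tailed test, which is the form both examples below use. This is a minimal sketch under stated assumptions. The caller supplies the exact sampling distribution under the null hypothesis as a dictionary that maps each possible count of successes to its probability. The function names `lower_tailed_region` and `reject_null` are hypothetical. Because the distribution is discrete, the attained significance level is usually smaller than the nominal one.

```python
from typing import Dict, Optional, Tuple


def lower_tailed_region(pmf: Dict[int, float], alpha: float) -> Tuple[Optional[int], float]:
    """Find the region of rejection {0, ..., cutoff} for a lower-tailed test.

    `pmf` maps each possible count of successes to its probability under the
    null hypothesis. The cutoff is the largest count whose cumulative
    probability P(X <= cutoff) does not exceed alpha; that cumulative
    probability is the significance level actually attained. Returns
    (None, 0.0) when even the smallest count is too likely to reject.
    """
    cutoff, attained, cumulative = None, 0.0, 0.0
    for count in sorted(pmf):
        cumulative += pmf[count]
        if cumulative > alpha:
            break
        cutoff, attained = count, cumulative
    return cutoff, attained


def reject_null(pmf: Dict[int, float], observed: int, alpha: float) -> bool:
    """Return True if the observed count falls in the region of rejection."""
    cutoff, _ = lower_tailed_region(pmf, alpha)
    return cutoff is not None and observed <= cutoff
```

For an upper-tailed test, the same idea applies with the counts scanned from the largest value downward.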
The following examples illustrate how to test hypotheses with small samples. The first example involves a binomial experiment, and the second a hypergeometric experiment.
Example 1: Sampling With Replacement
Suppose an urn contains 30 marbles. Some marbles are red, and the rest are green. A researcher hypothesizes that the urn contains 15 or more red marbles. The researcher randomly samples five marbles, with replacement, from the urn. Two of the selected marbles are red, and three are green. Based on the sample results, should the researcher reject the null hypothesis? Use a significance level of 0.20.
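Solution: There are five steps in conducting a hypothesis test, as described above. We work through each of the five steps below:
- State the hypotheses. Fifteen red marbles out of 30 is a proportion of 0.50, so the hypotheses are:
Null hypothesis: P >= 0.50
Alternative hypothesis: P < 0.50
- Determine the sampling distribution of the proportion. Because the marbles are sampled with replacement, each draw is an independent trial with the same chance of selecting a red marble. The number of red marbles in the sample therefore follows a binomial distribution with 5 trials, which we evaluate at the boundary of the null hypothesis, P = 0.50. Given those inputs (a binomial distribution where the true population proportion is equal to 0.50), the sampling distribution of the proportion can be determined. It appears in the table below, which shows individual probabilities for single events and cumulative probabilities for multiple events. (Elsewhere on this website, we showed how to compute the binomial probabilities that form the body of the table.)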
Number of red marbles in sample | Sample proportion | Probability | Cumulative probability |
---|---|---|---|
0 | 0.0 | 0.03125 | 0.03125 |
1 | 0.2 | 0.15625 | 0.1875 |
2 | 0.4 | 0.3125 | 0.5 |
3 | 0.6 | 0.3125 | 0.8125 |
4 | 0.8 | 0.15625 | 0.96875 |
5 | 1.0 | 0.03125 | 1.00 |
- Specify significance level. The significance level was set at 0.20. (This means that the probability of making a Type I error is 0.20, assuming that the null hypothesis is true.)
- Define the region of acceptance. Because the binomial distribution is discrete, no region of rejection has a probability of exactly 0.20. However, we can define a region of acceptance for which the significance level would be no more than 0.20. From the table, we see that if the true population proportion is equal to 0.50, we would be relatively unlikely to pick 0 or 1 red marble in our sample of 5 marbles: the probability of selecting 0 or 1 red marble is 0.1875. Adding 2 red marbles to the region of rejection would raise that probability to 0.50, far above 0.20. Therefore, if we let the significance level equal 0.1875, we can define the region of rejection as any sampled outcome that includes only 0 or 1 red marble (i.e., a sample proportion equal to 0 or 0.20). We can define the region of acceptance as any sampled outcome that includes at least 2 red marbles. This is equivalent to a sample proportion that is greater than or equal to 0.40.
- Test the null hypothesis. Since the sample proportion (0.40) is within the region of acceptance, we cannot reject the null hypothesis.
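As a check on Example 1, the sketch below rebuilds the binomial table from `math.comb`, grows the region of rejection while the cumulative probability stays at or below 0.20, and then applies the decision to the observed 2 red marbles. The constant names are illustrative only.

```python
from math import comb

N_DRAWS = 5        # marbles drawn with replacement
P_NULL = 0.50      # 15 red of 30, the boundary of H0: P >= 0.50
ALPHA = 0.20
OBSERVED_RED = 2

# Binomial sampling distribution under the null hypothesis.
pmf = [comb(N_DRAWS, x) * P_NULL**x * (1 - P_NULL) ** (N_DRAWS - x)
       for x in range(N_DRAWS + 1)]

cumulative, cutoff, attained_alpha = 0.0, None, 0.0
for x, prob in enumerate(pmf):
    cumulative += prob
    print(f"{x} | {x / N_DRAWS:.1f} | {prob:.5f} | {cumulative:.5f}")
    # Keep the region of rejection {0, ..., x} only while P(X <= x) <= alpha.
    if cumulative <= ALPHA:
        cutoff, attained_alpha = x, cumulative

reject = cutoff is not None and OBSERVED_RED <= cutoff
print(f"Reject H0 if red <= {cutoff} (attained alpha {attained_alpha:.4f}); reject: {reject}")
```

The printed rows reproduce the table, and the final line reports a cutoff of 1 red marble with an attained significance level of 0.1875.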
Example 2: Sampling Without Replacement
The Acme Advertising company has 25 clients. Account executives at Acme claim that 80 percent of these clients are very satisfied with the service they receive. To test that claim, Acme's CEO commissions a survey of 10 clients. Survey participants are randomly sampled, without replacement, from the client population. Six of the ten sampled clients (i.e., 60 percent) say that they are very satisfied. Based on the sample results, should the CEO accept or reject the hypothesis that 80 percent of Acme's clients are very satisfied? Use a significance level of 0.10.
- State the hypotheses. Eighty percent of 25 clients is 20 clients, so the hypotheses are:
Null hypothesis: P >= 0.80
Alternative hypothesis: P < 0.80
- Determine the sampling distribution of the proportion. Because clients are sampled without replacement from a small, finite population, the number of very satisfied clients in the sample follows a hypergeometric distribution. Its parameters are a population of 25 clients, a sample of 10 clients, and, at the boundary of the null hypothesis, 20 very satisfied clients in the population. Given those inputs (a hypergeometric distribution where 20 of 25 clients are very satisfied), the sampling distribution of the proportion can be determined. It appears in the table below, which shows individual probabilities for single events and cumulative probabilities for multiple events. (Elsewhere on this website, we showed how to compute the hypergeometric probabilities that form the body of the table.)
Number of satisfied clients in sample | Sample proportion | Probability | Cumulative probability |
---|---|---|---|
4 or less | 0.4 or less | 0.00 | 0.00 |
5 | 0.5 | 0.00474 | 0.00474 |
6 | 0.6 | 0.05929 | 0.06403 |
7 | 0.7 | 0.23715 | 0.30119 |
8 | 0.8 | 0.38538 | 0.68656 |
9 | 0.9 | 0.25692 | 0.94348 |
10 | 1.0 | 0.05652 | 1.00 |
- Specify significance level. The significance level was set at 0.10. (This means that the probability of making a Type I error is 0.10, assuming that the null hypothesis is true.)
- Define the region of acceptance. Because the hypergeometric distribution is discrete, no region of rejection has a probability of exactly 0.10. However, we can define a region of acceptance for which the significance level would be no more than 0.10. From the table, we see that if the true proportion of very satisfied clients is equal to 0.80, we would be unlikely to have fewer than 7 very satisfied clients in our sample. The probability of having 6 or fewer very satisfied clients in the sample would be 0.064. Including 7 would raise it to 0.301, far above 0.10. Therefore, if we let the significance level equal 0.064, we can define the region of rejection as any sampled outcome that includes 6 or fewer very satisfied clients. We can define the region of acceptance as any sampled outcome that includes 7 or more very satisfied clients. This is equivalent to a sample proportion that is greater than or equal to 0.70.
- Test the null hypothesis. Since the sample proportion (0.60) is outside the region of acceptance, we reject the null hypothesis at the 0.064 level of significance.
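The same check can be run for Example 2. This sketch computes the hypergeometric probabilities exactly with `math.comb` and confirms that the largest region of rejection with a probability no greater than 0.10 is "6 or fewer very satisfied clients". The constant names are illustrative only.

```python
from math import comb

POPULATION = 25    # Acme's clients
SATISFIED = 20     # 80% of 25, the boundary of H0: P >= 0.80
SAMPLE = 10        # clients surveyed without replacement
ALPHA = 0.10
OBSERVED = 6       # very satisfied clients in the sample

total = comb(POPULATION, SAMPLE)
# Hypergeometric probabilities. comb(n, k) returns 0 when k > n, so the
# impossible outcomes (4 or fewer satisfied clients) get probability 0.
pmf = [comb(SATISFIED, x) * comb(POPULATION - SATISFIED, SAMPLE - x) / total
       for x in range(SAMPLE + 1)]

cumulative, cutoff, attained_alpha = 0.0, None, 0.0
for x, prob in enumerate(pmf):
    cumulative += prob
    # Keep the region of rejection {0, ..., x} only while P(X <= x) <= alpha.
    if cumulative <= ALPHA:
        cutoff, attained_alpha = x, cumulative

reject = cutoff is not None and OBSERVED <= cutoff
print(f"Reject H0 if satisfied <= {cutoff} (attained alpha {attained_alpha:.3f}); reject: {reject}")
```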
Best Practices for Using Statistics on Small Sample Sizes
Some people think that if you have a small sample size, you can’t use statistics. Put simply, this is wrong, but it’s a common misconception.
There are appropriate statistical methods to deal with small sample sizes.
Although one researcher’s “small” is another’s large, when I refer to small sample sizes I mean studies that typically have between 5 and 30 users total, a size very common in usability studies.
But user research isn’t the only field that deals with small sample sizes. Studies involving fMRIs, which cost a lot to operate, have limited sample sizes as well, as do studies using laboratory animals.
While there are equations that allow us to properly handle small “n” studies, it’s important to know that there are limitations to these smaller sample studies: you are limited to seeing big differences or big “effects.”
To put it another way, statistical analysis with small samples is like making astronomical observations with binoculars. You are limited to seeing big things: planets, stars, moons and the occasional comet. But just because you don’t have access to a high-powered telescope doesn’t mean you cannot conduct astronomy. Galileo, in fact, discovered Jupiter’s moons with a telescope with the same power as many of today’s binoculars.
Likewise, just because you don’t have a large sample size doesn’t mean you cannot use statistics. Again, the key limitation is that you can detect only large differences between designs or measures.
Fortunately, in user-experience research we are often most concerned about these big differences—differences users are likely to notice, such as changes in the navigation structure or the improvement of a search results page.
Here are the procedures we’ve tested for common small-sample user research.
If you need to compare completion rates, task times, and rating scale data for two independent groups, there are two procedures you can use for small and large sample sizes. The right one depends on the type of data you have: continuous or discrete-binary.
Comparing Means: If your data is generally continuous (not binary), such as task time or rating scales, use the two-sample t-test. It’s been shown to be accurate for small sample sizes.
Comparing Two Proportions: If your data is binary (pass/fail, yes/no), then use the N-1 Two Proportion Test. This is a variation on the better-known Chi-Square test (it is algebraically equivalent to the N-1 Chi-Square test). When expected cell counts fall below one, the Fisher Exact Test tends to perform better. The online calculator handles this for you, and we discuss the procedure in Chapter 5 of Quantifying the User Experience.
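Here is a minimal sketch of both comparisons using only Python's standard library. The text doesn't say which two-sample t-test variant it means, so this sketch assumes Welch's unequal-variance version. Because the standard library has no t-distribution, the resulting statistic still has to be compared with a t critical value from a table. For the N-1 test, the sketch assumes the common formulation in which the pooled two-proportion z statistic is multiplied by sqrt((N − 1)/N), where N is the total sample size. The function names and the sample numbers in the usage lines are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df


def n_minus_1_two_proportion(x1, n1, x2, n2):
    """N-1 two-proportion test: pooled z statistic scaled by sqrt((N-1)/N).

    Undefined (division by zero) when every outcome, or no outcome, is a success.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    big_n = n1 + n2
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) * sqrt((big_n - 1) / big_n) / se
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


# Hypothetical data: task times for two designs, and 7/10 vs 3/10 completions.
t, df = welch_t([34, 41, 29, 52, 38], [48, 55, 43, 61, 50])
z, p = n_minus_1_two_proportion(7, 10, 3, 10)  # z is about 1.74, p about 0.08
```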
Confidence Intervals
When you want to know the plausible range for the user population from a sample of data, you’ll want to generate a confidence interval. Although the interval will be fairly wide (usually 20 to 30 percentage points), its upper or lower boundary can be very helpful in establishing how often something will occur in the total user population.
For example, suppose you wanted to know whether users would read a sheet that said “Read this first” when installing a printer, and six out of eight users didn’t read the sheet in an installation study. You’d then know that at least 40% of all users would likely skip the sheet, a substantial proportion.
There are three approaches to computing confidence intervals based on whether your data is binary, task-time or continuous.
Confidence interval around a mean: If your data is generally continuous (not binary), such as rating scales, order amounts in dollars, or the number of page views, the confidence interval is based on the t-distribution (which takes sample size into account).
Confidence interval around task-time: Task-time data is positively skewed. There is a lower boundary of 0 seconds, and it’s not uncommon for some users to take 10 to 20 times longer than other users to complete the same task. To handle this skew, the times are log-transformed, the confidence interval is computed on the log data, and the endpoints are transformed back when reporting. The online calculator handles all this.
Confidence interval around a binary measure: For an accurate confidence interval around binary measures like completion rate or yes/no questions, the Adjusted Wald interval performs well for all sample sizes.
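The three interval types above can be sketched with Python's standard library. Because the standard library has no t quantile function, `t_crit` is passed in and would be read from a t table for n − 1 degrees of freedom. The adjusted Wald function assumes 95% confidence by default, and that setting reproduces the roughly 40% lower bound in the printer example (6 of 8 users). All function names are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist, mean, stdev


def mean_ci(data, t_crit):
    """CI around a mean; t_crit comes from a t table for len(data) - 1 df."""
    m = mean(data)
    half = t_crit * stdev(data) / sqrt(len(data))
    return m - half, m + half


def task_time_ci(times, t_crit):
    """CI for task times: compute the interval on log times, then transform back."""
    lo, hi = mean_ci([log(t) for t in times], t_crit)
    return exp(lo), exp(hi)


def adjusted_wald_ci(successes, n, confidence=0.95):
    """Adjusted Wald: add z^2/2 successes and z^2 trials, then apply the Wald formula."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_adj = n + z * z
    p_adj = (successes + z * z / 2) / n_adj
    half = z * sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - half), min(1.0, p_adj + half)


# Printer example: 6 of 8 users did not read the sheet.
low, high = adjusted_wald_ci(6, 8)
print(f"{low:.2f} to {high:.2f}")
```

Note that the task-time interval is not symmetric around the arithmetic mean. It is centered, on the log scale, on the geometric mean of the times.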
Point Estimates (The Best Averages)
The “best” estimate for reporting an average time or average completion rate for any study may vary depending on the study goals. Keep in mind that even the “best” single estimate will still differ from the actual average, so using confidence intervals provides a better method for estimating the unknown population average.
For the best overall average for small sample sizes, we have two recommendations for task-time and completion rates, and a more general recommendation for all sample sizes for rating scales.
Completion Rate: For small-sample completion rates, there are only a few possible values for each task. For example, with five users attempting a task, the only possible outcomes are 0%, 20%, 40%, 60%, 80% and 100% success. It’s not uncommon to have 100% completion rates with five users. There’s something about reporting perfect success at this sample size that doesn’t resonate well. It sounds too good to be true.
We experimented with several estimators with small sample sizes and found that the Laplace estimator and the simple proportion (referred to as the Maximum Likelihood Estimator) generally work well for the usability test data we examined. When you want the best estimate, the calculator will generate it based on our findings.
Rating Scales: Rating scales are a funny type of metric, in that most of them are bounded on both ends (e.g., 1 to 5, 1 to 7 or 1 to 10), unless you are Spinal Tap, of course. For small and large sample sizes, we’ve found the mean to be a better average to report than the median. There are in fact many ways to report the scores from rating scales, including top-two boxes. The one you report depends on both its sensitivity and what’s used in your organization.
Average Time: One long task time can skew the arithmetic mean and make it a poor measure of the middle. In such situations, the median is a better indicator of the typical or “average” time. Unfortunately, the median tends to be less accurate and more biased than the mean when sample sizes are less than about 25. In these circumstances, the geometric mean (the average of the log values, transformed back) tends to be a better measure of the middle. When sample sizes get above 25, the median works fine.
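The point estimates named above are simple to compute. This sketch assumes the Laplace estimator is the usual "add one success and one failure" rule, (x + 1)/(n + 2). It computes the geometric mean exactly as described: average the log times, then transform back. The function names and the sample task times in the usage lines are hypothetical.

```python
from math import exp, log
from statistics import mean, median


def laplace_estimate(successes, n):
    """Laplace estimator: add one success and one failure."""
    return (successes + 1) / (n + 2)


def mle_estimate(successes, n):
    """Maximum likelihood estimator: the simple sample proportion."""
    return successes / n


def time_averages(times):
    """Arithmetic mean, median, and geometric mean (mean of logs, transformed back)."""
    return {
        "mean": mean(times),
        "median": median(times),
        "geometric_mean": exp(mean(log(t) for t in times)),
    }


# 5 of 5 users succeed: the MLE reports 100%, while Laplace gives 6/7 (about 86%).
print(mle_estimate(5, 5), round(laplace_estimate(5, 5), 3))
# Hypothetical task times (seconds) with one long outlier.
print(time_averages([31, 35, 40, 42, 190]))
```

With the outlier included, the geometric mean lands between the median and the inflated arithmetic mean.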
8.4 Small Sample Tests for a Population Mean
Learning objective.
- To learn how to apply the five-step test procedure for test of hypotheses concerning a population mean when the sample size is small.
In the previous section, hypothesis testing for population means was described in the case of large samples. The statistical validity of the tests was ensured by the Central Limit Theorem, with essentially no assumptions on the distribution of the population. When sample sizes are small, as is often the case in practice, the Central Limit Theorem does not apply. One must then impose stricter assumptions on the population to give statistical validity to the test procedure. One common assumption is that the population from which the sample is taken has a normal probability distribution to begin with. Under such circumstances, if the population standard deviation is known, then the test statistic $(\bar{x} - \mu_0)/(\sigma/\sqrt{n})$ still has the standard normal distribution, as in the previous two sections. If σ is unknown and is approximated by the sample standard deviation s, then the resulting test statistic $(\bar{x} - \mu_0)/(s/\sqrt{n})$ follows Student’s t-distribution with n − 1 degrees of freedom.
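Standardized Test Statistics for Small Sample Hypothesis Tests Concerning a Single Population Mean

$$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \quad (\sigma \text{ known}) \qquad\qquad T = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \quad (\sigma \text{ unknown})$$

- The first test statistic (σ known) has the standard normal distribution.
- The second test statistic (σ unknown) has Student’s t-distribution with n − 1 degrees of freedom.
- The population must be normally distributed.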
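The choice between the two standardized statistics can be sketched as a small helper. It returns Z when σ is supplied and otherwise returns T with n − 1 degrees of freedom. The function name `standardized_statistic` is hypothetical. The usage lines plug in summary statistics from two of the exercises below.

```python
from math import sqrt
from typing import Optional, Tuple


def standardized_statistic(xbar: float, mu0: float, n: int,
                           sigma: Optional[float] = None,
                           s: Optional[float] = None) -> Tuple[str, float, Optional[int]]:
    """Return (name, value, df) for a small-sample test about a normal mean.

    Uses Z = (xbar - mu0) / (sigma / sqrt(n)) when sigma is known; otherwise
    T = (xbar - mu0) / (s / sqrt(n)) with n - 1 degrees of freedom.
    """
    if sigma is not None:
        return "Z", (xbar - mu0) / (sigma / sqrt(n)), None
    if s is None:
        raise ValueError("supply sigma (if known) or the sample standard deviation s")
    return "T", (xbar - mu0) / (s / sqrt(n)), n - 1


# Labor-time exercise (sigma unknown) and packing-machine exercise (sigma known).
print(standardized_statistic(8.8, 15.3, 13, s=3.1))
print(standardized_statistic(63.9, 64.1, 5, sigma=0.12))
```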
The distribution of the second standardized test statistic (the one containing s) and the corresponding rejection region for each form of the alternative hypothesis (left-tailed, right-tailed, or two-tailed) are shown in Figure 8.11 "Distribution of the Standardized Test Statistic and the Rejection Region". This is just like Figure 8.4 "Distribution of the Standardized Test Statistic and the Rejection Region", except that now the critical values are from the t-distribution. Figure 8.4 "Distribution of the Standardized Test Statistic and the Rejection Region" still applies to the first standardized test statistic (the one containing σ), since it follows the standard normal distribution.
Figure 8.11 Distribution of the Standardized Test Statistic and the Rejection Region
The p-value of a test of hypotheses for which the test statistic has Student’s t-distribution can be computed using statistical software, but it is impractical to do so using tables, since that would require 30 tables analogous to Figure 12.2 "Cumulative Normal Probability", one for each degree of freedom from 1 to 30. Figure 12.3 "Critical Values of t" can be used to approximate the p-value of such a test, and this is typically adequate for making a decision using the p-value approach to hypothesis testing, although not always. For this reason, the tests in the two examples in this section will be made following the critical value approach to hypothesis testing summarized at the end of Section 8.1 "The Elements of Hypothesis Testing", but after each one we will show how the p-value approach could have been used.
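For readers without statistical software, a rough stand-in can be built from Python's standard library. This sketch integrates the Student t density numerically with Simpson's rule. That numerical method is an assumption of this sketch; statistical packages typically use the regularized incomplete beta function instead. The function names are hypothetical, and `steps` must be even.

```python
from math import gamma, pi, sqrt


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """P(T >= t), using Simpson's rule for the area between 0 and |t|."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |t|)
    return 0.5 - area if t >= 0 else 0.5 + area


def t_p_value(t: float, df: int, tail: str) -> float:
    """p-value for a left-, right-, or two-tailed t test."""
    if tail == "left":
        return 1 - t_upper_tail(t, df)
    if tail == "right":
        return t_upper_tail(t, df)
    if tail == "two":
        return 2 * t_upper_tail(abs(t), df)
    raise ValueError("tail must be 'left', 'right', or 'two'")


print(round(t_p_value(-2.152, 4, "left"), 4))  # Example 10 statistic, df = 4
```

As a sanity check, `t_upper_tail(2.132, 4)` comes out close to 0.05, which agrees with the table entry for $t_{0.05}$ at df = 4.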
Example 10
The price of a popular tennis racket at a national chain store is $179. Portia bought five of the same racket at an online auction site for the following prices:
Assuming that the auction prices of rackets are normally distributed, determine whether there is sufficient evidence in the sample, at the 5% level of significance, to conclude that the average price of the racket is less than $179 if purchased at an online auction.
Step 1. The assertion for which evidence must be provided is that the average online price μ is less than the average price in retail stores, so the hypothesis test is

$$H_0: \mu = 179 \quad \text{vs.} \quad H_a: \mu < 179 \quad @\ \alpha = 0.05$$

Step 2. The sample is small and the population standard deviation is unknown. Thus the test statistic is

$$T = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

and has the Student t-distribution with n − 1 = 5 − 1 = 4 degrees of freedom.
Step 3. From the data we compute $\bar{x} = 169$ and s = 10.39. Inserting these values into the formula for the test statistic gives

$$T = \frac{169 - 179}{10.39/\sqrt{5}} = -2.152$$

Step 4. Since the symbol in $H_a$ is “<”, this is a left-tailed test, so there is a single critical value, $-t_{\alpha} = -t_{0.05}[df = 4]$. Reading from the row labeled df = 4 in Figure 12.3 "Critical Values of t", its value is −2.132. The rejection region is $(-\infty, -2.132]$.
Step 5. As shown in Figure 8.12 "Rejection Region and Test Statistic for Note 8.42 "Example 10"", the test statistic falls in the rejection region. The decision is to reject $H_0$. In the context of the problem our conclusion is:
The data provide sufficient evidence, at the 5% level of significance, to conclude that the average price of such rackets purchased at online auctions is less than $179.
Figure 8.12 Rejection Region and Test Statistic for Note 8.42 "Example 10"
To perform the test in Note 8.42 "Example 10" using the p-value approach, look in the row in Figure 12.3 "Critical Values of t" with the heading df = 4 and search for the two t-values that bracket the unsigned value 2.152 of the test statistic. They are 2.132 and 2.776, in the columns with headings $t_{0.050}$ and $t_{0.025}$. They cut off right tails of area 0.050 and 0.025, so because 2.152 is between them it must cut off a tail of area between 0.050 and 0.025. By symmetry, −2.152 cuts off a left tail of area between 0.050 and 0.025, hence the p-value corresponding to t = −2.152 is between 0.025 and 0.05. Although its precise value is unknown, it must be less than α = 0.05, so the decision is to reject $H_0$.
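The arithmetic of Example 10 can be checked in a few lines. The critical value is copied from the t table rather than computed, since Python's standard library has no t quantile function. The constant names are illustrative.

```python
from math import sqrt

XBAR, S, N, MU0 = 169, 10.39, 5, 179  # summary statistics from Example 10
CRITICAL = -2.132  # -t_{0.05}[df = 4], read from Figure 12.3

t_stat = (XBAR - MU0) / (S / sqrt(N))
reject = t_stat <= CRITICAL  # left-tailed rejection region (-inf, -2.132]
print(f"T = {t_stat:.3f}; reject H0: {reject}")
```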
Example 11
A small component in an electronic device has two small holes where another tiny part is fitted. In the manufacturing process the average distance between the two holes must be tightly controlled at 0.02 mm, else many units would be defective and wasted. Many times throughout the day, quality control engineers take a small sample of the components from the production line, measure the distance between the two holes, and make adjustments if needed. Suppose at one time four units are taken and the distances are measured as
Determine, at the 1% level of significance, if there is sufficient evidence in the sample to conclude that an adjustment is needed. Assume the distances of interest are normally distributed.
Step 1. The assumption is that the process is under control unless there is strong evidence to the contrary. Since a deviation of the average distance to either side is undesirable, the relevant test is

$$H_0: \mu = 0.02 \quad \text{vs.} \quad H_a: \mu \neq 0.02 \quad @\ \alpha = 0.01$$

where μ denotes the mean distance between the holes.
Step 2. The sample is small and the population standard deviation is unknown. Thus the test statistic is

$$T = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

and has the Student t-distribution with n − 1 = 4 − 1 = 3 degrees of freedom.
Step 3. From the data we compute $\bar{x} = 0.02075$ and s = 0.00171. Inserting these values into the formula for the test statistic gives

$$T = \frac{0.02075 - 0.02}{0.00171/\sqrt{4}} = 0.877$$

Step 4. Since the symbol in $H_a$ is “≠”, this is a two-tailed test, so there are two critical values, $\pm t_{\alpha/2} = \pm t_{0.005}[df = 3]$. Reading from the row in Figure 12.3 "Critical Values of t" labeled df = 3, their values are ±5.841. The rejection region is $(-\infty, -5.841] \cup [5.841, \infty)$.
Step 5. As shown in Figure 8.13 "Rejection Region and Test Statistic for Note 8.43 "Example 11"", the test statistic does not fall in the rejection region. The decision is not to reject $H_0$. In the context of the problem our conclusion is:
The data do not provide sufficient evidence, at the 1% level of significance, to conclude that the mean distance between the holes in the component differs from 0.02 mm.
Figure 8.13 Rejection Region and Test Statistic for Note 8.43 "Example 11"
To perform the test in Note 8.43 "Example 11" using the p-value approach, look in the row in Figure 12.3 "Critical Values of t" with the heading df = 3 and search for the two t-values that bracket the value 0.877 of the test statistic. Actually, 0.877 is smaller than the smallest number in the row, which is 0.978, in the column with heading $t_{0.200}$. The value 0.978 cuts off a right tail of area 0.200, so because 0.877 is to its left it must cut off a tail of area greater than 0.200. Thus the p-value, which is twice the area cut off (since the test is two-tailed), is greater than 0.400. Although its precise value is unknown, it must be greater than α = 0.01, so the decision is not to reject $H_0$.
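The same check for Example 11 differs only in using a two-tailed rejection rule. The critical value is again copied from the t table, and the constant names are illustrative.

```python
from math import sqrt

XBAR, S, N, MU0 = 0.02075, 0.00171, 4, 0.02  # summary statistics from Example 11
CRITICAL = 5.841  # t_{0.005}[df = 3], read from Figure 12.3

t_stat = (XBAR - MU0) / (S / sqrt(N))
reject = abs(t_stat) >= CRITICAL  # two-tailed: reject in either tail
print(f"T = {t_stat:.3f}; reject H0: {reject}")
```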
Key Takeaways
- There are two formulas for the test statistic in testing hypotheses about a population mean with small samples. One test statistic follows the standard normal distribution, the other Student’s t-distribution.
- The population standard deviation is used if it is known, otherwise the sample standard deviation is used.
- Either five-step procedure, critical value or p-value approach, is used with either test statistic.
Find the rejection region (for the standardized test statistic) for each hypothesis test based on the information given. The population is normally distributed.
- $H_0: \mu = 27$ vs. $H_a: \mu < 27$ @ $\alpha = 0.05$, n = 12, σ = 2.2.
- $H_0: \mu = 52$ vs. $H_a: \mu \neq 52$ @ $\alpha = 0.05$, n = 6, σ unknown.
- $H_0: \mu = -105$ vs. $H_a: \mu > -105$ @ $\alpha = 0.10$, n = 24, σ unknown.
- $H_0: \mu = 78.8$ vs. $H_a: \mu \neq 78.8$ @ $\alpha = 0.10$, n = 8, σ = 1.7.
- $H_0: \mu = 17$ vs. $H_a: \mu < 17$ @ $\alpha = 0.01$, n = 26, σ = 0.94.
- $H_0: \mu = 880$ vs. $H_a: \mu \neq 880$ @ $\alpha = 0.01$, n = 4, σ unknown.
- $H_0: \mu = -12$ vs. $H_a: \mu > -12$ @ $\alpha = 0.05$, n = 18, σ = 1.1.
- $H_0: \mu = 21.1$ vs. $H_a: \mu \neq 21.1$ @ $\alpha = 0.05$, n = 23, σ unknown.
Find the rejection region (for the standardized test statistic) for each hypothesis test based on the information given. The population is normally distributed. Identify the test as left-tailed, right-tailed, or two-tailed.
- $H_0: \mu = 141$ vs. $H_a: \mu < 141$ @ $\alpha = 0.20$, n = 29, σ unknown.
- $H_0: \mu = -54$ vs. $H_a: \mu < -54$ @ $\alpha = 0.05$, n = 15, σ = 1.9.
- $H_0: \mu = 98.6$ vs. $H_a: \mu \neq 98.6$ @ $\alpha = 0.05$, n = 12, σ unknown.
- $H_0: \mu = 3.8$ vs. $H_a: \mu > 3.8$ @ $\alpha = 0.001$, n = 27, σ unknown.
- $H_0: \mu = -62$ vs. $H_a: \mu \neq -62$ @ $\alpha = 0.005$, n = 8, σ unknown.
- $H_0: \mu = 73$ vs. $H_a: \mu > 73$ @ $\alpha = 0.001$, n = 22, σ unknown.
- $H_0: \mu = 1124$ vs. $H_a: \mu < 1124$ @ $\alpha = 0.001$, n = 21, σ unknown.
- $H_0: \mu = 0.12$ vs. $H_a: \mu \neq 0.12$ @ $\alpha = 0.001$, n = 14, σ = 0.026.
A random sample of size 20 drawn from a normal population yielded the following results: $\bar{x} = 49.2$, s = 1.33.
- Test $H_0: \mu = 50$ vs. $H_a: \mu \neq 50$ @ $\alpha = 0.01$.
- Estimate the observed significance of the test in part (a) and state a decision based on the p-value approach to hypothesis testing.
A random sample of size 16 drawn from a normal population yielded the following results: $\bar{x} = -0.96$, s = 1.07.
- Test $H_0: \mu = 0$ vs. $H_a: \mu < 0$ @ $\alpha = 0.001$.
A random sample of size 8 drawn from a normal population yielded the following results: $\bar{x} = 289$, s = 46.
- Test $H_0: \mu = 250$ vs. $H_a: \mu > 250$ @ $\alpha = 0.05$.
A random sample of size 12 drawn from a normal population yielded the following results: $\bar{x} = 86.2$, s = 0.63.
- Test $H_0: \mu = 85.5$ vs. $H_a: \mu \neq 85.5$ @ $\alpha = 0.01$.
Applications
Researchers wish to test the efficacy of a program intended to reduce the length of labor in childbirth. The accepted mean labor time in the birth of a first child is 15.3 hours. The mean length of the labors of 13 first-time mothers in a pilot program was 8.8 hours with standard deviation 3.1 hours. Assuming a normal distribution of times of labor, test at the 10% level of significance whether the mean labor time for all women following this program is less than 15.3 hours.
A dairy farm uses the somatic cell count (SCC) report on the milk it provides to a processor as one way to monitor the health of its herd. The mean SCC from five samples of raw milk was 250,000 cells per milliliter with standard deviation 37,500 cells/ml. Test whether these data provide sufficient evidence, at the 10% level of significance, to conclude that the mean SCC of all milk produced at the dairy exceeds that in the previous report, 210,250 cells/ml. Assume a normal distribution of SCC.
Six coins of the same type are discovered at an archaeological site. If their weights on average are significantly different from 5.25 grams then it can be assumed that their provenance is not the site itself. The coins are weighed and have mean 4.73 g with sample standard deviation 0.18 g. Perform the relevant test at the 0.1% (1/10th of 1%) level of significance, assuming a normal distribution of weights of all such coins.
An economist wishes to determine whether people are driving less than in the past. In one region of the country the number of miles driven per household per year in the past was 18.59 thousand miles. A sample of 15 households produced a sample mean of 16.23 thousand miles for the last year, with sample standard deviation 4.06 thousand miles. Assuming a normal distribution of household driving distances per year, perform the relevant test at the 5% level of significance.
The recommended daily allowance of iron for females aged 19–50 is 18 mg/day. A careful measurement of the daily iron intake of 15 women yielded a mean daily intake of 16.2 mg with sample standard deviation 4.7 mg.
- Assuming that daily iron intake in women is normally distributed, perform the test that the actual mean daily intake for all women is different from 18 mg/day, at the 10% level of significance.
- The sample mean is less than 18, suggesting that the actual population mean is less than 18 mg/day. Perform this test, also at the 10% level of significance. (The computation of the test statistic done in part (a) still applies here.)
The target temperature for a hot beverage the moment it is dispensed from a vending machine is 170°F. A sample of ten randomly selected servings from a new machine undergoing a pre-shipment inspection gave mean temperature 173°F with sample standard deviation 6.3°F.
- Assuming that temperature is normally distributed, perform the test that the mean temperature of dispensed beverages is different from 170°F, at the 10% level of significance.
- The sample mean is greater than 170, suggesting that the actual population mean is greater than 170°F. Perform this test, also at the 10% level of significance. (The computation of the test statistic done in part (a) still applies here.)
The average number of days to complete recovery from a particular type of knee operation is 123.7 days. From his experience a physician suspects that use of a topical pain medication might be lengthening the recovery time. He randomly selects the records of seven knee surgery patients who used the topical medication. The times to total recovery were:
- Assuming a normal distribution of recovery times, perform the relevant test of hypotheses at the 10% level of significance.
- Would the decision be the same at the 5% level of significance? Answer either by constructing a new rejection region (critical value approach) or by estimating the p-value of the test in part (a) and comparing it to α.
A 24-hour advance prediction of a day’s high temperature is “unbiased” if the long-term average of the error in prediction (true high temperature minus predicted high temperature) is zero. The errors in predictions made by one meteorological station for 20 randomly selected days were:
- Assuming a normal distribution of errors, test the null hypothesis that the predictions are unbiased (the mean of the population of all errors is 0) versus the alternative that it is biased (the population mean is not 0), at the 1% level of significance.
- Would the decision be the same at the 5% level of significance? The 10% level of significance? Answer either by constructing new rejection regions (critical value approach) or by estimating the p-value of the test in part (a) and comparing it to α.
Pasteurized milk may not have a standardized plate count (SPC) above 20,000 colony-forming bacteria per milliliter (cfu/ml). The mean SPC for five samples was 21,500 cfu/ml with sample standard deviation 750 cfu/ml. Test the null hypothesis that the mean SPC for this milk is 20,000 versus the alternative that it is greater than 20,000, at the 10% level of significance. Assume that the SPC follows a normal distribution.
One water quality standard for water that is discharged into a particular type of stream or pond is that the average daily water temperature be at most 18°C. Six samples taken throughout the day gave the data:
The sample mean $\bar{x} = 18.15$ exceeds 18, but perhaps this is only sampling error. Determine whether the data provide sufficient evidence, at the 10% level of significance, to conclude that the mean temperature for the entire day exceeds 18°C.
Additional Exercises
A calculator has a built-in algorithm for generating a random number according to the standard normal distribution. Twenty-five numbers thus generated have mean 0.15 and sample standard deviation 0.94. Test the null hypothesis that the mean of all numbers so generated is 0 versus the alternative that it is different from 0, at the 20% level of significance. Assume that the numbers do follow a normal distribution.
At every setting a high-speed packing machine delivers a product in amounts that vary from container to container with a normal distribution of standard deviation 0.12 ounce. To compare the amount delivered at the current setting to the desired amount 64.1 ounce, a quality inspector randomly selects five containers and measures the contents of each, obtaining sample mean 63.9 ounces and sample standard deviation 0.10 ounce. Test whether the data provide sufficient evidence, at the 5% level of significance, to conclude that the mean of all containers at the current setting is less than 64.1 ounces.
A manufacturing company receives a shipment of 1,000 bolts of nominal shear strength 4,350 lb. A quality control inspector selects five bolts at random and measures the shear strength of each. The data are:
- Assuming a normal distribution of shear strengths, test the null hypothesis that the mean shear strength of all bolts in the shipment is 4,350 lb versus the alternative that it is less than 4,350 lb, at the 10% level of significance.
- Estimate the p-value (observed significance) of the test of part (a).
- Compare the p-value found in part (b) to α = 0.10 and make a decision based on the p-value approach. Explain fully.
A literary historian examines a newly discovered document possibly written by Oberon Theseus. The mean average sentence length of the surviving undisputed works of Oberon Theseus is 48.72 words. The historian counts words in sentences between five successive 101 periods in the document in question to obtain a mean average sentence length of 39.46 words with standard deviation 7.45 words. (Thus the sample size is five.)
- Determine if these data provide sufficient evidence, at the 1% level of significance, to conclude that the mean average sentence length in the document is less than 48.72.
- Estimate the p-value of the test.
- Based on the answers to parts (a) and (b), state whether or not it is likely that the document was written by Oberon Theseus.
- $Z \le -1.645$
- $T \le -2.571$ or $T \ge 2.571$
- $Z \le -1.645$ or $Z \ge 1.645$
- $T \le -0.855$
- $T \le -2.201$ or $T \ge 2.201$
- $T = -2.690$, $df = 19$, $-t_{0.005} = -2.861$, do not reject $H_0$.
- $0.01 < \text{p-value} < 0.02$, $\alpha = 0.01$, do not reject $H_0$.
- $T = 2.398$, $df = 7$, $t_{0.05} = 1.895$, reject $H_0$.
- $0.01 < \text{p-value} < 0.025$, $\alpha = 0.05$, reject $H_0$.
$T = -7.560$, $df = 12$, $-t_{0.10} = -1.356$, reject $H_0$.
$T = -7.076$, $df = 5$, $-t_{0.0005} = -6.869$, reject $H_0$.
- $T = -1.483$, $df = 14$, $-t_{0.05} = -1.761$, do not reject $H_0$.
- $T = -1.483$, $df = 14$, $-t_{0.10} = -1.345$, reject $H_0$.
- $T = 2.069$, $df = 6$, $t_{0.10} = 1.440$, reject $H_0$.
- $T = 2.069$, $df = 6$, $t_{0.05} = 1.943$, reject $H_0$.
$T = 4.472$, $df = 4$, $t_{0.10} = 1.533$, reject $H_0$.
$T = 0.798$, $df = 24$, $t_{0.10} = 1.318$, do not reject $H_0$.
- $T = -1.773$, $df = 4$, $-t_{0.05} = -2.132$, do not reject $H_0$.
- $0.05 < \text{p-value} < 0.10$
- $\alpha = 0.05$, do not reject $H_0$
Hypothesis Testing for Means & Proportions
Hypothesis Testing: Upper-, Lower-, and Two-Tailed Tests
The procedure for hypothesis testing is based on the ideas described above. Specifically, we set up competing hypotheses, select a random sample from the population of interest, and compute summary statistics. We then determine whether the sample data support the null or the alternative hypothesis. The procedure can be broken down into the following five steps.
- Step 1. Set up hypotheses and select the level of significance α.
$H_0$: Null hypothesis (no change, no difference);
$H_1$: Research hypothesis (investigator's belief); $\alpha = 0.05$
Upper-tailed, Lower-tailed, Two-tailed Tests
The research or alternative hypothesis can take one of three forms. An investigator might believe that the parameter has increased, decreased, or changed. For example, an investigator might hypothesize:
- $H_1: \mu > \mu_0$, where $\mu_0$ is the comparator or null value (e.g., $\mu_0 = 191$ in our example about weight in men in 2006) and an increase is hypothesized; this type of test is called an upper-tailed test;
- $H_1: \mu < \mu_0$, where a decrease is hypothesized; this is called a lower-tailed test; or
- $H_1: \mu \ne \mu_0$, where a difference is hypothesized; this is called a two-tailed test.
The exact form of the research hypothesis depends on the investigator's belief about the parameter of interest and whether it has possibly increased, decreased, or is different from the null value. The research hypothesis is set up by the investigator before any data are collected.
- Step 2. Select the appropriate test statistic.
The test statistic is a single number that summarizes the sample information. An example of a test statistic is the Z statistic computed as follows:
$Z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$
When the sample size is small, we will use t statistics (just as we did when constructing confidence intervals for small samples). As we present each scenario, alternative test statistics are provided along with conditions for their appropriate use.
- Step 3. Set up decision rule.
The decision rule is a statement that tells under what circumstances to reject the null hypothesis. The decision rule is based on specific values of the test statistic (e.g., reject $H_0$ if $Z > 1.645$). The decision rule for a specific test depends on three factors: the research or alternative hypothesis, the test statistic, and the level of significance. Each is discussed below.
- The decision rule depends on whether an upper-tailed, lower-tailed, or two-tailed test is proposed. In an upper-tailed test, the decision rule has investigators reject $H_0$ if the test statistic is larger than the critical value. In a lower-tailed test, the decision rule has investigators reject $H_0$ if the test statistic is smaller than the critical value. In a two-tailed test, the decision rule has investigators reject $H_0$ if the test statistic is extreme, either larger than an upper critical value or smaller than a lower critical value.
- The exact form of the test statistic is also important in determining the decision rule. If the test statistic follows the standard normal distribution (Z), then the decision rule will be based on the standard normal distribution. If the test statistic follows the t distribution, then the decision rule will be based on the t distribution. The appropriate critical value will be selected from the t distribution, again depending on the specific alternative hypothesis and the level of significance.
- The third factor is the level of significance. The level of significance, which is selected in Step 1 (e.g., $\alpha = 0.05$), dictates the critical value. For example, in an upper-tailed Z test, if $\alpha = 0.05$, then the critical value is $Z = 1.645$.
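The following figures illustrate the rejection regions defined by the decision rule for upper-, lower-, and two-tailed Z tests with $\alpha = 0.05$. Notice that the rejection regions are in the upper tail, the lower tail, and both tails of the curves, respectively. The decision rules written below the figures are:
- upper-tailed test: reject $H_0$ if $Z \ge 1.645$;
- lower-tailed test: reject $H_0$ if $Z \le -1.645$;
- two-tailed test: reject $H_0$ if $Z \le -1.960$ or $Z \ge 1.960$.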
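The three decision rules can be written as one small function. This is a minimal sketch that computes the critical values at a chosen α with Python's standard-library statistics.NormalDist. The function name z_decision and the tail labels "upper", "lower", and "two" are hypothetical.

```python
from statistics import NormalDist


def z_decision(z: float, alpha: float, tail: str) -> bool:
    """Return True if H0 should be rejected for a Z test at significance level alpha.

    tail is "upper" (H1: mu > mu0), "lower" (H1: mu < mu0) or "two" (H1: mu != mu0).
    """
    std_normal = NormalDist()
    if tail == "upper":
        return z >= std_normal.inv_cdf(1 - alpha)           # e.g. Z >= 1.645
    if tail == "lower":
        return z <= std_normal.inv_cdf(alpha)               # e.g. Z <= -1.645
    if tail == "two":
        return abs(z) >= std_normal.inv_cdf(1 - alpha / 2)  # e.g. |Z| >= 1.960
    raise ValueError(f"unknown tail: {tail!r}")


# The same observed statistic can lead to different decisions depending on H1.
for tail in ("upper", "lower", "two"):
    print(tail, z_decision(1.80, 0.05, tail))
```

With Z = 1.80 and α = 0.05, only the upper-tailed rule rejects $H_0$. The two-tailed rule splits α between both tails, which pushes its cutoff out to 1.960.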
The first step is to state the null hypothesis and an alternative hypothesis. Null hypothesis: $P \ge 0.80$. Alternative hypothesis: $P < 0.80$. Note that these hypotheses constitute a one-tailed test. The null hypothesis will be rejected only if the sample proportion is too small.
Determine sampling distribution.
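For the binomial case, the sampling-distribution and acceptance-region steps can be sketched with exact binomial probabilities. The numbers below come from Example 1 at the start of this lesson: five draws with replacement, two red marbles, $H_0: P \ge 0.50$, and α = 0.20. They are used because the sample data for the P ≥ 0.80 example are not shown here. Sampling without replacement would call for the hypergeometric distribution instead. The function names are hypothetical.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def lower_tail_binomial_test(successes: int, n: int, p0: float, alpha: float):
    """Exact test of H0: P >= p0 vs H1: P < p0 (binomial sampling, i.e. with replacement).

    Returns the p-value, the largest count c in the rejection region X <= c
    (or -1 if the region is empty), and whether H0 is rejected.
    """
    p_value = binom_cdf(successes, n, p0)
    # Grow the rejection region while its total probability under H0 stays <= alpha.
    c = -1
    while c + 1 <= n and binom_cdf(c + 1, n, p0) <= alpha:
        c += 1
    return p_value, c, successes <= c


# Example 1: 5 draws with replacement, 2 red, H0: P >= 0.50, alpha = 0.20.
p_value, c, reject = lower_tail_binomial_test(2, 5, 0.50, 0.20)
print(f"p-value = {p_value:.4f}, reject if red <= {c}, reject H0: {reject}")
```

Here P(X ≤ 1) = 0.1875 fits within α = 0.20 but P(X ≤ 2) = 0.5 does not, so two red marbles fall in the region of acceptance. The p-value view and the acceptance-region view reach the same decision.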
where $\mu$ denotes the mean distance between the holes.
Step 2. The sample is small and the population standard deviation is unknown. Thus the test statistic is $T = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$ and has the Student t-distribution with $n - 1 = 4 - 1 = 3$ degrees of freedom.
Step 3. From the data we compute $\bar{x} = 0.02075$ and $s = 0.00171$.
The right one depends on the type of data you have: continuous or discrete-binary. Comparing Means: If your data is generally continuous (not binary), such as task time or rating scales, use the two sample t-test. It's been shown to be accurate for small sample sizes. Comparing Two Proportions: If your data is binary (pass/fail, yes/no), then ...
Since the hypothesis test is one-sided, the estimated p-value is equal to this tail area: 0.1222.
Exercise 6.5.1. Because the estimated p-value is 0.1222, which is larger than the significance level 0.05, we do not reject the null hypothesis. Explain what this means in plain language in the context of the problem.
sample may be regarded as coming from a population with coefficient of correlation
Exercise 6. A factory makes a machine part with an axle diameter of 0.7 inch. A random sample of 10 parts shows a mean diameter of 0.742 inch with a standard deviation of 0.04 inch. On the basis of this sample, would you say that th.
Step 1. The assertion for which evidence must be provided is that the average online price μ is less than the average price in retail stores, so the hypothesis test is $H_0: \mu = 179$ vs. $H_a: \mu < 179$ @ $\alpha = 0.05$.
Step 2. The sample is small and the population standard deviation is unknown. Thus the test statistic is $T = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$.
For small to moderate sample sizes, the sampling distribution of $M^2$ is better approximated by a chi-squared distribution than are the sampling distributions of $X^2$ and $G^2$, the Pearson and LRT statistics, respectively; this tends to hold in general for distributions with smaller degrees of freedom.
Select the appropriate test statistic. Because the sample size is small ($n < 30$), the appropriate test statistic is $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$.
Step 3. Set up decision rule. This is a lower-tailed test, using a t statistic and a 5% level of significance. To determine the critical value of t, we need the degrees of freedom, df, defined as $df = n - 1$. In this example, $df = 15 - 1 = 14$.
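The critical value for df = 14 is normally read from a t table. As a sketch of where those table values come from, the code below integrates the Student t density numerically with Simpson's rule and inverts the upper-tail area by bisection, using only the standard library. This is an illustration, not a production routine; SciPy's `scipy.stats.t.ppf` does the same job directly. The function names are hypothetical.

```python
from math import gamma, pi, sqrt


def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """P(T >= t) for t >= 0, using Simpson's rule on [0, t] and symmetry about 0."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))  # normalizing constant

    def pdf(x: float) -> float:
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps  # steps must be even for Simpson's rule
    area = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - area * h / 3


def t_critical(alpha: float, df: int) -> float:
    """Return t_alpha, the point with upper-tail area alpha, found by bisection on [0, 100]."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha:
            lo = mid  # tail still too large: the critical value lies further out
        else:
            hi = mid
    return (lo + hi) / 2


# Lower-tailed test at alpha = 0.05 with df = 14: reject H0 if t <= -t_0.05.
print(f"-t_0.05 with df = 14: {-t_critical(0.05, 14):.3f}")
```

Rounded to three decimals, this reproduces the table value 1.761. It also matches other t-table entries, such as $t_{0.05} = 1.895$ for 7 degrees of freedom.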
The samples must be independent, the populations must be normal, and the population standard deviations must be equal. "Small" samples means that either $n_1 < 30$ or $n_2 < 30$. The quantity $s_p^2$ is called the pooled sample variance. It is a weighted average of the two estimates $s_1^2$ and $s_2^2$ of the common variance $\sigma_1^2 = \sigma_2^2$ of the two ...
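The pooled sample variance weights each sample variance by its degrees of freedom: $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$. A minimal sketch of the resulting two-sample t statistic for $H_0: \mu_1 - \mu_2 = 0$ follows. It assumes equal population variances, as stated above. The function name and the summary numbers are hypothetical, for illustration only.

```python
from math import sqrt


def pooled_t(x1: float, s1: float, n1: int, x2: float, s2: float, n2: int):
    """Two-sample t statistic for H0: mu1 - mu2 = 0, assuming equal population variances.

    Returns (T, df, sp2), where sp2 is the pooled sample variance.
    """
    df = n1 + n2 - 2
    # Weighted average of the two sample variances, with weights n1 - 1 and n2 - 1.
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    T = (x1 - x2) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return T, df, sp2


# Hypothetical summary statistics for two groups of 8 (not from the source).
T, df, sp2 = pooled_t(10.0, 2.0, 8, 8.0, 3.0, 8)
print(f"pooled variance = {sp2:.3f}, T = {T:.3f}, df = {df}")
```

With equal group sizes, the pooled variance is just the average of the two sample variances: (4 + 9) / 2 = 6.5.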
We will assume the sample data are as follows: $n = 100$, $\bar{x} = 197.1$, and $s = 25.6$.
Step 1. Set up hypotheses and determine level of significance. $H_0: \mu = 191$, $H_1: \mu > 191$, $\alpha = 0.05$. The research hypothesis is that weights have increased, and therefore an upper-tailed test is used.
Step 2.
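The weight example can be carried through the remaining steps with the large-sample Z statistic, using s in place of σ because n = 100. This is a minimal sketch using Python's standard statistics.NormalDist. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Weight example: n = 100, sample mean 197.1, s = 25.6, H0: mu = 191 vs H1: mu > 191.
n, xbar, s, mu0, alpha = 100, 197.1, 25.6, 191, 0.05

z = (xbar - mu0) / (s / sqrt(n))          # large sample, so s stands in for sigma
z_crit = NormalDist().inv_cdf(1 - alpha)  # upper-tailed critical value, about 1.645
p_value = 1 - NormalDist().cdf(z)         # upper-tail area beyond the observed Z

print(f"Z = {z:.3f}, critical value = {z_crit:.3f}, p = {p_value:.4f}, reject H0: {z >= z_crit}")
```

Z works out to 6.1 / 2.56 ≈ 2.383, which exceeds 1.645, so this sketch rejects $H_0$ at the 5% level.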
the Northeast is independent of another. The sample size is small (fewer than 10 expected successes), so we should use a simulation method: $48 \times 0.20 = 9.6$ and $48 \times 0.80 = 38.4$.
Statistics 101 (Prof. Rundel), L17: Small sample proportions, November 1, 2011
Small sample inference for a proportion: Hypothesis test (cont.)
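A simulation p-value can be sketched by drawing many samples of size 48 under the null proportion 0.20 and counting how often the simulated count is at least as extreme as the observed one. The observed count and the lower-tailed direction are not given in this excerpt. The value 5 and the "fewer successes" alternative below are hypothetical, chosen for illustration only, as is the function name.

```python
import random


def simulated_p_value(observed: int, n: int, p0: float, reps: int = 20_000, seed: int = 1) -> float:
    """Simulation p-value for H0: p = p0 vs a lower-tailed H1: p < p0.

    Each replicate draws n independent Bernoulli(p0) trials and counts successes;
    the p-value is the share of replicates with a count at or below the observed one.
    """
    rng = random.Random(seed)
    extreme = 0
    for _ in range(reps):
        successes = sum(rng.random() < p0 for _ in range(n))
        if successes <= observed:
            extreme += 1
    return extreme / reps


n, p0 = 48, 0.20
print(f"expected successes = {n * p0:.1f}, expected failures = {n * (1 - p0):.1f}")
# Hypothetical observed count of 5 successes, for illustration only.
print(simulated_p_value(5, n, p0))
```

With a fixed seed the result is reproducible. Increasing `reps` shrinks the simulation error around the exact binomial tail probability.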
Hypothesis Testing for a Proportion and for a Mean with Unknown Population Standard Deviation. Small Sample Hypothesis Tests for a Normal Population. When we have a small sample from a normal population, we use the same method as for a large sample, except that we use the t statistic instead of the z statistic. Hence, we need to find the degrees of freedom ($n - 1$) and use the t-table in the back of the ...
In hypothesis testing studies, this is mathematically calculated, conventionally, as the sample size necessary to be 80% certain of identifying a statistically significant outcome should the hypothesis be true for the population, with P for statistical significance set at 0.05. Some investigators power their studies for 90% instead of 80%, and ...
I read about the z-score intervals: for small samples (n < 50), if the absolute z-scores for either skewness or kurtosis are larger than 1.96, which corresponds to an alpha level of 0.05, then reject the null hypothesis and conclude that the distribution of the sample is non-normal.
Answer. Setting α, the probability of committing a Type I error, to 0.05 implies that we should reject the null hypothesis when the test statistic $Z \ge 1.645$, or equivalently, when the observed sample mean is 103.29 or greater, because $\bar{x} = \mu + z\left(\frac{\sigma}{\sqrt{n}}\right) = 100 + 1.645\left(\frac{16}{\sqrt{64}}\right) = 103.29$. Therefore, the power function $K(\mu)$, when μ ...
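The power function here is the probability that the sample mean lands in the rejection region when the true mean is μ. This is a minimal sketch assuming the setting implied above: an upper-tailed test of $H_0: \mu = 100$ with σ = 16, n = 64, and α = 0.05. The alternative $H_1: \mu > 100$ is inferred from the "Z ≥ 1.645" rule. The function name `power` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Assumed setting: H0: mu = 100 vs H1: mu > 100, sigma = 16, n = 64, alpha = 0.05.
mu0, sigma, n, alpha = 100, 16, 64, 0.05
se = sigma / sqrt(n)                                    # standard error of the mean = 2
xbar_crit = mu0 + NormalDist().inv_cdf(1 - alpha) * se  # about 103.29


def power(mu: float) -> float:
    """K(mu) = P(sample mean >= xbar_crit) when the true mean is mu."""
    return 1 - NormalDist(mu, se).cdf(xbar_crit)


for mu in (100, 102, 104, 106):
    print(f"K({mu}) = {power(mu):.4f}")
```

At μ = 100 the power equals α = 0.05, as it should. It rises toward 1 as the true mean moves further above 103.29.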
The table shown on the right can be used in a two-sample t-test to estimate the sample sizes of an experimental group and a control group that are of equal size; that is, the total number of individuals in the trial is twice the number given, and the desired significance level is 0.05. [4] The parameters used are: the desired statistical power of the trial, shown in the column to the left.
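A common way to approximate such per-group sample sizes is the normal-approximation formula $n = 2\,(z_{1-\alpha/2} + z_{1-\beta})^2 / d^2$, where $d$ is the standardized effect size (difference in means divided by the common standard deviation) and $1 - \beta$ is the desired power. This sketch is not necessarily how the referenced table was computed; t-based calculations can give slightly larger numbers for small samples. The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group n for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})**2 / d**2, rounded up.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.960 for alpha = 0.05
    z_beta = z.inv_cdf(power)           # about 0.842 for 80% power
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size**2)


for d in (0.2, 0.5, 0.8):
    print(d, n_per_group(d))
```

Because n scales with $1/d^2$, halving the effect size roughly quadruples the required sample size.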
Sample size determination has a number of common pitfalls, which can lead to inappropriately small or large sample sizes, with issues ranging from poor design decisions, misspecifying nuisance ...
More specifically, regarding your question: I am surprised that the Shapiro-Wilk test is actually able to find deviations given your small sample size of 3 samples per group. Are you sure you used it correctly?
One reason I can think of is that a small sample size is not enough to check the assumptions of the hypothesis test being used (e.g., distributional assumptions for parametric tests). At the same time, an extremely large sample size will always find such assumptions to be violated (e.g., it is impractical to think any natural distributions are exactly normal ...
A permutation test based on the ratio of variances could give a p-value of 0.05 for a one-sided test. @gung I think the big problem with the Mann-Whitney test on this tiny a sample size (3, 3) is that, no matter how vastly separated the samples are, and even without any ties, you simply can't get a p-value below 0.1 (two-tailed).
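The 0.1 floor follows from counting rank arrangements. Under $H_0$, each of the $\binom{6}{3} = 20$ ways to assign three of the six ranks to the first group is equally likely. The most extreme arrangement therefore has probability 1/20 = 0.05 one-sided, or 0.1 two-sided. This is a minimal sketch that enumerates the exact rank-sum distribution, assuming no ties. The function name is hypothetical.

```python
from itertools import combinations


def min_two_sided_p(n1: int, n2: int) -> float:
    """Smallest attainable exact two-sided p-value for the rank-sum (Mann-Whitney) test, no ties.

    Under H0 every choice of which n1 of the n1 + n2 ranks belong to group 1 is equally likely.
    """
    ranks = range(1, n1 + n2 + 1)
    sums = [sum(c) for c in combinations(ranks, n1)]
    smallest = min(sums)  # group 1 holds the n1 smallest ranks: the most extreme case
    one_sided = sum(1 for s in sums if s <= smallest) / len(sums)
    return min(1.0, 2 * one_sided)


print(min_two_sided_p(3, 3))  # 2 / C(6, 3) = 2 / 20
print(min_two_sided_p(4, 4))  # 2 / C(8, 4) = 2 / 70
```

With four observations per group, the floor drops to about 0.029, so significance at 0.05 becomes attainable.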