Agbor Taku Junior    April 30, 2019

1) HYPOTHESIS: It is a definite statement about the population parameters.

2) NULL HYPOTHESIS: The null hypothesis (H0) states that no association exists between the two cross-tabulated variables in the population, and therefore that the variables are statistically independent. For example, if we want to compare two methods, method A and method B, and we assume that both methods are equally good, then this assumption is called the NULL HYPOTHESIS.

3) ALTERNATIVE HYPOTHESIS: The alternative hypothesis (H1) proposes that the two variables are related in the population. If we assume that, of the two methods, method A is superior to method B, then this assumption is called the ALTERNATIVE HYPOTHESIS.

4) DEGREES OF FREEDOM: The degrees of freedom (d.f.) denote the extent of independence (freedom) enjoyed by a given set of observed frequencies. Suppose we are given a set of n observed frequencies that are subject to k independent constraints (restrictions). Then d.f. = (number of frequencies) – (number of independent constraints on them) = n – k. For a contingency table, d.f. = (r – 1)(c – 1), where r is the number of rows and c is the number of columns.
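The two rules for degrees of freedom can be sketched in a few lines of Python. The function names below are illustrative choices, not part of any library.

```python
def df_from_constraints(n_frequencies, n_constraints):
    """d.f. = number of frequencies minus number of independent constraints."""
    return n_frequencies - n_constraints


def df_contingency(rows, cols):
    """d.f. = (r - 1)(c - 1) for an r-by-c contingency table."""
    return (rows - 1) * (cols - 1)


# Five categories with one constraint (the total is fixed).
print(df_from_constraints(5, 1))
# A table with 2 rows and 2 columns.
print(df_contingency(2, 2))
```

With five categories whose total is fixed, one constraint is used up, which leaves 4 degrees of freedom.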

5) CONTINGENCY TABLE: When a table is prepared by enumerating qualitative data and entering the actual frequencies, and that table represents the occurrence of two sets of events, it is called a contingency table (Latin: con-, together; tangere, to touch). It is also called an association table.


THE CHI SQUARE TEST

The chi-square statistic is a measure of the discrepancy between theoretical (expected) and observed results, and it is used within statistical hypothesis testing procedures. Statistical hypothesis testing is a decision-making process for evaluating claims about a population. For example, a geologist might claim, based on his or her knowledge, that all the rocks around Mt. Cameroon contain olivine. He or she can then go to the field to evaluate the claim, thereby comparing theoretical and observed parameters.

Basically, the chi-square distribution is used to test a claim about a single variance or standard deviation. Testing a variance or standard deviation matters because consistency is required in any analysis or manufacturing process. For example:

  • During sieve analysis, the variation in grain size of the sediments retained on each sieve should not be too large.
  • It can also be used for tests concerning frequency distributions, such as: "If we are given varieties of quartz with different colors, will each color be selected with the same frequency?"
  • It can be used to test the homogeneity of proportions, for example: "Is the olivine content in the basalt of Mt. Cameroon the same as that of Mt. Kupe?"

The chi-square distribution is obtained from the values of $(n - 1)s^2/\sigma^2$ when random samples of size $n$ are selected from a normally distributed population whose variance is $\sigma^2$. Here $s^2$ is the sample variance.
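As a rough illustration, the quantity (n - 1)s^2 / sigma^2 can be computed for a single sample as follows. The function name, the sample values and the hypothesized variance of 4 are invented for this sketch and do not come from any real measurement.

```python
import statistics


def variance_test_statistic(sample, hypothesized_variance):
    """Return (n - 1) * s^2 / sigma^2 for a sample and a claimed variance."""
    n = len(sample)
    s_squared = statistics.variance(sample)  # sample variance, divisor n - 1
    return (n - 1) * s_squared / hypothesized_variance


# Hypothetical measurements, invented for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
statistic = variance_test_statistic(sample, hypothesized_variance=4)
print(f"chi-square statistic = {statistic:.2f} with {len(sample) - 1} d.f.")
```

The resulting value would then be compared with table values for n - 1 degrees of freedom.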

A chi-square variable cannot be negative, and the distribution is skewed to the right. The area under each chi-square distribution is equal to 1.00, or 100%.

Two different table values are needed because the distribution is not symmetric. One value is found on the left side of the table and the other on the right. For example, to find the table values corresponding to the 95% confidence interval, first change 95% to a decimal and subtract it from 1 (α = 1 – 0.95 = 0.05), then divide the answer by 2 (α/2 = 0.05/2 = 0.025). This is the column on the right side of the table, used to get the value for $\chi^2_{\text{right}}$. To get the value for $\chi^2_{\text{left}}$, subtract α/2 from 1 (1 – 0.05/2 = 0.975).


Figure 1.4: Chi-square distribution for 22 degrees of freedom

Remember that chi-square values cannot be negative. Hence, you must use the α values 0.025 and 0.975 in the table. With 22 degrees of freedom, the critical values are 36.781 and 10.982, respectively. After the degrees of freedom reach 30, the table gives values only for multiples of 10 (40, 50, 60, etc.). When the exact degrees of freedom sought are not listed in the table, the closest smaller value should be used. For example, if the given degrees of freedom are 36, use the table value for 30 degrees of freedom. This guideline keeps the type I error equal to or below the α value.
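The table lookup for 22 degrees of freedom can be cross-checked numerically. The sketch below uses only the Python standard library: it evaluates the chi-square cumulative distribution with a series for the incomplete gamma function and inverts it by bisection. The function names are illustrative, and a statistics package (for example the chi2 distribution in scipy.stats) would normally be the more convenient choice.

```python
import math


def chi2_cdf(x, df):
    """P(X <= x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    # Series expansion of the regularized lower incomplete gamma function.
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def chi2_quantile(p, df):
    """Value x such that P(X <= x) = p, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < p:  # widen the bracket until it contains p
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


alpha = 1 - 0.95
df = 22
# The table column 0.025 is a right-tail area, so it matches the 0.975 quantile.
chi2_right = chi2_quantile(1 - alpha / 2, df)
chi2_left = chi2_quantile(alpha / 2, df)
print(f"right = {chi2_right:.3f}, left = {chi2_left:.3f}")
```

The two results should agree with the table values 36.781 and 10.982 quoted above, up to rounding.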


This test can be used in:

1) Goodness of fit of distributions

2) Test of independence of attributes

3) Test of homogeneity

 1) TEST FOR GOODNESS OF FIT OF DISTRIBUTIONS

In addition to being used to test a single variance, the chi-square statistic can be used to see whether a frequency distribution fits a specific pattern. For example, a traffic engineer may wish to see whether accidents occur more often on some days than on others, so that police patrols can be increased accordingly.

When you are testing whether a frequency distribution fits a specific pattern, you can use the chi-square goodness-of-fit test. For example, suppose that, as a geologist, you wish to see whether consumers have any preference among five minerals. A sample of 100 people provided these data:

Gold   Silver   Platinum   Iron   Diamond
16     14       28         10     32


If there were no preference, you would expect each mineral to be selected with equal frequency. Therefore, the expected frequency is 100/5 = 20; that is, approximately 20 people would select each mineral.

Since the frequencies for each mineral were obtained from a sample, these actual frequencies are called the observed frequencies. The frequencies obtained by calculation (as if there were no preference) are called the expected frequencies. A completed table for the test is shown below.

Frequency   Gold   Silver   Platinum   Iron   Diamond
Observed    16     14       28         10     32
Expected    20     20       20         20     20

The observed frequencies will almost always differ from the expected frequencies because of sampling error; that is, the values differ from sample to sample.


The formula for the chi-square goodness-of-fit test is

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

with degrees of freedom equal to the number of categories minus 1, and where

O = observed frequency

E = expected frequency

 Assumptions for the Chi-Square Goodness-of-Fit Test

  1. The data are obtained from a random sample.
  2. The expected frequency for each category must be 5 or more.

The steps for the chi-square goodness-of-fit test are summarized in the procedure below.

Step 1 State the hypotheses and identify the claim.

H0: Consumers show no preference for the minerals (claim).

H1: Consumers show a preference.

Step 2 Find the critical value. The degrees of freedom are 5 – 1 = 4, and α = 0.05. Hence, the critical value from Table G in Appendix C is 9.488.

Step 3 Compute the test value by subtracting the expected value from the corresponding observed value, squaring the result and dividing by the expected value, and finding the sum. The expected value for each category is 20, as shown previously.

$$\chi^2 = \frac{(-4)^2}{20} + \frac{(-6)^2}{20} + \frac{8^2}{20} + \frac{(-10)^2}{20} + \frac{12^2}{20} = \frac{360}{20} = 18.0$$

Step 4 Make the decision. The decision is to reject the null hypothesis, since 18.0 > 9.488.

Step 5 Summarize the results. There is enough evidence to reject the claim that consumers show no preference for the minerals.
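The five steps can be reproduced with a short Python sketch. The observed counts and the critical value 9.488 are taken from the text, while the function and variable names are illustrative.

```python
def chi_square_gof(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


minerals = ["Gold", "Silver", "Platinum", "Iron", "Diamond"]
observed = [16, 14, 28, 10, 32]

# Under the null hypothesis of no preference, every category is equally likely.
total = sum(observed)
expected = [total / len(observed)] * len(observed)

statistic = chi_square_gof(observed, expected)
df = len(observed) - 1
critical_value = 9.488  # table value quoted in Step 2 (d.f. = 4, alpha = 0.05)

reject_null = statistic > critical_value
print(f"chi-square = {statistic:.1f}, d.f. = {df}, reject H0: {reject_null}")
```

Because the statistic exceeds the critical value, the sketch reaches the same decision as Step 4.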

NB: When there is perfect agreement between the observed and the expected values, $\chi^2 = 0$. Also, $\chi^2$ can never be negative.

 2) TEST OF INDEPENDENCE OF ATTRIBUTES: This test enables us to explain whether or not two attributes are associated. For instance, if we are interested in knowing whether a new medicine is effective in controlling fever, the chi-square test is useful. In such a situation, we proceed with the null hypothesis that the two attributes (i.e., new medicine and control of fever) are independent, which means that the new medicine is not effective in controlling fever. If $\chi^2$ (calculated) < $\chi^2$ (tabulated), the null hypothesis is not rejected, i.e., the data are consistent with the two variables being independent (the new medicine is not shown to be effective in controlling fever). If $\chi^2$ (calculated) > $\chi^2$ (tabulated) at a certain level of significance for the given degrees of freedom, the null hypothesis is rejected, i.e., the two variables are dependent (the new medicine is effective in controlling fever). When the null hypothesis is rejected, it can be concluded that there is a significant association between the two attributes.
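A minimal sketch of the independence test for the medicine example follows. The 2 x 2 counts are hypothetical and invented purely for illustration; the critical value 3.841 is the standard table value for 1 degree of freedom at α = 0.05. The function name is illustrative, and no continuity correction is applied.

```python
def chi_square_independence(table):
    """Return (statistic, d.f.) for a contingency table given as rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two attributes were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical counts: rows = new medicine given / not given,
# columns = fever controlled / not controlled.
table = [[30, 20],
         [15, 35]]
statistic, df = chi_square_independence(table)
critical_value = 3.841  # standard table value for d.f. = 1, alpha = 0.05
print(f"chi-square = {statistic:.3f}, d.f. = {df}")
print("reject H0" if statistic > critical_value else "do not reject H0")
```

Each expected count is (row total x column total) / grand total, which is what independence of the two attributes would imply.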

 3) TEST OF HOMOGENEITY: This test can also be used to test whether the occurrence of events is uniform. For example, whether the amount of sediment deposited at the banks of a river is the same on every day of the week can be tested with the chi-square test. If $\chi^2$ (calculated) < $\chi^2$ (tabulated), the null hypothesis is not rejected, and it can be concluded that the data are consistent with uniformity in the occurrence of the events (uniformity in the deposition throughout the week).


The following conditions should be satisfied before the $\chi^2$ test can be applied:

1) The data must be in the form of frequencies.

2) The frequency data must have a precise numerical value and must be organized into categories or groups.

3) Observations recorded and used are collected on a random basis.

4) All the items in the sample must be independent.

5) No group should contain very few items, say fewer than 10. Where frequencies are less than 10, regrouping is done by combining the frequencies of adjoining groups so that the new frequencies become greater than 10. (Some statisticians take this number as 5, but 10 is regarded as better by most statisticians.)

6) The overall number of items must also be reasonably large. It should normally be at least 50.
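The regrouping described in condition 5 can be sketched as a simple left-to-right merge of adjoining groups. This is one possible strategy, not a standard algorithm; the function name, the default threshold and the example frequencies are assumptions made for illustration.

```python
def merge_small_groups(freqs, minimum=10):
    """Combine adjoining groups until every merged group reaches `minimum`."""
    merged = []
    running = 0
    for f in freqs:
        running += f
        if running >= minimum:
            merged.append(running)
            running = 0
    if running:
        # A leftover tail that is still too small joins the last group.
        if merged:
            merged[-1] += running
        else:
            merged.append(running)
    return merged


# Hypothetical frequencies, invented for illustration only.
freqs = [3, 4, 12, 2, 2, 15, 1]
regrouped = merge_small_groups(freqs, minimum=10)
print(regrouped)
```

The total number of items is unchanged by the merge. In a goodness-of-fit test, the observed and expected frequencies would need to be regrouped in the same way so that the categories still correspond.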


1) The data is from a random sample.

2) The test may be misleading if any expected frequency is much below 5. In that case another appropriate test should be applied.

3) This test tells the presence or absence of an association between the events but doesn’t measure the strength of association.

4) This test doesn’t indicate the cause and effect, it only tells the probability of occurrence of association by chance.

5) The test is to be applied only when the individual observations of the sample are independent, which means that the occurrence of one individual observation (event) has no effect upon the occurrence of any other observation (event) in the sample under consideration.
