Statistics - Practically Study Material

1. BASICS OF STATISTICS

1. STATISTICAL DATA

Statistical data are the facts which are collected for the purpose of investigation. There are two types of statistical data:

(i) Primary data: The data collected by an investigator for the first time for his own purpose are called primary data. Because primary data are collected by the person who will use them, they are generally more reliable and relevant.

(ii) Secondary data: Data collected by some other source and used by the investigator for his own purpose are called secondary data. For example, the score of a cricket match noted from a newspaper is secondary data. Thus data which are primary in the hands of one become secondary in the hands of another.

Data collected from any source can also be divided into the following two types:

(i) Raw Data: Raw data are data obtained from the original source but not yet arranged numerically. They are also called ‘ungrouped data’. For example, the marks of 10 students in maths are given as: 75, 96, 25, 32, 89, 62, 40, 79, 35, 55.

An ‘array’ is an arrangement of raw numerical data in ascending or descending order of magnitude. The above data can be written as the array 25, 32, 35, 40, 55, 62, 75, 79, 89, 96.

(ii) Grouped data: An array can be placed systematically in groups or categories. For example, the above data can be grouped in the following manner.

GROUPS	MARKS	TOTAL NUMBER OF STUDENTS
0 to 20	–	0
21 to 40	25, 32, 35, 40	4
41 to 60	55	1
61 to 80	62, 75, 79	3
81 to 100	89, 96	2
TOTAL		10
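The two steps above (arranging raw data as an array, then grouping it) can be sketched in a few lines of Python. This is only an illustration using the marks of the 10 students; the names `marks`, `groups` and `group_marks` are chosen here and are not part of the original material.

```python
# Raw (ungrouped) marks of 10 students, as given in the text.
marks = [75, 96, 25, 32, 89, 62, 40, 79, 35, 55]

# An array is the raw data arranged in ascending (or descending) order.
array = sorted(marks)

# Inclusive groups used in the table above: (lowest mark, highest mark).
groups = [(0, 20), (21, 40), (41, 60), (61, 80), (81, 100)]


def group_marks(values, groups):
    """Return a dict mapping each group to the list of values that fall in it."""
    table = {g: [] for g in groups}
    for v in values:
        for low, high in groups:
            if low <= v <= high:
                table[(low, high)].append(v)
                break
    return table


table = group_marks(array, groups)
for (low, high), members in table.items():
    print(f"{low} to {high}\t{members}\t{len(members)}")
```

Each printed row corresponds to one row of the grouped table: the group, the marks in it, and the number of students.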

2. SOME BASIC DEFINITIONS

(i) Variate: Variate is a quantity that may vary from observation to observation.

(ii) Range: Range is difference between the maximum and minimum observations.

(iii) Class Interval: When data are divided in groups, each group is called a class interval.

(iv) Class Limit: Every class interval has two limits. The smallest observation of the interval is called lower limit and the largest observation of the interval is called upper limit.

(v) Class Mark: The mid value of any class is called its class mark.

$\text{Class Mark} = \frac{\text{Upper limit of the class} + \text{Lower limit of the class}}{2}$

(vi) Class Size: Class size is defined as the difference between two successive class marks. It is also the difference between the true upper and true lower limits of any class interval.

(vii) Frequency: The number of observations falling in a particular class is called its frequency, or class frequency.

(viii) Cumulative Frequency: The cumulative frequency of any class is obtained by adding its frequency to the frequencies of all the classes before it, i.e. it is the sum of all frequencies up to and including that class.

(ix) True Class Limit: In the case of exclusive classes the upper and lower limits are respectively known as its true upper limits and true lower limits. In the case of inclusive classes, the true lower and upper limits are obtained by subtracting 0.5 from the lower limit and adding 0.5 to the upper limit. True upper limits and true lower limits are also known as boundaries of the class.

(x) Tally: Tally method is used to keep the chance of error at minimum in counting. A bar (|) called tally mark is put against any item when it occurs. The fifth occurrence of any item is represented by putting diagonally a cross tally (\) on the first four tallies.
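The definitions of class mark, class size and true class limits can be tried out on a few inclusive classes. The sketch below is illustrative only: the helper names `class_mark` and `true_limits` are chosen here, and the classes are taken from the grouped-marks table in the previous section. Following the definition above, it assumes whole-number data, so that 0.5 is the amount subtracted and added.

```python
def class_mark(lower, upper):
    """Mid value of a class: (upper limit + lower limit) / 2."""
    return (lower + upper) / 2


def true_limits(lower, upper):
    """True limits of an inclusive class with whole-number limits."""
    return (lower - 0.5, upper + 0.5)


# Three inclusive classes from the grouped-marks table.
classes = [(21, 40), (41, 60), (61, 80)]

class_marks = [class_mark(low, high) for low, high in classes]

# Class size as the difference between two successive class marks.
class_size = class_marks[1] - class_marks[0]

# The same size is obtained from the true limits of any one class.
true_low, true_high = true_limits(21, 40)
size_from_true_limits = true_high - true_low

print(class_marks, class_size, (true_low, true_high))
```

Both ways of computing the class size give 20 here, which is why the two descriptions of class size agree once true limits are used.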

3. FREQUENCY DISTRIBUTION

The tabular arrangement of data showing the frequency of each item is called a frequency distribution table. It presents raw data in a form from which the information they contain can be easily understood. Frequency distributions are of two types:

i. Discrete frequency distribution:

In this type of frequency distribution, the first column of the frequency table lists all possible values of the variable from the lowest to the highest, the second column shows the tally marks and the third column shows the frequency of each item. In this method data are not divided into groups or classes.

For example, the number of girls in each of 20 families is given by the following data:

1, 2, 3, 1, 1, 2, 3, 3, 4, 1, 5, 1, 1, 2, 2, 3, 3, 2, 4, 1

The above data can be put in the form of a discrete frequency distribution table in the following manner:

S. No.	No. of girls	Tally Marks	Frequency
1.	1	$\cancel{||||}$ ||	7
2.	2	$\cancel{||||}$	5
3.	3	$\cancel{||||}$	5
4.	4	||	2
5.	5	|	1
TOTAL			20
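The counting that the tally marks perform by hand can be sketched with Python's standard `collections.Counter`. This is just an illustration on the data above; the variable names are chosen here.

```python
from collections import Counter

# Number of girls in each of the 20 families, as listed above.
girls = [1, 2, 3, 1, 1, 2, 3, 3, 4, 1, 5, 1, 1, 2, 2, 3, 3, 2, 4, 1]

# Counter maps each distinct value to the number of times it occurs.
frequency = Counter(girls)

print("No. of girls\tFrequency")
for value in sorted(frequency):
    print(f"{value}\t{frequency[value]}")
print(f"TOTAL\t{sum(frequency.values())}")
```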

ii. Continuous or Grouped Frequency Distribution

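In this type of frequency distribution, data are divided into groups or classes. It is used when the difference between the greatest and the smallest observations is large, so that listing every value separately would be impractical. The discrete form, by contrast, is suitable only where the values in the raw data are largely repeating and this difference is not very large.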

4. CUMULATIVE FREQUENCY

Cumulative frequency table is obtained from the ordinary frequency table by successively adding the several frequencies. Thus to form a cumulative frequency table we add a column of cumulative frequency in the frequency distribution table. It is obvious that the cumulative frequency of the last class is the sum of the frequencies of all the classes.

Cumulative frequency series are of two types:

(i) Less than series (ii) More than series

Illustration -1

The weekly saving of 30 workers working in a factory is as given below:

64, 60, 87, 75, 69, 34, 51, 78, 39, 48, 73, 54, 63, 70, 57, 88, 90, 53, 74, 44, 31, 71, 68, 72, 36, 89, 55, 67, 73, 83

(a) Taking the first class interval as 30 – 40 (40 not included), form a frequency table of equal intervals.

(b) Also form a cumulative frequency table.

(c) Find the number of workers whose weekly saving is Rs. 60 or more.

Solution

(i) For the given data of the weekly saving of 30 workers working in a factory, we prepare following table:

S. No.	Class Interval (Saving in Rs.)	Tally Marks	Frequency (No. of workers)
1.	30 – 40	||||	4
2.	40 – 50	||	2
3.	50 – 60	$\cancel{||||}$	5
4.	60 – 70	$\cancel{||||}$ |	6
5.	70 – 80	$\cancel{||||}$ |||	8
6.	80 – 90	||||	4
7.	90 – 100	|	1
		TOTAL	30

(ii) For given data cumulative frequency table can be given in the following manner:

S. No.	Class Interval (Saving in Rs.)	Tally Marks	Frequency (No. of workers)	Cumulative Frequency
1.	30 – 40	||||	4	4
2.	40 – 50	||	2	6
3.	50 – 60	$\cancel{||||}$	5	11
4.	60 – 70	$\cancel{||||}$ |	6	17
5.	70 – 80	$\cancel{||||}$ |||	8	25
6.	80 – 90	||||	4	29
7.	90 – 100	|	1	30
		TOTAL	30

(iii) Number of workers whose weekly saving is Rs. 60 or more than Rs. 60 = (6 + 8 + 4 + 1) = 19
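The whole of Illustration 1 can be reproduced with a short script. This is a sketch, not a prescribed method: it assumes exclusive classes of width 10 starting at 30, as stated in the question, and uses `itertools.accumulate` from the standard library for the running totals.

```python
from itertools import accumulate

# Weekly savings (in Rs.) of the 30 workers from Illustration 1.
savings = [
    64, 60, 87, 75, 69, 34, 51, 78, 39, 48,
    73, 54, 63, 70, 57, 88, 90, 53, 74, 44,
    31, 71, 68, 72, 36, 89, 55, 67, 73, 83,
]

width = 10
lower_limits = list(range(30, 100, width))  # 30, 40, ..., 90

# Exclusive classes: a value v belongs to the class with low <= v < low + width.
frequencies = [
    sum(1 for v in savings if low <= v < low + width)
    for low in lower_limits
]

# Less-than cumulative frequencies are running totals of the frequencies.
cumulative = list(accumulate(frequencies))

# Workers saving Rs. 60 or more: add the frequencies of classes from 60 upward.
sixty_or_more = sum(f for low, f in zip(lower_limits, frequencies) if low >= 60)

for low, f, cf in zip(lower_limits, frequencies, cumulative):
    print(f"{low} - {low + width}\t{f}\t{cf}")
print("Rs. 60 or more:", sixty_or_more)
```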

Illustration -2

Heights of seven persons in cm are 120, 125, 142, 134, 150, 155 and 128. Calculate the range.

Solution

For given data maximum height = 155 cm

and minimum height = 120 cm

Range = maximum height – minimum height

= 155 cm – 120 cm

= 35 cm.

Illustration -3

Form a frequency table from the following table:

Marks	No. of Students
Below 10	15
Below 20	35
Below 30	60
Below 40	84
Below 50	96
Below 60	127
Below 70	198
Below 80	250

Solution

For given data we make classes 0 – 10, 10 – 20, 20 – 30, ….., 70–80.

There are 15 students who obtained below 10 marks, therefore the frequency of class 0 – 10 is 15. The number of students getting below 20 marks is 35, and this includes the students who obtained below 10 marks. Therefore the number of students who got marks between 10 and 20, i.e. the frequency of class 10 – 20, is 35 – 15 = 20. The remaining frequencies are obtained in the same way, by subtracting each cumulative frequency from the next one.

Thus for given data we make following table:

S. No.	Class Interval	Cumulative Frequency	Frequency
1.	0 – 10	15	15
2.	10 – 20	35	20
3.	20 – 30	60	25
4.	30 – 40	84	24
5.	40 – 50	96	12
6.	50 – 60	127	31
7.	60 – 70	198	71
8.	70 – 80	250	52
		TOTAL	250
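The subtraction used in Illustration 3 (each class frequency is the difference of two successive "below" totals) can be sketched as follows. The variable names are illustrative.

```python
# "Below ..." totals from Illustration 3 (a less-than cumulative series).
upper_limits = [10, 20, 30, 40, 50, 60, 70, 80]
cumulative = [15, 35, 60, 84, 96, 127, 198, 250]

# The first class keeps its cumulative value; every later class frequency
# is the current cumulative total minus the previous one.
frequencies = [cumulative[0]] + [
    later - earlier for earlier, later in zip(cumulative, cumulative[1:])
]

for upper, f in zip(upper_limits, frequencies):
    print(f"{upper - 10} - {upper}\t{f}")
```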

Illustration -4

Find the unknown entries (a, b, c, d, e, f, g) from the following frequency distribution of heights of 60 students in a class:

Height (in cm)	Frequency	Cumulative frequency
160 – 165	15	a
165 – 170	b	35
170 – 175	12	c
175 – 180	d	50
180 – 185	e	55
185 – 190	5	f
Total	g

Solution

From given table, we make following cumulative frequency table:

S. No.	Height (in cm)	Frequency	Cumulative Frequency
1.	160 – 165	15	15 = a
2.	165 – 170	b	15 + b = 35
3.	170 – 175	12	15 + b + 12 = c
4.	175 – 180	d	15 + b + 12 + d = 50
5.	180 – 185	e	15 + b + 12 + d + e = 55
6.	185 – 190	5	15 + b + 12 + d + e + 5 = f
		g	60

Now, from table:

a = 15

15 + b = 35 ⇒ b = 35 – 15 = 20

15 + b + 12 = c ⇒ 15 + 20 + 12 = c

c = 47

15 + b + 12 + d = 50

15 + 20 + 12 + d = 50

d = 50 – 47

d = 3

15 + b + 12 + d + e = 55

15 + 20 + 12 + 3 + e = 55

e = 55 – 50

e = 5

15 + b + 12 + d + e + 5 = f

15 + 20 + 12 + 3 + 5 + 5 = f

f = 60

g = 60. Hence, a = 15, b = 20, c = 47, d = 3, e = 5, f = 60 and g = 60.

5. GRAPHICAL REPRESENTATION OF DATA

Data can also be represented graphically. There are various methods of graphical representation of a frequency distribution:

(i) Bar Graphs

(ii) Histogram

(iii) Frequency Polygon

(iv) Pie Chart

Bar Graph

The frequency distribution of a discrete variable is best represented by a bar graph. The height of each bar is proportional to the frequency of the corresponding variate-value. In a bar graph the bars must be kept distinct to show that the variate-values are distinct. The bars are of equal width and are drawn with equal spacing between them on the x-axis, which depicts the variable. The frequencies are shown on the y-axis.

Illustration -5

The following table shows the number of illiterate persons in the age group (10–58 years) in a town:

Age Group (in years)	10 – 16	17 – 23	24 – 30	31 – 37	38 – 44	45 – 51	52 – 58
Number of illiterate person	175	375	100	150	250	400	525

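Draw a bar graph to represent the above data.

Solution

Take the age groups along the X-axis and the number of illiterate persons along the Y-axis. For each age group draw a bar of the same width, with equal gaps between the bars, whose height is proportional to the number of illiterate persons in that group (175, 375, 100, 150, 250, 400 and 525 respectively).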

Histogram

Histogram is a graphical representation of a grouped frequency distribution with continuous classes. It consists of a set of rectangles where heights of rectangles are proportional to their class frequencies, for equal class intervals. There is no gap between two successive rectangles. The rectangles are constructed with base as the class size and their heights representing the frequencies.

Illustration -6

Construct a histogram from the following distribution of total marks obtained by 750 students of class IX in the final examination.

Marks (mid-points)	620	660	700	740	780	820	860
Number of students	16	45	156	284	172	59	18

Solution

To draw the histogram for above data first of all we find out class limits for making class intervals from given class marks.

Here difference between second and the first class marks = h = 660 – 620 = 40

$∴ \frac{h}{2} = \frac{40}{2} = 20$

lower limit of first class interval = 620 – h/2

= 620 – 20

= 600

And upper limit of the first class interval = $620 + \frac{h}{2}$ = 620 + 20 = 640

Similarly other class intervals would be,

(660 – 20) – (660 + 20), (700 – 20) – (700 + 20), (740 – 20) – (740 + 20), (780 – 20) – (780 + 20), (820 – 20) – (820 + 20) and (860 – 20) – (860 + 20)

i.e. 640 – 680, 680 – 720, 720 – 760, 760 – 800, 800 – 840 and 840 – 880.

Using these class intervals, we draw the histogram taking class intervals on the X-axis and number of students on the Y-axis.
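The recovery of class intervals from class marks in Illustration 6 can be sketched in code. This assumes, as the solution does, that all classes have the same width h, equal to the difference between successive mid-points.

```python
# Class marks (mid-points) and frequencies from Illustration 6.
mid_points = [620, 660, 700, 740, 780, 820, 860]
students = [16, 45, 156, 284, 172, 59, 18]

# Class size h is the difference between two successive class marks.
h = mid_points[1] - mid_points[0]

# Each class runs from (mid-point - h/2) to (mid-point + h/2).
intervals = [(m - h / 2, m + h / 2) for m in mid_points]

for (low, high), f in zip(intervals, students):
    print(f"{low:.0f} - {high:.0f}\t{f}")
```

Each printed row gives the base of one histogram rectangle and its height.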

Frequency Polygon

A frequency polygon is a graph of frequency distribution. It is a line graph of class frequency which is plotted against class mark. A frequency polygon can be obtained by two methods:

(1) By using Histogram: A frequency polygon can be obtained by joining the mid points of the tops of the rectangles of a histogram. For this we obtain the mid points of the upper horizontal sides of each rectangle and then join these mid points by dotted lines to get the frequency polygon. The ends of a frequency polygon are preferably extended to the mid points of the imagined class intervals adjacent to the first and last class intervals.

Illustration -7

Draw the histogram and frequency polygon of the following frequency distribution:

Monthly wages (in rupees)	325-350	350-375	375-400	400-425	425-450	Total
Number of workers	30	45	75	60	55	265

Solution

Using given data we draw histogram taking monthly wages (in rupees) on X-axis and number of workers on Y-axis. Then we join the mid points of the top of the rectangles by dotted straight lines and complete it by joining the mid points of imagined class intervals adjacent to the first and last class intervals.

(2) Frequency polygon without using Histogram: Following procedure is used to make a frequency polygon without using histogram.

(i) Calculate the class marks, x₁, x₂, …., x_n of each of the given class intervals.

(ii) Mark class marks x₁, x₂, …., x_n, along X-axis and frequencies f₁, f₂, …. f_n along Y-axis.

(iii) Plot the points (x₁, f₁), (x₂, f₂), …, (x_n, f_n).

(iv) Obtain the mid-points of two imagined class intervals of zero frequency, one just before the first interval and one just after the last interval.

(v) Join the points (x₁, f₁), (x₂, f₂), …, (x_n, f_n) by the line segments and complete the frequency polygon by joining the mid points of the first and last intervals to the mid points of the imagined classes adjacent to them.

Illustration -8

Represent the following distribution by a frequency polygon without using a histogram:

Scores	20-29	30-39	40-49	50-59	60-69	70-79	80-89	90-99
Frequency	2	5	3	18	20	22	12	7

Solution

The given data are in the form of inclusive classes, so we first convert them into exclusive classes by finding the true class limits. We also need the class marks of these class intervals.

S. No.	Scores	True Class Limits	Class Marks	Frequency	Points to be plotted
1.	20 – 29	19.5 – 29.5	24.5	2	(24.5, 2)
2.	30 – 39	29.5 – 39.5	34.5	5	(34.5, 5)
3.	40 – 49	39.5 – 49.5	44.5	3	(44.5, 3)
4.	50 – 59	49.5 – 59.5	54.5	18	(54.5, 18)
5.	60 – 69	59.5 – 69.5	64.5	20	(64.5, 20)
6.	70 – 79	69.5 – 79.5	74.5	22	(74.5, 22)
7.	80 – 89	79.5 – 89.5	84.5	12	(84.5, 12)
8.	90 – 99	89.5 – 99.5	94.5	7	(94.5, 7)

Plotting these points and joining them by line segments we get frequency polygon. We complete the frequency polygon by extending it up to points (14.5, 0) and (104.5, 0) adjacent to first and last class marks on X-axis.
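The list of points for the frequency polygon in Illustration 8, including the two end points on the X-axis, can be generated as below. This is an illustrative sketch; it assumes equal class widths so that the imagined classes lie one class width before the first class mark and one after the last.

```python
# Inclusive classes and frequencies from Illustration 8.
classes = [(20, 29), (30, 39), (40, 49), (50, 59),
           (60, 69), (70, 79), (80, 89), (90, 99)]
frequencies = [2, 5, 3, 18, 20, 22, 12, 7]

# The class mark is the same whether stated or true limits are used.
class_marks = [(low + high) / 2 for low, high in classes]

# Width between successive class marks (the class size).
width = class_marks[1] - class_marks[0]

# Add zero-frequency points for the imagined classes at both ends.
points = (
    [(class_marks[0] - width, 0)]
    + list(zip(class_marks, frequencies))
    + [(class_marks[-1] + width, 0)]
)

for x, f in points:
    print(x, f)
```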

Pie Chart

In a pie chart, the various observations or components are represented by the sectors of a circle, and the whole circle represents the sum of the values of all the components. The total angle of 360° at the centre of the circle is divided according to the values of the components. Thus we have:

Central Angle for Component = $(\frac{\text{Value of the Component}}{\text{Total Value}} \times 360)°$

Sometimes the values of the components are expressed as percentages. In such cases we have:

Central Angle for Component = $(\frac{\text{Percentage Value of the Component}}{100} \times 360)°$

Illustration -9

A survey was conducted among a group of girls to know their preferences for the types of slippers. The given circle graph shows the results of the survey.

If the survey was conducted among a group of 1080 girls, then find out how many girls preferred to wear leather slippers?

(A) 140 (B) 280 (C) 420 (D) 560

Solution

In the given circle graph, it is seen that the central angle of the sector corresponding to leather slippers is 140°.

Therefore, number of girls who preferred to wear leather slippers:

$= \frac{140 °}{360 °} \times 1080 = 420$

Thus, 420 girls preferred to wear leather slippers.
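The central-angle formula and its inverse, as used in Illustration 9, can be written as two small functions. The function names are chosen here for illustration; exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction


def central_angle(value, total):
    """Central angle (in degrees) of the sector for one component."""
    return Fraction(value, total) * 360


def value_from_angle(angle, total):
    """Value of a component, given the central angle of its sector."""
    return Fraction(angle, 360) * total


# Illustration 9: a 140 degree sector in a survey of 1080 girls.
leather = value_from_angle(140, 1080)
print(leather)                          # number of girls
print(central_angle(leather, 1080))     # back to the angle in degrees
```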

6. MEASURES OF CENTRAL TENDENCY

One of the most important objectives of statistical analysis is to get one single value that describes the characteristic of the entire data. Such a value is called the central value or an average. The following are the important types of averages:

1. Arithmetic Mean

2. Geometric Mean

3. Harmonic mean

4. Median

5. Mode

We consider these measures in three cases: (i) individual series (i.e. each individual observation is given), (ii) discrete series (i.e. each observation is given along with its frequency, the number of times it occurs), and (iii) continuous series (i.e. the class intervals are given along with their frequencies).

2. Arithmetic Mean

The average of numbers in arithmetic is known as the Arithmetic Mean of these numbers in statistics.

1. MEAN OF AN UNGROUPED DATA

The Arithmetic Mean, or simply the Mean, of n observations x₁, x₂, x₃, …, x_n is given by the formula:

Mean = $\frac{x_{1} + x_{2} + x_{3} + \dots + x_{n}}{n} = \frac{\sum x_{i}}{n}$

Where the symbol $Σ$ , called sigma stands for the summation of the terms.

Illustration -10

Calculate the mean of the following numbers: 3, 1, 5, 6, 3, 4, 5, 3, 7, 2

Solution

Sum of the given numbers = (3 + 1 + 5 + 6 + 3 + 4 + 5 + 3 + 7 + 2) = 39.

Number of these numbers = 10.

$∴$ Mean of the given numbers = $\frac{39}{10} = 3.9$ .

Illustration -11

The weights (in kg) of 5 persons in a group are: 55, 63, 48, 59 and 61. Find their mean weight.

Solution

Sum of the weight of all persons = (55 + 63 + 48 + 59 + 61) kg = 286 kg.

Number of persons = 5.

$∴$ Mean weight = $(\frac{286}{5})$ kg = 57.2 kg.

Some Useful Results: Let the mean of x₁, x₂, x₃, …, x_n be A. Then

(i) Mean of (x₁ + k), (x₂ + k), (x₃ + k), …, (x_n + k) is (A + k);

(ii) Mean of (x₁ – k), (x₂ – k), (x₃ – k), …, (x_n – k) is (A – k);

(iii) Mean of kx₁, kx₂, kx₃, …, kx_n is kA, where k $\neq$ 0.
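These three results can be checked numerically. The sketch below uses the weights from Illustration 11 and an arbitrarily chosen constant k = 3; the helper `mean` is defined here and exact fractions avoid rounding errors.

```python
from fractions import Fraction


def mean(values):
    """Arithmetic mean of ungrouped data as an exact fraction."""
    return Fraction(sum(values), len(values))


x = [55, 63, 48, 59, 61]   # weights (in kg) from Illustration 11
A = mean(x)                # 286/5, i.e. 57.2
k = 3                      # any constant (non-zero for result (iii))

print(mean([v + k for v in x]) == A + k)   # result (i)
print(mean([v - k for v in x]) == A - k)   # result (ii)
print(mean([k * v for v in x]) == k * A)   # result (iii)
```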

2. MEAN OF GROUPED DATA

Direct Method

When the variates x₁, x₂, x₃, …, x_n have frequencies f₁, f₂, f₃, …, f_n respectively, then the mean is given by the formula:

Mean = $\frac{f_{1} x_{1} + f_{2} x_{2} + f_{3} x_{3} + \dots + f_{n} x_{n}}{f_{1} + f_{2} + f_{3} + \dots + f_{n}} = \frac{\sum f_{i} x_{i}}{\sum f_{i}}$ .

Illustration -12

The following table shows the weights of 15 members of an athletic team in a school.

Weight (in kg)	42	45	46	48	49
Number of athletes	4	3	5	2	1

Find the mean weight.

Solution

From the above data, we may prepare the table given below.

Weight (in kg) x_i	Number of athletes (Frequency) f_i	f_i x_i
42	4	168
45	3	135
46	5	230
48	2	96
49	1	49
	$Σ f_{i}$ = 15	$Σ f_{i} x_{i}$ = 678

$∴$ Mean Weight = $\frac{\sum f_{i} x_{i}}{\sum f_{i}} = \frac{678}{15} = 45.2$ kg.

Illustration -13

Using short cut method, calculate the mean weekly wage from the following frequency distribution.

Weekly wages(in Rs)	950	1000	1050	1100	1250	1500	1600
Number of workers	24	18	13	15	20	11	9

Solution

Let the assumed mean be, A = 1100.

From the given data, we may prepare the table given below.

Weekly Wages (in Rs) x_i	Number of Workers (Frequency) f_i	d_i = (x_i – A) = (x_i – 1100)	f_i d_i
950	24	–150	–3600
1000	18	–100	–1800
1050	13	–50	–650
1100 = A	15	0	0
1250	20	150	3000
1500	11	400	4400
1600	9	500	4500
	$Σ f_{i}$ = 110		$Σ f_{i} d_{i}$ = 5850

$∴$ Mean = $(A + \frac{\sum f_{i} d_{i}}{\sum f_{i}}) = (1100 + \frac{5850}{110}) = 1153.18$

Hence, Mean Wage = Rs 1153.18.
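The short cut method of Illustration 13 can be compared with the direct method in code. This sketch uses the same assumed mean A = 1100; the function names are chosen here, and exact fractions are used so that the two results can be compared without rounding.

```python
from fractions import Fraction

# Data from Illustration 13.
wages = [950, 1000, 1050, 1100, 1250, 1500, 1600]
workers = [24, 18, 13, 15, 20, 11, 9]


def direct_mean(xs, fs):
    """Direct method: sum(f * x) / sum(f)."""
    return Fraction(sum(f * x for x, f in zip(xs, fs)), sum(fs))


def short_cut_mean(xs, fs, assumed_mean):
    """Short cut method: A + sum(f * d) / sum(f), with d = x - A."""
    total_fd = sum(f * (x - assumed_mean) for x, f in zip(xs, fs))
    return assumed_mean + Fraction(total_fd, sum(fs))


mean_direct = direct_mean(wages, workers)
mean_short = short_cut_mean(wages, workers, 1100)

print(float(mean_direct), float(mean_short))
```

Both methods give the same value; the assumed mean only changes how much arithmetic is needed, not the answer.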

3. MEAN OF GROUPED DATA IN THE FORM OF CLASSES

I. Direct Method

Step 1: For each class, find the class mark x_i by using the relation, x_i = $\frac{1}{2}$ (lower limit + upper limit).

Step 2: Use the formula, Mean = $\frac{\sum f_{i} x_{i}}{\sum f_{i}}$ .

II. Short Cut Method or Deviation Method

Step 1: For each class, find the class mark x_i.

Step 2: Let A be the assumed mean.

Step 3: Find d_i = (x_i – A).

Step 4: Use the formula, Mean = $(A + \frac{\sum f_{i} d_{i}}{\sum f_{i}})$ .

III. Step-Deviation Method

Step 1: For each class, find the class mark x_i.

Step 2: Let A be the assumed mean.

Step 3: Calculate $μ_{i} = \frac{(x_{i} - A)}{c}$ , where c is the class size.

Step 4: Use the formula, Mean = $(A + c \times \frac{\sum f_{i} μ_{i}}{\sum f_{i}})$ .

Illustration -14

Using direct method, find the mean of the following frequency distribution:

Class-interval	10-20	20-30	30-40	40-50	50-60	60-70
Frequency	6	8	12	15	10	9

Solution

From the given data, we may prepare the table given below.

Class-Interval	Class-Mark x_i	Frequency f_i	f_i x_i
10-20	15	6	90
20-30	25	8	200
30-40	35	12	420
40-50	45	15	675
50-60	55	10	550
60-70	65	9	585
		$Σ f_{i}$ = 60	$Σ f_{i} x_{i}$ = 2520

$∴$ Mean = $\frac{\sum f_{i} x_{i}}{\sum f_{i}} = \frac{2520}{60} = 42$ .

Illustration -15

Using Short Cut Method find the mean for the following frequency distribution:

Class-interval	84-90	90-96	96-102	102-108	108-114	114-120
Frequency	8	10	16	23	12	11

Solution

Let the assumed mean be A = 99. From the given data, we may prepare the table given below.

Class-Interval	Class-Mark x_i	Frequency f_i	d_i = (x_i – A) = (x_i – 99)	f_i d_i
84-90	87	8	–12	–96
90-96	93	10	–6	–60
96-102	99 = A	16	0	0
102-108	105	23	6	138
108-114	111	12	12	144
114-120	117	11	18	198
		$Σ f_{i}$ = 80		$Σ f_{i} d_{i}$ = 324

$∴$ Mean = $(A + \frac{\sum f_{i} d_{i}}{\sum f_{i}}) = (99 + \frac{324}{80}) = (99 + 4.05) = 103.05$

Hence, Mean = 103.05.

Illustration -16

Using Step Deviation Method, calculate the mean for the following data:

Height (in cm)	135-140	140-145	145-150	150-155	155-160	160-165	165-170	170-175
Frequency	4	9	18	28	24	10	5	2

Solution

Here, class size, c = 5. Take assumed mean, A = 152.5. Thus, from the given data, we may prepare the table given below.

Class-Interval	Class-Mark x_i	Frequency f_i	$μ_{i} = \frac{x_{i} - A}{c}$	f_i μ_i
135-140	137.5	4	–3	–12
140-145	142.5	9	–2	–18
145-150	147.5	18	–1	–18
150-155	152.5 = A	28	0	0
155-160	157.5	24	1	24
160-165	162.5	10	2	20
165-170	167.5	5	3	15
170-175	172.5	2	4	8
		$Σ f_{i}$ = 100		$Σ f_{i} μ_{i}$ = 19

$∴$ Mean = $(A + c \times \frac{\sum f_{i} μ_{i}}{Σ f_{i}}) = (152.5 + \frac{5 \times 19}{100})$ = (152.5 + 0.95) = 153.45 cm.
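The three methods for classed data (direct, short cut and step-deviation) can be run side by side on the data of Illustration 16. The sketch below keeps A = 152.5 and c = 5 from the solution; all other names are illustrative, and exact fractions are used so the three results can be compared exactly.

```python
from fractions import Fraction

# Classes and frequencies from Illustration 16.
classes = [(135, 140), (140, 145), (145, 150), (150, 155),
           (155, 160), (160, 165), (165, 170), (170, 175)]
freqs = [4, 9, 18, 28, 24, 10, 5, 2]

# Class marks x_i = (lower limit + upper limit) / 2.
marks = [Fraction(low + high, 2) for low, high in classes]

N = sum(freqs)
A = Fraction(305, 2)   # assumed mean 152.5
c = 5                  # class size

# I. Direct method: sum(f * x) / sum(f).
direct = sum(f * x for f, x in zip(freqs, marks)) / N

# II. Short cut method: A + sum(f * d) / sum(f), with d = x - A.
short_cut = A + sum(f * (x - A) for f, x in zip(freqs, marks)) / N

# III. Step-deviation method: A + c * sum(f * u) / sum(f), with u = (x - A) / c.
sum_fu = sum(f * ((x - A) / c) for f, x in zip(freqs, marks))
step_deviation = A + c * sum_fu / N

print(float(direct), float(short_cut), float(step_deviation))
```

All three values agree, which illustrates that the short cut and step-deviation methods are rearrangements of the direct formula rather than different definitions of the mean.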

4. IMPORTANT FORMULAE FOR SOLVING ARITHMETIC MEAN

1. Arithmetic Mean of Individual Series

If x₁, x₂, x₃, …, x_n are n values of a variate x, then its Arithmetic Mean, denoted by $\bar{x}$ , is

$A.M. (\bar{x}) = \frac{x_{1} + x_{2} + x_{3} + \dots + x_{n}}{n} = \frac{\sum x_{i}}{n}$ (or)

$A.M. (\bar{x}) = A + \frac{\sum (x_{i} - A)}{n}$ where A is the assumed average. (For individual series)

2. Arithmetic Mean of Discrete Series

If a variable takes values x₁, x₂, x₃, …, x_n with corresponding frequencies f₁, f₂, f₃, …, f_n, then the arithmetic mean $\bar{x}$ is given by

$\bar{x} = \frac{f_{1} x_{1} + f_{2} x_{2} + \dots + f_{n} x_{n}}{f_{1} + f_{2} + \dots + f_{n}} = \frac{1}{N} \sum_{i = 1}^{n} f_{i} x_{i}$

where $N = \sum_{i = 1}^{n} f_{i}$

3. Arithmetic Mean Of Continuous Series

In the case of data given in class intervals, we cannot find the exact value of the mean because we do not know the exact values of the individual observations. We therefore obtain an approximate value of the mean. The method of approximation is to replace all the observed values belonging to a class by the mid-value of the class.

If x₁, x₂, x₃, …, x_n are the mid values of the class intervals having corresponding frequencies f₁, f₂, f₃, …, f_n, then we apply the same formula as in a discrete series.

$\bar{x} = \frac{1}{N} \sum_{i = 1}^{n} f_{i} x_{i}, N = \sum_{i = 1}^{n} f_{i}$

4. Combined Arithmetic Mean

If ${\bar{x}}_{i} (i = 1, 2, \dots, k)$ are the means of k series of sizes $n_{i} (i = 1, 2, \dots, k)$ , then the combined or composite mean $\bar{x}$ can be obtained by the formula:

$\bar{x} = \frac{n_{1} {\bar{x}}_{1} + n_{2} {\bar{x}}_{2} + \dots + n_{k} {\bar{x}}_{k}}{n_{1} + n_{2} + \dots + n_{k}} = \frac{\sum n_{i} {\bar{x}}_{i}}{\sum n_{i}}$

5. Weighted Arithmetic Mean

If w₁, w₂, w₃, …, w_n are the weights assigned to the values x₁, x₂, x₃, …, x_n respectively of a variable x, then the weighted A.M. is $\bar{x} = \frac{\sum w_{i} x_{i}}{\sum w_{i}}$ .

5. PROPERTIES OF ARITHMETIC MEAN

1. Sum of all the deviations from arithmetic mean is zero i.e.,

$\sum_{i = 1}^{n} (x_{i} - \bar{x}) = 0$ (in case of individual series)

$\sum_{i = 1}^{n} f_{i} (x_{i} - \bar{x}) = 0$ (in case of discrete or continuous series)

2. If each observation is increased or decreased by a given constant K, the mean is also increased or decreased by K.

This property is also known as the effect of change of origin. K can be taken to be any number. However, to simplify the calculations, K should be taken as a value which is in the middle of the table.

3. Step Deviation Method or change of scale

If x₁, x₂, x₃, …, x_n are mid values of class intervals with corresponding frequencies f₁, f₂, f₃, …, f_n, then we may change the scale by taking $d_{i} = \frac{x_{i} - A}{h}$ . In this case,

$\bar{x} = A + h \times (\frac{1}{N} \sum f_{i} d_{i})$ (if A is assumed mean)

A and h can be any numbers (with h $\neq$ 0), but if the lengths of class intervals are equal then h may be taken as the width of the class interval.

In particular, if each observation is multiplied or divided by a constant, the mean is also multiplied or divided by the same constant.

4. The sum of the squared deviations of the variate from their mean is minimum, i.e., the quantity $\sum {(x_{i} - A)}^{2}$ or $\sum f_{i} {(x_{i} - A)}^{2}$ is minimum when $A = \bar{x}$ .

5. E(aX + b) = aE(X) + b (where E(X) = Mean of X)
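Properties 1 and 4 can be justified with a short derivation. The following is a sketch for an individual series, using only the definition $\bar{x} = \frac{\sum x_{i}}{n}$ ; here A denotes an arbitrary constant.

```latex
% Property 1: the deviations from the mean sum to zero.
\sum_{i=1}^{n} (x_i - \bar{x}) = \sum_{i=1}^{n} x_i - n\bar{x} = n\bar{x} - n\bar{x} = 0

% Property 4: write x_i - A = (x_i - \bar{x}) + (\bar{x} - A) and expand the square.
\sum_{i=1}^{n} (x_i - A)^2
  = \sum_{i=1}^{n} (x_i - \bar{x})^2
  + 2(\bar{x} - A)\sum_{i=1}^{n} (x_i - \bar{x})
  + n(\bar{x} - A)^2
  = \sum_{i=1}^{n} (x_i - \bar{x})^2 + n(\bar{x} - A)^2
```

The middle term vanishes by Property 1, and the last term is never negative, so the sum of squared deviations is smallest exactly when $A = \bar{x}$ .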

Illustration-17
The weighted mean of the first n natural numbers, the weights being the corresponding numbers, is
Solution
First n natural numbers are 1, 2, 3,…,n; whose corresponding weights are 1, 2, 3,…,n respectively.

$∴$ weighted mean = $\frac{1 \times 1 + 2 \times 2 + \dots + n \times n}{1 + 2 + \dots + n}$

$= \frac{1^{2} + 2^{2} + \dots + n^{2}}{1 + 2 + \dots + n}$

$= \frac{\frac{n (n + 1) (2 n + 1)}{6}}{\frac{n (n + 1)}{2}} = \frac{2 n + 1}{3}$

Illustration-18

The weighted mean of the first n natural numbers whose weights are equal to the squares of the corresponding numbers is
Solution

weighted mean = $\frac{1 \cdot 1^{2} + 2 \cdot 2^{2} + \dots + n \cdot n^{2}}{1^{2} + 2^{2} + \dots + n^{2}}$

$= \frac{\sum n^{3}}{\sum n^{2}} = \frac{\frac{n (n + 1)}{2} \cdot \frac{n (n + 1)}{2}}{\frac{n (n + 1) (2 n + 1)}{6}} = \frac{3 n (n + 1)}{2 (2 n + 1)}$
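The closed forms obtained in Illustrations 17 and 18 can be checked numerically for small n. The sketch below defines a hypothetical helper `weighted_mean` implementing the weighted A.M. formula and compares it with (2n + 1)/3 and 3n(n + 1)/(2(2n + 1)).

```python
from fractions import Fraction


def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w * x) / sum(w), as an exact fraction."""
    return Fraction(sum(w * x for x, w in zip(values, weights)), sum(weights))


for n in range(1, 21):
    numbers = list(range(1, n + 1))

    # Illustration 17: weights equal to the numbers themselves.
    assert weighted_mean(numbers, numbers) == Fraction(2 * n + 1, 3)

    # Illustration 18: weights equal to the squares of the numbers.
    squares = [k * k for k in numbers]
    assert weighted_mean(numbers, squares) == Fraction(3 * n * (n + 1), 2 * (2 * n + 1))

print("Both formulas agree with direct computation for n = 1 to 20.")
```

A check over a finite range is not a proof, but it is a quick way to catch an algebra slip in a derivation of this kind.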

Illustration-19
The average salary of male employees in a firm is Rs. 5200 and that of female employees is Rs. 4200. The mean salary of all the employees is Rs. 5000. The percentages of male and female employees are respectively
Solution

Let ${\bar{x}}_{1} = 5200, {\bar{x}}_{2} = 4200, \bar{x} = 5000$ , and let $n_{1}$ and $n_{2}$ be the numbers of male and female employees.

Also, we know that $\bar{x} = \frac{n_{1} {\bar{x}}_{1} + n_{2} {\bar{x}}_{2}}{n_{1} + n_{2}}$

$\Rightarrow 5000 (n_{1} + n_{2}) = 5200 n_{1} + 4200 n_{2} \Rightarrow 800 n_{2} = 200 n_{1} \Rightarrow \frac{n_{1}}{n_{2}} = \frac{4}{1}$

$∴$ The percentage of male employees in the firm = $\frac{4}{4 + 1} \times 100 = 80 \%$

and the percentage of female employees in the firm = $\frac{1}{4 + 1} \times 100 = 20 \%$

Illustration-20
If the mean of 9 observations is 100 and mean of 6 observations is 80, then the mean of 15 observations is
Solution

$n_{1} = 9, {\bar{x}}_{1} = 100$ and $n_{2} = 6, {\bar{x}}_{2} = 80$

$\bar{x} = \frac{n_{1} {\bar{x}}_{1} + n_{2} {\bar{x}}_{2}}{n_{1} + n_{2}} = \frac{9 \times 100 + 6 \times 80}{9 + 6} = 92$
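The combined-mean formula used in Illustrations 19 and 20 can be sketched as a small function. The name `combined_mean` and the (size, mean) pair layout are choices made here for illustration.

```python
from fractions import Fraction


def combined_mean(groups):
    """Combined mean of several series given as (size, mean) pairs."""
    total_size = sum(n for n, _ in groups)
    weighted_sum = sum(n * m for n, m in groups)
    return Fraction(weighted_sum, total_size)


# Illustration 20: 9 observations with mean 100 and 6 with mean 80.
print(combined_mean([(9, 100), (6, 80)]))

# Illustration 19: males and females in the ratio 4 : 1 give the stated
# overall mean salary of Rs. 5000.
print(combined_mean([(4, 5200), (1, 4200)]))
```

Only the ratio of the group sizes matters, which is why Illustration 19 can be solved without knowing the actual number of employees.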
