Calculating the average of an interval series; relative values as weights. Harmonic mean and the order of its calculation. Geometric mean. Mean square. Example: the arithmetic mean of an interval series

The most common type of average is the arithmetic mean.

Simple arithmetic mean

The simple arithmetic mean is the average value obtained when the total volume of an attribute is distributed equally among all units of the population. Thus, the average annual output per employee is the output each employee would produce if the organization's entire output were distributed equally among all its employees. The simple arithmetic mean is calculated using the formula:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Simple arithmetic mean: the ratio of the sum of the individual values of a characteristic to the number of units in the population.

Example 1. A team of 6 workers earns 3, 3.2, 3.3, 3.5, 3.8 and 3.1 thousand rubles per month.

Find the average salary.
Solution: (3 + 3.2 + 3.3 + 3.5 + 3.8 + 3.1) / 6 = 19.9 / 6 ≈ 3.32 thousand rubles.
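A minimal Python sketch of Example 1, assuming only the standard library: it computes the simple arithmetic mean directly and cross-checks it with `statistics.fmean` (available from Python 3.8).

```python
from statistics import fmean

# Monthly salaries of the six workers from Example 1, thousand rubles
salaries = [3, 3.2, 3.3, 3.5, 3.8, 3.1]

# Simple arithmetic mean: sum of the values divided by the number of units
mean_manual = sum(salaries) / len(salaries)

# The standard library computes the same quantity
mean_lib = fmean(salaries)

print(round(mean_manual, 2))  # 3.32
```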

Weighted arithmetic mean

If the data set is large and is presented as a distribution series, the weighted arithmetic mean is calculated. This is how the weighted average price per unit of output is determined: the total cost of output (the sum of the products of each quantity and its unit price) is divided by the total quantity of output.

This can be written as the following formula:

$$\bar{x} = \frac{\sum_i x_i f_i}{\sum_i f_i}$$

Weighted arithmetic mean: the ratio of the sum of the products of each value of a characteristic and its frequency to the sum of all frequencies. It is used when the variants of the population under study occur an unequal number of times.

Example 2. Find the average monthly salary of the workshop workers.

The average wage is obtained by dividing the total wage bill by the total number of workers.

Answer: 3.35 thousand rubles.

Arithmetic mean for interval series

When calculating the arithmetic mean of an interval variation series, first find the midpoint of each interval as the half-sum of its lower and upper limits, then compute the mean of the whole series. For open intervals, the missing limit is set by assuming that the open interval has the same width as the adjacent interval.


Example 3. Determine the average age of evening students.

Averages calculated from interval series are approximate. The degree of their approximation depends on the extent to which the actual distribution of population units within the interval approaches uniform distribution.

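When calculating averages, not only absolute frequencies but also relative ones (frequencies expressed as shares of the total) can be used as weights:

$$\bar{x} = \frac{\sum_i x_i d_i}{\sum_i d_i}, \qquad d_i = \frac{f_i}{\sum_j f_j}$$

Since the shares d_i sum to 1 (or to 100 if expressed in percent), the denominator is simply 1 (or 100).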

The arithmetic mean has a number of properties that reveal its essence more fully and simplify calculations:

1. The product of the mean and the sum of the frequencies always equals the sum of the products of the variants and their frequencies:

$$\bar{x}\sum_i f_i = \sum_i x_i f_i$$

2. The arithmetic mean of a sum of varying quantities equals the sum of their arithmetic means:

$$\overline{x + y} = \bar{x} + \bar{y}$$

3. The algebraic sum of the deviations of the individual values of a characteristic from the mean is zero:

$$\sum_i (x_i - \bar{x}) f_i = 0$$

4. The sum of the squared deviations of the variants from the mean is smaller than the sum of the squared deviations from any other value A:

$$\sum_i (x_i - \bar{x})^2 f_i < \sum_i (x_i - A)^2 f_i, \qquad A \neq \bar{x}$$
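The four properties can be checked numerically. The sketch below uses a small hypothetical series (values and frequencies invented purely for illustration) and `fractions.Fraction`, so the equalities hold exactly rather than up to floating-point error. The helper names `weighted_mean` and `sum_sq` are illustrative.

```python
from fractions import Fraction

def weighted_mean(values, freqs):
    """Weighted arithmetic mean: sum(x*f) / sum(f), computed exactly."""
    return sum(v * w for v, w in zip(values, freqs)) / sum(freqs)

# Hypothetical series, invented only to illustrate the properties
x = [Fraction(2), Fraction(4), Fraction(9)]
y = [Fraction(1), Fraction(5), Fraction(6)]   # a second series paired with x
f = [3, 1, 2]                                 # frequencies

m = weighted_mean(x, f)
print(m)  # 14/3

# 1. mean * sum of frequencies == sum of the products x*f
assert m * sum(f) == sum(v * w for v, w in zip(x, f))

# 2. mean of (x + y) == mean of x + mean of y
xy = [a + b for a, b in zip(x, y)]
assert weighted_mean(xy, f) == weighted_mean(x, f) + weighted_mean(y, f)

# 3. weighted deviations from the mean sum to zero
assert sum((v - m) * w for v, w in zip(x, f)) == 0

# 4. the sum of squared deviations is smallest around the mean itself
def sum_sq(center):
    return sum((v - center) ** 2 * w for v, w in zip(x, f))

for other in (Fraction(4), Fraction(5), Fraction(0)):
    assert sum_sq(m) < sum_sq(other)
```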

Calculating the mean of an interval variation series differs slightly from the calculation for a discrete series. The difference is easy to understand: the characteristic under study is given as a range, "from ... to", rather than as a single value.

So, let's look at the features of the calculation using an example.

Example 1. The daily earnings of the company's workers are as follows.

Daily earnings of a worker, rub. | Number of workers, people
500-1000 | 15
1000-1500 | 30
1500-2000 | 80
2000-2500 | 60
2500-3000 | 25
Total | 210

The solution begins in the same way as the ordinary calculation of a mean.

We start by identifying the variants and the frequencies. Since we are looking for the average earnings per day, the variant is the first column and the frequency is the second. The data are given as absolute numbers in tabular form, so we use the weighted arithmetic mean formula. But this is where the similarities end and new steps appear.

Daily earnings of a worker, rub. (x) | Number of workers, people (f)
500-1000 | 15
1000-1500 | 30
1500-2000 | 80
2000-2500 | 60
2500-3000 | 25
Total | 210

The difficulty is that in an interval series each variant is given as an interval (500-1000, 2000-2500 and so on) rather than as a single number. To solve the problem, we must first carry out intermediate steps and only then calculate the mean using the basic formula.

What needs to be done? For the calculation, each variant must be represented by a single number instead of an interval. That number is the CENTRAL VALUE OF THE INTERVAL (the midpoint of the interval), found by adding the lower and upper limits of the interval and dividing by two.

Let's carry out the necessary calculations and enter the results in the table.

Daily earnings of a worker, rub. (x) | Number of workers, people (f) | Interval midpoint (x′)
500-1000 | 15 | 750
1000-1500 | 30 | 1250
1500-2000 | 80 | 1750
2000-2500 | 60 | 2250
2500-3000 | 25 | 2750
Total | 210 |

Having calculated the central values, we complete the table and substitute the totals into the weighted arithmetic mean formula, as in the earlier examples.

Daily earnings of a worker, rub. (x) | Number of workers, people (f) | Interval midpoint (x′) | x′f
500-1000 | 15 | 750 | 11250
1000-1500 | 30 | 1250 | 37500
1500-2000 | 80 | 1750 | 140000
2000-2500 | 60 | 2250 | 135000
2500-3000 | 25 | 2750 | 68750
Total | ∑f = 210 | | ∑x′f = 392500


$$\bar{x} = \frac{\sum x'f}{\sum f} = \frac{392\,500}{210} \approx 1\,869$$

As a result, the average daily wage of one worker is about 1,869 rubles.
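The steps above can be sketched in Python for the earnings data of Example 1. The code replaces each interval with its midpoint, applies the weighted arithmetic mean, and also shows that relative frequencies (shares of the total) used as weights give the same result. Variable names are illustrative.

```python
# Daily earnings intervals (rub.) and number of workers, from Example 1
intervals = [(500, 1000), (1000, 1500), (1500, 2000), (2000, 2500), (2500, 3000)]
workers = [15, 30, 80, 60, 25]

# Step 1: replace each interval by its central value (lower + upper) / 2
midpoints = [(lo + hi) / 2 for lo, hi in intervals]

# Step 2: weighted arithmetic mean of the midpoints, frequencies as weights
total_xf = sum(x * f for x, f in zip(midpoints, workers))
total_f = sum(workers)
mean = total_xf / total_f

# Relative frequencies (shares summing to 1) as weights give the same mean
shares = [f / total_f for f in workers]
mean_rel = sum(x * d for x, d in zip(midpoints, shares))

print(midpoints)          # [750.0, 1250.0, 1750.0, 2250.0, 2750.0]
print(total_xf, total_f)  # 392500.0 210
print(round(mean))        # 1869
```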

This is the solution for an interval series in which all intervals are closed. Quite often, however, the first and last intervals are open. In that case the central value cannot be calculated directly, but there are two ways to handle it.

Example 2. The data below show the length of service of the enterprise's personnel. Calculate the average length of service per employee.

Length of service, years | Number of employees, people
up to 3 | 19
3-6 | 21
6-9 | 15
9-12 | 10
12 or more | 5
Total | 70

The principle of the solution stays exactly the same; only the first and last intervals have changed. "Up to 3 years" and "12 years or more" are open intervals, which raises the question of how to find their central values.

There are two ways to deal with this situation:

  1. Since all the closed intervals have a width of 3 years, we can assume the open intervals have the same width. The interval "up to 3" then becomes 0-3, with central value (0 + 3)/2 = 1.5 years, and "12 or more" becomes 12-15, with central value (12 + 15)/2 = 13.5 years. The remaining central values are calculated as before. The result is:
Length of service, years (x) | Number of employees, people (f) | Interval midpoint (x′) | x′f
up to 3 | 19 | 1.5 | 28.5
3-6 | 21 | 4.5 | 94.5
6-9 | 15 | 7.5 | 112.5
9-12 | 10 | 10.5 | 105.0
12 or more | 5 | 13.5 | 67.5
Total | ∑f = 70 | | ∑x′f = 408.0

The average length of service is 5.83 years.

  2. Take the stated boundary of the open interval as its central value, without additional calculation: 3 for the interval "up to 3" and 12 for "12 or more". This method suits series with unequal intervals, where it may be hard to guess the width of an open interval. Recalculating the problem with these values:
Length of service, years (x) | Number of employees, people (f) | Interval midpoint (x′) | x′f
up to 3 | 19 | 3 | 57.0
3-6 | 21 | 4.5 | 94.5
6-9 | 15 | 7.5 | 112.5
9-12 | 10 | 10.5 | 105.0
12 or more | 5 | 12 | 60.0
Total | ∑f = 70 | | ∑x′f = 429.0

The average length of service is 6.13 years.
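Both ways of closing the open intervals can be compared in a short sketch using the length-of-service data of Example 2. The helper `weighted_mean` is a hypothetical name introduced here, not part of any library.

```python
# Length of service (years) and number of employees, from Example 2
freqs = [19, 21, 15, 10, 5]           # up to 3, 3-6, 6-9, 9-12, 12 or more
closed_midpoints = [4.5, 7.5, 10.5]   # midpoints of 3-6, 6-9, 9-12
width = 3                             # width of every closed interval

def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(x*f) / sum(f)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Method 1: close the open intervals with the neighbouring width (0-3 and 12-15)
first = (3 - width + 3) / 2           # 1.5
last = (12 + 12 + width) / 2          # 13.5
method1 = [first] + closed_midpoints + [last]

# Method 2: use the stated boundary itself as the central value
method2 = [3] + closed_midpoints + [12]

print(round(weighted_mean(method1, freqs), 2))  # 5.83
print(round(weighted_mean(method2, freqs), 2))  # 6.13
```

The two answers differ by about 0.3 years, which shows how sensitive the mean of an interval series can be to the assumption made about open intervals.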

Homework

  1. Calculate the average sown area per farm using the following data.
Sown area, ha | Number of farms
0-20 | 64
20-40 | 58
40-60 | 32
60-80 | 21
80-100 | 12
Total | 187
  2. Calculate the average age of an enterprise employee using the following data.
Age of personnel, years | Number of employees, people
under 18 | 7
18-25 | 68
25-40 | 79
40-55 | 57
55 and older | 31
Total | 242

Now you can calculate the average in an interval variation series!

According to a sample survey, the depositors of the city's Sberbank were grouped by deposit size:

Determine:

1) the range of variation;

2) the average deposit size;

3) the average linear deviation;

4) the variance (dispersion);

5) the standard deviation;

6) the coefficient of variation of deposits.

Solution:

This distribution series contains open intervals. In such series, the width of the first interval is conventionally taken equal to the width of the next interval, and the width of the last interval equal to the width of the previous one.

The width of the second interval is 200, so the first interval also has a width of 200. The width of the penultimate interval is 200, so the last interval also has a width of 200.

1) The range of variation is the difference between the largest and smallest values of the attribute:

$$R = x_{\max} - x_{\min} = 1200 - 200 = 1000$$

The range of variation in deposit size is 1000 rubles.

2) The average deposit size is found using the weighted arithmetic mean formula.

First, determine a discrete value of the attribute for each interval: using the simple arithmetic mean formula, we find the interval midpoints.

The midpoint of the first interval is:

$$x_1 = \frac{200 + 400}{2} = 300,$$

the second is 500, and so on.

Let's enter the calculation results in the table:

Deposit amount, rub. | Number of depositors (f) | Interval midpoint (x) | xf
200-400 | 32 | 300 | 9600
400-600 | 56 | 500 | 28000
600-800 | 120 | 700 | 84000
800-1000 | 104 | 900 | 93600
1000-1200 | 88 | 1100 | 96800
Total | 400 | - | 312000

The average deposit in the city's Sberbank is 780 rubles:

$$\bar{x} = \frac{\sum xf}{\sum f} = \frac{312\,000}{400} = 780$$

3) The average linear deviation is the arithmetic mean of the absolute deviations of the individual values of a characteristic from the overall mean:

$$\bar{d} = \frac{\sum |x - \bar{x}|\, f}{\sum f}$$

The average linear deviation of an interval distribution series is calculated as follows:

1. Calculate the weighted arithmetic mean, as shown in item 2).

2. Determine the absolute deviations from the mean: |x − x̄|.

3. Multiply the deviations by the frequencies: |x − x̄| · f.

4. Sum the weighted deviations, ignoring the sign: ∑|x − x̄| · f.

5. Divide the sum of the weighted deviations by the sum of the frequencies: ∑|x − x̄| · f / ∑f.

It is convenient to use the calculation data table:

Deposit amount, rub. | Number of depositors (f) | Interval midpoint (x) | x − x̄ | Absolute deviation | Absolute deviation × f
200-400 | 32 | 300 | -480 | 480 | 15360
400-600 | 56 | 500 | -280 | 280 | 15680
600-800 | 120 | 700 | -80 | 80 | 9600
800-1000 | 104 | 900 | 120 | 120 | 12480
1000-1200 | 88 | 1100 | 320 | 320 | 28160
Total | 400 | - | - | - | 81280

The average linear deviation of the deposits of Sberbank clients is 81 280 / 400 = 203.2 rubles.

4) The variance (dispersion) is the arithmetic mean of the squared deviations of each attribute value from the arithmetic mean.

In an interval distribution series the variance is calculated using the formula:

$$\sigma^2 = \frac{\sum (x - \bar{x})^2 f}{\sum f}$$

The procedure is as follows:

1. Determine the weighted arithmetic mean, as shown in item 2).

2. Find the deviations from the mean: x − x̄.

3. Square the deviation of each variant from the mean: (x − x̄)².

4. Multiply the squared deviations by the weights (frequencies): (x − x̄)² · f.

5. Sum the resulting products: ∑(x − x̄)² · f.

6. Divide this sum by the sum of the weights (frequencies): ∑(x − x̄)² · f / ∑f.

Let's put the calculations in a table:

Deposit amount, rub. | Number of depositors (f) | Interval midpoint (x) | x − x̄ | (x − x̄)² | (x − x̄)² × f
200-400 | 32 | 300 | -480 | 230400 | 7372800
400-600 | 56 | 500 | -280 | 78400 | 4390400
600-800 | 120 | 700 | -80 | 6400 | 768000
800-1000 | 104 | 900 | 120 | 14400 | 1497600
1000-1200 | 88 | 1100 | 320 | 102400 | 9011200
Total | 400 | - | - | - | 23040000
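All six indicators requested for the Sberbank example can be computed from the same table. This sketch assumes the open intervals have already been closed to 200-400 and 1000-1200, as described in the solution. The last three indicators follow from the table total: 23 040 000 / 400 = 57 600, √57 600 = 240, and 240 / 780 ≈ 30.77%.

```python
import math

# Deposit intervals (open intervals already closed with width 200) and depositors
intervals = [(200, 400), (400, 600), (600, 800), (800, 1000), (1000, 1200)]
f = [32, 56, 120, 104, 88]

x = [(lo + hi) / 2 for lo, hi in intervals]   # interval midpoints
n = sum(f)

range_r = intervals[-1][1] - intervals[0][0]                      # 1) R = x_max - x_min
mean = sum(xi * fi for xi, fi in zip(x, f)) / n                   # 2) weighted mean
mad = sum(abs(xi - mean) * fi for xi, fi in zip(x, f)) / n        # 3) average linear deviation
var = sum((xi - mean) ** 2 * fi for xi, fi in zip(x, f)) / n      # 4) variance
sd = math.sqrt(var)                                               # 5) standard deviation
cv = sd / mean * 100                                              # 6) coefficient of variation, %

print(range_r, mean, mad, var, sd, round(cv, 2))
# 1000 780.0 203.2 57600.0 240.0 30.77
```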

In statistics, when analyzing a phenomenon or process, it is often necessary to consider not only the average levels of the indicators under study but also the scatter, or variation, of the values of individual units, which is an important characteristic of the population.

Stock prices, supply and demand volumes, and interest rates are especially subject to variation across different periods and places.

The main indicators of variation are the range, the variance (dispersion), the standard deviation and the coefficient of variation.

The range of variation is the difference between the maximum and minimum values of the characteristic: R = x_max − x_min. Its drawback is that it captures only the limits of variation and does not reflect variability within those limits.

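The variance (dispersion) does not have this drawback. It is calculated as the mean of the squared deviations of the values of the characteristic from their mean (simple and weighted forms):

$$\sigma^2 = \frac{\sum (x - \bar{x})^2}{n}, \qquad \sigma^2 = \frac{\sum (x - \bar{x})^2 f}{\sum f}$$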

A simplified way to calculate the variance uses the following formulas (simple and weighted):

$$\sigma^2 = \frac{\sum x^2}{n} - \bar{x}^2, \qquad \sigma^2 = \frac{\sum x^2 f}{\sum f} - \bar{x}^2$$

Examples of applying these formulas are given in Problems 1 and 2 below.
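The definitional and simplified variance formulas always give the same result. A minimal check on hypothetical data (chosen only for illustration), using exact `Fraction` arithmetic and the standard library's `statistics.pvariance` (population variance) as a cross-check:

```python
import statistics
from fractions import Fraction

# Hypothetical ungrouped data, chosen only for illustration
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = Fraction(sum(data), n)

# Definition: mean of the squared deviations from the mean
var_def = sum((Fraction(x) - mean) ** 2 for x in data) / n
# Simplified formula: mean of the squares minus the square of the mean
var_short = Fraction(sum(x * x for x in data), n) - mean ** 2

# The same data as a grouped series: values with their frequencies
values = [2, 4, 5, 7, 9]
freqs = [1, 3, 2, 1, 1]
total_f = sum(freqs)
w_mean = Fraction(sum(x * f for x, f in zip(values, freqs)), total_f)
var_weighted_short = Fraction(sum(x * x * f for x, f in zip(values, freqs)), total_f) - w_mean ** 2

print(var_def, var_short, var_weighted_short)  # 4 4 4
```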

A widely used indicator in practice is the standard deviation:

$$\sigma = \sqrt{\sigma^2}$$

The standard deviation is defined as the square root of the variance and has the same dimension as the characteristic being studied.

The considered indicators allow us to obtain the absolute value of the variation, i.e. evaluate it in units of measurement of the characteristic being studied. Unlike them, the coefficient of variation measures variability in relative terms - relative to the average level, which in many cases is preferable.

The coefficient of variation is calculated as:

$$V = \frac{\sigma}{\bar{x}} \cdot 100\%$$

Examples of solving problems on the topic “Indicators of variation in statistics”

Problem 1. To study the influence of advertising on the average monthly deposit in the region's banks, 2 banks were examined. The following results were obtained:

Determine:
1) for each bank: a) the average monthly deposit; b) the variance of deposits;
2) the average monthly deposit for the two banks together;
3) the variance of deposits for the 2 banks due to advertising;
4) the variance of deposits for the 2 banks due to all factors other than advertising;
5) the total variance, using the addition rule;
6) the coefficient of determination;
7) the empirical correlation ratio.

Solution

1) Let's build a calculation table for the bank with advertising. To determine the average monthly deposit, we find the interval midpoints. The width of the open (first) interval is conventionally taken equal to the width of the adjacent (second) interval.

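We find the average deposit size using the weighted arithmetic mean formula:

$$\bar{x} = \frac{\sum x'f}{\sum f} = \frac{29\,000}{50} = 580 \text{ rub.}$$

We find the variance of deposits using the formula:

$$\sigma^2 = \frac{\sum (x' - \bar{x})^2 f}{\sum f} = \frac{23\,400}{50} = 468$$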

We perform the same steps for the bank without advertising:

2) The average deposit size for the two banks together is:

$$\bar{x} = \frac{580 \cdot 50 + 542.8 \cdot 50}{100} = 561.4 \text{ rub.}$$

3) We find the variance of deposits for the two banks due to advertising using the formula for the variance of an alternative (two-valued) attribute, σ² = pq. Here p = 0.5 is the share of depositors exposed to advertising and q = 1 − p = 0.5, so σ² = 0.5 · 0.5 = 0.25.

4) Since the share of the other factors is also 0.5, the variance of deposits for the two banks due to all factors other than advertising is likewise 0.25.

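5) Determine the total variance using the addition rule. The mean of the within-group variances (variation due to the remaining factors) is:

$$\sigma^2_{\text{rest}} = \frac{468 \cdot 50 + 636.16 \cdot 50}{100} = 552.08$$

The between-group variance (variation due to the factor, advertising) is:

$$\sigma^2_{\text{fact}} = \frac{(580 - 561.4)^2 \cdot 50 + (542.8 - 561.4)^2 \cdot 50}{100} = \frac{34\,596}{100} = 345.96$$

The total variance is their sum:

$$\sigma^2 = \sigma^2_{\text{fact}} + \sigma^2_{\text{rest}} = 345.96 + 552.08 = 898.04$$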

6) Coefficient of determination:

$$\eta^2 = \frac{\sigma^2_{\text{fact}}}{\sigma^2} = \frac{345.96}{898.04} \approx 0.39,$$

that is, about 39% of the variation in deposit size is explained by advertising.

7) Empirical correlation ratio:

$$\eta = \sqrt{\eta^2} = \sqrt{0.39} \approx 0.62,$$

so the relationship is fairly close.
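The addition rule can be reproduced from the group summaries quoted in the solution (means 580 and 542.8, variances 468 and 636.16, 50 depositors in each bank). This sketch computes the between-group (factor) and mean within-group (residual) variances, their sum, and the resulting η² and η; it does not reproduce steps 3) and 4), which use the alternative-attribute formula.

```python
import math

# Group summaries from Problem 1: (mean deposit, within-group variance, depositors)
groups = [
    (580.0, 468.0, 50),    # bank with advertising
    (542.8, 636.16, 50),   # bank without advertising
]

n = sum(size for _, _, size in groups)
overall_mean = sum(m * size for m, _, size in groups) / n

# Mean of the within-group variances: variation due to the remaining factors
var_rest = sum(v * size for _, v, size in groups) / n
# Between-group variance: variation of the group means, i.e. due to advertising
var_fact = sum((m - overall_mean) ** 2 * size for m, _, size in groups) / n

total_var = var_fact + var_rest      # addition rule
eta_sq = var_fact / total_var        # coefficient of determination
eta = math.sqrt(eta_sq)              # empirical correlation ratio

print(round(overall_mean, 1), round(var_rest, 2), round(var_fact, 2), round(total_var, 2))
# 561.4 552.08 345.96 898.04
print(round(eta_sq, 3), round(eta, 3))
# 0.385 0.621
```

Unrounded, η² is about 0.385 (the text rounds it to 0.39), and η is about 0.62 either way.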

Problem 2. Enterprises are grouped by the value of their marketable output:

Determine: 1) the variance of marketable output; 2) the standard deviation; 3) the coefficient of variation.

Solution

1) The data form an interval distribution series, which must be converted to a discrete one by finding the interval midpoints (x′). For closed intervals, the midpoint is the simple arithmetic mean of the limits. For an interval with only an upper limit, subtract half the width of the next interval from that limit: 200 − (400 − 200)/2 = 100.

For an interval with only a lower limit, add half the width of the previous interval to that limit: 800 + (800 − 600)/2 = 900.

We calculate the average value of marketable output using the formula:

$$\bar{x} = k \cdot \frac{\sum \frac{x' - a}{k}\, f}{\sum f} + a$$

Here a = 500 is the variant with the highest frequency, and k = 600 − 400 = 200 is the width of that interval. Let's put the results in a table:

So the average value of marketable output for the period under study is:

$$\bar{x} = \frac{-5}{37} \cdot 200 + 500 \approx 472.97 \text{ thousand rub.}$$

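2) We find the variance using the following formula:

$$\sigma^2 = k^2 \cdot \frac{\sum \left(\frac{x' - a}{k}\right)^2 f}{\sum f} - (\bar{x} - a)^2 = \frac{33}{37} \cdot 200^2 - (472.97 - 500)^2 = 35\,675.67 - 730.62 = 34\,945.05$$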

3) Standard deviation:

$$\sigma = \sqrt{\sigma^2} = \sqrt{34\,945.05} \approx 186.94 \text{ thousand rub.}$$

4) Coefficient of variation:

$$V = \frac{\sigma}{\bar{x}} \cdot 100\% = \frac{186.94}{472.97} \cdot 100\% \approx 39.52\%$$
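The same shortcut (measuring deviations from a conditional zero a in units of the interval width k) can be checked on the Sberbank deposit data, whose full table is given above. This sketch takes a = 700 (the midpoint with the highest frequency, 120) and k = 200, and reproduces the mean of 780 and the variance of 57 600 obtained directly.

```python
# Deposit midpoints and frequencies from the Sberbank example
x = [300, 500, 700, 900, 1100]
f = [32, 56, 120, 104, 88]

a = 700   # midpoint of the interval with the highest frequency (conditional zero)
k = 200   # interval width

n = sum(f)
u = [(xi - a) / k for xi in x]                        # scaled deviations from a
m1 = sum(ui * fi for ui, fi in zip(u, f)) / n         # first moment of u
m2 = sum(ui ** 2 * fi for ui, fi in zip(u, f)) / n    # second moment of u

mean = k * m1 + a                          # mean = k * m1 + a
variance = k ** 2 * m2 - (mean - a) ** 2   # variance = k^2 * m2 - (mean - a)^2

print(mean, variance)  # 780.0 57600.0
```

The point of the method is that the scaled deviations u are small integers, which kept manual calculation simple before computers.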

Example: Determine the average age of a part-time student using the data in the following table:

Age of students, years (x) | Number of students, people (f) | Interval midpoint (x′) | x′f
26 and older
Total:

To calculate the mean of an interval series, first find the midpoint of each interval as the half-sum of its lower and upper limits, then calculate the mean using the weighted arithmetic mean formula.

This example has equal intervals, with the first and last intervals open.

Answer: The average student age is 22.6 years, or approximately 23 years.

The harmonic mean has a more complex structure than the arithmetic mean. It is used when the statistical data do not give the frequencies of the individual values of the attribute directly but instead give the products of the values and their frequencies. As a power mean, the harmonic mean has the form:

$$\bar{x}_h = \left(\frac{\sum x^{-1}}{n}\right)^{-1}$$

Depending on how the source data are presented, the harmonic mean is calculated as simple or weighted. If the data are not grouped, the simple harmonic mean is used:

$$\bar{x}_h = \frac{n}{\sum \frac{1}{x}}$$

It is used, for example, to determine the average labor or material costs per unit of output across several enterprises.

For grouped data, the weighted harmonic mean is used:

$$\bar{x}_h = \frac{\sum M}{\sum \frac{M}{x}}, \qquad M = xf$$
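A sketch of both forms of the harmonic mean. The weighted form is applied to the daily-earnings example, treating the group totals M = x′f as known and the frequencies as unknown; it recovers the same 1869 rubles as the weighted arithmetic mean. The simple form is shown on hypothetical unit costs at two enterprises with equal total costs, and cross-checked with `statistics.harmonic_mean`. The function names are illustrative.

```python
from statistics import harmonic_mean

def harmonic_simple(values):
    """Simple harmonic mean: n / sum(1/x)."""
    return len(values) / sum(1 / v for v in values)

def harmonic_weighted(values, totals):
    """Weighted harmonic mean: sum(M) / sum(M/x), where M = x*f is known per group."""
    return sum(totals) / sum(m / v for m, v in zip(totals, values))

# Earnings midpoints x' and group totals M = x'f from the interval example;
# the frequencies f are treated as unknown and only M is used
x = [750, 1250, 1750, 2250, 2750]
M = [11250, 37500, 140000, 135000, 68750]

print(round(harmonic_weighted(x, M)))  # 1869

# Hypothetical unit costs at two enterprises with equal total costs
costs = [2, 6]
print(harmonic_simple(costs), harmonic_mean(costs))
```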

The geometric mean is used when the total volume of the averaged attribute is a multiplicative quantity, i.e. it is obtained not by summing but by multiplying the individual values of the characteristic:

$$\bar{x}_g = \sqrt[n]{x_1 x_2 \cdots x_n}$$

The weighted form of the geometric mean is not used in practical calculations.
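A minimal sketch of the geometric mean for a multiplicative quantity, using hypothetical year-on-year growth factors. Applying the geometric mean factor n times reproduces the total growth, which the arithmetic mean does not. `math.prod` and `statistics.geometric_mean` require Python 3.8 or later.

```python
import math
from statistics import geometric_mean

# Hypothetical year-on-year growth factors (1.10 means +10%)
growth = [1.10, 1.20, 1.05]
n = len(growth)

# Geometric mean: n-th root of the product of the values
g = math.prod(growth) ** (1 / n)

# The library function computes the same quantity (via logarithms)
g_lib = geometric_mean(growth)

total_growth = math.prod(growth)   # overall growth over the whole period
arith = sum(growth) / n            # arithmetic mean, for comparison

print(round(total_growth, 4), round(g ** n, 4), round(arith ** n, 4))
```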

The mean square (root mean square) is used when, after the individual values of a characteristic are replaced by their mean, the sum of the squares of the original values must remain unchanged.

Its main area of application is measuring how much the individual values of a characteristic fluctuate around the arithmetic mean (the standard deviation). The mean square is also used to calculate the average of a characteristic expressed in square or cubic units (for example, the average size of square plots, or the average diameter of pipes, trunks, etc.).

The root mean square is calculated in two forms, simple and weighted:

$$\bar{x}_{sq} = \sqrt{\frac{\sum x^2}{n}}, \qquad \bar{x}_{sq} = \sqrt{\frac{\sum x^2 f}{\sum f}}$$

All power means differ from one another only in the value of the exponent: the higher the exponent, the larger the mean:

$$\bar{x}_h \le \bar{x}_g \le \bar{x} \le \bar{x}_{sq}$$

This property of power means is called the majorance property of means.
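The majorance property can be illustrated with a generic power mean of order p, where p = −1, 0, 1 and 2 give the harmonic, geometric (as the limiting case), arithmetic and quadratic means. The function `power_mean` and the data are hypothetical, for illustration only; all values must be positive.

```python
import math

def power_mean(values, p):
    """Simple power mean of order p; p = 0 is taken as the geometric-mean limit.

    Assumes all values are positive.
    """
    n = len(values)
    if p == 0:
        return math.exp(sum(math.log(v) for v in values) / n)
    return (sum(v ** p for v in values) / n) ** (1 / p)

data = [2, 8]  # hypothetical positive values

harmonic = power_mean(data, -1)   # 2 / (1/2 + 1/8) = 3.2
geometric = power_mean(data, 0)   # sqrt(2 * 8) = 4, up to rounding
arithmetic = power_mean(data, 1)  # (2 + 8) / 2 = 5
quadratic = power_mean(data, 2)   # sqrt((4 + 64) / 2), about 5.83

# Majorance: a larger exponent never gives a smaller mean
assert harmonic <= geometric <= arithmetic <= quadratic
```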