General population and sampling method. Sample. Types of samples. Sampling error calculation. Full sample

Statistical population - a set of units characterized by mass character, typicality, qualitative homogeneity and the presence of variation.

A statistical population consists of materially existing objects (employees, enterprises, countries, regions) and is the object of statistical study.

Unit of the population - each individual element of a statistical population.

The same statistical population can be homogeneous in one characteristic and heterogeneous in another.

Qualitative homogeneity - similarity of all units of the population with respect to some characteristic and dissimilarity with respect to all others.

In a statistical population, the differences between one unit and another are most often quantitative. Quantitative changes in the values of a characteristic across different units of a population are called variation.

Variation of a characteristic - a quantitative change in a characteristic (for a quantitative characteristic) in passing from one unit of the population to another.

Characteristic (attribute) - a property or distinctive feature of units, objects and phenomena that can be observed or measured. Characteristics are divided into quantitative and qualitative. The diversity and variability of the value of a characteristic across individual units of a population is called variation.

Attributive (qualitative) characteristics cannot be expressed numerically (population composition by gender). Quantitative characteristics have a numerical expression (population composition by age).

Indicator - a generalizing quantitative and qualitative description of some property of units or of the population as a whole under specific conditions of time and place.

A system of indicators is a set of indicators that comprehensively reflect the phenomenon being studied.

For example, when wages are studied:
  • Characteristic - wages
  • Statistical population - all employees
  • Unit of the population - each employee
  • Qualitative homogeneity - accrued wages
  • Variation of the characteristic - a series of numbers

Population and sample from it

The starting point is a set of data obtained by measuring one or more characteristics. The set of objects actually observed, represented statistically by a series of observations of a random variable, is the sample, and the hypothetically existing (conjectural) set is the general population. The population may be finite (number of observations N = const) or infinite (N = ∞), whereas a sample from a population is always the result of a limited number of observations. The number of observations forming a sample is called the sample size. If the sample size is sufficiently large (n → ∞), the sample is considered large; otherwise it is called a sample of limited size. A sample is considered small if, when measuring a one-dimensional random variable, the sample size does not exceed 30 (n ≤ 30), or if, when measuring several (k) characteristics simultaneously in a multidimensional space, the ratio of n to k does not exceed 10 (n/k < 10). The sample forms a variation series if its members are order statistics, i.e. the sample values of the random variable X are arranged in ascending order (ranked); the values of the characteristic are then called variants.

Example. The very same randomly selected set of objects, the commercial banks of one administrative district of Moscow, can be regarded as a sample from the general population of all commercial banks in that district, as a sample from the general population of all commercial banks in Moscow, as a sample from the commercial banks of the country, and so on.

Basic methods of organizing sampling

The reliability of statistical conclusions and the meaningful interpretation of results depend on the representativeness of the sample, i.e. on how completely and adequately it reflects the properties of the general population with respect to which the sample can be considered representative. The statistical properties of a population can be studied in two ways: by continuous (complete) observation and by non-continuous observation. Continuous observation examines all units of the population under study, while partial (sample) observation examines only a part of them.

There are five main ways to organize sample observation:

1. simple random selection, in which objects are drawn at random from the general population (for example, using a table of random numbers or a random number generator), each possible sample having equal probability. Such samples are called properly random;

2. simple selection using a regular procedure, carried out by means of a mechanical rule (for example, date, day of the week, apartment number, letter of the alphabet, etc.); samples obtained in this way are called mechanical;

3. stratified selection, in which the general population of size N is divided into subpopulations or layers (strata) of sizes N_1, N_2, ..., N_k so that N_1 + N_2 + ... + N_k = N. Strata are groups of objects that are homogeneous in their statistical characteristics (for example, the population is divided into strata by age group or social class; enterprises, by industry). Such samples are called stratified (also typical or zoned);

4. serial selection, used to form serial or nested (cluster) samples. It is convenient when a "block" or series of objects has to be surveyed at once (for example, a batch of goods, products of a certain series, or the population within the territorial and administrative division of the country). The series can be selected purely randomly or mechanically. A complete examination of the selected batch of goods or of the entire territorial unit (a residential building or block) is then carried out;

5. combined (multistage) selection, which can combine several selection methods at once (for example, stratified and random, or random and mechanical); such a sample is called combined.

Types of selection

By type, selection is divided into individual, group and combined. In individual selection, individual units of the general population are selected into the sample; in group selection, qualitatively homogeneous groups (series) of units are selected; combined selection is a combination of the first two types.

By method, selection is divided into repeated and non-repeated.

Non-repeated selection is selection in which a unit that has entered the sample is not returned to the original population and does not take part in further selection; the number of units in the general population N therefore decreases during selection. In repeated selection, a unit that has entered the sample is, after registration, returned to the general population and thus retains an equal chance with the other units of being drawn again; the number of units in the general population N remains unchanged (this method is rarely used in socio-economic research). However, for large N (N → ∞) the formulas for non-repeated selection approach those for repeated selection, and the latter are more often used in practice (N = const).

Basic characteristics of the parameters of the general and sample population

The statistical conclusions of a study are based on the distribution of a random variable, and the observed values (x_1, x_2, ..., x_n) are called realizations of the random variable X (n is the sample size). The distribution of a random variable in the general population is theoretical and idealized, and its sample analogue is the empirical distribution. Some theoretical distributions are specified analytically, i.e. their parameters determine the value of the distribution function at every point of the space of possible values of the random variable. For a sample, the distribution function is difficult and sometimes impossible to determine, so the parameters are estimated from the empirical data and then substituted into the analytical expression describing the theoretical distribution. The assumption (or hypothesis) about the type of distribution may be statistically correct or erroneous. In any case, the empirical distribution reconstructed from the sample characterizes the true one only approximately. The most important distribution parameters are the mathematical expectation and the variance.

By their nature, distributions are continuous or discrete. The best-known continuous distribution is the normal distribution. The sample analogues of its parameters, the mathematical expectation $\mu$ and the variance $\sigma^2$, are the sample mean $\bar{x}$ and the empirical variance $s^2$. Among discrete distributions, the alternative (dichotomous) distribution is the one most frequently used in socio-economic research. The mathematical expectation of this distribution expresses the relative size (or share) of the units of the population that possess the characteristic being studied (it is denoted by the letter p); the share of the population that does not possess this characteristic is denoted by the letter q (q = 1 - p). The variance of the alternative distribution also has an empirical analogue.

Depending on the type of distribution and on the method of selecting population units, the characteristics of the distribution parameters are calculated differently. The main ones for theoretical and empirical distributions are given in Table 1.

The sampling fraction k_n is the ratio of the number of units in the sample to the number of units in the general population:

k_n = n/N.

The sample share w is the ratio of the number of units possessing the characteristic being studied, n_n, to the sample size n:

w = n_n/n.

Example. In a batch of goods containing 1000 units, a 5% sample (sampling fraction k_n = 0.05) amounts to 50 units in absolute terms (n = N*0.05 = 50); if 2 defective items are found in this sample, the sample defect share w is 0.04 (w = 2/50 = 0.04, or 4%).
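The two ratios in this example can be checked with a short sketch. The function names `sampling_fraction` and `sample_share` are illustrative assumptions, not names used in the source.

```python
def sampling_fraction(n: int, N: int) -> float:
    """Sampling fraction k_n = n / N (sample size over population size)."""
    if not 0 < n <= N:
        raise ValueError("expected 0 < n <= N")
    return n / N


def sample_share(n_with: int, n: int) -> float:
    """Sample share w = n_n / n (units with the characteristic over sample size)."""
    if not 0 <= n_with <= n:
        raise ValueError("expected 0 <= n_with <= n")
    return n_with / n


# Figures from the example: a batch of 1000 units, a 5% sample, 2 defective items.
N = 1000
n = round(N * 0.05)
k_n = sampling_fraction(n, N)
w = sample_share(2, n)
print(f"n = {n}, k_n = {k_n:.2f}, w = {w:.2f}")
```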

Since the sample population is different from the general population, there are sampling errors.

Table 1. Main parameters of the general and sample populations

Sampling errors

In any observation (both continuous and sample), errors of two types may occur: registration errors and representativeness errors. Registration errors can be random or systematic. Random errors arise from many different uncontrollable causes, are unintentional and usually balance each other out (for example, changes in instrument readings due to temperature fluctuations in the room).

Systematic errors are biased because they violate the rules for selecting objects for the sample (for example, deviations in measurements when changing the settings of the measuring device).

Example. To assess the social situation of the population of a city, it is planned to survey 25% of families. If every fourth apartment is selected by its number, there is a danger of selecting apartments of only one type (for example, one-room apartments), which will introduce a systematic error and distort the results; choosing apartment numbers by lot is preferable, since the error will then be random.

Representativeness errors are inherent only in sample observation; they cannot be avoided, and they arise because the sample does not completely reproduce the general population. The values of indicators obtained from the sample differ from the values of the same indicators in the general population (or obtained through continuous observation).

Sampling error is the difference between the value of a parameter in the general population and its sample value. For the mean of a quantitative characteristic it equals $\varepsilon_{\bar{x}} = |\mu - \bar{x}|$, and for the share (alternative characteristic) $\varepsilon_w = |p - w|$, where $\mu$ and p are the general mean and share, and $\bar{x}$ and w are their sample values.

Sampling errors are inherent only in sample observation. The larger these errors, the more the empirical distribution differs from the theoretical one. The parameters of the empirical distribution are random variables; sampling errors are therefore also random variables that can take different values for different samples, which is why it is customary to calculate the average error.

The average sampling error is the standard deviation of the sample mean from the mathematical expectation. Provided the principle of random selection is observed, this quantity depends primarily on the sample size and on the degree of variation of the characteristic: the larger n and the smaller the variation of the characteristic (and hence the value of $\sigma^2$), the smaller the average sampling error. The relationship between the variances of the general and sample populations is expressed by the formula:

$$\sigma^2 = s^2 \cdot \frac{n}{n - 1},$$

i.e. when n is sufficiently large, we can assume that $\sigma^2 \approx s^2$. The average sampling error shows the possible deviations of a sample parameter from the corresponding parameter of the general population. Table 2 gives expressions for calculating the average sampling error for different methods of organizing observation.

Table 2. Average error (m) of sample mean and proportion for different types of samples

where $\overline{\sigma^2}$ is the average of the within-group sample variances for a continuous characteristic;

$\overline{w(1 - w)}$ is the average of the within-group variances of the share;

r is the number of selected series, R is the total number of series;

$$\delta^2 = \frac{\sum_{i=1}^{r} (\bar{x}_i - \bar{x})^2}{r},$$

where $\bar{x}_i$ is the mean of the i-th series;

$\bar{x}$ is the overall mean for the entire sample for a continuous characteristic;

$$\delta_w^2 = \frac{\sum_{i=1}^{r} (w_i - \bar{w})^2}{r},$$

where $w_i$ is the share of the characteristic in the i-th series;

$\bar{w}$ is the total share of the characteristic across the entire sample.
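As an illustration of how the average error depends on the selection method, here is a minimal sketch for properly random sampling only. It assumes the usual textbook forms m = sqrt(s²/n) for repeated selection and m = sqrt(s²/n · (1 - n/N)) for non-repeated selection (and the same with w(1 - w) in place of s² for the share); the body of Table 2 is not reproduced in this text, so these expressions and all function names are assumptions.

```python
import math


def mean_error_repeated(s2: float, n: int) -> float:
    """Average error of the sample mean, repeated selection: sqrt(s^2 / n)."""
    return math.sqrt(s2 / n)


def mean_error_nonrepeated(s2: float, n: int, N: int) -> float:
    """Average error of the sample mean, non-repeated selection,
    with the finite-population factor (1 - n/N)."""
    return math.sqrt(s2 / n * (1 - n / N))


def share_error_repeated(w: float, n: int) -> float:
    """Average error of the sample share, repeated selection."""
    return math.sqrt(w * (1 - w) / n)


def share_error_nonrepeated(w: float, n: int, N: int) -> float:
    """Average error of the sample share, non-repeated selection."""
    return math.sqrt(w * (1 - w) / n * (1 - n / N))


# Hypothetical figures: sample variance 100, sample size 100.
m_rep = mean_error_repeated(100.0, 100)
m_small_N = mean_error_nonrepeated(100.0, 100, 400)
m_large_N = mean_error_nonrepeated(100.0, 100, 10**9)
print(m_rep, m_small_N, m_large_N)
```

For a very large N the non-repeated value approaches the repeated one, which agrees with the remark that the repeated-selection formulas are the ones most often used in practice.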

However, the magnitude of the average error can only be judged with a certain probability P (P ≤ 1). A. M. Lyapunov proved that the distribution of sample means, and hence of their deviations from the general mean, approximately obeys the normal distribution law when the number of observations is sufficiently large, provided that the general population has a finite mean and bounded variance.

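Mathematically, this statement is expressed for the mean as:

$$P\left(|\bar{x} - \mu| \le \Delta_{\bar{x}}\right) = \Phi(t), \qquad (1)$$

and for the share, expression (1) takes the form:

$$P\left(|w - p| \le \Delta_w\right) = \Phi(t), \qquad (2)$$

where

$$\Delta = t \cdot m \qquad (3)$$

is the marginal sampling error, a multiple of the average sampling error m, and the multiplier t is Student's t ("confidence coefficient"), proposed by W. S. Gosset (pseudonym "Student"); its values for different sample sizes are given in a special table. Here $\bar{x}$ and w are the sample mean and share, and $\mu$ and p are the general mean and share.

The values of the function Φ(t) for some values of t are: Φ(1) = 0.683; Φ(2) = 0.954; Φ(3) = 0.997.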

Therefore, expression (3) can be read as follows: with probability P = 0.683 (68.3%) it can be stated that the difference between the sample mean and the general mean will not exceed one average error m (t = 1); with probability P = 0.954 (95.4%), that it will not exceed two average errors m (t = 2); with probability P = 0.997 (99.7%), that it will not exceed three average errors m (t = 3). Thus, the probability that this difference exceeds three times the average error is given by the significance level and is no more than 0.3%.
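The probabilities quoted for t = 1, 2 and 3 can be reproduced numerically. The sketch below assumes that Φ(t) denotes the probability that a standard normal variable falls within ±t, which equals erf(t/√2); the function name `confidence_probability` is an illustrative assumption.

```python
import math


def confidence_probability(t: float) -> float:
    """P(|Z| <= t) for a standard normal Z, computed as erf(t / sqrt(2))."""
    return math.erf(t / math.sqrt(2))


# t = 1, 2, 3 are the multiples discussed in the text; 1.96, 2.58 and 3.29
# are the large-sample values quoted for the 95%, 99% and 99.9% levels.
for t in (1, 2, 3, 1.96, 2.58, 3.29):
    print(f"t = {t}: P = {confidence_probability(t):.4f}")
```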

Table 3 gives formulas for calculating the marginal sampling error.

Table 3. Marginal error (Δ) of the sample for the mean and the share (p) for different types of sample observation

Generalization of sample results to the population

The ultimate goal of sample observation is to characterize the general population. With small sample sizes, the empirical estimates of the parameters ($\bar{x}$ and w) may deviate significantly from their true values ($\mu$ and p). It is therefore necessary to establish the limits within which the true values ($\mu$ and p) lie, given the sample values of the parameters ($\bar{x}$ and w).

The confidence interval of a parameter θ of the general population is a random range of values of this parameter which, with a probability close to 1 (the reliability), contains the true value of the parameter.

The marginal sampling error Δ makes it possible to determine the limiting values of the characteristics of the general population and their confidence intervals, which are:

$$\bar{x} - \Delta_{\bar{x}} \le \mu \le \bar{x} + \Delta_{\bar{x}}, \qquad w - \Delta_w \le p \le w + \Delta_w.$$

The lower bound of the confidence interval is obtained by subtracting the marginal error from the sample mean (share), and the upper bound by adding it.

The confidence interval for the mean uses the marginal sampling error and, for a given confidence level, is determined by the formula:

$$\bar{x} - t \cdot m \le \mu \le \bar{x} + t \cdot m.$$

This means that with a given probability P, which is called the confidence level and is uniquely determined by the value of t, it can be stated that the true value of the mean lies in the range from $\bar{x} - \Delta_{\bar{x}}$ to $\bar{x} + \Delta_{\bar{x}}$, and the true value of the share lies in the range from $w - \Delta_w$ to $w + \Delta_w$.

When calculating the confidence interval for the three standard confidence levels P = 95%, P = 99% and P = 99.9%, the value of t is taken from the Student's t-table in the Appendix, depending on the number of degrees of freedom. If the sample size is sufficiently large, the values of t corresponding to these probabilities are 1.96, 2.58 and 3.29. Thus, the marginal sampling error makes it possible to determine the limiting values of the characteristics of the population and their confidence intervals.

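Extending the results of sample observation to the general population in socio-economic research has its own peculiarities, since it requires that all its types and groups be fully represented. The basis for the possibility of such an extension is the calculation of the relative error:

$$\Delta_{\%} = \frac{\Delta_{\bar{x}}}{\bar{x}} \cdot 100\% \quad \text{or} \quad \Delta_{\%} = \frac{\Delta_w}{w} \cdot 100\%,$$

where $\Delta_{\%}$ is the relative marginal sampling error, and $\Delta_{\bar{x}}$ and $\Delta_w$ are the marginal errors of the mean and of the share.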

There are two main methods for extending sample observation to the general population: direct recalculation and the coefficient method.

The essence of direct recalculation is multiplying the sample mean $\bar{x}$ by the size of the population.

Example. Suppose the average number of toddlers per young family in the city has been estimated by the sampling method as 1.2. If there are 1000 young families in the city, the number of places required in municipal nurseries is obtained by multiplying this average by the size of the general population N = 1000, i.e. 1200 places.

The coefficient method is advisable when sample observation is carried out in order to refine the data of continuous observation.

The following formula is used:

where all variables are the population size:

Required sample size

Table 4. Required sample size (n) for different types of sample observation organization

When planning a sample observation with a predetermined permissible sampling error, the required sample size must be estimated correctly. This size can be determined from the permissible error of the sample observation for a given probability that guarantees the permissible error level (taking into account the method of organizing the observation). Formulas for the required sample size n are easily obtained directly from the formulas for the marginal sampling error. Thus, from the expression for the marginal error:

$$\Delta = t \sqrt{\frac{\sigma^2}{n}}$$

the sample size n is determined directly:

$$n = \frac{t^2 \sigma^2}{\Delta^2}.$$

This formula shows that as the marginal sampling error Δ decreases, the required sample size increases substantially; it is proportional to the variance and to the square of Student's t.

For a specific method of organizing observation, the required sample size is calculated using the formulas given in Table 4.
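The dependence of the required sample size on the permissible error can be sketched for repeated selection. The sketch assumes the form n = t²σ²/Δ², which is consistent with the statement that n is proportional to the variance and to the square of t; the function name and the trial figures (t = 2, σ = 10 days, Δ = 3 days, borrowed from Example 4 below) are illustrative.

```python
import math


def required_n_repeated(t: float, sigma: float, delta: float) -> int:
    """Required sample size for repeated selection: n = t^2 * sigma^2 / delta^2,
    rounded up to the next whole unit."""
    if delta <= 0:
        raise ValueError("the permissible error must be positive")
    return math.ceil(t**2 * sigma**2 / delta**2)


n_3_days = required_n_repeated(t=2, sigma=10, delta=3)
n_1_5_days = required_n_repeated(t=2, sigma=10, delta=1.5)
print(n_3_days, n_1_5_days)
```

Halving the permissible error roughly quadruples the required sample size, since Δ enters the formula squared.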

Practical calculation examples

Example 1. Calculation of the mean value and confidence interval for a continuous quantitative characteristic.

To assess the speed of settlement with creditors, a random sample of 10 payment documents was drawn at the bank. The settlement times turned out to be (in days): 10; 3; 15; 15; 22; 7; 8; 1; 19; 20.

With probability P = 0.954, determine the marginal error Δ of the sample mean and the confidence limits of the mean settlement time.

Solution. The mean is calculated using the formula from Table 1 for the sample:

$$\bar{x} = \frac{\sum x_i}{n} = \frac{120}{10} = 12.0 \text{ days}.$$

The variance is calculated using the formula from Table 1:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{478}{9} \approx 53.1.$$

The standard deviation is $s = \sqrt{53.1} \approx 7.3$ days.

The average error is calculated using the formula:

$$m = \frac{s}{\sqrt{n}} = \frac{7.3}{\sqrt{10}} \approx 2.3 \text{ days},$$

i.e. the mean is $\bar{x}$ ± m = 12.0 ± 2.3 days.

The reliability of the mean was

$$t = \frac{\bar{x}}{m} = \frac{12.0}{2.3} \approx 5.2.$$

We calculate the marginal error using the formula from Table 3 for repeated sampling, since the population size is unknown, for the confidence level P = 0.954:

$$\Delta = t \cdot m = 2 \cdot 2.3 = 4.6 \text{ days}.$$

Thus, the mean is $\bar{x}$ ± Δ = $\bar{x}$ ± 2m = 12.0 ± 4.6, i.e. its true value lies in the range from 7.4 to 16.6 days.

The Student's t-table in the Appendix shows that for n - 1 = 10 - 1 = 9 degrees of freedom the value obtained is reliable at a significance level α ≤ 0.001, i.e. the resulting mean differs significantly from 0.
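The calculation in Example 1 can be retraced with the standard library. This is a sketch under the assumption that the variance uses the n - 1 denominator (which is what reproduces the stated 12.0 ± 2.3); the variable names are illustrative.

```python
import math
import statistics

# Settlement times in days, taken from Example 1.
days = [10, 3, 15, 15, 22, 7, 8, 1, 19, 20]

n = len(days)
mean = statistics.mean(days)       # sample mean
s = statistics.stdev(days)         # standard deviation with the n - 1 denominator
m = s / math.sqrt(n)               # average error of the mean (repeated selection)
t = 2                              # multiplier for P = 0.954
delta = t * m                      # marginal error
lower, upper = mean - delta, mean + delta

print(f"mean = {mean}, s = {s:.2f}, m = {m:.2f}, delta = {delta:.2f}")
print(f"confidence limits: {lower:.1f} to {upper:.1f} days")
```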

Example 2. Estimation of probability (general share) p.

A survey of the social status of 1000 families, carried out by mechanical sampling, showed that the share of low-income families was w = 0.3 (30%) (the sample was 2%, i.e. n/N = 0.02). With confidence level P = 0.997, determine the share p of low-income families in the whole region.

Solution. From the values of the function Φ(t) presented earlier, for the confidence level P = 0.997 we find t = 3 (see formula 3). The marginal error of the share w is determined by the formula from Table 3 for non-repeated sampling (mechanical sampling is always non-repeated):

$$\Delta_w = t \sqrt{\frac{w(1 - w)}{n}\left(1 - \frac{n}{N}\right)} = 3 \sqrt{\frac{0.3 \cdot 0.7}{1000}(1 - 0.02)} \approx 3 \cdot 0.0143 \approx 0.043.$$

The relative marginal sampling error in % will be:

$$\Delta_{\%} = \frac{\Delta_w}{w} \cdot 100\% = \frac{0.043}{0.3} \cdot 100\% \approx 14.3\%.$$

The probability (general share) of low-income families in the region is p = w ± Δ_w, and the confidence limits of p are calculated from the double inequality:

w - Δ_w ≤ p ≤ w + Δ_w, i.e. the true value of p lies within:

0.3 - 0.043 ≤ p ≤ 0.3 + 0.043, namely from 25.7% to 34.3%.

Thus, with probability 0.997 it can be stated that the share of low-income families among all families in the region ranges from 25.7% to 34.3%.
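The marginal error of the share in Example 2 can be evaluated directly. The sketch assumes the usual non-repeated form Δ_w = t · sqrt(w(1 - w)/n · (1 - n/N)); the function name `marginal_error_share` is an illustrative assumption, and the inputs are those stated in the example (w = 0.3, n = 1000, n/N = 0.02, t = 3).

```python
import math


def marginal_error_share(w: float, n: int, sampling_fraction: float, t: float) -> float:
    """Marginal error of a sample share for non-repeated selection:
    t * sqrt(w(1 - w)/n * (1 - n/N)), with n/N passed as sampling_fraction."""
    return t * math.sqrt(w * (1 - w) / n * (1 - sampling_fraction))


w, n, fraction, t = 0.3, 1000, 0.02, 3
average_error = marginal_error_share(w, n, fraction, t=1)   # the average error m
delta_w = marginal_error_share(w, n, fraction, t)           # the marginal error
lower, upper = w - delta_w, w + delta_w
relative_error = delta_w / w * 100

print(f"m = {average_error:.4f}, delta_w = {delta_w:.4f}")
print(f"limits: {lower:.3f} to {upper:.3f}, relative error = {relative_error:.1f}%")
```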

Example 3. Calculation of the mean value and confidence interval for a discrete characteristic specified by an interval series.

Table 5 gives the distribution of order requests by the time taken by the enterprise to fulfil them.

Table 5. Distribution of observations by time of appearance

Solution. The average order completion time is calculated using the formula:

$$\bar{x} = \frac{\sum x_i f_i}{\sum f_i},$$

where $x_i$ is the midpoint of the i-th interval and $f_i$ is its frequency. The average time will be:

$\bar{x}$ = (3*20 + 9*80 + 24*60 + 48*20 + 72*20)/200 = 23.1 months.

We get the same answer if we use the shares $p_i$ from the penultimate column of Table 5, using the formula:

$$\bar{x} = \sum x_i p_i.$$

Note that the midpoint of the last interval is found by artificially closing it with the width of the preceding interval, equal to 60 - 36 = 24 months.
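The weighted mean of the interval series can be verified both ways, by frequencies and by shares. The midpoints and frequencies below are the ones that appear in the calculation above; the variable names, and the reading of the last interval as starting at 60 months, are assumptions.

```python
# Interval midpoints (months) and frequencies taken from the calculation in Example 3.
midpoints = [3, 9, 24, 48, 72]
frequencies = [20, 80, 60, 20, 20]

# Midpoint of the open last interval: it is closed artificially with the width
# of the preceding interval (60 - 36 = 24 months), assuming it starts at 60.
last_midpoint = 60 + (60 - 36) / 2

total = sum(frequencies)
mean_by_frequency = sum(x * f for x, f in zip(midpoints, frequencies)) / total

# The same mean from the shares p_i = f_i / sum(f_i).
shares = [f / total for f in frequencies]
mean_by_share = sum(x * p for x, p in zip(midpoints, shares))

print(total, mean_by_frequency, round(mean_by_share, 1), last_midpoint)
```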

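The variance is calculated using the formula

$$\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{k - 1},$$

where $x_i$ is the midpoint of the i-th interval of the series and k = 5 is the number of intervals.

Therefore $\sigma^2 = \frac{20^2 + 14^2 + 1^2 + 25^2 + 49^2}{4} \approx 905.8$, and the standard deviation is $\sigma \approx 30.1$ months.

The average error is calculated using the formula $m = \sigma / \sqrt{k}$ and is about 13.4 months, i.e. the mean is $\bar{x}$ ± m = 23.1 ± 13.4.

We calculate the marginal error using the formula from Table 3 for repeated selection, since the population size is unknown, for the 0.954 confidence level:

$$\Delta = t \cdot m = 2 \cdot 13.4 = 26.8.$$

So the mean is:

$\bar{x}$ ± Δ = 23.1 ± 26.8,

i.e. its true value lies in the range from 0 to 50 months (the lower limit cannot be negative).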

Example 4. To determine the speed of settlements with creditors for the N = 500 enterprises of a corporation, a commercial bank needs to conduct a sample study by random non-repeated selection. Determine the required sample size n such that, with probability P = 0.954, the error of the sample mean does not exceed 3 days, given that trial estimates showed a standard deviation s of 10 days.

Solution. To determine the required number of observations n, we use the formula for non-repeated selection from Table 4:

$$n = \frac{t^2 s^2 N}{\Delta_x^2 N + t^2 s^2}.$$

In it, the value of t is determined from the confidence level P = 0.954 and equals 2. The standard deviation is s = 10, the population size is N = 500, and the marginal error of the mean is Δ_x = 3. Substituting these values into the formula, we get:

$$n = \frac{2^2 \cdot 10^2 \cdot 500}{3^2 \cdot 500 + 2^2 \cdot 10^2} = \frac{200000}{4900} \approx 40.8 \approx 41,$$

i.e. a sample of 41 enterprises is sufficient to estimate the required parameter, the speed of settlements with creditors.
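Example 4 can be checked with a short function. It assumes the non-repeated form n = t²s²N / (Δ²N + t²s²), which reproduces the stated answer of 41; the function name is an illustrative assumption.

```python
import math


def required_n_nonrepeated(t: float, s: float, delta: float, N: int) -> int:
    """Required sample size for random non-repeated selection:
    n = t^2 s^2 N / (delta^2 N + t^2 s^2), rounded up."""
    n = t**2 * s**2 * N / (delta**2 * N + t**2 * s**2)
    return math.ceil(n)


# Values from Example 4: P = 0.954 gives t = 2, s = 10 days, delta = 3 days, N = 500.
n_example = required_n_nonrepeated(t=2, s=10, delta=3, N=500)
# For a very large population the result approaches the repeated-selection size.
n_large_population = required_n_nonrepeated(t=2, s=10, delta=3, N=10**9)
print(n_example, n_large_population)
```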

General population

The totality of objects of observation (people, households, enterprises, settlements, etc.) possessing a certain set of characteristics (gender, age, income, number, turnover, etc.), limited in space and time. Examples of populations:

  • All residents of Moscow (10.6 million people according to the 2002 census)
  • Male Muscovites (4.9 million people according to the 2002 census)
  • Legal entities of Russia (2.2 million at the beginning of 2005)
  • Retail outlets selling food products (20 thousand at the beginning of 2008), etc.

Sample (Sample Population)

A portion of a population selected for study in order to draw conclusions about the entire population. In order for the conclusion obtained by studying the sample to be extended to the entire population, the sample must have the property of representativeness.

Representativeness of the sample

The property of a sample to correctly reflect the population. The same sample can be representative and unrepresentative for different populations.
Example:

  • A sample consisting entirely of Muscovites who own a car does not represent the entire population of Moscow.
  • A sample of Russian enterprises with up to 100 employees does not represent all enterprises in Russia.
  • A sample of Muscovites shopping at the market does not represent the purchasing behavior of all Muscovites.

At the same time, these samples (subject to other conditions) can perfectly represent Muscovites who own cars, small and medium-sized Russian enterprises, and buyers who make purchases in markets, respectively.
It is important to understand that sample representativeness and sampling error are different phenomena. Representativeness, unlike error, does not depend in any way on the sample size.
Example:
No matter how much we increase the number of Muscovites who are car owners surveyed, we will not be able to represent all Muscovites with this sample.

Sampling error (confidence interval)

The deviation of the results obtained using sample observation from the true data of the general population.
There are two types of sampling error - statistical and systematic. Statistical error depends on sample size. The larger the sample size, the lower it is.
Example:
For a simple random sample of 400 units, the maximum statistical error (at the 95% confidence level) is 5%; for a sample of 600 units, 4%; for a sample of 1100 units, 3%. Usually, when people speak of sampling error, they mean statistical error.
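These figures can be reproduced with a minimal sketch. It assumes the maximum statistical error is the marginal error of a share in the worst case w = 0.5, with t = 1.96 for the 95% level and no finite-population correction; the function name is an illustrative assumption.

```python
import math


def max_statistical_error(n: int, t: float = 1.96) -> float:
    """Worst-case marginal error of a share (w = 0.5) for a simple random
    sample of size n: t * sqrt(0.25 / n)."""
    return t * math.sqrt(0.25 / n)


for size in (400, 600, 1100):
    print(f"n = {size}: {max_statistical_error(size) * 100:.1f}%")
```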
Systematic error depends on various factors that constantly influence the study and bias the results of the study in a certain direction.
Example:

  • Any probability sample will underestimate the share of people with high incomes who lead an active lifestyle, because such people are much harder to find in any particular place (for example, at home).
  • The problem of respondents refusing to answer questions (the share of "refuseniks" in Moscow ranges from 50% to 80%, depending on the survey).

In some cases, when the true distributions are known, the systematic error can be leveled out by introducing quotas or reweighting the data, but in most real studies it can be quite problematic to even estimate it.

Sample types

Samples are divided into two types:

  • probabilistic
  • non-probabilistic

1. Probability samples
1.1 Random sampling (simple random sampling)
Such a sample assumes homogeneity of the population, equal probability of access to all elements, and the availability of a complete list of all elements. Elements are usually selected using a table of random numbers.
1.2 Mechanical (systematic) sampling
A variety of random sampling in which the population is ordered by some characteristic (alphabetical order, phone number, date of birth, etc.). The first element is selected at random, and then every k-th element is selected, i.e. with step k, until n elements are obtained. The population size in this case is N = n*k.
1.3 Stratified (zoned) sampling
Used when the population is heterogeneous. The general population is divided into groups (strata). Within each stratum, selection is carried out randomly or mechanically.
1.4 Serial (cluster or nested) sampling
In serial sampling, the units of selection are not the objects themselves but groups (clusters or nests). Groups are selected randomly. Objects within the selected groups are examined in full.
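The four probability designs described above can be sketched with Python's standard `random` module. This is a minimal illustration under simplifying assumptions (a population held in a list, strata and clusters held in dictionaries, proportional allocation within strata); all function names and the toy data are hypothetical.

```python
import random


def simple_random(population, n, rng):
    """Simple random sampling without replacement."""
    return rng.sample(population, n)


def systematic(population, n, rng):
    """Mechanical sampling: random start, then every k-th element (N = n * k)."""
    k = len(population) // n
    start = rng.randrange(k)
    return [population[start + i * k] for i in range(n)]


def stratified(strata, fraction, rng):
    """Stratified sampling: the same fraction is drawn at random from each stratum."""
    sample = []
    for units in strata.values():
        size = max(1, round(len(units) * fraction))
        sample.extend(rng.sample(units, size))
    return sample


def serial(clusters, r, rng):
    """Serial (cluster) sampling: r whole clusters are drawn and examined in full."""
    chosen = rng.sample(list(clusters), r)
    return [unit for name in chosen for unit in clusters[name]]


rng = random.Random(42)  # fixed seed so the sketch is repeatable
population = list(range(1, 101))
strata = {"A": population[:40], "B": population[40:80], "C": population[80:]}
clusters = {f"block_{b}": population[b * 10:(b + 1) * 10] for b in range(10)}

srs = simple_random(population, 10, rng)
mech = systematic(population, 10, rng)
strat = stratified(strata, 0.1, rng)
ser = serial(clusters, 2, rng)
print(len(srs), len(mech), len(strat), len(ser))
```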

2. Non-probability samples
Selection in such a sample is carried out not according to the principles of randomness, but according to subjective criteria - availability, typicality, equal representation, etc.
2.1. Quota sampling
First, a number of groups of objects are identified (for example, men aged 20-30, 31-45 and 46-60; persons with income up to 30 thousand rubles, from 30 to 60 thousand rubles, and over 60 thousand rubles). For each group, the number of objects to be examined is specified. The number of objects that should fall into each group is most often set either in proportion to the previously known share of the group in the general population, or equal for every group. Within groups, objects are selected randomly. Quota sampling is used quite often.
2.2. Snowball method
The sample is constructed as follows. Each respondent, starting with the first, is asked for contact information of his friends, colleagues, acquaintances who would fit the selection conditions and could take part in the study. Thus, with the exception of the first step, the sample is formed with the participation of the research subjects themselves. The method is often used when it is necessary to find and interview hard-to-reach groups of respondents (for example, respondents with a high income, respondents belonging to the same professional group, respondents with any similar hobbies/interests, etc.)
2.3 Spontaneous sampling
The most accessible respondents are interviewed. Typical examples of spontaneous samples are questionnaires in newspapers and magazines, questionnaires handed to respondents for self-completion, and most online surveys. The size and composition of spontaneous samples are not known in advance and are determined by only one parameter, the activity of respondents.
2.4 Sample of typical cases
Units of the general population that have an average (typical) value of the characteristic are selected. This raises the problem of selecting a feature and determining its typical value.


Sample

Sample, or sample population - a set of cases (subjects, objects, events, specimens) selected from the general population by a certain procedure to participate in the study.

Sample characteristics:

  • Qualitative characteristics of the sample - who exactly we choose and what sampling methods we use for this.
  • Quantitative characteristics of the sample - how many cases we select, in other words, sample size.

Necessity of sampling

  • The object of study is very extensive. For example, consumers of a global company’s products are represented by a huge number of geographically dispersed markets.
  • There is a need to collect primary information.

Sample size

Sample size - the number of cases included in the sample. For statistical reasons, it is recommended that the number of cases be at least 30-35.

Dependent and independent samples

When comparing two (or more) samples, an important parameter is their dependence. If a homomorphic pairing can be established for every case in the two samples (that is, one case from sample X corresponds to one and only one case from sample Y and vice versa), and this basis of relationship is relevant to the characteristic being measured, the samples are called dependent. Examples of dependent samples:

  • pairs of twins,
  • two measurements of any trait before and after experimental exposure,
  • husbands and wives
  • and so on.

If there is no such relationship between the samples, they are considered independent.

Accordingly, dependent samples always have the same size, while the sizes of independent samples may differ.

Samples are compared using various statistical tests.

Representativeness

The sample may be considered representative or non-representative.

Example of a non-representative sample

  1. A study with experimental and control groups, which are placed in different conditions.
    • Study with experimental and control groups using a pairwise selection strategy
  2. A study using only one group - an experimental group.
  3. A study using a mixed (factorial) design - all groups are placed in different conditions.

Sampling types

Samples are divided into two types:

  • probabilistic
  • non-probabilistic

Probability samples

  1. Simple probability sampling:
    • Simple resampling. The use of such a sample is based on the assumption that each respondent is equally likely to be included in the sample. Based on the list of the general population, cards with respondent numbers are compiled. They are placed in a deck, shuffled and a card is taken out at random, the number is written down, and then returned back. Next, the procedure is repeated as many times as the sample size we need. Disadvantage: repetition of selection units.

The procedure for constructing a simple random sample includes the following steps:

1. it is necessary to obtain a complete list of members of the population and number this list. Such a list, recall, is called a sampling frame;

2. determine the expected sample size, that is, the expected number of respondents;

3. extract as many numbers from the random number table as we need sample units. If there should be 100 people in the sample, 100 random numbers are taken from the table. These random numbers can be generated by a computer program.

4. select from the base list those observations whose numbers correspond to the written random numbers

  • Simple random sampling has obvious advantages. This method is extremely easy to understand. The results of the study can be generalized to the population being studied. Most approaches to statistical inference involve collecting information using a simple random sample. However, the simple random sampling method has at least four significant limitations:

1. It is often difficult to create a sampling frame that would allow simple random sampling.

2. Simple random sampling may result in a large population, or a population distributed over a large geographic area, which significantly increases the time and cost of data collection.

3. The results of simple random sampling are often characterized by low precision and a larger standard error than the results of other probability methods.

4. As a result of using SRS, a non-representative sample may be formed. Although samples obtained by simple random sampling, on average, adequately represent the population, some of them are extremely misrepresentative of the population being studied. This is especially likely when the sample size is small.

  • Simple non-repetitive sampling. The sampling procedure is the same, only the cards with respondent numbers are not returned to the deck.
  1. Systematic probability sampling. It is a simplified version of simple probability sampling. Based on the list of the general population, respondents are selected at a certain interval (K). The value of K is determined randomly. The most reliable result is achieved with a homogeneous population, otherwise the step size and some internal cyclic patterns of the sample may coincide (sampling mixing). Disadvantages: the same as in a simple probability sample.
  2. Serial (cluster) sampling. Selection units are statistical series (family, school, team, etc.). The selected elements are subject to a complete examination. The selection of statistical units can be organized as random or systematic sampling. Disadvantage: Possibility of greater homogeneity than in the general population.
  3. Regional sampling. In the case of a heterogeneous population, before using probability sampling with any selection technique, it is recommended to divide the population into homogeneous parts, such a sample is called district sampling. Zoning groups can include both natural formations (for example, city districts) and any feature that forms the basis of the study. The characteristic on the basis of which the division is carried out is called the characteristic of stratification and zoning.
  4. "Convenience" sample. The “convenience” sampling procedure consists of establishing contacts with “convenient” sampling units - a group of students, a sports team, friends and neighbors. If you want to get information about people's reactions to a new concept, this type of sampling is quite reasonable. Convenience sampling is often used to pretest questionnaires.

Non-probability samples

Selection in such a sample is carried out not according to the principles of randomness, but according to subjective criteria - availability, typicality, equal representation, etc.

  1. Quota sampling - the sample is constructed as a model that reproduces the structure of the general population in the form of quotas (proportions) of the characteristics being studied. The number of sample elements with different combinations of studied characteristics is determined so that it corresponds to their share (proportion) in the general population. So, for example, if our general population consists of 5,000 people, of which 2,000 are women and 3,000 are men, then in the quota sample we will have 20 women and 30 men, or 200 women and 300 men. Quota samples are most often based on demographic criteria: gender, age, region, income, education, and others. Disadvantages: usually such samples are not representative, because it is impossible to take into account several social parameters at once. Pros: readily available material.
  2. Snowball method. The sample is constructed as follows. Each respondent, starting with the first, is asked for contact information of his friends, colleagues, acquaintances who would fit the selection conditions and could take part in the study. Thus, with the exception of the first step, the sample is formed with the participation of the research objects themselves. The method is often used when it is necessary to find and interview hard-to-reach groups of respondents (for example, respondents with a high income, respondents belonging to the same professional group, respondents with any similar hobbies/interests, etc.)
  3. Spontaneous sampling – sampling of the so-called “first person you come across”. Often used in television and radio polls. The size and composition of spontaneous samples is not known in advance, and is determined only by one parameter - the activity of respondents. Disadvantages: it is impossible to establish which population the respondents represent, and as a result, it is impossible to determine representativeness.
  4. Route survey – often used when the unit of study is the family. On the map of the locality in which the survey will be carried out, all streets are numbered. Using a table (generator) of random numbers, large numbers are selected. Each large number is considered as consisting of 3 components: street number (2-3 first numbers), house number, apartment number. For example, the number 14832: 14 is the street number on the map, 8 is the house number, 32 is the apartment number.
  5. Regional sampling with selection of typical objects. If, after zoning, a typical object is selected from each group, i.e. an object that is close to the average in terms of most of the characteristics studied in the study, such a sample is called regionalized with the selection of typical objects.

6.Modal sampling. 7.expert sampling. 8. Heterogeneous sample.

Group Building Strategies

The selection of groups for participation in a psychological experiment is carried out using various strategies to ensure that internal and external validity are maintained to the greatest possible extent.

Randomization

Randomization, or random selection, is used to create simple random samples. The use of such a sample is based on the assumption that each member of the population is equally likely to be included in the sample. For example, to make a random sample of 100 university students, you can put pieces of paper with the names of all university students in a hat, and then take 100 pieces of paper out of it - this will be a random selection (Goodwin J., p. 147).

Pairwise selection

Pairwise selection- a strategy for constructing sampling groups, in which groups of subjects are made up of subjects who are equivalent in terms of secondary parameters that are significant for the experiment. This strategy is effective for experiments using experimental and control groups, with the best option being the involvement of twin pairs (mono- and dizygotic), as it allows you to create...

Stratometric sampling

Stratometric sampling- randomization with the allocation of strata (or clusters). With this method of sampling, the general population is divided into groups (strata) with certain characteristics (gender, age, political preferences, education, income level, etc.), and subjects with the corresponding characteristics are selected.

Approximate Modeling

Approximate Modeling- drawing limited samples and generalizing conclusions about this sample to the wider population. For example, with the participation of 2nd year university students in the study, the data of this study applies to “people aged 17 to 21 years”. The admissibility of such generalizations is extremely limited.

Approximate modeling is the formation of a model that, for a clearly defined class of systems (processes), describes its behavior (or desired phenomena) with acceptable accuracy.


Sample size

Sample size - the number of cases included in the sample population. For statistical reasons, a sample of at least 30-35 cases is usually recommended.

Samples are divided into large and small, because mathematical statistics uses different methods depending on sample size. By convention, samples of more than 30 cases are treated as large.

Dependent and independent samples

When two (or more) samples are compared, an important property is whether they are dependent. If a one-to-one correspondence can be established between the samples (each case in sample X corresponds to exactly one case in sample Y, and vice versa), and the basis of this pairing matters for the trait being measured, the samples are called dependent. Examples of dependent samples:

  • pairs of twins,
  • two measurements of any trait before and after experimental exposure,
  • husbands and wives
  • and so on.

If there is no such relationship between the samples, they are considered independent, for example:

  • men and women,
  • psychologists and mathematicians.

Accordingly, dependent samples always have the same size, whereas independent samples may differ in size.

Samples are compared using various statistical tests:

  • Pearson's chi-squared test (χ²)
  • Student's t-test (t)
  • Wilcoxon test (T)
  • Mann-Whitney test (U)
  • sign test (G)
  • and others.
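The difference between dependent and independent samples shows up directly in how a test statistic is computed. The following is a minimal sketch, not a full testing procedure: `paired_t` and `welch_t` are hypothetical helper names, the measurements are made-up illustrative numbers, and only the Python standard library is assumed.

```python
import math
import statistics


def paired_t(before, after):
    """t statistic for dependent samples: works on the per-pair differences."""
    if len(before) != len(after):
        raise ValueError("dependent samples must have the same size")
    diffs = [a - b for b, a in zip(before, after)]
    std_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / std_error


def welch_t(x, y):
    """t statistic for independent samples (Welch form); sizes may differ."""
    std_error = math.sqrt(
        statistics.variance(x) / len(x) + statistics.variance(y) / len(y)
    )
    return (statistics.mean(x) - statistics.mean(y)) / std_error


# Made-up measurements of one trait before and after an intervention.
before = [10, 12, 9, 11, 13]
after = [12, 14, 10, 13, 15]

t_dependent = paired_t(before, after)    # uses the one-to-one pairing
t_independent = welch_t(after, before)   # ignores the pairing
```

With paired data the paired statistic is usually much larger than the independent one, because the variation between subjects cancels out in the differences. Turning either statistic into a p-value requires the t distribution, which is outside this sketch.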

Representativeness

A sample may be representative or non-representative. A sample drawn from a large group of people is representative if it includes members of the group's different subgroups; only then can correct conclusions about the whole group be drawn.

Study designs for forming groups from a sample

  1. A study with experimental and control groups, which are placed in different conditions.
    • Study with experimental and control groups using a pairwise selection strategy
  2. A study using only one group - an experimental group.
  3. A study using a mixed (factorial) design - all groups are placed in different conditions.

Sample types

Samples are divided into two types:

  • probabilistic
  • non-probabilistic

Probability samples

  1. Simple probability sampling:
    • Simple sampling with replacement. This method assumes that every respondent is equally likely to be included in the sample. Cards bearing respondent numbers are made from the list of the general population, placed in a deck and shuffled. A card is drawn at random, its number is recorded, and the card is returned to the deck. The procedure is repeated as many times as the required sample size. Disadvantage: the same unit can be selected more than once.

The procedure for constructing a simple random sample includes the following steps:

1) obtain a complete list of the members of the general population and number it; such a list is called a sampling frame;

2) determine the intended sample size, that is, the expected number of respondents;

3) draw as many numbers from a table of random numbers as there are units in the sample; if the sample is to contain 100 people, 100 random numbers are taken. The random numbers can also be generated by a computer program;

4) select from the sampling frame the observations whose numbers match the random numbers drawn.
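The four steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the sampling frame is a hypothetical list of respondent numbers 1 to 1000, `simple_random_sample` is a made-up helper name, and the standard `random` module stands in for the table of random numbers.

```python
import random


def simple_random_sample(frame, n, replace=False, seed=None):
    """Draw n units from the sampling frame, with or without replacement."""
    rng = random.Random(seed)  # a seed only makes the draw repeatable
    if replace:
        # Each drawn "card" goes back into the deck, so repeats are possible.
        return rng.choices(frame, k=n)
    # Drawn "cards" are not returned, so every unit appears at most once.
    return rng.sample(frame, n)


# Step 1: a numbered sampling frame (hypothetical population of 1000 people).
frame = list(range(1, 1001))
# Steps 2-4: fix the sample size and draw the units.
without_replacement = simple_random_sample(frame, 100, seed=1)
with_replacement = simple_random_sample(frame, 100, replace=True, seed=1)
```

Sampling without replacement is the usual choice for surveys, since interviewing the same respondent twice adds no information.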

Simple random sampling has obvious advantages. The method is easy to understand, and the results of the study can be generalized to the population under study. Most approaches to statistical inference assume that the data were collected with a simple random sample. However, the method has at least four significant limitations:

1) it is often difficult to build a sampling frame that allows simple random sampling;

2) a simple random sample may turn out to be large, or spread over a large geographic area, which significantly increases the time and cost of data collection;

3) the results often have lower precision and a larger standard error than those of other probability methods;

4) a simple random sample (SRS) may turn out to be non-representative. Although such samples represent the population adequately on average, some individual samples represent it very poorly. This is especially likely when the sample size is small.

    • Simple sampling without replacement. The procedure is the same, except that cards with respondent numbers are not returned to the deck.
  2. Systematic probability sampling. A simplified version of simple probability sampling. Respondents are taken from the list of the general population at a fixed interval (step) K, starting from a randomly chosen position. The most reliable results are obtained with a homogeneous population; otherwise the step may coincide with some cyclic pattern in the list, which biases the sample. Disadvantages: the same as for simple probability sampling.
  3. Serial (cluster) sampling. The selection units are statistical series (a family, a school, a team, etc.). Every element of a selected series is examined. The series themselves can be selected by random or systematic sampling. Disadvantage: the sample may be more homogeneous than the general population.
  4. Regional (stratified) sampling. If the population is heterogeneous, it is recommended to divide it into homogeneous parts before applying probability sampling with any selection technique; such a sample is called regional (zoned). The groups can be natural formations (for example, city districts) or can be defined by any characteristic that underlies the study. The characteristic used for the division is called the stratification (zoning) characteristic.
  5. "Convenience" sampling. The procedure consists of contacting "convenient" sampling units: a group of students, a sports team, friends and neighbors. Strictly speaking, it is not a probability method, because units are chosen by availability rather than at random. If the aim is to gauge people's reactions to a new concept, this type of sampling is quite reasonable. Convenience sampling is often used to pretest questionnaires.
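Systematic sampling from the list above can be sketched as follows. This is one common variant, stated here as an assumption rather than as the only definition: the step K is taken as the frame size divided by the sample size, and only the starting position is random. The name `systematic_sample` and the frame of 1000 numbered respondents are hypothetical.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Select n units at a fixed step K, starting from a random position."""
    if not 0 < n <= len(frame):
        raise ValueError("sample size must be between 1 and the frame size")
    rng = random.Random(seed)
    step = len(frame) // n       # the interval K
    start = rng.randrange(step)  # random start within the first interval
    return [frame[start + i * step] for i in range(n)]


frame = list(range(1, 1001))     # hypothetical numbered list of respondents
sample = systematic_sample(frame, 100, seed=7)
```

If the list has a cycle whose length matches the step (for example, every tenth entry is a corner apartment), every selected unit shares that property, which is the bias the text warns about.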

Non-probability samples

Selection in such a sample is carried out not according to the principles of randomness, but according to subjective criteria - availability, typicality, equal representation, etc.

  1. Quota sampling. The sample is built as a model that reproduces the structure of the general population in the form of quotas (proportions) of the characteristics being studied. The number of sample elements with each combination of characteristics is set to match their share in the general population. For example, if the general population consists of 5,000 people, of whom 2,000 are women and 3,000 are men, the quota sample will contain 20 women and 30 men, or 200 women and 300 men. Quotas are most often based on demographic criteria: gender, age, region, income, education, and others. Disadvantages: such samples are usually not representative, because it is impossible to account for several social parameters at once. Advantages: the material is readily available.
  2. Snowball method. Each respondent, starting with the first, is asked for the contact details of friends, colleagues and acquaintances who meet the selection conditions and could take part in the study. Thus, after the first step, the sample is formed with the participation of the research subjects themselves. The method is often used to find and interview hard-to-reach groups (for example, respondents with a high income, members of one professional group, or people with similar hobbies or interests).
  3. Spontaneous sampling: sampling of the so-called "first person you meet". It is often used in television and radio polls. The size and composition of a spontaneous sample are not known in advance and are determined by a single parameter, the activity of the respondents. Disadvantages: it is impossible to establish which population the respondents represent and, as a result, impossible to determine representativeness.
  4. Route survey. Often used when the unit of study is the family. All streets are numbered on a map of the locality where the survey will be carried out. Large numbers are selected using a table (generator) of random numbers. Each large number is treated as consisting of three components: street number (the first 2-3 digits), house number and apartment number. For example, in the number 14832, 14 is the street number on the map, 8 is the house number and 32 is the apartment number.
  5. Regional sampling with selection of typical objects. If, after zoning, a typical object is selected from each group, that is, an object close to the average on most of the characteristics studied, the sample is called zoned with the selection of typical objects.
  6. Modal sampling.
  7. Expert sampling.
  8. Heterogeneous sample.
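The quota arithmetic from the example in item 1 (a population of 2,000 women and 3,000 men) can be checked with a short sketch. `quota_allocation` is a hypothetical helper name; it only computes how many respondents each group should contribute and says nothing about how they are recruited.

```python
def quota_allocation(population_counts, sample_size):
    """Split the sample across groups in proportion to their population share."""
    total = sum(population_counts.values())
    return {
        group: round(sample_size * count / total)
        for group, count in population_counts.items()
    }


population = {"women": 2000, "men": 3000}
small = quota_allocation(population, 50)    # {'women': 20, 'men': 30}
large = quota_allocation(population, 500)   # {'women': 200, 'men': 300}
```

When the shares do not divide evenly, the rounded quotas may not add up exactly to the requested sample size and need a manual adjustment.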

Group Building Strategies

The selection of groups for participation in a psychological experiment is carried out using various strategies to ensure that internal and external validity are maintained to the greatest possible extent.

Randomization

Randomization, or random selection, is used to create simple random samples. It assumes that each member of the population is equally likely to be included in the sample. For example, to draw a random sample of 100 university students, you can put slips of paper with the names of all the university's students in a hat and then take out 100 slips; this is a random selection (Goodwin J., p. 147).

Pairwise selection

Pairwise selection - a strategy for building groups in which the groups are composed of subjects who are equivalent on secondary parameters that matter for the experiment. The strategy is effective for experiments with experimental and control groups; the best option is to use twin pairs (mono- and dizygotic).

Stratometric sampling

Stratometric sampling - randomization with the allocation of strata (or clusters). The general population is divided into groups (strata) with certain characteristics (gender, age, political preferences, education, income level, etc.), and subjects are then selected at random from each stratum.
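A minimal sketch of sampling with strata, assuming proportional allocation (each stratum contributes the same fraction of its members). The source does not fix the allocation rule, so this is one possible choice; `stratified_sample` and the gender-labelled units are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, fraction, seed=None):
    """Draw the same fraction of units at random from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)  # group units by stratum
    sample = []
    for key in sorted(strata):                 # fixed order for repeatability
        members = strata[key]
        k = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population: 400 women and 600 men, identified by (gender, id).
population = [("f", i) for i in range(400)] + [("m", i) for i in range(600)]
sample = stratified_sample(population, lambda u: u[0], 0.1, seed=3)
```

Within each stratum the draw is still random; the strata only guarantee that every group is present in its population proportion.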

Approximate Modeling

Approximate modeling - drawing limited samples and generalizing the conclusions about such a sample to a wider population. For example, when 2nd-year university students take part in a study, its results are extended to "people aged 17 to 21". The admissibility of such generalizations is extremely limited.

Approximate modeling is the formation of a model that, for a clearly defined class of systems (processes), describes its behavior (or desired phenomena) with acceptable accuracy.

A sample is:

1) the set of those elements of the research object that will be studied directly;

2) the methods and procedures for selecting elements of the research object.

Population (general population) - the complete set of objects related to the problem under study. In sociological research, the population is most often a set of individuals: the inhabitants of a city or country, a social group (young people, the unemployed, businesspeople), the audience of the mass media, and so on. In many cases, however, the population consists of larger elements (objects): families (households), academic groups, enterprises, religious communities, individual localities or states.

Sample population - a portion of objects from a population selected for study in order to draw conclusions about the entire population.

In order for the conclusion obtained by studying the sample to be extended to the entire population, the sample must have the property of representativeness.

Representativeness is the ability of a sample to represent the population being studied. The more accurately the composition of the sample represents the population on the issues being studied, the higher its representativeness.

EXAMPLE: Suppose the population is all the students of a school (600 people in 20 classes, 30 per class), and the subject of study is attitudes towards smoking. A sample of 60 high school students represents the population much worse than a sample of the same size made up of 3 students from each class. The main reason is that age differs from class to class. Consequently, the representativeness of the first sample is low and that of the second is high (all other things being equal).
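The school example can be reproduced with a short sketch. The student labels are invented, and the "high school only" sample is assumed here to be the two oldest classes taken in full (2 classes of 30), which is just one way to obtain 60 high school students.

```python
import random

rng = random.Random(0)

# Hypothetical school: 20 classes with 30 students each (600 in total).
school = {
    c: [f"class{c:02d}-student{s:02d}" for s in range(1, 31)]
    for c in range(1, 21)
}

# Sample A: 3 students drawn at random from every class.
per_class = [s for students in school.values() for s in rng.sample(students, 3)]

# Sample B (assumption): the two oldest classes taken in full.
high_school_only = school[19] + school[20]

classes_in_a = {name.split("-")[0] for name in per_class}
classes_in_b = {name.split("-")[0] for name in high_school_only}
```

Both samples contain 60 students, but sample A covers all 20 classes while sample B covers only 2, so only sample A reflects the age structure of the school.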

Sample types

1. Random sampling.

1.1. Simple random selection.

1.2. Systematic (or mechanical) sampling.

1.3. Serial (cluster) sampling.

1.4. Stratified sampling.

2. Non-random (non-probability) sampling.

2.2. Spontaneous sampling.

2.3. Multi-stage and single-stage sampling.

1. Random sampling.

In a random sample, every unit of the population has an equal probability of being included in the sample population; selection follows the principle of randomness. The sampling frame can be a list of enterprise employees, a telephone directory, a registration list of car owners, a list of voters at a polling station, a house register, or a list compiled by the sociologist for the purposes of the study (for example, a list of streets on which respondents are then selected).

Random sampling is usually used in public opinion polls before elections, referendums and other public events.

The advantage of this method is full compliance with the principle of randomness and, as a result, the avoidance of systematic errors.

Disadvantages of this method:

– The need to have a list of population elements.

– Difficulty of conducting a survey.

– Relatively large sample size.