Monday 10 October 2011


Sample and Population

The population consists of the set of all measurements in which the investigator is interested. The population is also called the universe.

A sample is a subset of measurements selected from the population. Sampling from the population is often done randomly, such that every possible sample of n elements will have an equal chance of being selected. A sample selected in this way is called a simple random sample, or just a random sample. A random sample allows chance to determine its elements.

Source: http://mips.stanford.edu/courses/stats_data_analsys/lesson_1/pop_sam.mov

For example, if a manufacturer produces 1,000 units of product ‘A’ in a production cycle, the units produced in that cycle constitute a population. Drawing 30 units of product ‘A’ for a routine quality check gives a sample (a miniature representation of the population). If the selection process is random, the sample can be considered a random sample.
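The quality-check draw described above can be sketched in Python. This is only a minimal illustration: the unit IDs 1 to 1000, the function name `draw_quality_sample`, and the seed are assumptions made for the example, not part of any real production system.

```python
import random

def draw_quality_sample(population, n, seed=None):
    """Return a simple random sample of n units, drawn without replacement."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(population, n)

# Hypothetical unit IDs for one production cycle of product 'A'.
units = list(range(1, 1001))

# Draw 30 units for the routine quality check.
sample = draw_quality_sample(units, 30, seed=42)
print(sorted(sample))
```

Because `random.sample` draws without replacement and treats every unit alike, every possible set of 30 units has the same chance of being selected, which is what makes this a simple random sample.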

The definitions of population and sample are relative to what we want to consider. If we are dealing with all production cycles in a quarter, then those cycles are our population, and the production cycles in any two weeks may be considered a sample.

Source: http://mips.stanford.edu/courses/stats_data_analsys/lesson_1/pop.gif

A set of observations or measurements obtained on some variable is called a data set. For example, the number of units of product ‘A’ sold through 10 different outlets makes up a data set.

A conclusion drawn about a population based on the information in a sample from the population is called a statistical inference. Statistical inference may be based on data collected in surveys or experiments. To ensure the accuracy of statistical inference, data must be drawn randomly from the population of interest, and we must make sure that every segment of the population is adequately and proportionally represented in the sample.

There are challenges in data collection. One of them is non-response bias. This is the biasing of the results that occurs when we disregard the fact that some people will simply not respond to the survey. The bias distorts the findings, because the people who do not respond may belong more to one segment of the population than to another.
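A small simulation can show how non-response distorts a survey result. Everything here is hypothetical: two equally sized segments, invented satisfaction rates (80% and 40%), and invented response rates (60% and 20%). It is a sketch of the effect, not a model of any real survey.

```python
import random

def simulate_survey(n_people=10_000, seed=0):
    """Compare the true satisfaction rate with the rate among respondents only."""
    rng = random.Random(seed)
    answers_all, answers_received = [], []
    for _ in range(n_people):
        in_segment_a = rng.random() < 0.5
        # Assumed rates: segment A is more satisfied AND more likely to respond.
        p_satisfied = 0.8 if in_segment_a else 0.4
        p_respond = 0.6 if in_segment_a else 0.2
        satisfied = rng.random() < p_satisfied
        answers_all.append(satisfied)
        if rng.random() < p_respond:
            answers_received.append(satisfied)
    true_rate = sum(answers_all) / len(answers_all)
    observed_rate = sum(answers_received) / len(answers_received)
    return true_rate, observed_rate

true_rate, observed_rate = simulate_survey()
print(f"true: {true_rate:.3f}, respondents only: {observed_rate:.3f}")
```

Under these assumptions the true rate is about 0.6, while the rate among respondents is about 0.7, because the more satisfied segment is over-represented among those who answer.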

In experiments, as in surveys, it is important to randomize if inferences are indeed to be drawn. People should be randomly chosen as subjects for the experiment if an inference is to be drawn to the entire population.

In other situations, data may come from secondary sources such as published government statistical abstracts or other relevant data providers. In that case, we must understand the background of such data before using it in any realistic analysis. For example, the unemployment rate over a given period is not a random sample of any future unemployment rates, and making statistical inferences in such cases may be complex and difficult.

Simple inspection of the data will suffice when interest centres only on the particular observations at hand. If, however, you want to draw meaningful conclusions with implications extending beyond your limited data, statistical inference is the way to do it.

In marketing research, we are often interested in the relationship between advertising and sales. A data set of randomly chosen sales and advertising figures for a given firm may be of some interest in itself, but the information in it is much more useful if it leads to implications about the underlying process. The relationship between the firm’s level of advertising and the resulting level of sales, the conversions driven by an ad, variations in influence, the impact on brand equity, and the split of media planning are some of the insights offered by ‘Ad-Sales-Brand Analysis’. An understanding of the true relationship between advertising and sales (that is, the relationship in the population of advertising and sales possibilities for the firm) would allow us to predict sales for any level of advertising and thus to set advertising at a level that maximizes profits.

A quality control engineer at a plant making product ‘A’ needs to make sure that no more than 2% of the units produced are defective. The engineer may routinely collect random samples of ‘A’ and check their quality. Based on the random samples, the engineer may then draw a conclusion about the proportion of defective items in the entire population of product ‘A’. These are just a few examples illustrating the use of statistical inference in business situations.
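As a rough sketch of the engineer's inference, the defective proportion can be estimated from a sample together with an approximate 95% interval. The counts (3 defective out of 200 inspected) and the function name are invented for illustration, and the normal-approximation interval used here is only one common choice; it can be inaccurate when the number of defectives is small.

```python
import math

def defect_proportion_interval(n_defective, n_sampled, z=1.96):
    """Point estimate and normal-approximation interval for a proportion."""
    p_hat = n_defective / n_sampled
    # Standard error of the sample proportion.
    se = math.sqrt(p_hat * (1 - p_hat) / n_sampled)
    lower = max(0.0, p_hat - z * se)  # a proportion cannot fall below 0
    upper = min(1.0, p_hat + z * se)  # or rise above 1
    return p_hat, lower, upper

# Hypothetical inspection result: 3 defective units in a random sample of 200.
p_hat, lower, upper = defect_proportion_interval(3, 200)
print(f"estimate: {p_hat:.3%}, approx. 95% interval: {lower:.3%} to {upper:.3%}")
```

If the upper end of the interval lies above the 2% limit, the sample alone does not rule out a defect rate that is too high, and the engineer may want a larger sample before drawing a conclusion.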