Collecting (measuring) large amounts of data can be very expensive, since it requires considerable time and resources. For this reason, we work with samples.
A sample is a significantly smaller subset of the population, used to draw conclusions that are then related back to the whole population. The quantities most commonly estimated from a sample are the mean and the standard deviation. The sample size is always a compromise between what is statistically desirable and what is practically feasible.
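To make the idea concrete, here is a minimal Python sketch that estimates the mean and standard deviation of a population from a small sample. The population (10,000 simulated part lengths in mm), the sample size of 30, and all variable names are assumptions made up for this illustration, not values from a real process.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 simulated part lengths in mm
population = [rng.gauss(50.0, 2.0) for _ in range(10_000)]

# Draw a much smaller sample, without replacement
sample = rng.sample(population, 30)

# Estimates computed from the sample only
sample_mean = statistics.mean(sample)
sample_std = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)

# Reference values, normally unknown because measuring everything is too costly
population_mean = statistics.mean(population)
population_std = statistics.pstdev(population)

print(f"sample:     mean={sample_mean:.2f}  std={sample_std:.2f}")
print(f"population: mean={population_mean:.2f}  std={population_std:.2f}")
```

The two lines of output should be close but not identical; the gap between them is the sampling error that a larger sample would reduce.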
Sampling can be performed by one of the following methods:
Random - Each individual in the target population has the same chance of being selected for the sample. For example, a lottery (LOTTO) draw.
Stratified - The population is divided into relevant, internally similar categories (strata) according to various criteria, and sampling is done separately within each category. For example, by shift, supplier, or production line.
Systematic - The selection is made at regular intervals. For example, we can take one piece per day, one per shift, or one every hour.
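The three methods above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the `parts` data set with three production lines, and the 10% sampling fraction are all hypothetical.

```python
import random

def simple_random_sample(population, n, rng):
    """Random: every item has the same chance of being selected."""
    return rng.sample(population, n)

def stratified_sample(population, key, fraction, rng):
    """Stratified: group items by key, then sample each group separately."""
    strata = {}
    for item in population:
        strata.setdefault(key(item), []).append(item)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one item per stratum
        sample.extend(rng.sample(members, k))
    return sample

def systematic_sample(population, step, start=0):
    """Systematic: take every step-th item, beginning at index start."""
    return population[start::step]

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical data: 90 parts spread evenly over production lines A, B and C
parts = [{"id": i, "line": "ABC"[i % 3]} for i in range(90)]

random_pick = simple_random_sample(parts, 9, rng)
stratified_pick = stratified_sample(parts, lambda p: p["line"], 0.1, rng)
systematic_pick = systematic_sample(parts, step=10)
```

All three calls return 9 parts here, but only the stratified version guarantees that each production line contributes the same number of parts (3 each).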
As a rough guide (the topic is much debated), the minimum sample size depends on the tool being applied and can be chosen according to the table below:
Tool                     Minimum sample size
Mean or median           5-10
Standard deviation       25-30
Percentage of defects    100 (≥ 5 defects)
Box plot                 30
Pareto chart             50
Control chart            10
The sample size can be determined from tables, but most often it is calculated with formulas that depend on the admissible margin of error (the limit error of representativeness).
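The text does not say which formula is meant. As one common textbook example, the sample size needed to estimate a mean can be taken as n = (z * sigma / E)^2, rounded up, where sigma is the assumed standard deviation, E is the admissible margin of error, and z is the normal quantile for the chosen confidence level. The sketch below assumes this formula; the function name and the example values are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, margin_of_error, confidence=0.95):
    """Smallest n with z * sigma / sqrt(n) <= margin_of_error.

    Assumes a normally distributed estimate and a known (or well
    estimated) standard deviation sigma.
    """
    # Two-sided quantile, e.g. about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin_of_error) ** 2)

# Hypothetical case: standard deviation 2.0 mm, admissible error 0.5 mm
n = sample_size_for_mean(sigma=2.0, margin_of_error=0.5)
print(n)  # 62
```

Because the margin of error enters the formula squared, halving the admissible error roughly quadruples the required sample size.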
Very important: data collection must provide highly accurate information, and the sample must be representative of the population.