How to measure the quality and characteristics of data collected?

In statistics, quality assurance and survey methodology, sampling is the selection of a subset (a statistical sample) of individuals from within a population in order to estimate characteristics of the entire population. Statisticians try to choose samples that represent the population in question. Compared with measuring the entire population, sampling has two major advantages: lower cost and faster data collection.
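As a rough illustration of estimating a population characteristic from a sample, the sketch below draws a simple random sample and compares its mean with the population mean. The population of ages is synthetic and the sample size of 500 is an arbitrary choice; both are assumptions made only for this example.

```python
import random
from statistics import mean

# Fixed seed so the example is reproducible (the seed value is arbitrary).
rng = random.Random(42)

# Synthetic population: 10,000 made-up ages between 18 and 77.
population = [18 + (i * 7) % 60 for i in range(10_000)]

# Simple random sample without replacement.
sample = rng.sample(population, 500)

population_mean = mean(population)
sample_mean = mean(sample)
estimation_error = abs(sample_mean - population_mean)

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
print(f"absolute error:  {estimation_error:.2f}")
```

Only 5% of the population is measured here, yet the sample mean should typically land close to the population mean, which is the cost and speed advantage described above.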

Data collected for a specific purpose is therefore usually analysed through sampling techniques. Results from probability theory and statistical theory guide the practice. For example, when studying the behaviour of roulette wheels, we can use sampling to identify a biased wheel. Here, the population we want to investigate is the overall behaviour of the wheel, that is, the probability distribution of its results over infinitely many trials, while the sample is formed from the results actually observed from that wheel. Similar considerations arise when taking repeated measurements of some other physical characteristic, such as the electrical conductivity of copper.
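One possible way to check a wheel for bias from observed results is a chi-square goodness-of-fit test, sketched below. This is an illustrative choice of technique, not the only one. The sketch assumes a wheel with 37 pockets, a 5% significance level, and made-up spin counts; the critical value is the one listed in standard chi-square tables for 36 degrees of freedom.

```python
# Chi-square critical value for 36 degrees of freedom at the 5% level,
# taken from standard tables (assumes a 37-pocket wheel).
CRITICAL_VALUE_DF36 = 50.998


def chi_square_statistic(counts):
    """Sum of (observed - expected)^2 / expected, assuming equally likely pockets."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts)


def looks_biased(counts, critical_value=CRITICAL_VALUE_DF36):
    """Flag the wheel when the statistic exceeds the critical value."""
    return chi_square_statistic(counts) > critical_value


# Made-up samples: each entry is how often one pocket came up.
fair_counts = [100] * 37               # every pocket seen equally often
suspect_counts = [100] * 36 + [200]    # one pocket seen twice as often

print(looks_biased(fair_counts))
print(looks_biased(suspect_counts))
```

The observed counts are the sample; the hypothesis being tested concerns the wheel's long-run distribution, which is the population in the sense described above.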

Sometimes the population from which the sample is drawn is not the same as the population about which information is wanted. To guard against misleading results, we can assess the data along several quality dimensions: consistency, completeness, timeliness, uniqueness, accuracy and validity. If the collected data satisfies all of these dimensions, the results obtained from samples are more likely to be accurate and relevant to our requirement.
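Some of these dimensions can be expressed as simple ratios. The sketch below scores three of them (completeness, uniqueness and validity) on a tiny made-up dataset. The field names `id` and `age`, the helper functions and the 0 to 120 age range are all hypothetical and chosen only for illustration.

```python
def completeness(records, field):
    """Fraction of records in which the field is present (not None)."""
    present = [r for r in records if r.get(field) is not None]
    return len(present) / len(records)


def uniqueness(records, field):
    """Fraction of non-missing values that are distinct."""
    values = [r[field] for r in records if r.get(field) is not None]
    return len(set(values)) / len(values)


def validity(records, field, is_valid):
    """Fraction of non-missing values that pass the given rule."""
    values = [r[field] for r in records if r.get(field) is not None]
    return sum(1 for v in values if is_valid(v)) / len(values)


# Made-up records with a missing age, a duplicated id and an impossible age.
records = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},
    {"id": 2, "age": 27},
    {"id": 4, "age": -5},
]

age_completeness = completeness(records, "age")
id_uniqueness = uniqueness(records, "id")
age_validity = validity(records, "age", lambda a: 0 <= a <= 120)

print(age_completeness, id_uniqueness, age_validity)
```

A score of 1.0 means the dimension is fully satisfied for that field; lower scores point to the records that need attention before sampling results can be trusted.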

There are many other ways to measure the quality of data. One of them is to represent the collected data using random variables and apply techniques from probability theory. For example, consider the probability density function (PDF), or density, of a continuous random variable. It is a function whose value at any given point (or sample) in the sample space can be interpreted as the relative likelihood that the random variable takes a value near that point. The density itself is not a probability: the probability that the variable falls within a range is obtained by integrating the density over that range.
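To make the idea of a density concrete, the sketch below uses `NormalDist` from Python's standard `statistics` module. The normal distribution and its parameters are assumptions picked for illustration, and `interval_probability` is a hypothetical helper that integrates the density numerically with the trapezoidal rule.

```python
from statistics import NormalDist


def interval_probability(dist, a, b, steps=10_000):
    """Approximate P(a <= X <= b) by trapezoidal integration of the density."""
    width = (b - a) / steps
    total = 0.5 * (dist.pdf(a) + dist.pdf(b))
    for i in range(1, steps):
        total += dist.pdf(a + i * width)
    return total * width


standard = NormalDist(mu=0.0, sigma=1.0)
narrow = NormalDist(mu=0.0, sigma=0.1)

# Density values are relative likelihoods, not probabilities.
density_at_zero = standard.pdf(0.0)
narrow_density_at_zero = narrow.pdf(0.0)

# A probability comes from integrating the density over a range.
prob_within_one_sigma = interval_probability(standard, -1.0, 1.0)

print(f"standard density at 0: {density_at_zero:.4f}")
print(f"narrow density at 0:   {narrow_density_at_zero:.4f}")
print(f"P(-1 <= X <= 1):       {prob_within_one_sigma:.4f}")
```

The narrow distribution has a density above 1 at its centre, which would be impossible for a probability and shows why the density has to be integrated before it can be read as one.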