A sample is a subset drawn from a larger population for the purpose of studying its properties. In statistics the sample substitutes for the whole population when a complete count is impractical or impossible. A well-chosen sample permits estimates of population quantities, or parameters, and allows assessment of the uncertainty around those estimates. The concept is central to experimental design, surveys, environmental monitoring and laboratory measurement.

Notation and basic characteristics

When treated as a dataset, a sample is commonly represented by random variables such as X or Y, and by their observed values x1, x2, …, xn. The letter n denotes the sample size, the number of observations. Important descriptors include the sample mean, the sample variance and other summary statistics that estimate the corresponding population quantities. Practical samples vary in scale, from a handful of repeated measurements in a laboratory to tens of thousands of survey responses collected by national census bureaus.
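The descriptors above can be computed with Python's standard statistics module. The following is a minimal sketch; the readings are hypothetical values chosen for illustration, not data from any study. Note that statistics.variance divides by n - 1, the usual unbiased estimator of the population variance, while statistics.pvariance divides by n.

```python
import statistics

# Hypothetical repeated laboratory readings x1, ..., xn (illustrative values only)
x = [9.81, 9.79, 9.83, 9.80, 9.82]

n = len(x)                      # sample size
mean = statistics.mean(x)       # sample mean, estimates the population mean
var = statistics.variance(x)    # sample variance with divisor n - 1
sd = statistics.stdev(x)        # sample standard deviation (square root of var)

print(f"n={n}, mean={mean:.3f}, variance={var:.6f}, sd={sd:.4f}")
```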

How samples are chosen and common designs

Sampling is the process of selecting units from the population. A primary goal is to obtain a sample that is representative and as free of bias as possible. Typical probability-based designs include simple random sampling, stratified sampling, cluster sampling and systematic sampling. Probability methods give every unit in the population a known, nonzero chance of selection, which enables valid inference based on probability theory. Non-probability methods, such as convenience sampling, are easier to implement but are more vulnerable to bias, and their results generalize less reliably. In practice, field selection follows a written protocol: a fixed sequence of rules that must be applied exactly so that the selection can be reproduced.
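The four probability designs named above can be sketched with Python's random module. This is an illustrative sketch, not a production survey tool: the function names, the 100-unit frame and the urban/rural strata are hypothetical. A seeded random.Random generator plays the role of the written protocol, because the same seed reproduces the same selection.

```python
import random

def simple_random_sample(frame, n, rng):
    """Simple random sampling: every subset of n units is equally likely."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Systematic sampling: every k-th unit after a random start.

    Assumes len(frame) is a multiple of n, so each unit has chance 1/k.
    """
    k = len(frame) // n          # sampling interval
    start = rng.randrange(k)     # random start in [0, k)
    return frame[start::k][:n]

def stratified_sample(strata, n, rng):
    """Stratified sampling with proportional allocation across strata.

    Rounding can make the stratum sizes sum to slightly more or less than n.
    """
    total = sum(len(units) for units in strata.values())
    return {name: rng.sample(units, round(n * len(units) / total))
            for name, units in strata.items()}

def cluster_sample(clusters, m, rng):
    """Cluster sampling: choose m whole clusters at random, keep all their units."""
    chosen = rng.sample(sorted(clusters), m)
    return [unit for name in chosen for unit in clusters[name]]

# Hypothetical frame of 100 unit IDs; the fixed seed makes the selection reproducible
rng = random.Random(42)
frame = list(range(1, 101))
srs = simple_random_sample(frame, 10, rng)
sys_sample = systematic_sample(frame, 10, rng)
strat = stratified_sample({"urban": frame[:70], "rural": frame[70:]}, 10, rng)
clus = cluster_sample({f"block{i}": frame[10 * i:10 * i + 10] for i in range(10)}, 2, rng)
```

With proportional allocation the hypothetical 70/30 split yields 7 urban and 3 rural units, so each stratum is represented in the same proportion as in the frame.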

Sources of error and bias

No sample is perfect. Errors may arise from imperfect sampling frames, non-response, measurement variation or interviewer effects. Even in carefully planned random samples, systematic differences between selected and non-selected units can remain. Election polling illustrates this: polls that rely on telephone contact can miss citizens who do not answer calls, so their results may deviate from the actual outcome on election day. Statisticians quantify and, when possible, adjust for bias; they also report measures of uncertainty such as standard errors and confidence intervals so that users understand the precision of sample-based estimates. When bias cannot be eliminated, practitioners try to estimate and report the likely magnitude and direction of the resulting deviations.
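A standard error and confidence interval for a mean might be computed as below. This is a minimal sketch that assumes a simple random sample large enough for the normal approximation; small samples would usually use a t quantile instead. The function name and the simulated data are hypothetical. The interval describes random sampling variability only; it does not correct for systematic bias such as non-response.

```python
import math
import random
import statistics

def mean_confidence_interval(x, level=0.95):
    """Normal-approximation confidence interval for a population mean."""
    n = len(x)
    mean = statistics.mean(x)
    se = statistics.stdev(x) / math.sqrt(n)                  # standard error of the mean
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 95%
    return mean, se, (mean - z * se, mean + z * se)

# Hypothetical data: 200 simulated observations from a seeded generator
rng = random.Random(1)
x = [rng.gauss(100, 15) for _ in range(200)]
mean, se, (lo, hi) = mean_confidence_interval(x)
print(f"mean={mean:.2f}, SE={se:.3f}, 95% CI=({lo:.2f}, {hi:.2f})")
```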

Applications and examples

Samples are used across scientific disciplines and applied settings. Environmental scientists collect water samples to assess pollution levels in a lake or estuary; the locations at which water is collected can change the results and the conclusions drawn from them. In laboratory contexts, repeated measurements of a physical constant, such as the speed of light, are treated as a sample of observations subject to instrument and procedural variability. Quality control programs draw samples of manufactured items to infer the proportion defective. Social researchers interview a sample of residents to estimate public attitudes, then analyze the collected data statistically to generalize to the wider population. Each application involves a trade-off among cost, feasibility and the acceptable level of uncertainty.
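For the quality control case, the proportion defective is estimated by the sample proportion p̂ = d / n, where d is the number of defective items found. The sketch below uses the simple Wald (normal-approximation) interval, with standard error sqrt(p̂(1 - p̂) / n); it is a swapped-in illustrative choice, and it behaves poorly when defects are very few, where intervals such as the Wilson interval are often preferred. It also assumes the sample is small relative to the production lot. The inspection counts are hypothetical.

```python
import math
from statistics import NormalDist

def proportion_defective(defects, n, level=0.95):
    """Estimate the proportion defective with a Wald (normal-approximation) interval."""
    p_hat = defects / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)      # standard error of the proportion
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lo, hi = max(0.0, p_hat - z * se), min(1.0, p_hat + z * se)   # clip to [0, 1]
    return p_hat, (lo, hi)

# Hypothetical inspection: 12 defective items found in a sample of 400
p_hat, (lo, hi) = proportion_defective(12, 400)
print(f"estimated proportion defective {p_hat:.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
```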

Distinctions and important concepts

  • Complete sample: includes all units possessing a specified property (rare outside administrative registers).
  • Representative or unbiased sample: selection mechanism does not systematically depend on the values of interest.
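  • Sample versus population: sample statistics estimate population parameters; sampling error is the difference between a sample statistic and the corresponding parameter that arises because only part of the population was observed, as distinct from errors of measurement.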
  • Measurement error: repeated measures of the same object generate a sample of observations affected by instrument and human factors; no measurement system is perfect and such variability must be modeled and reported.
  • Role of the statistician: experts design sampling schemes, assess bias and compute uncertainty so users can interpret results responsibly.
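The distinction between sampling error and selection bias can be illustrated by simulation. In this sketch the population is synthetic and hypothetical, and the "biased" design is an assumed weighted draw (with replacement) in which a unit's chance of selection grows with its value. Under simple random sampling the sample means scatter around the population mean; under value-dependent selection they are shifted systematically, and a larger sample would shrink the scatter but not the shift.

```python
import itertools
import random
import statistics

rng = random.Random(7)

# Hypothetical synthetic population of 10,000 values (illustration only)
population = [rng.gauss(50, 10) for _ in range(10_000)]
mu = statistics.fmean(population)                    # the population parameter

# Value-dependent selection: a unit's chance of selection grows with its value
cum_weights = list(itertools.accumulate(max(v, 0.0) for v in population))

def random_sample_mean(n):
    """Simple random sample: selection ignores the values of interest."""
    return statistics.fmean(rng.sample(population, n))

def biased_sample_mean(n):
    """Weighted draw (with replacement) that favours larger values."""
    return statistics.fmean(rng.choices(population, cum_weights=cum_weights, k=n))

random_means = [random_sample_mean(100) for _ in range(500)]
biased_means = [biased_sample_mean(100) for _ in range(500)]

# Sampling error: averages out to roughly zero, spread near 10 / sqrt(100) = 1
print(f"random: average error {statistics.fmean(random_means) - mu:+.2f}, "
      f"spread {statistics.stdev(random_means):.2f}")
# Selection bias: a systematic shift that does not average out
print(f"biased: average error {statistics.fmean(biased_means) - mu:+.2f}")
```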

Choosing an appropriate sampling approach and documenting how the sample was obtained are essential for the credibility of any study. Even when logistical or ethical constraints limit what is possible, transparent reporting of what was sampled, how, and with what expected limitations allows results to be interpreted correctly and used effectively. Specialized texts on sampling and the guidance of professional statistical bodies give further methodological detail. Random selection and careful attention to probability remain the most reliable foundations for making sound inferences from a sample.