Chapter 1 Data collection

Scientists seek to answer questions using rigorous methods and careful observations. These observations — collected from the likes of field notes, surveys, and experiments — form the backbone of a statistical investigation and are called data. Statistics is the study of how best to collect, analyze, and draw conclusions from data. It is helpful to put statistics in the context of a general process of investigation:

  1. Identify a question or problem.

  2. Collect relevant data on the topic.

  3. Analyze the data.

  4. Form a conclusion.

Statistics as a subject focuses on making stages 2-4 objective, rigorous, and efficient. That is, statistics has three primary components: How best can we collect data? How should it be analyzed? And what can we infer from the analysis?

Researchers from a wide array of fields have questions or problems that require the collection and analysis of data. Let's consider three examples.

  • Climate science: how will the global temperature change over the next 100 years?

  • Psychology: can a simple reminder about saving money cause students to spend less?

  • Political science: what fraction of Americans approve of the job Congress is doing?

What questions from current events or from your own life can you think of that could be answered by collecting and analyzing data? While the questions that can be posed are incredibly diverse, many of these investigations can be addressed with a small number of data collection techniques, analytic tools, and fundamental concepts in statistical inference.

This chapter focuses on collecting data. We'll discuss basic properties of data, common sources of bias that arise during data collection, and several techniques for collecting data through both sampling and experiments. After finishing this chapter, you will have the tools for identifying weaknesses and strengths in data-based conclusions, tools that are essential for being an informed citizen and a savvy consumer of information.