Census and Bias: Essential Concepts in Statistical Analysis
Master the fundamentals of census and bias in data collection. Learn to identify various types of bias, understand sampling techniques, and improve the accuracy of your statistical analyses.

Notes

Glossary terms:

  • Census
  • Explanatory variable
  • Response variable
  • Bias
  • Response bias
  • Selection bias
  • Non-response bias
  • Voluntary response bias

Census and bias


What is a census in statistics


A census is a study performed on an entire population. Rather than sampling, the interview, observation or analysis is carried out directly on every single individual in the population: every object, every being, every unit.

When we hear the word census, we usually think of a statistical study of a human population; indeed, population censuses are carried out periodically throughout the world. In the three North American countries (Canada, the United States and Mexico), the population census is carried out by Statistics Canada, the U.S. Census Bureau and INEGI (the National Institute of Statistics and Geography, from its initials in Spanish), respectively. This does not mean that a census is exclusive to human statistics. A census is the process through which data is obtained directly from every single subject of a population; therefore, a census can be conducted on any type of population, be it composed of humans, animals, plants or inanimate objects.

Although census data is comprehensive and would provide the highest degree of accuracy in statistical findings, censuses are not very practical when working with large populations. Besides the monetary cost, imagine having to visit every household in your country and interview all of its inhabitants every time you wanted to gather some statistical information (not only the typical data for a population census, but something much simpler, such as who in your country prefers summer over winter). Such an all-inclusive process would take a huge amount of time, and it would have to be repeated every time a new question arose; therefore, we usually rely on statistical analysis based on sampling methods rather than on a census.
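
The trade-off between a census and a sample can be sketched with a small simulation. This is only an illustration under stated assumptions: the population is made up, the 60% share of people who prefer summer is an arbitrary choice, and the names `population`, `census_share` and `sample_share` are hypothetical.

```python
import random
from statistics import fmean

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population of 100,000 people: 1 = prefers summer, 0 = prefers winter.
# The 60% figure is an assumption chosen only for illustration.
population = [1 if rng.random() < 0.6 else 0 for _ in range(100_000)]

# Census: every single individual is asked.
census_share = fmean(population)

# Sampling: only 500 randomly chosen individuals are asked.
sample = rng.sample(population, 500)
sample_share = fmean(sample)

print(f"census: {census_share:.3f} ({len(population)} interviews)")
print(f"sample: {sample_share:.3f} ({len(sample)} interviews)")
```

The sample needs a small fraction of the interviews and usually lands close to the census value, but it is not guaranteed to match it exactly.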

However, it is important to know that although sampling is generally the most time-effective and cost-effective way to carry out a statistical study, it can produce errors when the sample is not a good representation of the population being studied; this error is what we call bias. Before we look at the definition of bias in more detail, let us talk about the two types of variables used throughout a statistical analysis (and any experiment, for that matter): explanatory variables and response variables.

The explanatory variable of an experiment is the independent variable of the mathematical problem that the experiment represents. In an experiment, the explanatory variable is the one that can be manipulated, and each manipulation may produce a change in the response variable. In other words, the explanatory variable is the variable that affects the response variable; thus, in statistics we call explanatory variables those that must be considered in order to explain the response variable.

A response variable is the one that provides the outcome of an experiment or, in this case, a statistical study. If we think of the variables of a study or experiment as algebraic variables, the response variable is the equivalent of the dependent variable in a problem; therefore, it typically represents the outcome the experiment is trying to measure. In simple words, the response variable records the response measured from the individuals in a statistical study; hence, whatever your statistical study wants to find out is the information that the response variable will provide.

There may be many explanatory variables in an experiment; typically it is better to find the single most relevant explanatory variable.

What is bias in statistics


Continuing with the basic concepts of statistics, let us take a look at the types of error that a statistical analysis may contain.

When using sampling methods to carry out a statistical study, a common error is the one called sampling error. As its name suggests, sampling error comes from the fact that you are not studying the complete population to obtain a statistic, but only a sample from it; consequently, even when you have a large, representative sample of your population to analyse, your finding will probably not be the exact true value you would have obtained by performing a census.

Sampling error may be reduced by increasing the size of the sample; furthermore, a larger sample can provide more information about the subdivisions of a population. Unfortunately, statistical bias is not as simple to deal with as sampling error.
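
How sampling error shrinks as the sample grows can be sketched with a simulation. This is a minimal illustration, not a formal result: the population values are invented, and the helper `average_sampling_error` is a hypothetical name introduced only for this example.

```python
import random
from statistics import fmean

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population of 50,000 measurements (made-up values).
population = [rng.gauss(50, 10) for _ in range(50_000)]
census_mean = fmean(population)

def average_sampling_error(n, repeats=200):
    """Average absolute gap between a random sample's mean and the census mean."""
    gaps = []
    for _ in range(repeats):
        sample = rng.sample(population, n)
        gaps.append(abs(fmean(sample) - census_mean))
    return fmean(gaps)

small_sample_error = average_sampling_error(25)
large_sample_error = average_sampling_error(2_500)

print(f"n = 25:   average error {small_sample_error:.2f}")
print(f"n = 2500: average error {large_sample_error:.2f}")
```

Both sample sizes use fair random selection, so the only thing that changes is how far, on average, the sample mean sits from the census mean.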

Bias in statistics, or sampling bias, is a systematic error that originates from obtaining a sample that does not represent the population or that was incorrectly measured. No matter how correctly the calculations and procedures of the study are carried out afterwards, the error was introduced at the moment of sampling, and it will continue to affect the result unless you take a new sample that is representative and well measured. In simple words, we can define bias as a distortion introduced into a statistical study due to poorly implemented sampling techniques.
There are different types of bias in statistics, each resulting from a distinct issue during sampling. The next section of this lesson covers them.

Types of bias in statistics


As we mentioned before, there are many different ways in which a sample could result in bias. Here are a few of the most common types of bias found in statistical studies:

Response Bias:
Because of popular trends, customs or social standards, people tend to lie when being surveyed; this is where response bias comes from. Response bias is the distortion of true information that results from people not wanting to hold unpopular opinions: the psychological effect of social pressure pushes them to respond untruthfully when they are face to face with an interviewer or when the survey is not anonymous. A good way to address this issue is to ensure that all participants of a survey remain anonymous.

Response bias example: A survey asks people their age, clothing size or weight. Due to established social ideals or stigmas, people may be tempted to lie about their true numbers for these variables. For example, in some countries and cultures women get offended when asked about their age because there is a stigma attached to being old. Weight and size are also common controversial issues in the Western world, where society favors a thin figure; if for any reason someone is not thin, they might be shamed for it (they shouldn't be! Always be proud of your body, it allows you to enjoy life).


Selection Bias:
Selection bias occurs when the sample taken from the population does not fairly represent it. This misrepresentation can arise from different causes, often from failing to give all subjects in the target population the same chance of being part of the selected sample (due to non-random selection, for example).
When selecting people for a survey, make sure that your selection process is fair. When collecting samples, think about how you are gathering your data and make sure that it is fully randomized.

For example: When analysing the shopping preferences of a city's population, if the sample of individuals is taken only from a rich neighborhood, the study will be biased towards conclusions that reflect expensive tastes rather than the whole population of the city.
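
The rich-neighborhood example can be sketched as a simulation to show that, unlike sampling error, selection bias does not fade as the sample grows. All figures are assumptions made up for illustration (10% of residents in the rich neighborhood, the spending levels), and the names `rich`, `others` and `city` are hypothetical.

```python
import random
from statistics import fmean

rng = random.Random(2)  # fixed seed so the sketch is repeatable

# Hypothetical city: 10% of residents live in a rich neighborhood.
# Values stand for monthly shopping spend (made-up numbers).
rich = [rng.gauss(200, 30) for _ in range(5_000)]
others = [rng.gauss(80, 20) for _ in range(45_000)]
city = rich + others
true_mean = fmean(city)

# Biased selection: only residents of the rich neighborhood can be chosen.
biased_small = fmean(rng.sample(rich, 50))
biased_large = fmean(rng.sample(rich, 2_000))

# Fair selection: every resident has the same chance of being chosen.
random_sample = fmean(rng.sample(city, 2_000))

print(f"whole city:            {true_mean:.1f}")
print(f"rich only, n = 50:     {biased_small:.1f}")
print(f"rich only, n = 2000:   {biased_large:.1f}")
print(f"random sample, n=2000: {random_sample:.1f}")
```

Making the biased sample forty times larger does not bring it closer to the city-wide value; only changing how the sample is selected does.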


Non-response Bias:
Also known as participation bias, non-response bias results from the inability (or reluctance) of subjects to participate in the study. For example, when choosing a simple random sample of a population, some of the individuals chosen may be unwilling or unable to participate. Non-response bias is the bias that results when response rates are very low and it becomes unclear what part of the population is actually participating in the survey.

Example: If a survey were to ask people about their income, not many people would be willing to respond for fear of fraud or a scam; therefore, it is difficult to acquire information about household incomes directly unless you are an established government agency or similar.
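
A simulation can illustrate why a low response rate matters even when the sample itself was chosen fairly. The sketch below rests on an assumption that is not part of the lesson: higher earners are taken to be less willing to answer. The incomes, the response chances and the helper `responds` are all hypothetical.

```python
import random
from statistics import fmean, median

rng = random.Random(3)  # fixed seed so the sketch is repeatable

# Hypothetical household incomes for a simple random sample of 5,000 people.
sample = [rng.lognormvariate(10.5, 0.5) for _ in range(5_000)]
typical_income = median(sample)

def responds(income):
    """Assumed behaviour: higher earners are less willing to answer."""
    chance = 0.6 if income < typical_income else 0.2
    return rng.random() < chance

# Only the people who answer end up in the data.
respondents = [income for income in sample if responds(income)]

response_rate = len(respondents) / len(sample)
full_sample_mean = fmean(sample)
respondent_mean = fmean(respondents)

print(f"response rate:           {response_rate:.0%}")
print(f"mean income, all chosen: {full_sample_mean:.0f}")
print(f"mean income, responders: {respondent_mean:.0f}")
```

Because the people who answer differ from the people who do not, the respondents no longer represent the sample that was originally drawn.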


Voluntary Response Bias:
Of all the sources of bias in statistics, this is one of the easiest to understand: when individuals learn about a statistical survey and volunteer to participate in it, it is usually because they want their voices to be heard. Individuals who offer to participate in a survey may have very strong feelings one way or the other about a specific matter, and they may want to influence the outcome of the study.

For example: A radio DJ asks his listeners to call in if they think Segway scooters are stupid and should be outlawed. This may encourage a small proportion of the population who hates Segway scooters to call in.
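
The radio call-in example can be sketched as a simulation. The numbers are assumptions chosen only for illustration (10% of listeners dislike Segways, and those listeners are thirty times more likely to call), and the helper `calls_in` is a hypothetical name.

```python
import random

rng = random.Random(4)  # fixed seed so the sketch is repeatable

# Hypothetical audience of 20,000 listeners; assume 10% dislike Segways.
listeners = [rng.random() < 0.10 for _ in range(20_000)]  # True = dislikes Segways

def calls_in(dislikes):
    """Assumed behaviour: people with strong feelings are far more likely to call."""
    chance = 0.30 if dislikes else 0.01
    return rng.random() < chance

# The "sample" is whoever decides to pick up the phone.
callers = [dislikes for dislikes in listeners if calls_in(dislikes)]

audience_share = sum(listeners) / len(listeners)
caller_share = sum(callers) / len(callers)

print(f"dislike Segways, whole audience: {audience_share:.0%}")
print(f"dislike Segways, callers only:   {caller_share:.0%}")
```

Under these assumptions a small minority of the audience makes up most of the callers, so the call-in result says little about listeners as a whole.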

After learning about these types of statistical bias, let us finish this lesson with a section presenting a few examples of them.

Bias examples


Example 1

Classify the response variables and the explanatory variables from the following experiments:
  1. A census is done on amount of money people make and how long they spent in school.
  2. A study is concerned with looking at what sports athletes do and how far they can throw a Frisbee.
  3. An experiment correlates the amount of sunlight and the amount of food produced by a crop.

Example 2

For each of the following experiments, determine which sources of bias may be present, and provide a solution for overcoming the bias:

  • A university (UBC) wishes to find out how many students like its new online homework submission system (Connect). UBC emails all currently enrolled students, asking them to complete a survey on how much they like Connect. (In all likelihood, few people will respond.)

  • The U.S. Census Bureau runs a study to find out the proportion of the population that plays PS4 games. It sends a representative to a Best Buy to ask the store's customers whether they play PS4 or not.

  • Emily believes that everybody loves Adele, so on Facebook she tells her friends to comment on a post about how much they love Adele. Since all her responses are positive, she concludes that nearly 100% of the population must love Adele.

  • Dr. Anstee is a professor of mathematics at UBC. He loves mathematics and wants to know how many of his students share his love for math. So in his office hours he asks each of his students, one at a time, how much they truly love math.

Census:
A census is the procedure of collecting information directly from every member of a given population.
The population doesn't necessarily have to be human!

Explanatory Variables and Response Variables:
A response variable is the outcome of the experiment. This is typically the sort of outcome the experiment is concerned with finding. Think of it as the "response" of the experiment.

The explanatory variable is the variable that affects the response variable. Think of it as the "explanation" to the response variable.

There may be many explanatory variables in an experiment; typically it is better to find the single most relevant explanatory variable.

2 errors when sampling instead of conducting a census:
Sampling error and bias

Bias:
Poorly implemented sampling techniques can lead to a wide variety of bias. Bias occurs when the selection of a sample results in an unfair proportion of the sample having a tendency to favour certain outcomes.
e.g. Trying to decide what proportion of the population likes ice cream, but only asking kids

Sources of Bias:
There are many different ways in which a sample could result in bias. Here are a few of the most common sources of bias:

Response Bias:
In general, people do not want to hold unpopular opinions, so they may respond untruthfully when they are face to face with an interviewer or when the survey is not anonymous. A good way to address this issue is to ensure that all participants of a survey remain anonymous.
e.g. Doctors asking patients if they are following their orders (people will be tempted to lie to the doctor to maintain their image or to avoid incurring the doctor's wrath)

Selection Bias:
When selecting people for a survey, make sure that your selection process doesn't favour any part of the population that has a specific preference for an outcome in your experiment. When collecting samples, think about how you are gathering your data and make sure that it is fully randomized.
e.g. When polling the number of families in an area, it would be a selection bias to poll all the people entering a toy store (as more families will shop at a toy store).

Non-response Bias:
Sometimes individuals chosen for the sample in a survey may be unwilling or unable to participate. Non-response bias is the bias that results when response rates are very low and it becomes unclear what part of the population is participating in the survey.
e.g. If a mail survey were conducted asking people what sort of car they drive, it is likely that few people would respond, and it would be hard to know what proportion of the population the respondents represent.

Voluntary Response Bias:
If individuals offer to participate in a survey they may have very strong feelings one way or the other about a specific matter.
e.g. A radio DJ asks his listeners to call in if they think Segway scooters are stupid and should be outlawed. This may encourage a small proportion of the population who hates Segway scooters to call in.

So basically make sure to choose your sample well using a good sampling technique and make sure there is no bias!