# Validating Data Worksheets

Collecting data allows us to understand the nature of many different things, and it can help us answer relevant questions in our lives. These questions can be as simple as which type of juice is the most popular at breakfast. They can also be as complex as which of those juices is the best for our health. To answer either of these questions, we must first collect data. We must also ensure that the data we are collecting is in fact valid. If the data is tainted, our questions simply cannot be answered accurately. These worksheets and lessons help students learn to be critical of the data they are reviewing and of how that data was collected and processed.

### Aligned Standard: HSS-IC.A.2

- Coin Flipping Step-by-step Lesson - Some parts of the world refer to a coin flip as a spin. I thought I would try it out.
- Guided Lesson - We try to determine if you can model a given circumstance.
- Guided Lesson Explanation - I didn't know how to explain number three well, other than because.
- Practice Worksheet - Here is a good workout for your brain.
- Can These Be Design Models Worksheet - It seems simple enough, just read it carefully.

- Answer Keys - These are for all the unlocked materials above.

### Homework Sheets

Lots of coins, dice, and spinners in here.

- Homework 1 - 8 flips is a very small sample size. When the sample size is small, we cannot question models, theories, or laws. When a large amount of data goes against the model, we can question the model, theory, or law.
- Homework 2 - Mia rolls a die 5 times in a row, and each time it lands on the same number. Does this make you question the model of probability for a die?
- Homework 3 - Yes, models can be generated for all types of probability-based scenarios.
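The sample-size idea behind these homework sheets can be checked with a short calculation. The sketch below is only an illustration: the helper `prob_at_least` and the counts (6 heads in 8 flips, 600 heads in 800 flips) are made-up values, not taken from the worksheets. It computes how likely a fair coin is to produce a result at least that lopsided.

```python
from fractions import Fraction
from math import comb


def prob_at_least(n, k, p):
    """Exact probability of k or more successes in n independent trials."""
    p = Fraction(p)
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


fair = Fraction(1, 2)

# Hypothetical results: 75% heads in a small sample and in a large one.
small = prob_at_least(8, 6, fair)      # 6 or more heads in 8 flips
large = prob_at_least(800, 600, fair)  # 600 or more heads in 800 flips

print(f"8 flips:   {float(small):.4f}")
print(f"800 flips: {float(large):.3e}")

# Homework 2 style check: five rolls of a fair die all showing the same face.
# The first roll can be anything; the next four must match it.
same_five = Fraction(1, 6) ** 4
print(f"five matching rolls: {float(same_five):.5f}")
```

With only 8 flips, a 75% heads result is not unusual for a fair coin, so it gives little reason to doubt the model. The same percentage over 800 flips is so unlikely that questioning the model becomes reasonable.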

### Practice Worksheets

We add real world problems and a deck of cards. Why not?

- Practice 1 - On a game show, a prize wheel is spun to win a prize. The wheel has 12 equally sized prize areas. A different prize is in each area. Over the last 200 spins, the same prize was landed on 15% of the time. Would this make you question the model that is being presented to you?
- Practice 2 - A die is rolled and a coin is tossed. What is the probability that the die shows an odd number and the coin shows heads?
- Practice 3 - A card is drawn 200 times from a deck of cards. The King of Hearts appears 5 times. Does this make you question the model of the experiment?
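One way to approach problems like these is to compare the observed count with what the model predicts. The sketch below is an illustration, not the worksheet's answer key. It assumes every spin or draw is independent, and that the card is returned to the deck and reshuffled each time (the problem does not say so). The helper `z_score` is a made-up name; it measures how many standard deviations the observed count sits from the expected count.

```python
from fractions import Fraction
from itertools import product
from math import sqrt


def z_score(observed, trials, p):
    """Distance of an observed count from the expected count, in standard deviations."""
    expected = trials * p
    spread = sqrt(trials * p * (1 - p))
    return (observed - expected) / spread


# Practice 1: 15% of 200 spins is 30 hits on one of 12 equal areas.
wheel_z = z_score(30, 200, 1 / 12)

# Practice 3: King of Hearts drawn 5 times in 200 draws from 52 cards
# (assumes the card is replaced and the deck reshuffled each time).
king_z = z_score(5, 200, 1 / 52)

print(f"wheel: {wheel_z:.2f} standard deviations from expected")
print(f"king:  {king_z:.2f} standard deviations from expected")

# Practice 2: list every (die, coin) outcome and count the favorable ones.
outcomes = list(product(range(1, 7), "HT"))
favorable = [(d, c) for d, c in outcomes if d % 2 == 1 and c == "H"]
odd_and_heads = Fraction(len(favorable), len(outcomes))
print(f"odd number and heads: {odd_and_heads}")
```

A count that lands several standard deviations from the expected value is a reason to question the model, while a count within about one standard deviation is consistent with it.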

### Math Skill Quizzes

I stuck with the very traditional problems here.

- Quiz 1 - Find the probability of getting 6 questions right on a math test. You are answering randomly, and the test is multiple choice with four answers.
- Quiz 2 - A local lottery states that the probability of winning is 0.00014. The lottery drawing took place 4 times. Over 10,000 tickets were purchased and no one won. Do you feel the probability of lottery winners is flawed?
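Both quiz problems can be explored with the binomial model. The sketch below rests on assumptions that the quiz text does not state: the test length of 10 questions is a hypothetical value chosen only for illustration, and the lottery check treats 10,000 tickets as independent chances at the stated probability. The function names are made up for this example.

```python
from math import comb


def prob_exactly(k, n, p):
    """Binomial probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_no_winner(tickets, p_win):
    """Probability that none of the independent tickets wins."""
    return (1 - p_win) ** tickets


# Quiz 1 style: guessing on four-choice questions gives p = 1/4 per question.
# ASSUMPTION: a 10-question test; the quiz text does not give the length.
hypothetical_questions = 10
print(f"exactly 6 right: {prob_exactly(6, hypothetical_questions, 0.25):.4f}")

# Quiz 2 style: ASSUMPTION of 10,000 independent tickets in total.
print(f"no winner in 10,000 tickets: {prob_no_winner(10_000, 0.00014):.4f}")
```

If the stated model makes "no winners" a fairly likely outcome, then seeing no winners is weak evidence against the advertised probability.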

### Why is it Important to Validate Data?

When we are trying to better understand just about anything, it is always best to step back and observe the current state of that thing. To do this, we first define which of the data being generated is of interest. For instance, if we were trying to help a big-league baseball player improve his on-base percentage, the number of hits he gets would be helpful, and so would his current on-base percentage. Knowing the amount of water he drinks over the course of the day would probably not be a critical statistic.

Once the data of interest is defined, it is time to collect it. We must make sure that the data is collected from multiple angles, which means collecting it by as many means as possible. We do this to make sure that our data is accurate and clear. In the case of the ball player, we would want to have several people watch his games to come up with these statistics. A single person can easily become distracted for a moment, which would result in an accumulation of inaccurate data. The validity of our data improves with every additional source we add to collect it. Once we can ensure that our data is clean, it becomes easier to make stronger generalizations from it.
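The benefit of extra observers can be sketched with a simple model. Everything here is an assumption made for illustration: each observer records a play correctly with probability 0.9, observers make mistakes independently of one another, and the final record is whatever the majority wrote down. The function `majority_correct` is a made-up name.

```python
from math import comb


def majority_correct(observers, p_correct):
    """Probability that more than half of the observers record a play correctly."""
    needed = observers // 2 + 1
    return sum(
        comb(observers, k) * p_correct**k * (1 - p_correct) ** (observers - k)
        for k in range(needed, observers + 1)
    )


# ASSUMPTION: each observer is right 90% of the time, independently.
for n in (1, 3, 5):
    print(f"{n} observer(s): {majority_correct(n, 0.9):.4f}")
```

Under these assumptions, the chance that the combined record is right rises as observers are added. Real observers may share the same blind spots, in which case the gain would be smaller.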

### What are Data Generating Processes?

When you perform data analysis, you usually work with a sample of a population and use it for forecasting. This is where data-generating processes come into consideration. First, you need to know what the population is. A population is the set of elements under statistical study, defined clearly enough that you can determine whether any given element belongs to it or not.

In most cases, you do not know the whole population, so we describe it with a joint probability distribution. The statistician then draws a finite set of observations from that distribution. The purpose of statistical analysis is to learn about the population from the sample drawn from the probability distribution.

Data can be generated via a probability-based or a non-probability-based process. Probability-based processes result in a much more random data set, which is more reflective of the overall population. You may feel that non-probability-based data collection results in completely invalid data, but there are many situations where you do not have the ability to collect data across the entire population. Yes, there is some level of bias in this, but it can still give you a better understanding of the system that you are studying.
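The difference between the two kinds of process can be seen in a small simulation. This is a minimal sketch with an invented population (the numbers 0 through 999, stored in order) and invented sample sizes. The probability-based sample gives every element an equal chance of being picked. The non-probability sample simply takes the first 50 elements because they are the easiest to reach.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the run is repeatable

# ASSUMPTION: a made-up population of 1,000 values stored in sorted order.
population = list(range(1000))
pop_mean = mean(population)

# Probability-based: every element has the same chance of being selected.
random_sample = random.sample(population, 50)

# Non-probability (convenience): take whatever is closest to hand.
convenience_sample = population[:50]

print(f"population mean:         {pop_mean}")
print(f"random sample mean:      {mean(random_sample)}")
print(f"convenience sample mean: {mean(convenience_sample)}")
```

The convenience sample still describes real members of the population, but its mean reflects only the part that was easy to reach, which is the bias described above.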