MDM4U Grade 12 Data Management One Variable Statistics Test

Data Analysis

Numerical Data

–          quantitative data can be measured numerically

–          discrete numerical data is measured with whole numbers

–          continuous numerical data is measured with real numbers (decimals)

–          bar graphs represent discrete data while histograms represent continuous data

Categorical Data

–          qualitative data that cannot be measured numerically

–          nominal data can be placed in any order

–          ordinal data is better represented when sorted or ordered

–          categorical data is often represented with bar graphs or circle graphs

Data Vocabulary

–          primary data is data collected first-hand

–          secondary data is data collected by others

–          micro data represents individual pieces of data

–          aggregate data represents grouped micro data

–          binary data is data with only 2 outcomes, e.g. gender

–          non-binary data does not have only 2 outcomes

–          population: entire group being studied

–          sample: part of the population being studied

–          inference: conclusion made about the population based on the sample

–          observational data collection involves grouping people by common qualities, then observing

–          experimental data collection involves creating groups, then imposing certain conditions on them and observing the results

Sampling Techniques

  • Good Sample Characteristics: large enough to represent the population, and every person/subject in the population has an equal chance of being chosen

Simple Random Sample: every member of the population has an equal chance of being picked

Systematic Random Sample

–          go through the population sequentially and select subjects at a consistent interval

–          interval = population size / sample size
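
The systematic procedure above can be sketched in a few lines of Python. This is only an illustration: the function name `systematic_sample`, the list `students`, the random starting point, and the use of integer division for the interval are all assumptions, not part of the notes.

```python
import random

def systematic_sample(population, sample_size, rng=None):
    """Select subjects at a fixed interval after a random start (assumed convention)."""
    rng = rng or random.Random()
    # interval = population size / sample size (rounded down to a whole number)
    interval = len(population) // sample_size
    # random starting position somewhere in the first interval
    start = rng.randrange(interval)
    return [population[start + i * interval] for i in range(sample_size)]

students = list(range(1, 101))          # hypothetical list of 100 student IDs
sample = systematic_sample(students, 10, random.Random(1))
print(sample)                           # 10 IDs, each 10 apart
```

With 100 students and a sample size of 10, the interval is 10, so every 10th student after the starting point is selected.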

Stratified Sample

–          a stratum (plural: strata) is a group of subjects that share a common characteristic

–          each stratum is sampled in proportion to its share of the population
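
A minimal sketch of proportionate stratified sampling, assuming a made-up school whose grades are the strata. The function name `stratified_sample` and the enrolment numbers are hypothetical, and rounding each stratum's share is one simple choice that can leave the total off by one or two for less tidy numbers.

```python
import random

def stratified_sample(strata, sample_size, rng=None):
    """Randomly sample each stratum in proportion to its size."""
    rng = rng or random.Random()
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # this stratum's share of the sample matches its share of the population
        k = round(sample_size * len(members) / total)
        sample[name] = rng.sample(members, k)
    return sample

# hypothetical school of 1000 students, grouped by grade
school = {
    "grade 9": list(range(0, 300)),
    "grade 10": list(range(300, 550)),
    "grade 11": list(range(550, 800)),
    "grade 12": list(range(800, 1000)),
}
chosen = stratified_sample(school, 100, random.Random(1))
counts = {grade: len(ids) for grade, ids in chosen.items()}
print(counts)
```

Grade 9 makes up 30% of this school, so it supplies 30 of the 100 sampled students; the other grades supply 25, 25, and 20.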

Cluster Sample: one representative group of the population chosen at random

Multi-Stage Sample: several techniques used to sample

Voluntary Response Sample: inviting subjects to voluntarily be a part of the sample group

Convenience Sample: selecting easily available subjects

Bias in Surveys

  • Questions must be simple, specific, clear, ethical, and free of bias; they must allow for an honest response and must not infringe on anyone’s privacy
  • Good questions stay away from: jargon, slang, abbreviations, negatives, leading questions, and insensitivities
  • Good surveys are often anonymous, and good questions ask the subject to select from a list of responses
  • Biases can be intentional or unintentional, but either way they make the survey invalid
  • Sampling Bias: the chosen sample does not represent the population (choosing the wrong people/subjects)
  • Non-Response Bias: when subjects choose not to participate and are under-represented (non-response is a form of sampling bias)
  • Measurement Bias: data collection consistently under/overestimates a characteristic of a population (e.g. leading questions)
  • Response Bias: subjects provide false/inaccurate responses (e.g. when asked a sensitive question)

Measures of Central Tendency

  • Mean: sum of data divided by number of data values
  • Median: the middle value of an ordered set; if there is an even number of values, find the mean of the two middle values
  • Mode: the value that occurs most frequently
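
The three measures above can be checked with Python's standard `statistics` module. The data set `scores` is made up for illustration.

```python
from statistics import mean, median, multimode

scores = [62, 75, 75, 81, 88, 94]   # hypothetical test scores, already in order

avg = mean(scores)        # sum of the values (475) divided by the number of values (6)
mid = median(scores)      # even count, so the mean of the two middle values, 75 and 81
modes = multimode(scores) # list of the most frequent value(s)

print(avg, mid, modes)
```

`multimode` returns a list because a data set can have more than one mode; here only 75 repeats.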

Measures of Spread

  • Range: highest value – lowest value

Interquartile Range

–          sort the data and split it into a low group and a high group at the median, then split each of those groups in two at its own median

–          median of high group is labelled Q3, median of entire data is Q2, and median of low group is labelled Q1

–          interquartile range = Q3 – Q1
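
The median-splitting steps above can be sketched as follows. The function name `quartiles` and the data are hypothetical, and leaving the overall median out of both halves when the number of values is odd is an assumed convention (some textbooks and software use a different rule, so their quartiles can differ slightly).

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) by splitting the ordered data at the median.

    Assumes at least two values. With an odd count, the overall
    median is excluded from both the low and the high group.
    """
    ordered = sorted(data)
    half = len(ordered) // 2
    low = ordered[:half]                    # low group
    high = ordered[len(ordered) - half:]    # high group
    return median(low), median(ordered), median(high)

data = [3, 5, 7, 8, 12, 13, 14, 18, 21]     # hypothetical data set
q1, q2, q3 = quartiles(data)
iqr = q3 - q1                                # interquartile range = Q3 - Q1
print(q1, q2, q3, iqr)
```

Here the low group is 3, 5, 7, 8 (Q1 = 6), the median is 12, and the high group is 13, 14, 18, 21 (Q3 = 16), giving an interquartile range of 10.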

Box and Whisker Plot

–          illustrates how clustered data is around the median

–          the plot must be drawn to scale and include a box from Q1 to Q3 with a line inside the box at Q2

–          whiskers extend from the box to the lowest value and the highest value

Variance

  • Population: $\text{Variance} = \dfrac{\sum (x - \mu)^2}{n}$
  • Sample: $\text{Variance} = \dfrac{\sum (x - \bar{x})^2}{n - 1}$, where $\bar{x}$ is the sample mean

Standard Deviation

  • Population: $\sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{n}}$
  • Sample: $S = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$, where $\bar{x}$ is the sample mean
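
As a worked check of the variance and standard deviation formulas, the sketch below computes them by hand and compares the results with Python's `statistics` module. The data set is made up; the same numbers are treated first as a whole population and then as a sample.

```python
import math
from statistics import pvariance, pstdev, variance, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]     # hypothetical data set

n = len(data)
mean = sum(data) / n                                  # 40 / 8 = 5
sum_sq = sum((x - mean) ** 2 for x in data)           # sum of squared deviations = 32

pop_variance = sum_sq / n                             # divide by n
sample_variance = sum_sq / (n - 1)                    # divide by n - 1
pop_sd = math.sqrt(pop_variance)
sample_sd = math.sqrt(sample_variance)

print(pop_variance, pop_sd)          # population results
print(sample_variance, sample_sd)    # sample results

# the library functions should agree with the hand calculation
print(pvariance(data), pstdev(data), variance(data), stdev(data))
```

Dividing by n - 1 instead of n makes the sample values slightly larger than the population values for the same data.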

Percentile

  • divides ordered data into 100 intervals that each contain an equal number of values
  • tests often use percentiles to convert a raw score to a new score on a scale of 1 to 100
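
One way to convert a score to a percentile is sketched below. It assumes the convention that a score's percentile is the percentage of values strictly below it; other conventions exist (for example, counting half of the tied values), so treat this as an illustration only. The function name `percentile_rank` and the scores are hypothetical.

```python
def percentile_rank(data, score):
    """Percentage of values in data that fall strictly below score (assumed convention)."""
    below = sum(1 for x in data if x < score)
    return 100 * below / len(data)

scores = [55, 60, 62, 68, 70, 74, 79, 81, 88, 93]   # hypothetical test scores
rank = percentile_rank(scores, 81)
print(rank)
```

Seven of the ten scores are below 81, so under this convention a score of 81 sits at the 70th percentile.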

Z-Score

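  • Population: $z = \dfrac{x - \mu}{\sigma}$
  • Sample: $z = \dfrac{x - \bar{x}}{S}$, where $\bar{x}$ is the sample mean and $S$ is the sample standard deviation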
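
A short sketch of both z-score formulas, using a made-up data set and a hypothetical helper `z_score`. The same value is scored once treating the data as a population and once treating it as a sample.

```python
import math
from statistics import mean, pstdev, stdev

def z_score(x, center, spread):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - center) / spread

data = [2, 4, 4, 4, 5, 5, 7, 9]     # hypothetical data set, mean = 5

z_population = z_score(9, mean(data), pstdev(data))   # uses the population standard deviation
z_sample = z_score(9, mean(data), stdev(data))        # uses the sample standard deviation

print(z_population, z_sample)
```

The population standard deviation of this data is 2, so a value of 9 is exactly 2 standard deviations above the mean; the sample z-score is a little smaller because the sample standard deviation is a little larger.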