MDM4U – Grade 12 Data Management – Exam

Unit 1: One Variable Analysis

Types of Data

  • Numerical Data (quantitative)
    • Discrete: consists of countable values, usually whole numbers
      • e.g., number of trucks
    • Continuous: measured using real numbers
      • e.g., temperature
  • Categorical Data: describes qualities or categories and cannot be measured numerically
    • Nominal: categories with no natural order, so any order of presentation makes sense
      • e.g., eye colour, hair colour
    • Ordinal: categories with a natural order, so the data is better presented sorted
      • e.g., rating-scale options (poor, fair, good), months of the year
  • Collecting Data
    • Primary: collected by yourself
    • Secondary: collected by someone else
  • Organizing Data
    • Micro Data: information about an individual
    • Aggregate Data: summarized data about a group (e.g., totals or averages)
  • Data Collection
    • Observational Data: subjects are grouped by an existing characteristic and then observed; no treatment is imposed
      • e.g., group people as adults or children, then observe sunlight’s effect on each group
    • Experimental Data: the researcher creates groups and imposes a treatment on them
      • e.g., randomly assign subjects to groups, then give one group a drug and another group a placebo
  • Other Terms
    • Population: entire group of people being studied
    • Sample: the part of the population being studied
    • Inference: conclusion made about the population based on the sample
    • Binary Data: only 2 choices/outcomes
    • Non-Binary: more than 2 outcomes

Sampling Techniques

Characteristics of a good sample

-Each person must have an equal chance to be in the sample

-Sample must be large enough to represent the population

  • Simple Random: each member has equal chance of being selected
    • e.g., randomly picking residents from a list of all the apartments
  • Sequential (Systematic) Random: go through the population sequentially and select members at a fixed interval, starting from a randomly chosen member
    • e.g., selecting every 5th person
  • Stratified Sampling: a stratum (plural: strata) is a group of people who share common characteristics
    • The population is divided into strata, and a random sample is taken from each stratum in proportion to its share of the population
    • e.g., each stratum is represented based on its proportion in the population
  • Cluster Sampling: a random sample of one representative group (cluster), in which every member of the group is surveyed
    • e.g., picking 1 floor of the building at random and surveying everyone on it
  • Multi-Stage Sampling: several levels of sampling
    • e.g., randomly selecting provinces, then random cities, then random people
  • Voluntary Response Samples: invite members of the entire population to participate in the survey; only those who choose to respond are included
    • e.g., sending the survey to everyone in the hotel
  • Convenience Sample: easily accessible members are selected
    • e.g., asking whoever walks closest to you at the mall
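A minimal Python sketch of three of the techniques above (simple random, sequential/systematic, and stratified sampling), assuming the population is stored as a Python list. The population, strata, and function names are hypothetical. The stratified version gives each stratum a share proportional to its size; with other sizes, rounding may make the total differ slightly from n.

```python
import random

def simple_random_sample(population, n, rng):
    # every member has an equal chance of being chosen
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    # random starting point among the first k members, then every k-th member
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(strata, n, rng):
    # strata maps a stratum name to its list of members;
    # each stratum gets a share of the sample proportional to its size
    total = sum(len(members) for members in strata.values())
    sample = []
    for members in strata.values():
        share = round(n * len(members) / total)
        sample.extend(rng.sample(members, share))
    return sample

rng = random.Random(1)  # fixed seed so the run is repeatable
people = list(range(1, 101))
print(simple_random_sample(people, 10, rng))
print(systematic_sample(people, 5, rng))
strata = {"grade 11": list(range(1, 61)), "grade 12": list(range(61, 101))}
print(stratified_sample(strata, 10, rng))  # 6 from grade 11, 4 from grade 12
```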

Types of Bias

  • Good survey questions are simple, specific, ethical, free of bias, and respect privacy
  • Survey questions should avoid jargon, abbreviations, negatives, leading questions, and insensitive wording
  • Sampling Bias: occurs when the chosen sample doesn’t reflect the population
    • e.g., asking only basketball players about math issues
  • Non-Response Bias: occurs when particular groups are under-represented in a survey because they chose not to participate
    • e.g., when many people don’t respond, the results reflect only those who did, and the surveyor may fill in the missing opinions with their own assumptions
  • Measurement Bias: occurs when the data collection method consistently under- or overestimates a characteristic of the population
    • Leading questions can also cause over- or underestimation
    • e.g., using a police radar gun to measure the average speed on a road underestimates it, because drivers slow down when they see the police
  • Response Bias: occurs when participants in a survey give false or misleading answers
    • Poorly worded questions may lead to response bias
    • e.g., a teacher asks the class to raise their hands if they have completed their homework, and some students raise their hands even if they have not

 

Unit 2: Two Variable Analysis

  • Correlation
  • Scatter plots graph paired data and are used to determine whether there is a relationship between the 2 variables
  • Linear Correlation: changes in one variable tend to be proportional to changes in the other variable
    • The stronger the correlation, the more closely the data points cluster around the line of best fit.
  • Correlation Coefficient (r): a value between -1 and 1 that measures how closely the data points cluster around the line of best fit
    • -1 to -0.62: negative, strong correlation
    • -0.61 to -0.33: negative, moderate correlation
    • -0.32 to 0: negative, weak correlation
    • 0 to 0.32: positive, weak correlation
    • 0.33 to 0.61: positive, moderate correlation
    • 0.62 to 1: positive, strong correlation
  • Regression: finding an equation (such as a line of best fit) that models the relationship between the 2 variables
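As an illustration, r can be computed from paired data with the usual Pearson formula. This is a minimal Python sketch on made-up data; describe is a hypothetical helper that applies the cut-offs listed above and, as an assumption, places values falling between two listed ranges (such as 0.615) in the weaker band.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def describe(r):
    """Classify r with the cut-offs from these notes."""
    direction = "negative" if r < 0 else "positive"
    size = abs(r)
    if size >= 0.62:
        strength = "strong"
    elif size >= 0.33:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{direction}, {strength}"

r = correlation([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r, 4), describe(r))  # 0.7746 positive, strong
```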

 

  • Generating lines of best fit and Outliers
  • TI-83 Graphing Calculator:
    • Turn diagnostics on (2nd, 0 [CATALOG], DiagnosticOn, Enter)
    • Enter data (STAT, 1:Edit)
    • Graph data (2nd, Y= [STAT PLOT], turn Plot1 on, ZOOM, 9:ZoomStat)
    • Equation of line of best fit (STAT, CALC, 4:LinReg(ax+b), then VARS, Y-VARS, 1:Function, 1:Y1, Enter)
  • Microsoft Excel
    • Enter data
    • Highlight data and construct a scatter plot (Insert, Charts, Scatter)
    • Equation for line of best fit (Chart Tools, Layout, Trendline)
  • Fathom
    • Enter data (copy, type, or open a file)
    • Construct a scatter plot (drag variables to the axes)
    • Add a “Movable Line”
    • Equation for line of best fit (Graph, Least-Squares Line)
    • Use Show Squares and a residual plot to identify outliers
    • Determine the value of the correlation coefficient
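The tools above all fit a least-squares line. A minimal Python sketch of the same calculation, with residuals used to flag a possible outlier; the data points are hypothetical, and flagging the single largest residual is a simplification of reading a residual plot.

```python
def least_squares(xs, ys):
    """Slope a and intercept b of the least-squares line y = ax + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sxy / sxx
    b = my - a * mx
    return a, b

def residuals(xs, ys, a, b):
    # residual = observed y - predicted y; unusually large ones suggest outliers
    return [y - (a * x + b) for x, y in zip(xs, ys)]

xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 20, 10, 12]  # the point (4, 20) breaks the pattern
a, b = least_squares(xs, ys)
res = residuals(xs, ys, a, b)
worst = max(range(len(xs)), key=lambda i: abs(res[i]))
print(f"y = {a:.2f}x + {b:.2f}; possible outlier at x = {xs[worst]}")
```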

 

  • Cause and Effect
  • Direct Cause and Effect
    • A change in X causes a change in Y
      • e.g., time and a tree trunk’s diameter
  • Common Cause
    • An external factor causes two variables to change in the same way
      • e.g., a correlation between ski sales and video rentals
        • both are caused by colder weather
  • Reverse Cause and Effect
    • The dependent and independent variables are reversed when deciding which causes which
      • e.g., a study of coffee consumption and anxiety theorizes that drinking coffee causes anxiety, but finds instead that anxious people tend to drink more coffee
  • Accidental Relationships
    • A correlation without any causal relationship between the variables
      • e.g., an increase in SUV sales correlates with an increase in the chipmunk population
  • Presumed Relationship
    • A correlation that does not seem to be accidental even though no cause-and-effect or common-cause relationship is apparent
      • e.g., a correlation between a person’s level of fitness and the number of action movies they watch

 

  • Critically Thinking about Data
  • When analyzing data, we should ask:
    • Source: How reliable/current is the source?
    • Sample: Does the sample reflect the opinions in the population?
      • Was the sampling technique free from bias?
    • Graph: Is the graph drawn accurately? (e.g., does the vertical axis start at zero?)
    • Correlation: Is the correlation between the variables strong enough to make inferences?
      • Is causation being assumed just because there is a correlation?
      • Are there extraneous variables impacting the results?

 

  • Number Manipulation
  • Percentage Points: the absolute difference between two percentages; a change of X percentage points from a value V is a relative change of (X / V) × 100%
    • e.g., 3 percentage points up from 75% is 78%, which is a relative increase of (3 / 75) × 100 = 4%
  • Making Numbers Larger: to make numbers seem more impressive, people sometimes restate them on a smaller scale (per minute, per second)
    • e.g., 2,000,000 iPads sold in the first 3 months can be restated as “about 15 iPads sold every minute” (taking 3 months as about 90 days) to sound larger
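The arithmetic for percentage points and for restating a total as a rate can be checked with a short Python sketch; the function names are hypothetical, and the rate conversion assumes 3 months is about 90 days.

```python
def add_percentage_points(percent, points):
    # percentage points are an absolute change: 75% + 3 points = 78%
    return percent + points

def relative_change(old_percent, points):
    # the same change expressed as a percent of the old value
    return points * 100 / old_percent

def per_minute(total, days):
    # restate a total over a period of days as a rate per minute
    return total / (days * 24 * 60)

print(add_percentage_points(75, 3))         # 78
print(relative_change(75, 3))               # 4.0
print(round(per_minute(2_000_000, 90), 1))  # 15.4
```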

 

Unit 3: Permutations

  • Multiplicative Principles
    • If one operation can be performed in k1 ways, and for each of these a second operation can be performed in k2 ways, and for each of these a third operation can be performed in k3 ways, and so on,
      • then all of the operations together can be performed in k1 × k2 × k3 × … ways
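A quick Python check of the multiplicative principle, using a hypothetical wardrobe of 3 shirts, 2 pairs of pants, and 4 pairs of shoes: itertools.product lists every way to choose one item from each group.

```python
from itertools import product

shirts = ["red", "blue", "white"]                  # k1 = 3
pants = ["jeans", "khakis"]                        # k2 = 2
shoes = ["sneakers", "boots", "flats", "sandals"]  # k3 = 4

# every (shirt, pants, shoes) combination
outfits = list(product(shirts, pants, shoes))
print(len(outfits), len(shirts) * len(pants) * len(shoes))  # 24 24
```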

 

  • Additive Principles
    • If one action can occur in k1 ways and a second, mutually exclusive action can occur in k2 ways, then there are k1 + k2 ways in which one of these actions can occur (and similarly for more actions).
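A similar Python check for the additive principle, using a hypothetical dessert menu: choose one pie or one cake, where no dessert is both.

```python
pies = {"apple", "cherry", "pumpkin"}  # k1 = 3
cakes = {"chocolate", "carrot"}        # k2 = 2

# the two kinds of choice are mutually exclusive
assert pies.isdisjoint(cakes)

desserts = pies | cakes  # choose a pie OR a cake
print(len(desserts), len(pies) + len(cakes))  # 5 5
```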

 

  • Methods
    • If the result can be counted directly by applying a sequence of operations, it is called the direct method
    • However, if it is difficult to count directly, an indirect method may be used: count all possibilities, then subtract the ones that are not wanted
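As an example of the indirect method (the letters and the restriction are chosen for illustration), count the arrangements of ABCDE in which A and B are not side by side: subtract the arrangements where they are together from all 5! arrangements, then confirm by brute force in Python.

```python
from itertools import permutations
from math import factorial

letters = "ABCDE"

# indirect method: all arrangements minus those with A and B together
together = 2 * factorial(4)         # treat "AB" as one block, which has 2 internal orders
indirect = factorial(5) - together  # 120 - 48

def apart(arrangement):
    # True when A and B are not neighbours
    return abs(arrangement.index("A") - arrangement.index("B")) != 1

brute = sum(1 for p in permutations(letters) if apart(p))
print(indirect, brute)  # 72 72
```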

 

  • Factorial Notation
    • For 0 ≤ r ≤ n, where n is a natural number:
      • n! = n(n-1)(n-2)(n-3)(n-4)… (n-r+1)(n-r)!
      • n!/(n-r)! = n(n-1)(n-2)(n-3)(n-4)… (n-r+1), the number of permutations of n distinct objects taken r at a time, P(n, r)
      • e.g., 6! = 6 × 5 × 4 × 3 × 2 × 1 = 720
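In Python, n! and n!/(n − r)! (the number of permutations of n objects taken r at a time) can be computed directly. math.factorial and math.perm are standard-library functions (math.perm needs Python 3.8 or later); permutations_count is a hypothetical helper written from the formula.

```python
from math import factorial, perm

def permutations_count(n, r):
    # n! / (n - r)! = n(n-1)...(n-r+1)
    return factorial(n) // factorial(n - r)

print(factorial(6))              # 720
print(permutations_count(6, 2))  # 30
print(perm(6, 2))                # 30
```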

 

  • Permutations with some elements alike
    • In general, the number of different arrangements of n objects, where k1 are alike of one kind, k2 are alike of another kind, and so on, is:

n! / (k1! × k2! × …)

    • e.g., the 4 letters of the word “COOL” (with 2 O’s alike) have the following number of arrangements:

4! / 2! = 12
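A minimal Python sketch of the formula for arrangements with alike elements, checked by brute force. arrangements_with_alike is a hypothetical helper; taking a set of itertools.permutations removes the duplicate orderings caused by alike letters.

```python
from collections import Counter
from itertools import permutations
from math import factorial

def arrangements_with_alike(word):
    # n! divided by k! for each group of k alike letters
    result = factorial(len(word))
    for k in Counter(word).values():
        result //= factorial(k)
    return result

print(arrangements_with_alike("COOL"))  # 12
print(len(set(permutations("COOL"))))   # 12 distinct orderings
```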

 

Unit 4: Combinations

  • Venn Diagrams
    • Venn Diagrams: overlapping circles, each representing a property. Overlapping regions contain values that share those properties, and the centre, where all circles overlap, contains values that share all of the properties
    • A rectangle around a Venn diagram is labelled “S” to denote the universal set
    • Operations on Venn Diagrams
      • n(A): number of values with property A
      • n(A ∪ B): number of values with property A or B (union)
      • n(A ∩ B): number of values with both property A and property B (intersection)
    • Principle of Inclusion and Exclusion

 

n(A ∪ B) = n(A) + n(B) – n(A ∩ B)

 

n(B ∪ C ∪ P) = n(B) + n(C) + n(P) – n(B ∩ C) – n(B ∩ P) – n(C ∩ P) + n(B ∩ C ∩ P)
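The principle of inclusion and exclusion can be checked with Python sets. Here B, C, and P are hypothetical lists of students taking Biology, Chemistry, and Physics. In the three-set formula the pairwise intersections n(B ∩ C), n(B ∩ P), and n(C ∩ P) are subtracted, and the triple intersection n(B ∩ C ∩ P) is added back.

```python
# hypothetical class lists
B = {"Ana", "Ben", "Cai", "Dev", "Eli"}
C = {"Ben", "Cai", "Fay", "Gus"}
P = {"Cai", "Dev", "Gus", "Hal"}

# two sets: n(B ∪ C) = n(B) + n(C) - n(B ∩ C)
two_set = len(B) + len(C) - len(B & C)

# three sets: subtract pairwise overlaps, add back the triple overlap
three_set = (len(B) + len(C) + len(P)
             - len(B & C) - len(B & P) - len(C & P)
             + len(B & C & P))

print(two_set, len(B | C))        # 7 7
print(three_set, len(B | C | P))  # 8 8
```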

 

  • Combinations
    • Combination: a combination of n distinct objects taken r at a time is a selection of r of the n objects without regard to order.
      • Denoted as: C(n,r) or (n r) or nCr or “n choose r”

 

C(n,r) = n! / [(n-r)!*r!]

where n, r ∈ W (whole numbers) and n ≥ r
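A minimal Python sketch comparing the combination formula with the standard library's math.comb (Python 3.8 or later) and with an explicit list of unordered selections from itertools.combinations; choose is a hypothetical helper.

```python
from itertools import combinations
from math import comb, factorial

def choose(n, r):
    # C(n, r) = n! / [(n - r)! r!]
    return factorial(n) // (factorial(n - r) * factorial(r))

print(choose(5, 2))                         # 10
print(comb(5, 2))                           # 10
print(len(list(combinations("ABCDE", 2))))  # 10 unordered pairs
```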

 

    • If some elements are alike and at least one item is to be chosen, then the total number of selections from P alike items of one kind, Q alike items of another kind, R alike items of a third kind, and so on is:

(P + 1)(Q + 1)(R + 1)… – 1

    • Each of P, Q, and R is increased by 1 because there are P + 1 choices for the first kind (choose 0, 1, 2, …, or P of them), and similarly for the others
    • 1 is subtracted to remove the case where no items at all are chosen
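A Python sketch of the selection formula for alike items, checked by brute force; the counts 3, 2, and 4 (for example, identical apples, oranges, and bananas) are hypothetical.

```python
from itertools import product

def selections_with_alike(*counts):
    # (P + 1)(Q + 1)(R + 1)... - 1
    total = 1
    for c in counts:
        total *= c + 1
    return total - 1

def brute_force(*counts):
    # choose 0..c items of each kind, then drop the "choose nothing" case
    choices = product(*(range(c + 1) for c in counts))
    return sum(1 for pick in choices if sum(pick) > 0)

print(selections_with_alike(3, 2, 4), brute_force(3, 2, 4))  # 59 59
```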

 

  • Properties of Pascal’s Triangle
    • it’s symmetrical
    • potentially infinite in size
    • each interior number is the sum of the 2 numbers directly above it, to the left and right
    • the entry in row n, position r (both counted from 0) is C(n, r), so combinations also form Pascal’s Triangle
    • Pascal’s Identity: C(n, r) = C(n – 1, r – 1) + C(n – 1, r)

 

 

    • Row n: nC0, nC1, nC2, nC3, nC4, …, nCn
    • Sum of row n: 2^n
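A short Python sketch that builds Pascal’s Triangle row by row using Pascal’s Identity (each interior entry is the sum of the two entries above it) and checks that row n is C(n, 0), C(n, 1), …, C(n, n) and that it sums to 2^n.

```python
from math import comb

def pascal_rows(count):
    rows = [[1]]
    for _ in range(count - 1):
        prev = rows[-1]
        # Pascal's Identity: each interior entry is the sum of the two above it
        middle = [prev[i - 1] + prev[i] for i in range(1, len(prev))]
        rows.append([1] + middle + [1])
    return rows

for n, row in enumerate(pascal_rows(6)):
    assert row == [comb(n, r) for r in range(n + 1)]  # row n is nC0 ... nCn
    assert sum(row) == 2 ** n                          # row sum is 2^n
    print(row)
```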

 

Unit 5: Probability

  • Experimental and Theoretical Probability
    • Probability: a value between 0 and 1 that describes the likelihood that a certain event occurs
    • Experimental Probability: probability estimated from the results of a large number of trials
    • Theoretical Probability: probability predicted from a mathematical model
    • In general, experimental probability approaches theoretical probability as the number of trials increases
    • Discrete Sample Space: a sample space whose outcomes can be counted (e.g., the colour of a ball drawn from a bag)
    • Continuous Sample Space: a sample space of real numbers with infinitely many possible outcomes (e.g., time)
    • Event: a specific outcome, or set of outcomes, in the sample space

P(A) = n(A) / n(S)

The probability of A is the number of outcomes in which A occurs divided by the total number of possible outcomes, when all outcomes are equally likely

    • P(A’): the probability that event A will not occur
      • P(A’) = 1 – P(A)
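A minimal Python sketch contrasting theoretical and experimental probability for one roll of a fair die, with A = rolling a 5 or 6 (an example chosen for illustration). The simulation uses a fixed random seed so the run is repeatable; with many trials, the experimental value should land close to 1/3.

```python
import random
from fractions import Fraction

sample_space = [1, 2, 3, 4, 5, 6]            # one roll of a fair die
event = [x for x in sample_space if x >= 5]  # A: roll a 5 or 6

theoretical = Fraction(len(event), len(sample_space))  # n(A) / n(S) = 1/3
complement = 1 - theoretical                           # P(A') = 2/3

rng = random.Random(2024)  # fixed seed so the run is repeatable
trials = 100_000
hits = sum(1 for _ in range(trials) if rng.choice(sample_space) >= 5)
experimental = hits / trials

print(theoretical, complement, round(experimental, 3))
```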

 

  • Odds
    • Odds: a ratio used to represent a degree of confidence in whether or not an event will occur.
    • Odds In favour: P(A) : P(A’)
      • = n(A) : n(A’)
    • Odds Against: P(A’) : P(A)
      • = n(A’) : n(A)
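Odds can be computed from a probability with Python’s fractions module. The helpers are hypothetical, and they assume P(A) < 1 (odds in favour are undefined when P(A’) = 0).

```python
from fractions import Fraction

def odds_in_favour(p):
    # P(A) : P(A'), reduced to whole numbers
    ratio = Fraction(p) / (1 - Fraction(p))
    return ratio.numerator, ratio.denominator

def odds_against(p):
    # P(A') : P(A) is the in-favour ratio reversed
    favour, against = odds_in_favour(p)
    return against, favour

p_heart = Fraction(13, 52)      # drawing a heart from a standard deck
print(odds_in_favour(p_heart))  # (1, 3)
print(odds_against(p_heart))    # (3, 1)
```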

 

  • Probability using counting principles
    • Instead of listing out all possibilities, counting principles such as combinations and permutations can be used to count the total number of possible outcomes, n(S), and the number of outcomes in which the event occurs, n(A).
    • Refer to Units 3 and 4 for the counting principles
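For example, the probability that a 5-card hand from a standard 52-card deck contains exactly 2 aces can be found with combinations instead of a list of hands. A minimal Python sketch using math.comb:

```python
from fractions import Fraction
from math import comb

# n(A): choose 2 of the 4 aces and 3 of the 48 other cards
favourable = comb(4, 2) * comb(48, 3)
# n(S): all 5-card hands from 52 cards
total = comb(52, 5)

p = Fraction(favourable, total)
print(favourable, total, round(float(p), 4))  # 103776 2598960 0.0399
```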

 

  • Independent and Dependent Events
    • Two events are independent if the occurrence of one event has no effect on the probability of the other event.
    • If two events are independent, then P(A ∩ B) = P(A) × P(B)
    • In a tree diagram, the probabilities on the branches along a path are multiplied to find the probability of that path
      • P(A then A) = P(A) × P(A)
    • e.g., when drawing disks from a bag, if the first disk is replaced, the 2nd draw is an independent event
    • e.g., when drawing disks from a bag without replacing them, the 2nd draw is a dependent event
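A Python sketch of the disk example, assuming a hypothetical bag with 3 red and 2 blue disks and asking for the probability of drawing red twice. The tree-diagram products are checked against a brute-force count of every equally likely ordered pair of draws.

```python
from fractions import Fraction
from itertools import permutations, product

bag = ["R", "R", "R", "B", "B"]  # 3 red and 2 blue disks

# with replacement: independent, so multiply P(red) by itself
with_repl = Fraction(3, 5) * Fraction(3, 5)     # 9/25
# without replacement: the 2nd draw depends on the 1st
without_repl = Fraction(3, 5) * Fraction(2, 4)  # 3/10

def red_red(pairs):
    # fraction of ordered pairs of disk positions that are both red
    hits = sum(1 for i, j in pairs if bag[i] == "R" and bag[j] == "R")
    return Fraction(hits, len(pairs))

pairs_with = list(product(range(5), repeat=2))   # same disk may be drawn twice
pairs_without = list(permutations(range(5), 2))  # two different disks

print(with_repl == red_red(pairs_with), without_repl == red_red(pairs_without))  # True True
```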

 

  • Mutually Exclusive Events
    • Two events are mutually exclusive if when one event occurs, the other event cannot occur.
    • If two events are mutually exclusive, then P(A ∪ B) = P(A) + P(B)
    • If two events are not mutually exclusive, then P(A ∪ B) = P(A) + P(B) – P(A ∩ B)
      • e.g., drawing a king or a four (in one draw) are mutually exclusive events
      • e.g., drawing a king or a red card are non-mutually exclusive events, since the 2 red kings belong to both
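The card examples can be verified in Python by building a standard 52-card deck and counting; prob and the is_* helpers are hypothetical names for this sketch.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))  # 52 equally likely (rank, suit) cards

def prob(event):
    # P(event) = n(event) / n(S) for one card drawn from the deck
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

def is_king(card):
    return card[0] == "K"

def is_four(card):
    return card[0] == "4"

def is_red(card):
    return card[1] in ("hearts", "diamonds")

# mutually exclusive: no card is both a king and a four
king_or_four = prob(lambda c: is_king(c) or is_four(c))
assert king_or_four == prob(is_king) + prob(is_four)

# not mutually exclusive: subtract the red kings so they are not counted twice
king_or_red = prob(lambda c: is_king(c) or is_red(c))
king_and_red = prob(lambda c: is_king(c) and is_red(c))
assert king_or_red == prob(is_king) + prob(is_red) - king_and_red

print(king_or_four, king_or_red)  # 2/13 7/13
```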

 

  • Conditional Probability
    • The probability that an event will occur given that another event has already occurred.
    • P(A | B) = P(A ∩ B) / P(B)
      • The probability of A given B is equal to the probability of A and B over the probability of B.
    • e.g., the probability of drawing a queen given that the chosen card is a face card is a conditional probability.
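The queen example can be checked in Python by counting cards in a standard 52-card deck, where the face cards are the jacks, queens, and kings: P(queen | face card) = P(queen ∩ face card) / P(face card).

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))  # 52 (rank, suit) cards

face = [c for c in deck if c[0] in ("J", "Q", "K")]  # B: face card
queen_and_face = [c for c in face if c[0] == "Q"]    # A ∩ B

p_b = Fraction(len(face), len(deck))                  # 12/52
p_a_and_b = Fraction(len(queen_and_face), len(deck))  # 4/52
print(p_a_and_b / p_b)                                # 1/3
```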

 

Unit 6: Probability Distributions

  • Basic Probability Distributions
    • Random Variable: a variable X whose value is a numerical outcome of a random experiment (e.g., the number of times something happens)
    • A probability distribution lists each value of X with its probability; it can be shown in a table and graphed as a histogram to analyze the probability of each outcome
    • Expected Value: Expectation or expected value, E(X), is the predicted average of all possible outcomes of a probability experiment. In essence, it is a weighted mean of all the outcomes.

E(X) = Σ [x × P(x)]

    • x: a value of the random variable, P(x): the probability of that value
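A minimal Python sketch of the expected-value formula E(X) = Σ [x × P(x)] for one roll of a fair die (an example chosen for illustration), using exact fractions.

```python
from fractions import Fraction

# probability distribution of one roll of a fair die: value -> probability
distribution = {x: Fraction(1, 6) for x in range(1, 7)}
assert sum(distribution.values()) == 1  # probabilities must total 1

expected = sum(x * p for x, p in distribution.items())
print(expected)  # 7/2, i.e., 3.5
```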

 

  • Binomial Distribution
    • All trials are independent
    • Only 2 possible outcomes (success or failure)
    • Probability of success is the same on every trial
    • Usually sampling with replacement
    • Binomial Distribution Formula
    • P(x) = nCx × p^x × q^(n – x)
    • n: number of trials, p: probability of success on each trial, q = 1 – p: probability of failure, x: number of successes
    • Shortcut Expected Value Formula
    • E(X) = np
    • n: number of trials, p: probability of success on each trial
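A minimal Python sketch of the binomial formula P(x) = nCx × p^x × q^(n − x), with q = 1 − p; the values n = 10 and p = 0.3 are hypothetical. Computing E(X) from the definition Σ x × P(x) should agree with the shortcut np, up to floating-point rounding.

```python
from math import comb

def binomial_pmf(x, n, p):
    # P(x) = nCx * p^x * q^(n - x), with q = 1 - p
    q = 1 - p
    return comb(n, x) * p ** x * q ** (n - x)

n, p = 10, 0.3
dist = [binomial_pmf(x, n, p) for x in range(n + 1)]
mean = sum(x * px for x, px in enumerate(dist))  # E(X) from the definition
print(round(sum(dist), 10), round(mean, 10), n * p)
```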

 

  • Hypergeometric Distributions
    • Hypergeometric Distributions are used for sampling without replacement.
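    • The expected number of successes in the sample is proportional to the share of successes in the population
    • Each trial still has only 2 possible outcomes (success or failure)
    • The probability of success changes from trial to trial
    • Trials are dependent events
    • Items are not replaced
    • Formula for Hypergeometric Distributions
    • P(x) = aCx × (N – a)C(r – x) / NCr
    • N: population size, a: number of successes in the population, r: number of trials (sample size), x: number of successes in the sample
    • E(X) = ra / N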
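A minimal Python sketch of the standard hypergeometric formula P(x) = C(a, x) × C(N − a, r − x) / C(N, r), where N is the population size, a is the number of successes in the population, and r is the sample size (symbols chosen for this sketch). The example, counting hearts in 5 cards dealt from a 52-card deck, is hypothetical; its expected value should match ra / N.

```python
from math import comb

def hypergeometric_pmf(x, N, a, r):
    # P(x) = aCx * (N - a)C(r - x) / NCr
    return comb(a, x) * comb(N - a, r - x) / comb(N, r)

# 5 cards dealt from 52 without replacement, X = number of hearts
N, a, r = 52, 13, 5
dist = {x: hypergeometric_pmf(x, N, a, r) for x in range(min(a, r) + 1)}
mean = sum(x * p for x, p in dist.items())
print(round(mean, 6), r * a / N)  # 1.25 1.25
```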

 

Unit 7: Continuous Probability Distributions

  • Continuous Probability Distributions
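    • A continuous random variable can take any value in an interval of real numbers (e.g., a city’s temperature)
    • Probability Density Function: a function whose graph describes how likely the random variable is to take values near a given point; the probability that the variable falls in an interval is the area under the curve over that interval
    • Uniform distribution height formula: height = 1 / (b – a), where a is the lower bound and b is the upper bound of the range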
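A Python sketch for a uniform distribution on [a, b], assuming a hypothetical bus that is equally likely to arrive at any time in the next 20 minutes. Because the probability density function is a rectangle of height 1 / (b − a), the probability of an interval is its width divided by b − a.

```python
def uniform_height(a, b):
    # height of the probability density function of a uniform distribution on [a, b]
    return 1 / (b - a)

def uniform_prob(a, b, c, d):
    # P(c <= X <= d) = area of the rectangle = width * height
    c, d = max(a, c), min(b, d)  # clip the interval to [a, b]
    return max(0.0, d - c) / (b - a)

print(uniform_height(0, 20))       # 0.05
print(uniform_prob(0, 20, 5, 10))  # 0.25
```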

 

  • The Normal Distribution
    • used to calculate probabilities for many continuous random variables
    • symmetric about the mean
    • total area under the curve is 1
    • standard deviation is the distance from the mean to the point of inflection
    • Any normal distribution is completely described by its mean and variance, so we often write N(mean, variance) to describe a distribution
    • The standard normal distribution table gives the area under the curve to the left of a given z-score
    • A z-score gives the number of standard deviations that a value x lies from the mean
      • z = (x – mean) / standard deviation
      • Subtracting the mean shifts the distribution so that it is centred at 0; dividing by the standard deviation rescales it to a standard deviation of 1
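Python’s statistics.NormalDist (Python 3.8 or later) can stand in for a z-table. A minimal sketch, assuming hypothetical test marks that are normally distributed with mean 70 and standard deviation 8; note that NormalDist takes the standard deviation, not the variance used in the N(mean, variance) notation.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    # number of standard deviations x lies from the mean
    return (x - mean) / sd

mean, sd = 70, 8
z = z_score(82, mean, sd)            # 1.5
area_left = NormalDist().cdf(z)      # area to the left of z, as in a z-table
same = NormalDist(mean, sd).cdf(82)  # same answer without standardizing
print(z, round(area_left, 4), round(same, 4))  # 1.5 0.9332 0.9332
```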

 

  • Normal Approximation
    • Step 1: Check whether a normal approximation is appropriate: test whether np > 5 and nq > 5.
    • Step 2: Estimate the mean and standard deviation (mean = np, SD = √(npq)).
    • Step 3: Estimate the probability using the z-score method above.
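A minimal Python sketch of the three normal-approximation steps (check np > 5 and nq > 5, use mean np and standard deviation √(npq), then a z-score), comparing the estimate of P(X ≤ 55) for n = 100 and p = 0.5 (hypothetical values) with the exact binomial sum. The helper names are hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist

def normal_approx_at_most(k, n, p):
    q = 1 - p
    # Step 1: check that the approximation is appropriate
    if not (n * p > 5 and n * q > 5):
        raise ValueError("np and nq must both exceed 5")
    # Step 2: mean and standard deviation of the binomial
    mean, sd = n * p, sqrt(n * p * q)
    # Step 3: z-score method
    z = (k - mean) / sd
    return NormalDist().cdf(z)

def exact_at_most(k, n, p):
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(k + 1))

n, p = 100, 0.5
print(round(normal_approx_at_most(55, n, p), 4), round(exact_at_most(55, n, p), 4))
```

Many textbooks also apply a continuity correction (using 55.5 instead of 55), which brings the estimate much closer to the exact value here; this sketch follows the steps without it.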

 

  • Confidence Interval
    • x̄ – z(σ / √n) < μ < x̄ + z(σ / √n)
      • where x̄ is the mean of the sample
      • z is the z-score for the chosen confidence level
      • μ is the mean of the population
      • n is the size of the sample
      • σ is the standard deviation of the population
    • Confidence levels and their z-scores are read from a z-score chart; common values are 90%: 1.645, 95%: 1.96, 99%: 2.576
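A minimal Python sketch of the confidence-interval formula x̄ − z(σ/√n) < μ < x̄ + z(σ/√n). Instead of a chart, it finds z from the standard normal distribution with statistics.NormalDist().inv_cdf. The sample values are hypothetical, and σ is assumed to be known.

```python
from math import sqrt
from statistics import NormalDist

def confidence_interval(sample_mean, sigma, n, level):
    # z-score that leaves (1 - level) / 2 in each tail, about 1.96 for 95%
    z = NormalDist().inv_cdf((1 + level) / 2)
    margin = z * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin

# hypothetical sample: mean 50, sigma 10, n = 100
low, high = confidence_interval(50, 10, 100, 0.95)
print(round(low, 2), round(high, 2))  # 48.04 51.96
```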