Grade 12 – Data Management

**Exam**

**Unit 1: One Variable Analysis**

**Types of Data**

**Numerical Data**

**Discrete:** consists of whole numbers
- Ie. Number of trucks.

**Continuous:** measured using real numbers
- Ie. Measuring temperature.

**Categorical Data:** cannot be quantitatively measured

**Nominal:** data for which any order of presentation makes sense
- Ie. Eye colour, hair colour.

**Ordinal:** data that is better if sorted or ordered
- Ie. Date and time, rating-scale options.

**Collecting Data**

**Primary:** collected by yourself

**Secondary:** collected by someone else

**Organizing Data**

**Micro Data:** information about an individual

**Aggregate Data:** data about a group; summarized data.

**Data collection**

**Observational Data:** group people by an existing characteristic, then observe them
- Ie. Group people into adults and children, then look at sunlight’s effect on them

**Experimental Data:** create groups and impose some treatment on them
- Ie. Create experimental groups, then apply drug or placebo treatments to them.

**Other Terms**

**Population:** the entire group of people being studied

**Sample:** the part of the population being studied

**Inference:** a conclusion made about the population based on the sample

**Binary Data:** only 2 choices/outcomes

**Non-Binary:** more than 2 outcomes

**Sampling Techniques**

**Characteristics of a good sample**

- Each person must have an equal chance to be in the sample

- The sample must be large enough to represent the population

**Simple Random:** each member has an equal chance of being selected
- Ie. Randomly picking members from the apartments

**Sequential Random:** go through the population sequentially and select members at a regular interval
- Ie. Selecting every 5th person

**Stratified Sampling:** a stratum is a group of people that share common characteristics
- Constrains the proportion of members from each stratum in the sample to match the population
- Ie. Each stratum is represented based on its proportion in the population

**Cluster Sampling:** a random sample of one representative group
- Ie. Picking 1 floor of people and surveying them

**Multi-Stage Sampling:** several levels of sampling
- Ie. Randomly selecting provinces, then random cities, then random people.

**Voluntary Response Samples:** invite members of the entire population to participate in the survey
- Ie. Sending the survey to everyone in the hotel

**Convenience Sample:** easily accessible members are selected
- Ie. Asking the people who walk closest to you at the mall

**Types of Bias**

- Good survey questions are simple, specific, ethical, free of bias, and respect privacy
- Survey questions should avoid jargon, abbreviations, negatives, leading questions, and insensitivity

**Sampling Bias:** occurs when the chosen sample doesn’t reflect the population
- Ie. Asking only basketball players about math issues

**Non-Response Bias:** occurs when particular groups are under-represented in a survey because they chose not to participate
- Ie. When people don’t respond, the surveyor is left to guess at their views

**Measurement Bias:** occurs when the data collection method consistently under- or overestimates a characteristic of the population
- Leading questions also cause over- or underestimation
- Ie. A police radar gun measuring the average speed on a road

**Response Bias:** occurs when participants in a survey give false or misleading answers
- Question quality might lead to response bias
- Ie. A teacher asks the class to raise their hands if they have completed their homework

**Unit 2: Two Variable Analysis**

**Correlation**

**Scatter Plots:** graph data and are used to determine whether there is a relationship between the 2 variables

**Linear Correlation:** changes in one variable tend to be proportional to changes in the other variable
- The stronger the correlation, the more closely the data points cluster around the line of best fit.

**Correlation Coefficient (r):** a value between -1 and 1 that provides a measure of how closely data points cluster around the line of best fit.
- **-1 to -0.62:** negative, strong correlation
- **-0.61 to -0.33:** negative, moderate correlation
- **-0.32 to 0:** negative, weak correlation
- **0 to 0.32:** positive, weak correlation
- **0.33 to 0.61:** positive, moderate correlation
- **0.62 to 1:** positive, strong correlation

**Regression:** finding a relationship that models the 2 variables
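As a rough illustration of how r measures clustering around a line, the sketch below computes the correlation coefficient by hand using the usual sum-of-products form. The function name `pearson_r` and the hours/marks data are made up for this example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Return the linear correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: hours studied vs. test mark
hours = [1, 2, 3, 4, 5]
marks = [52, 60, 61, 70, 78]

r = pearson_r(hours, marks)
print(round(r, 3))  # close to 1, so a strong positive correlation
```

Data that lie exactly on a rising line give r = 1, and data exactly on a falling line give r = -1.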

**Generating lines of best fit and Outliers**

**TI-83 Graphing Calculator:**
- Turn diagnostics on (2nd, 0, DiagnosticOn, Enter)
- Enter data (STAT, 1:Edit)
- Graph data (2nd, Y=, turn Plot 1 on, ZOOM, 9:ZoomStat)
- Equation of line of best fit (STAT, CALC, 4:LinReg(ax+b), VARS, Y-VARS, 1:Function, 1:Y1)

**Microsoft Excel**
- Enter data
- Highlight data and construct scatterplot (Insert, Charts, Scatter)
- Equation for line of best fit (Chart Tools, Layout, Trendline)

**Fathom**
- Enter data (Copy/Type/Open)
- Construct scatterplot (drag variables to axes)
- Add “Movable Lines”
- Equation for line of best fit (Graph, Least-Squares Line)
- Show squares and the residual plot to identify outliers
- Determine the value of the correlation coefficient

**Cause and Effect**

**Cause and Effect**
- A change in X causes a change in Y
- Ie. Time and tree trunk diameter

**Common Cause**
- An external factor causes two variables to change in the same way
- Ie. A correlation between ski sales and video rentals, where both are caused by colder weather

**Reverse Cause and Effect**
- The dependent and independent variables are reversed in ascertaining which caused which.
- Ie. A correlation between coffee consumption and anxiety is theorized to mean that drinking coffee causes anxiety, but it is found that anxious people drink more coffee

**Accidental Relationships**
- A correlation without any causal relationship between the variables
- Ie. An increase in SUV sales coincides with an increase in the chipmunk population

**Presumed Relationship**
- A correlation that does not seem to be accidental even though no cause-and-effect or common cause relationship is apparent
- Ie. A correlation between a person’s level of fitness and the number of action movies they watch.

**Critically Thinking about Data**

- When analyzing data, we should ask:

**Source:** How reliable/current is the source?

**Sample:** Does the sample reflect the opinions in the population?
- Was the sampling technique free from bias?

**Graph:** Is the graph accurately portrayed? (Axis starting at zero)

**Correlation:** Is the correlation between the variables strong enough to make inferences?
- Is causation assumed just because there is a correlation?
- Are there extraneous variables impacting the results?

**Number Manipulation**

**Percentage Points:** the difference between two percentages, found by subtracting them; dividing that difference by the original value gives the relative (percent) change
- Ie. 3 percentage points up from 75% is 78%, which is a relative increase of 3/75 × 100 = 4%

**Making Numbers Larger:** to make numbers easier to grasp, people sometimes restate them on a smaller scale, which can also make them seem bigger
- Ie. 2,000,000 iPads sold in the first 3 months can be restated as “about 1 iPad sold every 4 seconds” to sound larger.

**Unit 3: Permutations**

**Multiplicative Principles**
- If one operation can be performed in K1 ways, and for each of these a second operation can be performed in K2 ways, and for each of these a third operation can be performed in K3 ways, and so on,
- then all of these operations together can be performed in K1 × K2 × K3 × … ways

**Additive Principles**
- If two actions are mutually exclusive, and the first can occur in K1 ways and the second can occur in K2 ways, then there are K1 + K2 ways in which one or the other of these actions can occur.
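Both counting principles can be checked on a small case by listing the outcomes. The sketch below uses made-up clothing choices; the item names and counts are assumptions for illustration only.

```python
from itertools import product

# Hypothetical choices
shirts = ["red", "blue", "green"]   # K1 = 3 ways
pants = ["jeans", "khakis"]         # K2 = 2 ways

# Multiplicative principle: choose a shirt AND a pair of pants
outfits = list(product(shirts, pants))
print(len(outfits), len(shirts) * len(pants))

# Additive principle: choose ONE item, either a shirt OR a pair of pants
# (the two actions are mutually exclusive, so the counts are added)
single_items = shirts + pants
print(len(single_items), len(shirts) + len(pants))
```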

**Methods**
- If a set of operations can be used to determine a result directly, it is called the **direct method**
- However, if the result is difficult to determine directly, an **indirect method** may be used: count all possibilities, then subtract the ones that must be eliminated

**Factorial Notation**
- For the following: r < n
- n! = n(n-1)(n-2)(n-3)(n-4)… (n-r+1)(n-r)!, where n belongs to the natural numbers
- n!/(n-r)! = n(n-1)(n-2)(n-3)(n-4)… (n-r+1), because the (n-r)! factors cancel
- Ie. 6! = 6 × 5 × 4 × 3 × 2 × 1 = 720
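A quick check of the factorial identities, as a sketch using Python's standard `math` module. The values n = 6 and r = 2 are arbitrary. The quotient n!/(n-r)! counts ordered selections of r objects from n, which is what `math.perm` computes.

```python
from math import factorial, perm

n, r = 6, 2

n_factorial = factorial(n)                   # 6 * 5 * 4 * 3 * 2 * 1
ordered = factorial(n) // factorial(n - r)   # the (n-r)! factors cancel, leaving 6 * 5

print(n_factorial)
print(ordered, perm(n, r))                   # both count ordered selections of r from n
```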

**Permutations with some elements alike**
- In general, the number of different arrangements of n objects, with k1 alike of one kind and k2 alike of another kind, is:

n! / (k1! × k2!)

- Ie. In the word “COOL”, the two O’s are alike, so the number of permutations is:

4! / 2! = 12
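The “COOL” result can be confirmed by brute force. In this sketch the helper name `distinct_arrangements` is made up; it applies the formula with one factorial in the denominator for each group of identical letters, and the result is compared against a direct listing.

```python
from collections import Counter
from itertools import permutations
from math import factorial

def distinct_arrangements(word):
    """n! divided by k! for each group of k identical letters."""
    total = factorial(len(word))
    for count in Counter(word).values():
        total //= factorial(count)
    return total

word = "COOL"
by_formula = distinct_arrangements(word)
# Listing every ordering and discarding duplicates gives the same count
by_listing = len(set(permutations(word)))

print(by_formula, by_listing)
```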

**Unit 4: Combinations**

**Venn Diagrams**

**Venn Diagrams:** a number of overlapping circles, each representing its own property. Overlapping areas show values which share both properties. The centre, where all circles overlap, shows values which share all properties
- Venn diagrams placed in a rectangle use “S” to denote the universal set

**Operations on Venn Diagrams**

**n(A):** number of values with property A

**n(A ∪ B):** number of values with property A or B (Union)

**n(A ∩ B):** number of values with both property A and property B (Intersection)

**Principle of Inclusion and Exclusion**

**n(A ∪ B) = n(A) + n(B) – n(A ∩ B)**

**n(B ∪ C ∪ P) = n(B) + n(C) + n(P) – n(B ∩ C) – n(B ∩ P) – n(C ∩ P) + n(B ∩ C ∩ P)**
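The principle of inclusion and exclusion can be checked with Python sets, where `|` is union and `&` is intersection. The club membership lists below are invented for illustration.

```python
# Hypothetical club membership lists
band = {"Ana", "Ben", "Cy", "Di"}
chess = {"Ben", "Di", "Eli"}
photo = {"Cy", "Di", "Eli", "Fay"}

# Two sets: add the sizes, then subtract the overlap that was counted twice
two_sets = len(band) + len(chess) - len(band & chess)
print(two_sets, len(band | chess))

# Three sets: subtract each pairwise overlap, then add back the triple overlap
three_sets = (len(band) + len(chess) + len(photo)
              - len(band & chess) - len(band & photo) - len(chess & photo)
              + len(band & chess & photo))
print(three_sets, len(band | chess | photo))
```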

**Combinations**

**Combination:** a combination of **n** distinct objects taken **r** at a time is a selection of r of the n objects without regard to order.
- Denoted as: **C(n,r)** or **(n r)** or **nCr** or **“n choose r”**

**C(n,r) = n! / [(n-r)! × r!]**

where n, r ∈ W (whole numbers), n ≥ r
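As a small sketch, the formula can be compared with Python's built-in `math.comb` and with a direct listing of selections. The helper name `choose` and the five labelled objects are assumptions for this example.

```python
from itertools import combinations
from math import comb, factorial

def choose(n, r):
    """C(n, r) = n! / [(n - r)! * r!]"""
    return factorial(n) // (factorial(n - r) * factorial(r))

n, r = 5, 3
# Every selection of 3 of the 5 objects, ignoring order
selections = list(combinations("ABCDE", r))

print(choose(n, r), comb(n, r), len(selections))
```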

- If some elements are alike and at least one item is to be chosen, then the total number of selections from P alike items, Q alike items, R alike items and so on is:

**(P+1)(Q+1)(R+1)… – 1**

- 1 is added to each of P, Q, and R for the possibility that none of that kind is chosen
- 1 is subtracted for the single case where nothing at all is chosen

**Properties of Pascal’s Triangle**
- It is symmetrical
- Potentially infinite in size
- Each number is the sum of the 2 numbers above it to the left and right
- Combinations in the form **C(row number, element number)** also form Pascal’s Triangle

**Pascal’s Identity: C(n, r) = C(n-1, r-1) + C(n-1, r)**

**Row n: nC0, nC1, nC2, nC3, nC4, … nCn**

**Sum of nth row: 2^n**
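The properties above can be verified for the first few rows. This is a minimal sketch that builds each row from combinations; the helper name `pascal_row` is made up, and rows are numbered from 0.

```python
from math import comb

def pascal_row(n):
    """Row n of Pascal's Triangle: nC0, nC1, ..., nCn."""
    return [comb(n, r) for r in range(n + 1)]

for n in range(6):
    row = pascal_row(n)
    print(row)
    assert row == row[::-1]     # symmetrical
    assert sum(row) == 2 ** n   # sum of row n

# Pascal's Identity: each interior entry is the sum of the two entries above it
for n in range(2, 6):
    for r in range(1, n):
        assert comb(n, r) == comb(n - 1, r - 1) + comb(n - 1, r)
```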

**Unit 5: Probability**

**Experimental and Theoretical Probability**

**Probability:** the value between 0 and 1 that describes the likelihood of an occurrence of a certain event.

**Experimental Probability:** making predictions based on a large number of previous results.

**Theoretical Probability:** making predictions based on a mathematical model.
- In general, experimental probability will approach theoretical probability as the number of trials increases.

**Discrete Sample Space:** a sample space where you can count the number of outcomes, ie. the number of blue balls

**Continuous Sample Space:** real-number values with infinitely many possibilities, ie. time.

**Event:** the occurrence of a specific outcome in the sample space.

P(A) = n(A) / n(S)

The probability of A is the number of outcomes for A over the total number of possible outcomes

**P(A’):** the probability that event A will not occur.
- P(A’) = 1 – P(A)

**Odds**

**Odds:** a ratio used to represent a degree of confidence in whether or not an event will occur.

**Odds in favour:** P(A) : P(A’) **= n(A) : n(A’)**

**Odds against:** P(A’) : P(A) **= n(A’) : n(A)**
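To illustrate converting a probability into odds, here is a sketch using exact fractions. The helper name `odds_in_favour` and the card example are assumptions. The function expects a `Fraction` strictly between 0 and 1.

```python
from fractions import Fraction

def odds_in_favour(p):
    """Return (a, b) so that the odds in favour are a : b, where P(A) = p."""
    ratio = p / (1 - p)   # P(A) : P(A') as a reduced fraction
    return ratio.numerator, ratio.denominator

# Hypothetical example: drawing an ace from a standard 52-card deck
p_ace = Fraction(4, 52)
in_favour = odds_in_favour(p_ace)
against = in_favour[::-1]   # odds against simply reverse the ratio

print(in_favour, against)
```

Here n(A) : n(A’) is 4 : 48, which reduces to the same ratio.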

**Probability using counting principles**
- Instead of listing out all possibilities, counting principles such as combinations and permutations can be used to count both the total number of outcomes and the number of outcomes in which the event occurs.
- Refer to Units 3 and 4 for information about counting principles

**Independent and Dependent Events**
- Two events are independent if the occurrence of one event has no effect on the occurrence of the other event.
- If two events are **independent, then P(A ∩ B) = P(A) × P(B)**
- On a tree diagram, the probabilities along the branches can be multiplied
- P(AA) = P(A) × P(A)
- Ie. When drawing disks from a bag, if the disks are replaced, the 2nd draw will be an independent event.
- Ie. When drawing disks from a bag, if the disks are not replaced, the 2nd draw will be a dependent event.
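The difference between the two disk examples shows up directly in the arithmetic. The sketch below assumes a hypothetical bag of 3 red and 2 blue disks and finds the probability of drawing two red disks in a row.

```python
from fractions import Fraction

# Hypothetical bag contents
red, blue = 3, 2
total = red + blue

# With replacement: the 2nd draw is independent, so P(red, red) = P(red) * P(red)
with_replacement = Fraction(red, total) * Fraction(red, total)

# Without replacement: the 2nd draw depends on the 1st,
# because one red disk and one disk overall are now missing
without_replacement = Fraction(red, total) * Fraction(red - 1, total - 1)

print(with_replacement, without_replacement)
```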

**Mutually Exclusive Events**
- Two events are mutually exclusive if when one event occurs, the other event cannot occur.
- If two events are **mutually exclusive, then P(A ∪ B) = P(A) + P(B)**
- If two events are **not mutually exclusive, then P(A ∪ B) = P(A) + P(B) – P(A ∩ B)**
- Ie. Picking a KING and picking a FOUR are mutually exclusive events.
- Ie. Picking a KING and picking a RED card are not mutually exclusive events.
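Both card examples can be worked out by listing a standard 52-card deck. In this sketch the deck representation and the helper name `prob` are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = set(product(ranks, suits))

def prob(event):
    """P(A) = n(A) / n(S) for an event given as a set of cards."""
    return Fraction(len(event), len(deck))

kings = {card for card in deck if card[0] == "K"}
fours = {card for card in deck if card[0] == "4"}
reds = {card for card in deck if card[1] in ("hearts", "diamonds")}

# KING or FOUR: no card is both, so the probabilities simply add
p_king_or_four = prob(kings) + prob(fours)

# KING or RED: the two red kings would be counted twice, so subtract the overlap
p_king_or_red = prob(kings) + prob(reds) - prob(kings & reds)

print(p_king_or_four, p_king_or_red)
```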

**Conditional Probability**
- The probability that an event will occur given that another event has already occurred.
- P(A | B) = P(A and B) / P(B)
- The probability of A given the occurrence of B is equal to the probability of A and B over the probability that B has occurred.
- Ie. The probability of drawing a QUEEN if we know the chosen card is a face card is an example of conditional probability.
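The QUEEN example can be computed from the formula by counting cards. This sketch assumes a standard 52-card deck in which the face cards are J, Q, and K; the variable names are made up.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))

face_cards = [card for card in deck if card[0] in ("J", "Q", "K")]
queens_that_are_face = [card for card in face_cards if card[0] == "Q"]

p_face = Fraction(len(face_cards), len(deck))                      # P(B)
p_queen_and_face = Fraction(len(queens_that_are_face), len(deck))  # P(A and B)

# P(A | B) = P(A and B) / P(B)
p_queen_given_face = p_queen_and_face / p_face
print(p_queen_given_face)
```

Knowing that the card is a face card shrinks the sample space from 52 cards to 12, of which 4 are queens.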

**Unit 6: Probability Distributions**

**Basic Probability Distributions**

**Random Variable:** letting X be a random variable lets us generalize the probability of obtaining each possible number of times something happens
- Probability distributions can be laid out in a table and then graphed as a histogram to analyze the probability of each event happening

**Expected Value:** Expectation or expected value, E(X), is the predicted average of all possible outcomes of a probability experiment. In essence, it is a weighted mean of all the outcomes.

**E(X) = Σ x × P(x)**

- x: a value of the random variable, P(x): the probability of that value
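As a small worked example of the weighted mean, the sketch below stores a probability distribution as a table (a dictionary) and sums x × P(x). The fair six-sided die is an assumed example.

```python
from fractions import Fraction

# Probability distribution for one roll of a fair die: value -> probability
distribution = {x: Fraction(1, 6) for x in range(1, 7)}

# The probabilities in any distribution must total 1
assert sum(distribution.values()) == 1

# E(X) = sum of x * P(x) over every value of the random variable
expected_value = sum(x * p for x, p in distribution.items())
print(expected_value)   # Fraction(7, 2), i.e. 3.5
```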

**Binomial Distribution**
- All trials are independent
- Only 2 possible outcomes (success or failure)
- Probability of success is the same on every trial
- Usually items are replaced after each trial
- Binomial Distribution Formula: **P(x) = nCx × p^x × q^(n-x)**
- n: number of trials, p: probability of success, q: 1 – p, x: value of the random variable (number of successes)
- Shortcut Expected Value Formula: **E(X) = np**
- n: number of trials, p: probability of success
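The binomial formula and the shortcut E(X) = np can be compared numerically. In this sketch the helper name `binomial_pmf` and the coin-flip values (n = 10, p = 0.5) are assumptions.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(x) = nCx * p^x * q^(n - x), where q = 1 - p."""
    q = 1 - p
    return comb(n, x) * p ** x * q ** (n - x)

# Hypothetical experiment: number of heads in 10 flips of a fair coin
n, p = 10, 0.5
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]

# The weighted mean of the whole distribution agrees with the shortcut n * p
expected = sum(x * px for x, px in enumerate(probs))
print(round(probs[5], 4), round(expected, 4), n * p)
```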

**Hypergeometric Distributions**

**Hypergeometric Distributions** are used for sampling without replacement.
- The expected value for the sample should be proportional to the population
- There are still only 2 possible outcomes (success or failure)
- Probabilities are not the same on each trial
- Dependent events
- Items are not replaced
- Formula for Hypergeometric Distributions
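The usual textbook form of the hypergeometric formula, with symbol names chosen here as an assumption, is P(x) = C(a, x) × C(N – a, r – x) / C(N, r), where N is the population size, a is the number of successes in the population, and r is the number of items drawn. The sketch below applies it to a made-up card example and checks that the expected value is proportional to the population, E(X) = r × a / N.

```python
from fractions import Fraction
from math import comb

def hypergeometric_pmf(x, population, successes, draws):
    """P(x) = C(a, x) * C(N - a, r - x) / C(N, r), with N, a, r as named in the lead-in."""
    ways = comb(successes, x) * comb(population - successes, draws - x)
    return Fraction(ways, comb(population, draws))

# Hypothetical example: number of hearts in a 5-card hand from a 52-card deck
population, successes, draws = 52, 13, 5
probs = [hypergeometric_pmf(x, population, successes, draws)
         for x in range(draws + 1)]

expected = sum(x * p for x, p in enumerate(probs))
print(expected, Fraction(draws * successes, population))
```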

**Unit 7: Continuous Probability Distributions**

**Continuous Probability Distributions**
- Describe a random variable that can assume any value in a continuous range (ie. city temperature)

**Probability Density Function:** a function that describes how likely the random variable is to occur near a given point; probabilities are found as areas under it.

**Height formula:** for a uniform distribution, height = 1/(b-a), where b is the top of the given range and a is the bottom.

**The Normal Distribution**
- Used to solve continuous probabilities
- Symmetrical about the mean
- Total area under the curve is 1
- The standard deviation is the distance from the mean to the point of inflection
- Any normal distribution can be described by its mean and variance, so we often write N(mean, variance) to describe a distribution
- The distribution chart shows the area under the graph from the X value to the left end

**Z-Scores** can be calculated using normal distributions
- Z = (x – mean) / standard deviation
- Subtracting the mean shifts the distribution so that the mean is at the centre (z = 0).
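A z-score and the corresponding area to the left can be computed with `statistics.NormalDist` from Python's standard library instead of a printed chart. The mean, standard deviation, and mark below are made-up values.

```python
from statistics import NormalDist

# Hypothetical test marks: mean 70, standard deviation 8
mean, sd = 70, 8
x = 82

# Z = (x - mean) / standard deviation
z = (x - mean) / sd

# Area under the standard normal curve to the left of z
area_left = NormalDist().cdf(z)

# The same area, taken directly from the original (unstandardized) distribution
same_area = NormalDist(mean, sd).cdf(x)

print(z, round(area_left, 4), round(same_area, 4))
```

Note that `NormalDist` takes the standard deviation as its second argument, not the variance used in the N(mean, variance) notation.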

**Normal Approximation**
- Step 1: Check if a normal approximation is appropriate. Test if np > 5 and nq > 5.
- Step 2: Estimate the mean and standard deviation (mean = np, SD = √(npq))
- Step 3: Estimate the probability using the z-score method from above.
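The three steps can be sketched as follows, with the exact binomial probability computed alongside for comparison. The values n = 50, p = 0.4, and k = 22 are arbitrary. The 0.5 continuity correction is a common refinement that is assumed here; it is not stated in these notes.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 50, 0.4
q = 1 - p

# Step 1: check that a normal approximation is appropriate
appropriate = n * p > 5 and n * q > 5

# Step 2: estimate the mean and standard deviation
mean = n * p
sd = sqrt(n * p * q)

# Step 3: estimate P(X <= k) with a z-score
k = 22
z = (k + 0.5 - mean) / sd   # + 0.5 is the assumed continuity correction
approx = NormalDist().cdf(z)

# Exact binomial probability, for comparison
exact = sum(comb(n, x) * p ** x * q ** (n - x) for x in range(k + 1))

print(appropriate, round(approx, 4), round(exact, 4))
```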