MDM4U – Grade 12 Data Management – Analysis of 2 Variable Data Test


Correlation

• Scatter plots graph paired data and are used to determine whether there is a relationship between the 2 variables
• Linear Correlation: changes in one variable tend to be proportional to changes in the other variable
• The stronger the correlation, the more closely the data points cluster around the line of best fit.
• Correlation Coefficient (r): a value between -1 and 1 that measures how closely the data points cluster around the line of best fit.
• -1 to -0.62: strong negative correlation
• -0.61 to -0.33: moderate negative correlation
• -0.32 to 0: weak negative correlation
• 0 to 0.32: weak positive correlation
• 0.33 to 0.61: moderate positive correlation
• 0.62 to 1: strong positive correlation
• Regression: finding a mathematical model (such as a line of best fit) that describes the relationship between the 2 variables
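As a rough sketch of how r is computed (not taken from the course materials), the Python below uses the standard Pearson formula r = Sxy / sqrt(Sxx × Syy) and then labels the result with the cut-offs listed above. The function names correlation_coefficient and describe are hypothetical, and values that fall in the small gaps between the listed ranges (e.g., -0.615) are assumed to belong to the weaker category.

```python
import math

def correlation_coefficient(xs, ys):
    """Pearson correlation coefficient r for paired data (xs[i], ys[i])."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Cross-product of deviations and the sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

def describe(r):
    """Label r using the cut-offs in these notes (gap values go to the weaker label)."""
    size = abs(r)
    if size >= 0.62:
        strength = "strong"
    elif size >= 0.33:
        strength = "moderate"
    else:
        strength = "weak"
    direction = "negative" if r < 0 else "positive"
    return f"{strength} {direction}"

r = correlation_coefficient([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r, 3), describe(r))   # 0.775 strong positive
```

Here Sxy = 6, Sxx = 10 and Syy = 6, so r = 6/√60 ≈ 0.775, which falls in the strong positive range.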

Generating Lines of Best Fit and Identifying Outliers

• TI-83 Graphing Calculator:
• Turn diagnostics on so that r is displayed (2nd, 0 for CATALOG, DiagnosticOn, Enter, Enter)
• Enter data in L1 and L2 (STAT, 1:Edit)
• Graph data (2nd, Y= for STAT PLOT, turn Plot1 on, ZOOM, 9:ZoomStat)
• Equation of the line of best fit (STAT, CALC, 4:LinReg(ax+b), VARS, Y-VARS, 1:Function, 1:Y1, Enter)
• Microsoft Excel
• Enter data
• Highlight data and construct scatterplot (Insert, Charts, Scatter)
• Equation for the line of best fit (Chart Tools, Layout, Trendline, then choose to display the equation on the chart)
• Fathom
• Enter data (Copy/Type/Open)
• Construct scatterplot (drag variables to axes)
• Equation for the line of best fit (Graph, Least-Squares Line)
• Use Show Squares and a residual plot to identify outliers
• Determine the value of the correlation coefficient
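The least squares line and residuals that these tools report can be sketched in a few lines of Python. This is an illustrative sketch only: least_squares_line and flag_outliers are hypothetical names, the data is made up, and the rule that flags a point whose residual exceeds 2 residual standard deviations is an assumed rule of thumb, not part of the TI-83, Excel, or Fathom procedures above.

```python
import math

def least_squares_line(xs, ys):
    """Return slope a and intercept b of the least squares line y = ax + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx              # slope that minimizes the sum of squared residuals
    b = mean_y - a * mean_x    # the line passes through (mean_x, mean_y)
    return a, b

def flag_outliers(xs, ys, k=2.0):
    """Return points whose residual is more than k residual standard deviations.

    The k = 2 cut-off is an assumed rule of thumb.
    """
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need two equal-length lists with at least 3 points")
    a, b = least_squares_line(xs, ys)
    # Residual = observed y minus predicted y (what a residual plot shows)
    residuals = [y - (a * x + b) for x, y in zip(xs, ys)]
    # Standard deviation of the residuals, using n - 2 degrees of freedom
    s = math.sqrt(sum(e * e for e in residuals) / (len(xs) - 2))
    return [(x, y) for x, y, e in zip(xs, ys, residuals) if abs(e) > k * s]

# Hypothetical data: y = 2x + 1 everywhere except at x = 5
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [3, 5, 7, 9, 30, 13, 15, 17]
print(least_squares_line(xs, ys))
print(flag_outliers(xs, ys))   # [(5, 30)]
```

In this made-up data set the point (5, 30) has a residual of about 16.5, above the cut-off of about 14.5, so it is the only point flagged. Note that a single outlier also pulls the line toward itself, which is why the other residuals are not exactly zero.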

Cause and Effect

• Cause and Effect
• A change in X causes a change in Y
• E.g., time and tree trunk diameter
• Common Cause
• An external factor causes two variables to change in the same way
• E.g., a correlation between ski sales and video rentals
• Both are caused by colder weather
• Reverse Cause and Effect
• The dependent and independent variables are reversed when determining which causes which.
• E.g., a correlation between coffee consumption and anxiety might suggest that drinking coffee causes anxiety, when it is found that anxious people tend to drink more coffee
• Accidental Relationships
• A correlation without any causal relationship between the variables
• E.g., an increase in SUV sales coinciding with an increase in the chipmunk population
• Presumed Relationship
• A correlation that does not seem to be accidental even though no cause-and-effect or common cause relationship is apparent
• E.g., a correlation between a person’s level of fitness and the number of action movies they watch
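The common-cause idea can be illustrated with a small simulation, sketched below under stated assumptions: the daily temperatures, sales figures, and noise levels are entirely made up, and neither ski_sales nor video_rentals is computed from the other. Both depend only on temperature, yet the two series end up strongly correlated.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(42)   # fixed seed so the run is repeatable

# Hypothetical daily temperatures in degrees Celsius (the common cause)
temps = [rng.uniform(-20, 20) for _ in range(200)]

# Each quantity rises as temperature falls, plus random noise;
# neither list is computed from the other
ski_sales = [100 - 3 * t + rng.gauss(0, 10) for t in temps]
video_rentals = [50 - 2 * t + rng.gauss(0, 10) for t in temps]

r = pearson_r(ski_sales, video_rentals)
print(f"r between ski sales and video rentals: {r:.2f}")
```

Because the correlation comes entirely from the shared dependence on temperature, a strong r here says nothing about ski sales causing video rentals or the reverse.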

• When analyzing data, we should ask:
• Source: How reliable/current is the source?
• Sample: Does the sample reflect the opinions of the population?
• Was the sampling technique free from bias?
• Graph: Is the data accurately portrayed? (e.g., does the vertical axis start at zero?)
• Correlation: Is the correlation between the variables strong enough to make inferences?
• Is causation being assumed just because there is a correlation?
• Are there extraneous variables impacting the results?

Number Manipulation

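• Percentage Points: a change in percentage points is the absolute difference between two percentages; the relative (percent) change divides that difference by the starting value
• E.g., 3 percentage points up from 75% is 75% + 3 = 78%, which is a relative increase of (3/75)*100 = 4%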
• Making Numbers Seem Larger: restating a total as a rate over a smaller unit (such as per minute instead of per quarter) can make it sound more impressive
• E.g., 2,000,000 iPads sold in the first 3 months (about 90 days) can be restated as “about 15 iPads sold every minute” to sound larger.
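The arithmetic behind both examples above can be checked with a short Python sketch. The helper names are hypothetical, and treating 3 months as 90 days is an assumption made only for the rate calculation.

```python
def percentage_point_change(old_pct, new_pct):
    """Absolute difference between two percentages, in percentage points."""
    return new_pct - old_pct

def relative_percent_change(old_pct, new_pct):
    """Change relative to the starting value, expressed as a percent."""
    return (new_pct - old_pct) * 100 / old_pct

def per_minute(total, days):
    """Average count per minute when `total` items are spread over `days` days."""
    return total / (days * 24 * 60)

print(percentage_point_change(75, 78))       # 3
print(relative_percent_change(75, 78))       # 4.0
print(round(per_minute(2_000_000, 90), 1))   # 15.4
```

Dividing the same total by 60 again gives about 0.26 iPads per second, so the choice of time unit changes how large the figure sounds without changing the underlying sales.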