MDM4U – Grade 12 Data Management – Analysis of 2 Variable Data Test

Correlation

  • Scatter Plots: graph the data and are used to determine whether there is a relationship between the 2 variables
  • Linear Correlation: changes in one variable tend to be proportional to changes in the other variable
    • The stronger the correlation, the more closely the data points cluster around the line of best fit.
    • Correlation Coefficient ( r ): a value between -1 and 1 that provides a measure of how closely data points cluster around the line of best fit.
      • -1 –  -0.62: negative, strong correlation
      • -0.61 –  -0.33: negative, moderate correlation
      • -0.32 –  0: negative, weak correlation
      • 0 – 0.32: positive, weak correlation
      • 0.33 – 0.61: positive, moderate correlation
      • 0.62 – 1: positive, strong correlation
  • Regression: finding a relationship that models the 2 variables
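
As a rough illustration of the correlation coefficient and the strength ranges listed above, here is a minimal Python sketch. The function names `pearson_r` and `describe_correlation` and the sample data are made up for this example. The cut-offs (0.33 and 0.62) are the ones from these notes, and rounding r to two decimals before classifying is an assumption made so that values fall cleanly into the listed ranges.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient r for paired data (hypothetical helper)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of the deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

def describe_correlation(r):
    """Label r using the ranges in these notes (assumes rounding to 2 decimals)."""
    if r == 0:
        return "no linear correlation"
    size = round(abs(r), 2)
    if size >= 0.62:
        strength = "strong"
    elif size >= 0.33:
        strength = "moderate"
    else:
        strength = "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{direction}, {strength} correlation"

# Made-up sample data: hours studied vs. quiz mark
hours = [1, 2, 3, 4, 5]
marks = [2, 4, 5, 4, 5]
r = pearson_r(hours, marks)
print(round(r, 2), describe_correlation(r))
```

A calculator or spreadsheet reports the same r; the sketch only shows where the number comes from and how the ranges are applied.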

Generating lines of best fit and Outliers

  • TI-83 Graphing Calculator
    • Turn diagnostics on (2nd, 0 [CATALOG], DiagnosticOn, ENTER)
    • Enter data (STAT, 1:Edit)
    • Graph data (2nd, Y= [STAT PLOT], turn Plot1 on, ZOOM, 9:ZoomStat)
    • Equation of line of best fit (STAT, CALC, 4:LinReg(ax+b), VARS, Y-VARS, 1:Function, 1:Y1)
  • Microsoft Excel
    • Enter data
    • Highlight data and construct scatter plot (Insert, Charts, Scatter)
    • Equation for line of best fit (Chart Tools, Layout, Trendline)
  • Fathom
    • Enter data (Copy/Type/Open)
    • Construct scatter plot (drag variables to axes)
    • Add “Movable Lines”
    • Equation for line of best fit (Graph, Least-Squares Line)
    • Show Squares and the residual plot to identify outliers
    • Determine the value of the correlation coefficient
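
The tools above all fit a least-squares line and let you inspect residuals. The Python sketch below shows roughly what that calculation involves, using the same y = ax + b form as LinReg(ax+b). The function names and data are invented for illustration, and the rule of flagging a point whose residual is more than 2 residual standard deviations from the line is an assumed rule of thumb, not something prescribed by the course or by any of these tools.

```python
import math

def least_squares_line(xs, ys):
    """Return slope a and intercept b of the least-squares line y = a*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are the same; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def residuals(xs, ys, a, b):
    """Residual = observed y minus the y predicted by the line."""
    return [y - (a * x + b) for x, y in zip(xs, ys)]

def flag_outliers(xs, ys, cutoff=2.0):
    """Flag points far from the line (assumed rule: |residual| > cutoff * spread)."""
    n = len(xs)
    if n < 3:
        return []
    a, b = least_squares_line(xs, ys)
    res = residuals(xs, ys, a, b)
    # Spread of the residuals, dividing by n - 2 because two values (a, b) were fitted
    spread = math.sqrt(sum(e * e for e in res) / (n - 2))
    if spread == 0:
        return []
    return [(x, y) for x, y, e in zip(xs, ys, res) if abs(e) > cutoff * spread]

# Made-up data that follows y = 2x + 1 except for one unusual point at x = 5
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [3, 5, 7, 9, 30, 13, 15, 17]
a, b = least_squares_line(xs, ys)
print(f"y = {a:.2f}x + {b:.2f}")
print("possible outliers:", flag_outliers(xs, ys))
```

Note how the single unusual point pulls the fitted slope away from 2, which is why outliers are usually examined (and sometimes removed with justification) before trusting the line of best fit.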

Cause and Effect

  • Cause and Effect
    • A change in X causes a change in Y
      • E.g. time and tree trunk diameter
  • Common Cause
    • An external factor causes two variables to change in the same way
      • E.g. a correlation between ski sales and video rentals, where both are driven by colder weather
  • Reverse Cause and Effect
    • The dependent and independent variables are reversed when deciding which caused which
      • E.g. a correlation between coffee consumption and anxiety is theorized to mean that drinking coffee causes anxiety, but it is found instead that anxious people drink coffee
  • Accidental Relationship
    • A correlation without any causal relationship between the variables
      • E.g. an increase in SUV sales correlating with an increase in the chipmunk population
  • Presumed Relationship
    • A correlation that does not seem to be accidental even though no cause-and-effect or common-cause relationship is apparent
      • E.g. a correlation between a person’s level of fitness and the number of action movies they watch

Critically Thinking about Data

  • When analyzing data, we should ask:
    • Source: How reliable and current is the source?
    • Sample: Does the sample reflect the opinions of the population?
      • Was the sampling technique free from bias?
    • Graph: Is the data portrayed accurately in the graph? (E.g. does the axis start at zero?)
    • Correlation: Is the correlation between the variables strong enough to make inferences?
      • Is causation assumed just because there is a correlation?
      • Are there extraneous variables affecting the results?

Number Manipulation

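  • Percentage Points: the arithmetic difference between two percentages, which is not the same as a percent change relative to the original value
    • E.g. 3 percentage points up from 75% is 75% + 3% = 78%; relative to the original 75%, that is an increase of 3/75 × 100 = 4%
  • Making Numbers Larger: people sometimes restate a figure on a smaller scale so that it is easier to grasp and sounds bigger
    • E.g. 2,000,000 iPads sold in the first 3 months (about 90 days, or roughly 7.8 million seconds) can be restated as “about 1 iPad sold every 4 seconds” to sound more impressive.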
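
The arithmetic behind both ideas can be checked with a short Python sketch. The function names are made up for this example, and treating "3 months" as 90 days is an assumption.

```python
def add_percentage_points(rate_percent, points):
    """A rise of `points` percentage points is simple addition on the percent scale."""
    return rate_percent + points

def relative_percent_change(old, new):
    """Percent change measured relative to the old value."""
    return (new - old) / old * 100

def units_per_second(total_units, days):
    """Restate a total over a period as a per-second rate."""
    seconds = days * 24 * 60 * 60
    return total_units / seconds

# 3 percentage points up from 75%
new_rate = add_percentage_points(75, 3)
print(new_rate)                               # new rate, in percent
print(relative_percent_change(75, new_rate))  # the same rise as a relative percent change

# 2,000,000 units sold over about 90 days (assumed length of "3 months")
rate = units_per_second(2_000_000, 90)
print(round(rate, 2), "units per second")
print(round(1 / rate, 1), "seconds per unit")
```

The same change or total can therefore be quoted in several correct ways; the choice of scale affects how large it sounds, which is the point to watch for when reading statistics critically.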