**Data Analysis**

**Numerical Data**

– **quantitative** data can be measured numerically

– **discrete** numerical data is measured with whole numbers

– **continuous** numerical data is measured with real numbers (decimals)

– bar graphs represent discrete data while histograms represent continuous data

**Categorical Data**

– **qualitative** data cannot be measured numerically

– **nominal** data can be placed in any order

– **ordinal** data is better represented when sorted or ordered

– categorical data is often represented with bar graphs or circle graphs

**Data Vocabulary**

– **primary** data is data collected first-hand

– **secondary** data is data collected by others

– **micro** data represents individual pieces of data

– **aggregate** data represents grouped micro data

– **binary** data is data with only 2 outcomes, e.g. gender

– **non-binary** data does not have only 2 outcomes

– **population**: entire group being studied

– **sample**: part of the population being studied

– **inference**: conclusion made about the population based on the sample

– **observational data collection** involves grouping people by common qualities, then observing

– **experimental data collection** involves creating groups, then enforcing certain criteria on them

**Sampling Techniques**

- Good Sample Characteristics: large enough to represent the population and each person/subject in the population must have an equal chance of being chosen

**Simple Random Sample:** every member of the population has an equal chance of being picked
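
A minimal sketch of a simple random sample, using Python's standard `random` module. The `students` list, the sample size of 5, and the `seed` argument are assumptions made up for illustration.

```python
import random

def simple_random_sample(population, sample_size, seed=None):
    """Pick sample_size distinct members; every member is equally likely."""
    rng = random.Random(seed)  # seed only makes the draw repeatable
    return rng.sample(population, sample_size)

# Hypothetical population of 50 students
students = [f"student_{i}" for i in range(1, 51)]
sample = simple_random_sample(students, 5, seed=1)
print(sample)
```

Because `rng.sample` draws without replacement, no student can appear twice in the sample.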

**Systematic Random Sample**

– to go through a population sequentially and select subjects at consistent intervals

– interval = population size / sample size
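
The two points above can be sketched as follows. The random starting point inside the first interval is an assumption (the notes only require consistent intervals), and the `ids` list is hypothetical.

```python
import random

def systematic_sample(population, sample_size, seed=None):
    """Select every interval-th subject, starting at a random offset."""
    if not 0 < sample_size <= len(population):
        raise ValueError("sample_size must be between 1 and the population size")
    interval = len(population) // sample_size  # interval = population size / sample size
    rng = random.Random(seed)
    start = rng.randrange(interval)            # assumed: random start in the first interval
    return [population[start + i * interval] for i in range(sample_size)]

# Hypothetical population of 100 numbered subjects, sample of 10 -> interval of 10
ids = list(range(1, 101))
picked = systematic_sample(ids, 10, seed=0)
print(picked)
```

Integer division is used so the interval is a whole number; when the population size is not a multiple of the sample size, the last few subjects are never reached.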

**Stratified Sample**

– a **stratum** (plural: **strata**) is a group of subjects that share a common characteristic

– each stratum is sampled in proportion to its share of the population
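
A rough sketch of proportional allocation. The school, its grade sizes, and the sample size of 50 are hypothetical numbers chosen so the arithmetic is easy to follow.

```python
import random

def stratified_sample(strata, sample_size, seed=None):
    """Sample each stratum in proportion to its share of the population."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # stratum sample size = sample size * (stratum size / population size)
        k = round(sample_size * len(members) / total)
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical school of 1000 students split by grade
school = {
    "grade 9": [f"g9_{i}" for i in range(400)],
    "grade 10": [f"g10_{i}" for i in range(300)],
    "grade 11": [f"g11_{i}" for i in range(200)],
    "grade 12": [f"g12_{i}" for i in range(100)],
}
chosen = stratified_sample(school, 50, seed=3)
sizes = {name: len(members) for name, members in chosen.items()}
print(sizes)  # {'grade 9': 20, 'grade 10': 15, 'grade 11': 10, 'grade 12': 5}
```

With less tidy numbers, rounding can make the total differ slightly from the requested sample size.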

**Cluster Sample**: one representative group of the population chosen at random

**Multi-Stage Sample:** several techniques used to sample

**Voluntary Response Sample:** inviting subjects to voluntarily be a part of the sample group

**Convenience Sample:** selecting easily available subjects

**Bias in Surveys**

- Questions must be simple, specific, clear, ethical, and free of bias; they must allow for an honest response and must not infringe on anyone’s privacy
- Good questions stay away from: jargon, slang, abbreviations, negatives, leading questions, and insensitivities
- Good questions are often anonymous and require the subject to select from a list of responses
- Biases can be intentional or unintentional, but they make the survey invalid regardless of intent

**Sampling Bias:** the chosen sample does not represent the population (choosing the wrong people/subjects)

**Non-Response Bias:** when subjects choose not to participate and are under-represented (non-response is a form of sampling bias)

**Measurement Bias:** data collection consistently under/overestimates a characteristic of a population (leading questions)

**Response Bias:** subjects provide false/inaccurate results (asking a sensitive question)

**Measures of Central Tendency**

**Mean:** sum of the data values divided by the number of data values

**Median:** the middle value of an ordered set; if there is an even number of values, find the mean of the two middle values

**Mode:** the value that occurs most frequently
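
A short illustration of the three measures using Python's standard `statistics` module. The `scores` list is made-up data with an even number of values, so the median is the mean of the two middle values.

```python
import statistics

# Hypothetical test scores
scores = [70, 85, 85, 90, 60, 75]

mean = statistics.mean(scores)      # 465 / 6
median = statistics.median(scores)  # sorted: 60, 70, 75, 85, 85, 90 -> (75 + 85) / 2
mode = statistics.mode(scores)      # 85 occurs twice

print(mean, median, mode)  # 77.5 80.0 85
```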

**Measures of Spread**

**Range:** highest value – lowest value

**Interquartile Range**

– split the ordered data into 2 groups at the median, then split each of those groups in two again at its own median

– median of high group is labelled Q3, median of entire data is Q2, and median of low group is labelled Q1

– interquartile range = Q3 – Q1
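
The median-splitting steps above can be sketched as follows. Leaving the overall median out of both halves when the number of values is odd is an assumption, since the notes do not say how to handle that case; the `data` list is hypothetical.

```python
import statistics

def quartiles(data):
    """Return (Q1, Q2, Q3) by splitting the ordered data at the median."""
    values = sorted(data)
    if len(values) < 2:
        raise ValueError("need at least two values")
    half = len(values) // 2
    low = values[:half]                  # low group
    high = values[len(values) - half:]   # high group (middle value excluded if the count is odd)
    return statistics.median(low), statistics.median(values), statistics.median(high)

# Hypothetical data set with 8 values
data = [2, 4, 5, 7, 8, 10, 12, 15]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1
print(q1, q2, q3, iqr)  # 4.5 7.5 11.0 6.5
```

Other quartile conventions exist (for example, the interpolation used by many software packages), so results for small data sets may differ slightly from this method.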

**Box and Whisker Plot**

– illustrates how clustered data is around the median

– the plot must be drawn to scale and include a box that runs from Q1 to Q3, with a line inside the box at Q2

– whiskers extend from the box to the lowest value and the highest value

**Variance**

**Population:** Variance = Σ(x - μ)² / n

**Sample:** Variance = Σ(x - mean)² / (n - 1)

**Standard Deviation**

**Population:** σ = √(Σ(x - μ)² / n)

**Sample:** S = √(Σ(x - mean)² / (n - 1))
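
A small worked example of the variance and standard deviation formulas, written out by hand so each term is visible. The `data` list is hypothetical; Python's `statistics` module offers `pvariance`, `variance`, `pstdev`, and `stdev` for the same calculations.

```python
import math

def population_variance(data):
    mu = sum(data) / len(data)
    return sum((x - mu) ** 2 for x in data) / len(data)          # divide by n

def sample_variance(data):
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data) / (len(data) - 1)  # divide by n - 1

# Hypothetical data: mean is 5, squared deviations add up to 32
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = population_variance(data)   # 32 / 8 = 4.0
pop_sd = math.sqrt(pop_var)           # 2.0
samp_var = sample_variance(data)      # 32 / 7, about 4.57
samp_sd = math.sqrt(samp_var)         # about 2.14

print(pop_var, pop_sd, round(samp_var, 2), round(samp_sd, 2))
```

The sample versions come out slightly larger because dividing by n - 1 compensates for estimating the mean from the sample itself.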

**Percentile**

– divides data into 100 intervals that each contain an equal number of values

– tests often use percentiles to convert a raw score to a new score on a scale of 1 to 100
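
One possible way to turn a score into a percentile, sketched under the assumption that a percentile rank means the percentage of values strictly below the score. Other conventions exist (some count values equal to the score as well), and the `scores` list is hypothetical.

```python
def percentile_rank(data, score):
    """Percentage of values in data that fall strictly below score (assumed convention)."""
    below = sum(1 for x in data if x < score)
    return 100 * below / len(data)

# Hypothetical class of 10 test scores
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
print(percentile_rank(scores, 85))  # 60.0 -> 6 of the 10 scores are below 85
```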

**Z-Score**

**Population:** z = (x - μ) / σ

**Sample:** z = (x - mean) / S
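
A brief sketch of the z-score calculation using the standard `statistics` module. The `data` list and the `sample` flag are assumptions for illustration; the flag simply switches between the population and sample standard deviation.

```python
import statistics

def z_score(x, data, sample=False):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data) if sample else statistics.pstdev(data)
    return (x - mean) / sd

# Hypothetical data with mean 5 and population standard deviation 2
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(z_score(9, data))  # 2.0 -> 9 is two standard deviations above the mean
print(z_score(5, data))  # 0.0 -> the mean itself always has a z-score of 0
```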