# Variables and Types of Data


3.1.2 Variables and Types of Data, Level of Measurement

• Statisticians gain information about a particular situation by collecting data for random variables.

• Types of Data (variables)

1. Qualitative variables

• Variables that can be placed into distinct categories, according to some characteristics or attribute.

• Nonnumeric categories

• E.g.: gender, color, religion, workplace, etc.

2. Quantitative variables

• Variables that are numerical in nature and can be ordered or ranked.

• A quantitative variable may be one of two kinds:

• Discrete variable – a variable that can be counted or for which there is a fixed set of values. Examples: the number of children in a family, the number of students in a class, etc.

• Continuous variable – a variable that can be measured on a continuous scale, the result depending on the precision of the measuring instrument or the accuracy of the observer. A continuous variable can assume all values between any two specific values. Examples: temperatures, heights, weights, time taken, etc.

• Variables can be classified by how they are categorized, counted or measured. Data/variables can be classified according to the LEVEL OF MEASUREMENT as follows:

1. Nominal Level Data: classifies data (persons/objects) into two or more categories. Whatever the basis for classification, a person can be in only one category, and members of a given category have a common set of characteristics.

• The lowest level of measurement.

• No ranking/order can be placed on the data.

• E.g.: Gender (Male/Female), Type of school (Public/Private), Height (Tall/Short), etc.

2. Ordinal Level Data: classifies data into categories that can be ranked; however, precise differences between the ranks do not exist.

• This type of measuring scale puts the data/subjects in order from highest to lowest, from most to least. It does not indicate how much higher or how much better. Intervals between ranks are not equal.

• E.g.: Letter grades (A, B, C, D, E, F); a man's build (small, medium, or large), where large variation exists among the individuals in each class.

3. Interval Level Data: has all the characteristics of the nominal and ordinal scales, and in addition is based on predetermined equal intervals. It has no true zero point (ratios between numbers on the scale are not meaningful). E.g.:

• Achievement tests, aptitude tests, IQ tests. There is a meaningful one-point difference between an IQ of 110 and an IQ of 111.

• The Fahrenheit scale is a clear example of the interval scale of measurement. Thus, 60 degrees Fahrenheit or -10 degrees Fahrenheit represent interval data. Measurement of sea level is another example of an interval scale. With each of these scales there are direct, measurable quantities with equality of units. In addition, zero does not represent the absolute lowest value. Rather, it is a point on the scale with numbers both above and below it (for example, -10 degrees Fahrenheit).

4. Ratio Level Data: possesses all the characteristics of the interval scale and in addition has a meaningful (true) zero point. True ratios exist when the same variable is measured on two different members of the population.

• The highest, most precise level of measurement.

• E.g.: Weight, number of calls received; height.
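The difference between the interval and ratio levels can be checked numerically. The following is a minimal sketch, not part of the original notes: the temperatures and weights are arbitrary illustrative values, the helper names are hypothetical, and the kilogram-to-pound factor is approximate. It suggests that equal differences on an interval scale stay equal after a change of units, while ratios only survive a change of units when the scale has a true zero.

```python
def fahrenheit_to_celsius(f):
    """Convert a Fahrenheit reading to Celsius (both are interval scales)."""
    return (f - 32) * 5 / 9


def kg_to_lb(kg):
    """Convert kilograms to pounds (approximate factor; both are ratio scales)."""
    return kg * 2.20462


# Interval scale: differences are meaningful.
# 30 -> 60 and 60 -> 90 are equal steps in Fahrenheit and remain equal in Celsius.
step_1_c = fahrenheit_to_celsius(60) - fahrenheit_to_celsius(30)
step_2_c = fahrenheit_to_celsius(90) - fahrenheit_to_celsius(60)

# Interval scale: ratios are NOT meaningful.
# "60 is twice 30" holds in Fahrenheit but not for the same temperatures in Celsius.
temp_ratio_f = 60 / 30
temp_ratio_c = fahrenheit_to_celsius(60) / fahrenheit_to_celsius(30)

# Ratio scale: ratios are meaningful.
# 80 kg is twice 40 kg, and the same holds after converting to pounds.
weight_ratio_kg = 80 / 40
weight_ratio_lb = kg_to_lb(80) / kg_to_lb(40)

print("Celsius steps:", step_1_c, step_2_c)
print("Temperature ratios (F, C):", temp_ratio_f, temp_ratio_c)
print("Weight ratios (kg, lb):", weight_ratio_kg, weight_ratio_lb)
```

The temperature ratio changes with the unit because zero degrees is just a point on the scale, whereas zero weight means no weight in any unit.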

3.1.3 Data Collection and Sampling Techniques

• Sampling is the process of selecting a number of individuals for a study in such a way that the individuals represent the larger group from which they were selected.

• The purpose of sampling is to use a sample to gain information about a population.

• In order to obtain samples that are unbiased, statisticians use:

1. Random Sampling: Subjects are selected by random numbers.

2. Systematic Sampling: Subjects are selected by using every kth number after the first subject is randomly selected from 1 through k.

3. Stratified Sampling: Subjects are selected by dividing up the population into groups (strata) and subjects within groups are randomly selected.

• E.g.: We divide the population into 5 groups, then randomly take subjects from each group to form our sample.

4. Cluster Sampling: Subjects are selected by using an intact group that is representative of the population.

• E.g.: We divide the population into 5 groups, then take 2 whole groups as our sample. That means 2 groups of subjects represent all 5 groups of subjects.
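The four techniques above can be sketched with Python's standard `random` module. This is an illustrative sketch under stated assumptions, not a prescribed procedure: the population of 100 numbered subjects, the split into 5 equal groups, the sample sizes and all function names are hypothetical choices made for the example.

```python
import random


def simple_random_sample(population, n, rng):
    """Random sampling: pick n subjects using random numbers."""
    return rng.sample(population, n)


def systematic_sample(population, k, rng):
    """Systematic sampling: random start among the first k, then every kth subject."""
    start = rng.randrange(k)  # 0-based index, i.e. position 1 through k
    return population[start::k]


def stratified_sample(strata, n_per_stratum, rng):
    """Stratified sampling: randomly select subjects within every group."""
    sample = []
    for group in strata:
        sample.extend(rng.sample(group, n_per_stratum))
    return sample


def cluster_sample(clusters, n_clusters, rng):
    """Cluster sampling: randomly pick whole groups and keep all their subjects."""
    chosen = rng.sample(clusters, n_clusters)
    return [subject for cluster in chosen for subject in cluster]


rng = random.Random(0)  # fixed seed so the run is repeatable
population = list(range(1, 101))  # hypothetical subjects numbered 1..100
groups = [population[i:i + 20] for i in range(0, 100, 20)]  # 5 groups of 20

srs = simple_random_sample(population, 10, rng)
syst = systematic_sample(population, 10, rng)
strat = stratified_sample(groups, 2, rng)
clus = cluster_sample(groups, 2, rng)

print("random:    ", sorted(srs))
print("systematic:", syst)
print("stratified:", sorted(strat))
print("cluster:   ", len(clus), "subjects from 2 of 5 groups")
```

Note the contrast between the last two functions: stratified sampling takes a few subjects from every group, while cluster sampling takes every subject from a few groups.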

Exercise:

A) Classify each set of data as discrete or continuous.

1) The number of suitcases lost by an airline.

2) The height of corn plants.

3) The number of ears of corn produced.

4) The number of green M&M's in a bag.

5) The time it takes for a car battery to die.

6) The production of tomatoes by weight.

B) Identify the following as nominal level, ordinal level, interval level, or ratio level data.

1) Percentage scores on a Math exam.

2) Letter grades on an English essay.

3) Flavors of yogurt.

4) Instructors classified as: Easy, Difficult or Impossible.

5) Employee evaluations classified as: Excellent, Average, Poor.

6) Religions.

7) Political parties.

8) Commuting times to school.

9) Years (AD) of important historical events.

10) Ages (in years) of statistics students.

11) Ice cream flavor preference.

12) Amount of money in savings accounts.

13) Students classified by their reading ability: Above average, Below average, Normal.

3.2 HISTOGRAMS, FREQUENCY POLYGONS AND OGIVES

Example:

For 108 randomly selected college applicants, the following frequency distribution for entrance exam scores was obtained.

| Class Limit | Frequency |
|---|---|
| 90 – 98 | 6 |
| 99 – 107 | 22 |
| 108 – 116 | 43 |
| 117 – 125 | 28 |
| 126 – 134 | 9 |

Construct:

1. Histogram

i) x-axis: class boundary; y-axis: frequency

ii) x-axis: class boundary; y-axis: relative frequency

2. Frequency Polygon

i) x-axis: class midpoint; y-axis: frequency

ii) x-axis: class midpoint; y-axis: relative frequency

3. Ogive

i) x-axis: class boundary; y-axis: cumulative frequency

ii) x-axis: class boundary; y-axis: cumulative relative frequency

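Relative frequency = $\dfrac{f}{n}$, where $f$ is the class frequency and $n = \sum f$ is the total frequency.

Cumulative relative frequency = $\dfrac{\text{cumulative frequency}}{n}$, or add the relative frequency of each class to the running total of the relative frequencies of the preceding classes.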
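The quantities needed for the three graphs can be tabulated as in the sketch below, which uses the entrance exam distribution from the example. It assumes relative frequency is the class frequency divided by the total frequency, and that class boundaries lie 0.5 below and above the class limits (the same convention as the 0 – 19 class having boundaries -0.5 – 19.5). Variable names are illustrative.

```python
from itertools import accumulate

# Entrance exam scores for 108 applicants (from the example above)
class_limits = [(90, 98), (99, 107), (108, 116), (117, 125), (126, 134)]
frequencies = [6, 22, 43, 28, 9]

n = sum(frequencies)  # total frequency

# x-axis values: boundaries for the histogram and ogive, midpoints for the polygon
boundaries = [(lo - 0.5, hi + 0.5) for lo, hi in class_limits]
midpoints = [(lo + hi) / 2 for lo, hi in class_limits]

# y-axis values
relative = [f / n for f in frequencies]
cumulative = list(accumulate(frequencies))
cumulative_relative = [cf / n for cf in cumulative]

print("boundaries      midpoint   f   rel.f    cf   cum.rel.f")
for (lo, hi), mid, f, rf, cf, crf in zip(
    boundaries, midpoints, frequencies, relative, cumulative, cumulative_relative
):
    print(f"{lo:5.1f} - {hi:5.1f}  {mid:7.1f} {f:4d}  {rf:6.3f}  {cf:4d}  {crf:8.3f}")
```

The last cumulative frequency should equal the total frequency, and the last cumulative relative frequency should equal 1, which is a quick check on the table.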

Note: Graphing

Given the frequency distribution below:

| Class Limit | Class Boundary | f | Cf |
|---|---|---|---|
| 0 – 19 | -0.5 – 19.5 | 13 | 13 |
| 20 – 39 | 19.5 – 39.5 | 18 | 31 |

The first value on the x-axis is -0.5. The axis can be drawn in either of two ways:

[Figure: two alternative x-axes, each marked at -0.5, 19.5 and 39.5]

All graphs must be drawn on the right side of the y-axis. Omit the exercise questions on analyzing the graph.

Exercise:

1. In a class of 35 students, the following grade distribution was found. Construct a histogram, frequency polygon and ogive for the data. (A=4, B=3, C=2, D=1, F=0)

| Grade | Frequency |
|---|---|
| 0 | 3 |
| 1 | 6 |
| 2 | 9 |
| 3 | 12 |
| 4 | 5 |

2. Using the histogram shown below, construct:

1. A frequency distribution

2. A frequency polygon

3. An ogive

[Figure: histogram; x-axis: class boundaries 21.5, 24.5, 27.5, 30.5, 33.5, 36.5, 39.5, 42.5; y-axis scaled from 1 to 7]

3. Below is a data set for the duration (in minutes) of a random sample of 24 long-distance phone calls:

1 20 10 20 12 23 3 7 18 12 4 5

15 7 29 10 18 10 10 23 4 12 8 6

1. Construct a frequency distribution table for the data using the classes “1 to 5”, “6 to 10”, etc.

2. Construct a cumulative frequency distribution table and use it to draw up an ogive.

4. The following table shows the 2003 average income per year (in thousand Ringgit) for 20 employees of company A.

| Income ('000 Ringgit) | Frequency |
|---|---|
| 5 – 9 | 6 |
| 10 – 14 | 3 |
| 15 – 19 | 2 |
| 20 – 24 | 4 |
| 25 – 29 | 3 |
| 30 – 34 | 2 |

1. Draw the histogram and frequency polygon for the above data.

2. Construct the cumulative frequency table. Hence, draw up an ogive for the above data.
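When an exercise starts from raw data rather than a ready-made table, the first step is to tally the values into classes. The sketch below shows one possible way to do this for the phone-call durations in exercise 3, assuming classes of width 5 starting at 1 as the exercise states. The function name `tally` and its parameters are hypothetical, and the printed table is meant as a way to check hand work rather than a replacement for it.

```python
from itertools import accumulate

# Durations (in minutes) of the 24 long-distance calls from exercise 3
calls = [1, 20, 10, 20, 12, 23, 3, 7, 18, 12, 4, 5,
         15, 7, 29, 10, 18, 10, 10, 23, 4, 12, 8, 6]


def tally(data, lower, width):
    """Build classes (lower..lower+width-1, ...) covering the data and count each."""
    classes = []
    lo = lower
    while lo <= max(data):
        classes.append((lo, lo + width - 1))
        lo += width
    counts = [sum(1 for x in data if a <= x <= b) for a, b in classes]
    return classes, counts


classes, counts = tally(calls, lower=1, width=5)
cumulative = list(accumulate(counts))  # y-values for the ogive

print("class     f   cf")
for (a, b), f, cf in zip(classes, counts, cumulative):
    print(f"{a:2d} - {b:2d}  {f:2d}  {cf:3d}")
```

The final cumulative frequency should match the number of observations, here 24.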

3.3 DATA DESCRIPTION

3.3.1 MEASURES OF CENTRAL TENDENCY

• Mean, median and Mode for Ungrouped data

• Mean (arithmetic average)

Symbol for sample: $\bar{X}$. Symbol for population: μ

(The syllabus focuses on the sample formula.) Mean, $\bar{X} = \dfrac{\sum X}{n}$

• Median: the middle point in an ordered data set

- arrange the data in order, ascending or descending

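- select the middle point, or use the location formula $T = \dfrac{n+1}{2}$ when $n$ is odd and $T = \dfrac{n}{2}$ when $n$ is even, where $n$ is the number of data values.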

- Then, the median is:

• the value at location T (for odd number of data)

• the average of the value at location T and the value at location (T +1) (for even number of data)

• Mode: the value that occurs most often in the data set

Example:

1. The following data are the number of burglaries reported for a specific year for nine western Pennsylvania universities. Find the mean, median and mode.

61, 11, 1, 3, 2, 30, 18, 3, 7

2. Twelve major earthquakes had the Richter magnitudes shown here. Find the mean, median and mode.

7.0 , 6.2 , 7.7 , 8.0 , 6.4 , 6.2 , 7.2 , 5.4 , 6.4 , 6.5 , 7.2 , 5.4

3. The number of hospitals for the five largest hospital systems is shown here. Find the mean, median and mode.

340, 75, 123, 259, 151
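The three measures for ungrouped data can be sketched as follows, using the burglary data from the first example. The median follows the location method described above, under the assumption that the location is T = (n + 1)/2 for an odd number of values and T = n/2 for an even number (consistent with the two cases listed). The function names are illustrative; `multimode` comes from Python's standard `statistics` module (Python 3.8 or later) and returns every value tied for the highest count.

```python
from statistics import multimode


def mean(data):
    """Arithmetic average: sum of the values divided by the number of values."""
    return sum(data) / len(data)


def median(data):
    """Middle point of the ordered data, found by location T (1-based)."""
    ordered = sorted(data)
    n = len(ordered)
    if n % 2 == 1:
        t = (n + 1) // 2
        return ordered[t - 1]  # value at location T
    t = n // 2
    return (ordered[t - 1] + ordered[t]) / 2  # average of locations T and T + 1


# Number of burglaries reported at nine universities (first example above)
burglaries = [61, 11, 1, 3, 2, 30, 18, 3, 7]

print("mean  :", round(mean(burglaries), 2))
print("median:", median(burglaries))
print("mode  :", multimode(burglaries))
```

A data set can have more than one mode, which is why the mode is reported as a list. When every value occurs exactly once, `multimode` returns all of them, a case many textbooks describe as having no mode.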

• Mean, median and Mode for Ungrouped frequency distribution

• Mean, $\bar{X} = \dfrac{\sum f \cdot X}{n}$, where $n = \sum f$

• Median:

- find the cumulative frequency

- find the location of the median

• Mode: the value with the largest frequency

Example:

1. In a survey taken in a restaurant, the following ungrouped frequency distribution of the number of cups of coffee consumed with each meal was obtained. Find the mean, median and mode.

| Number of cups | Frequency |
|---|---|
| 0 | 5 |
| 1 | 8 |
| 2 | 10 |
| 3 | 2 |
| 4 | 3 |
| 5 | 2 |
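For an ungrouped frequency distribution the same measures can be computed from the table without listing every observation. The sketch below uses the coffee survey data and assumes the mean is the sum of frequency times value divided by the total frequency, and that the median is read off the cumulative frequency column at the middle location(s). The helper `value_at` is a hypothetical name introduced for the example.

```python
from itertools import accumulate

# Cups of coffee consumed with each meal (survey table above)
cups = [0, 1, 2, 3, 4, 5]
freq = [5, 8, 10, 2, 3, 2]

n = sum(freq)  # total number of observations

# Mean: weight each value by its frequency
mean = sum(f * x for x, f in zip(cups, freq)) / n

# Median: use cumulative frequencies to locate the middle observation(s)
cum = list(accumulate(freq))


def value_at(location):
    """Value of the observation at a 1-based location in the ordered data."""
    for x, cf in zip(cups, cum):
        if location <= cf:
            return x
    raise ValueError("location is beyond the number of observations")


if n % 2 == 1:
    median = value_at((n + 1) // 2)
else:
    median = (value_at(n // 2) + value_at(n // 2 + 1)) / 2

# Mode: value(s) with the largest frequency
largest = max(freq)
modes = [x for x, f in zip(cups, freq) if f == largest]

print("n =", n, " cumulative frequencies =", cum)
print("mean =", round(mean, 2), " median =", median, " mode =", modes)
```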