The distribution of a variable refers to the way its values are spread over all possible values. We can summarize a distribution in a table or show a distribution visually with a graph.

Discuss the shapes of the distributions displayed by these graphs.

CNN.com posted misleading graph showing poll results on Schiavo case

In presenting the results of a CNN/USA Today/Gallup poll, CNN.com used a visually distorted graph that falsely conveyed the impression that Democrats far outnumber Republicans and Independents in thinking the Florida state court was right to order Terri Schiavo's feeding tube removed. In fact, a majority of all three groups agrees with the court's decision, and the gap between Democrats on one hand and Republicans and Independents on the other is within the poll's margin of error.

According to the poll, conducted March 18-20, when asked if they "agree[d] with the court's decision to have the feeding tube removed," 62 percent of Democratic respondents agreed, compared to 54 percent of Republicans and 54 percent of Independents. But these results were displayed along a very narrow scale of 10 percentage points, and thus appeared to show a large gap between Democrats and Republicans/Independents:

Laid out in this manner, the graph suggests that the gap between the two groups is overwhelming, rather than only 8 percentage points, within the poll's margin of error of +/- 7 percentage points. Also, this presentation obscures the poll's finding that majorities of all the groups sampled approved of the removal of Schiavo's feeding tube. A more accurate presentation of the poll's findings would have looked like this:
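To see how much a truncated axis can exaggerate the gap, compare the drawn bar heights under two baselines. This is a minimal sketch: the percentages come from the poll figures quoted above, but the truncated baseline of 53 is an assumption (the text says only that the scale spanned 10 percentage points), and `bar_heights` and `visual_ratio` are illustrative helper names.

```python
# Poll percentages quoted in the text above.
agree = {"Democrats": 62, "Republicans": 54, "Independents": 54}


def bar_heights(values, baseline):
    """Drawn height of each bar when the vertical axis starts at `baseline`."""
    return {group: value - baseline for group, value in values.items()}


def visual_ratio(heights):
    """How many times taller the Democrats bar looks than the Republicans bar."""
    return heights["Democrats"] / heights["Republicans"]


# ASSUMPTION: the truncated axis starts at 53; the source does not state the baseline.
truncated = bar_heights(agree, baseline=53)
# A zero baseline keeps bar heights proportional to the percentages themselves.
from_zero = bar_heights(agree, baseline=0)

print(visual_ratio(truncated))            # 9.0
print(round(visual_ratio(from_zero), 2))  # 1.15
```

Under the assumed baseline, the Democrats bar is drawn nine times as tall as the others even though the percentages differ by a factor of only about 1.15.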

How to Lie with Graphs: The NY Times as Real Estate Case Study

The New York Times just released a new graph showing the housing bubble. The only problem is, they have intentionally skewed the way the chart reads to make their bubble look even bigger and more extreme.

“In effect, they’ve zoomed in on the area from 100-150 and magnified the growth in the last 15 years.”

We very well might be in a housing bubble, but that doesn’t excuse the NY Times’ creation of a misleading and overly sensational chart.

The Calculated Risk blog breaks down the errors and omissions even further and then shows what the graph should really look like if the NY Times weren’t intentionally trying to magnify the negatives:

Let’s create a vertical bar graph from the essay grade data in Table 3.1.

• Because the highest frequency is 9 (the frequency for C grades), we chose to make the vertical scale run from 0 to 10. This ensures that even the tallest bar does not quite touch the top of the graph.
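The scale choice for this bar graph (start at 0, end just above the highest frequency) can be sketched in a few lines. Table 3.1 is not reproduced here, so the frequencies below are hypothetical placeholders; only the C count of 9, the highest, comes from the text. The helper `axis_top` and its `step` parameter are illustrative, and the bars are drawn sideways as text for simplicity.

```python
# HYPOTHETICAL frequencies: only C = 9 (the highest) is taken from the text.
# The other counts are placeholders, not the contents of Table 3.1.
grade_counts = {"A": 4, "B": 7, "C": 9, "D": 3, "F": 2}


def axis_top(frequencies, step=5):
    """Smallest multiple of `step` strictly greater than the tallest bar."""
    return (max(frequencies) // step + 1) * step


top = axis_top(grade_counts.values())  # scale runs from 0 to `top`

for grade, count in grade_counts.items():
    bar = "#" * count
    # Pad every bar to the full axis length so the unused scale stays visible.
    print(f"{grade} | {bar.ljust(top)} | {count}")
```

With a highest frequency of 9, the scale tops out at 10, so the tallest bar stops one unit short of the edge.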

Bar Graph

A bar graph can be used to show how a whole is divided into parts, but it can also compare quantities that are not parts of a whole.

Pareto charts were invented by Italian economist Vilfredo Pareto (1848-1923). Pareto is best known for developing methods of analyzing income distributions, but his most important contributions probably were in developing new ways of applying mathematics and statistics to economic analysis.

A pie chart is a circle divided so that each wedge represents the relative frequency of a particular category. The wedge size is proportional to the relative frequency. The entire pie represents the total relative frequency of 100%.
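Because wedge size is proportional to relative frequency, each wedge's central angle is its relative frequency times 360 degrees. The sketch below illustrates that arithmetic; the category names and counts are hypothetical, and `wedge_angles` is an illustrative helper, not part of any plotting library.

```python
def wedge_angles(frequencies):
    """Map each category to its central angle in degrees (angles sum to 360)."""
    total = sum(frequencies.values())
    return {category: 360 * count / total for category, count in frequencies.items()}


# HYPOTHETICAL counts, chosen only to make the arithmetic easy to follow.
counts = {"red": 5, "green": 10, "blue": 5}
angles = wedge_angles(counts)

for category, angle in angles.items():
    share = 100 * counts[category] / sum(counts.values())
    print(f"{category}: {share:.0f}% of the pie, {angle:.0f} degrees")
```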

Definition

A histogram is a bar graph showing a distribution for quantitative data (at the interval or ratio level of measurement); the bars have a natural order and the bar widths have specific meaning.
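The bar heights of a histogram come from counting how many data values fall in each equal-width bin. A minimal sketch, assuming bins that include their lower edge and exclude their upper edge; the data values and the helper `histogram_counts` are hypothetical illustrations.

```python
def histogram_counts(data, start, width, n_bins):
    """Count values in bins [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for x in data:
        index = int((x - start) // width)
        if 0 <= index < n_bins:  # values outside the binned range are ignored
            counts[index] += 1
    return counts


# HYPOTHETICAL quantitative data, binned as 0-5, 5-10, 10-15, 15-20.
data = [1, 3, 4, 7, 8, 8, 12, 15, 19]
counts = histogram_counts(data, start=0, width=5, n_bins=4)
print(counts)  # [3, 3, 1, 2]
```

The bins have a natural left-to-right order and a fixed width of 5 units, which is what distinguishes these bars from the unordered categories of an ordinary bar graph.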

Figure 3.10 Stem-and-leaf plot showing numerical data—in this case, the per person carbon dioxide emissions from Table 3.11.

The stem-and-leaf plot (or stemplot) looks somewhat like a histogram turned sideways, except in place of bars we see a listing of data for each category.
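A stemplot can be built by splitting each value into a stem and a leaf. The sketch below assumes non-negative whole numbers with the tens digit as the stem and the ones digit as the leaf; that choice, the data, and the helper `stem_and_leaf` are illustrative and do not reproduce Figure 3.10 or Table 3.11.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return one text line per stem, e.g. '2 | 1 1 8' for 21, 21, 28."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)  # stem = tens digit(s), leaf = ones digit
    lines = []
    # Include empty stems so gaps in the data remain visible, as in a histogram.
    for stem in range(min(stems), max(stems) + 1):
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines


# HYPOTHETICAL data.
plot = stem_and_leaf([21, 12, 43, 28, 15, 40, 21])
print("\n".join(plot))
```

Each row lists the individual data values for that stem, so the row lengths trace the same shape a sideways histogram would show.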

A line chart shows a distribution of quantitative data as a series of dots connected by lines. For each dot, the horizontal position is the center of the bin it represents and the vertical position is the frequency value for the bin.
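The dot positions for a line chart follow directly from the bins: the horizontal coordinate is the midpoint of each bin and the vertical coordinate is that bin's frequency. A minimal sketch with hypothetical bin edges and frequencies; `line_chart_points` is an illustrative helper name.

```python
def line_chart_points(bin_edges, frequencies):
    """Pair each bin's center with its frequency, in left-to-right order."""
    if len(frequencies) != len(bin_edges) - 1:
        raise ValueError("need exactly one frequency per bin")
    centers = [(low + high) / 2 for low, high in zip(bin_edges, bin_edges[1:])]
    return list(zip(centers, frequencies))


# HYPOTHETICAL bins 0-5, 5-10, 10-15, 15-20 and their frequencies.
points = line_chart_points([0, 5, 10, 15, 20], [3, 3, 1, 2])
print(points)  # [(2.5, 3), (7.5, 3), (12.5, 1), (17.5, 2)]
```

Connecting these points in order with straight segments gives the line chart.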