Guide to Interpreting Box Plots
(aka Box-and-Whisker Plots)
As described in Understanding the Contents of the Study Charts, the charts display NCES survey data elements (such as circulation count) and measures derived from them (such as circulation per capita) for ten population groupings. Whether you are exploring the data within one group or comparing the ten groups, pictorial representations of the data are helpful. (Indeed, every graph is a picture intended to communicate something about data!)
Box-and-whisker plots, or boxplots for short, are pictures (simple shapes, actually) designed to illustrate key information about a set of data: how spread out the data are, how they cluster (or do not cluster) around central values such as the mean (average) or the median, and so on. In any chart posted here, a single boxplot shape shows the lowest value (minimum), the highest value (maximum), the distribution (spread) of the values in between, the median, and the mean. Look at Chart 28B as an example. (Click on the link and then scroll down to see Chart 28B.) Notice that there are ten vertical shapes, each with long lines extending upward and shorter lines extending downward. These lines are called "whiskers." Now scroll up from Chart 28B to see Chart 28A.
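For readers who work with the underlying data directly, the central and extreme values a single boxplot portrays can be computed in a few lines. The following is a minimal Python sketch; the circulation counts are hypothetical, not taken from the NCES data.

```python
from statistics import mean, median

# Hypothetical circulation counts for the libraries in one population group
# (illustrative values only, not NCES data).
circulation = [1200, 3400, 5100, 5600, 7800, 9900, 41000]

# The minimum, maximum, median, and mean that one boxplot displays.
summary = {
    "minimum": min(circulation),
    "maximum": max(circulation),
    "median": median(circulation),
    "mean": mean(circulation),
}

for name, value in summary.items():
    print(f"{name:>8}: {value:,.1f}")
```

With these values the mean (about 10,571) sits well above the median (5,600) because one very large library pulls the average upward; on a boxplot, the '+' would appear above the median line.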
Sub-tables. The first chart in each three-chart set (designated with an A) contains a table referred to as a "sub-table." For example, in Chart 28A the sub-table has the heading "Statistic...<1K 1K 2.5K 5K..." to indicate the ten population groups. The same headings appear in all sub-tables, as can be seen in this sub-table excerpt:
Sub-tables present basic statistics about each population group's data in columns. Boxplots portray these same statistics pictorially (as I explain below under "Structure of Box-and-Whisker Plots"). In each sub-table, the "Statistic" column lists several abbreviations, as explained in this table:
To reiterate, in these box-and-whisker charts, sub-tables appear only in the A chart of each set. When viewing a B or C chart, refer back to the sub-table in the corresponding A chart for the statistical figures. (See Understanding the Contents of the Study Charts for further details about the A, B, and C charts.)
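The columnar layout of a sub-table can be sketched the same way: one row per statistic, one column per population group. In this Python sketch the data are hypothetical, only three groups are shown, and the row labels are illustrative assumptions (consult the abbreviation table for the actual ones). Quartiles are computed with `statistics.quantiles(..., method="inclusive")`, which may not match the percentile definition used by the software that drew the study charts, so small differences are possible.

```python
from statistics import mean, median, quantiles

# Illustrative row labels; the real sub-tables may abbreviate differently.
ROW_LABELS = ["N", "MEAN", "MIN", "Q1", "MEDIAN", "Q3", "MAX"]

def group_stats(values):
    """Statistics for one population group (one sub-table column)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "N": len(values),
        "MEAN": mean(values),
        "MIN": min(values),
        "Q1": q1,
        "MEDIAN": median(values),
        "Q3": q3,
        "MAX": max(values),
    }

def sub_table(groups):
    """Return a header row, then one row per statistic across all groups."""
    columns = {name: group_stats(values) for name, values in groups.items()}
    rows = [["Statistic"] + list(groups)]
    for label in ROW_LABELS:
        rows.append([label] + [columns[name][label] for name in groups])
    return rows

# Hypothetical circulation counts for three population groups.
groups = {
    "25K": [61000, 88000, 102000, 140000, 215000],
    "50K": [150000, 210000, 260000, 330000, 480000],
    "100K": [390000, 520000, 640000, 790000, 1150000],
}

for row in sub_table(groups):
    print("".join(f"{cell:>12}" if isinstance(cell, str) else f"{cell:>12,.0f}" for cell in row))
```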
Structure of Box-and-Whisker Plots. An excerpt from a box-and-whisker plot appears here:
Notice that box-and-whisker plots are arranged left to right along the horizontal axis. Each boxplot portrays statistical features of a given population group's data. In the excerpt shown here, the three boxplots portray statistical features for the 25K, 50K, and 100K population groups.
A boxplot consists of a box (a rectangle, actually) with a T-shape at the top and an inverted T at the bottom. These T's are called "whiskers." Whiskers indicate the minimum and maximum values of the data for the population group.* The bottom of the box marks the 25th percentile, and the top marks the 75th percentile. Thus, the box height spans the data lying between the 25th and 75th percentiles, that is, the middle half of the data. This span, outlined by the box height, is called the interquartile range. The horizontal line dividing the box marks the median value for the population group. The mean (average) is marked with a plus sign ('+').
Note that these components of a boxplot correspond exactly to figures in the sub-tables described above. For example, to find the mean for a given boxplot (the value marked by the '+'), refer to the sub-table in the A chart for the data set you are viewing. Values for the other boxplot features (median, minimum, maximum, and so on) also appear in the sub-table.
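To make the mapping concrete, the Python sketch below translates one group's data into the parts of a boxplot as this guide describes them: whiskers at the minimum and maximum, box edges at the 25th and 75th percentiles, a median line, and a '+' for the mean. The dictionary keys and the data are hypothetical, and the quartile method is an assumption (the charting software may compute percentiles slightly differently).

```python
from statistics import mean, median, quantiles

def box_geometry(values):
    """Map one group's data onto the parts of a boxplot (whiskers at min/max)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "lower_whisker": min(values),   # end of the inverted T at the bottom
        "box_bottom": q1,               # 25th percentile
        "median_line": median(values),  # horizontal line dividing the box
        "box_top": q3,                  # 75th percentile
        "upper_whisker": max(values),   # end of the T at the top
        "mean_marker": mean(values),    # position of the '+'
        "box_height": q3 - q1,          # interquartile range
    }

parts = box_geometry([1200, 3400, 5100, 5600, 7800, 9900, 41000])
print(parts["box_bottom"], parts["box_top"], parts["box_height"])  # 4250.0 8850.0 4600.0
```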
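In each boxplot, the complete range of data for a given population group, from the 0th to the 100th percentile, is divided into four sections called "quartiles." The interval from the 0th to the 25th percentile is the first quartile (Q1), the interval from the 25th to the 50th percentile is the second quartile (Q2), and the interval from the 50th to the 75th percentile is the third quartile (Q3). The interval from the 75th to the 100th percentile is the fourth quartile (Q4), which is not used in the sub-table. (Incidentally, the 100th percentile corresponds to the maximum value in the data, and the 0th percentile to the minimum.) Statisticians more often use "quartile" for the cut points themselves (the 25th percentile, the median, and the 75th percentile), so a Q1 or Q3 figure in a sub-table is best read as the value at which the bottom or top of the box is drawn.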
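The four sections can also be located numerically: the three cut points (the 25th, 50th, and 75th percentiles) split the range, and every value falls into exactly one section. In this Python sketch, `quartile_section` is a hypothetical helper, the data are hypothetical, and a value lying exactly on a cut point is assigned to the lower section, which is an arbitrary choice.

```python
from bisect import bisect_left
from statistics import quantiles

def quartile_section(value, values):
    """Return 1-4: the section (Q1-Q4 in this guide's sense) that value falls in."""
    cuts = quantiles(values, n=4, method="inclusive")  # 25th, 50th, 75th percentiles
    # bisect_left counts the cut points strictly below value,
    # so a value equal to a cut point lands in the lower section.
    return bisect_left(cuts, value) + 1

data = [1200, 3400, 5100, 5600, 7800, 9900, 41000]
print([quartile_section(v, data) for v in data])  # [1, 1, 2, 2, 3, 4, 4]
```

Each section holds roughly a quarter of the values; with only seven values the split cannot be exact.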
To give a "closer" view of the details, boxplots in the B and C charts in this series may have the top T-shape cut off ("clipped") when a maximum value exceeds the scale of the vertical axis. When this occurs, the boxplot is marked with a small diamond (◊). For instance, in Chart 13B (scroll down to see the B chart), the maximum value for the 10K population category is missing because it exceeds 20,000, the maximum value on the graph's vertical axis. The clipped value can still be found by referring back to the sub-table in Chart 13A, under the heading "10K": the "MAX" value for service population in the 10K library category is 24,997.
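Whether a whisker is clipped comes down to a simple comparison between a group's maximum and the top of the vertical axis. The Python sketch below uses the Chart 13A/13B figures cited above (a 10K maximum of 24,997 against a 20,000 axis limit); the other groups ("Group A", "Group B"), their maxima, and the helper name `clipped_groups` are hypothetical.

```python
def clipped_groups(maxima, axis_max):
    """Return the groups whose maximum lies above the top of the vertical axis.

    These are the boxplots whose top T would be cut off and marked with a diamond.
    """
    return [group for group, top in maxima.items() if top > axis_max]

# 10K figure from Chart 13A; Group A and Group B are hypothetical.
maxima = {"Group A": 8_400, "10K": 24_997, "Group B": 19_500}
print(clipped_groups(maxima, axis_max=20_000))  # ['10K']
```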
* Some other styles of box-and-whisker plots use different conventions for marking extremely high or low values in the data, called "outliers." When values fall beyond a specified range, these alternative styles mark them with special symbols rather than extending the T's to reach them.
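For comparison, one widely used outlier convention (Tukey's fences) flags any value lying more than 1.5 times the interquartile range below the 25th percentile or above the 75th percentile. The charts in this study do not use it; their whiskers always run to the minimum and maximum. The Python sketch below is only an illustration with hypothetical data, and software packages differ in exactly how they compute the quartiles.

```python
from statistics import quantiles

def tukey_outliers(values, k=1.5):
    """Return values beyond k * IQR outside the box (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

data = [1200, 3400, 5100, 5600, 7800, 9900, 41000]
print(tukey_outliers(data))  # [41000]
```

Under this convention the upper whisker would stop at 9,900, the largest value inside the fence, and 41,000 would be drawn as a separate point.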