Valen Johnson's Book

A Review of Valen E. Johnson's Book:
Grade Inflation: A Crisis in College Education,
262 pages, Springer-Verlag 2003, New York;
And a Reminder of LSU Faculty Senate Resolution 03-04:
On Grades and Standards
by Carruth McGehee, President of the Faculty Senate
email address mcgehee at math.lsu.edu
August 29, 2003
Table of Contents
1. Summary
2. How SET Numbers Are Biased
3. Grade Inflation, Broad Perspective
4. Remedies Moderate and Local
5. How to Reform the Use of SET Numbers
6. Remedies Formulaic and Centralized
7. Reservations and an Antidote

1. Summary

The grade-inflation hawks outnumbered the doves in the months of discussion that led to the adoption of Faculty Senate Resolution 03-04 in December 2002, which appears at http://www.math.lsu.edu/~mcgehee/Grades.html. Its appendices refer to strongly worded reports with startling proposals. Nevertheless, the Resolution itself calls only for moderate and local remedies. It also avoids the inexact term "grade inflation," which in general, alas, is unavoidable.

This seems a good time to call your attention to the Resolution. It contains policy recommendations for administrators, and charges to two Faculty Senate committees.

Valen E. Johnson's new book examines the nature of SETs - Student Evaluations of Teaching - and surveys a few dozen studies conducted over a period of decades at various universities. It also reports on the 1998-1999 study called DUET - Duke Undergraduates Evaluate Teaching - which was designed to address questions left open by earlier studies. Teachers who have paid thoughtful attention to their SET numbers over the years will not be surprised by anything here.

Johnson's work strongly indicates that to use SET numbers exclusively, primarily, or uncritically in the evaluation of teaching is to create pressures for lower grading standards.

2. How SET Numbers Are Biased

The main merit of the book is its well-grounded, careful analysis of how SET numbers behave statistically, and how they are biased. This tells us what to watch out for when drawing conclusions from the numbers. I begin with a non-technical and qualitative description of findings. Three of Valen Johnson's five summary conclusions, as he states them briefly and simply on p. 237, are as follows:

1. Differences in grading practices between instructors cause biases in student evaluations of teaching.
2. Student evaluations of teaching are not reliable indicators of teaching effectiveness and account for only a small proportion of the variance in student learning from student to student and course to course.
3. High grade distributions cannot be associated with higher levels of student achievement.

The variables in the study are standardized (pages 87, 88, 118). Thus for example, conclusion 1 does not quite mean that the student whom you give a B will rate you higher than the student whom you give a C. What it means is this: The A student whom you give a B will rate you lower, and the C student whom you give a B will rate you higher.

Knowing that such a bias exists, as many studies confirm, it remains to evaluate various theories that might explain the bias: Perhaps the class that gets higher grades is more capable; or more interested in the subject matter; or has been more effectively taught. If such were so, then the bias would be benign. But in the light of the DUET study, those hypotheses do not hold up. The best explanation is the grade-attribution theory: Students have a certain measurable tendency to attribute success in academic work to themselves, and failure to external sources.

Johnson discusses a whole array of biases that are present, and considers ways to improve survey instruments. He confirms that feedback to teachers from certain survey items is associated with improvements in teaching. Nevertheless, he writes, the use of SET numbers as measures of overall teaching effectiveness has been an "unqualified failure" (p. 151); the sensitivity of SET numbers to biases like grading leniency and other extraneous factors remains "unacceptably high" (p. 165).

3. Grade Inflation, Broad Perspective

"Grade inflation" is a flawed term. It suggests an analogy to price inflation. If prices remain stable for a time after a period of inflationary increases, the harm comes to an end. By contrast, grades rise toward the ceiling of 4.0. The effect is a permanent degradation of the grading system, a shrinkage of the scale that is in actual use - "grade compression." Thus in the case of grades, reform may require some degree of "deflation." The longer we wait, the more difficult reform will be.

The task of upholding sound standards, and of assigning grades fairly and appropriately, so that the distinction between one grade and the next is valid, is a difficult component of teaching. It can be both difficult and unpleasant in the absence of support from university policies and administrators.

Overall grade distributions have been rising steadily at LSU and at many other universities. The rise in grades has not been uniform, but has run more rapidly in some programs than in others. Within a single program one often sees not a steady rise in all courses, but major sudden upward lurches, first in one course and then in another.

The problems that are present in our grading practices need to be analyzed and addressed locally, within academic units, in a discipline-specific way. Nevertheless, they must also be appreciated globally.

University-wide, we should consider the interaction between our grading practices and those of the high schools. LSU has increasingly used the high school academic GPA (HSAGPA) as a weight-bearing element of freshman-admissions criteria. A study was done in fall 2002 by the Office of Budget and Planning, considering students entering LSU as new freshmen from LSU's top 50 feeder schools. Some of the results were as follows:

In the 3-year period 1992-1994, 1,117 students with ACT Composite scores 20, 21, or 22 entered LSU from those high schools. Their average HSAGPA was 2.72.
In the 3-year period 2000-2002, 2,108 students with ACT Composite scores 20, 21, or 22 entered LSU from those high schools. Their average HSAGPA was 3.03.

There were similar changes in the average HSAGPA of students in each of the ranges of 23-26 and 27-36. It appears that the use of the grading scale has changed significantly in our main feeder high schools, and that the change in the university's grading practices is a closely related phenomenon. We are reminded that in the design of admissions policy at a large state university, we must be alert to the effects it will have on secondary education.

In the perplex of grades and standards, there are curricular issues as well. The academic profile of our entering class is much stronger now than in 1985. Perhaps we owe them an upward adjustment in our expectations, in our standards, and in the level of general graduation requirements.

In sum: Our consideration of "grade inflation" ought to include the full set of related issues. The essential point is that faculty should take hold of that which is clearly within their purview and responsibility - the quality of academic standards and programs.

Valen Johnson's book understandably focuses on the use and abuse of SET numbers, which is surely a central problem. If teachers - particularly the GAs and others with less job security - perceive that we do not care about grading practices, and that their interests lie in raising their average SET number by 0.4 points, then the effect is downward pressure on academic standards. Section 5 of this review discusses how to reform the use of SET numbers. But first, Section 4 will speak of remedies for grade inflation, broadly defined and considered.

4. Remedies Moderate and Local

The author discusses a number of remedies. I quote in part his descriptions of the first two, which are similar to recommendations in the Faculty Senate Resolution:

Remedy 1: Encourage institutional dialogue. ... Organized faculty discussions of grading practices are ... rare. Because discussion of grading practices is a prerequisite to reform, dialogue is critical. ... Provosts and deans can play a pivotal role in establishing this dialogue by communicating their concerns to department chairs and other faculty leaders, who in turn can relay these concerns and initiate further discussion within their academic units. Alternatively, faculty ... bodies can initiate discussion through the organization of forums and study groups. Student input should also be solicited.
Remedy 2: Provide instructors with more information about their university's grading practices. Many professors do not know how their grade distributions compare to their colleagues', or how their departmental standards compare to others'. ... [S]ophisticated approaches ... might involve providing instructors with individualized assessments of their grading practices ... [to] assist instructors in quantitatively assessing their grading practices in relation to institutional norms.

Surely those two proposals are essential, and ought to be undertaken without delay. My opinion is that such measures are also very promising. In sum: Shine a light on the problem, confirm the authority of faculty, and stop the absent-minded pressures to lower standards. In particular, restrain and reform the use of SET numbers.

5. How to Reform the Use of SET Numbers

Understanding that the grading bias exists, and that it is not explainable as benign, we should become wiser about SET numbers. They can still be used without harm if they are used with restraint and good judgment; if other indicators of teaching quality are also used; and if the importance of appropriately high grading standards is steadily affirmed as - indeed - a component of good teaching.

Remember that we are talking about only a measurable statistical tendency for the student's grade to predict how he or she will rate the teacher. The analysis does not say that the SET numbers are totally corrupted by this bias. The student's grade is not the only predictor of the rating. In fact, the best predictor of a student's rating is the consensus rating by other students in the class (p. 95). More precisely put: As a predictor of how a student will rate you as a teacher, the grade received is about one-quarter to one-half as important as the consensus rating by the class (page 115).

Thus the studies themselves give us no reason to believe that the student consensus, if adjusted for this bias, is anything but a conscientious evaluation of the teacher. Depending on the level of the class and other circumstances, one might of course reach various conclusions about the validity and significance of the evaluation. In any event, we have an interest in how well satisfied our students are with our teaching. That has an importance in itself. The businesslike and courteous treatment of students, personal concern, and other teacher behaviors ought to be encouraged, whether or not we can prove their correlation with teaching effectiveness.

There are surely more ways than one to design a reasonable and moderate policy for the evaluation of teaching, eschewing both the prevalent practice of using SET numbers excessively, on the one hand, and the extreme notion of ignoring student opinion altogether, on the other. I offer for discussion the following set of guidelines as one possibility:

SET numbers should never be the only thing in view when teaching is evaluated. Grade distributions for the class and other appropriate information should also be routinely on view.
If a teacher's average SET number is within, say, one standard deviation from the mean SET number for some appropriate large set of teachers, and if his or her grade distributions and the other indicators are also in some suitable middle ground, then let's presume highly competent teaching - unless other facts indicate otherwise.
If the SET numbers or the grades are not in the middle ground, let's presume that there may be a problem (in the one case) or superior teaching (in the other); but let's investigate the matter, and bring other evidence to bear, before we draw a conclusion.
Encourage the use by teachers of SET numbers and written student comments to analyze and evaluate their own teaching.

Let me put it another way. It seems to me that the professional standing of faculty, the principles of academic freedom, and the effective promotion of good teaching require restraint. To wit:

We should not declare a teacher either outstanding or unsatisfactory, or assign a precise rating of a teacher, without a review that uses an appropriate wide range of observations and indicators.
We should not apply invasive, time-consuming, or paperwork-intensive evaluation techniques every semester for every course and every teacher.
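
The middle-ground guideline offered earlier in this section can be made concrete with a small sketch. It is only one possible reading, under assumptions that are not part of the guideline itself: each teacher is summarized by a single average SET number and a single mean class GPA, the "suitable middle ground" for grades is taken to be the same one-standard-deviation band used for SET numbers, and the function names and labels are hypothetical. The output is a prompt for further review, not a rating.

```python
from statistics import mean, stdev
from typing import Dict


def in_middle_ground(value: float, mu: float, sigma: float,
                     band: float = 1.0) -> bool:
    """True if value lies within `band` standard deviations of the mean."""
    return abs(value - mu) <= band * sigma


def screen_teachers(set_averages: Dict[str, float],
                    class_gpas: Dict[str, float],
                    band: float = 1.0) -> Dict[str, str]:
    """Sort teachers into 'presume competent' or 'investigate'.

    Both dictionaries are keyed by teacher and must cover the same
    teachers (at least two, so that a standard deviation exists).
    Using mean class GPA as the grade indicator is an assumption.
    """
    set_values = list(set_averages.values())
    gpa_values = list(class_gpas.values())
    set_mu, set_sigma = mean(set_values), stdev(set_values)
    gpa_mu, gpa_sigma = mean(gpa_values), stdev(gpa_values)

    result = {}
    for teacher, set_avg in set_averages.items():
        middle = (in_middle_ground(set_avg, set_mu, set_sigma, band)
                  and in_middle_ground(class_gpas[teacher],
                                       gpa_mu, gpa_sigma, band))
        # Outside the band means "look further", in either direction.
        result[teacher] = "presume competent" if middle else "investigate"
    return result


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    sets = {"A": 3.9, "B": 4.0, "C": 4.1, "D": 4.0, "E": 4.9}
    gpas = {"A": 3.0, "B": 3.1, "C": 2.9, "D": 3.0, "E": 3.8}
    print(screen_teachers(sets, gpas))
```

Note that "investigate" deliberately does not distinguish a possible problem from possibly superior teaching; in keeping with the guidelines, that judgment is left to a review that brings other evidence to bear.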

6. Remedies Formulaic and Centralized

Surely, wisdom does not lie in complacency about the state of grading practices. Just as surely, moderate remedies should be tried first, with patience and persistence. At the same time, it is an instructive exercise to think through stronger measures which have been proposed and sometimes applied. They tend to consist of the insertion, into the system, of mathematical processing schemes, which claim to repair the results of bad practice and/or to induce better practice. In each case, however, we are left with questions: Is the logic of the scheme complete? What will be the actual effect on practices, once the processing scheme is in place? Will that be a good effect? Will it be the intended effect?

The author's presentations of Remedies 3-7 illuminate the prevalent injustices and dysfunctions. Nevertheless, the ultimate result may be only to send us back to Remedies 1 and 2 with renewed determination.

Johnson calls Remedies 3 and 4 "radical" (pages 240-241). He calls Remedies 5, 6, and 7 "centrist" and "minimally invasive" (pages 244-245).

Remedy 3: Constrain course grade distributions. This practice is fairly common in graduate and professional schools, and both the School of Law and Fuqua School of Business at Duke University use this method to maintain grading standards. The author explains methods and difficulties.
Remedy 4: Include information about course grading practices on student transcripts. Such policies have been adopted by Columbia, Dartmouth, Indiana, and Eastern Kentucky. An example: Record with each course grade the percentile ranking within the class. Thus a B in one class might represent the 60th percentile, while an A in another class might represent only the 40th percentile.
Remedy 5: Since GPAs are unfair on a campus where grading practices vary to the point of incoherence, let students choose whether their transcripts will show their adjusted GPA. Johnson discusses various ways to define an adjusted GPA, all of which have merits, none of which is ideal. The general idea is to assign a numerical value to a grade depending on the rank of that grade among all grades given in a class, after accounting for the quality of other students in the class. For example: If an A- is the lowest grade awarded in the class, then its adjusted value should be lower, maybe 3.10; if there are relatively few higher grades and most grades are lower, then the value of the A- should be higher, maybe 3.80. The details are available to the dedicated reader of pages 209-232.
Remedy 6: Use adjusted grades and GPAs to establish honors distinctions.
Remedy 7: Adjust the SET numbers in each class to remove the grading bias and to eliminate the incentives for the teacher to lower standards. To do so, cast out selected student ratings, depending on the grade distribution in the class. For example, one scheme might work as follows. In a class in which, say, 10% of the assigned grades were Cs or lower, ignore the lowest 10% of the SET numbers. In a class in which the percentage of As exceeded some target threshold by, say, 10, ignore the highest 10% of the SET numbers.
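
To see what the example scheme under Remedy 7 would do to a single class, here is a minimal sketch. It is one reading of the example above, not Johnson's own procedure: the function name is hypothetical, the 30% target for As is an invented default, "Cs or lower" is taken to mean any grade beginning with C, D, or F, and the number of ratings to drop is rounded down.

```python
from fractions import Fraction
from math import floor
from typing import List


def trim_set_ratings(ratings: List[float], grades: List[str],
                     target_a_percent: int = 30) -> List[float]:
    """Return the ratings that survive trimming, in ascending order.

    ratings: individual SET numbers from one class.
    grades:  letter grades assigned in that class, e.g. "A-", "B+", "C".
    target_a_percent: hypothetical threshold for the share of As.
    """
    n_ratings, n_grades = len(ratings), len(grades)
    if n_ratings == 0 or n_grades == 0:
        return sorted(ratings)

    # Exact fractions avoid floating-point surprises such as 0.5 - 0.4.
    low_share = Fraction(sum(g[0] in "CDF" for g in grades), n_grades)
    a_share = Fraction(sum(g[0] == "A" for g in grades), n_grades)
    excess_a = max(Fraction(0), a_share - Fraction(target_a_percent, 100))

    drop_low = floor(low_share * n_ratings)   # ignore the lowest ratings
    drop_high = floor(excess_a * n_ratings)   # ignore the highest ratings

    ordered = sorted(ratings)
    return ordered[drop_low:n_ratings - drop_high]


if __name__ == "__main__":
    # Hypothetical class: 40% As, 50% Bs, 10% Cs.
    ratings = [5.0, 2.0, 4.5, 3.0, 4.0, 5.0, 3.5, 4.0, 4.5, 5.0]
    grades = ["A", "A", "A-", "A", "B+", "B", "B", "B-", "B", "C"]
    kept = trim_set_ratings(ratings, grades)
    print(kept, sum(kept) / len(kept))
```

The sketch drops ratings by rank alone, without regard to which student gave which rating, as the example scheme describes. It also makes the open questions of this section concrete: the choice of the target threshold and the rounding rule would both shape the incentives that the scheme creates.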

7. Reservations and an Antidote

After an immersion in Johnson's relentlessly serious analysis, a reader may want an antidote to restore humor and balance. When I presented the Resolution, I tried to express a temperate, decaffeinated attitude in my remarks. But the worried reaction of a few, not quite hearing me, was understandable. So I'll repeat: The Senators who supported the Resolution, by and large, don't believe that grades are everything. We do not wish to make every increment of student effort into a Skinnerian moment. Our hopes are modest. Many of us made some Bs ourselves, and appreciate the teachers who acquainted us with a meaningful A-standard; we'd like to return to the day when a B was a good grade.

Senator Jim Catallo recommended the book by Alfie Kohn entitled Punished by Rewards (431 pages, Houghton Mifflin, Boston, 1993). When Kohn was a student in Introduction to Psychology, he was required to train caged rats. He turned in a lab report written from the rat's point of view, and the instructor was not amused. His book seriously questions our society's system of grades, prizes, and other incentive systems. Fair enough. I recommend his remarks on college grading as an antidote to behaviorist extremes. But don't miss his explanation of why he accepts fees for talks.

Those who deny or celebrate grade inflation are happy, perhaps because it is essentially destroying the grading system. Kohn is one of those. He is a bit of an educational anarchist. Surely if we have the grace required to make anarchy work - to make a gradeless system work - then we certainly have the grace to make our classical grading system work with integrity, balance, and fairness; and that's what we ought to do.
