Why do we live in an age of league tables?

The original version of this blog was intended to be quite academic and “dry”. This meant that I did not convey any of the emotional stress such rankings and league tables cause. For an example of what that can mean see the article by a “secret head teacher” in The Guardian of 23 August 2016. The disparity in power and the arbitrary nature of the judgements (in the name of being “user friendly”) come across very clearly.

Also, when I argued that my critiques of ranking were valid even if the exercise was carried out effectively and efficiently, I added that these were big assumptions. So far as Ofsted and UK school tests are concerned this is very clear. Absurd questions about arcane grammatical knowledge of 7-year-olds; contradictions between the detailed feedback to schools of an Ofsted inspection and the summary verdict; lack of qualifications of inspectors: these and much else suggest incompetence and corruption. Again, the workload on the assessment panels for the UK universities REF exercise, and the restricted range of expertise of the members of those panels in relation to the range of research they are to assess: such things raise questions about the reliability of rankings or ratings in their own terms, quite apart from the inherent flaws of ranking and rating.

Finally, when the principal “reward” is to be less punished than others, it means such exercises are conducted in a poisonous atmosphere. Given that one could easily devise better ways of monitoring levels of performance and of working out ways of enabling “consumers” to interpret such monitoring, it becomes increasingly clear to me that what sustains such league tables is the drive to exercise power over those being ranked. That drive is then cloaked in a rhetoric of accountability and incentives. Simply challenging the particular way rankings are produced fails to expose this power drive which is at the heart of the whole approach.

 

I recently downloaded an article entitled ‘Top universities in the world featured in the Shanghai Jiao Tong University ARWU league table’. [1] Lists like this proliferate today, ranking societies (economic growth, human rights, corruption), organisations (schools, universities), products (washing machines, holiday resorts) and individuals (beauty contests, talent shows). They provide entertainment, information for funders and consumers and have huge effects on those ranked. We take them for granted, even if questioning their findings or deploring their impact. What we rarely do is trace their history, yet that tells us much about why rankings have become so important, so recently.[2]

Comparison takes different forms and these have a history. Classical political philosophy used analogy. Analogy selects a well-understood case to help understand another case which is the principal concern of the writer and seen as more difficult to understand. Plato classified human beings by analogy with bronze, silver and gold to explain his ideal polity consisting of guardians, soldiers and the people.[3] He used the image of the shadow in the cave to express the idea that human knowledge is an imperfect version of ideal knowledge. The frontispiece of Hobbes’s Leviathan is a visual analogy with the body politic, depicting a creature made up of many individual human beings. Mandeville used the beehive to explain the division of labour in human society. In War and Peace Tolstoy imagined a beehive abandoned by its queen to illuminate his description of Moscow after the Tsar had abandoned it to Napoleon.

Comparison can be used as critique. Tacitus in Germania praised the marital fidelity and simple life of Germans, contrasted with the promiscuity and corruption of Romans. Montesquieu invented visitors from Persia and Voltaire made observations about China to criticise contemporary France. Swift invented a traveller – Gulliver – to imaginary worlds in order to satirise his society.

Analogy has dropped out of favour in the social sciences and political philosophy, seeming too literary and “unscientific”. By the late 18th century one finds another kind of comparison based on the idea of progress. Scottish Enlightenment thinkers outlined a stage theory of history which culminated in “commercial society”. In the debate between “ancient” and “modern” this favoured the modern but moved beyond juxtaposing two static types to enabling comparisons across time and space.

This period also saw the construction of disciplines designed to make comparisons, such as linguistics, political economy, anthropology and art history. Such comparison was systematised by liberal and socialist writers, most notably Karl Marx, who in the preface to volume one of Das Kapital could declare to his German readers that his analysis of capitalism, based on the British case, told the story of their future.

Such a view became commonplace by the late 19th century. It was used to explain and justify conflict between the major powers and provided the prospect of emancipation for the colonised. British writers worried about falling behind the national “efficiency” of Germany. A Japanese deputation studied the major powers in order to adopt best practice back home.

After the first world war comparisons between “nation states” and “national economies” became commonplace, linked to new ways of assembling statistical data. By mid-20th century such comparison took two powerful and competing forms centred on the Soviet Union and the USA. This gave rise to the specialised ranking which predominates today.

Two comparisons dominate: ranking and rating.[4] A ranking list is relational and a zero-sum game: it does not compare over time and any case that goes up must do so at the expense of another. The university ranking cited earlier is such a list. Ratings claim to be absolute; one case going up a grade from a previous rating does not entail another dropping but only that the first case has improved. Examples are Michelin restaurant guides, Ofsted school grades and Moody’s economic ratings.

Widespread and systemic ranking based on evaluation is no older than the 1970s although there are a few precursors.[5] In principle such ranking became possible with the emergence of comparative disciplines and the capacity to produce and appropriately analyse the required data. As these were in place by 1900 that alone does not explain the recent explosion in ranking. However, it has become much easier with advances in computer technology.

 

The sports league table had become widespread by 1900 and can be linked to changes which enabled regular competition beyond restricted localities.[6] However, I am concerned with ranking where marks must be indirectly derived. There are different kinds of measures: “objective” and “subjective”, input and output. Take universities. Degree classifications or post-graduate employment rates are objective output measures; student surveys subjective ones. Staff-student ratios and library and laboratory resources provide objective input measures; observations of teaching subjective ones. Complex evaluations involve a mix of such measures so there must also be combinatorial rules for producing a final mark.

What accounts for the recent emergence and proliferation? What are their effects? What does this tell us about their functions?

To answer the first question we must move back from the neo-liberal era which started in the 1970s and became globally dominant following the collapse of communism. The first elaborate production of targets (closely related to ranking) took place in Stalin’s Soviet Union. In free market economies prices based on interactions between producers and consumers supposedly provide all the information needed and punish or reward according to performance. Without market prices how can one make decisions about efficient and effective performance? The crude answer, worked out for agricultural collectivisation and the first five-year plans, was targets and models.[7]

However, this had drawbacks even for the simple goal of crash developing heavy industry with a state able and willing to impose draconian sanctions. With advances in computer technology the hope was raised that one could find adequate informational substitutes for the market.[8] That failed because the technology was inadequate to the amount of information that had to be quickly processed, the “price” itself was an artificial construct and the regime was not prepared to relinquish the power of economic decision making if the computer programme contradicted its preferences. Reform through the partial introduction of markets also failed in the face of overwhelming resistance from interests formed in a command economy. The end result was gradual, then accelerating stagnation. When Gorbachev tried radical measures to undercut the top-down single party and the command economy the result generated a crisis which led to collapse. The post-communist societies became the first guinea pigs for neo-liberal experiments in rapidly transforming command economies into free market ones.

What light does this communist experiment cast upon later neo-liberal rankings? There are important differences. Modern rankings are more sophisticated than Stalinist targets, cover a greater range of activities, and are operated by non-state as well as state organisations. Nevertheless, they share one essential feature with the Soviet Union: the impetus to inject competition into what are not intrinsically competitive activities.

Even in neo-liberal capitalist economies there is a good deal of non-market activity in such fields as education, health, defence, transport and the criminal and penal system. Even after enthusiastic bouts of privatisation this remains the case, whether because of failure, a refusal to privatise or the problem of substituting a private for a public monopoly. Fall-back positions include regulators, “internal markets” and ranking. This in turn generates a “taste” for ranking as we see in TV shows which pit amateur cooks or dancers against each other.[9]

We can better understand rankings by looking at when and why they begin. UK university ranking can be traced back to the first cuts imposed in 1981. With the challenge now of allocating shrinking rather than expanding resources, robust measures were needed. Coinciding with the heyday of Thatcherism but recoiling from the radical step of privatising universities, this quickly led to the research assessment exercise.

Ranking linked to significant changes in funding changes the behaviour of those ranked. For the evaluators a ranking measure is a proxy for some desired performance. However, for those evaluated the concern is to maximise the measure not the performance. Under Stalin the target set for window factories was weight of glass. This produced thick, opaque windows. The target was changed from weight to area. Now the factories produced thin, fragile windows. Such measures, intended as substitutes for prices, are not produced like prices. Prices are the direct result of competitive interaction;[10] the target or measure is an indirect way of providing a “price” which is supposed to stimulate competitive activity. What it actually stimulates is the effort to raise that “price” as high as possible. That may, but does not necessarily, include improvements in performance.

This point underlies one major critique of ranking and rating. For example, in the UK Ofsted uses as one measure of school performance tests of pupils. Naturally pupils do not experience such tests as a proxy for rating the school but as an individual test. Schools “teach to the target”. Some critics argue persuasively that this does not merely fail to measure how the school is “really” performing but for activities like reading and writing actually subverts good teaching.

A related criticism concerns the measures. All measures are selections from an infinity of possibilities and contestable in how they are constructed and used. For example, a key “input” not measured initially for schools was the pupils themselves. Pupils from different backgrounds cannot be expected to perform identically under the same conditions. Yet “correcting” for this omission is incredibly difficult. Assessment of class teaching changes teacher performance and varies from one assessor to another. Some measures are “circular”. An influential ranking of US law faculties gives the biggest weighting to “reputation”. This was provided by a number of “reputable” law professors, who presumably mainly come from the institutions regarded as the most reputable! Also, a measure which might make sense for one activity is inappropriately imposed on another. An example is “impact” in academic research which is fine for cancer research but not for an outstanding book on medieval bridge-building.

Then, once the measures – many contestable as accurate or functional – produce a set of marks, these must be combined to produce an overall ranking or rating. Any combination will be highly contentious. For example, one student might rate degree results highest, another employment prospects, yet another student satisfaction. There is a case for not providing an overall mark but instead presenting a disaggregated set of the different measures along with an account of the problems associated with each of them. This is often rejected on the grounds that the consumer or public must not be confused by complexity.
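To make the point about combination concrete, here is a small sketch in Python. Everything in it is invented for illustration: the three universities "A", "B" and "C", their marks out of 100, and the three weighting schemes are assumptions of mine, not figures from any real league table. It simply shows that the same marks can put a different institution top depending on which measure is weighted most heavily.

```python
# Hypothetical marks (0-100) for three invented universities; not real data.
scores = {
    "A": {"degree_results": 90, "employment": 60, "satisfaction": 70},
    "B": {"degree_results": 70, "employment": 90, "satisfaction": 60},
    "C": {"degree_results": 60, "employment": 70, "satisfaction": 95},
}


def rank(weights):
    """Return the university names ordered by weighted total, best first."""
    totals = {
        name: sum(weights[measure] * mark for measure, mark in marks.items())
        for name, marks in scores.items()
    }
    return sorted(totals, key=totals.get, reverse=True)


# Three assumed weighting schemes, one per imagined student priority.
results_first = {"degree_results": 0.6, "employment": 0.2, "satisfaction": 0.2}
jobs_first = {"degree_results": 0.2, "employment": 0.6, "satisfaction": 0.2}
satisfaction_first = {"degree_results": 0.2, "employment": 0.2, "satisfaction": 0.6}

for label, weights in [
    ("degree results first", results_first),
    ("employment first", jobs_first),
    ("satisfaction first", satisfaction_first),
]:
    print(label, "->", rank(weights))
```

With these made-up numbers each weighting scheme puts a different university at the top, which is one reason a single overall mark conceals as much as it reveals.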

The final critique concerns the impact of such measures on those being measured. Teaching or researching to a target produces high levels of anxiety, inappropriate shifts from cooperation to competition, and a widespread sense that one is misusing talent. Much time and energy is spent “gaming” the system. For example, UK universities run dummy research assessments in order to raise performance in the actual assessment. They import research “stars” instead of appointing promising younger academics who have not yet produced the desired “research outputs”.

Finally, all these critiques are valid even assuming that the assessments are appropriate, properly resourced and not subject to corruption. In many cases that is not so.

The neo-liberal riposte is: how else can we produce “accountability”? Ranking and rating might not be perfect but in the absence of markets and prices they are better than nothing. They stimulate and they help the consumer. There is weight to such justifications. The ideal alternative is trust which, if it works, does dispense with the defects of ranking. However, a trust culture can be and often is corrupted to produce quasi-monopolies in which producer interests trump those of the consumer. (This is not necessarily cynical. People, if not constantly challenged, can quite naturally come to believe that what suits them suits everybody.)

However, there are two problems with this neo-liberal answer. The first is that it is not applied to the free market. Indeed the neo-liberal case for ranking is precisely to substitute for the presumed optimal performance of free markets. However, as consumer surveys have shown, free markets often produce non-optimal performance. They work best when entry to and exit from the market are easy, and when the products are easily assessed by consumers, are purchased on a routine and repetitive basis and are neither so trivial that evaluation does not matter nor so important that one is overwhelmed by the responsibility of making a decision. Even in the limited range of cases remaining, the elimination of trust from market transactions renders those transactions inefficient and produces non-optimal outcomes.[11] Once such trust is lost, it is as necessary to monitor for accountability in market as in non-market transactions (e.g. on food adulteration). However, neo-liberal governments tend not to police markets effectively in such ways. Meanwhile, the very penetration of rankings into non-market activities destroys what had existed of a culture of trust.

Just as market transactions do not police themselves and need trust or other forms of accountability, so ranking non-market activities proves one-dimensional and, taken alone, destructive of efficiency and effectiveness in achieving their stated objectives. Many non-market goods are complex and rankings pulverise such complexity. They are often opaque and beyond the understanding of the consumer. Using a ranking to assess which heart surgeon to use or which degree at what university to study can lead to bad decisions.

Perhaps there is no ideal answer. However, in areas I know fairly well (schools and universities) it is quite easy to think of better ways of evaluating than those presently used. A much boosted and independent external examining system could have been used for university teaching; instead it has been reduced in significance. In-depth student questionnaires and follow-ups, customised to different disciplines and courses, would work much better than a mass of standard forms without feedback which simply produces indifference and scepticism. It would cost more than current external examining and student surveys but that would be more than covered by dispensing with the effort that currently goes into meaningless rankings.

This failure to explore more differentiated and collegial ways of assessing performance suggests the “real” reasons for ranking and grading. First, there is an assumption that the consumer needs an easy-to-understand summary verdict. I do not find that persuasive. Choosing a degree subject and university or deciding as a research council or a private firm where to place a research project should not be made into as simple a task as buying goods in the supermarket. Given the investment involved one would expect such users to ensure they understood more meaningful but complex information.

Second, and in my view most important, is that ranking and rating is about power. School ranking in the UK was about undercutting the power of teachers and local authorities. Research and teaching assessments were about undermining university autonomy and, even within universities, shifting it from the academics in their different disciplines to the administrators at the centre. The power to rank is vested in neo-liberal regimes which distrust non-market autonomy and seek to erode it if they cannot replace it by markets. It is not as terrible as Stalinism but it is about power. It ends up spreading the false idea that one can rank and rate all activities and one must do so because no one can be trusted to do anything well otherwise.

[1] https://www.timeshighereducation.com/student/news/shanghai-ranking-academic-ranking-world-universities-2016-results-announced

[2] This blog is inspired by and deeply indebted to a research project of the Centre for Interdisciplinary Research at Bielefeld University. My thanks to Willibald Steinmetz for bringing this to my attention. Here I focus on just one of many aspects the project considers.

 

[3] Only in modern times did the metals come to be used to mark a ranking. The ancient Olympics only recognised winners. It was the third modern Games (St. Louis, 1904) which first awarded the three medals for coming first, second and third. This contradicted Plato’s analogy which insisted each of the three orders was of equal and eternal value.

[4] I draw on Bettina Heintz who, in a paper at a workshop in Oxford on ‘The force of comparison’, added two others: superlatives (the 100 best popular songs, etc.) and prizes.

[5] I exclude league tables of “competitive” sports as such ranking is not based on evaluation. I do include “display” sports where no direct competition – simultaneous or sequential – takes place but instead judges mark performances.

[6] The drawing up of rules for English football, rugby and cricket coincides with the advent of cheap rail travel from the 1860s which meant games could no longer be based on tacit local conventions.

[7] The liberal capitalist alternative first emerged with “modernisation theory” after 1945 with the USA setting targets for “undeveloped” societies in order to ward off the appeal of communist development. Walt Rostow’s “stages of economic development” was such a model. It is no coincidence that this economic historian became National Security Adviser to Lyndon Johnson.

[8] For an illuminating account of this story see Francis Spufford’s “docu-novel” Red Plenty. Stakhanov, the heroic miner, was the first individual to function as a model.

[9] However, these do not produce rankings because the losers are eliminated. In principle one could eliminate poorly performing schools or universities but not in the radical fashion of a TV talent show.

[10] This is why a school “league table” is different from a football league table. No-one would take seriously the claim that the best team could be any other than the one at the top of the league at the end of the season.

[11] Shaking hands on a deal is a lot cheaper than drawing up elaborate contracts.
