James Wilsdon is professor of science and democracy in the Science Policy Research Unit at the University of Sussex, and chair of the Independent Review of the Role of Metrics in Research Assessment. More information on the metrics review can be found here: http://www.hefce.ac.uk/rsrch/metrics/


Image credit: CC-BY antony_mayfield

Citations, journal impact factors, H-indices, even tweets and Facebook likes – there is no end of quantitative measures that can now be used to assess the quality and wider impacts of research. But how robust and reliable are such metrics, and what weight – if any – should we give them in the management of the UK’s research system?

These are some of the questions that are currently being examined by an Independent Review of the Role of Metrics in Research Assessment, which I am chairing, and which includes representatives of the Royal Society, British Academy, Research Councils UK and Wellcome Trust. The review was announced by David Willetts, then Minister for Universities and Science, in April 2014, and is being supported by HEFCE (Higher Education Funding Council for England).

Our work builds on an earlier pilot exercise in 2008-9, which tested the potential for using bibliometric indicators of research quality in the Research Excellence Framework (REF). At that time, it was concluded that citation information was insufficiently robust to be used formulaically or as a primary indicator of quality, but that there might be scope for it to enhance processes of expert review.

The current review is taking a broader look at this terrain, by exploring the use of metrics across different academic disciplines, and assessing their potential contribution to the development of research excellence and impact within higher education, and in processes of research assessment like the REF. It’s also looking at how universities themselves use metrics, at the rise of league tables and rankings, at the relationship between metrics and issues of equality and diversity, and at the potential for ‘gaming’ and other perverse consequences that can arise from the use of particular indicators in the funding system.

Last summer, we issued a call for evidence and received a total of 153 responses from across the HE and research community. 57 per cent of these responses expressed overall scepticism about the further introduction of metrics into research assessment, a fifth supported their increased use and a quarter were ambivalent. We’ve also run a series of workshops, undertaken a detailed literature review, and carried out a quantitative correlation exercise, to see how the results of REF 2014 might have differed had the exercise relied purely on metrics, rather than on expert peer review.
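
To give a flavour of what that last exercise involves, the sketch below is purely illustrative, using invented scores rather than the review’s actual data or methods: the basic question is how closely a ranking of submissions produced by a citation-style metric tracks the ranking produced by peer review, which can be summarised with a rank correlation.

```python
# Purely illustrative sketch: invented scores, not the review's data or method.
# Question: how closely does a metric-based ranking track a peer-review ranking?
from scipy.stats import spearmanr

metric_scores = [4.1, 3.8, 2.9, 3.5, 1.7, 2.2, 3.9, 2.5, 3.1, 1.9]  # e.g. a citation-based indicator
peer_scores   = [3.9, 3.6, 3.2, 3.4, 2.0, 2.1, 3.5, 2.8, 2.9, 2.3]  # e.g. peer-review quality grades

rho, p_value = spearmanr(metric_scores, peer_scores)
print(f"Spearman rank correlation: {rho:.2f} (p = {p_value:.3f})")
```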

Our final report, entitled ‘The Metric Tide’, will be published on 9 July 2015. But ahead of that, we’ve recently announced our emerging findings in respect of the future of the REF. Some see the greater use of metrics as a way of reducing the costs and administrative burden of the REF. Our view is that it is not currently feasible to assess the quality and impact of research outputs using quantitative indicators alone. Around the edges of the exercise, more use of quantitative data should be encouraged as a contribution to the peer review process. But no set of numbers, however broad, is likely to be able to capture the multifaceted and nuanced judgements on the UK’s research base that the REF currently provides.

So if you’ve been primping and priming your H-index in anticipation of a metrics-only REF, I’m afraid our review will be a disappointment. Metrics cannot and should not be used as a substitute for informed judgement. But in our final report, we will say a lot more about how quantitative data can be used intelligently and appropriately to support expert assessment, in the design and operation of our research system.



This post was published in relation to our Future of Scholarly Scientific Communication events (#FSSC), bringing together stakeholders in a series of discussions on evolving and controversial areas in scholarly communication, looking at the impact of technology, the culture of science and how scientists might communicate in the future.

  • Mike Taylor

    It’s great to have such informed people working on these issues, and especially that they are blogging them so that the wider world can understand what’s being discussed and concluded. My thanks to Prof. Wilsdon for writing this.

    Against that backdrop, I fear to comment from my own position of ignorance. But, heck, that’s never stopped me before, so here goes. My thinking is as follows:

    Dorothy Bishop has noted that results of REF assessments correlate strongly with departmental H-indexes[citation needed] (and suggested that we could save on the cost of future REFs by just using that H-index instead of the time-consuming peer-review process).

    But it’s also been shown that H-index is strongly correlated with the simple number of publications (a toy sketch at the end of this comment shows why adding papers can only push the index up). A seductive but naive conclusion would be: “we could just count publications for the next REF!”

    But of course if we simply allocated research funding across universities on the basis of how many papers they produce, they would — they would HAVE TO — simply race to write more and more papers. Our already overloaded ability to assimilate new information would be further flooded. It’s a perverse incentive.

    So this is a classic example of a very important general dictum: measure the thing you’re actually interested in, not a proxy. I don’t know if this dictum has been given a name yet, but it ought to be.

    Measuring the thing we’re interested in is often difficult and time-consuming. But since only the thing we measure will ever be optimised for, we must measure the thing we want optimised — in this case, quality of research rather than quantity. That would still be true EVEN IF the correlation between REF assessments and departmental H-indexes were perfect. Because the correlation is an accident, and changing the circumstances will break it.

    No doubt all this reasoning is very familiar and painfully basic to people who have been working on the problem of metrics in assessment for years; to them, I apologise. For everyone else, I hope this comment provides some grains of insight.
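
    P.S. Here is the toy sketch promised above. A department’s H-index is the largest h such that h of its papers have at least h citations each, so a bigger pile of papers, even modestly cited ones, tends to push the number up. The citation counts below are entirely made up for illustration:

    ```python
    def h_index(citations):
        """Largest h such that h papers have at least h citations each."""
        ranked = sorted(citations, reverse=True)
        h = 0
        for rank, cites in enumerate(ranked, start=1):
            if cites >= rank:
                h = rank
            else:
                break
        return h

    # Invented citation counts for two hypothetical departments
    small_dept = [25, 12, 8, 3]
    large_dept = [25, 12, 8, 3, 6, 6, 5, 5, 4, 4]  # same four papers plus six modestly cited ones

    print(h_index(small_dept))  # 3
    print(h_index(large_dept))  # 5 (the extra papers alone raise the index)
    ```

    Nothing about the quality of the underlying research differs between the two lists; only the volume does.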