Data Quality in Current Research Information Systems


November saw the first Pure User Community event at the University of Birmingham. As the Research Information Manager I am really lucky to have a great community of institutions using Pure through the UK and international user groups. However, I was very aware that we have a growing group of staff supporting Pure internally who we don’t hear from enough. While I was putting together my introductory slides welcoming everyone to the event, it was really striking how many staff across the institution are involved in collating or curating research information in Pure. Aside from the 2000+ academic users maintaining their own profiles, we also have a support cast of administrators, Project Managers, Librarians and other Professional Services staff who offer advice and maintain and reuse the information stored in Pure.

Current Research Information Systems (CRIS) such as Pure are becoming an integral part of research management within Higher Education. By collating and managing information about all aspects of research life, they allow research staff to maintain rich research profiles and reuse the information in CVs, websites and grant applications, as well as feed internal and external processes. In an era where academic staff are being asked to do more and more, it is crucial that we reuse the data held in these systems as much as possible and, where we can, reduce the administrative burden on researchers. Yet as the quantity of data collected in these systems increases, I do fear the quality is decreasing. How do we maintain a high level of data quality with so many people responsible for creating the data?

So, I have decided 2017 will be the year of research information quality at Birmingham! After more than 4 years using Pure we know that there are issues with data quality; what we don’t know is how to solve them. Anyone who has seen Steven Van den Bergh (Research Information Systems Manager, Vrije Universiteit Brussel) speak knows he is passionate about data quality issues and has done some great work at his institution to improve the quality of the data they hold in Pure. My biggest takeaway from talking to Steven is to think about what impact improving the quality of certain data will have: what are the quick wins, and how will it benefit the way we work?

Steven argues that some data quality issues are so complex and so difficult to solve that we shouldn’t bother trying. It’s a liberating idea, and one that helps you to focus on the problems that are fixable. For us that will mean thinking about how we want to use the data held in Pure to help understand the activities taking place across the institution. Take the External Organisation field as an example: any record of a research activity in Pure can have a link to an external organisation, whether it is the place of work of a co-author on a paper, an organisation one of our researchers is working with on an event, a funder or a stakeholder. The problem comes when one record of an external organisation differs slightly from another. One user may record RCUK, while another prefers Research Councils UK.

Although fixing all the data errors and inconsistencies in our 70,000 external organisation records is a daunting and unachievable task, there are some key organisations that we are interested in understanding our interactions with more fully. Pure could be a goldmine of information in this regard if the external organisations were tidied up. We know, for example, that we have approximately 200 different iterations of Cancer Research UK in Pure. If we were to rationalise these, it would be much easier to understand our interactions with the charity: researchers could see how they fit into the network of staff working with it, Business Engagement teams would have the information they need to support and develop new activity, and the University could promote a fuller picture of our impact on research into cancer.
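
As a rough illustration of the kind of tidy-up described above, the sketch below groups spelling variants of an organisation name under one comparison key. It is a minimal Python example under stated assumptions: the `ALIASES` table, the `normalise` and `group_variants` helpers and the sample records are all hypothetical, and it does not use any Pure API or export format.

```python
import re
from collections import defaultdict

# Hypothetical alias table mapping known abbreviations to a preferred name.
# These entries are illustrative assumptions, not data taken from Pure.
ALIASES = {
    "rcuk": "research councils uk",
    "cruk": "cancer research uk",
}


def normalise(name: str) -> str:
    """Reduce an organisation name to a key used only for comparison."""
    key = name.lower().replace(".", "")      # "U.K." becomes "uk"
    key = re.sub(r"[^\w\s]", " ", key)       # other punctuation becomes a space
    key = re.sub(r"\s+", " ", key).strip()   # collapse runs of whitespace
    return ALIASES.get(key, key)


def group_variants(names):
    """Group raw names that share a key; keep only groups with 2+ spellings."""
    groups = defaultdict(list)
    for name in names:
        groups[normalise(name)].append(name)
    return {key: found for key, found in groups.items() if len(found) > 1}


# Made-up sample records standing in for external organisation names.
records = [
    "RCUK",
    "Research Councils UK",
    "Cancer Research UK",
    "Cancer Research U.K.",
    "cancer research uk ",
    "University of Birmingham",
]

candidates = group_variants(records)
for key, found in candidates.items():
    print(key, "<-", found)
```

A script like this would only flag candidate duplicates. Someone would still need to confirm each group before any records were merged, and variants that differ by more than case, punctuation or a known abbreviation would need either a longer alias table or manual review.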

Improving data quality can be time consuming and resource intensive, so it is crucial the time is spent well. Some issues can be solved with scripts that automate the work, but others will have to be dealt with manually. As we embark on a year of data improvement I want us to keep 3 key aims in mind:

  1. Articulate what we want to achieve by improving the quality of specific data types/fields
  2. Be clear about the time and effort we want to put into these tasks
  3. Know how we are going to maintain the changes long term

Getting back to our new User Community, I think all users of Pure at Birmingham will have a part to play in this. We will all need to care about the quality of data we are recording and recognise the impact of poor data quality on the system. Future user community sessions will definitely focus on the data quality work we are doing, hopefully gaining support from staff locally and getting them to buy into the process. The work we do now needs to feed into sustainable data quality processes in future.

I am hopeful that the work will have a positive impact on everyone’s experience of the system and continue to promote the importance of CRIS such as Pure. With luck, this time next year we will be reflecting on a job well done!

Karen leads the management of activities relating to research information across the institution and is responsible for Pure, Researchfish and SciVal. As well as managing these systems, her remit extends to the support and coordination of activities around research information, including open access and open data programmes and the research metrics working group.

Author: Karen Clews

