Data Curation

Nishmeet Singh is a Research Analyst at Outline India. He previously worked as an intern at the Centre for WTO Studies, IIFT, New Delhi. He holds a master's degree in Development Economics from South Asian University and a bachelor's degree in Economics from Delhi University.


Data curation is the active and ongoing management of data throughout its life cycle so that scholars and researchers can reuse it. Broadly, the process involves managing, describing and preserving data. Long before the digital revolution of the 21st century, natural history museums and libraries were centers of scientific data, storing it as physical specimens, archives and printed research materials. The word "curation", meaning "to care", emerged from these museums and libraries, where it emphasized protection, amelioration and contextualization.


The 21st-century digital revolution met a need long felt by scholars, researchers, librarians and museum curators: to preserve data for long-term use and access. It provided a platform for data-intensive research and helped address the problem of the data deluge. This period saw several significant strands of early computational research in the humanities, including the development of indices, annotated linguistic corpora and digitally encoded texts. Put simply, data began to be prepared, collected, organized and maintained in data sets.


In the specific context of social research, data curation involves not only capturing and preserving data but also gathering information about the methods used to produce it; that is, curation gives additional consideration to how the data were collected. It also involves collecting the raw and abstracted material created during the research process. This information and material feed into further research and are essential for later interpretation and reuse.

Data generated in survey research is distinguished by the social practices, layers of interpretation and affective responses that characterize it. These features have important implications for designing curation systems and curation education programs. They signal a shift towards technology-driven platforms and web applications where scholars can easily share research ideas, working papers and publications along with the underlying data and other relevant information, enabling reproduction, verification, extension and public dissemination of research. Curation in survey research also implies that data-driven activities require knowledge of research and data collection methods, which may give rise to new certifications and specialized training programs.

One challenge of data curation is exchanging and reusing richly interpretive data over digital platforms. Scholars are sometimes interested in how the original researcher, or other critics, interpreted a particular work. When a work carries many layers of interpretation, the compiled secondary readings, emendations and commentary can therefore matter more than the final document itself. Other challenges in curating survey data concern skill sets, education, training and institutional support. Curators need knowledge of research methods and designs, policy documents, standards, metadata and project management. The challenge in educating and training such professionals is to strike the right balance between theoretical, practical and technical know-how of data collection.

In the current landscape, the most critical challenge is to bring together organisations, research centers, libraries, IT organisations and institutional repositories to promote the sharing of information and data. A number of such centers already exist; however, owing to data privacy concerns, copyright over research work and related material, and a lack of funding, there has been little pooling of ideas and resources among them.

A platform needs to be set up that connects students, faculty, researchers and government, together with their research materials, to data sets, analysis and presentation tools, and other curation services. The aim should be to converge on technically and financially sustainable requirements for digital scholarship.
