TranscribeEstoria and the collaborative edition: harnessing the momentum of the crowd

Last week we launched the first of the five texts for TranscribeEstoria, a collaborative project. This is a pilot for the full project that we hope will eventually follow from the current one. TranscribeEstoria is a crowdsourcing initiative which we hope will eventually see manuscript C (MS 12837 BNE, Madrid) of the Estoria de Espanna transcribed primarily by volunteers. The transcriptions of C will supplement the Estoria Digital, a digital edition of the Estoria de Espanna that we produced, the first version of which officially went live in 2016. The launch of TranscribeEstoria has led to some debate about why we are asking volunteers to transcribe for us, under our supervision – are we trying to get something for nothing? And as was rightly pointed out via social media, there are many experts on the Estoria de Espanna who could relatively quickly produce the transcriptions for us. Indeed, having spent four years preparing the digital edition of the Estoria de Espanna, and a further year preparing the digital Crónica particular de San Fernando, the team behind TranscribeEstoria are no strangers to XML editing software or strings of complicated nesting tags (which we secretly love – the more complex the better). Moreover, as we worked out when analysing our small-scale use of crowdsourcing when preparing the 2016 edition, once you have factored in the significant time investment required to set up a crowdsourcing initiative, particularly one with a brand new transcription platform such as the one we are using with TranscribeEstoria, we could to date have produced the transcriptions far more quickly and cheaply ourselves. So why haven’t we? The question is a valid one.

In fact, we have produced transcriptions of the five passages being crowdsourced ourselves, in full and glorious TEI5-compliant XML. This was one of the very first tasks we had to do in order to plan TranscribeEstoria. But this, for us, is missing the point. The point for us is not how quickly we can have the transcriptions produced, but rather the opportunity to enable other scholars to join in our work. Some of these are Scholars with a capital S: scholars in the traditional sense – academics, doctoral researchers, those who earn their crust, or who hope to, by studying or teaching the history, the texts, or both, of the medieval period. But many of these are scholars in a much wider sense: interested members of the public, amateurs and hobbyists, citizen scholars who are no less fascinated by the medieval period, and who have no less right to access the digital materials we are working on. At the Estoria de Espanna Digital Project, the wider project of which TranscribeEstoria forms a part, we are passionate that access to medieval texts should not be restricted to those with the ‘correct’ academic credentials – this may be true for direct physical access to the extremely fragile documents themselves, but in the digital age, where high-quality images of these texts exist and are available freely for not-for-profit use by everyone, as part of the cultural heritage of a country, there is no reason why anyone who is interested should not be allowed to work on and to learn from these materials.

Where we do see our role, however, is as something similar to gatekeepers. This is not to say that we feel we have the right to dictate who can study or work on what, but rather that we recognise that the many years we have spent working with these documents, and ones related to them, have enabled us to gain a level of expertise that we can use to help others access the materials. Nor is it to say that we believe we are the sole experts in the wider TranscribeEstoria team, which includes our volunteers, or that these volunteers are empty little pitchers awaiting the knowledge we deign to allow them to soak up from us. Far from it. We know, and have already seen, that many of our volunteers are experts in their own right on a wide range of topics. Our role here, we believe, is to offer both a level of quality control of transcriptions and the tools, information and advice required to unlock these manuscripts for a wide range of lifelong learners with vastly differing backgrounds and prior experience, so that as wide an audience as possible can join us in learning together about the Estoria de Espanna. For these purposes we recognise the need for the ship to have a captain, steering everyone in the right direction.

The suggestion that we are trying to get something for nothing, to profit from the unpaid labour of others, is one which cuts us particularly deeply because it is based on a profound misunderstanding of our objective. That said, we can understand why some people may have these superficial views before they take the time to understand what we are really about. Our ethos is at heart deeply democratic, one in which all are enabled to work on, learn from, and most importantly enjoy these wonderful medieval materials that form a part of our shared cultural heritage. At the moment the transcriptions produced for the pilot TranscribeEstoria will not form part of the Estoria Digital, because before this is possible we need to analyse the results generated by this new collaborative tool, and of course, the transcriptions created will need to be revised by the research team. However, in return for their hard work, volunteers can earn certificates, and will receive our gratitude for their help. They will also be explicitly recognised in the eventual publication online of the transcriptions, following revision. The key reward for volunteers at this stage is, of course, the learning that will take place: depending on a volunteer’s background, this may include increased digital literacy, an improved understanding of the specific manuscript being transcribed, of its palaeography, of the Estoria de Espanna as a work, of medieval texts written in Castilian prose in general, of history and historiography, of medieval Spanish and its orthography, and more widely of the society in which this work was first produced. This, we hope, will take place through carrying out the transcriptions themselves, but also, and no less importantly, by reading the many and varied blog posts we have queued up for publication during the pilot project, and by engaging in discussion via social media.

We do hope that in the longer term this crowdsourcing initiative can grow to such a scale that volunteers, once trained using the TranscribeEstoria course as it currently stands (albeit an improved version, taking into account the teething problems we are encountering and working on as we run our pilot project; this is, after all, one of the points of a pilot project), can produce transcriptions more quickly than we could, and that volunteers will be able to produce transcriptions that we have not already produced ourselves, to a standard high enough for them to make their way into the updated Estoria Digital. At that stage, as was the case with the crowdsourcers whose transcriptions were used for the 2016 edition, the names of those who are working alongside us as volunteers will be publicly included in the future editions that contain the transcriptions of C. But we are a long way from this at the moment.

As a project team, we are grateful to those who have already started transcribing, engaging with the material, commenting and asking us questions, and in some cases, spotting teething problems in the system. All of this work is already making a genuine contribution to scholarship in that it is helping us to improve and refine our transcription platform, which we hope will eventually grow into the training for the wider project, where transcriptions really will be crowdsourced, rather than this more MOOC-style pilot project we are running at the moment. We understand that crowdsourcing the transcription of medieval prose in Castilian is still in its infancy (see, for example, some interesting crowdsourcing projects already taking place as part of the community of the National Library of Spain), and where the material for transcription is so old, being early fourteenth-century, and contains so many abbreviations requiring expansion, the task required of volunteers is significantly more complicated. This does, however, certainly add to the interest and the sense of achievement when a transcription is completed to the best of a transcriber’s ability. We are excited to be able to share with others the opportunities we have been given to work on the documents of the Estoria de Espanna, not just to enable them to read about the outcomes of our research, but to play an active role in the research itself. And this is why we are crowdsourcing.
