Musings on yesterday’s Transcription Jamboree

Sometimes the smallest comment someone makes just sticks with you. This is true for me with a comment I remember Aengus making back when I was a second-year undergraduate in his Spanish intermediate linguistics class. He said something about not realising what you don’t know until you try to teach it to someone else. For some unknown reason this comment lodged itself in my brain and just stayed there. A few years later, teaching A Level French, the truth in his words became very clear when one of the students asked me: “Miss, what’s the difference between disjunctive, tonic and stressed pronouns?” Cue the tumbleweed blowing across the classroom as the cogs in my brain attempted to retrieve information I had probably not accessed since the day it went in during a first-year French grammar lecture at uni. I knew how to use these pronouns, but were they different from each other? If not, then why do they have three names for the same concept? It’s true – you really don’t realise what you don’t know until you teach it. As an aside, a friend of mine recently suggested we start to call them ‘(gin and) tonic pronouns’, in that you feel you need a gin before teaching them. This is, of course, neither advisable before teaching any topic to A Level French classes (GCSE, maybe), nor endorsed by the Estoria de Espanna Digital Project. But I think the name could catch on.

Yesterday almost the whole Estoria team met, either in person or via the wonders of the World Wide Web, to run a fine-tooth comb through the tags we use, with the objective of tightening up our transcription guidelines and our transcription practices. We needed to do this to ensure everybody was on the same page before we start checking others’ transcriptions, and before we carve them in stone for all eternity (probably) in the online training course for crowdsourcers. This was a long but essential meeting, and it brought to light some very interesting questions: things on which, as individual transcribers working separately, we had each assumed everyone else thought the same as us. The need to really tighten up before teaching or correcting others’ work suddenly made it strikingly clear that not everyone in the team transcribes in the same way, despite our having an extensive set of Transcription Guidelines and fortnightly team meetings.

For example, in Q ‘de los’ almost always appears as one word – ‘delos’. Our guidelines are very clear on word spacing and division: if you see a space, type one; if you don’t see a space, don’t type one. Simples. But what if ‘de’ comes at the end of a line and ‘los’ at the start of the next line? Do we tag this line break as <lb/>, which assumes they are two separate words, or do we choose <lb break="no"/>, respecting the lack of word division elsewhere in the folio? In this case an executive decision was taken in the name of consistency amongst transcribers, both main team and crowdsourced, that we would assume the word has been divided into two by the scribe. But the case for <lb break="no"/> still has its merits. Hmm.
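
To make the two options concrete, here is a minimal sketch in TEI-style XML. The surrounding words (‘la tierra’, ‘moros’) are invented purely for illustration and are not taken from the manuscript; only the two line-break tags come from the discussion above.

```xml
<!-- Option 1: a plain line break. 'de' and 'los' are read as two
     separate words that happen to fall either side of the line end. -->
<p>la tierra de
<lb/>los moros</p>

<!-- Option 2: break="no". The line break falls inside one word,
     so the text is read as 'delos', as elsewhere in the folio. -->
<p>la tierra de<lb break="no"/>los moros</p>
```

The transcribed letters are identical in both cases; the only difference is what the tag tells a reader (or a piece of software) about word division at the line end.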

Another example which called for much scratching of heads was whether the crossed L’s in abbreviations such as cauałło (which expands to cauallero, i.e. caballero, or as seen here, in the plural) [manuscript image] [manuscript image] are an abbreviation in themselves, or whether the abbreviation is just the stroke/hook across the L’s. ‘Who cares?’ I hear the uninitiated cry (and I would have been shouting with you a year ago, before I had ever really considered such matters in any depth). Well, the answer is linguists care. And so they should. As the wise Dr Jerez reminded Christian and me over a cup of tea and a digestive before home-time, we are dealing with one of the founding texts of historiography in Castilian and one of the first major prose works to be produced in medieval Castilian. These things do matter. Seeing the abbreviation as łł gives the edited version’s expansion as [image of the expanded word] with the ‘ller’ italicised, suggesting that all four of these letters are the expansion. This could give the impression that the double L is not there in the folio, but of course it is. Taking the abbreviation as just the hook or the bar through the L gives the expansion [image of the expanded word], with just the ‘er’ italicised, as these are the letters suppressed by the hook, whilst the L’s are there loud and proud in the folio and are not shown italicised. I was pretty close to being convinced by the argument for the latter, until another wise Research Fellow asked ‘Well then, what about ‘pro’?’ What she meant by this was that the abbreviations for ‘pro’ and its cousins ‘per’, ‘por’ and ‘par’ are all considered abbreviations in themselves, even though in each case the letter ‘P’ is clearly seen in the folio, but with a bar, a hook, or a little swoosh representing the suppressed letters. In the expansion tags we repeat the letter ‘P’, which is then italicised in the edited version of the edition. 
By the same logic as above, italicising the ‘P’ would give the idea that it is not in the folio, when it clearly is. Even King Cappelli of Abbreviation Land himself considers ‘pro’ and the related abbreviations as abbreviations in themselves, and expands to ‘pro’, repeating the ‘P’. Suddenly my head was spinning, with strong arguments for both sides. And all brought about by the need to ensure our guidelines are watertight before we can correct one another or train crowdsourcers.
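
For anyone who finds this easier to see in tags, the two readings of the crossed L’s can be sketched roughly as follows. This is an illustration only: it uses the general-purpose TEI elements choice, abbr, expan and ex, which are an assumption on my part here and not necessarily the exact tags in our guidelines. Whatever sits inside ex is what would come out italicised in the edited version.

```xml
<!-- Reading A: the whole crossed 'łł' is the abbreviation,
     so all of 'ller' is marked as supplied (italic 'ller'). -->
caua<choice><abbr>łł</abbr><expan><ex>ller</ex></expan></choice>o

<!-- Reading B: only the stroke is the abbreviation. The two L's are
     in the folio, so only 'er' is marked as supplied (italic 'er'). -->
caua<choice><abbr>łł</abbr><expan>ll<ex>er</ex></expan></choice>o

<!-- The 'pro' case, treated as in Reading A: the P is repeated inside
     the expansion even though it is visible in the folio.
     (The character ꝓ stands in for the manuscript sign.) -->
<choice><abbr>ꝓ</abbr><expan><ex>pro</ex></expan></choice>
```

Seen side by side, the tension is plain: Reading B is the more faithful to what is on the folio, but applying it consistently would mean rethinking ‘pro’ and its cousins too.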

Aengus was right back in 2005 (not that I ever doubted him, of course): you really don’t realise what you don’t know until you try to teach it.