Transcription Guidelines

The Estoria de Espanna Digital project has developed a series of guidelines for transcription. These represent the tags we used for transcribing the Estoria manuscripts.

We also developed a training course in the use of XML tagging specifically for this project, available here.

For a full explanation of the TEI, see here.

Page Layout

  1. Folio numbering

Each folio break is marked with the <pb/> tag. Folio numbering was generated automatically by the Textual Communities system.


  1. Textual divisions

We use the <div></div> tags to mark the beginning (and end) of major divisions in the text.

In the Estoria these almost always correspond to rubrics in the base text and any additions from other codices.

The <ab></ab> tag represents (broadly speaking) the level of the sentence, and frequently mirrors the medieval punctuation. Where additional text is required (i.e. text that was not in the base text), <ab> tags were numbered in multiples of ten. Thus, if there is additional text between <ab n="400"> and <ab n="500">, then ab numbers 410, 420, 430 etc. were employed.

For those rubrics in E1 which correspond to “book”-level divisions, we employ the tag <div n="xxx-1"> to make their relationship to the surrounding divs clear, thus <div n="24-1"> on folio 10v of E1.
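
As a hedged illustration of the numbering convention described above, the fragment below shows a division in which two sentences of additional text are slotted between two base-text sentences using multiples of ten. The division number, the rubric and all sentence content are placeholders and are not taken from any manuscript.

```xml
<!-- Hypothetical division: the n values and all text are placeholders -->
<div n="25">
  <head rend="h1" n="Rubric">rubric text</head>
  <ab n="400">sentence from the base text</ab>
  <!-- additional text not present in the base text, numbered in tens -->
  <ab n="410">first additional sentence</ab>
  <ab n="420">second additional sentence</ab>
  <ab n="500">next sentence from the base text</ab>
</div>
```

Leaving gaps between the base-text numbers means that text from other codices can be inserted without renumbering the existing <ab> elements.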


  1. Columns

We mark the opening of columns with the following tags (“a” for the first column, “b” for the second, etc.):

  • <cb n="a"/>
  • <cb n="b"/>

As column breaks can occur in the middle of words, the word break tag can also be used here.


  1. Line breaks

We encode all line breaks in the manuscripts:


  • <lb/> at the beginning of each line
  • <lb break="no"/> at the beginning of each line if the break occurs in the middle of a word
  • <lb rend="hyphen" break="no"/> at the beginning of each line if the break occurs in the middle of a word AND there is a written hyphen
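
The page-layout tags above might combine as in the following minimal sketch. The words in the first column are borrowed from the E2 15r rubric example elsewhere in these guidelines; the arrangement into two columns and the second-column text are assumptions made purely for illustration.

```xml
<!-- Hypothetical folio opening: the column layout is illustrative only -->
<pb/>
<cb n="a"/>
<lb/>Mauregato fue mu
<lb break="no"/>erto alçaron los al
<lb break="no"/>tos omnes por Rey
<cb n="b"/>
<lb/>text of the second column
```

Each <lb/> sits at the start of the line it opens, and break="no" marks the lines that begin in the middle of a word ("mu|erto", "al|tos").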


  1. Rubrics – see below in Text
  1. Running header

We use the <fw> (‘forme work’) element for these:

<fw type="header" place="tm">Header</fw>: a header, in the top margin, centre

<fw type="pageNum" place="tr">1</fw>: a page number, top right


  1. Footers/Catchwords

We use the <fw> element for these, encoding the place on the page they appear:

  • <fw type="catch" place="br">Catchword</fw>: a catchword, bottom right
  • <fw type="footer" place="bl">Footer</fw>: a footer, bottom left


  1. Signature

<fw type="sig" place="bm">Signature</fw>: a signature, in the bottom margin, centre


  1. Illumination

We tag illuminations using the <figure> tag. Within this tag we provide a description of the illumination (even if this is only to recognise that the illumination was never realised):


  • <figure><figDesc>21-line space left for miniature, not inserted, left blank.</figDesc></figure>


If the illumination has text, then it appears within a <head> tag as follows:


  • <figure><head><hi rend="init2">E</hi>l Rey don Ramiro el primero delos Reys de leon que por este no<am>̄</am><ex>m</ex>bre fuero<am>̄</am><ex>n</ex> llamados</head><figDesc>Ramiro and 4 counsellors</figDesc></figure>



  1. Word Spacing

Medieval word spacing can be inconsistent. Transcribers used their judgment, on the evidence of the manuscript, to decide whether there is truly a word boundary, e.g. between “delos” and “de los”. At the end of lines, if there is no hyphen and it is not clear whether one word is broken across the line or there are two words, we assumed that there were two words.

  1. Alphabet

We did not produce palaeographic transcriptions; in consequence we did not attempt to represent all of the variant forms of individual letters. Thus s and ſ (long s) were both transcribed as ‘s’. For abbreviations, see below.


  1. Glyphs

Initial ff, rr etc. are treated as glyphs and are transcribed using the choice tag – the expanded form is therefore a majuscule:

  • <choice><abbr><am><g>ff</g></am></abbr><expan><ex>F</ex></expan></choice>ernando
  • rromanos — <choice><abbr><am><g>rr</g></am></abbr><expan><ex>R</ex></expan></choice>
  • ssen̄orio — <choice><abbr><am><g>ss</g></am></abbr><expan><ex>S</ex></expan></choice>

  1. Initials

Initials are marked with their height using the following:

  • A ‘P’ 8 lines high — <hi rend="init8">P</hi>
  • An ‘E’ 5 lines high — <hi rend="init5">E</hi>

Note that sometimes a space was left for an initial but the initial was never executed. In this case, we use a similar tag, but without the letter. Thus:


  • Illustrated initial <hi rend="init2">S</hi>
  • Unexecuted initial <hi rend="init2unex"></hi>
  • Unexecuted initial with guide letter <hi rend="init2unex">S</hi>


When counting the height of letters, we only count the height of the text box, and not the additional height of any descenders that the letter might have.


  1. Punctuation:


We attempt to represent medieval punctuation with modern equivalents:

  • calderon ¶
  • punctus .
  • punctus elevatus ;
  • punctus interrogativus ?
  • tripunctus ⸫


  1. Supplied text

We do not supply any text. If manuscript text is unclear or damaged we tag this.


  1. Illegibility, damage and space
  a) Where it is not clear that any text is/was present, we used the <gap> tag, specifying the number of characters or lines that are illegible:
  • <gap quantity="…" unit="chars" reason="illegible"/>
  • <gap quantity="…" unit="lines" reason="illegible"/>

  b) Where the folio itself is damaged we used the <unclear> tag. The reason attribute may be “damage” or “illegible”. If the damaged text could be deciphered we used:
  • <unclear reason="…">damaged text</unclear>

If the damaged text could not be deciphered, we used the gap tag nested within the unclear tag:

  • <unclear reason="…"><gap quantity="4" unit="chars"/></unclear>

  c) To indicate empty space in the source text (for example, left for a word to be filled in later):
  • <space quantity="1" unit="chars"/>
  • <space quantity="30" unit="lines"/>
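
To show how these three tags sit inside running text, here is a minimal sketch with one line per case. All readings and quantities are placeholders chosen for illustration and do not come from a manuscript.

```xml
<!-- Hypothetical lines: all readings and quantities are placeholders -->
<!-- a) illegible text whose extent can be estimated -->
<lb/>legible text <gap quantity="3" unit="chars" reason="illegible"/> more text
<!-- b) damaged but decipherable text -->
<lb/>legible text <unclear reason="damage">damaged text</unclear> more text
<!-- b) damaged text that cannot be deciphered -->
<lb/>legible text <unclear reason="damage"><gap quantity="4" unit="chars"/></unclear> more text
<!-- c) space left empty by the scribe -->
<lb/>legible text <space quantity="5" unit="chars"/> more text
```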


  1. Rubrics

Rubrics are tagged as headings.

  • <head rend="h1" n="Rubric"><hi rend="init1">T</hi>ext of heading</head>

An unexecuted rubric is tagged as follows:

  • <head rend="h1unex" n="Rubric"></head>


Occasionally a rubric is not continuous (a split rubric): it is interrupted by the main text and resumes further on, for example:

Beginning of the rubric. Text text text

text text text text text End of the rubric

text text text text text

text text text text text

text text text text text

text text.

In cases such as this one, we encode:

  • <lb/><head><seg xml:id="rubricStartxxx" next="#rubricEndxxx">start of the rubric</seg></head>

<lb/><ab>some text</ab><seg xml:id="rubricEndxxx" prev="#rubricStartxxx">rest of the rubric</seg>


An example of split rubrics can be seen in E2 15r:



<lb/><div n="618"><head rend="h1" n="Rubric">
<seg xml:id="rubricStartE215r" next="#rubricEndE215r">De como don Vermudo fue alçado Rey. <am>⁊</am><ex>e</ex>
<lb/>de la batalla q<am>̄</am><ex>ue</ex> ouiero<am>̄</am><ex>n</ex> entre Yssem <am>⁊</am><ex>e</ex></seg></head>
<lb/> <ab n="100"><hi rend="init5">L</hi>uego q<am>̄</am><ex>ue</ex>
<seg xml:id="rubricEndE215r" prev="#rubricStartE215r">Çulema</seg>
<lb/>Mauregato fue mu
<lb break="no"/>erto alçaron los al
<lb break="no"/>tos omnes por Rey


A rubric split over three lines can be seen in Q160v:


<div n="397">
<head rend="h1" n="Rubric">
<seg xml:id="rubricStartQ160v" next="#rubricMedQ160v">Del asentamiento de Sçiçia </seg></head>
<ab n="100">Tierra
<seg xml:id="rubricMedQ160v" prev="#rubricStartQ160v"> ⁊ de com̄o se mātouierō milla los</seg>
de Sciçia yaze en frontera
<seg xml:id="rubricEndQ160v" prev="#rubricMedQ160v">Godos.</seg>




  1. Abbreviations


Most abbreviations are resolved by the use of either

  • the <am></am><ex></ex> tag
  • the <choice></choice> tag


(i) <am></am><ex></ex>


We use this tag for abbreviations in which the expanded text clearly follows on from the abbreviation mark, and for those which have a Junicode character.

Some common abbreviations to which this applies are:


  • n-macron — <am>̄</am><ex>n</ex>
  • m-macron — <am>̄</am><ex>m</ex>
  • Superscript e — <am>ᵉ</am><ex>e</ex>
  • Superscript i — <am>ⁱ</am><ex>ri</ex>
  • Superscript a — <am>ª</am><ex>ua</ex>
  • Superscript u — <am>ͧ</am><ex>ur</ex>
  • Superscript o, abbreviation for ‘ro’ — <am>º</am><ex>o</ex> or <am>º</am><ex>ro</ex>
  • Superscript hook, abbreviation for ‘er’ or ‘re’ — <am>̉</am><ex>er</ex> or <am>̉</am><ex>re</ex>


  • per — <am>ꝑ</am><ex>per</ex>
  • Per — <am>Ꝑ</am><ex>Per</ex>
  • par — <am>ꝑ</am><ex>par</ex>
  • Par — <am>Ꝑ</am><ex>Par</ex>
  • pro — <am>ꝓ</am><ex>pro</ex>
  • Pro — <am>Ꝓ</am><ex>Pro</ex>
  • que — q<am>̄</am><ex>ue</ex>
  • ser — <am>ſ̷</am><ex>ser</ex> (a long s with a combining short solidus overlay, unicode 0337)
  • us — <am>ꝰ</am><ex>us</ex>
  • Tironian sign — <am>⁊</am><ex>e</ex>


  • Arçob̶po — Arço<am>b̶</am><ex>bis</ex>po
  • capło– cap<am>ł</am><ex>itul</ex>o
  • cauałłos — caua<am>łł</am><ex>ller</ex>os; caualło — caual<am>ł</am><ex>ler</ex>o
  • cłigo — c<am>ł</am><ex>ler</ex>igo
  • eglesia — eg<am>ł</am><ex>les</ex>ia
  • escriuieron — esc<am>ⁱ</am><ex>ri</ex>uieron
  • graçias — gr<am>̄</am><ex>açi</ex>as
  • gloria — g<am>ł</am><ex>lor</ex>ia
  • grand —   g<am>ª</am><ex>ra</ex>nd
  • hercules– <am>ħ</am><ex>her</ex>cules
  • hermano– <am>ħ</am><ex>her</ex>mano
  • otra — ot<am>ª</am><ex>ra</ex>
  • otro — ot<am>ᵒ</am><ex>ro</ex>
  • papa– <am>p̄p̄</am><ex>papa</ex>
  • para — <am>ꝑ</am><ex>par</ex>a
  • parte –<am>ꝑ</am><ex>par</ex>te
  • qual — q<am>̄</am><ex>ua</ex>l
  • que — q<am>̄</am><ex>ue</ex>
  • qui — q<am>ⁱ</am><ex>ui</ex>



Superscript letters, especially at line end, are not abbreviations and are encoded as follows:

  • <hi rend="sup">s</hi>


(ii) Choice

We use the choice tag for more complex abbreviations and for those which do not follow the rule above. This is especially the case when the abbreviation mark follows the expansion or when the abbreviation mark covers more than one expansion. The entire word is contained within <choice> and </choice> in both its abbreviated and expanded forms.

The structure of the tag is as follows:

  • <choice><abbr>↑<am></am>↑</abbr><expan>↑<ex></ex>↑</expan></choice>


where the arrows represent what has to be filled in. Thus:


  • capitło   <choice><abbr>capit<am>ł</am>o</abbr><expan>capit<ex>ul</ex>o</expan></choice>
  • enł — <choice><abbr>en<am>ł</am></abbr><expan>en<ex>el</ex></expan></choice>
  • iħu — <choice><abbr><am>iħu</am></abbr><expan><ex>ihesu</ex></expan></choice>
  • isrł — <choice><abbr>isr<am>ł</am></abbr><expan>isr<ex>ahel</ex></expan></choice>
  • mr̄a — <choice><abbr>ma<am>r̄</am>a</abbr><expan>ma<ex>ner</ex>a</expan></choice>
  • mr̄s <choice><abbr>mr<am>̄</am>s</abbr><expan>m<ex>a</ex>r<ex>avedis</ex></expan></choice>
  • nr̄a– <choice><abbr>nr<am>̄</am>a</abbr><expan>n<ex>uest</ex>ra</expan></choice>
  • muretos <choice><abbr>mu<am>ʳ</am>etos</abbr><expan>mue<ex>r</ex>tos</expan></choice>
  • nr̄o– <choice><abbr>nr<am>̄</am>o</abbr><expan>n<ex>uest</ex>ro</expan></choice>
  • obp̄o (compare b̶po above)–   <choice><abbr>obp<am>̄</am>o</abbr><expan>ob<ex>is</ex>po</expan></choice>
  • arcobp̄o   <choice><abbr>arcobp<am>̄</am>o</abbr><expan>arcob<ex>is</ex>po</expan></choice>
  • pūs <choice><abbr>pus<am>̄</am></abbr><expan>pu<ex>e</ex>s</expan></choice>
  • sc̄a– <choice><abbr>sc<am>̄</am>a</abbr><expan>s<ex>an</ex>c<ex>t</ex>a</expan></choice>
  • sc̄o– <choice><abbr>sc<am>̄</am>o</abbr><expan>s<ex>an</ex>c<ex>t</ex>o</expan></choice>


  • Sc̄ago — <choice><abbr>sc<am>̄</am>ago</abbr><expan>s<ex>an</ex>c<ex>ti</ex>ago</expan></choice>
  • soƀr– <choice><abbr>so<am>ƀr</am></abbr><expan>so<ex>bre</ex></expan></choice>
  • tiempo — <choice><abbr>tp<am>̄</am>o</abbr><expan>t<ex>iem</ex>po</expan></choice>
  • tierra — <choice><abbr>trr<am>̄</am>a</abbr><expan>t<ex>ie</ex>rra</expan></choice>
  • xp̄o– <choice><abbr><am>xp̄</am>o</abbr><expan><ex>crist</ex>o</expan></choice>
  • xp̄iano — <choice><abbr><am>xp̄</am>iano</abbr><expan><ex>crist</ex>iano</expan></choice>
  • OR
  • xpⁱano– <choice><abbr><am>xpⁱ</am>ano</abbr><expan><ex>cristi</ex>ano</expan></choice>
  • xⁱano– <choice><abbr><am>xⁱ</am>ano</abbr><expan><ex>cristi</ex>ano</expan></choice>
  • xⁱanos– <choice><abbr><am>xⁱ</am>anos</abbr><expan><ex>cristi</ex>anos</expan></choice>
  • <choice><abbr><am>xp̄</am>ianos</abbr><expan><ex>crist</ex>ianos</expan></choice>
  • OR
  • <choice><abbr><am>xp̄</am>anos</abbr><expan><ex>cristi</ex>anos</expan></choice>
  • xpiandad– <choice><abbr><am>xp̄</am>andad</abbr><expan><ex>cristi</ex>andad</expan></choice>
  • OR
  • <choice><abbr><am>xⁱ</am>anos</abbr><expan><ex>cristi</ex>anos</expan></choice>



In those cases in which the word is (all but) written out in full, we do not attempt to tag abbreviations. Thus:

“xpristiandad”, “xpristiano” etc. are left in this form.


The nomina sacra (ihu xpo) may appear in different forms. In all cases, we attempt to replicate scribal practice and expand the text in the way indicated above.


  1. Emendation – scribal/non-scribal

The <app> tag

This is a very complex tagging system designed to overcome the problems presented by the TEI <add> and <del> tags. Bárbara Bordalejo first developed the system when working on the Divine Comedy and has continued to update it for various projects and to make it more readily understandable. Her article on the Divine Comedy and its encoding system can be downloaded here.

The <app> tag is used to mark a place of variation (this is defined as a place in which there is a significant change that affects the meaning of the text). This could happen as a result of the scribe noticing that he has made a mistake and correcting it or of a later corrector making changes to a text written by someone else.

The changes can take many forms: for example, a substitution made by erasing the previous text, or by marking it in such a way that a reader understands it is to be ignored, or a marginal comment or alternative reading. There are many types of modification that can happen to a text, but <app> should be able to handle any of them. Below, we give some examples that can serve as a guide for this type of tag, but these are by no means exhaustive and many other combinations are possible.

The Structure of <app>

Within <app> we have at least three separate elements:

  • <rdg type="lit">: encode how the text of this sequence appears
  • <rdg type="orig">: record how the text read before the change
  • <rdg type="mod">: record how the text read after the change

In <rdg type="lit"> we encode the literal sequence as found in the document, including marks that might be interpreted by the reader as clues indicating how the text should be read.

In <rdg type="orig"> we include what we interpret to have been the original reading intended by the scribe, that is, the initial text. By definition this reading was intended by the scribe who wrote the surrounding text, so there is no need to provide more information.

In <rdg type="mod"> we include the modified text, the one that represents a later stage in the document (later, that is, than <rdg type="orig">). This text might have been written by the same scribe as the surrounding text or it might be a correction by a later hand. If the correction is by the same hand, or a hand that is indistinguishable from it, there is no need for added tagging. However, if the hand is a different one, we add @resp to specify this fact. In such cases we encode <rdg type="mod" resp="1"> for the first corrector, and so on.

<app>
<rdg type="lit"><seg rend="overwritten"><seg type="1">les</seg><seg type="2">los</seg></seg></rdg>
<rdg type="orig">les</rdg>
<rdg type="mod" resp="1">los</rdg>
</app>

Other Tags

Here are some other tags that you can use within <rdg type="lit">:

Underdotted — <seg rend="ud">underdotted text</seg>

Stroke through — <seg rend="strike">stricken text</seg>

Underlined — <seg rend="ul">underlined text</seg>

Erased — <seg rend="er">erased text</seg>

Erased and unreadable — <seg rend="er">gap</seg>

Scraped — <seg rend="scp">original</seg>

Crammed — <seg rend="cr">original</seg>

Text added within available space — <seg rend="pl">original</seg>

Interlinear text — <seg rend="int">interlinear text</seg>

For the physical position of segments use:

left margin — <seg place="lm">

right margin — <seg place="rm">

top margin — <seg place="tm">

bottom margin — <seg place="bm">
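
As a hedged sketch of how the rend values listed above might combine with the three readings, the fragment below encodes a word cancelled by underdotting and replaced by an interlinear correction in a later hand. The readings and the corrector number are placeholders, and the exact arrangement of segments inside <rdg type="lit"> is an assumption rather than a rule stated in these guidelines.

```xml
<!-- Hypothetical correction: readings and corrector number are placeholders -->
<app>
  <!-- literal sequence: the underdotted original followed by the interlinear addition -->
  <rdg type="lit"><seg rend="ud">les</seg><seg rend="int">los</seg></rdg>
  <!-- the text as first written -->
  <rdg type="orig">les</rdg>
  <!-- the text after correction by the first corrector -->
  <rdg type="mod" resp="1">los</rdg>
</app>
```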