15th June 2022 by Rosalind White

CLiC Quick-Start Guide

Dr Rosalind White takes you through a quick-start guide exploring some of CLiC’s features. If you would prefer video instructions, these are available in a Twitter thread. You can also find further guidance on the help tab of the CLiC Web App.

Timeless books by Lin Kristensen, licensed under Creative Commons.

The CLiC Web App (Mahlberg et al. 2020) was designed specifically for the analysis of literary texts and developed as part of the CLiC Dickens project. You can view a full list of the texts available here. For more detailed instructions please refer to the CLiC user guide.

Basic Concordance

A concordance is a list of examples of a word as it occurs in a given corpus or corpora. Concordances are presented in lines so that you can view them in the context in which they occur in the text.

  1. Select a corpus or multiple corpora (in the digital humanities a corpus or corpora refers to a text or a group of texts).
  2. Select your subset (whether you want to search through ‘all text’ – the whole book(s) – or just one of the subsets: ‘short suspensions’, ‘long suspensions’, ‘quotes’ and ‘non-quotes’).
  3. Type in your search term and hit enter. An asterisk (*) can be used as a wildcard, so candle* would also find candles or candlestick. It can also be used as a placeholder between words, so with * hands would find with her hands, with his hands, with clean hands, etc.
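
For readers who like to see the idea in code, here is a minimal Python sketch of a concordance search with the * wildcard. It is not CLiC's own implementation: the function names, the simple tokenizer and the sample sentence are all invented for illustration.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (a simplification)."""
    return re.findall(r"[a-z']+", text.lower())

def concordance(text, query, width=4):
    """Return (left, node, right) tuples for every match of the query.

    Assumption: * stands for any run of letters, so it works both as a
    word ending (candle*) and as a whole-word placeholder (with * hands).
    """
    tokens = tokenize(text)
    patterns = [
        re.compile(re.escape(word).replace(r"\*", "[a-z']*"))
        for word in query.lower().split()
    ]
    n = len(patterns)
    lines = []
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        if all(p.fullmatch(t) for p, t in zip(patterns, window)):
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + n:i + n + width])
            lines.append((left, " ".join(window), right))
    return lines

# Invented sample text, not a quotation from any corpus.
SAMPLE = ("She lit the candle and set the candlestick down with her hands. "
          "He came with clean hands and two candles.")

for left, node, right in concordance(SAMPLE, "with * hands"):
    print(f"{left:>30} | {node} | {right}")
```

Each result keeps the search term (the node) in the middle with a few words of context on either side, which is the layout a concordance display uses.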

 

Sorting

Lines can be sorted alphabetically by any of the columns in the concordance by clicking on the header, which will then be marked with dark arrows. For example, clicking on ‘Left’ sorts the lines by the first word to the left of the node, and clicking on ‘Right’ sorts them by the first word to the right.
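
As a rough illustration of what this sorting does, the Python sketch below orders concordance lines by the word immediately to the left or right of the node. The function name and the three sample lines are invented for the example; CLiC's actual sorting code may differ.

```python
def sort_lines(lines, column="left"):
    """Sort (left, node, right) tuples alphabetically by one column.

    'left' uses the word immediately before the node, 'right' the word
    immediately after it; any other value sorts by the node itself.
    """
    def key(line):
        left, node, right = line
        if column == "left":
            words = left.lower().split()
            return words[-1] if words else ""
        if column == "right":
            words = right.lower().split()
            return words[0] if words else ""
        return node.lower()
    return sorted(lines, key=key)

# Invented concordance lines for illustration.
lines = [
    ("he put his", "hands", "in his pockets"),
    ("she wrung her", "hands", "with grief"),
    ("with clean", "hands", "and face"),
]

for left, node, right in sort_lines(lines, "left"):
    print(f"{left:>15} | {node} | {right}")
```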

Filtering

The filter option lets you filter the concordance output further. For example, searching for hands in Oliver Twist yields 124 results; but when we use the option ‘filter rows’ and search for pockets this is filtered down to 8 results.
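
The same idea can be sketched in a few lines of Python. This assumes a case-insensitive match anywhere in the row, which may not be exactly how CLiC matches; the function name and sample lines are invented.

```python
def filter_rows(lines, term):
    """Keep only the (left, node, right) lines whose text contains the term.

    Assumption: a case-insensitive substring match over the whole row.
    """
    term = term.lower()
    return [line for line in lines if term in " ".join(line).lower()]

# Invented concordance lines for illustration.
lines = [
    ("he put his", "hands", "in his pockets"),
    ("she wrung her", "hands", "with grief"),
    ("with clean", "hands", "and face"),
]

print(len(lines), "lines before filtering")
print(len(filter_rows(lines, "pockets")), "line(s) after filtering for 'pockets'")
```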

Distribution Plot

After you’ve run a basic concordance, you can also view your results as a distribution plot by selecting ‘view as distribution plot’. See, for example, the distribution of “marriage” across three of Jane Austen’s works.

‘Marriage’ viewed as a distribution plot across three works by Jane Austen.
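
A distribution plot marks where in each text the hits fall. The Python sketch below shows one simple way to compute this: each hit is converted to a relative position between 0 and 1 and drawn on a text bar. The function names and the placeholder text are assumptions made for the example, not CLiC's code.

```python
import re

def hit_positions(text, word):
    """Return the relative position (0 to 1) of each occurrence of word."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [i / len(tokens) for i, t in enumerate(tokens) if t == word.lower()]

def text_plot(positions, width=40):
    """Draw a one-line plot: '|' marks a hit, '-' marks no hit."""
    bar = ["-"] * width
    for p in positions:
        bar[min(int(p * width), width - 1)] = "|"
    return "".join(bar)

# Invented placeholder text, not a quotation from any novel.
book = "marriage was discussed early and then again marriage came up near the close"
print(text_plot(hit_positions(book, "marriage")))
```

Running the same two functions over several books and printing one bar per book gives a side-by-side view like the plot described above.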

KWICGrouper and Tagging

The KWICGrouper is a tool that allows you to quickly group the concordance lines according to patterns that you find as you go through the concordance. Any matching lines will be highlighted and moved to the top of the screen. By dragging the slider, you can adjust the number of words that will be searched to the left and right of the search term (R1 refers to a word to the immediate right, whereas L1 refers to a word to the immediate left).
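
To make the grouping logic concrete, here is a small Python sketch. It checks whether any of the chosen words appears within a set number of positions to the left or right of the node and moves matching lines to the top. The function name, parameters and sample lines are invented; this is an approximation of the behaviour described, not the KWICGrouper's own code.

```python
def kwic_group(lines, search_words, left_span=3, right_span=3):
    """Move lines with a search word near the node to the top.

    left_span and right_span play the role of the slider: they set how
    many positions (L1..Ln, R1..Rn) around the node are inspected.
    """
    wanted = {w.lower() for w in search_words}

    def matches(line):
        left, _node, right = line
        window = left.lower().split()[-left_span:] if left_span else []
        window += right.lower().split()[:right_span]
        return bool(wanted & set(window))

    # sorted() is stable, so lines keep their original order within each group.
    return sorted(lines, key=lambda line: not matches(line))

# Invented concordance lines for illustration.
lines = [
    ("she wrung her", "hands", "with grief"),
    ("he put his", "hands", "in his pockets"),
    ("with clean", "hands", "and face"),
]

for left, node, right in kwic_group(lines, ["pockets"], right_span=3):
    print(f"{left:>15} | {node} | {right}")
```

Narrowing the span to R1 only (right_span=1) would no longer match the pockets line, because pockets sits at R3.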

Once you have identified lines with patterns of interest, you might want to place these into one or more categories. CLiC provides a flexible tagging system for this. The tags are user-defined so you can create tags that are relevant to your project.

In this case, occurrences of dream in Oliver Twist have been tagged according to who is dreaming.

Keywords

The Keywords tool finds words (and phrases) that are used significantly more often in one corpus compared to another. Apart from comparing single words, CLiC also allows you to compare clusters (multiple words).

  1. ‘Target corpora’: Choose the corpus/corpora that you are interested in.
  2. ‘Within subset’: Specify which subset of the target corpus you want to compare (or simply choose ‘all text’).
  3. ‘Reference corpora’: Choose the reference corpus to compare your target corpus to (we have built a nineteenth-century reference corpus specifically to compare against Dickens’ works).
  4. ‘Within subset’: Specify the subset for the reference corpus.
  5. ‘n-gram’: Choose whether you want to compare single words (1-grams) or phrases (2-grams up to 7-grams).
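
The comparison behind a keyword list can be sketched in Python. This guide does not say which statistic CLiC uses, so the example below assumes the log-likelihood (G2) measure that is common in corpus linguistics; the function names and the tiny word lists are invented for illustration.

```python
import math
from collections import Counter

def log_likelihood(a, b, total_a, total_b):
    """G2 score for a word seen a times in the target and b times in the reference."""
    expected_a = total_a * (a + b) / (total_a + total_b)
    expected_b = total_b * (a + b) / (total_a + total_b)
    score = 0.0
    if a:
        score += a * math.log(a / expected_a)
    if b:
        score += b * math.log(b / expected_b)
    return 2 * score

def keywords(target_tokens, reference_tokens, top=5):
    """Rank words that are relatively more frequent in the target corpus.

    Assumes both token lists are non-empty.
    """
    target_freq = Counter(target_tokens)
    reference_freq = Counter(reference_tokens)
    n_target, n_reference = len(target_tokens), len(reference_tokens)
    scored = []
    for word, a in target_freq.items():
        b = reference_freq[word]
        # Keep only words that are overused in the target, not underused.
        if a / n_target > b / n_reference:
            scored.append((word, log_likelihood(a, b, n_target, n_reference)))
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top]

# Invented toy corpora for illustration.
target = "fog fog fog the the".split()
reference = "the the the the sun".split()
print(keywords(target, reference))
```

A higher score means the difference in frequency between the two corpora is less likely to be down to chance. To compare clusters instead of single words, the token lists would simply hold n-grams rather than individual words.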

Counts

The Counts tab lists information about individual books and all books in a corpus.

Clusters

Clusters are also called ‘n-grams’, where ‘n’ stands for the length of the phrase. If we choose a ‘1-gram’ (single word), we retrieve a simple word list. (In Oliver Twist, for example, the top words retrieved from this tool include the, and, to, of, a, he, in, his, that – all function words, as we would generally expect.) From version 2.0 onwards, CLiC supports clusters of length 1 (single words) up to 7 (i am very much obliged to you).
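
Counting clusters is straightforward to sketch in Python. The example below slides a window of n words across the text and counts each phrase it sees; the function name, tokenizer and sample sentence are invented for illustration and are not taken from CLiC.

```python
import re
from collections import Counter

def clusters(text, n=1, top=10):
    """Return the most frequent n-grams in the text with their counts."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # zip over n staggered copies of the token list yields every n-word window.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams).most_common(top)

# Invented sample sentence for illustration.
sample = "i am very much obliged to you i am very glad"
print(clusters(sample, n=1, top=3))
print(clusters(sample, n=2, top=2))
```

With n=1 this produces a plain word list; raising n up to 7 retrieves longer phrases.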

Texts

The Texts tab shows the full text of individual books and allows you to navigate to particular chapters. Selections of the text can be easily copied and pasted into other applications (e.g. Microsoft Word documents). You can select the levels of annotation you want to display, such as “Sentences” and the “Quote” and “Non-quote” subsets, etc.

 
 
When you use CLiC in your work, please cite CLiC like this: Mahlberg, M., Stockwell, P., Wiegand, V. and Lentin, J. (2020) CLiC 2.1. Corpus Linguistics in Context, available at: clic.bham.ac.uk [Accessed: DATE]
 
 
If you try out the CLiC App you can tag us on Twitter @CLiC_Fiction. You can also email me at r.white.4@bham.ac.uk if you are interested in writing a guest post for our blog.
 
 
 
 

Author: Rosalind White

Rosie White is a Research Fellow in Corpus Linguistics at the Centre for Corpus Research and editor of the CLiC Fiction Blog. She is a Victorianist interested in questions of materiality, and the growing field of research on the history of emotions. Her doctoral thesis interpolated between the history of science and the history of emotions, two interdependent fields that mutually orbit around the same question: what stories emerge from the past when we cease mining it for teleological argument? She co-wrote Pre-Raphaelites in the Spirit World: The Séance Diary of William Michael Rossetti (2022) and is also RA on the ‘Finding Middlemarch’ project at Royal Holloway, University of London.
