The corpus application is developed by the Dutch Language Institute (Instituut voor de Nederlandse Taal, INT). The backend of the application is BlackLab, a Lucene-based search engine developed for corpora with token-based annotation (https://blacklab.ivdnt.org/). The web-based frontend is a further development of the corpus-frontend application developed by INT (https://github.com/instituutnederlandsetaal/blacklab-frontend) in CLARIN and CLARIAH projects. Its design is inspired by the first version of the OpenSoNaR user interface by Tilburg University and Radboud University (https://github.com/Taalmonsters/WhiteLab2.0).
The Language of Leiden Corpus (LoL Corpus) is a diachronic corpus of written Dutch that comprises textual materials related to the city of Leiden from various social domains. The corpus was built to study language change in Dutch resulting from language contact with French. Unique to this corpus is the inclusion of social domain as a variable and the focus on one locality, namely the city of Leiden. The LoL Corpus was built at the Universiteit Leiden and is made available by the Instituut voor de Nederlandse Taal.
The Language of Leiden Corpus is constructed along two dimensions: time and social domain.
Time: The LoL Corpus covers the sixteenth to the nineteenth century. This 400-year span is divided into eight periods of 50 years each: 1500-1549, 1550-1599, etc. Where possible, textual material was chosen from around the middle of each period (around 1525, 1575, etc.); otherwise, the selected material was spread evenly over the whole 50-year period.
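The period division described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names `period_for` and `target_year` are hypothetical and are not part of the corpus application or its metadata.

```python
FIRST_YEAR = 1500    # start of the first period
LAST_YEAR = 1899     # end of the last period
PERIOD_LENGTH = 50   # each period spans 50 years


def period_for(year: int) -> str:
    """Return the 50-year period label (e.g. '1550-1599') that contains `year`."""
    if not FIRST_YEAR <= year <= LAST_YEAR:
        raise ValueError(f"{year} is outside the corpus range {FIRST_YEAR}-{LAST_YEAR}")
    start = year - (year - FIRST_YEAR) % PERIOD_LENGTH
    return f"{start}-{start + PERIOD_LENGTH - 1}"


def target_year(period: str) -> int:
    """Return the approximate middle year of a period label (e.g. 1525 for '1500-1549')."""
    start = int(period.split("-")[0])
    return start + PERIOD_LENGTH // 2


# The eight period labels, in chronological order.
PERIODS = [period_for(y) for y in range(FIRST_YEAR, LAST_YEAR + 1, PERIOD_LENGTH)]
```

For example, a document dated 1574 falls in the period 1550-1599, whose preferred sampling point is around 1575.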
Social domain: The LoL Corpus comprises textual material from seven social domains representative of the social history of Leiden: Academia, Charity, Economy, Literature, Private Life, Public Opinion, and Religion. For each domain, one or two genres were selected: minutes of the university board for Academia; wills with bequests to charity organisations for Charity; ordinances of the city council aimed at the Leiden industries and requests from those industries to the city council for Economy; theatre plays for Literature; letters to friends and family for Private Life; newspaper articles for Public Opinion; and minutes of church council meetings for Religion.
The four social domains Academia, Charity, Economy, and Religion are all represented by genres that can be considered administrative. Therefore, another dimension of the LoL Corpus is the division between administrative and non-administrative texts. The administrative texts yielded very different results in various corpus analyses compared to the non-administrative texts, which indicates the importance of this additional dimension.
All textual materials were manually transcribed from photographs of the original documents and checked multiple times.
The LoL Corpus consists of 251,417 words. We aimed for 5,000 words per period for each social domain, with a limit of 1,250 words per scribe per period and per social domain. This means we included at least four texts or fragments in the LoL Corpus for each combination of a period and a social domain. The table below shows the word count in the LoL Corpus per period and social domain. Due to a lack of texts available in the archives for some periods and social domains – especially in the first half of the sixteenth century – some cells are empty.
| Domain | Academy | Charity | Economy | Religion | Literature | Private Life | Public opinion | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Genre type | Administrative | Administrative | Administrative | Administrative | Non-administrative | Non-administrative | Non-administrative | |
| Genre | Minutes | Wills | Ordinances, Requests | Minutes | Plays | Letters | Newspaper articles | |
| Period | | | | | | | | Subtotal period |
| 1500-1549 | - | 5,027 | 5,072 | - | - | - | - | 10,099 |
| 1550-1599 | 5,046 | 5,229 | 5,118 | 5,305 | 5,116 | 4,449 | - | 30,263 |
| 1600-1649 | 5,124 | 5,131 | 5,276 | 5,259 | 5,138 | 5,114 | - | 31,042 |
| 1650-1699 | 5,177 | 5,111 | 5,314 | 5,128 | 5,143 | 5,032 | 5,053 | 35,958 |
| 1700-1749 | 5,025 | 5,082 | 5,189 | 5,153 | 5,183 | 5,421 | 5,111 | 36,164 |
| 1750-1799 | 5,067 | 5,290 | 5,212 | 5,128 | 5,112 | 5,116 | 5,095 | 36,020 |
| 1800-1849 | 5,160 | 5,114 | 5,100 | 5,258 | 5,173 | 5,145 | 5,084 | 36,034 |
| 1850-1899 | 5,157 | 5,037 | 5,052 | 5,271 | 5,194 | 5,038 | 5,088 | 35,837 |
| Subtotal domains | 35,756 | 41,021 | 41,333 | 36,502 | 36,059 | 35,315 | 25,431 | 251,417 |
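As a small worked check of the figures above, the following Python sketch recomputes the period and domain subtotals from the individual cells and derives the minimum number of texts per cell from the stated target and per-scribe limit. The numbers are copied from the table; the variable and function names are illustrative assumptions and do not correspond to anything in the corpus application.

```python
from math import ceil

DOMAINS = ["Academy", "Charity", "Economy", "Religion",
           "Literature", "Private Life", "Public opinion"]

# Word counts per period, in DOMAINS order; None marks an empty cell.
COUNTS = {
    "1500-1549": [None, 5027, 5072, None, None, None, None],
    "1550-1599": [5046, 5229, 5118, 5305, 5116, 4449, None],
    "1600-1649": [5124, 5131, 5276, 5259, 5138, 5114, None],
    "1650-1699": [5177, 5111, 5314, 5128, 5143, 5032, 5053],
    "1700-1749": [5025, 5082, 5189, 5153, 5183, 5421, 5111],
    "1750-1799": [5067, 5290, 5212, 5128, 5112, 5116, 5095],
    "1800-1849": [5160, 5114, 5100, 5258, 5173, 5145, 5084],
    "1850-1899": [5157, 5037, 5052, 5271, 5194, 5038, 5088],
}

# Subtotals as printed in the table.
PERIOD_SUBTOTALS = {
    "1500-1549": 10099, "1550-1599": 30263, "1600-1649": 31042,
    "1650-1699": 35958, "1700-1749": 36164, "1750-1799": 36020,
    "1800-1849": 36034, "1850-1899": 35837,
}
DOMAIN_SUBTOTALS = [35756, 41021, 41333, 36502, 36059, 35315, 25431]
CORPUS_TOTAL = 251417

TARGET_WORDS = 5000      # target per period and social domain
MAX_PER_SCRIBE = 1250    # limit per scribe per period and social domain


def period_sums() -> dict:
    """Sum each row, treating empty cells as zero."""
    return {p: sum(v or 0 for v in row) for p, row in COUNTS.items()}


def domain_sums() -> list:
    """Sum each column, treating empty cells as zero."""
    return [sum(row[i] or 0 for row in COUNTS.values())
            for i in range(len(DOMAINS))]


# Reaching the target under the per-scribe limit needs at least this many texts.
MIN_TEXTS_PER_CELL = ceil(TARGET_WORDS / MAX_PER_SCRIBE)

if __name__ == "__main__":
    print("period subtotals match:", period_sums() == PERIOD_SUBTOTALS)
    print("domain subtotals match:", domain_sums() == DOMAIN_SUBTOTALS)
    print("corpus total matches:", sum(domain_sums()) == CORPUS_TOTAL)
    print("minimum texts per cell:", MIN_TEXTS_PER_CELL)
```

The minimum of four texts follows from the target and limit alone (5,000 / 1,250); the sketch does not model how many scribes actually contributed to each cell.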
The LoL Corpus was developed at the Universiteit Leiden as part of the research project ‘Pardon my French. Dutch-French Language Contact in The Netherlands, 1500-1900’, funded by the Nederlandse Organisatie voor Wetenschappelijk Onderzoek (NWO).
Assendelft, Brenda (2023). Verfransing onder de loep. Nederlands-Frans taalcontact (1500-1900) vanuit historisch-sociolinguïstisch perspectief. Amsterdam: LOT. Open access: https://www.lotpublications.nl/verfransing-onder-de-loep
Rutten, Gijsbert, Andreas Krogull, Brenda Assendelft & Jill Puttaert (2026). Pardon my French? Dutch-French Language Contact in the Netherlands (1500-1900). Amsterdam & Philadelphia: Benjamins. Open access: https://benjamins.com/catalog/ahs.15
To make the Language of Leiden Corpus more accessible, suggestions for query expansion are given, using the INT lexicon service with the historical computational lexicon GiGaNT-HILEX.
The current version of GiGaNT-HILEX in the lexicon service contains the lexicon modules based on the Dictionary of the Dutch Language (Woordenboek der Nederlandsche Taal, WNT) and the Dictionary of Middle Dutch (Middelnederlandsch Woordenboek, MNW).
If you want to make use of this service, please contact Katrien Depuydt (katrien.depuydt@ivdnt.org).
When referring to the LoL Corpus, please use the following reference:
Language of Leiden Corpus. Compiled by Brenda Assendelft & Gijsbert Rutten, with the help of Hanna Butter, Katharina Gunkler, Jacoline Maes, Odette Pielage & Marijke van der Wal. 1st release April 2026. Available at the Dutch Language Institute: https://hdl.handle.net/10032/tm-a3-d7.
For BlackLab:
Software available at https://github.com/instituutnederlandsetaal/BlackLab
Does, Jesse de, Jan Niestadt & Katrien Depuydt (2017), Creating research environments with BlackLab. In: Jan Odijk and Arjan van Hessen (eds.) CLARIN in the Low Countries, pp. 151-165. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi
For the corpus frontend:
Software available at: https://github.com/instituutnederlandsetaal/blacklab-frontend
Logo provenance:
Title page of the 1743 edition of Reynerius Bontius, Belegering en Ontsetting der stadt Leyden, found on the Census Nederlands Toneel page "Reynerius Bontius - Belegering ende het ontset der stadt Leyden - 1645".