Language of Leiden Corpus

About the Corpus application

The corpus application is developed by the Dutch Language Institute (Instituut voor de Nederlandse Taal, INT). The backend of the application is BlackLab, a Lucene-based search engine developed for corpora with token-based annotation (https://blacklab.ivdnt.org/). The web-based frontend is a further development of the corpus-frontend application developed by INT (https://github.com/instituutnederlandsetaal/blacklab-frontend) in CLARIN and CLARIAH projects. Its design is inspired by the first version of the OpenSoNaR user interface by Tilburg University and Radboud University (https://github.com/Taalmonsters/WhiteLab2.0).

About the LoL Corpus

The Language of Leiden Corpus (LoL Corpus) is a diachronic corpus of written Dutch that comprises textual materials related to the city of Leiden from various social domains. The corpus was built to study language change in Dutch resulting from language contact with French. Unique to this corpus is the inclusion of social domain as a variable and the focus on one locality, namely the city of Leiden. The LoL Corpus was built at the Universiteit Leiden and is made available by the Instituut voor de Nederlandse Taal.

Structure of the LoL Corpus

The Language of Leiden Corpus is constructed along two dimensions: time and social domain.

Time: The LoL Corpus covers the sixteenth to the nineteenth century. This 400-year span is divided into eight periods of 50 years each: 1500-1549, 1550-1599, and so on up to 1850-1899. Where possible, textual material was chosen from around the middle of each period (around 1525, 1575, etc.); otherwise, the selected material was spread evenly over the whole 50-year period.
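
The period scheme can be expressed compactly in code. The following is a minimal Python sketch, not part of the corpus tooling: the function name period_for_year is hypothetical, and it assumes that a text is assigned to a period purely on the basis of its year.

```python
def period_for_year(year: int) -> str:
    """Return the 50-year period label (e.g. '1650-1699') for a given year."""
    if not 1500 <= year <= 1899:
        raise ValueError(f"{year} falls outside the corpus range 1500-1899")
    # Round down to the start of the 50-year block that contains the year.
    start = year - (year - 1500) % 50
    return f"{start}-{start + 49}"


# The eight period labels, from 1500-1549 up to 1850-1899.
ALL_PERIODS = [period_for_year(y) for y in range(1500, 1900, 50)]
```

For example, period_for_year(1525) yields "1500-1549", the period whose midpoint is 1525.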

Social domain: The LoL Corpus comprises textual material from seven social domains representative of the social history of Leiden: Academia, Charity, Economy, Literature, Private Life, Public Opinion, and Religion. For each domain, one or two genres were selected: minutes of the university board for Academia; wills with bequests to charity organisations for Charity; ordinances of the city council aimed at the Leiden industries and requests from those industries to the city council for Economy; theatre plays for Literature; letters to friends and family for Private Life; newspaper articles for Public Opinion; and minutes of church council meetings for Religion.

The four social domains Academia, Charity, Economy, and Religion are all represented by genres that can be considered administrative. Therefore, another dimension of the LoL Corpus is the division between administrative and non-administrative texts. The administrative texts yielded very different results in various corpus analyses compared to the non-administrative texts, which indicates the importance of this additional dimension.

Procedure

All textual materials were manually transcribed from photographs of the original documents and checked multiple times.

Size of the LoL Corpus

The LoL Corpus consists of 251,417 words. We aimed for 5,000 words per period for each social domain, with a limit of 1,250 words per scribe per period and per social domain. This means that each combination of a period and a social domain contains at least four texts or fragments. The table below shows the word count in the LoL Corpus per period and social domain. Because the archives lack texts for some periods and social domains, especially in the first half of the sixteenth century, some cells are empty.
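
The arithmetic behind these sampling targets can be made explicit. The snippet below is an illustrative Python sketch only; the variable names are hypothetical, and the figures are those stated above.

```python
import math

TARGET_WORDS_PER_CELL = 5_000   # target per period and social domain
MAX_WORDS_PER_SCRIBE = 1_250    # cap per scribe within one such cell

# A full cell needs at least this many scribes, hence texts or fragments.
min_texts_per_cell = math.ceil(TARGET_WORDS_PER_CELL / MAX_WORDS_PER_SCRIBE)

# Upper bound on the target size if all 8 periods x 7 domains were filled.
PERIODS = 8
DOMAINS = 7
full_grid_target = PERIODS * DOMAINS * TARGET_WORDS_PER_CELL
```

Here min_texts_per_cell evaluates to 4 and full_grid_target to 280,000 words. The actual corpus size of 251,417 words is lower because some cells are empty.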

Overview of the LoL Corpus

Period	Academia	Charity	Economy	Religion	Literature	Private Life	Public Opinion	Subtotal period
Text type	Administrative	Administrative	Administrative	Administrative	Non-administrative	Non-administrative	Non-administrative	
Genre	Minutes	Wills	Ordinances, requests	Minutes	Plays	Letters	Newspaper articles	
1500-1549	-	5,027	5,072	-	-	-	-	10,099
1550-1599	5,046	5,229	5,118	5,305	5,116	4,449	-	30,263
1600-1649	5,124	5,131	5,276	5,259	5,138	5,114	-	31,042
1650-1699	5,177	5,111	5,314	5,128	5,143	5,032	5,053	35,958
1700-1749	5,025	5,082	5,189	5,153	5,183	5,421	5,111	36,164
1750-1799	5,067	5,290	5,212	5,128	5,112	5,116	5,095	36,020
1800-1849	5,160	5,114	5,100	5,258	5,173	5,145	5,084	36,034
1850-1899	5,157	5,037	5,052	5,271	5,194	5,038	5,088	35,837
Subtotal domains	35,756	41,021	41,333	36,502	36,059	35,315	25,431	251,417
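
The subtotals in the table can be recomputed from the individual cells. The following Python sketch is for illustration only and is not part of the corpus distribution: the names DOMAINS and COUNTS are hypothetical, the cell values are copied from the table above, and empty cells are represented as None.

```python
DOMAINS = ["Academia", "Charity", "Economy", "Religion",
           "Literature", "Private Life", "Public Opinion"]

# Word counts per period, in DOMAINS order; None marks an empty cell.
COUNTS = {
    "1500-1549": [None, 5027, 5072, None, None, None, None],
    "1550-1599": [5046, 5229, 5118, 5305, 5116, 4449, None],
    "1600-1649": [5124, 5131, 5276, 5259, 5138, 5114, None],
    "1650-1699": [5177, 5111, 5314, 5128, 5143, 5032, 5053],
    "1700-1749": [5025, 5082, 5189, 5153, 5183, 5421, 5111],
    "1750-1799": [5067, 5290, 5212, 5128, 5112, 5116, 5095],
    "1800-1849": [5160, 5114, 5100, 5258, 5173, 5145, 5084],
    "1850-1899": [5157, 5037, 5052, 5271, 5194, 5038, 5088],
}

# Row subtotals: words per period across all domains.
period_totals = {p: sum(c or 0 for c in row) for p, row in COUNTS.items()}

# Column subtotals: words per domain across all periods.
domain_totals = {
    d: sum(row[i] or 0 for row in COUNTS.values())
    for i, d in enumerate(DOMAINS)
}

grand_total = sum(period_totals.values())

# The first four domains are the administrative ones.
administrative_total = sum(domain_totals[d] for d in DOMAINS[:4])
non_administrative_total = grand_total - administrative_total
```

Run as written, the recomputed row and column subtotals match the table, and grand_total equals 251,417, of which 154,612 words come from the administrative domains and 96,805 from the non-administrative ones.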

Funding

The LoL Corpus was developed at the Universiteit Leiden as part of the research project ‘Pardon my French. Dutch-French Language Contact in The Netherlands, 1500-1900’, funded by the Nederlandse Organisatie voor Wetenschappelijk Onderzoek (NWO).

References for extensive information on the compilation of the LoL Corpus

Assendelft, Brenda (2023). Verfransing onder de loep. Nederlands-Frans taalcontact (1500-1900) vanuit historisch-sociolinguïstisch perspectief. Amsterdam: LOT. Open access: https://www.lotpublications.nl/verfransing-onder-de-loep

Rutten, Gijsbert, Andreas Krogull, Brenda Assendelft & Jill Puttaert (2026). Pardon my French? Dutch-French Language Contact in the Netherlands (1500-1900). Amsterdam & Philadelphia: Benjamins. Open access: https://benjamins.com/catalog/ahs.15

GiGaNT Lexicon service

To make the Language of Leiden Corpus more accessible, suggestions for query expansion are given, using the INT lexicon service with the historical computational lexicon GiGaNT-HILEX.

The current version of GiGaNT-HILEX in the lexicon service contains the lexicon modules based on the Dictionary of the Dutch Language (Woordenboek der Nederlandsche Taal, WNT) and the Dictionary of Middle Dutch (Middelnederlandsch Woordenboek, MNW).

If you want to make use of this service, please contact Katrien Depuydt (katrien.depuydt@ivdnt.org).

Credits

When referring to the LoL Corpus, please use the following reference:

Language of Leiden Corpus. Compiled by Brenda Assendelft & Gijsbert Rutten, with the help of Hanna Butter, Katharina Gunkler, Jacoline Maes, Odette Pielage & Marijke van der Wal. 1st release April 2026. Available at the Dutch Language Institute: https://hdl.handle.net/10032/tm-a3-d7.

For BlackLab:

Software available at: https://github.com/instituutnederlandsetaal/BlackLab

Does, Jesse de, Jan Niestadt & Katrien Depuydt (2017). Creating research environments with BlackLab. In: Jan Odijk & Arjan van Hessen (eds.), CLARIN in the Low Countries, pp. 151-165. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi

For the corpus frontend:

Software available at: https://github.com/instituutnederlandsetaal/blacklab-frontend

Logo provenance:

Title page of the 1743 edition of Reynerius Bontius, Belegering en Ontsetting der stadt Leyden, found on the Census Nederlands Toneel page "Reynerius Bontius - Belegering ende het ontset der stadt Leyden - 1645".