Refined personal name data from the census book of Vodskaja pjatina

Kanner, Antti

Date

2021-01-13

Creator/contributor

Kanner, Antti

Publication Type

dataset

Repositories

Zenodo

Description

The data contains approximately 36,000 personal names derived from medieval Russian documentation. More precisely, the names were collected from an edited version of the census book of Vodskaja pjatina, one of the five administrative areas of Novgorod in the late 15th century. The edition was compiled in parts. The first two, which cover the northernmost region, are titled Переписная окладная книга по Новугороду Вотьской пятины (1851, 1852) (POKV I‒II). The third part of the book series Новгородские писцовые книги (1868) (NPK III) covers the southern and western parts of the study area.
The personal names were obtained from the editions as follows. First, the editions of the census book were obtained as scanned PDF files. These were converted into editable text with the OCR (optical character recognition) software Abbyy, which read the original mid-19th-century Russian text adequately with its old Russian alphabet package. After initial corrections, a Python script was written to harvest the personal names. The script exploited the systematic formulas with which most names are presented in the census book: it looked for the abbreviations “дв.” and “д.” and extracted all following capitalized words up to the section end markers “.”, “;” or “:”. The output was a name-by-pogost matrix holding the raw frequency of each word in each pogost.
The name data was then cleaned, mostly with the data wrangling program OpenRefine, in the following manner. First, all name forms shorter than four characters were removed, as there were no personal names of three or fewer letters. Next, nouns that were not names were removed. This meant discarding expressions that described a person’s special feature or profession, such as being a widow (“вдова”) or working as a deacon (“діакъ”). The editors, for some reason, capitalized these non-name nouns inconsistently, so some of them were picked up along with the names.
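The extraction rule described above can be illustrated with a minimal Python sketch. This is not the original script, which is not reproduced in this record. The function names, the pogost labels and the sample sentences are invented for illustration, and the sketch assumes that “дв.” and “д.” always appear as separate tokens directly before the names.

```python
import re
from collections import Counter

# Assumption: "дв." and "д." stand as separate tokens in front of the names.
MARKER = re.compile(r"(?<!\w)(?:дв|д)\.\s*")
# Section end markers named in the description.
END = re.compile(r"[.;:]")


def extract_names(text):
    """Return the capitalized words that follow each marker, up to the next end marker."""
    names = []
    for marker in MARKER.finditer(text):
        end = END.search(text, marker.end())
        stop = end.start() if end else len(text)
        words = re.findall(r"\w+", text[marker.end():stop])
        names.extend(w for w in words if w[0].isupper())
    return names


def name_by_pogost(sections):
    """Map each pogost label to a Counter of raw word frequencies."""
    return {pogost: Counter(extract_names(text)) for pogost, text in sections.items()}


# Invented sample text, not a quotation from the census book.
sections = {
    "pogost_A": "дв. Ивашко Федков, Сенка Онтонов; д. Гридка Васков, вдова Марья.",
    "pogost_B": "д. Ивашко Гридин.",
}
matrix = name_by_pogost(sections)
print(matrix["pogost_A"]["Ивашко"], matrix["pogost_B"]["Ivashko" if False else "Ивашко"])
```

In this sketch a lower-case noun such as “вдова” is skipped automatically. A capitalized non-name noun would still be collected, which is why the manual cleaning step is needed.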
In addition, some orthographical and morphological harmonization was done on the data. The letter ы was cut from the end of bynames, where it denotes plurality. The similarity of the so-called soft and hard signs, ь and ъ, caused some problems. As the hard sign ъ is not used in contemporary Russian and was not used in the original documents either (Неволин 1853: 4 (in Appendix 1)), it was removed. The soft sign ь was also removed, because it was absent from the original documents and had been used inconsistently by the editors. The letter ѣ (yat) is rarely used in personal names; nevertheless, it was changed to е (as in contemporary Russian), since it was often confused with the soft and hard signs. Furthermore, the letter ѳ (fita) was often erroneously recognized as о or е. As it is found only in NPK III and only at the beginning of certain names, all of which are also written with “Ф” (e.g. “Ѳедко” vs. “Федко”), it was replaced with Ф.
In the second phase, most of the erroneous orthographies were corrected. Not all of the OCR errors that were found are described here; only the most significant corrections are summarized. For example, many letters were similar enough to cause problems for the OCR program (e.g. и / й and б / в). In these cases, the correct orthography was checked in the census book editions, and OpenRefine was used to change the erroneous forms to the correct ones. After the corrections, the number of name types (= name variants) was reduced from 4942 to 2748. The overall number of name tokens dropped as well, from 36,405 to 35,726. Of the name types, more than half (1484) occur only once.
The refined and harmonized data is published as pogost-by-name frequency tabulations (a pogost is roughly equivalent to an English parish). The file is in tab-delimited (.tsv) format.
References:
Неволин, К. А. 1853, О пятинах и погостах новгородских в XVI веке, с приложением карты, Санкт-Петербург (Из Записок Императорского русского географического общества, Кн. VIII).
NPK III = Новгородские писцовые книги, Т. 3: Переписная оброчная книга Вотской пятины, 1500 года, 1868, Санкт-Петербург.
POKV I, II = Переписная окладная книга по Новугороду Вотьской пятины, 1851, 1852, Имп. Моск. о-во истории и древностей рос., Москва.
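The character-level harmonization rules in the description were applied mostly in OpenRefine. As a rough illustration only, the same rules could be written in Python as below. The function names and the sample counts are invented, and the `is_byname` flag is an assumption, since the description does not say how bynames were told apart from other names.

```python
from collections import Counter

# Replacements named in the description: drop ъ and ь, ѣ -> е, ѳ -> ф.
TABLE = str.maketrans({
    "ъ": "", "Ъ": "",
    "ь": "", "Ь": "",
    "ѣ": "е", "Ѣ": "Е",
    "ѳ": "ф", "Ѳ": "Ф",
})


def harmonize(name, is_byname=False):
    """Apply the letter replacements; cut a final plural ы from bynames."""
    name = name.translate(TABLE)
    if is_byname and name.endswith("ы"):
        name = name[:-1]
    return name


def merge_counts(counts):
    """Re-count frequencies after harmonization, merging variants of one name."""
    merged = Counter()
    for name, n in counts.items():
        merged[harmonize(name)] += n
    return merged


# Invented frequencies, for illustration only.
raw = Counter({"Ѳедко": 2, "Федко": 3, "Сенька": 1, "Сенка": 4})
clean = merge_counts(raw)
print(len(raw), "types ->", len(clean), "types;", sum(clean.values()), "tokens")
```

Merging variants in this way lowers the number of name types while leaving the token count unchanged. The drop in tokens reported in the description comes from the removal steps, which this sketch does not model.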

Link to original dataset

https://doi.org/10.5281/zenodo.4436307

Keyword

personal names, anthroponyms, Russian history, Finnic, digital humanities, onomastics

University of Helsinki

University of Helsinki Data catalogue
