Record Linkage
Jun.-Prof. Dr. Mark Hall
Summer Semester 2019
Prosopography
Data Sources
Problem Areas in the Data
Missing Data
Name Uncertainty
Date Uncertainty
Linkage Methods
Word Similarity
Levenshtein Distance
Apel → Opel = 1
Apel → Apelt = 1
Apel → Opelt = 2
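The three distances above can be reproduced with a short dynamic-programming routine. This is a minimal sketch, assuming unit cost for each insertion, deletion, and substitution and a case-sensitive comparison; the function name `levenshtein` is illustrative and not taken from the slides.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b (unit costs assumed)."""
    # previous[j] is the distance between the prefix of a handled so far and b[:j]
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    for target in ("Opel", "Apelt", "Opelt"):
        print("Apel →", target, "=", levenshtein("Apel", target))
```

Only two rows of the table are kept in memory, which is enough when only the distance is needed and not the edit sequence itself.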
Jaro-Winkler Distance
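The slide names the measure without a worked example, so the following is a minimal sketch under stated assumptions: the common formulation with a Jaro similarity, a Winkler prefix bonus with scaling factor 0.1 over at most four leading characters, and the distance taken as 1 minus the similarity. The function and parameter names (`jaro`, `jaro_winkler`, `prefix_scale`, `max_prefix`) are illustrative.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i in range(len1):
        start = max(0, i - window)
        stop = min(len2, i + window + 1)
        for j in range(start, stop):
            if not matched2[j] and s1[i] == s2[j]:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Walk both strings' matched characters in order; out-of-order pairs
    # are half-transpositions.
    half_transpositions = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                half_transpositions += 1
            k += 1
    transpositions = half_transpositions / 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1,
                 max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix (assumed defaults)."""
    similarity = jaro(s1, s2)
    prefix = 0
    for c1, c2 in zip(s1, s2):
        if c1 != c2 or prefix == max_prefix:
            break
        prefix += 1
    return similarity + prefix * prefix_scale * (1 - similarity)


def jaro_winkler_distance(s1: str, s2: str) -> float:
    """Distance form: 0.0 for identical strings, larger means less similar."""
    return 1.0 - jaro_winkler(s1, s2)


if __name__ == "__main__":
    for target in ("Opel", "Apelt", "Opelt"):
        print("Apel →", target, "=", round(jaro_winkler_distance("Apel", target), 3))
```

Unlike the Levenshtein distance, the prefix bonus rewards names that agree at the start, so a differing first letter is penalized more heavily than a differing last letter.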
Phonetic Similarity
Soundex
tom → t5
tommy → t5
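Both names collapse to the same code because the doubled m is encoded once and vowels (including y) are dropped. A minimal sketch, assuming the American Soundex rule set: the function name `soundex` and its `pad` parameter are illustrative, and letters outside a to z are treated like vowels here, which is a simplification. The conventional form is uppercase and padded with zeros to four characters (T500); with `pad=False` the sketch returns the shorter form shown on the slide, apart from letter case.

```python
# Digit assigned to each consonant group (American Soundex).
_GROUPS = {"1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r"}
_CODES = {letter: digit for digit, letters in _GROUPS.items() for letter in letters}


def soundex(name: str, pad: bool = True) -> str:
    """Soundex code of name; pad=True gives the usual four-character form."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    previous = _CODES.get(first, "")  # the first letter's code also blocks repeats
    for c in letters[1:]:
        if c in ("h", "w"):
            continue  # h and w do not separate consonants with the same code
        code = _CODES.get(c, "")  # vowels and unknown letters have no code
        if code and code != previous:
            digits.append(code)
        previous = code  # a vowel resets this, so a repeat after it counts again
    result = first.upper() + "".join(digits)
    if pad:
        return (result + "000")[:4]
    return result


if __name__ == "__main__":
    for name in ("tom", "tommy"):
        print(name, "→", soundex(name, pad=False), "→", soundex(name))
```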
Cologne Phonetics (Kölner Phonetik)
Apel → 0105 → 015
Opel → 0105 → 015
Apelt → 01052 → 0152
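The two-step encoding (letter codes first, then cleanup) can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: it follows the published rule table of the Kölner Phonetik, in which P maps to 1 unless followed by H and D/T map to 2 unless followed by C, S, or Z; umlauts are folded to their base vowels and ß to ss; all other non-letter characters are dropped. The function name `cologne_phonetics` is illustrative.

```python
VOWELS = set("aeijouy")
C_ONSET_HARD = set("ahkloqrux")   # initial C before these letters → 4
C_INNER_HARD = set("ahkoqux")     # non-initial C before these letters → 4


def cologne_phonetics(name: str) -> str:
    """Kölner Phonetik code of name as a digit string."""
    text = name.lower()
    for source, target in (("ä", "a"), ("ö", "o"), ("ü", "u"), ("ß", "ss")):
        text = text.replace(source, target)
    letters = [c for c in text if "a" <= c <= "z"]

    raw = []
    for i, c in enumerate(letters):
        prev = letters[i - 1] if i > 0 else None
        nxt = letters[i + 1] if i + 1 < len(letters) else None
        if c in VOWELS:
            code = "0"
        elif c == "h":
            code = ""  # H has no code
        elif c == "b":
            code = "1"
        elif c == "p":
            code = "3" if nxt == "h" else "1"
        elif c in ("d", "t"):
            code = "8" if nxt in ("c", "s", "z") else "2"
        elif c in ("f", "v", "w"):
            code = "3"
        elif c in ("g", "k", "q"):
            code = "4"
        elif c == "c":
            if i == 0:
                code = "4" if nxt in C_ONSET_HARD else "8"
            elif prev in ("s", "z"):
                code = "8"
            else:
                code = "4" if nxt in C_INNER_HARD else "8"
        elif c == "x":
            code = "8" if prev in ("c", "k", "q") else "48"
        elif c == "l":
            code = "5"
        elif c in ("m", "n"):
            code = "6"
        elif c == "r":
            code = "7"
        else:
            code = "8"  # s and z
        raw.append(code)

    # Step 2: collapse runs of the same digit.
    collapsed = []
    for digit in "".join(raw):
        if not collapsed or collapsed[-1] != digit:
            collapsed.append(digit)
    code = "".join(collapsed)
    # Step 3: drop every 0 except a leading one.
    return code[:1] + code[1:].replace("0", "")


if __name__ == "__main__":
    for name in ("Apel", "Opel", "Apelt"):
        print(name, "→", cologne_phonetics(name))
```

Because all vowels share the code 0, names that differ only in their vowels, such as Apel and Opel, receive the same code.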
Linkage Algorithms
References
- James J. Feigenbaum. Automated census record linking: a machine learning approach, 2016. URL https://open.bu.edu/handle/2144/27526.
- Saskia Hin, Dalia A. Conde, and Adam Lenart. New light on Roman census papyri through semi-automated record linkage. Historical Methods: A Journal of Quantitative and Interdisciplinary History, 49(1):50–65, 2016. doi: 10.1080/01615440.2015.1071226. URL https://doi.org/10.1080/01615440.2015.1071226.
- Catherine G. Massey. Playing with matches: An assessment of accuracy in linked historical data. Historical Methods: A Journal of Quantitative and Interdisciplinary History, 50(3):129–143, 2017. doi: 10.1080/01615440.2017.1288598. URL https://doi.org/10.1080/01615440.2017.1288598.