Premodern Geographical Description: Data Retrieval and Identification
Chiara Palladino
Data Curation
2017-01-01
Abstract
Geographical and spatial descriptions in the premodern world are structurally different from those of the modern era, where spatial understanding is based on cartographic navigation. This paper presents an experimental process to tag, retrieve, and identify geographical information as described in premodern primary sources, together with the issues encountered and possible solutions. The proposed method defines specific categories of geographical information and a markdown system to mark these categories in the source. Once the data has been tagged, it is extracted, and geographical locations and their connections are identified through a heuristic approach: the extracted geographical entities are first aligned with existing geographical references and secondary sources. String similarity approaches may yield fuzzy identifications, which need to be verified and disambiguated. In this paper, we describe the process of annotating and extracting geographical descriptions, experiment with several toponym-matching metrics, report the results, and offer possible solutions for handling disambiguation through the contextual information already present in the source. The process is applied to two different datasets, proposed as test cases: a classical Arabic geographical text and a Roman itinerary.
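As an illustration of the fuzzy alignment step mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a tiny hypothetical gazetteer, uses Python's standard `difflib` ratio as a stand-in for whichever metrics the paper evaluates, and picks an arbitrary threshold of 0.8. The names `GAZETTEER`, `similarity`, and `candidate_matches` are invented for this example.

```python
from difflib import SequenceMatcher

# Hypothetical gazetteer entries, for illustration only (not the paper's data).
GAZETTEER = ["Mediolanum", "Placentia", "Ariminum", "Bononia"]


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] based on difflib's ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidate_matches(toponym, gazetteer=GAZETTEER, threshold=0.8):
    """Return (name, score) pairs at or above the threshold, best first.

    The threshold is an assumed value; candidates are only fuzzy
    identifications and still need verification and disambiguation.
    """
    scored = [(name, similarity(toponym, name)) for name in gazetteer]
    kept = [(name, score) for name, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # A variant spelling as it might be extracted from a source text.
    for name, score in candidate_matches("Mediolano"):
        print(f"{name}\t{score:.3f}")
```

A toponym that returns several candidates, or none, is exactly the case where contextual information from the source (neighbouring places, distances, regions) would be needed to disambiguate.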