Abstract:
Medieval manuscripts or other written documents from that period contain
valuable information about people, religion, and politics of the medieval period, making
the study of medieval documents a prerequisite to gaining in-depth knowledge
of medieval history. Although tool-less study of such documents is possible and has
been ongoing for centuries, much subtle information remains locked in such manuscripts
unless it is revealed by effective computational analysis. Automatic analysis
of medieval manuscripts is a non-trivial task, mainly due to non-conforming styles,
spelling peculiarities, and the lack of relational structures (such as hyperlinks) that could be used
to answer meaningful queries. Natural Language Processing (NLP) tools and algorithms
are used to carry out computational analysis of text data. However, due to the high
percentage of spelling variations in medieval manuscripts, NLP tools and algorithms
cannot be applied to them directly. If the spelling variations are
mapped to standard dictionary words, then application of standard NLP tools and algorithms
becomes possible. In this paper, we describe CAMM (Computational Analysis of
Medieval Manuscripts), a web-based software tool that maps medieval spelling variations
to a modern German dictionary. We describe the steps taken to acquire,
reformat, and analyze the data and to produce putative mappings, as well as the steps taken to
evaluate the findings. At the time of writing, CAMM provides access
to 11275 manuscripts organized into 54 collections containing a total of 242446
distinctly spelled words. CAMM accurately corrects the spelling of 55% of the verifiable
words.
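
The core idea of mapping a spelling variation to a standard dictionary word can be illustrated with a minimal sketch. It is not CAMM's actual method, which the abstract does not specify. It assumes a simple string-similarity match from Python's standard library, and the word list, the function name map_variant, and the cutoff value are all hypothetical.

```python
from difflib import get_close_matches

# Hypothetical toy dictionary of modern German words (not CAMM's data).
MODERN_DICTIONARY = ["und", "stadt", "kirche", "bischof", "jahr"]


def map_variant(variant, dictionary=MODERN_DICTIONARY, cutoff=0.6):
    """Return the most similar dictionary word for a variant, or None.

    Similarity is difflib's ratio (0.0 to 1.0); the cutoff of 0.6 is an
    assumed threshold, not a value taken from the paper.
    """
    matches = get_close_matches(variant.lower(), dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    # Print a putative mapping for a few illustrative historical spellings.
    for word in ["vnd", "statt", "Bischoff", "xyz"]:
        print(word, "->", map_variant(word))
```

With this toy dictionary, "vnd" maps to "und" and "Bischoff" to "bischof", while a string with no sufficiently similar entry yields no mapping. A real system would need a full dictionary and an evaluation step to verify such putative mappings.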