Computational analysis of medieval manuscripts: a new tool for analysis and mapping of medieval documents to modern orthography

dc.contributor.author | Ahmad, Mushtag
dc.contributor.author | Gruner, Stefan
dc.contributor.author | Afzal, Muhammad Tanvir
dc.contributor.email | sgruner@cs.up.ac.za | en_US
dc.date.accessioned | 2013-06-21T13:41:59Z
dc.date.available | 2013-06-21T13:41:59Z
dc.date.issued | 2012-12-01
dc.description.abstract | Medieval manuscripts and other written documents from that period contain valuable information about the people, religion, and politics of the medieval period, which makes the study of medieval documents a necessary prerequisite for gaining in-depth knowledge of medieval history. Although tool-less study of such documents is possible and has been ongoing for centuries, much subtle information remains locked in such manuscripts unless it is revealed by effective means of computational analysis. Automatic analysis of medieval manuscripts is a non-trivial task, mainly due to non-conforming styles, spelling peculiarities, and the lack of relational structures (hyperlinks) that could be used to answer meaningful queries. Natural Language Processing (NLP) tools and algorithms are used to carry out computational analysis of text data. However, due to the high percentage of spelling variations in medieval manuscripts, NLP tools and algorithms cannot be applied to them directly. If the spelling variations are mapped to standard dictionary words, the application of standard NLP tools and algorithms becomes possible. In this paper we describe a web-based software tool, CAMM (Computational Analysis of Medieval Manuscripts), that maps medieval spelling variations to a modern German dictionary. We describe the steps taken to acquire, reformat, and analyze the data and to produce putative mappings, as well as the steps taken to evaluate the findings. At the time of writing, CAMM provides access to 11,275 manuscripts organized into 54 collections, containing a total of 242,446 distinctly spelled words. CAMM accurately corrects the spelling of 55% of the verifiable words. | en_US
dc.description.librarian | am2013 | en_US
dc.description.sponsorship | Thanks to Georg Vogeler for his valuable suggestions about the algorithms. Thanks also to Jochen Graf and the Monasterium consortium for giving us access to the medieval dataset and for sharing valuable information about the existing EditMOM tools. Thanks to Athabasca University for providing a server to launch this tool, and to the Web Unit of the Computing Services Department at Athabasca for keeping the link alive. | en_US
dc.description.uri | http://www.jucs.org/;internal&action=noaction&Parameter=1208164030958 | en_US
dc.format.extent | 21 p. | en_US
dc.format.medium | PDF | en_US
dc.identifier.citation | Ahmad, M, Gruner, S & Afzal, MT 2012, 'Computational analysis of medieval manuscripts : a new tool for analysis and mapping of medieval documents to modern orthography', Journal of Universal Computer Science, vol. 18, no. 20, pp. 2750-2770. | en_US
dc.identifier.issn | 0948-695X
dc.identifier.uri | http://hdl.handle.net/2263/21689
dc.language.iso | en | en_US
dc.publisher | Graz University of Technology | en_US
dc.rights | © J.UCS | en_US
dc.subject | MPEG spelling variations | en_US
dc.subject | Phonetic algorithms | en_US
dc.subject | Computational Analysis of Medieval Manuscripts (CAMM) | en_US
dc.title | Computational analysis of medieval manuscripts: a new tool for analysis and mapping of medieval documents to modern orthography | en_US
dc.type | Article | en_US
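The abstract describes mapping medieval spelling variants onto modern German dictionary words, and the subject list names phonetic algorithms. The following is a minimal, hypothetical Python sketch of that general idea under stated assumptions; it is not CAMM's published algorithm. The rewrite rules in `RULES`, the helper names `phonetic_key`, `levenshtein`, `build_index` and `map_variant`, the `max_distance` threshold of 2, and the five-word sample dictionary are all illustrative inventions. The sketch first folds a few spelling alternations into a shared key, then falls back to edit distance when no dictionary word shares that key.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical rewrite rules, applied in order, that fold a few historical
# German spelling alternations onto one form. Illustrative assumptions only;
# these are not CAMM's rules.
RULES = [
    ("th", "t"),   # e.g. "thal" -> "tal"
    ("uo", "u"),   # MHG diphthong spelling, e.g. "guot" -> "gut"
    ("ie", "i"),   # e.g. "brief" -> "brif"
    ("y", "i"),    # e.g. "keyser" -> "keiser"
    ("c", "k"),
    ("z", "s"),
    ("vn", "un"),  # e.g. "vnd" -> "und"; must run before ("v", "f")
    ("v", "f"),
]


def phonetic_key(word: str) -> str:
    """Lower-case the word, apply RULES in order, then collapse doubled letters."""
    w = word.lower()
    for old, new in RULES:
        w = w.replace(old, new)
    collapsed: list[str] = []
    for ch in w:
        if not collapsed or collapsed[-1] != ch:
            collapsed.append(ch)
    return "".join(collapsed)


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertion, deletion and substitution each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def build_index(dictionary: list[str]) -> dict[str, list[str]]:
    """Group modern dictionary words by their phonetic key."""
    index: dict[str, list[str]] = defaultdict(list)
    for word in dictionary:
        index[phonetic_key(word)].append(word)
    return dict(index)


def map_variant(variant: str, index: dict[str, list[str]],
                max_distance: int = 2) -> str | None:
    """Return a putative modern spelling for `variant`, or None if nothing is close."""
    key = phonetic_key(variant)
    if key in index:
        best_key = key
    else:
        # No exact key match: take the nearest key by edit distance
        # (ties broken alphabetically so the result is deterministic).
        best_key = min(index, key=lambda k: (levenshtein(key, k), k))
        if levenshtein(key, best_key) > max_distance:
            return None
    # Several dictionary words may share a key; prefer the closest raw spelling.
    return min(index[best_key],
               key=lambda w: (levenshtein(variant.lower(), w.lower()), w))


if __name__ == "__main__":
    index = build_index(["und", "Jahr", "Brief", "Kaiser", "Bischof"])
    for variant in ["vnnd", "Jar", "keyser", "brieff", "bischoff", "xyzzy"]:
        print(variant, "->", map_variant(variant, index))
```

With this toy dictionary, "vnnd", "brieff" and "bischoff" resolve through an exact key match (to "und", "Brief" and "Bischof"), "Jar" and "keyser" need the edit-distance fallback (to "Jahr" and "Kaiser"), and "xyzzy" maps to nothing. A realistic mapper would need a full modern dictionary, rules derived from the corpus, and an evaluation against verified mappings.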

Files

Original bundle

Name: Ahmad_Computational(2012).pdf
Size: 292.68 KB
Format: Adobe Portable Document Format
Description: Article

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission