To the best of our knowledge, the only available implementations of FA-based string recognizers are the so-called conventional table-driven algorithm and, of course, its hardcoded counterpart suggested by Thompson, Penello, and DeRemer in 1967, 1986, and 2004 respectively. However, our early experiments have shown that the performance of both implementations is hampered by the random access nature of the automaton’s transition table in the case of table-driven, and also the random access nature of the directly executable instructions that make up each hardcoded state. Moreover, the problem of memory load and instruction load are also performance bottlenecks of these algorithms, since, as the automaton size grows, more space in memory is required to hold data/instructions relevant to the states. This thesis exploits the notion of cache optimization (that requires good data or instructions organization) in investigating various enhancements of both table-driven and hardcoding. Functions have been used to formally define the denotational semantics of string recognizers. These functions rely on various so-called strategy variables that are integrated into the formal definition of each recognizer. By appropriately selecting these variables, the conventional algorithms may be described, without loss of generality. By specializing these strategy variables, the new and enhanced recognizers can be denotationally described, and resulting algorithms can then be implemented. We first introduce the so-called Dynamic State Allocation (DSA) strategy regarded as a sort of Just-In-time (JIT) implementation of FA-based string recognizers whereby a predefined portion of the memory is reserved for acceptance testing. Then follows the State pre-Ordering (SpO) strategy that assumes some prior knowledge on the order in which states would be visited. In this case, acceptance testing takes place once each state have been allocated to its new position in memory. The last strategy referred to as the Allocated Virtual Caching (AVC) strategy is based on the premise that a portion of the memory originally occupied by the automaton’s states is virtually used as a sort of cache memory in which acceptance testing takes place, enabling therefore, the exploitation of the various performance enhancement notions on which hardware cache memory relies. It is shown that the algorithms can be classified in a taxonomy tree which is further mapped into a class-diagram that represents the design of a toolkit for FA-based string recognition. Also given in the thesis are empirical results that indicate that the algorithms suggested can, in general, outperform their conventional counterparts when recognizing large and appropriately chosen input strings.
Thesis (PhD (Computer Science))--University of Pretoria, 2007.