Finite state automaton construction through regular expression hashing

Show simple item record

dc.contributor.advisor Kourie, Derrick G. en
dc.contributor.advisor Watson, Bruce William en
dc.contributor.postgraduate Coetser, Rayner Johannes Lodewikus en
dc.date.accessioned 2013-09-07T11:44:37Z
dc.date.available 2010-08-25 en
dc.date.available 2013-09-07T11:44:37Z
dc.date.created 2010-04-12 en
dc.date.issued 2009 en
dc.date.submitted 2010-08-25 en
dc.description Dissertation (MEng)--University of Pretoria, 2009. en
dc.description.abstract In this study, the regular expressions forming abstract states in Brzozowski’s algorithm are not remapped to sequential state transition table addresses as would be the case in the classical approach, but are hashed to integers. Two regular expressions that are hashed to the same hash code are assigned the same integer address in the state transition table, reducing the number of states in the automaton. This reduction does not necessarily lead to the construction of a minimal automaton: no restrictions are placed on the hash function hashing two regular expressions to the same code. Depending on the quality of the hash function, a super-automaton, previously referred to as an approximate automaton, or an exact automaton can be constructed. When two regular expressions are hashed to the same state, and they do not represent the same regular language, a super-automaton is constructed. A super-automaton accepts the regular language of the input regular expression, in addition to some extra strings. If the hash function is bad, many regular expressions that do not represent the same regular language will be hashed together, resulting in a smaller automaton that accepts extra strings. In the ideal case, two regular expressions will only be hashed together when they represent the same regular language. In this case, an exact minimal automaton will be constructed. It is shown that, using the hashing approach, an exact or super-automaton is always constructed. Another outcome of the hashing approach is that a non-deterministic automaton may be constructed. A new version of the hashing version of Brzozowski’s algorithm is put forward which constructs a deterministic automaton. A method is also put forward for measuring the difference between an exact and a super-automaton: this takes the form of the k-equivalence measure: the k-equivalence measure measures the number of characters up to which the strings of two regular expressions are equal. The better the hash function, the higher the value of k, up to the point where the hash function results in regular expressions being hashed together if and only if they have the same regular language. Using the k-equivalence measure, eight generated hash functions and one hand coded hash function are evaluated for a large number of short regular expressions, which are generated using G¨odel numbers. The k-equivalence concept is extended to the average k-equivalence value in order to evaluate the hash functions for longer regular expressions. The hand coded hash function is found to produce good results. Copyright en
dc.description.availability unrestricted en
dc.description.department Computer Science en
dc.identifier.citation Coetser, RJL 2009, Finite state automaton construction through regular expression hashing, MSc dissertation, University of Pretoria, Pretoria, viewed yymmdd < http://hdl.handle.net/2263/27536 > en
dc.identifier.other E10/460/gm en
dc.identifier.upetdurl http://upetd.up.ac.za/thesis/available/etd-08252010-133710/ en
dc.identifier.uri http://hdl.handle.net/2263/27536
dc.language.iso en
dc.publisher University of Pretoria en_ZA
dc.rights © 2009, University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria. en
dc.subject Regular expression hashing en
dc.subject Finite state en
dc.subject Super-automaton en
dc.subject Approximate automaton en
dc.subject UCTD en_US
dc.title Finite state automaton construction through regular expression hashing en
dc.type Dissertation en


Files in this item

This item appears in the following Collection(s)

Show simple item record