Abstract:
Recent advances in technology have increased awareness of the necessity for automated systems in
people’s everyday lives. Artificial systems are more frequently being introduced into environments
previously thought to be too perilous for humans to operate in. Some robots can be used to extract
potentially hazardous materials from sites inaccessible to humans, while others are being developed
to aid humans with laborious tasks.
A crucial aspect of all artificial systems is the manner in which they interact with their immediate surroundings.
Developing such a deceptively simple aspect has proven to be significantly challenging, as
it not only entails the methods through which the system perceives its environment, but also its ability
to perform critical tasks. These undertakings often involve the coordination of numerous subsystems,
each performing its own complex duty. To complicate matters further, it is becoming
increasingly important for these artificial systems to be able to perform their tasks in real time.
The task of object recognition is typically described as the process of retrieving the object in a database
that is most similar to an unknown, or query, object. Pose estimation, on the other hand, involves
estimating the position and orientation of an object in three-dimensional space, as seen from an observer’s
viewpoint. These two tasks are regarded as vital to many computer vision techniques and
regularly serve as input to more complex perception algorithms.
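
The retrieval view of object recognition described above can be illustrated with a minimal sketch. It assumes each object is summarised by a fixed-length feature vector and that similarity is measured by Euclidean distance; the labels, descriptor values and the function name recognise are hypothetical and are not taken from the presented system.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recognise(query, database):
    """Return the label of the database entry most similar to the query."""
    return min(database, key=lambda label: euclidean(query, database[label]))

# Hypothetical database: one illustrative descriptor per known object.
database = {
    "mug": [0.9, 0.1, 0.3],
    "box": [0.2, 0.8, 0.5],
    "bottle": [0.4, 0.4, 0.9],
}

# Descriptor computed from an unknown (query) object; values are made up.
query = [0.85, 0.15, 0.35]
best_label = recognise(query, database)
```

A linear scan of this kind is only practical for small databases, which is one reason compact descriptors and fast search matter for real-time use.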
An approach is presented which regards the object recognition and pose estimation procedures as
mutually dependent. The core idea is that dissimilar objects might appear similar when observed
from certain viewpoints. A feature-based conceptualisation, which makes use of a database, is implemented
and used to perform simultaneous object recognition and pose estimation. The design
incorporates data compression techniques, originally suggested by the image-processing community,
to facilitate fast processing of large databases.
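
One way to picture the mutual dependence of recognition and pose estimation is a view-indexed database, in which every stored descriptor is tagged with both an object label and the viewpoint it was generated from. The sketch below is an illustration under stated assumptions, not the implemented design: the View record, the single yaw angle standing in for a full pose, the descriptor values and the function recognise_and_estimate are all hypothetical, and the compression step is omitted.

```python
import math
from typing import NamedTuple

class View(NamedTuple):
    label: str         # object identity
    yaw_deg: float     # viewpoint; a single angle stands in for a full pose
    descriptor: tuple  # feature vector computed for this object at this view

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recognise_and_estimate(query, views):
    """A single nearest-neighbour lookup yields both identity and pose."""
    best = min(views, key=lambda v: distance(query, v.descriptor))
    return best.label, best.yaw_deg

# Hypothetical entries: the two objects look alike at 90 degrees, so their
# descriptors for that viewpoint are deliberately close to one another.
views = [
    View("mug", 0.0, (0.9, 0.1, 0.3)),
    View("mug", 90.0, (0.5, 0.5, 0.4)),
    View("box", 0.0, (0.2, 0.8, 0.5)),
    View("box", 90.0, (0.5, 0.5, 0.6)),
]

label, yaw = recognise_and_estimate((0.52, 0.5, 0.42), views)
```

Because the match is made against (object, viewpoint) pairs rather than objects alone, an ambiguous viewpoint is resolved in the same step that identifies the object.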
System performance is quantified primarily in terms of object recognition accuracy, pose estimation
accuracy and execution time. These aspects are investigated under ideal conditions by exploiting three-dimensional
models of relevant objects. The performance of the system is also analysed for practical scenarios
using input data acquired from a structured light implementation, which resembles the data obtained from
many commercial range scanners.
Practical experiments indicate that the system was capable of performing simultaneous object recognition
and pose estimation in approximately 230 ms once a novel object had been sensed. An average
object recognition accuracy of approximately 73% was achieved. The pose estimation results were
reasonable but warrant further research. The results are comparable to those achieved with
other approaches, such as Viewpoint Feature Histograms and Spin Images.