Abstract:
Analysing unstructured data with minimal contextual information is a challenge in spatial applications that rely on movement data. Movement data are sequences of time-stamped locations of a moving entity, analogous to text data as sequences of words in a document. Text analytics is rich in methods for learning word embeddings and latent semantic clusters from unstructured data. This work is inspired by the success of probabilistic topic models in natural language processing (NLP), in particular latent Dirichlet allocation (LDA), and applies these methods to movement data. The motivation is that topic models exhibit characteristics of both clustering and dimensionality reduction techniques. Furthermore, the inferred matrices can be used as interpretable topic distributions for movement behaviour, and the lower-dimensional embeddings generated by the LDA model can be used to cluster movement behaviour.
This work reviews existing techniques for trajectory clustering from the literature and considers the advantages and disadvantages of each method. The challenges of trajectory modelling with LDA are examined and solutions to these challenges are suggested. Lastly, the advantages of using LDA compared to traditional clustering techniques are discussed.
The analysis in this work applies LDA to two use cases. Firstly, the ability of LDA to infer interpretable topics is explored by analysing the movement of jaguars in South America. Secondly, the ability of LDA to cluster movement trajectories is investigated by clustering driver behaviour based on real-world driving data. The results of the two experiments show that it is possible to derive interpretable topics and to cluster the movement behaviour of trajectories using the LDA model.