Temporal data is common in real-world datasets. Analysis of such data, for example by means of clustering algorithms, can be difficult due to its dynamic behaviour. There are various types of changes that may occur to clusters in a dataset. Firstly, data patterns can migrate between clusters, shrinking or expanding the clusters. Additionally, entire
clusters may move around the search space. Lastly, clusters can split and merge.
Data clustering, the process of grouping similar objects, is one approach to
determining relationships among data patterns. However, data clustering approaches can face
limitations when applied to temporal data, such as difficulty tracking moving clusters.
This research aims to analyse the ability of particle swarm optimisation (PSO)
and differential evolution (DE) algorithms to cluster temporal data. These algorithms
experience two weaknesses when applied to temporal data. The first weakness is the
loss of diversity: as the population of the algorithm converges, it becomes less diverse,
which limits the algorithm’s exploration capabilities.
The second weakness, outdated memory, is experienced only by PSO and refers to
the personal best solutions previously found by the particles becoming obsolete as the
environment changes. A data clustering algorithm that addresses these two weaknesses
is necessary to cluster temporal data.
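The two weaknesses can be illustrated with a minimal sketch. It assumes that each particle encodes a set of cluster centroids and that fitness is the quantisation error (the mean distance from each pattern to its nearest centroid); neither assumption is stated above, and the function names `quantisation_error`, `swarm_diversity`, and `refresh_memory` are hypothetical. The diversity measure shown (mean distance to the swarm centre) is one common choice, not necessarily the one used in this research.

```python
import math

def quantisation_error(centroids, data):
    """Mean distance from each pattern to its nearest centroid (lower is better)."""
    return sum(min(math.dist(x, c) for c in centroids) for x in data) / len(data)

def swarm_diversity(positions):
    """Mean Euclidean distance of the flattened particles from the swarm centre."""
    flat = [[v for centroid in p for v in centroid] for p in positions]
    centre = [sum(col) / len(flat) for col in zip(*flat)]
    return sum(math.dist(f, centre) for f in flat) / len(flat)

def refresh_memory(pbest_positions, data):
    """Re-evaluate stored personal bests against the current data (assumed remedy)."""
    return [quantisation_error(p, data) for p in pbest_positions]

# Toy temporal dataset: both clusters move by (+3, +3) between time steps.
old_data = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
new_data = [(x + 3.0, y + 3.0) for x, y in old_data]

# A personal best found before the change: one centroid per cluster.
pbest = [(0.0, 0.5), (5.0, 5.5)]

# Outdated memory: the fitness stored before the change no longer
# describes how well the remembered centroids fit the current data.
stale_fitness = quantisation_error(pbest, old_data)
fresh_fitness = refresh_memory([pbest], new_data)[0]

# Loss of diversity: a fully converged swarm has zero spread, so no
# particle is positioned to discover the relocated clusters.
converged = swarm_diversity([pbest, pbest])
spread = swarm_diversity([pbest, [(3.0, 3.5), (8.0, 8.5)]])
```

In this toy case the re-evaluated fitness is worse than the stored one, which is why a particle that trusts its stale memory is guided towards positions that no longer fit the data.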
This research describes various adaptations of PSO and DE algorithms for
clustering temporal data. The proposed algorithms aim to address the loss of diversity
experienced by both PSO and DE, and the outdated memory experienced by PSO. These problems are addressed by combining approaches previously used to deal with temporal or dynamic data, such as repulsion and anti-convergence, with PSO and DE approaches used to cluster data. Six PSO algorithms are introduced in this research, namely the data clustering particle swarm optimisation (DCPSO), reinitialising data clustering particle swarm optimisation (RDCPSO), cooperative data clustering particle swarm optimisation (CDCPSO), multi-swarm data clustering particle swarm optimisation (MDCPSO), cooperative multi-swarm data clustering particle swarm optimisation (CMDCPSO), and elitist cooperative multi-swarm data clustering particle swarm optimisation (eCMDCPSO). Additionally, four DE algorithms are introduced, namely the data clustering differential evolution (DCDE), reinitialising data clustering differential evolution (RDCDE), dynamic data clustering differential evolution (DCDynDE), and cooperative dynamic data clustering differential evolution (CDCDynDE).
The PSO and DE algorithms introduced require prior knowledge of the total number of
clusters in the dataset. The total number of clusters in a real-world dataset, however, is
not always known. For this reason, the best-performing PSO and the best-performing DE are
compared, and the winning algorithm, the CDCDynDE, is adapted
to determine the optimal number of clusters dynamically. The resulting algorithm is the
k-independent cooperative data clustering differential evolution (KCDCDynDE) algorithm. It is compared against the local network neighbourhood artificial immune system (LNNAIS) algorithm, an artificial immune system (AIS) designed to cluster temporal data and determine the total number of clusters dynamically. It was
determined that the KCDCDynDE performed the clustering task well for problems with
frequently changing data, high dimensions, and the pattern and cluster data migration types.