Fitness trackers equipped with accelerometers and global positioning systems are becoming
more popular among the running community. These devices allow runners across
the spectrum of athletic abilities to monitor their running metrics and track their performance
throughout their chosen routes. The size of the data sets and the frequency at
which they are generated place the tracking data from these devices in the realm of big data.
Researchers in fields focused on human locomotion during running have called for the data
from fitness trackers to be used to evaluate athletes in the real world, outside the
sometimes unrealistic laboratory or clinical setting. Unfortunately, the real world adds
noise to the data, which obscures the signal.
This dissertation explored large tracking data sets from runners' running watches to
evaluate the extent of the noise and the possibilities for extracting the signal from the data.
The data were cleaned, and parametric as well as non-parametric regression models were
fitted to find interactions and aggregation methods that give athletes a picture of
their running form. These models may give athletes a better understanding of their own
capabilities, which will help them improve their running form and reduce the risk
factors associated with poor form.
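A parametric interaction model of the kind described above can be sketched in Python, assuming NumPy, ordinary least squares and a binary running-surface indicator (a hypothetical coding such as 0 = road, 1 = trail). The function name `fit_interaction_model` and the surface coding are illustrative assumptions, not the dissertation's implementation. The sketch regresses pace on cadence, surface and their product, then reports R² alongside the adjusted R², which penalises the additional interaction terms.

```python
import numpy as np


def fit_interaction_model(cadence, surface, pace):
    """Fit pace ~ cadence + surface + cadence:surface by ordinary least squares.

    Hypothetical sketch: `surface` is assumed to be a 0/1 indicator.
    Returns (coefficients, R^2, adjusted R^2); the coefficients are ordered
    as [intercept, cadence, surface, cadence*surface].
    """
    cadence = np.asarray(cadence, dtype=float)
    surface = np.asarray(surface, dtype=float)
    pace = np.asarray(pace, dtype=float)

    # Design matrix: intercept, the two main effects and the interaction term.
    X = np.column_stack([np.ones_like(cadence), cadence, surface, cadence * surface])
    coef, *_ = np.linalg.lstsq(X, pace, rcond=None)

    residuals = pace - X @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((pace - pace.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot

    # Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k), where k counts every
    # estimated coefficient including the intercept.
    n, k = X.shape
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)
    return coef, r2, adj_r2
```

Because (n - 1)/(n - k) exceeds 1 whenever the model has more than an intercept, the adjusted R² is always below R² for an imperfect fit, which makes it the fairer measure when comparing models with different numbers of terms.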
Results from the interaction models between running surface, cadence and pace suggest
that the running surface does have an effect on cadence and running pace. However, the
distribution of pace per cadence level is wide and skewed in either direction, with the
adjusted R² values for the fitted models ranging from a weak 0.155 to a moderate
0.752 across the four case studies. The spread of road gradients (i.e. slopes) per cadence
level is also large and skewed in either direction, and the adjusted R² values of the
interaction models for slope, cadence and pace range between 0.268 and 0.681. The data
visualisations for graded running show the pattern of the data only to a limited extent.
The aggregated distribution curves for cadence and pace extend the analysis of the
interaction between running surface, cadence and pace. Although all the distribution curve
models had an R² value very close to 1, the generalised additive models outperformed the
shape-constrained models, achieving lower AIC scores when fitting a smoothed line that
represents the overall performance of the athlete. The shape-constrained models failed to
pick up segmented improvements in the running metrics, whereas the generalised additive
models did pick up the changes in the slope of the curves where the athlete's performance
improved.
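To illustrate how a lower AIC score can favour a model that follows a change in slope, the sketch below compares a single straight line with a piecewise-linear (hinge) fit on a performance series. This is a deliberately simple stand-in: it is neither a generalised additive model nor a shape-constrained model, the breakpoint is assumed to be known in advance, and the helper names `gaussian_aic`, `fit_ls` and `compare_linear_vs_hinge` are hypothetical.

```python
import numpy as np


def gaussian_aic(y, fitted, k):
    """AIC for a least-squares fit with Gaussian errors, up to a constant:
    AIC = n * ln(RSS / n) + 2k, with k the number of estimated coefficients.
    (The error-variance parameter adds the same +2 to every model, so it is
    omitted; AIC differences between models are unaffected.)"""
    y = np.asarray(y, dtype=float)
    rss = float(((y - fitted) ** 2).sum())
    n = y.size
    return n * np.log(rss / n) + 2 * k


def fit_ls(X, y):
    """Return the least-squares fitted values for design matrix X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ coef


def compare_linear_vs_hinge(t, y, breakpoint):
    """Return (AIC of straight line, AIC of hinge fit); lower is preferred."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    ones = np.ones_like(t)
    X_lin = np.column_stack([ones, t])
    # The hinge column max(t - breakpoint, 0) lets the slope change at the
    # assumed breakpoint, mimicking a segmented improvement.
    X_hinge = np.column_stack([ones, t, np.maximum(t - breakpoint, 0.0)])
    aic_lin = gaussian_aic(y, fit_ls(X_lin, y), X_lin.shape[1])
    aic_hinge = gaussian_aic(y, fit_ls(X_hinge, y), X_hinge.shape[1])
    return aic_lin, aic_hinge
```

When the series genuinely changes slope, the hinge fit's much smaller residual sum of squares outweighs the 2-point penalty for its extra coefficient, so its AIC is lower; when the series is a straight line, the penalty tips the comparison the other way.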
The data from fitness trackers seem to hold potential to extend sport science research
in running; however, the data may not always be a true representation of reality. This may
be due to the varying veracity of the data and slow algorithm responses to changes in performance.
Dissertation (MEng)--University of Pretoria, 2018.