Abstract:
We describe a statistical framework for reconstructing the sequence of transmission
events between observed cases of an endemic infectious disease
using genetic, temporal and spatial information. Previous approaches to reconstructing
transmission trees have assumed that all infections in the study area
originated from a single introduction and that a large fraction of cases were
observed. As yet, no approach is appropriate for endemic situations, in which
a disease is already well established in a host population and infections may
have multiple origins, and none can estimate the number of unobserved infections
missing from the sample. Our proposed framework addresses these shortcomings,
enabling the reconstruction of partially observed transmission trees and the
estimation of the number of cases missing from the
sample. Analyses of simulated datasets show the method to be accurate in
identifying direct transmissions, while introductions and transmissions via
one or more unsampled intermediate cases can be identified at high to
moderate levels of case detection. When applied to partial genome sequences
of rabies virus sampled from an endemic region of South Africa, our
method reveals several distinct transmission cycles with little contact between
them, as well as direct transmission over long distances, suggesting a significant
anthropogenic influence on the movement of infected dogs.