Abstract:
Much ecological research relies on existing multispecies distribution datasets. Such datasets, however, can vary considerably
in quality, extent, resolution or taxonomic coverage. We provide a framework for a spatially explicit evaluation of
geographical representation within large-scale species distribution datasets, using the comparison of an occurrence atlas
with a range atlas dataset as a working example. Specifically, we compared occurrence maps for 3773 taxa from the widely used
Atlas Florae Europaeae (AFE) with digitised range maps for 2049 taxa of the lesser-known Atlas of North European
Vascular Plants. We calculated the level of agreement at a 50-km spatial resolution using average latitudinal and
longitudinal species range, and area of occupancy. Agreement in species distribution was calculated and mapped using the
Jaccard similarity index and a reduced major axis (RMA) regression analysis of species richness between the entire atlases
(5221 taxa in total) and between co-occurring species (601 taxa). We found no difference in distribution ranges or in the
frequency distribution of area of occupancy, indicating that the two atlases overlapped sufficiently for a valid comparison. The
similarity index map showed high levels of agreement for central, western, and northern Europe. The RMA regression
confirmed that geographical representation of AFE was low in areas with a sparse data recording history (e.g., Russia,
Belarus and Ukraine). For co-occurring species in south-eastern Europe, however, the Atlas of North European Vascular
Plants showed markedly higher richness estimates. Geographical representation of atlas data can be much more
heterogeneous than often assumed. The level of agreement between datasets can be used to evaluate geographical
representation within datasets. Merging atlases into a single dataset is worthwhile in spite of methodological differences,
and helps to fill gaps in our knowledge of species distribution ranges. Species distribution dataset mergers, such as the one
exemplified here, can serve as a baseline towards comprehensive species distribution datasets.
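
The two agreement measures named above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the species labels and the richness values are hypothetical, and it assumes each atlas has already been reduced to a set of taxa per 50-km grid cell. The Jaccard index is the number of shared taxa divided by the number of taxa in either set, and the RMA slope is the ratio of the standard deviations of the two richness variables, signed by their correlation.

```python
import math
from statistics import mean, stdev


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two species sets for one grid cell."""
    union = a | b
    if not union:
        return float("nan")  # undefined when neither atlas records any taxon
    return len(a & b) / len(union)


def rma_regression(x, y):
    """Reduced major axis fit y = intercept + slope * x.

    Returns (slope, intercept, r), where r is the Pearson correlation.
    """
    mx, my = mean(x), mean(y)
    sx, sy = stdev(x), stdev(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)
    r = cov / (sx * sy)
    slope = math.copysign(sy / sx, r)  # RMA slope: sd(y) / sd(x), signed by r
    intercept = my - slope * mx
    return slope, intercept, r


# Hypothetical species lists for a single grid cell in each atlas.
cell_atlas_a = {"sp1", "sp2", "sp3"}
cell_atlas_b = {"sp2", "sp3", "sp4"}
similarity = jaccard(cell_atlas_a, cell_atlas_b)

# Hypothetical per-cell species richness in each atlas.
richness_a = [1, 2, 3, 4]
richness_b = [2, 4, 6, 8]
slope, intercept, r = rma_regression(richness_a, richness_b)
```

Unlike ordinary least squares, RMA treats both richness variables as subject to error, which suits a comparison in which neither atlas is taken as the reference.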