ZASCA-sum : a dataset of the South Africa supreme courts of appeal judgments and media summaries for legal documents summarization research
Publisher
Elsevier
Abstract
This paper presents ZASCA-Sum, a novel dataset comprising judgments from the Supreme Court of Appeal of South Africa and their manually curated media summaries. The dataset, collected from the court's official website, includes 4171 judgments, of which 2118 are paired with a summary. The judgments and summaries have been extracted and prepared to support legal document summarization in supervised, semi-supervised, and unsupervised settings. This paper describes the dataset in detail, covering the data collection process, timeline, processing, and potential applications in the field. We also report the token-count distribution of the judgments and summaries and analyze how many of them can be accommodated, off the shelf, by current summarization models with the largest input token sizes. The dataset, split into training, validation, and test sets, is publicly available to encourage research in legal summarization. Beyond document summarization, researchers can use this data to localize English-centric models to South African English.
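The abstract mentions three preparation steps: keeping the judgments that have a summary pair, splitting the pairs into training, validation, and test sets, and measuring token counts against a model's input limit. A minimal Python sketch of those steps follows, under stated assumptions: the `Record` fields, the whitespace token counter, and the 80/10/10 split ratio are hypothetical and are not taken from the paper or the released dataset.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


# Hypothetical record layout; the released dataset's actual field names may differ.
@dataclass
class Record:
    case_id: str
    judgment: str
    summary: Optional[str]  # None (or empty) when the judgment has no media summary


def token_count(text: str) -> int:
    # Whitespace tokenization is a rough stand-in (assumption) for a model's
    # subword tokenizer, which usually yields more tokens for the same text.
    return len(text.split())


def paired(records: List[Record]) -> List[Record]:
    """Keep only judgments that have a media summary (the supervised setting)."""
    return [r for r in records if r.summary]


def split(
    records: List[Record], train: float = 0.8, val: float = 0.1, seed: int = 0
) -> Tuple[List[Record], List[Record], List[Record]]:
    """Shuffle deterministically, then split into train/validation/test lists.

    The 80/10/10 default is an illustrative assumption, not the paper's split.
    """
    items = list(records)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * train)
    n_val = int(len(items) * val)
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


def fraction_within_limit(records: List[Record], max_tokens: int) -> float:
    """Share of judgments whose token count fits within a model's input limit."""
    if not records:
        return 0.0
    fits = sum(token_count(r.judgment) <= max_tokens for r in records)
    return fits / len(records)


if __name__ == "__main__":
    records = [
        Record("a", "the appeal is upheld with costs", "Appeal upheld."),
        Record("b", "the appeal is dismissed", None),
        Record("c", " ".join(["word"] * 20), "Long judgment."),
    ]
    pairs = paired(records)  # keeps "a" and "c"
    print(len(pairs))  # 2
    print(fraction_within_limit(pairs, max_tokens=10))  # 0.5
```

Seeding a private `random.Random` instance keeps the split reproducible without touching the global random state; in practice the token count would come from the tokenizer of the specific summarization model being evaluated.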
Description
DATA AVAILABILITY: ZASCA-Sum: A Dataset of the South Africa Supreme Courts of Appeal Judgments and Media Summaries for Legal Documents Summarization Research (Original data) (Hugging Face).
Keywords
Natural language processing (NLP), Document summarization, Legal summarization, Summarization corpora, Supreme Court of Appeal of South Africa
Sustainable Development Goals
SDG-16: Peace, justice and strong institutions
Citation
Abdulmumin, I & Marivate, V. 2025, 'ZASCA-sum : a dataset of the South Africa supreme courts of appeal judgments and media summaries for legal documents summarization research', Data in Brief, vol. 60, art. 111567, pp. 1-14. https://doi.org/10.1016/j.dib.2025.111567.
