ZASCA-sum : a dataset of the South Africa supreme courts of appeal judgments and media summaries for legal documents summarization research

dc.contributor.author: Abdulmumin, Idris
dc.contributor.author: Marivate, Vukosi
dc.date.accessioned: 2025-11-25T10:17:54Z
dc.date.available: 2025-11-25T10:17:54Z
dc.date.issued: 2025-06
dc.description: DATA AVAILABILITY : ZASCA-Sum: A Dataset of the South Africa Supreme Courts of Appeal Judgments and Media Summaries for Legal Documents Summarization Research (Original data) (Huggingface).
dc.description.abstract: This paper presents ZASCA-Sum, a novel dataset comprising judgments from the South Africa Supreme Court of Appeal and their manually curated media summaries. The dataset, collected from the court's official website, includes 4171 judgments, of which 2118 have summary pairs. The judgments and summaries have been extracted and prepared to support legal document summarization tasks across supervised, semi-supervised, and unsupervised settings. This paper provides a detailed description of the dataset, covering the data collection process, timeline, processing, and potential applications in the field. We provide the token-count distribution and analysis of the judgments and summaries that can be accommodated off-the-shelf by current summarization models with the largest input token size. The dataset, split into training, validation, and test sets, is made publicly available to encourage research in legal summarization. In addition to document summarization, researchers can use this data to localize English-centric models to support the South African dialect.
dc.description.department: Computer Science
dc.description.librarian: am2025
dc.description.sdg: SDG-16: Peace, justice and strong institutions
dc.description.sponsorship: The ABSA UP Chair of Data Science.
dc.description.uri: https://www.sciencedirect.com/journal/data-in-brief
dc.identifier.citation: Abdulmumin, I & Marivate, V. 2025, 'ZASCA-sum : a dataset of the South Africa supreme courts of appeal judgments and media summaries for legal documents summarization research', Data in Brief, vol. 60, art. 111567, pp. 1-14. https://doi.org/10.1016/j.dib.2025.111567.
dc.identifier.issn: 2352-3409 (online)
dc.identifier.other: 10.1016/j.dib.2025.111567
dc.identifier.uri: http://hdl.handle.net/2263/105482
dc.language.iso: en
dc.publisher: Elsevier
dc.rights: © 2025 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license.
dc.subject: Natural language processing (NLP)
dc.subject: Document summarization
dc.subject: Legal summarization
dc.subject: Summarization corpora
dc.subject: Supreme Court of Appeal of South Africa
dc.title: ZASCA-sum : a dataset of the South Africa supreme courts of appeal judgments and media summaries for legal documents summarization research
dc.type: Article

Files

Original bundle

Name:
Abdulnumin_ZASCAsum_2025.pdf
Size:
2.88 MB
Format:
Adobe Portable Document Format
Description:
Article

License bundle

Name:
license.txt
Size:
1.71 KB
Format:
Item-specific license agreed upon to submission