Abstract:
In the 21st century, our world is becoming increasingly digitised, and the business world is
no exception. Financial transactions and accounting data are stored electronically, allowing for
sharing and electronic processing. One of the most popular formats employed to store such
data is the eXtensible Markup Language (XML).
Together with digitisation, crime involving electronic and digital means (cyber-crime) is
rising sharply. Investigating acts of cyber-crime requires specialist skills, collectively known as digital forensics.
Accounting data stored in XML format is particularly vulnerable to unauthorised data modification
(tampering), due to XML's requirement to be human-readable. As a result, cyber-criminals
can easily commit fraud or obtain financial gain by tampering with XML financial data in this
manner. However, detecting such tampering is extremely difficult due to the so-called big data
problem or needle-in-a-haystack problem. This involves searching for a particular item (in
this case, the set of changes) in a large set of data (the entire XML accounting data file). To
exacerbate the problem, it is not known whether tampering has occurred in any given XML
accounting data file, so an investigator may be searching for evidence of tampering that does not
even exist.
Traditional approaches to isolating such tampering are not feasible. Firstly, due to the big data
problem, a sequential search of all data contained in the XML file to detect tampering is
not efficient. Secondly, testing for data tampering using standard accounting rules is not possible,
as modified data may still be valid in terms of those rules. Detecting data tampering in
XML data therefore calls for a novel approach, which forms the foundation of this work.
This study aims to enable an investigator to determine whether tampering has occurred in a specific
set of XML financial data, and to reconstruct the events leading to the tampering in
order to determine its extent and detail. To enable the detection of potential
tampering, this study proposes the creation of an automated tool to detect such
irregularities in XML financial data. Drawing a parallel with forensic pathology, it is argued that
XML financial data needs to be analysed for any artefacts (irregularities) that are not consistent
with known normal (or “healthy”) XML financial accounting data. This study furthermore argues
that these artefacts often occur in patterns, allowing one to attribute potential causality
to groups of artefacts. It is also noted that compilers are typically used to parse input and to detect
any input not conforming to a pre-defined set of rules describing normal input. It
is therefore proposed that compilers and similar techniques be used to protect financial
data, on the merit of their capabilities in parsing data and in error reporting and/or error
correction.
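As an illustration of this compiler-style parsing, the following minimal sketch walks an XML tree and reports every artefact that violates a pre-defined rule set, in the spirit of a compiler's error reporting. The element names, attribute requirements, and rule tables are hypothetical examples, not the rule set developed in this study:

```python
import xml.etree.ElementTree as ET

# Hypothetical rules describing "healthy" XML accounting data.
ALLOWED_CHILDREN = {"ledger": {"entry"}, "entry": set()}
REQUIRED_ATTRS = {"entry": {"id", "amount", "date"}}

def check(xml_text: str) -> list:
    """Parse the data and report artefacts not conforming to the rules."""
    errors = []
    root = ET.fromstring(xml_text)

    def walk(node):
        allowed = ALLOWED_CHILDREN.get(node.tag)
        if allowed is None:
            errors.append(f"unknown element <{node.tag}>")
            return
        missing = REQUIRED_ATTRS.get(node.tag, set()) - node.attrib.keys()
        if missing:
            errors.append(f"<{node.tag}> missing attributes: {sorted(missing)}")
        for child in node:
            if child.tag not in allowed:
                errors.append(f"<{child.tag}> not allowed inside <{node.tag}>")
            walk(child)

    walk(root)
    return errors
```

As with a compiler, a clean parse yields an empty error list, while each rule violation produces a specific, locatable diagnostic rather than a single pass/fail verdict.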
Furthermore, the work performed as part of this study suggests a means to enable an investigator
to reconstruct the events leading to the tampering, in order to determine its extent
and detail. As data regarding the history and extent of changes is typically retained neither by the operating system nor by the XML data format itself, it is proposed that
instrumentation be employed to record the additional data necessary to reconstruct
events.
This is achieved by using a combination of version control and audit logging to ensure
that the data needed to reconstruct tampering events is available. The XML Accounting Trail Model is
therefore proposed to collect data about all modifications affecting the file containing the XML
accounting data. The model also proposes combining digital signatures with a reference
monitor to ensure that all changes to the XML data files are recorded and that the XML
Accounting Trail Model cannot be circumvented by editing the XML file directly.
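The signature-plus-reference-monitor idea can be sketched as a single monitor function through which all edits must pass: it verifies an integrity tag over the current file contents, records the change in an append-only audit log, and issues a fresh tag. This sketch uses a keyed HMAC for brevity where the model proposes digital signatures, and all names and structures here are hypothetical illustrations:

```python
import hashlib
import hmac

# Hypothetical secret held by the reference monitor; the model itself
# proposes asymmetric digital signatures rather than a shared key.
MONITOR_KEY = b"monitor-secret"

def sign(xml_bytes: bytes) -> str:
    """Produce an integrity tag for the current XML accounting data."""
    return hmac.new(MONITOR_KEY, xml_bytes, hashlib.sha256).hexdigest()

def verify(xml_bytes: bytes, tag: str) -> bool:
    """Check that the file still matches its last recorded tag."""
    return hmac.compare_digest(sign(xml_bytes), tag)

audit_log = []  # append-only record of monitored modifications

def monitored_edit(xml_bytes: bytes, tag: str, new_bytes: bytes, who: str):
    """All edits must pass through the monitor; a direct edit to the
    file invalidates its tag and is rejected here as tampering."""
    if not verify(xml_bytes, tag):
        raise ValueError("tampering detected: file changed outside the monitor")
    audit_log.append({"who": who, "old": sign(xml_bytes), "new": sign(new_bytes)})
    return new_bytes, sign(new_bytes)
```

Because a direct edit changes the file without updating its tag, the next monitored operation fails verification, which is what prevents the model from being circumvented by editing the XML file outside the monitor.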