Elsevier

Digital Investigation

Volume 30, September 2019, Pages 117-126

Automated recovery of damaged audio files using deep neural networks

https://doi.org/10.1016/j.diin.2019.07.007

Highlights

  • Methods to recover damaged audio files using deep neural networks are proposed.

  • The proposed methods differ from the conventional file-carving-based method.

  • The first method is to recover the damaged audio file by inferring header information.

  • The second method is to locate the damaged audio file and identify its format and encoding type.

Abstract

In this paper, we propose two methods to recover damaged audio files using deep neural networks. Unlike the conventional file-carving-based recovery method, the proposed methods restore lost data that are difficult to recover with file carving. This research suggests that recovery tasks that are essential yet very difficult or very time-consuming can be automated with the proposed deep-neural-network-based methods. We apply feed-forward and long short-term memory (LSTM) neural networks to these tasks. The experimental results show that deep neural networks can distinguish speech signals from non-speech signals and can identify the encoding methods of audio files at the bit level. This enables successful recovery of damaged audio files that are otherwise difficult to recover using conventional file-carving-based methods.

Introduction

Recent decades have seen a wide adoption of audio recording devices, including smartphones. Consequently, audio files are increasingly submitted as evidence in court. Audio files offered as legal evidence usually go through a conventional validation process, in which investigators listen to and identify the contents and check for counterfeit files to establish a legal case. However, audio files collected from digital devices such as smartphones can be deleted, either for malicious purposes or owing to a lack of storage on the device. For such deleted files to qualify as legal evidence, they must be restored from the storage where the deletion occurred, and the recovered data must be validated.

In a typical file-recovery environment, file carving, a method to restore deleted files from the file system, has been widely adopted and applied (Poisel et al., 2011). However, file carving often recovers audio files only partially, leaving them unplayable. For instance, if the region that held a deleted audio file is subsequently overwritten, the data in that region cannot be restored, which prevents complete recovery of the file. Moreover, if the damaged block is essential for playback (e.g., the header of an audio file), the file cannot be played at all, even though only part of it is damaged. Restoring such damaged audio files therefore requires a new recovery approach, one that infers the lost data from the data remaining in the file. A complete recovery process of this kind should recover lost data that conventional methods cannot restore. The present research focuses on applying deep neural networks to automate tasks that are vital to this process yet unsuitable for manual processing owing to their difficulty and the time they require.
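As a rough illustration of inferring lost header information from the remaining data, the sketch below decodes a headerless byte block under several candidate PCM layouts and keeps the layout whose output looks most like a signal. It is a minimal sketch under stated assumptions, not the authors' method: the candidate set (mono 8-bit unsigned and 16-bit little- or big-endian PCM) is an assumption, and `roughness` is a hypothetical heuristic standing in for the trained speech/non-speech network described in the abstract.

```python
import math
import struct
from itertools import product
from typing import Callable, List, Sequence, Tuple


def decode_pcm(raw: bytes, bits: int, little_endian: bool) -> List[float]:
    """Decode headerless bytes as mono PCM samples scaled to [-1, 1)."""
    if bits == 8:
        # 8-bit WAV PCM is unsigned and centred at 128
        return [(b - 128) / 128.0 for b in raw]
    if bits == 16:
        n = len(raw) // 2  # ignore a trailing odd byte
        fmt = ("<" if little_endian else ">") + f"{n}h"
        return [s / 32768.0 for s in struct.unpack(fmt, raw[: 2 * n])]
    raise ValueError(f"unsupported bit depth: {bits}")


def roughness(samples: Sequence[float]) -> float:
    """Hypothetical stand-in for a speech/non-speech network.

    Mean absolute step between neighbouring samples: correctly decoded
    audio tends to be smooth, while mis-decoded bytes look like noise.
    """
    if len(samples) < 2:
        return math.inf
    return sum(abs(b - a) for a, b in zip(samples, samples[1:])) / (len(samples) - 1)


def best_hypothesis(
    raw: bytes, score: Callable[[Sequence[float]], float] = roughness
) -> Tuple[int, bool]:
    """Return the (bit depth, little_endian) pair with the lowest score."""
    candidates = []
    for bits, little in product((8, 16), (True, False)):
        if bits == 8 and not little:
            continue  # byte order is irrelevant for 8-bit samples
        candidates.append(((bits, little), score(decode_pcm(raw, bits, little))))
    return min(candidates, key=lambda c: c[1])[0]
```

Because `best_hypothesis` takes the scorer as a parameter, a trained classifier's output (negated, if a higher value means "more speech-like") could replace `roughness` without changing the search loop.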

The remainder of this paper is organized as follows. Section 2 explains the conventional file-carving method. Section 3 outlines the application of deep neural networks to the present objective. Sections 4 and 5 present the experiments and case studies verifying the accuracy of the proposed deep-neural-network method, and Section 6 concludes the paper.

Section snippets

File-carving methods

Most existing file-recovery methods are based on file carving, which relies on the structures and contents of the files deleted from the file system. Fig. 1 illustrates the full recovery process of a file-carving method. The file system is the complete system that manages all the files employed by users. When a user saves a file, the file system generates metadata, including the physical location of the saved file and the time it was created. It prevents other files
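For contrast with the proposed approach, a generic header-based carving step for WAV files might look like the following sketch (not taken from the paper). It scans a raw image for the `RIFF`/`WAVE` signature and cuts out as many bytes as the RIFF size field declares. The function name `carve_wav` is hypothetical, and real carvers also handle fragmentation and validate the chunk structure.

```python
import struct
from typing import List, Tuple


def carve_wav(image: bytes) -> List[Tuple[int, bytes]]:
    """Header-based carving: locate RIFF/WAVE signatures in a raw image and
    extract the number of bytes declared by the RIFF size field."""
    found = []
    pos = image.find(b"RIFF")
    while pos != -1:
        # Bytes 8-11 of a WAV file must read "WAVE"
        if image[pos + 8 : pos + 12] == b"WAVE":
            (riff_size,) = struct.unpack_from("<I", image, pos + 4)
            # The RIFF size excludes the 8-byte "RIFF" + size prefix;
            # clip at the end of the image if the file was cut short.
            end = min(pos + 8 + riff_size, len(image))
            found.append((pos, image[pos:end]))
        pos = image.find(b"RIFF", pos + 1)
    return found
```

If the first bytes of the header are overwritten, the signature disappears and nothing is carved, which is the failure mode the proposed method targets.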

Proposed recovery method based on deep neural networks

A WAV file whose corrupted header prevents it from playing can be repaired with the proposed recovery method, which infers the damaged information from the data other than the header (i.e., the encoded signals). This recovery process differs from that of the existing file-carving method and addresses problems that the existing method cannot solve. However, the proposed process involves tasks that are unsuitable for manual processing because they are challenging and time-intensive.
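Once the encoding parameters have been inferred, a playable file can be produced by prepending a fresh header to the surviving encoded data. The sketch below builds the canonical 44-byte header for uncompressed PCM (format code 1). It assumes the sample rate, channel count, and bit depth are already known, and the helper name `pcm_wav_header` is hypothetical; files with other encodings or additional chunks would need a different header.

```python
import struct


def pcm_wav_header(data_len: int, sample_rate: int, channels: int, bits: int) -> bytes:
    """Build a canonical 44-byte RIFF/WAVE header for uncompressed PCM data."""
    block_align = channels * bits // 8      # bytes per frame (all channels)
    byte_rate = sample_rate * block_align   # bytes per second
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + data_len, b"WAVE",    # RIFF chunk: size excludes first 8 bytes
        b"fmt ", 16, 1, channels,           # fmt chunk: 16-byte body, format 1 = PCM
        sample_rate, byte_rate, block_align, bits,
        b"data", data_len,                  # data chunk header; samples follow
    )
```

For example, `pcm_wav_header(len(data), 8000, 1, 16) + data` yields bytes that Python's standard `wave` module can open as 8 kHz, mono, 16-bit audio.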

Experiments

In this study, we designed and conducted experiments to verify the performance and practical feasibility of the proposed deep-neural-network methods. We assumed scenarios in which the existing file-carving method cannot restore the original waveform at all; thus, we did not include the existing restoration method in the experiments. The deep neural networks were constructed and trained in Keras (Chollet and others, 2015) with TensorFlow (Martin Abadi
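The abstract states that the networks identify encoding methods at the bit level, but this snippet does not say how the input is encoded. One plausible representation, shown here purely as an assumption, unpacks each byte into bits and, for a recurrent model, groups the bit stream into fixed-size timesteps. Keras recurrent layers such as `LSTM` expect input shaped `(batch, timesteps, features)`, so each inner list below would be one timestep. The function names and the `step_bits` parameter are hypothetical.

```python
from typing import List


def bytes_to_bits(block: bytes, msb_first: bool = True) -> List[int]:
    """Unpack each byte into 8 bits (0/1), most significant bit first by default."""
    order = range(7, -1, -1) if msb_first else range(8)
    return [(byte >> i) & 1 for byte in block for i in order]


def to_sequence(block: bytes, step_bits: int = 8) -> List[List[int]]:
    """Group the bit stream into timesteps of `step_bits` features each,
    dropping any incomplete final step."""
    bits = bytes_to_bits(block)
    usable = len(bits) - len(bits) % step_bits
    return [bits[i : i + step_bits] for i in range(0, usable, step_bits)]
```

A feed-forward network could instead take the flat output of `bytes_to_bits` as a single feature vector.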

Case studies

This section presents a case study illustrating the audio-file identification method, one of the proposed file-recovery methods, and its applications. We assumed a scenario in which an actual audio-file recovery is performed using the identification method and therefore generated a file for restoration. First, a non-audio file of adequate size was prepared. Next, a section of the file was deleted, and a segment of an audio file with a corrupted header was inserted into the deleted
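In the case study, the task is to find where the inserted audio segment lies inside a non-audio file. Assuming a classifier (the trained network in the paper; here simply given as a list of booleans) labels each fixed-size window of the file as audio or non-audio, the segment's byte range can be estimated as the longest run of consecutive audio windows. The window size and the longest-run rule are assumptions for illustration, and `locate_segment` is a hypothetical helper.

```python
from typing import Optional, Sequence, Tuple


def locate_segment(labels: Sequence[bool], window: int) -> Optional[Tuple[int, int]]:
    """Return (start, end) byte offsets of the longest run of consecutive
    windows labelled as audio, or None if no window is labelled audio."""
    best: Optional[Tuple[int, int]] = None
    run_start: Optional[int] = None
    for i, is_audio in enumerate(list(labels) + [False]):  # sentinel closes a trailing run
        if is_audio and run_start is None:
            run_start = i
        elif not is_audio and run_start is not None:
            if best is None or i - run_start > best[1] - best[0]:
                best = (run_start, i)
            run_start = None
    if best is None:
        return None
    return best[0] * window, best[1] * window
```

Choosing the longest run discards isolated false positives, at the cost of missing a segment that the classifier splits into pieces.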

Conclusion

Damaged audio files are difficult to restore using the conventional file-carving method. In this paper, we proposed recovery methods that infer the damaged information from the files, applying deep neural networks to build them. Experiments were conducted to determine whether deep neural networks could perform the given tasks, specifically the tasks that are essential for inferring the lost data, which are too difficult and

Acknowledgement

This work was supported by the research service of the Supreme Prosecutors' Office (research title: study on recovery methods for damaged audio files).

References (14)

  • M. Abadi, A. Agarwal et al.

    TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems

    (2015)
  • F. Chollet

    Keras

    (2015)
  • G. Fant

    Acoustic Theory of Speech Production: with Calculations Based on X-Ray Studies of Russian Articulations

    (1971)
  • I. Goodfellow et al.

    Deep Learning

    (2016)
  • D. Hewlett et al.

    WikiReading: a novel large-scale language understanding task over Wikipedia

    (2016)
  • G. Hinton et al.

    Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups

    IEEE Signal Process. Mag.

    (2012)
  • G. Hinton et al.

    Neural Networks for Machine Learning, Lecture 6a: Overview of Mini-Batch Gradient Descent

    (2012)