How to split WAVE files larger than 4 GB to chunks

Abstract
There are many variations of audio WAVE (.wav) files. Some of them limit the maximum possible file size to 4GB.
This article shows how to split files that have exceeded that limit into usable chunks.
General Information about WAV: http://en.wikipedia.org/wiki/WAV
1 Sample Format and File Type
(this section is copied from http://ardour.org/files/reference/dsy298-ARDOUR.html)
1.1 32-bit floating point
This is the standard format in which Ardour saves recorded files. 32-bit floating point ensures the best quality for further
processing of your audio data, even if the target format of your recording is just the 16-bit audio-cd format. The downside of the
format is high disk space usage.
Excursion: The format also has useful digital side effects such as the absence of digital clipping, i.e. if a track goes higher than 0 dBFS, there
will be no distortion. Still, when reaching the output (in most cases the master), the signal will be converted to a fixed-point or
integer format for D/A conversion and can clip there or in your analog equipment.
1.2 24-bit integer
The 24-bit integer format saves your recorded files in good quality at a smaller file size. Even though 24-bit is the highest
bit depth that a sound card can record, 32-bit floating point is recommended, since 24-bit files can suffer when heavily
signal-processed.
1.3 16-bit integer
16-bit integer is only recommended when you need to save disk space by all means. This is simple audio-CD quality, and the
files can audibly suffer when processed by effects such as amplification and compression, as unwanted noise can be brought up.
Even when mixing for a normal audio CD as the final product, use the 32-bit floating point format and mix down to 16 bit at the very
end of the workflow.
1.4 Broadcast WAVE
The Broadcast WAVE format (extension .wav) was developed by the European Broadcasting Union in order to add specific
metadata to the existing WAVE audio file format. Broadcast WAVE files are typically compatible with applications that use the
WAVE format. The Broadcast WAVE format is limited in file size to 4GB.
1.5 WAVE
Originally developed by Microsoft, the WAVE format (extension .wav) is a popular audio file format that can be read by many
audio applications. WAVE files can store uncompressed or compressed audio data. The WAVE format is limited in file size to
4GB.
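To get a feeling for how quickly a recording reaches this limit, the following sketch divides the largest value a 32-bit size field can hold (2^32 - 1 bytes) by the byte rate of the audio stream. The sample rates, channel counts and bit depths are example values only (the first call matches the recording used later in this article), the header size is ignored, and the function name max_duration is made up for illustration.

```shell
#!/bin/bash
# Longest recording that still fits into a 32-bit size field.
# max_duration is a hypothetical helper name, not part of any tool.
max_duration() {
    local rate=$1 channels=$2 bits=$3
    local max_bytes=$(( (1 << 32) - 1 ))                  # 4294967295
    local bytes_per_sec=$(( rate * channels * bits / 8 ))
    local secs=$(( max_bytes / bytes_per_sec ))           # header size ignored
    printf '%02d:%02d:%02d\n' \
        $(( secs / 3600 )) $(( secs % 3600 / 60 )) $(( secs % 60 ))
}

max_duration 44100 1 32   # mono, 32-bit float: prints 06:45:47
max_duration 48000 2 24   # stereo, 24-bit integer: prints 04:08:33
```

A mono 32-bit float recording at 44100 Hz therefore crosses the limit after a little less than seven hours.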
1.6 WAVE-64
The WAVE-64 file format (extension .w64) was originally developed by Sonic Foundry, which was later acquired by Sony. It is
a 64-bit file format as opposed to the 32-bit format used by WAVE and Broadcast WAVE formats. The WAVE-64 format is able
to store multichannel audio and the file size can exceed 4GB.
2 Splitting WAVE files exceeding the 4GB limit
In the following example, a file was recorded using Ardour with the 32-bit floating point WAVE format. It was easy to
exceed the 4GB file size limit while recording (new bytes are simply appended to the end of the file), but the problems start
when trying to open or read that file afterwards. Let’s have a closer look.
The file we are going to split is a mono recording with a size > 4GB.
$ ls -l "Audio 2-3.wav"
-rw-r--r-- 1 tom tom 7861665848 Dez 28 15:37 Audio 2-3.wav
$ du -hs "Audio 2-3.wav"
7.4G	Audio 2-3.wav
The file command shows that it is a mono RIFF audio file sampled at 44100 Hz:
$ file "Audio 2-3.wav"
Audio 2-3.wav: RIFF (little-endian) data, WAVE audio, mono 44100 Hz
We get more information with the command sndfile-info (part of libsndfile):
$ sndfile-info "Audio 2-3.wav"
Version : libsndfile-1.0.25
========================================
File : Audio 2-3.wav
Length : 7861665848
Warning : filelength > 0xffffffff. This is bad!!!!
RIFF : 3566698544
WAVE
fmt  : 16
  Format        : 0x3 => WAVE_FORMAT_IEEE_FLOAT
  Channels      : 1
  Sample Rate   : 44100
  Block Align   : 4
  Bit Width     : 32
  Bytes/sec     : 176400
fact : 4
  frames  : 1965416448
data : 3566698496
*** Unknown chunk marker (3EBC3578) at position 3566698552. Exiting parser.
----------------------------------------
Sample Rate : 44100
Frames      : 891674624
Channels    : 1
Format      : 0x00010006
Sections    : 1
Seekable    : TRUE
Duration    : 05:36:59.379
As expected, the warning in the sndfile-info output confirms that the file length is larger than what the format allows (2^32 - 1 bytes).
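The sizes reported in the header can be reproduced with a little arithmetic, which shows that the 32-bit size fields have simply wrapped around modulo 2^32. The sketch below uses only numbers from the ls and sndfile-info output. The header length of 56 bytes is derived, not stated in the output: RIFF (12) + fmt (8+16) + fact (8+4) + data chunk header (8), which agrees with the parser stopping at position 3566698552 = 56 + 3566698496.

```shell
#!/bin/bash
# Values copied from the ls and sndfile-info output.
file_len=7861665848        # real size on disk in bytes
modulus=$(( 1 << 32 ))     # a 32-bit field counts modulo 2^32
header_len=56              # derived: 12 + (8+16) + (8+4) + 8 bytes
block_align=4              # bytes per frame (mono, 32 bit)

# RIFF size field: file length minus the 8-byte "RIFF" + size prefix, wrapped.
riff_field=$(( (file_len - 8) % modulus ))

# data size field: real payload length, wrapped.
data_len=$(( file_len - header_len ))
data_field=$(( data_len % modulus ))

# Frame counts: what a reader trusting the data field sees vs. what is on disk.
frames_seen=$(( data_field / block_align ))
frames_real=$(( data_len / block_align ))

echo "RIFF field : $riff_field"
echo "data field : $data_field"
echo "frames seen: $frames_seen"
echo "frames real: $frames_real"
```

The computed values match the RIFF, data and Frames lines of the output, and the real frame count matches the value stored in the fact chunk. A reader that trusts the data chunk size therefore sees only the first 05:36:59 of the recording.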
So let’s split the large file into smaller chunks in a rudimentary way, using the split command (-n 10 produces 10 chunks of nearly equal size):
$ split -n 10 "Audio 2-3.wav" split_
We now have 10 split_* files, which we rename with a .raw extension for further processing:
$ ls -1 split_* | while read line; do mv "$line" "$line".raw; done
We now have the file split_aa.raw (containing the original wav header from Audio 2-3.wav), and 9 split_a*.raw files without a wav header (header-less, raw):
$ ls -l
-rw-r--r-- 1 tom tom 7861665848 Dez 28 15:37 Audio 2-3.wav
-rw-rw-r-- 1 tom tom  786166584 Dez 29 19:50 split_aa.raw
-rw-rw-r-- 1 tom tom  786166584 Dez 29 19:50 split_ab.raw
-rw-rw-r-- 1 tom tom  786166584 Dez 29 19:50 split_ac.raw
-rw-rw-r-- 1 tom tom  786166584 Dez 29 19:50 split_ad.raw
-rw-rw-r-- 1 tom tom  786166584 Dez 29 19:51 split_ae.raw
-rw-rw-r-- 1 tom tom  786166584 Dez 29 19:51 split_af.raw
-rw-rw-r-- 1 tom tom  786166584 Dez 29 19:51 split_ag.raw
-rw-rw-r-- 1 tom tom  786166584 Dez 29 19:51 split_ah.raw
-rw-rw-r-- 1 tom tom  786166584 Dez 29 19:52 split_ai.raw
-rw-rw-r-- 1 tom tom  786166592 Dez 29 19:52 split_aj.raw
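Because split cuts at byte positions without knowing anything about audio, it is worth checking that every chunk contains only whole frames. Otherwise all samples in the following chunks would be shifted by a fraction of a frame. The sketch below recomputes the chunk sizes from the listing and tests them against the Block Align value of 4 bytes. It assumes that split -n 10 puts the remainder into the last chunk, which is what the listing shows.

```shell
#!/bin/bash
total=7861665848   # size of "Audio 2-3.wav" in bytes
parts=10           # as in: split -n 10
block_align=4      # bytes per frame (Block Align in sndfile-info)

chunk=$(( total / parts ))                # size of the first parts-1 chunks
last=$(( total - chunk * (parts - 1) ))   # the last chunk takes the remainder

for size in "$chunk" "$last"; do
    if (( size % block_align == 0 )); then
        echo "$size bytes: whole frames only"
    else
        echo "$size bytes: cuts through a frame"
    fi
done
```

In this example both sizes are multiples of 4, so the cuts fall on frame boundaries. This also relies on the header in the first chunk having a length that is a multiple of 4. For other files, split -b with a byte count that is a multiple of the block alignment would be a safer choice than -n.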
The file command does not recognize such a chunk as audio data, because the header is missing:
$ file split_aj.raw
split_aj.raw: data
Using the sox command, the raw, header-less files are given a header so that they become self-contained. The important
part is to specify the exact parameters of the raw input file, in this case 44100 Hz, float, 32 bits, 1 channel (mono):
$ ls -1 split_*.raw | while read -r line; do sox -r 44100 -e float -b 32 -c 1 "$line" "$line".wav; done
sox WARN wav: wave header missing FmtExt chunk
sox WARN wav: Premature EOF on .wav input file
sox WARN sox: ‘split_aa.raw’ input clipped 128 samples
sox WARN sox: ‘split_aa.raw.wav’ output clipped 3 samples; decrease volume?
sox WARN sox: ‘split_af.raw’ input clipped 1 samples
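The first chunk still begins with the original WAVE header, and when it is read as raw data those header bytes are presumably interpreted as samples, which may explain some of the clipping warnings for split_aa.raw. As an optional refinement, the header could be removed before running sox. The following is a minimal sketch, assuming the header is 56 bytes long (12 + (8+16) + (8+4) + 8, derived from the chunk sizes reported by sndfile-info); strip_header is a made-up helper name.

```shell
#!/bin/bash
# strip_header SRC HEADER_LEN DST
# Copies SRC to DST without its first HEADER_LEN bytes.
# Hypothetical helper; the header length must be determined per file.
strip_header() {
    local src=$1 header_len=$2 dst=$3
    # tail -c +K starts output at byte K (1-based), so K = header_len + 1.
    tail -c "+$(( header_len + 1 ))" "$src" > "$dst"
}

# Example call (file names as used in this article):
#   strip_header split_aa.raw 56 split_aa_noheader.raw
```

The resulting file would then be passed to sox in place of split_aa.raw. Since 56 is a multiple of the 4-byte frame size, the remaining data stays frame-aligned.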
Test again with file:
$ file split_aj.raw.wav
split_aj.raw.wav: RIFF (little-endian) data, WAVE audio, mono 44100 Hz
Running sndfile-info again shows that the file looks all right. As a final test, open the files in your audio tool of choice, and
check by eye and ear whether the result is as desired.
$ sndfile-info split_aj.raw.wav
Version : libsndfile-1.0.25
========================================
File : split_aj.raw.wav
Length : 786166650
RIFF : 786166642
WAVE
fmt  : 18
  Format        : 0x3 => WAVE_FORMAT_IEEE_FLOAT
  Channels      : 1
  Sample Rate   : 44100
  Block Align   : 4
  Bit Width     : 32
  Bytes/sec     : 176400
fact : 4
  frames  : 196541648
data : 786166592
End
----------------------------------------
Sample Rate : 44100
Frames      : 196541648
Channels    : 1
Format      : 0x00010006
Sections    : 1
Seekable    : TRUE
Duration    : 01:14:16.727
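A quick plausibility check is to recompute the duration from the frame count and the sample rate. The sketch below does this with integer arithmetic, rounded to the nearest millisecond; frames_to_duration is a made-up helper name, and the rounding mode is an assumption that happens to reproduce the values printed by sndfile-info here.

```shell
#!/bin/bash
# frames_to_duration FRAMES SAMPLE_RATE
# Prints the duration as HH:MM:SS.mmm (hypothetical helper).
frames_to_duration() {
    local frames=$1 rate=$2
    # Milliseconds, rounded to nearest (assumed rounding mode).
    local ms=$(( (frames * 1000 + rate / 2) / rate ))
    printf '%02d:%02d:%02d.%03d\n' \
        $(( ms / 3600000 )) $(( ms / 60000 % 60 )) \
        $(( ms / 1000 % 60 )) $(( ms % 1000 ))
}

frames_to_duration 196541648 44100   # last chunk: prints 01:14:16.727
frames_to_duration 891674624 44100   # truncated view of the original: prints 05:36:59.379
```

The first value is about one tenth of the full recording, as expected for one of ten chunks.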
3 Importing WAVE chunks as contiguous sequence in Ardour
Open the import dialog via Session > Import. Before importing the files, make sure the playhead is at the position for insertion.
Then set "Insert at" to "playhead" and "Mapping" to "sequence files" before clicking Apply.
The files will be aligned one after another, as shown in the next image:
A blind listening test across a region boundary sounds good, and a closer look at a transition shows that it is clean.
4 Conclusion
• WAVE files have a header that limits the file size to 4GB
• Nothing prevents a program from writing WAVE files bigger than 4GB (the size is limited only by storage space and filesystem)
• To read and edit WAVE files > 4GB, they must either be converted to a format that can handle that size (W64) or split up into chunks
whose size is within the specification
• Chunks of header-less raw audio data can be given a header with sox to make them usable
• Chunks can be imported into an audio tool, where they can be lined up (sequenced) one after another