How to split WAVE files larger than 4 GB to chunks
Abstract

There are many variations of audio WAVE (.wav) files. Some of them limit the maximum possible file size to 4GB. This article shows how to split files that have surpassed that limit into usable chunks.

General information about WAV: http://en.wikipedia.org/wiki/WAV

1 Sample Format and File Type

(This section is copied from http://ardour.org/files/reference/dsy298-ARDOUR.html)

1.1 32-bit floating point

This is the standard format in which Ardour saves recorded files. 32-bit floating point ensures the best quality for further processing of your audio data, even if the target format of your recording is just the 16-bit audio CD format. The downside of the format is its high disk space usage.

Excursion: The format also has a nice digital side effect: there is no digital clipping, i.e. if a track goes higher than 0 dBFS, there will be no distortion. Still, when the signal reaches the output (in most cases the master), it is converted to a fixed-point or integer signal for D/A conversion and will clip in your analog equipment.

1.2 24-bit integer

The 24-bit integer format saves your recorded files in good quality at a smaller file size. Even though 24-bit is the highest bit depth that a sound card can record, 32-bit floating point is recommended, since 24-bit files can suffer when heavily signal-processed.

1.3 16-bit integer

16-bit integer is only recommended when you need to save disk space by all means. This is simple audio CD quality, and the files can audibly suffer when processed by effects such as amplification and compression, as unwanted noise can be brought up. Even if you are mixing for a normal audio CD as the final product, use the 32-bit floating point format and mix down to 16 bit at the very end of the workflow.
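To make the disk space trade-off between these sample formats concrete: the data rate of uncompressed audio is sample rate × channels × bytes per sample. The following is a minimal sketch of that arithmetic; the function names are invented for this illustration, and the few header bytes of a WAVE file are ignored. It prints the data rate and roughly how long a mono 44100 Hz recording can run before it reaches 2^32 bytes.

```shell
#!/bin/sh
# Illustrative helpers, not part of any tool mentioned in this article.

# bytes_per_sec RATE CHANNELS BITS -> data rate of uncompressed audio
bytes_per_sec() {
    echo $(( $1 * $2 * $3 / 8 ))
}

# secs_to_4gb RATE CHANNELS BITS -> whole seconds until 2^32 bytes of audio data
secs_to_4gb() {
    echo $(( 4294967296 / $(bytes_per_sec "$1" "$2" "$3") ))
}

for bits in 32 24 16; do
    rate=$(bytes_per_sec 44100 1 "$bits")
    secs=$(secs_to_4gb 44100 1 "$bits")
    echo "$bits bit, mono, 44100 Hz: $rate bytes/s, 4GB after $(( secs / 3600 )) h $(( secs % 3600 / 60 )) min"
done
```

For 32-bit mono at 44100 Hz this gives 176400 bytes per second, so a long unattended recording can pass the 4GB mark within a single session.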
1.4 Broadcast WAVE

The Broadcast WAVE format (extension .wav) was developed by the European Broadcasting Union in order to add specific metadata to the existing WAVE audio file format. Broadcast WAVE files are typically compatible with applications that use the WAVE format. The Broadcast WAVE format is limited in file size to 4GB.

1.5 WAVE

Originally developed by Microsoft, the WAVE format (extension .wav) is a popular audio file format that can be read by many audio applications. WAVE files can store uncompressed or compressed audio data. The WAVE format is limited in file size to 4GB.

1.6 WAVE-64

The WAVE-64 file format (extension .w64) was originally developed by Sonic Foundry, whose audio software products were later acquired by Sony. It uses 64-bit size fields, as opposed to the 32-bit size fields used by the WAVE and Broadcast WAVE formats. The WAVE-64 format is able to store multichannel audio, and the file size can exceed 4GB.

2 Splitting WAVE files exceeding the 4GB limit

In the following example, a file was recorded using Ardour with the 32-bit floating point WAVE format. It was easy to bypass the 4GB file size limit while recording (new bytes are just happily added to the end of the file), but the problems start when trying to open or read that file afterwards. Let's have a closer look.

The file we are going to split is a mono recording with a size > 4GB.

    $ ls -l "Audio 2-3.wav"
    -rw-r--r-- 1 tom tom 7861665848 Dez 28 15:37 Audio 2-3.wav
    $ du -hs "Audio 2-3.wav"
    7.4G Audio 2-3.wav

With the file command, we see that it is a mono RIFF 44100 Hz audio file:

    $ file "Audio 2-3.wav"
    Audio 2-3.wav: RIFF (little-endian) data, WAVE audio, mono 44100 Hz

The sndfile-info command (part of libsndfile) gives more information:

    $ sndfile-info "Audio 2-3.wav"
    Version : libsndfile-1.0.25
    ========================================
    File : Audio 2-3.wav
    Length : 7861665848
    Warning : filelength > 0xffffffff. This is bad!!!!
    RIFF : 3566698544
    WAVE
    fmt : 16
      Format : 0x3 => WAVE_FORMAT_IEEE_FLOAT
      Channels : 1
      Sample Rate : 44100
      Block Align : 4
      Bit Width : 32
      Bytes/sec : 176400
    fact : 4
      frames : 1965416448
    data : 3566698496
    *** Unknown chunk marker (3EBC3578) at position 3566698552. Exiting parser.
    ----------------------------------------
    Sample Rate : 44100
    Frames : 891674624
    Channels : 1
    Format : 0x00010006
    Sections : 1
    Seekable : TRUE
    Duration : 05:36:59.379

As expected, the file length shown in the Length line is larger than what the format allows (2^32 - 1 bytes).

So let's split the large file into smaller chunks in a rudimentary way, using the split command:

    $ split -n 10 "Audio 2-3.wav" split_

We now have 10 split_* files, which we rename with a .raw extension for further processing:

    $ for f in split_*; do mv "$f" "$f.raw"; done

We now have the file split_aa.raw (containing the original WAV header from Audio 2-3.wav) and 9 split_a*.raw files without a WAV header (header-less, raw):

    $ ls -l
    -rw-r--r-- 1 tom tom 7861665848 Dez 28 15:37 Audio 2-3.wav
    -rw-rw-r-- 1 tom tom  786166584 Dez 29 19:50 split_aa.raw
    -rw-rw-r-- 1 tom tom  786166584 Dez 29 19:50 split_ab.raw
    -rw-rw-r-- 1 tom tom  786166584 Dez 29 19:50 split_ac.raw
    -rw-rw-r-- 1 tom tom  786166584 Dez 29 19:50 split_ad.raw
    -rw-rw-r-- 1 tom tom  786166584 Dez 29 19:51 split_ae.raw
    -rw-rw-r-- 1 tom tom  786166584 Dez 29 19:51 split_af.raw
    -rw-rw-r-- 1 tom tom  786166584 Dez 29 19:51 split_ag.raw
    -rw-rw-r-- 1 tom tom  786166584 Dez 29 19:51 split_ah.raw
    -rw-rw-r-- 1 tom tom  786166584 Dez 29 19:52 split_ai.raw
    -rw-rw-r-- 1 tom tom  786166592 Dez 29 19:52 split_aj.raw

The file command does not recognize such a chunk as audio data, because the header is missing:

    $ file split_aj.raw
    split_aj.raw: data

Using the sox command, the raw, header-less files are given a header so that they become self-contained.
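One detail is worth checking before headers are added: split -n knows nothing about audio, so a chunk boundary could in principle fall in the middle of a 4-byte sample, which would turn the rest of that chunk into noise once it is read as raw float data. In this example all chunk sizes happen to be multiples of the block align (4 bytes). The sketch below, with a helper name invented for this illustration, computes a chunk size rounded down to a whole number of sample frames. It assumes that the audio data starts at an offset that is itself a multiple of the block align.

```shell
#!/bin/sh
# aligned_chunk_size TOTAL_BYTES PARTS BLOCK_ALIGN (hypothetical helper)
# Prints a chunk size close to TOTAL_BYTES / PARTS, rounded down to a
# multiple of BLOCK_ALIGN so that no sample frame is cut in two.
aligned_chunk_size() {
    raw=$(( $1 / $2 ))
    echo $(( raw - raw % $3 ))
}

size=$(aligned_chunk_size 7861665848 10 4)
echo "chunk size: $size bytes"

# Assumed usage with a fixed byte size instead of a chunk count:
#   split -b "$size" "Audio 2-3.wav" split_
# Rounding down can leave a small extra chunk at the end.
```

For the file in this article the result is the same 786166584 bytes that split -n 10 chose, so the chunks above are already frame-aligned.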
When calling sox, the important part is to state the exact parameters of the raw input data, in this case 44100 Hz, float, 32 bits, 1 channel (mono):

    $ for f in split_*.raw; do sox -r 44100 -e float -b 32 -c 1 "$f" "$f.wav"; done
    sox WARN wav: wave header missing FmtExt chunk
    sox WARN wav: Premature EOF on .wav input file
    sox WARN sox: ‘split_aa.raw’ input clipped 128 samples
    sox WARN sox: ‘split_aa.raw.wav’ output clipped 3 samples; decrease volume?
    sox WARN sox: ‘split_af.raw’ input clipped 1 samples

Test again with file:

    $ file split_aj.raw.wav
    split_aj.raw.wav: RIFF (little-endian) data, WAVE audio, mono 44100 Hz

Running sndfile-info again shows that the file looks all right:

    $ sndfile-info split_aj.raw.wav
    Version : libsndfile-1.0.25
    ========================================
    File : split_aj.raw.wav
    Length : 786166650
    RIFF : 786166642
    WAVE
    fmt : 18
      Format : 0x3 => WAVE_FORMAT_IEEE_FLOAT
      Channels : 1
      Sample Rate : 44100
      Block Align : 4
      Bit Width : 32
      Bytes/sec : 176400
    fact : 4
      frames : 196541648
    data : 786166592
    End
    ----------------------------------------
    Sample Rate : 44100
    Frames : 196541648
    Channels : 1
    Format : 0x00010006
    Sections : 1
    Seekable : TRUE
    Duration : 01:14:16.727

As a final test, open the files in your audio tool of choice, and see and hear whether the result is as desired.

3 Importing WAVE chunks as contiguous sequence in Ardour

Before importing the files, make sure the playhead is at the position for insertion. Then open Session > Import, set Insert at to playhead and Mapping to sequence files, and click Apply.

The files will be aligned one after another.

In a blind test, listening across a region boundary sounds good. Looking closer at a transition shows that it is clean.
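When checking region boundaries after the import, it helps to know where each chunk should start on the timeline. That follows from the frame counts, since duration = frames / sample rate. The following is a small sketch using only integer shell arithmetic; the helper name is invented for this illustration, and the per-chunk frame count assumes the 786166584-byte chunks with 4 bytes per frame from section 2.

```shell
#!/bin/sh
# frames_to_time FRAMES RATE (hypothetical helper)
# Converts a frame count to hh:mm:ss.mmm, rounded to the nearest millisecond.
frames_to_time() {
    ms=$(( ($1 * 1000 + $2 / 2) / $2 ))
    printf '%02d:%02d:%02d.%03d\n' \
        $(( ms / 3600000 )) $(( ms / 60000 % 60 )) $(( ms / 1000 % 60 )) $(( ms % 1000 ))
}

# Frame count of split_aj.raw.wav as reported by sndfile-info
frames_to_time 196541648 44100

# Expected start of the n-th chunk (counting from 0) relative to the first one,
# assuming 786166584 bytes / 4 bytes per frame = 196541646 frames per chunk
n=1
frames_to_time $(( n * 196541646 )) 44100
```

The first call reproduces the Duration value that sndfile-info reports for split_aj.raw.wav, which is a quick sanity check that the frame count and sample rate were read correctly.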
4 Conclusion

• WAVE files have a header whose 32-bit size fields limit the file size to 4GB
• Creating WAVE files bigger than 4GB is no problem (they are limited only by storage space and file system)
• To read and edit WAVE files > 4GB, they must either be converted to a format that can handle that size (W64) or be split into chunks whose size is within the specification
• Chunks of header-less raw WAVE data can be given a header with sox to make them usable
• Chunks can be imported into an audio tool, where they can be lined up (sequenced) one after another
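The first point can be checked against the numbers from section 2. The size fields in a WAVE header are 32 bits wide, and the values that sndfile-info printed for the oversized file are consistent with sizes that were stored modulo 2^32. A minimal sketch of that arithmetic (shell only; the 8 bytes subtracted are the "RIFF" identifier and the size field itself, which the RIFF size does not count):

```shell
#!/bin/sh
# Real file length as reported by ls -l
length=7861665848

# What fits into a 32-bit field: the length modulo 2^32
wrapped=$(( length % 4294967296 ))
echo "length mod 2^32: $wrapped"

# The RIFF size field excludes the 8-byte "RIFF" + size prefix
riff=$(( wrapped - 8 ))
echo "expected RIFF size field: $riff"
```

This prints 3566698552 and 3566698544. The second value is the RIFF size shown by sndfile-info, and the first is the position at which its parser gave up, that is, the point where the header claims the file ends.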