Digital Audio Specifications

Michigan State University Digital & Multimedia Center

 

 

A summary of best practices for capturing digital audio from the diverse analog collections at the Michigan State University Library would not be complete without addressing the technical metadata that accompanies the process.  We begin by briefly reviewing the processes and standards used at the MSU Library and the criteria for their use, and conclude by listing some elements that might be included as technical metadata.

 

One of the most important considerations in digital audio recording is a signal chain with a high signal-to-noise ratio and low harmonic distortion.  Keeping the noise floor low is important for taking advantage of the wide dynamic range of current digital audio systems, especially when dealing with music. At the MSU Voice Library, we apply a small amount of limiting and compression to curtail clipping and to optimize the overall dynamic range of spoken word recordings engineered by different sources. Levels are set to an average of about -6 dB, with peaks not exceeding -3 dB or falling below -12 dB.  Because these are spoken word recordings of limited bandwidth (most were originally mastered on equipment with a frequency response limited to 30 Hz - 13 kHz), we save final output as single channel (mono) RIFF Windows pcm wav files with a 44.1 kHz sampling rate and a 16 bit word depth.  In addition to collecting all available sound information without inflating storage demands, this standard allows easy transfer to audio CD, without undue processing, for patrons who request items from the archived collection through interlibrary loan.
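The level targets above can be checked with a short script.  What follows is only a minimal sketch in Python, assuming the quoted figures are measured relative to digital full scale (dBFS), which the text does not state explicitly.  The function names and the 440 Hz test tone are hypothetical and are not part of the MSU workflow.

```python
import math
from array import array

FULL_SCALE = 32768.0  # magnitude of the most negative signed 16 bit sample


def to_dbfs(amplitude):
    """Convert a linear amplitude (0..FULL_SCALE) to dB relative to full scale."""
    if amplitude <= 0:
        return float("-inf")
    return 20.0 * math.log10(amplitude / FULL_SCALE)


def measure_levels(samples):
    """Return (peak_dbfs, rms_dbfs) for a sequence of signed 16 bit samples."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return to_dbfs(peak), to_dbfs(rms)


def within_targets(peak_dbfs, ceiling=-3.0, floor=-12.0):
    """True if the peak falls inside the -12 dB to -3 dB window from the text."""
    return floor <= peak_dbfs <= ceiling


# Hypothetical test signal: one second of a 440 Hz sine at half of full scale.
tone = array(
    "h",
    (int(16384 * math.sin(2 * math.pi * 440 * n / 44100)) for n in range(44100)),
)
peak_db, rms_db = measure_levels(tone)
print(f"peak {peak_db:.1f} dBFS, rms {rms_db:.1f} dBFS, ok={within_targets(peak_db)}")
```

The RMS figure is used here as a stand-in for the "average" level.  A hardware meter may average differently, so its readings may not match exactly.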

 

The capture process begins by running the analog source signal through the Yamaha 01V digital mixer.  Note that during this capture stage, the Yamaha mixer uses 32 bit internal processing for everything it applies to the sound data.  The digital signal is then routed to a PC digital audio workstation or "DAW", where it is written to a Windows pcm wav file using the sampling rate indicated above.  This method allows for maximum formatting and editing flexibility.
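For readers who want to see what the target file format looks like in practice, here is a minimal sketch that writes a mono, 44.1 kHz, 16 bit pcm wav file with Python's standard wave module.  It illustrates the format only.  It is not the DAW software used at MSU, and the function name and the test tone are hypothetical.

```python
import io
import math
import wave
from array import array

SAMPLE_RATE = 44100  # Hz
SAMPLE_WIDTH = 2     # bytes per sample (16 bit)
CHANNELS = 1         # mono


def write_archive_wav(target, samples):
    """Write signed 16 bit samples as a mono 44.1 kHz pcm wav file.

    target may be a file name or a writable, seekable binary file object.
    """
    data = array("h", samples)
    with wave.open(target, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(data.tobytes())


# Hypothetical input: one second of a 440 Hz tone, written to memory.
tone = [int(8000 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE))
        for n in range(SAMPLE_RATE)]
buf = io.BytesIO()
write_archive_wav(buf, tone)

# Read the header back to confirm the parameters that were stored.
buf.seek(0)
with wave.open(buf, "rb") as wav:
    params = (wav.getnchannels(), wav.getsampwidth(),
              wav.getframerate(), wav.getnframes())
print(params)

# Storage needed for one hour of audio at these settings, in bytes.
bytes_per_hour = SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS * 3600
print(bytes_per_hour)
```

At these settings the audio data takes 88,200 bytes per second, or roughly 318 MB per hour, before any embedded metadata.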

 

The most important component of the PC DAW is the sound card.  We look for professional quality, low noise sound cards that have digital inputs or balanced analog inputs, or ideally those that provide a separate breakout box isolated from the noisy environment of the computer.  Some of the cards we have employed are the AdB MultiWav! Pro24, the Gina24 by Echo Digital, the Sta Audio DSP24 MKII, and the Omni Studio by M-Audio.

 

Some low level recordings may also need to be "normalized" with editing software such as Cool Edit 2000.  However, we apply very little in the way of filters, noise removal, etc. to the archived wav files themselves.  We only consider applying these steps to files extracted from the original wav files (mp3, RealMedia, QuickTime, Windows Media) to optimize listening on the web.
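Peak normalization, in its simplest form, scales every sample by one gain factor so that the loudest sample reaches a chosen fraction of full scale.  The sketch below shows that idea, assuming 16 bit samples and a 95% target.  It does not describe how Cool Edit 2000 implements the feature, and the function name and sample values are hypothetical.

```python
from array import array

FULL_SCALE = 32767  # largest positive signed 16 bit sample


def normalize_peak(samples, target_fraction=0.95):
    """Scale samples so the loudest one reaches target_fraction of full scale."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return array("h", samples)  # silence: nothing to scale
    gain = (target_fraction * FULL_SCALE) / peak
    # Round to the nearest integer and clamp to the legal 16 bit range.
    return array(
        "h",
        (max(-32768, min(32767, round(s * gain))) for s in samples),
    )


# Hypothetical low level recording, reduced to a handful of samples.
quiet = array("h", [0, 1000, -2000, 1500, -500])
loud = normalize_peak(quiet)
print(list(loud))
```

Because a single gain is applied to the whole file, the noise floor rises by the same amount as the signal.  Normalization changes the level but does not improve the signal to noise ratio of the recording.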

 

 

Digital Audio Technical Metadata

 

One of the advantages of using the RIFF (Resource Interchange File Format) specification for wav files is that it allows extra user information to be embedded and saved as part of the wav file.  This metadata may include information describing capture standards and engineering specifications.  Some of this technical information could also be included in the library catalog finding aid or MARC record along with the other descriptive information about the recording.  Some of the technical parameters that might be considered for inclusion are:

 

1>   Original medium: (cassette tape, phonograph record, DAT, sound from video, etc.)

2>   Original quality: (types of distortion, noise, physical condition, hydrolysis damage, etc.)

3>   Source supplier: (institution name, gift source, etc.)

4>   Genre: (music, spoken word)

5>   Digitization source device: (example: Tascam BR-20 open reel tape deck)

6>   Capture device: (example: Yamaha 01V to AdB MultiWav! Pro24 PC DAW)

7>   Sampling rate: (example: 44.1 kHz @ 16 bit mono)

8>   Archive file format: (example: RIFF Windows pcm wav)

9>   File size: (bytes = 000MB, duration = 00:00:00)

10>  Extracted file format(s): (mp3, RealMedia, etc.)

     a.   Codec bit rate: (mp3 = 64 kbps, RM = 16 kbps, etc.)

11>  Software: (example: Cool Edit 2000)

12>  Digitization date: (month/day/year)

13>  Special processing: (examples: limiter/compressor on record, normalized to 95% of peak, broadband noise reduction, EQ filter, high pass filter, etc.)

14>  Editing tasks: (segment information, unwanted material cuts, fades applied, etc.)

15>  Engineers: (digitizer, recording engineer, technician in charge, etc.)

16>  Notes:

 

We do not currently use all of these elements at the MSU Library.  However, we have used metadata in various records that has included the date,
original medium, source supplier, genre, file formats, sampling rate and file size.  Some of the files also have more detailed information embedded
in the wav file itself.
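
To make the idea of embedded metadata concrete, the following sketch appends a RIFF "LIST" chunk of type "INFO" to a wav file held in memory.  The four-character field identifiers (ISRF, ISRC, IGNR, ISFT, ICMT) come from the RIFF specification.  Mapping the elements listed above onto them is an assumption made for this example, not a description of how MSU stores its metadata.  The function names and field values are hypothetical.

```python
import io
import struct
import wave


def build_info_chunk(fields):
    """Build a RIFF 'LIST' chunk of type 'INFO' from {four-char id: text}."""
    body = b"INFO"
    for chunk_id, text in fields.items():
        data = text.encode("ascii", "replace") + b"\x00"  # NUL-terminated text
        body += chunk_id.encode("ascii") + struct.pack("<I", len(data)) + data
        if len(data) % 2:
            body += b"\x00"  # sub-chunks are padded to an even length
    return b"LIST" + struct.pack("<I", len(body)) + body


def append_info(wav_bytes, fields):
    """Return a copy of wav_bytes with an INFO list appended and RIFF size fixed."""
    if wav_bytes[:4] != b"RIFF" or wav_bytes[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    if len(wav_bytes) % 2:
        wav_bytes += b"\x00"  # pad the last chunk before adding another
    out = wav_bytes + build_info_chunk(fields)
    return out[:4] + struct.pack("<I", len(out) - 8) + out[8:]


# Hypothetical archive file: one second of silence, mono, 44.1 kHz, 16 bit.
buf = io.BytesIO()
with wave.open(buf, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(44100)
    wav.writeframes(b"\x00\x00" * 44100)

tagged = append_info(buf.getvalue(), {
    "ISRF": "cassette tape",            # original medium (assumed mapping)
    "ISRC": "gift source",              # source supplier (assumed mapping)
    "IGNR": "spoken word",              # genre
    "ISFT": "Cool Edit 2000",           # software
    "ICMT": "44.1 kHz, 16 bit, mono",   # notes
})

# The audio is still readable, and size and duration can be reported.
with wave.open(io.BytesIO(tagged), "rb") as wav:
    duration = wav.getnframes() / wav.getframerate()
print(len(tagged), "bytes,", duration, "seconds")
```

Appending the chunk after the audio data leaves the samples untouched.  Software that does not understand the INFO list is expected to skip it, although that is worth confirming with the players and editors actually in use.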


Prepared by Rick Peiffer

MSU Digital & Multimedia Center Technologist
