Digital Audio Specifications
Michigan State University Digital & Multimedia Center
A summary of best practices for capturing digital audio from the diverse analog
collections at the Michigan State University Library would not be complete
without addressing the technical metadata that accompanies the process. We
begin by briefly reviewing the processes and standards used at the MSU Library
and the criteria for their use, and conclude by listing elements that might be
included as technical metadata.
One of the most important considerations in digital audio recording is a signal
chain with a high signal-to-noise ratio and low harmonic distortion. Keeping the
noise floor low is important for taking advantage of the wide dynamic range of
current digital audio systems, especially when dealing with music. At the MSU
Voice Library, we apply a small amount of limiting and compression to curtail
clipping and to optimize the overall dynamic range of spoken word recordings
engineered by different sources. Levels are set to an average of about -6 dB,
with peaks neither exceeding -3 dB nor falling below -12 dB.
Because these are spoken word recordings of limited bandwidth (most were
originally mastered on equipment with a frequency response limited to
30 Hz - 13 kHz), we save final output as single channel (mono) RIFF Windows
pcm wav files with a 44.1 kHz sampling rate and a 16 bit word depth.
In addition to collecting all available sound information without inflating
storage demands, this standard makes transfer to audio CD easy, without undue
processing, for patrons who request items from the archived collection
through interlibrary loan.
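The level targets above can be checked numerically. The following is a minimal
Python sketch, not part of the MSU workflow: it assumes the dB figures are
relative to digital full scale (dBFS), that the "average" level can be
approximated by the RMS level, and that samples are signed 16 bit integers. The
function names, including the `peak_in_window` helper, are hypothetical.

```python
import math

FULL_SCALE = 32768.0  # assumed reference: magnitude of the largest 16 bit sample


def to_dbfs(amplitude):
    """Convert a linear amplitude to decibels relative to full scale."""
    if amplitude <= 0:
        return float("-inf")  # digital silence
    return 20.0 * math.log10(amplitude / FULL_SCALE)


def measure_levels(samples):
    """Return (peak_dbfs, rms_dbfs) for a sequence of signed 16 bit samples."""
    if not samples:
        raise ValueError("no samples to measure")
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return to_dbfs(peak), to_dbfs(rms)


def peak_in_window(peak_dbfs, floor=-12.0, ceiling=-3.0):
    """True when the peak lies between the -12 dB floor and the -3 dB ceiling."""
    return floor <= peak_dbfs <= ceiling
```

For example, a sine tone at half of full scale has a peak of about -6 dBFS,
which falls inside the window.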
The capture process begins by running the analog source signal through the
Yamaha 01V digital mixer. Note that during this capture stage the Yamaha mixer
uses 32 bit processing for everything it applies to the sound data. The digital
signal is then routed to a PC digital audio workstation, or "DAW", where it is
formatted into the Windows pcm wav file using the sampling rate indicated
above. This method allows for maximum formatting and editing flexibility.
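As an illustration of the target format only, the sketch below writes signed
16 bit samples to a mono 44.1 kHz pcm wav file with Python's standard `wave`
module. It stands in for the DAW software, which is not named at this stage;
`write_archive_wav`, `bytes_per_second`, and the constants are hypothetical
names.

```python
import struct
import wave

SAMPLE_RATE = 44100  # samples per second
SAMPLE_WIDTH = 2     # bytes per sample (16 bit word depth)
CHANNELS = 1         # single channel (mono)


def bytes_per_second():
    """Audio data rate of the archive format, excluding the file header."""
    return SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS


def write_archive_wav(path, samples):
    """Write signed 16 bit integer samples to a mono 44.1 kHz pcm wav file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        # "<h" packs each sample as a little-endian signed 16 bit integer,
        # which is the byte order pcm wav files use.
        wav.writeframes(struct.pack("<%dh" % len(samples), *samples))
```

At these settings the audio data occupies 44,100 x 2 x 1 = 88,200 bytes per
second, or roughly 317.5 million bytes per hour of recording.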
The most important component of the PC DAW is the sound card. We look for
professional-quality, low-noise sound cards that have digital inputs, balanced
analog inputs, or, ideally, a separate breakout box isolated from the noisy
environment of the computer. Some of the cards we have employed are the AdB
MultiWav! Pro24, the Gina24 by Echo Digital, the ST Audio DSP24 MKII, and the
Omni Studio by M-Audio.
Some low level recordings may also need to be "normalized" with editing
software such as Cool Edit 2000; however, we refrain from applying much in the
way of filters, noise removal, etc. to any of the archived wav files
themselves. We only consider applying these steps to files extracted from the
original wav files (mp3, RealMedia, QuickTime, Windows Media) to optimize
listening on the web.
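A minimal sketch of peak normalization follows, assuming that "normalized"
means scaling the whole recording so its loudest sample reaches a chosen
fraction of full scale. The 95% default is borrowed from the example in the
metadata list below; the function name is hypothetical, and this is not a
description of how Cool Edit 2000 implements the operation.

```python
FULL_SCALE = 32767  # largest positive signed 16 bit sample value


def normalize_peak(samples, target=0.95):
    """Scale signed 16 bit samples so the loudest one sits at target * full scale."""
    if not 0 < target <= 1:
        raise ValueError("target must be greater than 0 and at most 1")
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return list(samples)  # silence or empty input: nothing to scale
    gain = target * FULL_SCALE / peak
    # Clamp defensively so rounding can never leave the 16 bit range.
    return [max(-32768, min(32767, round(s * gain))) for s in samples]
```

Because every sample is multiplied by the same gain, the noise floor rises
along with the programme material; normalization changes level, not the
signal-to-noise ratio.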
Digital Audio Technical Metadata
One of the advantages of using the RIFF (Resource Interchange File Format)
specification for wav files is that it allows extra user information to be
embedded and saved as part of the wav file. This metadata may include
information describing capture standards and engineering specifications. Some
of this technical information could also be included in the library catalog
finding aid or MARC record along with the other descriptive information about
the recording. Some of the technical parameters that might be considered for
inclusion are:
1> Original medium: (cassette tape, phonograph record, DAT, sound from video,
    etc.)
2> Original quality: (types of distortion, noise, physical condition,
    hydrolysis damage, etc.)
3> Source supplier: (institution name, gift source, etc.)
4> Genre: (music, spoken word)
5> Digitization source device: (example: Tascam BR-20 open reel tape deck)
6> Capture device: (example: Yamaha 01V to AdB MultiWav! Pro24 PC DAW)
7> Sampling rate: (example: 44.1 kHz @ 16 bit mono)
8> Archive file format: (example: RIFF Windows pcm wav)
9> File size: (bytes = 000MB, duration = 00:00:00)
10> Extracted file format(s): (mp3, RealMedia, etc.)
    a. Codec bit rate: (mp3 = 64 kbps, RM = 16 kbps, etc.)
11> Software: (example: Cool Edit 2000)
12> Digitization date: (month/day/year)
13> Special processing: (examples: limiter/compressor on record, normalized to
    95% of peak, broadband noise reduction, EQ filter, high pass filter, etc.)
14> Editing tasks: (segment information, unwanted material cuts, fades applied,
    etc.)
15> Engineers: (digitizer, recording engineer, technician in charge, etc.)
16> Notes:
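Several of the elements above (sampling rate, word depth, file size, duration)
can be read from the wav file itself rather than typed by hand. A minimal
sketch using Python's standard `wave` and `os` modules follows; the dictionary
keys are hypothetical field names, not an MSU schema.

```python
import os
import wave


def format_duration(seconds):
    """Format a duration in seconds as HH:MM:SS, rounded to the nearest second."""
    total = int(round(seconds))
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return "%02d:%02d:%02d" % (hours, minutes, secs)


def technical_metadata(path):
    """Collect the file-derived technical elements for one pcm wav file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        record = {
            "sampling_rate_hz": rate,
            "bit_depth": wav.getsampwidth() * 8,
            "channels": wav.getnchannels(),
            "duration": format_duration(wav.getnframes() / rate),
        }
    # Size on disk includes the RIFF header and any embedded metadata chunks.
    record["file_size_bytes"] = os.path.getsize(path)
    return record
```

The remaining elements, such as original medium or special processing,
describe decisions made by the engineer and still have to be recorded manually.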
We do not currently use all of these elements at the MSU Library. However, we
have used metadata in various records that has included the date, original
medium, source supplier, genre, file formats, sampling rate and file size. Some
of the files also have more detailed information embedded in the wav file
itself.
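To illustrate embedding, the sketch below appends a RIFF `LIST` chunk of type
`INFO` to the bytes of an existing wav file and reads it back. The
four-character ids are INFO tags defined in the RIFF specification (for
example `IMED` for medium, `ISFT` for software, `IGNR` for genre); which tag
best carries each element listed above is an assumption, and the function
names are hypothetical. The sketch handles only ASCII text and well-formed
files.

```python
import struct


def build_info_chunk(tags):
    """Build a LIST/INFO chunk from a {four_char_id: text} mapping."""
    body = b"INFO"
    for tag_id, text in tags.items():
        if len(tag_id) != 4:
            raise ValueError("INFO tag ids must be four characters")
        data = text.encode("ascii", "replace") + b"\x00"  # null-terminated
        body += tag_id.encode("ascii") + struct.pack("<I", len(data)) + data
        if len(data) % 2:
            body += b"\x00"  # RIFF chunks are padded to an even length
    return b"LIST" + struct.pack("<I", len(body)) + body


def append_info(wav_bytes, tags):
    """Return a copy of a wav file's bytes with a LIST/INFO chunk appended."""
    if wav_bytes[:4] != b"RIFF" or wav_bytes[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    if len(wav_bytes) % 2:
        wav_bytes += b"\x00"  # pad a trailing odd-sized chunk
    out = wav_bytes + build_info_chunk(tags)
    # The RIFF size field counts everything after the first 8 bytes.
    return out[:4] + struct.pack("<I", len(out) - 8) + out[8:]


def read_info(wav_bytes):
    """Walk the top-level chunks and return any INFO tags as a dict."""
    tags = {}
    pos = 12  # skip "RIFF", the size field, and "WAVE"
    while pos + 8 <= len(wav_bytes):
        chunk_id = wav_bytes[pos:pos + 4]
        size = struct.unpack("<I", wav_bytes[pos + 4:pos + 8])[0]
        data = wav_bytes[pos + 8:pos + 8 + size]
        if chunk_id == b"LIST" and data[:4] == b"INFO":
            sub = 4
            while sub + 8 <= len(data):
                tag_id = data[sub:sub + 4].decode("ascii")
                tag_size = struct.unpack("<I", data[sub + 4:sub + 8])[0]
                raw = data[sub + 8:sub + 8 + tag_size]
                tags[tag_id] = raw.rstrip(b"\x00").decode("ascii", "replace")
                sub += 8 + tag_size + (tag_size % 2)
        pos += 8 + size + (size % 2)
    return tags
```

For example, `append_info(data, {"IMED": "cassette tape", "ISFT": "Cool Edit
2000"})` returns new file bytes; audio software that skips chunks it does not
recognize should still play the result, though that is worth verifying with the
tools actually in use.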