Citation
Multichannel audio upconversion through convolution based on surround microphone patterns

Material Information

Title:
Multichannel audio upconversion through convolution based on surround microphone patterns
Creator:
Selter, Scott J
Publication Date:
Language:
English
Physical Description:
x, 110 leaves : illustrations ; 28 cm

Subjects

Subjects / Keywords:
Surround-sound systems ( lcsh )
Sound -- Recording and reproducing -- Digital techniques ( lcsh )
Convolutions (Mathematics) ( lcsh )
Microphone ( lcsh )
Multichannel communication ( lcsh )
Convolutions (Mathematics) ( fast )
Microphone ( fast )
Multichannel communication ( fast )
Sound -- Recording and reproducing -- Digital techniques ( fast )
Surround-sound systems ( fast )
Genre:
bibliography ( marcgt )
theses ( marcgt )
non-fiction ( marcgt )

Notes

Bibliography:
Includes bibliographical references (leaves 109-110).
General Note:
College of Arts and Media
Statement of Responsibility:
by Scott J. Selter.

Record Information

Source Institution:
University of Colorado Denver
Holding Location:
Auraria Library
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
786291849 ( OCLC )
ocn786291849
Classification:
LD1193.A70 2011M S46 ( lcc )

Full Text
MULTICHANNEL AUDIO UPCONVERSION THROUGH
CONVOLUTION BASED ON SURROUND
MICROPHONE PATTERNS
by
Scott J. Selter
B.A., Loyola Marymount University, 2004
A thesis submitted to the
University of Colorado Denver
In partial fulfillment
of the requirements for the degree of
Master of Science
Recording Arts
2011


This thesis for the Master of Science
degree by
Scott J. Selter
has been approved
by
Date


Selter, Scott J. (M.S., Recording Arts)
Multichannel Audio Upconversion Through Convolution Based On Surround
Microphone Patterns
Thesis directed by Assistant Professor Lorne Bregitzer
ABSTRACT
This thesis aims to provide an education in surround sound along with a
software tool that can be used to convert mono and stereo audio to a
surround sound format. Hydra is a VST plug-in that gives both consumers and
professionals the ability to create interactive surround sound mixes of
prerecorded stereo music and sound effects. Hydra uses an FFT convolution method to
convert audio into various surround sound layouts up to and including 7.1 channels.
The impulse response files used by Hydra are based on documented surround sound
microphone recording patterns, which distinguishes this plug-in from typical reverb
processing units. Creation of this plug-in is a step toward increasing
consumer awareness of surround sound.
This abstract accurately represents the content of the candidate's thesis. I recommend
its publication.
Signed
Lorne Bregitzer


DEDICATION
This thesis is dedicated to my father. There is no better person to emulate. He has
inspired an untold number of people through his wisdom and leadership. His guidance led me
to discover my passions for music, audio, technology and computers, without which
this thesis would not exist.


TABLE OF CONTENTS
Figures
Chapter
1. Introduction
1.1 Surround Sound In Film
1.1.1 Fantasia and Fantasound
1.1.2 Cinerama
1.1.3 Cinemascope
1.1.4 Todd AO
1.1.5 Dolby Stereo
1.1.6 "Baby Boom" And Stereo Surround
1.1.7 Digital Surround Sound For Film
1.2 Surround Sound And Home Video
1.2.1 LaserDisc
1.2.2 DVD-Video, Dolby Digital, DTS
1.2.3 Blu-ray And Lossless Audio Formats
1.3 Surround Sound In Music
1.3.1 Quadraphonic Development
1.3.2 DVD-Audio And SACD
1.4 Expanded Surround Sound Layouts
1.4.1 5.1 Versus 6.1
1.4.2 5.1 Versus 6.1 Versus 7.1
1.5 Existing Surround Expansion Techniques
1.5.1 Dolby ProLogic, DTS Neo, THX
1.5.2 Surround Upmixing For Professionals
1.6 Quadraphonic Media Center PC Integration Trend
2. Impulse Responses
2.1 Surround Microphone Patterns
2.1.1 Rear-Facing Cardioids
2.1.2 OCT-2
2.1.3 IRT Cross
2.1.4 Hamasaki Square
2.1.5 Double M/S
2.1.6 Modified Decca Tree
2.2 Impulse Response Recording Setup
2.2.1 The Room
2.2.2 The Speakers
2.2.3 The Microphones
2.2.4 Sine Sweep Recording Method
2.3 Impulse Response Editing
2.3.1 Normalization Trials
3. The VST Programming Process
3.1 Basic Plug-in Signal Flow
3.2 Surround Sound Pattern Structure
3.2.1 Basic Structure For All Patterns
3.2.2 Double M/S Alterations
3.3 Master Delay Structure
3.3.1 Basic Delay Flow
3.3.2 Pattern-Specific Delay Controls
3.4 Master Peak Limiter Structure
3.5 LFE Structure
3.6 Main Feed Pass-through Structure
3.7 Final Mixing And Metering Structure
3.8 Bass Management Structure
4. Graphical User Interface Construction
4.1 Primary Screens Edit And Mix
4.2 Multi-Layers
4.3 Secondary Screen LFE/Main
5. Highlighted Features Of Hydra
5.1 Blending
5.2 Delay DSP Effect
5.3 Mixing
6. Setup And Testing
6.1 Software
6.2 Hardware
6.2.1 Computer Requirements
6.3 Testing
6.3.1 Impulse Response Testing
6.3.2 Width And LFE Functions Testing
7. Uses For Hydra
7.1 Next Steps
Appendix
A. User Manual
Bibliography


LIST OF FIGURES
Figure
1.1 Quadraphonic Speaker Layout
1.2 ITU Standard 5.1 Speaker Layout
1.3 ITU Standard 6.1 Speaker Layout
1.4 7.1 Cinema Speaker Layout
1.5 7.1 Music Speaker Layout
2.1 Rear-Facing Cardioid Pattern
2.2 OCT-2 Pattern
2.3 IRT Cross Pattern
2.4 Hamasaki Square Pattern
2.5 Double M/S Pattern
2.6 Modified Decca Tree Pattern
2.7 OCT-2 Recording Setup
2.8 IRT Cross Recording Setup
2.9 Hamasaki Square Recording Setup
2.10 Double M/S Recording Setup
2.11 Modified Decca Tree Recording Setup
2.12 30 Second Sine Sweep Waveform
2.13 Voxengo Deconvolver
3.1 Top-Level Containers of Hydra
3.2 Basic Surround Pattern Structure
3.3 Primary Bypass Structure
3.4 Rear-Facing Cardioids EQ Structure
3.5 Typical Structure of a Metering Container
3.6 Structure of a Single Level Meter
3.7 Basic Delay Flow
3.8 Basic Peak Limiter Structure
3.9 Basic Flow of LFE Container
3.10 Main Feed Pass-through Basic Flow
3.11 Mixing And Metering Container
3.12 Bass Management Container
4.1 Edit Screen With LFE/Main
4.2 Mix Screen With LFE Screen
4.3 Pattern Select Drop-Down


1. Introduction
This paper aims to provide an education on surround sound along with a process by
which professionals and consumers can integrate a monophonic or stereophonic to
multichannel surround expansion technique into custom systems. Hydra v1.1
Stereo2Surround Conversion Suite is a VST plug-in designed to allow both
professionals and consumers to expand mono or stereo audio content to a
multichannel audio format up to and including 7.1 audio channels. The technique
used by Hydra is a series of convolution modules employing impulse responses based
on documented surround sound microphone recording patterns.
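As a rough illustration of the convolution idea described above, the following Python sketch convolves a mono signal with one impulse response per output channel using an FFT. It is not Hydra's code: the function names (`fft`, `fft_convolve`, `upconvert`), the channel labels and the toy impulse responses are assumptions made purely for illustration, and a real plug-in would use an optimized, block-based FFT rather than this simple recursive one.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT. len(x) must be a power of two.

    The inverse transform is left unnormalized; the caller divides by n.
    """
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft_convolve(signal, impulse_response):
    """Linear convolution via zero-padded FFTs (multiply spectra, transform back)."""
    out_len = len(signal) + len(impulse_response) - 1
    size = 1
    while size < out_len:  # next power of two that holds the full result
        size *= 2
    a = fft([complex(v) for v in signal] + [0j] * (size - len(signal)))
    b = fft([complex(v) for v in impulse_response]
            + [0j] * (size - len(impulse_response)))
    product = [p * q for p, q in zip(a, b)]
    y = fft(product, inverse=True)
    return [v.real / size for v in y[:out_len]]


def upconvert(mono, impulse_responses):
    """Produce one output channel per impulse response (hypothetical channel names)."""
    return {name: fft_convolve(mono, ir) for name, ir in impulse_responses.items()}


# Toy example: a unit impulse sent to a "front" and a delayed, quieter "rear" channel.
channels = upconvert([1.0, 0.0, 0.0, 0.0], {"L": [1.0], "Ls": [0.0, 0.5]})
```

Each output channel gets its own impulse response, so the same source material is shaped differently for every speaker. In the toy data, the rear channel is simply a delayed, attenuated copy of the input.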
The importance of this project is to provide further education on surround sound to
the consumer market, as well as to give professionals another tool for implementing
current 5.1 surround sound standards and advanced 7.1 surround sound in their work.
The hope is that this plug-in makes the ability to create and control multichannel
projects more widely available, inspiring people to want surround sound
setups of their own.
1.1 Surround Sound In Film
Before examining the creation and usage of Hydra, it is crucial to understand
the evolution of surround sound in various areas of the industry. This history
explains why Hydra is a necessary step in pushing for further
advancements in the field of multichannel audio. Reviewing the history of
surround sound formats and channel layouts also leads to a greater understanding of
Hydra's flexibility as a professional and consumer development tool.
Multichannel surround audio for film and music has evolved dramatically over more
than 70 years. As acceptance of the "sound all around"1 concept by professionals and
consumers increases, it becomes important to understand the history of surround
sound as it pertains to the film, music and consumer industries.
1 This idea is the fundamental principle of surround sound, in which audio content
immerses the listener through direct and/or ambient sounds.
1.1.1 Fantasia and Fantasound
In the world of cinematic audio, the first revolutionary surround project came from
the creators at Disney with the construction of Fantasia in 1940. Using a
combination of multiple optical sound projectors and a specially designed speaker
system, Disney offered the world the first discrete surround sound format, known as
Fantasound. Fantasound used a new panning system, which allowed audio content to
be moved from one loudspeaker to another in a rig that equates to the modern
five-channel surround configuration. Fantasound consisted of three screen channels,
with speakers placed up front behind the projection screen, now known as Left, Center
and Right. In addition, two rear channels supplemented the front three channels.
The importance of Fantasound went beyond the number of speakers. For theatrical
showings, surround sound speaker arrays were introduced to accommodate the
large seating area. Work on Fantasia introduced cinema to more than just audio
panning. It also resulted in new multitrack recording and overdubbing processes.2
1.1.2 Cinerama
It wouldn't be until 1952 that the next evolution of surround sound arose. Fred
Waller and Hazard Reeves created a seven-channel system known as Cinerama.
Remarkably, this system is nearly the same as the as-yet non-standardized 7.1
configuration. Cinerama used five screen channels, introducing an extra speaker
between the left and center speakers and another between the right and center
speakers. The number of rear channels remained at two.
Despite the technological leaps made by Fantasound and Cinerama, both were still seen
more as gimmicks by audiences and were largely not accepted. In addition, the cost of
implementing these technologies was more than studios could justify given the
noticeable decline in theater attendance. 20th Century Fox modified Cinerama to fit
within budgetary means and within the technological means of the movie theater
setups of the time.
1.1.3 Cinemascope
Fox introduced Cinemascope in 1953. This new system sacrificed the number of
channels available for a more economical medium, which would become the standard
for the next 20 years: magnetic, four-track prints. Despite the frequency and
dynamic range issues present with magnetic prints at the time, Cinemascope
presented a cost-efficient way to provide three front screen channels and
now just one discrete channel for the surround array system in theaters.
2 Tomlinson Holman provides excellent details on Fantasia and Fantasound in his
book, Surround Sound: Up And Running.
1.1.4 Todd AO
At roughly the same time as Cinemascope, Todd AO introduced the 70mm film print
format, which is capable of carrying six audio channels printed on the film using
magnetic stripes. This six-track system brought back the two additional front screen
channels, left-center and right-center, but left the number of rear channels at one.
Once again, while the format was of impressive quality, its expense limited its use.
1.1.5 Dolby Stereo
Cinemascope and Todd AO remained the leaders of cinematic surround sound
formats until the mid-1970s, at which time Dolby Stereo was introduced. Dolby
Stereo provides a way in which to optically print surround sound on film prints. The
significance of this is three-fold. First, the signal-to-noise ratio of magnetically
printed audio tracks is subpar in comparison with optical printing. Second, the
frequency response of the soundtrack on magnetic stripes is limited compared to the
range of human hearing. Finally, printing audio tracks alongside the film print is
faster and cheaper than magnetic striping.
Despite its name, Dolby Stereo was indeed a surround sound format, one that would
be unsurpassed for another 20 years. Dolby Stereo used an amplitude-phase matrix to
encode four channels of audio (left, center, right and mono surround) into two
channels, the limit imposed by the space available for audio on optical film prints.
Throughout the 1970s and 1980s, advancements were made to the Dolby Stereo
standard to improve its frequency range and signal-to-noise ratio.
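The four-into-two matrix idea can be sketched in a few lines of Python. This is a deliberately simplified illustration, not Dolby's actual encoder: the real system applies 90-degree phase shifts, band-limiting and noise reduction to the surround channel, whereas this sketch substitutes a plain polarity inversion. The function names and the -3 dB gain constant `G` are assumptions chosen for the example.

```python
import math

G = 1 / math.sqrt(2)  # -3 dB gain applied to center and surround (assumed)


def encode_4_2(left, center, right, surround):
    """Fold one sample each of L, C, R, S into a two-channel (Lt, Rt) pair."""
    lt = left + G * center + G * surround
    rt = right + G * center - G * surround  # surround enters with opposite polarity
    return lt, rt


def decode_2_4(lt, rt):
    """Passive decode: the sum recovers center, the difference recovers surround.

    Returns (L, C, R, S). Left and right are taken directly from Lt and Rt,
    so they still contain center and surround crosstalk.
    """
    return lt, G * (lt + rt), rt, G * (lt - rt)


# A center-only sample comes back at full level in C and cancels in S.
l_out, c_out, r_out, s_out = decode_2_4(*encode_4_2(0.0, 1.0, 0.0, 0.0))
```

The sketch also shows the main weakness of a passive matrix: a center-only signal still leaks into the decoded left and right channels at reduced level, which is why practical decoders add steering logic.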
1.1.6 "Baby Boom" And Stereo Surround
Improvements were also made to Todd AO's six-track, 70mm format, paving the way for
the 5.1 surround sound audio configuration that we know today. In 1977 Star Wars
introduced the "baby boom" channel to surround sound. This channel gave rise to
what we now refer to as the LFE channel, or Low Frequency Enhancement. Given the
space limitation of optical prints and the matrix of Dolby Stereo, the "baby boom"
was only available for 70mm-print showings of Star Wars. Using the left-center and
right-center print area of the six-track format, the creators of Star Wars were able to
add more low-frequency-only content to the soundtrack. Theaters equipped with
70mm print capabilities adopted this new practice quickly, and in just a few short
months had dedicated subwoofers installed for Close Encounters Of The Third Kind.
Star Wars had used the left-center and right-center speakers present in theaters from
the speaker setup required by Todd AO.
At this point in cinematic history, theaters were essentially playing 4.1 audio content:
the front left, center and right channels, a mono surround channel and a variation of
the modern LFE channel. In 1979, Superman: The Movie provided the first release
of stereo surround content since Cinerama. Again, this was only available on 70mm
prints of the film. The six-track Todd AO format continued to be of use. Low
frequencies were split from the left-center and right-center tracks and sent to the
subwoofers, while the primary bandwidth of the two tracks was filtered and sent to
stereo surround speaker arrays. A mono surround audio track remained on the
six-track print for those theaters not equipped with a stereo surround sound speaker array.
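The band-splitting described above (lows to the subwoofers, the remaining bandwidth to the surround arrays) can be approximated with a very simple crossover. The Python sketch below uses a one-pole low-pass filter and treats everything left over as the high band. It is only an illustration of the principle: the function name `split_bands`, the 120 Hz cutoff and the 48 kHz sample rate are assumptions, not values taken from the theatrical equipment of the period.

```python
import math


def split_bands(samples, cutoff_hz, sample_rate):
    """Split a signal into low and high bands with a one-pole low-pass filter.

    The high band is the input minus the low band, so the two bands
    always sum back to the original signal.
    """
    # Smoothing coefficient of a one-pole low-pass for the given cutoff.
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    low, high = [], []
    state = 0.0
    for x in samples:
        state += alpha * (x - state)  # low-pass output for this sample
        low.append(state)
        high.append(x - state)        # whatever the low-pass removed
    return low, high


# A constant (0 Hz) input ends up almost entirely in the low band.
dc_input = [1.0] * 48_000
low_band, high_band = split_bands(dc_input, cutoff_hz=120.0, sample_rate=48_000)
```

A one-pole filter has a very gentle slope, so a practical crossover would use steeper filters. The sketch is only meant to show how a single track can feed two different speaker groups.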
1.1.7 Digital Surround Sound For Film
Now that five main bandwidth channels had been introduced, along with a subwoofer
channel, it was time for engineers to find a way to provide this surround format to
all theater houses and not just those capable of 70mm print playback. Until this point,
audio had been printed in analog form, either optically or magnetically. Thus arose the
idea of accompanying film playback with digital sound. Engineers struggled to find a
way to code the data so that it could be printed on a film print. Linear PCM data, the
professionals' choice of editing format and the format chosen for stereo sound on
CD Audio discs, consumes far too much bandwidth to be considered a viable option.
In 1992, Batman Returns became the first theatrically released film to use the new
digital audio format known as Dolby Surround. This encoding technique is lossy.3
The Dolby Surround format results in encoded data in theaters with a bitrate of 320
kilobits per second. Audio data in its lossless, editable format of Linear PCM has a
bitrate of roughly 705.6 kbps per audio channel (16 bits at 44.1 kHz). Dolby Surround
places five main bandwidth channels and the LFE channel all within 320 kbps. Given that
the bandwidth of the LFE channel is 1/200th that of a main audio channel, it can be
represented decimally as 0.005 of a channel. Dividing 320 kbps by 5.005 channels shows
that each channel plays back at an average bitrate of roughly 63.9 kbps, varying with
the Dolby algorithm's judgment of which audio channels have the greatest need at any
given moment. Regardless, the quality of Dolby Surround exceeds that of the
frequency-limited, noise-plagued formats based on optical and magnetic analog printing.
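The arithmetic behind these figures can be checked with a short Python calculation. The variable names are illustrative, and the 16-bit, 44.1 kHz Linear PCM parameters are assumed from the CD Audio reference earlier in this section.

```python
SAMPLE_RATE_HZ = 44_100   # CD-quality Linear PCM (assumed)
BIT_DEPTH = 16            # bits per sample (assumed)
TOTAL_KBPS = 320.0        # theatrical bitrate quoted in the text
MAIN_CHANNELS = 5
LFE_WEIGHT = 0.005        # LFE counted as 1/200 of a main channel, per the text

# Uncompressed Linear PCM bitrate for one channel: 705.6 kbps.
pcm_kbps_per_channel = SAMPLE_RATE_HZ * BIT_DEPTH / 1000

# Average share of the 320 kbps budget per channel: roughly 63.9 kbps.
effective_channels = MAIN_CHANNELS + LFE_WEIGHT
avg_kbps_per_channel = TOTAL_KBPS / effective_channels

# How much smaller the encoded stream is than PCM, per channel: about 11 to 1.
compression_ratio = pcm_kbps_per_channel / avg_kbps_per_channel
```

In other words, each encoded channel receives on average about one eleventh of the data that the same channel would occupy as Linear PCM.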
3 "Lossy" is a term used to indicate that the digital audio data no longer matches
the size and quality of the original Linear PCM data.


Competition soon followed with the introduction of DTS and SDDS, both digital,
lossy formats also encoded on the film print. DTS has a slightly higher bitrate than
Dolby Surround and thus is usually preferred when a film is shown in theaters with an
option for Dolby Surround or DTS listening. SDDS is an audio format developed by
Sony and has the added bonus of being capable of 7.1 audio channels. Jurassic Park
was the first to introduce the DTS audio format in 1993, and in the same year Last
Action Hero was the first film in the SDDS format with 7.1 audio channels.
1.2 Surround Sound And Home Video
Surround sound for theatrical film showings evolved over 50
years. Once digital audio was a viable theatrical option, the natural next step was to
bring the surround sound experience to consumers' homes. It is here that the lack of
consumer education on surround sound impedes progress beyond
stereo toward properly set up and properly used home surround sound systems.
1.2.1 LaserDisc
The introduction of the LaserDisc format in 1995 ended as a failed attempt at a digital
home video and audio format for the consumer after just a couple of short years. The
format suffered from a lack of marketing, expensive components, large record-sized
media and a lack of commonplace electronics in the marketplace at the time, such as
surround sound home theater receivers and pre-packaged speaker systems. The
format was immediately replaced by the DVD-Video format in 1997.
1.2.2 DVD-Video, Dolby Digital, DTS
DVD-Video offered a more affordable and compact system to consumers. Like
LaserDisc, the format was full of untapped potential for consumers. Home theater
receivers were still absent from households. DVD players were sold a la carte,
resulting in a simple three-cable connection from the DVD player to the existing
television: one cable for video and two for stereo audio. DVD player manuals contained
minimal information on describing and setting up surround sound in the home. The
capabilities and even the existence of digital audio connections such as Toslink Optical
and S/PDIF4 were lost on the consumer. The S/PDIF connection looks like a
single, common RCA connection.
4 Toslink Optical uses a fiber-optic cable to deliver the lossy Dolby
Digital or DTS surround formats. S/PDIF, also referred to as Digital Coaxial, stands
for Sony/Philips Digital Interface Format and uses a single Digital Coaxial (a form of
RCA) connector to deliver the lossy, matrix surround formats.


There is no single reason why surround sound in the home was not adopted
immediately; however, a lack of education and available demos was a major factor.
As surround sound receivers and home-theater-in-a-box packages started to become
common household electronics, surround sound got a push forward. Equipment
manuals contained more detailed information on the connections available. For those
consumers interested, it was now much simpler to discover and connect a surround
sound system.
As discussed earlier, theaters primarily adopted the Dolby Surround digital audio
format for 5.1 playback. For home video releases, a form of Dolby Surround was
created for the DVD-Video specification. Dolby Digital is the most common audio
format present on DVD-Video consumer discs. DTS is also readily available for
inclusion on DVD-Video and has a much higher bitrate, up to 1536 kbps versus Dolby
Digital's 448 kbps.5 However, Dolby cornered the marketplace as the surround sound
format leader on DVD-Video.
1.2.3 Blu-ray And Lossless Audio Formats
Just as consumers began to get comfortable with the surround formats delivered on
their DVD-Video purchases, the next technological advancement brought more
confusion to the surround sound forefront. The leaps made by Blu-ray technology are
tremendous, but they brought much debate with regard to the audio formats
available, the hardware setup necessary, choosing the right settings, making the right
connections, and so on. As with the start of LaserDisc and DVD-Video,
Blu-ray initially lacked a solid set of instructions as to what the new surround audio
formats were and how to properly set them up. This caused much undue stress on the
part of the consumer.
Video quality aside, the Blu-ray format brought lossless audio formats to consumers,
giving them a way to experience film soundtracks at a quality that could
not be heard in theaters, given the theatrical use of data-compressed surround
formats.
The lossless audio on Blu-ray comes in one of three formats. The first is Dolby
TrueHD. This format has the capability of delivering uncompromised audio up to 7.1
audio channels.6 The second format is DTS-HD Master Audio, which is also capable
of delivering lossless, 7.1 audio. Lastly, Blu-ray has the ability to deliver Linear
PCM up to 7.1.
5 While theatrical systems using Dolby Surround are restricted to a maximum of 320
kbps, Dolby Digital on the DVD-Video format is capable of bitrates up to 448 kbps,
which is typically the chosen bitrate for DVD-Video releases.
Now that it is known that these three formats deliver uncompromised audio quality, the
next step is to understand how to connect the components in the system to properly
transmit the lossless data. Both Dolby TrueHD and DTS-HD Master Audio contain
their lossy counterparts embedded within the lossless stream. Without proper
wiring and settings, it is most likely that the lossy data is what is automatically being
sent to the speaker system.
Transmission of lossless audio from Blu-ray requires a minimum of HDMI version
1.1 for Linear PCM and a minimum of HDMI version 1.3 for transmission of the
bitstream7 formats. HDMI is the digital requirement for transmission of these three
formats. The other option is to use analog RCA component cables, a total of eight
needed for 7.1 audio, connected directly out of the Blu-ray player and into the home
theater receiver's multichannel analog input. Note that not all Blu-ray players contain
7.1 analog outputs, and those that do generally do not output the lossless audio stream
via the analog outputs, only the lossy data stream. The product's manual states how
the unit processes lossless audio when using the multichannel analog outputs.
Once the audio is transmitted from the Blu-ray unit to the receiver, it is necessary to
properly set up the receiver's processing. If sending the receiver the bitstream data, then
it is important to set the receiver to "bitstream" in the settings. If sending the receiver
the lossless Linear PCM audio stream, then the receiver should be set to "MultiCh" or
"LPCM". Generally, receivers now detect bitstream and LPCM automatically; however,
some manufacturers have left this as a manual select option.
After this description it is easy to see how numerous surround sound formats and their
complex setups can keep consumers from ending up with a properly set up surround
sound system. Matters are only further complicated when surround sound is applied to
the music industry.
6 In fact, Dolby TrueHD has the ability to deliver 14 discrete channels of audio;
however, given the current surround sound mixing formats and the lack of decoders for
more than 7.1 channels on the market, TrueHD is stated to be capable of 7.1 channels at
24 bits and 96 kHz, or 5.1 channels at 24 bits and 192 kHz.
7 Bitstream refers to the audio data as it is packed and encoded by DTS and Dolby.
These formats are not directly playable. They ultimately need to be converted to
Linear PCM for playback. However, it is more efficient to transmit the audio data as
bitstream from one device to another, decoding it to Linear PCM only at the end.
1.3 Surround Sound In Music
While cinema has provided the stage for the largest advancements in the field of
surround sound, multichannel audio for music tends to be relegated to more of a niche
market. Several of the technological formats created for cinema and home video
surround sound also apply to the delivery of surround sound for music without
picture. Therefore, there have been fewer steps in the delivery and advancement of
multichannel music than with cinema surround sound. Multichannel music's first
attempt lay in the invention of the Quadraphonic system. Later attempts used
various alterations of media developed for home video surround sound, but
to no avail.
1.3.1 Quadraphonic Development
Implementation of Quadraphonic, also known as Quad, began in the mid-1960s and
continued through the 1970s. This was the first home delivery format of surround
sound for music. However, the format was not without its problems, and the
novelty of the idea wore off.
Quad suffered from a lack of format consolidation. Consumers could possess Quad
recordings in any of more than eight different formats. In terms of vinyl formats,
there were three main varieties of a Quad matrix, each requiring its own decoding
system.8 Another format at the time was tape. In this area Quad included formats
such as the Q8 (Quad 8) or eight-track tape and the Q4, Quad Reel to Reel format,
which used 1/4-inch reel-to-reel tape machines with four discrete channels. As with
the digital surround formats of consumer home video technologies, quadraphonic
technologies lacked consumer education on the multitude of available formats.
Quad existed on principles of amplitude and phase matrices. The system employed
for Quadraphonic music was actually the basis for Dolby Stereo in cinema. The
amplitude/phase relationship was altered for Dolby Stereo to derive center and mono
rear channels rather than Quadraphonic's two front and two rear channels. The
format was also adjusted to be compatible with optical film prints.
8 Vinyl Quad formats include SQ matrix, QS matrix and CD-4 Discrete. Further
details on these formats are widely published.


Quad also arose at a time when the psychoacoustics of human hearing was not widely
understood by engineers. The use of four loudspeakers, one in each corner (see
Figure 1.1), results in several conflicts between the way in which we hear and the
physical setup. For instance, humans hear differently from front to back than from side
to side. Thus, sound emanating from a quad speaker system is not heard as perfect
stereo from the four sides. Psychoacoustic principles cause sounds that should come
directly from the middle of the side stereo channels to be perceived as originating
closer to the front stereo pair. When this phenomenon was coupled with a lack of
agreement on how quad mixes should envelop the listener, Quadraphonic had a
very difficult time creating a strong foothold in homes.
Finally, Quad was held back by consumers' reluctance to place two
additional speakers in the home. In addition, the lack of surround education for the
consumer in the 1970s, which continues today, partially kept the format from
becoming a household surround sound format.
Figure 1.1 Quadraphonic Speaker Layout
1.3.2 DVD-Audio And SACD
In the mid-1990s, the music industry once again attempted to deliver surround sound
to consumers, this time through use of the new DVD medium, along with an altered
version of the CD, called SACD or Super Audio CD. The failure of both of these
formats has been the subject of lengthy debates and will not be discussed in detail at
this time.
Since the specifications created for the DVD-Video format were not capable of
transmitting high-resolution surround sound audio, the DVD Consortium created the
DVD-Audio standard for the delivery of surround sound music without the
accompaniment of a film. The DVD-Audio format used exactly the same type of disc
as DVD-Video; however, the data was authored and contained in the AUDIO_TS
folder instead of the VIDEO_TS folder. Only select DVD players were capable of
playing the DVD-Audio format, one of the reasons why DVD-Audio failed as a
consumer music surround sound delivery system. DVD-Audio is capable of
transmitting up to 5.1 channels of audio at 16-bits, 96kHz when formatted as Linear
PCM.9 When using the MLP (Meridian Lossless Packing) format, DVD-Audio could
deliver up to 5.1 audio channels at 20-bits, 176.4kHz or 16-bits and 192kHz.
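The Linear PCM limit quoted above can be checked with the rule given in footnote 9 (bit depth times sample rate times channel count, with 5.1 counted as six channels). The following is a minimal sketch of that arithmetic; the function names are illustrative and not part of any DVD-Audio tool.

```python
# Raw Linear PCM bitrate = bit depth x sample rate x number of channels.
# 9.6 Mbps is the DVD-Audio ceiling stated in footnote 9.
DVD_AUDIO_MAX_BPS = 9_600_000


def lpcm_bitrate(bit_depth, sample_rate_hz, channels):
    """Return the raw Linear PCM bitrate in bits per second."""
    return bit_depth * sample_rate_hz * channels


def fits_dvd_audio(bit_depth, sample_rate_hz, channels):
    """True if the raw LPCM stream stays within the 9.6 Mbps ceiling."""
    return lpcm_bitrate(bit_depth, sample_rate_hz, channels) <= DVD_AUDIO_MAX_BPS


# 5.1 is computed with six channels because the LFE channel counts in full.
rate_16_96 = lpcm_bitrate(16, 96_000, 6)  # 9,216,000 bits per second
rate_24_96 = lpcm_bitrate(24, 96_000, 6)  # 13,824,000 bits per second
```

Under this rule, 16-bit, 96kHz 5.1 audio fits within the ceiling, while 24-bit, 96kHz 5.1 audio does not fit as raw Linear PCM.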
SACD was a format developed by Sony and Philips at the same time as DVD-Audio.
Unlike traditional digital audio workstations, which used Linear PCM as a raw audio
format, SACD was designed to use a format known as DSD, Direct Stream Digital.
DSD is a 1-bit system, sampling at a rate of 2.8224MHz, nearly 15 times that of the
192kHz sampling rate. Systems that natively recorded and edited the DSD stream
were expensive, so many of the SACD titles simply used Linear PCM audio encoded
as a DSD stream.
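The sampling-rate comparison above is simple arithmetic, sketched below. The relationship to the 44.1kHz CD rate is added here as a well-known property of DSD and is not taken from the text; the variable names are illustrative.

```python
DSD_RATE_HZ = 2_822_400       # 2.8224 MHz, one bit per sample
PCM_HIGH_RATE_HZ = 192_000    # highest PCM rate mentioned in the text
CD_RATE_HZ = 44_100           # CD sample rate (not from the text)

ratio_to_192k = DSD_RATE_HZ / PCM_HIGH_RATE_HZ  # 14.7, "nearly 15 times"
multiple_of_cd = DSD_RATE_HZ / CD_RATE_HZ       # 64.0

# With one bit per sample, the per-channel bitrate equals the sample rate.
dsd_bits_per_second_per_channel = 1 * DSD_RATE_HZ
```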
Neither format lasted more than a few years, with a limited number of titles released
on either format. The complexity of the setups and the cost of the hardware needed
for playback resulted in a lack of interest on the part of consumers. The formats may
have been more successful had they appeared after DVD-Video and Dolby Digital
surround sound connections became more widely popular with consumers, circa
2005.
9 DVD-Audio has a maximum bitrate of 9.6 Mbps. When using Linear PCM,
calculate the bitrate by multiplying together the bit depth, sample rate and number of
audio channels. Remember, in Linear PCM the .1, or LFE, channel of 5.1 counts as a
full bandwidth audio channel, thus 5.1 should be calculated using six as the number
of audio channels.
1.4 Expanded Surround Sound Layouts
Thus far the discussion of surround sound in film and music has assumed an
accepted standard of 5.1 audio channels. However, there is upward pressure for
the expansion of the number of audio channels to improve the accuracy of surround
sound playback. For the purposes of this thesis the expansion of 5.1 to 6.1 and 7.1
will be addressed, as they are the two formats most widely available to both
professionals and consumers.
1.4.1 5.1 Versus 6.1
The limitation of the 5.1 surround sound setup lies in the properties of
psychoacoustics. As already stated, humans do not hear sound emanating from the
front, back and sides in the same way. This is rooted in the shape of the head and the
principles of ILD (Interaural Level Difference) and ITD (Interaural Time Difference).
In an ITU standard 5.1 setup, in which the left/right speakers are
located at 30 degrees, the center at 0 degrees and the stereo surround speakers at 110
degrees (Figure 1.2), there are two primary concerns with accuracy.
Figure 1.2 ITU Standard 5.1 Speaker Layout
The first concern is the lack of an originating device for the sides. Sounds that are
meant to come from the sides of the listener must be phantom imaged between a
surround speaker and a front speaker. This leads to unstable image placement and
variations in the frequency response of the sound source. The second issue lies in
the lack of a smooth transition of sound in the rear soundfield. A front center speaker
is necessary to anchor the center of the sound in the front soundfield. However, in 5.1
there is no rear center speaker to create a stable rear sound image when panning
sound from one rear speaker to another or when positioning a single sound in the rear
center. As with side sounds, the rear center is a phantom image. This latter
problem of the rear soundfield is mitigated with the introduction of 6.1.
A 6.1 speaker layout is the same as 5.1 in terms of speaker angles, but with the
addition of a single main loudspeaker to the rear center (Figure 1.3). This helps to
provide a more stable rear soundfield. Despite this fix, the side vacancy problem still
remains. This is where 7.1 provides a solution.
Figure 1.3 ITU Standard 6.1 Speaker Layout
1.4.2 5.1 Versus 6.1 Versus 7.1
While a standard for 7.1 surround sound has not been solidified as of yet, various 7.1
speaker layouts do exist and it is important to understand the concept. The most
widely accepted 7.1 speaker layout, the one utilized by professional film mixers and
theater playback systems, is known as 7.1 Cinema, and corrects the issue with sounds
emanating from the sides of listeners.
The layout of 7.1 Cinema follows the same guidelines for the front three speakers as
5.1 and 6.1. The changes occur in the rear soundfield setup. In the configuration of
7.1 Cinema, the normal surround channels of a 5.1 setup are shifted forward to a 90
degree position, directly to the sides of the listeners. Then, two additional speakers
are placed in the rear anywhere from 135 degrees to 150 degrees, the latter being
the most commonly used position. See Figure 1.4. This rectifies to some degree the
side issue. However, what 7.1 takes away that was offered by 6.1 is the mono rear
center speaker. So, once again the rear image becomes unstable. However, based on
human hearing sensitivity, it is more important to obtain coherency between the front
and rear soundfields than rear image placement.
Figure 1.4 7.1 Cinema Speaker Layout
The second popular 7.1 speaker layout is known as 7.1 Music (Figure 1.5). This
layout leaves the standard 5.1 surround speakers in place and adds two surround back
channels. This still leaves a phantom image in the rear center, but provides better
compatibility with the playback of 5.1 surround content.
Expanded speaker rigs exist to offer complete sound immersion, such as an ambisonic
16-channel speaker setup and NHK's 22.2 system. However, neither is practical as
a home surround system, and neither has any bearing on this project at this time.
Figure 1.5 7.1 Music Speaker Layout
1.5 Existing Surround Expansion Techniques
Surround sound has had its primary success in movie theaters. In matters of
home delivery, consumers have faced numerous obstacles over the years in the
development of surround sound. As demonstrated in the section "Blu-ray and
Lossless Audio Formats," consumers encountered numerous challenges in hardware
and connection setup. As shown in the sections "Quadraphonic Development" and
"DVD-Audio And SACD," consumers have been thrown a great number of surround
sound delivery formats, which served only to confuse understanding of the surround
concept and failed to yield a single, accepted delivery medium. Despite the obstacles in
front of consumers, when it comes to integration of surround sound in the home, the
ability for individuals to create their own surround sound experience no matter the
source material could be a critical way in which to solidify multichannel setups in
residences. Currently, there are several consumer upmixing10 techniques available,
all of which are based on the home theater audio/video receiver.
10 Upmixing is a term referring to the conversion of source material to a greater
number of audio channels based on algorithmic or other processes. Usually,
upmixing refers to the expansion of stereo content to a specified surround sound
format, typically 5.1.


1.5.1 Dolby ProLogic, DTS Neo, THX
As discussed thus far, the materials presented in surround sound have been natively
recorded, edited and mixed in the various surround formats. However, there is more
non-surround audio content in existence than native surround content. The ability to,
in real-time, manipulate stereo sound sources to pseudo-surround is a prospect that
has been adopted by home theater receiver manufacturers in conjunction with sound
laboratories.
Dolby Laboratories has created the best-known upmixing algorithm,
implemented in every surround receiver for the past decade. Dolby ProLogic, which has
gone through a variety of upgrades from ProLogic to ProLogic II, IIx and IIz, gives
consumers the ability to listen to stereo material that has been flagged as Dolby
encoded as a matrixed surround experience instead of two channels. Regardless of
the Dolby-encoded metadata flag, the ProLogic decoder is capable of creating a
surround mix of any stereo material, but with varying results.
DTS soon followed with their own algorithmic formula, which they dubbed DTS
Neo. Various presets exist such as DTS Neo: Cinema, Music, and Game, each
providing a slightly different balance of audio in the surround sound system.
With the expansion of home theater systems to include 6.1 and 7.1, the algorithms
have progressed to allow expansion of stereo content up to 7.1, as well as
manipulation of 5.1 content to a 7.1 layout. THX, a certified standard created by
Tomlinson Holman, has been implemented into THX-certified receivers to allow
these newer surround sound conversions.
1.5.2 Surround Upmixing For Professionals
Upmixing of existing stereo content is not limited to consumers. There are a few
professional tools available for creators to expand their creations to surround sound
without having to perform a complete remix of the original source material. Most
notable is TC Electronic's Unwrap, a plug-in designed for Pro Tools|HD systems.
The plug-in uses algorithms to extract center and surround channels from stereo
content. However, this plug-in is currently limited to 5.1 upmixing. In addition, the
expense of the software and hardware system makes it a less feasible option for
independent professionals.
1.6 Media Center PC Integration Trend
It is becoming more and more common for home theaters to be controlled via an
integrated media center PC system. Essentially, one unit replaces the need for a
receiver, disc player and set-top box. With the power of a computer integrated into
home surround sound setups, it becomes possible for consumers to create their own
surround experiences based on their listening preferences and surround speaker setup.
As this trend will undoubtedly continue given the popularity of digital downloads of
music, movies, and TV shows, in addition to the rise of streaming video services, a
need for expansion and customization of existing mono/stereo sounds into
multichannel formats can be argued. Hydra is a plug-in designed to fulfill this need
for consumers, professionals and hobbyists.
2. Impulse Responses
Hydra's fundamental working principle is impulse response convolution. Audio
signals are convolved with the plug-in's library of impulse recordings made at various
locations. Despite its basis in convolution, Hydra differs from typical reverb
convolution plug-ins in multiple ways. First, the impulse responses are recorded in a
near-field environment where direct sound and early reflections are the primary
components of the impulse response. Second, whereas typical impulse responses for reverb
convolution systems are recorded using spaced mono and stereo microphone
configurations for ambient pickup, Hydra uses recordings made using specific
surround sound microphone pattern techniques. These two guidelines allow
the plug-in to deliver a more direct sound, removing the need to mix a dry and a
wet signal to achieve the final product. Hydra simply convolves the original audio
signal and relies on the resulting signal as the only usable audio signal.
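The convolution principle described above can be sketched in a few lines. This is a minimal illustration that assumes each output channel has its own impulse response and that only the convolved (wet) signal is kept; the channel names and sample values are hypothetical and are not taken from Hydra's library.

```python
import numpy as np


def convolve_to_channels(dry, impulse_responses):
    """Convolve one dry signal with one impulse response per output channel.

    Returns a dict of fully wet signals; no dry signal is mixed back in.
    """
    return {name: np.convolve(dry, ir) for name, ir in impulse_responses.items()}


# Hypothetical four-sample input and two short impulse responses.
dry = np.array([1.0, 0.5, 0.0, -0.5])
irs = {
    "left_surround": np.array([0.0, 0.8, 0.2]),
    "right_surround": np.array([0.0, 0.6, 0.3]),
}
wet = convolve_to_channels(dry, irs)
```

Each output is longer than the input by the length of the impulse response minus one sample, which is the tail the room response adds to the signal.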
2.1 Surround Microphone Patterns
To create flexibility in the manipulation of audio signals in the plug-in, the impulse
responses were recorded using documented surround sound microphone patterns.
The surround microphone layouts and microphone pick-up patterns allow for the
simulation of surround recordings within a particular space given a stereo input signal
into the plug-in. The impulse responses of six different surround configurations are
used in Hydra, and are as follows.
2.1.1 Rear-Facing Cardioids
The first chosen surround pattern only consists of two audio channels. The Rear-
Facing Cardioid pattern uses two microphones both with a cardioid pick-up pattern,
placed opposite the sound source and spaced horizontally to create a left/right stereo
effect (Figure 2.1). The distance between the left and right microphones is variable
and is determined by the recording engineer's preference for the width of the resulting
impulse responses, as well as by the sound source setup. The wider the microphones
are spaced from each other, the greater the stereo separation, at the cost of a weaker
phantom image. As will be discussed in section 6.3.1, the orientation of the sound
source greatly impacts the stereo separation of the Rear-Facing Cardioid pattern. The
signal received by these microphones will be assigned as surround channels, thus it is
important for the microphones to be oriented with the capsules facing away from the
sound source so as to capture a more ambient signal. The distance between the
microphones and the sound source is variable and depends on the reverb time of the
room and the desired directness of the recorded impulse responses. The Rear-Facing
Cardioid pattern is generally elevated in a room in order to receive strong reflections
over a dominant direct sound.
Figure 2.1 Rear-Facing Cardioid Pattern (user-specific distances)
2.1.2 OCT-2
OCT-2 stands for Optimized Cardioid Triangle. The "2" simply designates a
difference in the spacing of the microphones compared to the original OCT
documentation. In the OCT pattern three microphones are used to capture audio that
is akin to a left/center/right configuration. The setup uses one microphone with a
cardioid pick-up pattern and two microphones with a hypercardioid pattern. The
three microphones are positioned in a manner that yields various recording angles, all
within the 180 degrees of a triangle, hence the name (Figure 2.2).
The center microphone is always spaced further ahead of the side microphones to
provide greater de-correlation from the left and right sound images. In the OCT-2
pattern this distance is 40 centimeters, versus only eight centimeters in the original
OCT pattern. The left and right microphones vary in their position, mainly in
distance. It is generally agreed the azimuth angle of the side microphones from the
center should be 90 degrees given the polar pattern of a hypercardioid microphone. It
is important that the chosen microphone have a smooth sensitivity response. The
OCT setup has also been tested with the side microphones having an azimuth angle
less than 90 degrees. However, this leads to more of a mono-sounding result the
closer the azimuth comes to zero (completely forward-facing). By placing the side
microphones at 90 degrees the center-facing microphone becomes the dominant
figure for sounds that fall between the side and center microphone positions,
providing the best channel separation. The distance between the left and right
microphones is what creates the different recording angles. The minimum distance
between the two microphones should be 40 centimeters, which creates a 160-degree
recording angle. The largest distance between the microphones should be 90
centimeters, yielding a 90-degree recording angle. The larger the recording angle the
less stereo separation.

Figure 2.2 OCT-2 Pattern
2.1.3 IRT Cross
Both of the previous microphone configurations are only part of a complete surround
configuration, with Rear-Facing Cardioids capturing only surround channels and
OCT-2 only capturing the front three channels. IRT Cross is the first of the surround
patterns chosen to capture both front and rear channels.
IRT simply stands for the Institute Of Radio Technology, so it may come as no
surprise that the IRT Cross pattern was developed as a pick-up pattern for ambient
sounds in broadcasting. The IRT Cross utilizes four cardioid microphones configured
in a symmetrical arrangement. The distance between adjacent microphone
capsules should be 25 centimeters, with the four capsules positioned at the corners of
a square and each facing outward along a diagonal, at 45 degrees to the adjacent
sides. When any two adjacent microphones are
combined the resulting recorded angle is 90 degrees as shown in Figure 2.3.
Therefore sound captured through the IRT Cross pattern is conducive to a four-
channel, square speaker setup, much like that of a quadraphonic system. While none
of the ITU standard surround sound speaker configurations conform to this standard,
the sound still translates to a 5.1 speaker rig when using primarily stationary sound
sources. An IRT recording that contains a sound source moving in a circle
around the four microphones will result in poor side directionality when played on a
5.1 speaker system. This is due to the psychoacoustics of human hearing at side
positions as discussed in the section on Quadraphonic Development.
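The IRT Cross geometry described above can be written out as coordinates. The sketch below assumes a square centered on the origin, with x pointing right and y pointing toward the sound source, and azimuths measured clockwise from the front; the function and key names are illustrative.

```python
import math

SIDE_CM = 25.0  # spacing between adjacent capsules


def irt_cross_layout(side_cm=SIDE_CM):
    """Return capsule positions (cm) and aim azimuths (degrees) for the square."""
    half = side_cm / 2.0
    layout = {}
    for name, azimuth in (
        ("front_right", 45),
        ("rear_right", 135),
        ("rear_left", 225),
        ("front_left", 315),
    ):
        rad = math.radians(azimuth)
        # Each capsule sits at the corner it points toward.
        x = math.copysign(half, math.sin(rad))
        y = math.copysign(half, math.cos(rad))
        layout[name] = {"position_cm": (x, y), "aim_deg": azimuth}
    return layout


layout = irt_cross_layout()
front_pair_spacing = math.dist(
    layout["front_left"]["position_cm"], layout["front_right"]["position_cm"]
)
```

Adjacent capsules end up 25 centimeters apart and their aim directions differ by 90 degrees, matching the recorded angle stated for any adjacent pair.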
Figure 2.3 IRT Cross Pattern
2.1.4 Hamasaki Square
The Hamasaki Square is another pattern that results in a four-channel output. As
compared to the IRT Cross, the Hamasaki Square implements a larger recorded
soundfield along with a different set of microphone polar patterns. The Hamasaki
Square uses four microphones set to a figure-8 pattern and positioned in a square with
side lengths measuring between two and three meters. See Figure 2.4. The height of
the microphones is negotiable, but generally they are elevated to remove the direct
reflections from the floor's surface. The null point, or side, of each microphone faces
the sound source. Much like Rear-Facing Cardioids, this pattern receives more
reflective than directional sound. Due to the polar patterns of figure-8 microphones
and the larger area covered by the setup, the Hamasaki Square has much better
directionality over a 5.1 loudspeaker system when using moving sound sources.
Figure 2.4 Hamasaki Square Pattern (2-3 m square)
2.1.5 Double M/S
Up to this point the microphone patterns discussed have all been setups lacking a
complete surround sound configuration by today's standard of 5.1 or greater. Thus
far the patterns have captured two, three or four channels. Double M/S is the first
pattern in Hydra to result in the five main channels of a 5.1 surround configuration.
Double M/S, which stands for Double Mid/Side, is unique in other ways as well
compared to the other patterns chosen for this plug-in, namely its use of only three
microphones and the use of matrix coefficients to derive the individual audio
channels.
In stereo music recording in the studio it is common for an engineer to use a mid/side
microphone pattern to capture stereo audio. The mid/side pattern uses the addition of
positive and negative signals recorded through one figure-8 microphone and one
cardioid microphone. The Double M/S pattern, shown in Figure 2.5, was created to
expand the mid/side stereo technique to surround sound recording using only three
microphones. One figure-8 microphone is used as the common sideways-facing
microphone for each of the two mid microphones. One cardioid microphone is
placed above the sideways microphone and is oriented forward, toward the sound
source. A second cardioid microphone is placed below the sideways microphone and
faces the rear, away from the sound source. It is important to place the capsules of
the three microphones as close together as possible as the principle of Double M/S
relies solely on the difference in signal levels among the three microphones.
Minimizing the time difference among the microphones will help improve the
accuracy of the final recording. The distance of the microphone setup from the sound
source varies depending on the desired directness of the recorded sound.

Figure 2.5 Double M/S Pattern (front-facing cardioid above the figure-8)
The other microphone configurations result in one audio channel per microphone.
With Double M/S a set of coefficients is applied to each of the recorded signals and
summed together to create the five main channels of a 5.1 surround setup. Each of
the audio signals created through the matrix has a virtual hypercardioid pattern to
provide the best separation among the audio channels. Various coefficient matrices
exist and Hydra implements several of them, which will be discussed in more detail
in section 3.2.2.
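One way such coefficients can be derived is sketched below. It is an illustration only and does not reproduce the matrices implemented in Hydra. It assumes ideal capsules (front cardioid 0.5(1 + cos θ), rear cardioid 0.5(1 - cos θ), figure-8 sin θ with its positive lobe to the left), a virtual hypercardioid of the form 0.25 + 0.75 cos(angle), and hypothetical aim angles borrowed from the ITU 5.1 speaker positions.

```python
import math

HYPERCARDIOID = 0.25  # omni fraction a in the pattern a + (1 - a) * cos(angle)

# Hypothetical virtual microphone azimuths in degrees (positive = left).
CHANNEL_AZIMUTHS = {"L": 30, "C": 0, "R": -30, "LS": 110, "RS": -110}


def double_ms_coefficients(azimuth_deg, a=HYPERCARDIOID):
    """Return (front, rear, side) gains for a virtual mic aimed at azimuth_deg."""
    phi = math.radians(azimuth_deg)
    front = a + (1 - a) * math.cos(phi)
    rear = a - (1 - a) * math.cos(phi)
    side = (1 - a) * math.sin(phi)
    return front, rear, side


def ideal_capsule_gains(source_deg):
    """Gains of the three ideal capsules for a source at source_deg."""
    theta = math.radians(source_deg)
    return (0.5 * (1 + math.cos(theta)),
            0.5 * (1 - math.cos(theta)),
            math.sin(theta))


def virtual_mic_gain(azimuth_deg, source_deg):
    """Gain of the matrixed virtual microphone for a source at source_deg."""
    coeffs = double_ms_coefficients(azimuth_deg)
    capsules = ideal_capsule_gains(source_deg)
    return sum(c * s for c, s in zip(coeffs, capsules))


matrix = {ch: double_ms_coefficients(az) for ch, az in CHANNEL_AZIMUTHS.items()}
```

Under these assumptions each row of `matrix` weights the front, rear and side signals so that the sum behaves like a hypercardioid aimed at the chosen azimuth, with unity gain on axis and a null roughly 109 degrees off axis.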
2.1.6 Modified Decca Tree
The final surround microphone configuration used in Hydra is a modification of the
Decca Tree pattern used most commonly in orchestral recording. In orchestral
recording a set of three microphones, typically with cardioid and omni-directional
patterns, is placed above and behind the conductor's head to capture the full sound of
the orchestra. These are then combined with spot microphones on various sections of
the orchestra. It is also common for outrigger microphones to be recorded and
blended with the mix to create a wider stereo field.
In the Modified Decca Tree pattern, various elements of the original setup are used in
conjunction with additional microphones to create a complete 7.1 surround sound
simulation. The pattern consists of using one omni-directional microphone, four wide
cardioid microphones and two hypercardioid microphones placed in specific positions
relative to one another. There is no set standard for the placement of a Modified
Decca Tree configuration, and thus experimentation with different microphone
patterns will yield varying results. For example, two of the four wide cardioid
microphones could be replaced with hypercardioid patterns to improve spatial
imaging in the rear soundfield.
For simplicity, the setup here is described using one unit as the basis for the
measurement of distance, as shown in Figure 2.6. The omni-directional microphone,
which could also be of a figure-8 pattern, is placed in the center position and further
forward than all other microphones. This will act as the center channel of a 7.1
configuration. Two wide cardioid patterned microphones, to function as the main left
and right channels, are placed one unit back from the center microphone, and are
spaced equally apart (one unit between them or one half of one unit from one
microphone to the centerline), creating a triangle among the three microphones thus
far. The two cardioids are positioned so the front of the capsule is facing directly at the
sound source. Two hypercardioid microphones are each positioned one half of one
unit further outside of the left/right microphones. The distance between a
hypercardioid microphone and the centerline should be one unit. The azimuth angle
of the capsule position can vary. These recorded signals will function as the side or
surround channels of a 7.1 configuration. Positioning the capsules at a direct 90
degrees from the center position will provide excellent side speaker signals.
However, for flexibility the capsules should be turned closer to 110 degrees of the
center position, equating more to the surround speaker positions. The final two
microphones of the pattern are again wide cardioids and are placed facing directly
opposite the sound source and spaced one half of one unit further back from the front
left/right line of microphones. These final signals will correspond to the surround
back channels of a 7.1 speaker system. The distance between the final two
microphones can be adjusted to meet the needs of the recording engineer. The closer
together they are placed the more the output will recreate a 6.1 speaker system with a
mono rear center channel. The further apart they are placed the greater the stereo
separation of the surround back channels, but the more they will share in common
with the hypercardioid microphone signals. To improve the spatial separation in the
rear soundfield it is suggested to use hypercardioid microphones for all four of the
rear signals.
Figure 2.6 Modified Decca Tree Pattern
The Modified Decca Tree pattern is the only surround pattern used in Hydra that can
stand alone as a full seven-channel system. Using the various surround patterns in
Hydra is discussed in chapter five.
2.2 Impulse Response Recording Setup
The audio signals to create the impulse response files for use in Hydra were recorded
in Studio G of the Arts building at the University of Colorado Denver Downtown
Campus. The room was chosen for practicality and its offering of the conditions
needed for the recording of the files.
2.2.1 The Room
As stated earlier, Hydra is not a convolution program for use as a reverb plug-in.
While the convolution process is similar, the difference lies in the recorded impulse
response files. Typically, impulse responses meant for reverb are recorded in a way
that captures delayed reflections or the ambient nature of a room, most commonly
churches, cathedrals and other large venue spaces. Studio G was chosen due to its
low reverb time, allowing for the signals convolved through the recorded impulse
responses to retain more of the direct nature of sound waves. This makes Hydra
useable as a direct sound conversion program.
Studio G is a tracking room adjacent to Studio J, the control room. The room is
equipped with a high, non-conforming ceiling, which measures on average 23 feet
from floor to ceiling around the perimeter, and 12 feet to the ceiling acoustic
treatment in the center of the room. The height of the ceiling keeps dominant early
reflections from reaching the microphones. The room is irregular in
width and length due to the presence of the recording glass, as well as the presence of
non-perpendicular walls. This is ideal to limit standing waves in the room, and has
minimal impact on the balance between the left and right microphone setups in the
room. On average the length of the room is 25 feet. The width of the room where
recording glass is not present is 20 feet. The irregular wall shape consists of four-foot
segments, slightly angled from one another. There are 16 wall segments in total. The
recording glass creates an out cove in the tracking room with dimensions roughly
eight feet wide by four feet tall.
With the standard room setup, being a tracking room, the impulse response recordings
would be too direct. Baffles with a wood face were positioned randomly
around the room to slightly increase the reflectivity of the environment.
2.2.2 The Speakers
The impulse response signals were recorded using a sine-sweep method.11 To
function as the sound source for the sine sweeps, two Mackie MR8 reference studio
monitors were set up in Studio G. The two Mackie monitors were positioned as a
stereo pair and angled in the room to be non-parallel with any wall. This helps to
reduce the possibility of the speakers emanating sound in a plane that has standing
waves between any parallel surfaces.
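For readers unfamiliar with sine-sweep measurement, the general idea can be sketched as follows. The text here does not specify the sweep type or its parameters, so the exponential sweep, the inverse filter and every value below are assumptions chosen for illustration, not the settings used in Studio G.

```python
import numpy as np


def exponential_sweep(f_start, f_end, duration_s, sample_rate):
    """Return (time axis, sweep) for an exponential sine sweep."""
    n = int(round(duration_s * sample_rate))
    t = np.arange(n) / sample_rate
    rate = np.log(f_end / f_start) / duration_s
    phase = 2.0 * np.pi * f_start * (np.exp(rate * t) - 1.0) / rate
    return t, np.sin(phase)


def inverse_filter(t, sweep, f_start, f_end, duration_s):
    """Time-reversed sweep with a decaying envelope to flatten its spectrum."""
    rate = np.log(f_end / f_start) / duration_s
    return sweep[::-1] * np.exp(-rate * t)


def recover_impulse_response(recorded, inverse):
    """Convolve the recording with the inverse filter to get the response."""
    return np.convolve(recorded, inverse)


# Hypothetical parameters, kept small so the example runs quickly.
FS = 8000
t, sweep = exponential_sweep(100.0, 3000.0, 0.5, FS)
inverse = inverse_filter(t, sweep, 100.0, 3000.0, 0.5)

# Stand-in for a room: the sweep arrives 10 samples late at half level.
delay = 10
recorded = np.concatenate([np.zeros(delay), 0.5 * sweep])
response = recover_impulse_response(recorded, inverse)
peak_index = int(np.argmax(np.abs(response)))
```

In this toy case the recovered response peaks at the sample corresponding to the 10-sample delay, offset by the sweep length minus one; a real room recording would show the direct sound followed by its reflections instead of a single peak.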
The speakers were placed on On-Stage monitor stands with height and azimuth
adjustability. Both monitors were aligned to the same position on the monitor stand
top plate, and both stands were aligned to be flush to a starting line, ensuring that each
speaker was at the same distance on both the x and y-axis. The stand pins were used
to place both speakers at the same height on the z-axis.
The monitors were spaced 72 inches apart, measured from the center of each tweeter.
The elevation of each speaker is such that the tweeter is located 62 inches from the
ground. During setup of the monitors an idea was implemented to record the impulse
response signals in two different monitor positions. The first position is at 180
degrees to each other, or zero degrees, facing straight on the monitor stands. The
second position is toed-in, the position in which monitors are placed when in a
surround configuration. The monitor stands were marked with a position indicator
for zero degrees and 30 degrees from the center position to ensure the monitors
could be turned back and forth and remain the same relative to the initial setup. The
purpose for recording signals with the monitors in these two positions is discussed in
further detail in chapter three.
11 See section 2.2.4
2.2.3 The Microphones
As discussed earlier in the chapter, Hydra uses impulse responses created based on
audio signals recorded through documented surround sound microphone
configurations. The specifics of these patterns as pertaining to the Studio G setup are
discussed herein. The height of the microphone pattern remains relative to the height
of the monitors used as the sound source. Thus, the capsule of each microphone
is situated at the same height as the tweeter of the loudspeaker unless otherwise
noted. In this particular case the height of each loudspeaker's tweeter is 62 inches.
2.2.3.1 Rear-Facing Cardioids
The setup of the Rear-Facing Cardioid pattern consists of two microphones in a
spaced-stereo array. This particular setup uses two Neumann TLM193 large
diaphragm condenser microphones, with the capsules facing away from the Mackie
monitors. The TLM193s are spaced 64 inches apart, measured capsule to capsule.
Each of the microphones is also placed 64 inches from its respective speaker.12 By
coincidence this formed an isosceles triangle between each speaker and the two
cardioid microphones. The cross-distance, the distance between a speaker and the
microphone diagonally located from it, measures just over 90 inches. The distances
between the monitors and the microphones remain roughly the same regardless of the
monitors being in the straight or toed-in positions.
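The cross-distance can be estimated from the other measurements. The sketch below assumes the speakers and microphones are at the same height and arranged symmetrically about a shared centerline, which the text implies but does not state outright; the variable names are illustrative.

```python
import math

SPEAKER_SPACING_IN = 72.0      # tweeter to tweeter
MIC_SPACING_IN = 64.0          # capsule to capsule
SPEAKER_TO_OWN_MIC_IN = 64.0   # each speaker to the microphone on its own side

# Sideways offset between a speaker and the microphone on the same side.
same_side_offset = (SPEAKER_SPACING_IN - MIC_SPACING_IN) / 2.0   # 4 inches

# Front-to-back distance between the speaker line and the microphone line.
depth = math.sqrt(SPEAKER_TO_OWN_MIC_IN ** 2 - same_side_offset ** 2)

# Sideways offset between a speaker and the diagonally opposite microphone.
cross_offset = (SPEAKER_SPACING_IN + MIC_SPACING_IN) / 2.0       # 68 inches

cross_distance = math.hypot(cross_offset, depth)
```

Under these assumptions the cross-distance works out to roughly 93 inches, in line with the "just over 90 inches" measured in the room.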
12 When facing toward the source monitors the left speaker corresponds to the
microphone that is on the rear left, what will become the signal for the left surround
speaker. The same goes for the right speaker and rear right microphone.
2.2.3.2 OCT-2
The setup of the OCT-2 pattern consists of three microphones: one Shure KSM44
with the pattern set to cardioid and two AKG 414B microphones with the pattern set
to hypercardioid. See Figure 2.7. Using the measurement guidelines of the OCT-2
configuration, the Shure microphone is positioned 40 centimeters, or 15.75 inches,
ahead of the line containing the left and right AKG microphones. This particular
setup utilizes a 90-degree recording angle and thus the AKG microphones are situated
90 centimeters, or 35.5 inches, from capsule to capsule.
Figure 2.7 OCT-2 Recording Setup
In both the toed-in and straight speaker positions, the center microphone is located
64-inches from both the left and right monitors relative to the tweeter. The distance
between each monitor and its respective microphone is 88-inches. Once again, the
difference in distance between the toed-in and straight monitor positions is negligible.
This will be the case for all of the subsequent microphone patterns as well.
2.2.3.3 IRT Cross
The IRT Cross pattern calls for four microphones in a cardioid pattern (Figure 2.8).
This setup uses two Shure KSM44 and two Neumann TLM193 microphones. Ideally,
all four microphones should be a matched set. However, due to a lack of availability,
it was decided to pair these two particular sets together.
The IRT Cross uses a square array with the capsules of any two adjacent microphones
spaced 25 centimeters, or 10 inches apart. The KSM44 microphones are used as the
front-facing pair of the square array, pointed at 45 degrees from zero degrees center,
or 45 and 315 degrees. The Neumann TLM193s are positioned as the rear-facing
microphones in the array due to a greater sensitivity of the diaphragms. These
microphones mirror the front-facing KSM44s, thus they are situated at 135 degrees,
or 135 and 225 degrees. The distance between the front microphones and the
respective monitors is 61 inches. By way of simple arithmetic, the rear-facing
microphones are at a distance of 71-inches from the respective monitors.
Figure 2.8 IRT Cross Recording Setup
2.2.3.4 Hamasaki Square
The Hamasaki Square uses four figure-8 pattern microphones. Once again, it is best
to use a matching set of microphones. However, this setup uses two Shure KSM44
and two AKG 414B microphones, all set to a figure-8 pattern as seen in Figure 2.9.
The KSM44s are placed in the front of the square array and the 414Bs are placed in
the rear of the array. The array has equal side lengths of two meters, or 78 inches.
The front of the array is spaced 48 inches from the Mackie source monitors. The
negative side of each microphone is positioned such that it faces the negative side of
the microphone directly opposite itself. The positive sides of the microphones are
directed toward the outside of the square array. The source monitors are directed at the
null of the figure-8 pattern. The Hamasaki Square is the only exception to the
height standard. The four figure-8 microphones are elevated to a position roughly
two feet above the tweeter of the loudspeakers. This is done in an effort to capture
slightly more ambient sound.
Figure 2.9 Hamasaki Square Recording Setup
2.2.3.5 Double M/S
The simplest of the microphone pattern setups in this project is that of the Double
M/S pattern. All signals are derived from three microphones, all of which are
positioned next to each other, reducing the number of measurements that need to be
calculated compared to the other setups.
The Double M/S pattern uses two cardioid microphones and one figure-8. Here, all
three microphones are Shure KSM44s, with one set to a figure-8 pattern as shown in
Figure 2.10. As per the Double M/S pattern basis, the microphones are positioned as
close as possible to one another to minimize time differences. All three microphones
are positioned at a distance of 64 inches from each speaker, allowing the
microphones to receive direct sound from each of the monitors at the same time.
Reflected sound to the microphones will of course vary depending on the room
layout. The top cardioid microphone faces forward, toward the sound source. The
middle figure-8 microphone is oriented so the positive and negative capsules face
90 degrees relative to the sound source. The final cardioid microphone is placed on the
bottom of the setup and faces the rear of the room, away from the sound source. The
height of the Double M/S unit is set so the figure-8 microphone is horizontally
in line with the tweeter of the loudspeakers.
Figure 2.10 Double M/S Recording Setup
2.2.3.6 Modified Decca Tree
The most complex of the microphone pattern setups is that of the Modified Decca
Tree, requiring the greatest number of microphones and the most calculations. Seven
microphones with various polar patterns are used in this setup (see Figure 2.11). As
described in section 2.1.6, each of the microphones is specifically placed in relation to
one another. In this case the basis for one unit equals one meter, or 39.5 inches.
It is best to begin with the setup of the center microphone to establish a reference
point for the remaining microphones. The center microphone is positioned exactly
64 inches from both the left and right Mackie monitors, and the capsule of the
microphone is angled precisely in the middle of the two speakers. These steps are
important to ensure that the resulting signal will correspond directly to the center sound
source. Positioning all microphones relative to the center microphone will keep the
balance of left to center to right intact. The center microphone is a Shure KSM44
with an omni-directional polar pattern.13
Figure 2.11 Modified Decca Tree Recording Setup
The next two microphones to place are the left and right positions. In this setup both
microphones are KSM44s with a cardioid polar pattern. First, the imaginary line for
the microphones must be determined. Since one meter is the basis, that is the
distance behind the center microphone at which the left and right positions should be
located. The left and right positions should be one unit, or 39.5 inches, apart, forming
an isosceles triangle with the center microphone. To accomplish this, the left and
right positions are each measured one half of one unit from the center point located one
unit directly behind the center microphone position.
The outrigger surround microphones are positioned next. This setup uses
two AKG 414B microphones set to a hypercardioid polar pattern. The distance
between the left surround and right surround microphones should be two units, or in
this case 78.75 inches. The easiest way to position these microphones is to measure
one half of one unit outward from each of the left and right microphones. This will place the
two microphones two units apart since the left and right microphones are
positioned one unit apart. The angle of these microphones is open to interpretation.
13 Some recording engineers choose to use a figure-8 pattern for the center
microphone to improve separation and directionality of the overall pattern.
For this particular setup the microphones are angled slightly past 90 degrees of the
center position.
The final two microphones used in this pattern are Neumann TLM193s with a
cardioid polar pattern. These microphones are positioned one half of one unit further
back than the line containing the left and right microphones. The distance between
the final two microphones can vary depending on desired separation or width of the
stereo pair. These microphones function as the surround back signals. In this setup
the microphones are positioned one unit apart and one half of one unit back. This
allows for a more coherent rear center image, as opposed to the width gained by
positioning the microphones greater than one unit apart.
The most accurate way to space the microphones would be to create a single
microphone stand with cross bar and sliding microphone clip positions. However, the
setup can be accomplished using individual microphone stands carefully placed and
angled to fit in the specified array.
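The placement rules for the Modified Decca Tree can be summarized as a set of coordinates. The following Python sketch is illustrative only: the coordinate system (center microphone at the origin, negative y toward the rear of the room), the function names and the channel labels are assumptions made for this example, and one unit is taken as exactly one meter (39.37 inches) rather than the rounded 39.5 inches used above.

```python
import math

METER_IN_INCHES = 39.37  # one unit = one meter (assumed exact conversion)


def decca_tree_positions(unit=1.0):
    """Return hypothetical (x, y) positions in units.

    x is left (-) / right (+); y is negative toward the rear of the room.
    """
    half = unit / 2.0
    return {
        "C": (0.0, 0.0),               # reference point for the array
        "L": (-half, -unit),           # one unit behind C, half a unit left
        "R": (half, -unit),            # one unit behind C, half a unit right
        "Ls": (-half - half, -unit),   # outrigger, half a unit outside L
        "Rs": (half + half, -unit),    # outrigger, half a unit outside R
        "Lsb": (-half, -unit - half),  # half a unit behind the L/R line
        "Rsb": (half, -unit - half),   # one unit apart from Lsb
    }


def distance(a, b):
    """Straight-line distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


positions = decca_tree_positions()
for name, (x, y) in positions.items():
    print(f"{name:>3}: x = {x * METER_IN_INCHES:7.2f} in, y = {y * METER_IN_INCHES:7.2f} in")
print("L to R    :", distance(positions["L"], positions["R"]), "unit(s)")
print("Ls to Rs  :", distance(positions["Ls"], positions["Rs"]), "unit(s)")
print("Lsb to Rsb:", distance(positions["Lsb"], positions["Rsb"]), "unit(s)")
```

A table of this kind could be used to pre-mark a cross bar or the floor before the individual microphone stands are placed.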
2.2.4 Sine Sweep Recording Method
The recording hardware necessary for this session is located in the adjacent control
room, Studio J. The room is running Pro Tools connected to a Yamaha DM2000
digital console. A basic Pro Tools session at 24-bits and 48kHz14 is created with
inputs one through eight set up for recording, with only those needed for each
particular pattern enabled for recording at the given time.
The gains on the digital console are set for each input channel based on test material
being played back through one source monitor at a time. The levels are adjusted until
no digital clipping occurs and the levels remain relative to all
other microphones used in the surround pattern. This helps to ensure the balance
between left and right and between front and rear microphones remains proportional.
As noted in section 2.2.2, two Mackie monitors are used for playback of the sine
sweeps. The purpose of using two monitors is to allow for separate convolution of
left and right input channels in the Hydra plug-in. Therefore, the gain levels only
need to be set using one loudspeaker at a time.
14 The intention was to use 96kHz as the sampling rate, given that it is an easy
down-conversion to 48kHz should the plug-in require less CPU usage. However, the setup
at the time was not compatible with 96kHz recording.
For each of the surround microphone configurations, each of the speaker positions
(straight and toed-in) and each of the source loudspeakers (left and right), recordings
are made to later de-convolve, creating the impulse response files for Hydra. At the
same time, select stereo music test material is recorded for each of the setups to use
later as a comparison to the output created by Hydra.
The sine sweeps used in the recording process are generated at 24-bits and 48kHz by
Voxengo Deconvolver v. 1.9.3. The frequency range of the generated sine sweep
is 20Hz to 20kHz and the sweep plays back over
30 seconds, with five seconds of silence recorded at the end to ensure capturing of
any lingering sound (Figure 2.12).
Figure 2.12 30 Second Sine Sweep Waveform
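For readers who wish to reproduce a comparable test signal, the sketch below generates a 20Hz to 20kHz sweep of 30 seconds at 48kHz followed by five seconds of silence. It is a minimal sketch under stated assumptions: the sweep is assumed to be exponential (logarithmic in frequency), which is common for impulse response measurement but is not confirmed here as the shape Voxengo Deconvolver produces, and the function name and parameters are hypothetical.

```python
import math


def exponential_sine_sweep(f_start=20.0, f_end=20_000.0, duration_s=30.0,
                           sample_rate=48_000, tail_silence_s=5.0):
    """Return a list of float samples in [-1, 1]: sweep followed by silence.

    Assumption: exponential sweep, phase(t) = K * (exp(t / T * ln(f2 / f1)) - 1)
    with K = 2 * pi * f1 * T / ln(f2 / f1).
    """
    n_sweep = int(round(duration_s * sample_rate))
    n_tail = int(round(tail_silence_s * sample_rate))
    log_ratio = math.log(f_end / f_start)
    scale = 2.0 * math.pi * f_start * duration_s / log_ratio
    sweep = [math.sin(scale * (math.exp(log_ratio * n / n_sweep) - 1.0))
             for n in range(n_sweep)]
    return sweep + [0.0] * n_tail  # trailing silence captures lingering sound


signal = exponential_sine_sweep()
print(len(signal), "samples")  # 35 seconds at 48kHz
```

Writing the samples to a 24-bit .wav file is left out of the sketch; any audio file library could be used for that step.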
2.3 Impulse Response Editing
Once the surround pattern audio signals have been recorded, the files need to be
de-convolved to create the impulse response files that will be used in Hydra. This
process is completed using the Voxengo Deconvolver (Figure 2.13). The advantage
of using this system is that the de-convolved files are in the standard .wav audio
format, allowing the impulse responses to be used in virtually any plug-in or
convolution program. Other impulse response creation programs, such as the Altiverb IR
Processor and the Waves IR Utility, produce de-convolved files in a proprietary
format. For this reason alone, the Voxengo Deconvolver was the final choice.
Figure 2.13 Voxengo Deconvolver
De-convolution of the audio signals relies on the comparison of the original sine
sweep audio file to the recordings made using the surround sound patterns. To
de-convolve the files, it was found best to leave many of the options in Voxengo
unchecked. In particular, it is important not to check the option to Normalize to -0.3
dBFS. Using this option would disrupt the level balance between the audio files and
thus destroy the intended layout of the surround sound pattern. The one option that is
important to check is Low Cut. Voxengo provides the option to manually identify
the low cut frequency. To curb the likelihood of sub-sonic rumble, a low cut of 20Hz is
added to the de-convolution process.
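The principle of comparing the recording against the original test signal can be illustrated with a generic frequency-domain de-convolution. This is only a sketch of the general technique (regularized spectral division); it is not the algorithm used inside Voxengo Deconvolver, which is not documented here. The function names, the small regularization constant and the short test signals are all assumptions made for the example, and the low cut filter is not included.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 FFT; len(values) must be a power of two.

    The inverse transform is returned unscaled (caller divides by n).
    """
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def convolve(signal, impulse_response):
    """Direct (time-domain) linear convolution, used here to fake a recording."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def deconvolve(recorded, reference, eps=1e-12):
    """Estimate the impulse response h such that recorded = reference * h."""
    n = 1
    while n < len(recorded) + len(reference):
        n *= 2  # zero-pad both signals to a common power-of-two length
    ref_spec = fft(list(reference) + [0.0] * (n - len(reference)))
    rec_spec = fft(list(recorded) + [0.0] * (n - len(recorded)))
    # Regularized division: eps avoids dividing by near-zero spectral bins.
    ir_spec = [y * x.conjugate() / (abs(x) ** 2 + eps)
               for x, y in zip(ref_spec, rec_spec)]
    return [v.real / n for v in fft(ir_spec, inverse=True)]


reference = [1.0, 0.5, 0.25]        # stand-in for the original sweep
true_ir = [0.0, 1.0, 0.0, -0.5]     # stand-in for the room/microphone response
recorded = convolve(reference, true_ir)
estimate = deconvolve(recorded, reference)
print([round(v, 6) for v in estimate[:len(true_ir)]])
```

With a real 20Hz to 20kHz sweep the reference has little energy outside that band, so a larger regularization value or a band limit would be needed in practice.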
2.3.1 Normalization Trials
Not using the built-in Voxengo normalization option does not mean that
normalization of the impulse response files is unnecessary. On the contrary, to
obtain an output level from Hydra to function as direct sound, the files must be
normalized to some value. Various attempts were made to normalize the values of
the impulse response files without resulting in digital clipping once the Hydra input
signal has been convolved. Even when an impulse response file is normalized to -0.3
dBFS, the resulting convolved sound can still clip in the digital domain. Through
trial and error, the best normalized value discovered is a maximum of -15 dBFS.
As a side-point to the VST plug-in creation, the impulse response files were
normalized in various ways to discover the best sounding convolved signal. Using a
Pro Tools software system, the impulse response files were altered in the following
ways.
In the first batch, the de-convolved impulse response files are left as is. No alteration
to the amplitude of the files is performed. In the second set, the peak values of the
impulse response files are normalized to -15 dBFS, with all values remaining relative
to the largest amplitude present in the surround sound group of audio files. For
example, in the OCT-2 setup, the center channel recorded has the largest amplitude.
Therefore, that audio file is normalized to -15 dBFS and the gain
necessary to achieve that peak normalization on the center channel is applied to the
remaining audio files in the group. So, in the OCT-2 example, the left speaker to the
center microphone yields the greatest peak value. It requires 7.2 dB of gain to reach a
peak level of -15 dBFS. Subsequently, when 7.2 dB of gain is applied to the rest of the set, the
results are as follows: the left speaker to the left microphone peaks at -23.8 dB, the left
speaker to the right microphone peaks at -27.1 dB, the right speaker to the center
microphone peaks at -18 dB, the right speaker to the left microphone peaks at -29.4
dB and the right speaker to the right microphone peaks at -25.4 dB.
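The group normalization described for the second set can be sketched as follows. The starting peak levels in the example are not stated in the text; they are back-calculated from the quoted results by subtracting the 7.2 dB of gain. The function names and the speaker-to-microphone labels (for example L_C for left speaker to center microphone) are assumptions chosen for illustration.

```python
def normalize_group_peaks(peaks_dbfs, target_dbfs=-15.0):
    """Apply one common gain so the loudest file peaks at target_dbfs.

    peaks_dbfs maps a file label to its peak level in dBFS.
    Returns (gain_db, new_peaks); relative levels within the group are kept.
    """
    gain_db = target_dbfs - max(peaks_dbfs.values())
    return gain_db, {name: level + gain_db for name, level in peaks_dbfs.items()}


def db_to_linear(gain_db):
    """Convert a gain in dB to the factor applied to each sample."""
    return 10.0 ** (gain_db / 20.0)


# Back-calculated OCT-2 peaks before normalization (assumed, see lead-in).
oct2_peaks = {"L_C": -22.2, "L_L": -31.0, "L_R": -34.3,
              "R_C": -25.2, "R_L": -36.6, "R_R": -32.6}

gain_db, normalized = normalize_group_peaks(oct2_peaks)
print(f"common gain: {gain_db:.1f} dB (x{db_to_linear(gain_db):.3f})")
for name, level in normalized.items():
    print(f"{name}: {level:.1f} dBFS")
```

Because every file receives the same gain, the level relationships captured by the microphone array are preserved, which is the reason the built-in per-file normalization was avoided.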
The third set of normalization values adjusts the peak values of each stereo pair.
Here, using the OCT-2 setup as a basis, the values of the left speaker to the center
microphone and right speaker to the center microphone are normalized to an equal
peak value. Following suit, the right speaker to the right microphone would have the
same peak value as the left speaker to the left microphone, and finally the cross
speaker/microphone combinations would have the same peak normalized values.
This is done in an effort to see if any irregularities in the left/right balance could be
corrected.
In the final set, the RMS (Root Mean Square) values of the audio files are adjusted to
be equal for each stereo pair, just as with normalizing the peak values in the third set.
This results in slightly different peak values than in the third set and offers a different
way in which to balance out the irregularities in the de-convolved impulse response
files.
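The RMS matching of a stereo pair can be sketched in a few lines. The text does not say which file of a pair serves as the reference, so this example assumes the second file is scaled to match the first; the function names are hypothetical.

```python
import math


def rms(samples):
    """Root mean square of a list of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def match_pair_rms(reference, other):
    """Return a scaled copy of `other` whose RMS equals that of `reference`.

    Assumption: `reference` is left untouched and `other` is adjusted.
    """
    other_rms = rms(other)
    if other_rms == 0.0:
        raise ValueError("cannot match the RMS of a silent file")
    gain = rms(reference) / other_rms
    return [s * gain for s in other]


left_to_left = [0.5, -0.5, 0.5, -0.5]      # toy impulse response data
right_to_right = [0.1, -0.2, 0.1, -0.2]
matched = match_pair_rms(left_to_left, right_to_right)
print(round(rms(left_to_left), 6), round(rms(matched), 6))
```

Matching RMS rather than peak values weights the whole impulse response instead of a single sample, which is why the resulting peak values differ slightly from those of the third set.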
Two additional sets are added to the collection but only for the Double M/S pattern.
While level differences between the three signals are the basis of the Double M/S
pattern, two further sets of impulse responses were created: one with maximum uniform gain, in
which all signals in the set are normalized to a peak value of -15 dBFS, and one
with maximum RMS values for all the audio files. In the latter set, the RMS is
adjusted to ensure the peak value never rises above -15 dBFS.
The final step in preparing the impulse response files for use in Hydra involves
trimming and fading the files. This is necessary to remove extraneous noises in the
tail of the de-convolved impulse response, to save CPU by limiting the maximum
sample length of the impulse response, and to provide a smoother transition in the
convolution step. Each impulse response file is trimmed at the end to a length of 2.5
seconds. A fade out of 0.213 seconds is added to the file to prevent clicking when
used to convolve an audio signal. A fade in is unnecessary as the de-convolved audio
files begin at a zero crossing.
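The trimming and fading step can be sketched as follows. The 2.5 second length and 0.213 second fade come from the text; the linear fade shape and the function name are assumptions, since the fade curve used in Pro Tools is not specified.

```python
def trim_and_fade(ir, sample_rate=48_000, length_s=2.5, fade_s=0.213):
    """Trim an impulse response to length_s and apply a linear fade out.

    Assumption: the fade is linear and ends at exactly zero on the last sample.
    """
    n_keep = min(len(ir), int(round(length_s * sample_rate)))
    n_fade = min(n_keep, int(round(fade_s * sample_rate)))
    out = list(ir[:n_keep])
    fade_start = n_keep - n_fade
    for i in range(n_fade):
        out[fade_start + i] *= 1.0 - (i + 1) / n_fade  # ramps down to 0.0
    return out


raw_ir = [1.0] * (3 * 48_000)  # toy 3-second response of constant level
trimmed = trim_and_fade(raw_ir)
print(len(trimmed), "samples,", "last sample =", trimmed[-1])
```

At 48kHz the trimmed file holds 120,000 samples, which is the figure used later when choosing the FFT order of the convolution module.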
3. The VST Programming Process
The Hydra plug-in is written in a module-based VST development program known as
Synthedit. This program allows for VST plug-in creation by arranging and tweaking
available compiled modules. Each module contains input and output pins allowing
for dynamic audio signals and static information to be passed from one module to
another. Modules exist on three levels: DSP, GUI and Text. DSP modules utilize
the most CPU as these modules are constantly recalculating as the audio signal
passing through them changes. GUI, or Graphical User Interface, modules are
dedicated to the creation of user control functions such as knobs, sliders, switches, etc.
Finally, Text modules simply allow for the display of text-based material within the
program, such as the display of the numerical value of a knob's position. The latter
two forms of modules are different from DSP in that they don't receive a voltage
audio signal and thus they don't constantly trigger. GUI and Text modules are static
while DSP modules are dynamic.
By combining and manipulating the various modules it is possible to create many
different types of audio VST plug-ins. Synthedit is based on C++, and that is the
language in which the modules used within the program are written. The creation of
Hydra requires little in the way of C++ manipulation. The only code manipulation
performed for the creation of Hydra occurs in the convolution section of the structure.
This is discussed later in the chapter.
For complete plug-in efficiency, Hydra would need to be natively written in C++,
DSP or another computer language. However, that is beyond the scope of this
project.
3.1 Basic Plug-in Signal Flow
The Hydra plug-in follows a basic signal path at the top level with more complex
signal routing and module manipulation accomplished within the designated
containers.15 Hydra follows this flow: Input to Surround Sound Pattern
Structure (which includes convolution) to Delay to Peak Limiter to Final
Mixing/Metering to Bass Management and Output. There are several other functions
15 Organization of building a plug-in is accomplished through the use of containers.
Containers allow for like routines to be placed in a single area providing easier access
to groups of components of the plug-in.
to the plug-in that fall in between these major containers, which are discussed in
more detail as the chapter progresses. Figure 3.1 shows the basic signal flow at the
topmost level of containers.
3.2 Surround Sound Pattern Structure
Each of the six surround microphone patterns used in Hydra follows a basic signal
flow, with alterations made to accommodate the number of audio channels in the
surround pattern, necessary changes in the order of the events and the math involved
in the computation of the convolved signals. The latter applies mainly to the Double M/S
pattern.
3.2.1 Basic Structure For All Patterns
This section discusses the similar components shared by all of the surround patterns
in Hydra. The top-level structure of the Rear-Facing Cardioids pattern is used as a
reference for visual representation of the basic structure for all the patterns. This
structure can be seen in Figure 3.2. Additional visuals are inserted as needed to show
more detail.
When the stereo audio input signal reaches the input of the plug-in, each of the two
channels is split and sent to the various surround sound pattern containers,
maintaining stereo separation of the audio into each of the surround patterns. This is
important as the impulse responses created were based on the use of two loudspeakers
in a stereo layout. Inserted just after the input of the plug-in is the primary bypass
switch. This deactivates all signal processing in the plug-in and routes the original
stereo audio signal to the output of the plug-in. This provides a quick way to compare
the original audio with the processed audio. Figure 3.3 shows the top-level, breakout
structure of the Primary Bypass container.
After reaching the designated surround pattern container, the audio signal first passes
through a bypass switch, or power switch as shown in Figure 3.2, Item 1. The
primary function of this switch is to serve as a CPU-saving module. By defaulting to
the off or bypassed position the audio signal cannot go any further in the chain of
modules. As DSP modules use CPU processing power, stopping the signal at this
point puts the subsequent modules in the chain into sleep mode. Also, this simple
switch stops all six of the surround patterns from firing at the same time, which would
result in a CPU overload and a mess of distorted audio.
Figure 3.1 Top-Level Containers of Hydra
Figure 3.2 Basic Surround Pattern Structure
Figure 3.3 Primary Bypass Structure
Following the bypass switch is the Width Correction field (Figure 3.2, Item 2). This
section defines how the input signal will be sent to the convolution section. There are
two settings, Mixed and Wide. Mixed is the default setting for the switch. When in
this mode the audio is sent to the convolution section in a blended fashion. For
example, the Rear-Facing Cardioid pattern uses two microphones and two
loudspeakers. When the impulse response recordings were made, each loudspeaker
resulted in a signal recorded from the left and right microphones, for a total of four
impulse response files: Left To Left, Left To Right, Right To Left and Right To
Right, where the relationship is speaker to microphone. The Mixed mode
utilizes all of these impulse response files, where the relationship above is replaced by
input audio signal channel designation to impulse response microphone location.
The other mode is Wide. In this case, two of the four impulse response files are no
longer used. Instead, the input audio signal is only paired with its corresponding
microphone location. Using the Rear-Facing Cardioid example, the left input audio
signal would only pass to the Left To Left impulse response and the right input audio
signal would only pass to the Right To Right impulse response.
The purpose of this setting is again to save CPU power and to provide two
different fields of width to the end user. Choosing the Wide mode saves CPU in that
convolution is the heaviest CPU process in the plug-in and the Wide mode always
uses only half as many convolution modules as the Mixed mode. Note: the Double M/S
pattern does not have the availability of a Width option due to the nature of the
microphone setup, in which all microphones are spaced in the center of the left and
right loudspeakers and there is no stereo separation.
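The difference between the Mixed and Wide modes can be sketched for the two-microphone Rear-Facing Cardioid case. This is an illustration of the routing only, using slow direct convolution and hypothetical function names and dictionary keys; it is not the Synthedit module structure or the convolution module used in Hydra. The four impulse responses are assumed to have equal length.

```python
def convolve(signal, impulse_response):
    """Direct linear convolution (slow, for illustration only)."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def add(a, b):
    """Sample-by-sample sum of two equal-length signals."""
    return [x + y for x, y in zip(a, b)]


def rear_facing_outputs(left_in, right_in, irs, mode="Mixed"):
    """Return (left_mic_signal, right_mic_signal) for the chosen width mode.

    irs keys follow the speaker-to-microphone naming used in the text.
    """
    if mode not in ("Mixed", "Wide"):
        raise ValueError("mode must be 'Mixed' or 'Wide'")
    # Wide: each input channel feeds only its corresponding microphone.
    left_mic = convolve(left_in, irs["LeftToLeft"])
    right_mic = convolve(right_in, irs["RightToRight"])
    if mode == "Mixed":
        # Mixed: the cross paths are added, doubling the convolution count.
        left_mic = add(left_mic, convolve(right_in, irs["RightToLeft"]))
        right_mic = add(right_mic, convolve(left_in, irs["LeftToRight"]))
    return left_mic, right_mic


irs = {"LeftToLeft": [1.0], "LeftToRight": [0.5],
       "RightToLeft": [0.25], "RightToRight": [1.0]}  # toy one-sample responses
left_in, right_in = [1.0, 0.0], [0.0, 1.0]
print("Wide :", rear_facing_outputs(left_in, right_in, irs, "Wide"))
print("Mixed:", rear_facing_outputs(left_in, right_in, irs, "Mixed"))
```

The Wide branch performs two convolutions and the Mixed branch four, which mirrors the CPU saving described above.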
The next section of modules in the basic structure is convolution, seen in
Figure 3.2, Item 3. This is the primary element of Hydra, as all output audio signals
are dependent on properly passing through the convolution algorithm. As mentioned
earlier, the convolution module is complex and is the only module to be written
outside of the Synthedit domain. The source code for the convolution module is
open-source Delphi code. The complexity of the convolution script is beyond the
scope of this paper. However, the Delphi code was converted into a C++ compatible
script and then compiled as a Synthedit DSP module. It is nonetheless important to
understand the different options available on the convolution module.
First, a pin exists on the module to set the Maximum FFT (Fast Fourier Transform)
Order. The importance is two-fold. The setting must be high enough to
convolve an audio signal to the full length of the impulse response, yet low enough to avoid
the addition of a large number of zeroes beyond the end of the actual impulse
response file. The value in this field relates directly to audio samples. The number of
samples equals two to the power of the order, so a Maximum FFT Order of 10
equates to 1,024 samples. Given that at this time Hydra uses pre-determined impulse
response lengths of 2.5 seconds at a 48kHz sampling rate, the Maximum FFT Order
must correspond to at least 120,000 samples. Balancing the length of the impulse response
with CPU efficiency, the Maximum FFT Order setting in this case is set to 17, or
131,072 samples.
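The choice of order can be checked with a short calculation. The sketch below finds the smallest order whose power of two covers the impulse response; the function name is hypothetical and the calculation says nothing about how the convolution module partitions its FFTs internally.

```python
import math


def min_fft_order(ir_seconds, sample_rate):
    """Return (order, ir_samples) where 2 ** order >= ir_samples."""
    ir_samples = math.ceil(ir_seconds * sample_rate)
    order = 0
    while 2 ** order < ir_samples:
        order += 1
    return order, ir_samples


order, ir_samples = min_fft_order(2.5, 48_000)
print("impulse response length:", ir_samples, "samples")
print("minimum order:", order, "->", 2 ** order, "samples")
print("zero padding:", 2 ** order - ir_samples, "samples")
```

An order of 16 would cover only 65,536 samples, roughly 1.37 seconds at 48kHz, so 17 is the smallest setting that spans the full 2.5 second impulse response.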
The second option of importance is the Desired Latency pin. Presets have been
implemented for consistency. Synthedit processes audio in blocks of data and not per
sample, as this would not be CPU efficient. Therefore, the presets are multiples of
the block sizes to ensure smooth transitions. The purpose of Hydra in its current form
is for use as a post-production tool or music convolution by a consumer. Therefore, it
is not important for the plug-in to have zero latency, as it would be if it were used in
live recording. Since that is not the case, the Desired Latency has been set to the
maximum of 8192 to provide the best CPU performance.
The final option pin on the convolution module is the File Load function. This field
simply serves to tell the convolution module which impulse response to read. As
discussed in chapter two, various sets of files were created. To provide a selection
function for the user, a drop-down menu was created listing the various impulse
responses available for use. Selection of any one of the listed items will feed the
corresponding pathname of the impulse response to the File Load field of the
convolution module. Only one selection needs to be made per surround pattern to
feed the correct pathnames to all of the convolution modules used in a surround
pattern. As part of the File Load function, an information button is linked to each
surround pattern. When the button is clicked on the graphical user interface, a file is loaded
displaying information on the impulse response chosen. The information includes an
image of the setup, speaker types, microphone types and distances. This information
is helpful in determining the delays needed when using the blending feature.16
Each surround pattern has twice the number of convolution modules as the number of
audio channels present in the pattern. This is due to the way in which the impulse
response recordings were made as discussed earlier. For example, the OCT-2 pattern
has six convolution modules, as each of the input signals can be routed to the three
microphone locations of the surround sound pattern.
Once the audio signals have been convolved, the signals are sent to a Level
Adjustment switch shown as Item 4 in Figure 3.2. This provides the user with the
ability to lower the output level by a fixed three dB. This option is present to
remedy the level change when adding two uncorrelated, equal volume audio signals.
The reason it remains only an option is that the output levels of the convolved signal
are generally low enough in level that when added together the peaks will not reach
the digital limit of zero dBFS. The other reason for its presence is due to the Width
option. Should the user select the Wide mode then the three dB level dip is not
necessary, though it is still available, as no convolved signals are being added. The
exception to this rule is when the surround pattern used is capable of a center channel.
This applies to OCT-2 and Modified Decca Tree. In these cases, a three dB
adjustment is automatically applied to the center channel convolved signal during the
Wide mode, so as to remain balanced with the left and right convolved audio signals.
Though the Double M/S pattern results in a center channel, it does not follow this
exception since the center channel is derived from a coefficient matrix.
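The three dB figure follows from the fact that the powers of uncorrelated signals add. A minimal sketch of the arithmetic is given below; the function names are illustrative and are not part of Hydra.

```python
import math


def db_increase_uncorrelated_sum(n_signals):
    """Level increase in dB when n uncorrelated, equal-level signals are summed."""
    return 10.0 * math.log10(n_signals)


def db_to_gain(db):
    """Convert a level change in dB to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


increase_db = db_increase_uncorrelated_sum(2)  # roughly 3.01 dB
pad_gain = db_to_gain(-3.0)                    # roughly 0.708
```

Multiplying the summed signal by `pad_gain` therefore restores approximately the level of a single convolved signal.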
Following the Level Adjustment switch is the Channel Flip option (Figure 3.2, Item
5). This option is only available for Double M/S, IRT Cross, Hamasaki Square and
Rear-Facing Cardioids. This option allows the user to redirect the left surround and
right surround audio channels to the left surround back and right surround back
outputs. This is an either/or option. For IRT Cross and Hamasaki Square, the
Channel Flip option is post-metering, as the left/right surround channel strips on the
user interface are the same as the left/right surround back channel strips. This switch
only changes the signal routing for options after the basic surround pattern structure.
Therefore, the EQ, Mute and Fader selections for these two surround patterns are the
same regardless of the position of the Channel Flip switch.
16 See section 5.1 on "Blending."
For the Double M/S and Rear-Facing Cardioids patterns, two extra channel strips are
available and thus separate EQ, Mute and Fader settings can be preserved for
left/right surround and surround back channels regardless of the Channel Flip switch
position. The Modified Decca Tree and OCT-2 patterns do not have flip switches
available as the former is already a full seven-channel surround configuration, and the
latter only consists of the front three channels.
Once the audio signal has been convolved and routed through Level Adjustment and
Channel assignment options, it is sent to the EQ section (Figure 3.2, Item 6). This
section also includes individual channel mute options. The Mute function is pre-EQ
and metering. Again, this is a technique employed to conserve CPU processing
power when an individual channel of a surround pattern is turned off.
To allow for more functions to be added to Hydra, it was decided to limit the EQ
section to a three-band tone control: high frequency, mid frequency and low
frequency. See Figure 3.4 for the breakout of the EQ container. The EQ section uses
biquad filters, which provide high-shelf, peak and low-shelf filtering. The
high shelf is set at 12 kHz for manipulation of the high frequencies. The low shelf is set at
80 Hz for low frequencies. The mid frequencies are centered at 2.5 kHz with a wider
Q17 than the high and low-shelf filters. Each EQ has an 18 dB adjustment range.
Each channel in a surround pattern has a separate three-band EQ. For conservation of
CPU, each channel strip is equipped with an EQ power on/off switch.
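As an illustration of how such a three-band tone control can be built from biquad filters, the sketch below computes coefficients with the widely used Audio EQ Cookbook formulas. This is an assumption: the internal formulas of the biquad module used in Hydra are not documented here, and the 48 kHz sample rate, the Q values and the gain settings are hypothetical.

```python
import cmath
import math


def biquad(kind, f0, gain_db, q, fs=48000.0):
    """Return normalized (b, a) coefficients (Audio EQ Cookbook forms)."""
    amp = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    c = math.cos(w0)
    alpha = math.sin(w0) / (2.0 * q)
    k = 2.0 * math.sqrt(amp) * alpha
    if kind == "peak":
        b = [1 + alpha * amp, -2 * c, 1 - alpha * amp]
        a = [1 + alpha / amp, -2 * c, 1 - alpha / amp]
    elif kind == "lowshelf":
        b = [amp * ((amp + 1) - (amp - 1) * c + k),
             2 * amp * ((amp - 1) - (amp + 1) * c),
             amp * ((amp + 1) - (amp - 1) * c - k)]
        a = [(amp + 1) + (amp - 1) * c + k,
             -2 * ((amp - 1) + (amp + 1) * c),
             (amp + 1) + (amp - 1) * c - k]
    elif kind == "highshelf":
        b = [amp * ((amp + 1) + (amp - 1) * c + k),
             -2 * amp * ((amp - 1) + (amp + 1) * c),
             amp * ((amp + 1) + (amp - 1) * c - k)]
        a = [(amp + 1) - (amp - 1) * c + k,
             2 * ((amp - 1) - (amp + 1) * c),
             (amp + 1) - (amp - 1) * c - k]
    else:
        raise ValueError("unknown filter kind: " + kind)
    return [x / a[0] for x in b], [x / a[0] for x in a]


def magnitude_db(b, a, f, fs=48000.0):
    """Magnitude response in dB at frequency f."""
    z = cmath.exp(-1j * 2.0 * math.pi * f / fs)  # z^-1 on the unit circle
    h = (b[0] + b[1] * z + b[2] * z * z) / (a[0] + a[1] * z + a[2] * z * z)
    return 20.0 * math.log10(abs(h))


# Band frequencies from the text; gains and Q values are illustrative.
low = biquad("lowshelf", 80.0, 6.0, q=0.707)
mid = biquad("peak", 2500.0, -4.0, q=0.5)
high = biquad("highshelf", 12000.0, 3.0, q=0.707)
```

Evaluating `magnitude_db` at 0 Hz for the low shelf, at 2.5 kHz for the peak filter and at half the sample rate for the high shelf returns the requested gain of each band.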
The final section of the basic structure of a surround pattern in Hydra is that of
metering and level control (Figure 3.2, Item 7). Figure 3.5 displays the components
of the Metering container. The level of each channel of a surround pattern can be
controlled individually, yielding great mixing flexibility. Each channel fader has a
range of -60 dB to +20 dB. In typical digital audio workstations the level faders
reach a maximum of +12 dB. The reason for the extra range is that Hydra is
working with convolved audio signals, which could potentially require additional
gain.
Once the audio signal passes through the individual level adjustments, the signal
continues to a global level adjustment section. This section provides fine-tuning
control over the level of each surround pattern as a whole. Here, the level of the
entire surround pattern is adjusted in relation to the individual channel settings,
maintaining the balance of the mix already created. Both level adjustment sections
are pre-meter, thus all changes will appear on the level meters.
17 Q indicates the bandwidth or slope of an EQ filter. A higher Q results in a filter
that affects less of the frequency range surrounding the defined center frequency.

Figure 3.4 Rear-Facing Cardioids EQ Structure

Figure 3.5 Typical Structure of a Metering Container

The level meters consist of four segments: green, yellow, orange and red. The
entirety of the level meter displays signal level from -60 dB to 0 dB. The green
segment represents the lowest energy level with a range of -60 to -12 dB. The yellow
segment displays signal level from -12 to -3 dB. The orange segment displays -3 to
-0.001 dB. The red segment is a simple peak indicator that illuminates when the signal
reaches -0.001 dB or higher. The peak indicator remains lit until it is clicked to
reset it. The structure of one level meter is shown in Figure 3.6.
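The segment ranges can be summarized in a few lines of code. The sketch below is illustrative only: the names are hypothetical, and assigning each boundary value to the upper segment is an assumption.

```python
def meter_segment(level_db):
    """Return the highest meter segment reached by a level in dB."""
    if level_db >= -0.001:
        return "red"
    if level_db >= -3.0:
        return "orange"
    if level_db >= -12.0:
        return "yellow"
    if level_db >= -60.0:
        return "green"
    return "off"


class PeakIndicator:
    """Latching peak light: stays lit until reset by the user."""

    def __init__(self):
        self.lit = False

    def update(self, level_db):
        if level_db >= -0.001:
            self.lit = True
        return self.lit

    def reset(self):
        self.lit = False
```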
3.2.2 Double M/S Alterations
The sections above represent the basic configuration of each of the surround sound
patterns. However, the Double M/S pattern contains one important section not
included in any other pattern. Following the convolution and level adjustment
sections, the Double M/S pattern has a Type section. Double M/S relies on
coefficients applied to each of the convolved audio signals for the creation of the
individual audio channels of the pattern. Various matrices of coefficients exist for
Double M/S. Each of the matrices results in a slightly different virtual microphone
pattern. While the adjustments are small, they have been included in Hydra as a user
option. The primary reason for the inclusion of several matrices is that the Double
M/S pattern can be configured with or without a center channel. In Hydra, Types I, I(b)
and I(c) all include a center channel. Types II and II(b) use matrix coefficients that
do not factor in a center channel.
3.3 Master Delay Structure
Once the audio signal has passed from convolution to EQ and to mixing, the discrete
speaker feeds of each surround pattern are sent to the Master Delay container. Here,
the audio signal is given the option of being delayed by a number of milliseconds.
There are two primary reasons for a delay unit in Hydra.
First, when various surround patterns are mixed together using different impulse
responses, the delay unit functions as a distance correction component. Mixing the
surround channels of one pattern with the surround channels of another pattern with a
different distance from the source may result in a comb-filtering effect. Use of the
delay unit can compensate for the difference in distance, wherein one millisecond of
delay equals approximately 1.1 feet of distance. The second primary use for the delay
unit is to create a more discrete front and rear soundfield. Delaying the surround
channels can change the depth of the surround pattern.
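The relationship between delay and distance can be checked with a short calculation. The sketch assumes a speed of sound of 343 m/s (dry air at about 20 degrees C); the three-meter distance difference is a hypothetical example.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumed value, dry air at about 20 degrees C
FEET_PER_METER = 3.28084


def feet_per_millisecond():
    """Distance sound travels in one millisecond, in feet."""
    return SPEED_OF_SOUND_M_PER_S * FEET_PER_METER / 1000.0


def delay_ms_for_distance(distance_m):
    """Delay in milliseconds that corresponds to a distance in meters."""
    return distance_m / SPEED_OF_SOUND_M_PER_S * 1000.0


ft_per_ms = feet_per_millisecond()            # about 1.13 feet
example_delay = delay_ms_for_distance(3.0)    # about 8.75 ms
```

Under this assumption, a pattern recorded three meters closer to the source than another would need roughly 8.7 milliseconds of delay to align the two.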

Figure 3.6 Structure of a Single Level Meter

3.3.1 Basic Delay Flow
The delay unit in Hydra consists of two simple components: time and feedback.
Figure 3.7 shows the delay unit of the Rear-Facing Cardioids pattern.

Figure 3.7 Basic Delay Flow

Each of the surround patterns has a separate delay, and each of the stereo pairs in a
surround pattern has separate delay controls. The exception is the center channel,
which has its own delay settings. Each surround pattern enters the delay section through a
power on/off switch, which conserves CPU when delay is not being used.
The delay unit allows for up to 500 milliseconds of delay time. While in general
delay times will remain under 75 milliseconds, larger delay times could be used to
create DSP effects within Hydra. See the section on "Highlighted Features Of The
Plug-in."
The second component to the delay unit is that of feedback. This function determines
how much of the delayed sound should be fed back to the pre-delay buffer and then
delayed by the same time again. In Hydra, feedback is measured in percentage.
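The time and feedback components can be sketched as a simple delay line. This is a generic illustration rather than the delay module used in Hydra; the function names and the 48 kHz sample rate are assumptions, and only the delayed (wet) signal is returned.

```python
def ms_to_samples(delay_ms, fs=48000.0):
    """Convert a delay time in milliseconds to a whole number of samples."""
    return max(1, round(delay_ms * fs / 1000.0))


def feedback_delay(samples, delay_samples, feedback_percent):
    """Delay a signal and feed a percentage of the output back to the input."""
    feedback = feedback_percent / 100.0
    buffer = [0.0] * delay_samples
    position = 0
    out = []
    for x in samples:
        delayed = buffer[position]                  # written delay_samples ago
        buffer[position] = x + feedback * delayed   # fed back to the pre-delay buffer
        position = (position + 1) % delay_samples
        out.append(delayed)
    return out


# An impulse delayed by two samples with 50 percent feedback.
echoes = feedback_delay([1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], 2, 50.0)
```

Each repeat arrives after the same delay time and at half the level of the previous one, which is the behavior described above for a feedback setting of 50 percent.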
3.3.2 Pattern-Specific Delay Controls
As mentioned above, each of the surround patterns has separate delay controls for
each of its stereo pairs. For the OCT-2 pattern, there is a time/feedback control for
the left and right stereo pair, and a separate time/feedback control for the center
channel. This allows a user to make the center channel more or less prominent.
In the case of OCT-2, the center channel is created from an impulse response based
on a microphone positioned 40 centimeters, or about 1.3 feet, in front of the left/right
microphones. Thus, a delay of roughly one millisecond will place the center channel
approximately back in line with the left/right channels.
As an added function to the delay unit, separate left/right surround back delay
controls are included for those patterns supporting the channels. For IRT Cross,
Hamasaki Square, Rear-Facing Cardioids and Double M/S, patterns that include the
Channel Flip switch, another flip switch exists in the delay structure. If the left/right
surround back flip switch has been engaged in the surround pattern, then the flip
switch in the delay unit should also be engaged. This will hide the delay parameters
for the left/right surround channels and reveal the left/right surround back delay
parameters. This function exists in order to maintain settings for each stereo pair in
the event that the pattern is flipped back to its original state. Only one delay or the
other is active depending on the position of the flip switch.
Each surround pattern that contains a center channel contains a separate delay control
for that single channel. The remaining channels of the pattern are grouped in stereo
pairs.
3.4 Master Peak Limiter Structure
Either the bypassed or active audio signals from the delay unit are sent to the peak
limiter container. The audio signals continue to remain discrete. The peak limiter
section of Hydra was built as a safety precaution. With proper mixing, the peak
limiter should not be needed.
Each of the surround patterns has a dedicated peak limiter that examines the incoming
signal, tailored to the number of channels in a specific pattern, and limits the peak of
any one of the audio channels to a user-defined threshold. Figure 3.8 shows the
various components of the Rear-Facing Cardioids Peak Limiter and should be used as
a visual reference for the following.
The audio signal enters the peak limiter section and immediately passes through a
bypass switch for CPU conservation (Figure 3.8, Item 1). Either the active or
bypassed discrete audio signals are routed out of the peak limiter container. If the
limiter is activated, the discrete audio signals will pass to a multi-channel level
adjustment as shown in Figure 3.8, Item 2. This level adjustment has enough inputs
to accommodate the number of channels for the particular surround pattern. A
pre-gain knob globally adjusts the level of the input signals. In the pre-gain stage, up
to 12 dB of gain can be added to or subtracted from the incoming signal.

Figure 3.8 Basic Peak Limiter Structure

From the pre-gain output the audio signals pass to a max peak detector, which
instantaneously determines the highest peak of all the channels at that moment and
outputs the value of that peak. At the same time, the output of the pre-gain signal is
summed and passed to a level meter (Figure 3.8, Item 3), which provides a visual
reference for the level of the signal inputted into the peak limiter. Each of the
surround patterns can send anywhere from one to seven audio signals to the peak
limiter. Therefore, a multiplier is used prior to the input level meter to compensate
for the number of summed audio signals. For example, if two audio signals are
summed then the multiplier would be 0.5. If the number of audio signals summed is
three then the multiplier would be 0.333, and so on. The value of the
multiplier is user-defined and selected via a toggle switch at the base of the peak
limiter channel strip.
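The two parallel paths described above, the max peak detector and the scaled sum feeding the input meter, can be sketched as follows. The function names and the sample values are hypothetical.

```python
def max_peak(frame):
    """Highest absolute sample value across all channels at one instant."""
    return max(abs(sample) for sample in frame)


def meter_input(frame, multiplier):
    """Sum of the channel samples, scaled by the user-selected multiplier."""
    return sum(frame) * multiplier


frame = [0.25, -0.5, 0.75]            # one sample from each of three channels
peak = max_peak(frame)                # 0.75
metered = meter_input(frame, 1 / 3)   # three channels, so multiplier 0.333
```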
The output of the max peak detector is sent to a peak follower module (Figure 3.8,
Item 4). This module provides attack and release functions for the peak limiter.
Since this is a peak limiter, the attack is locked at zero to ensure instantaneous
reaction times. The decay pin is connected to a release knob, which provides up to
2000 milliseconds of decay time. Once the peak and release of the audio signal have
been determined, the signal is converted from voltage into decibels. The calculation
of the peak limiter's reduction value is accomplished more efficiently by making this
conversion.
The decibel output value is sent to input one of a subtraction module. Here input two
is subtracted from input one. A user-controllable threshold knob determines the value
of input two. Now that the peak limiter is functioning in terms of decibels, the
threshold ranges from zero down to -30 dB. The resulting value of the subtraction
module must only contain positive values, as negative values will create errors in the
limiter. Therefore, a clipper module exists to limit the output to positive values.
At this point, the peak limiter input value has been determined, as has the gain
reduction amount. Since the output of the clipper module is now completely positive,
an inverter is used to obtain the subtraction value needed for the final output level
adjustment module. The output of the inverter is distributed to a multichannel level
adjustment module as well as a gain reduction level meter. Figure 3.8, Item 5 shows
this progression.
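The chain from peak follower to gain reduction can be sketched as below. This is a simplified model under stated assumptions: the release is expressed as a per-sample decay coefficient, the names are hypothetical, and the code is not a transcription of the modules in Figure 3.8.

```python
import math


def peak_follower(peaks, release_coeff):
    """Zero attack: rise instantly to a new peak, then decay each sample."""
    envelope = 0.0
    out = []
    for p in peaks:
        envelope = max(p, envelope * release_coeff)
        out.append(envelope)
    return out


def gain_reduction_db(envelope, threshold_db):
    """Convert to dB, subtract the threshold, clip to positive, then invert."""
    level_db = 20.0 * math.log10(max(envelope, 1e-9))  # avoid log of zero
    overshoot = max(level_db - threshold_db, 0.0)      # clipper: positive values only
    return -overshoot                                  # inverter: value to subtract


def apply_limiter(frame, envelope, threshold_db):
    """Apply the same gain reduction to every channel in the frame."""
    gain = 10.0 ** (gain_reduction_db(envelope, threshold_db) / 20.0)
    return [sample * gain for sample in frame]


followed = peak_follower([1.0, 0.0, 0.0], 0.5)   # [1.0, 0.5, 0.25]
reduction = gain_reduction_db(1.0, -6.0)         # -6 dB for a 0 dBFS peak
limited = apply_limiter([1.0, 0.5], 1.0, -6.0)   # both channels lowered by 6 dB
```

Because the gain is derived from the highest peak of all channels and applied to every channel equally, the balance between channels is preserved while the loudest channel is held at the threshold.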
Once the gain of the individual audio signals has been adjusted, the discrete signals
are output from the peak limiter container. In addition, the audio signals are summed
and sent to another multiplier and level meter, representing the overall
output level of the peak limiter, as seen in Item 6 of Figure 3.8.
3.5 LFE Structure
Prior to the discussion of the final two containers of the Hydra plug-in, it is important
to address the two containers in Hydra that are not specific to a surround sound pattern:
LFE and Main Feed Pass-through.
In multichannel music it is common for an LFE channel to exist, with the exception
of the majority of classical music. The use of an LFE signal in music is greatly
debated. In film, the LFE channel serves to enhance the low-frequency output of
sequences including explosions and other related sounds. In music however, there is
not much need for such extreme low frequencies. In addition, extension of musical
instruments into the LFE signal can result in unwanted localization, or a loss of localization, of
instruments, as well as the loss of instrument prominence when the LFE signal is
omitted in lesser audio systems. Despite the reasons against the use of the
LFE channel in multichannel music, the ability to add an LFE signal to the output of
Hydra has been included. This is meant more for the creation of surround sound
effects rather than the upmixing of music.
Figure 3.9 shows the basic LFE signal flow. The LFE signal is derived directly from
the original left and right audio signals. No convolution is performed for creation of
the signal. The LFE container comprises power switches, EQ and delay. There
are separate power switches for the activation of the LFE signal, EQ and delay units.
Once activated, the input audio signals are summed and sent to a set of low-pass
filters. There are four low-pass filters set up in series. Each low-pass module
provides a 6 dB per octave slope, which results in a combined low-pass slope of 24 dB per
octave. After high-frequency content has been trimmed, the output signal is passed to
a series of two high-pass filters totaling 12 dB per octave. The low-pass filter has a
cutoff frequency of 120 Hz. The high-pass filter has a cutoff frequency of 20 Hz.
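The combined response of the cascaded filters can be estimated with a short calculation. The sketch assumes ideal first-order sections; the actual filter modules may differ slightly, particularly near half the sample rate. The function names are hypothetical.

```python
import math


def one_pole_lowpass_db(f, fc):
    """Magnitude in dB of an ideal first-order low-pass filter."""
    return -10.0 * math.log10(1.0 + (f / fc) ** 2)


def one_pole_highpass_db(f, fc):
    """Magnitude in dB of an ideal first-order high-pass filter."""
    return -10.0 * math.log10(1.0 + (fc / f) ** 2)


def lfe_filter_db(f, lp_fc=120.0, hp_fc=20.0, lp_stages=4, hp_stages=2):
    """Combined response of the series low-pass and high-pass stages."""
    return (lp_stages * one_pole_lowpass_db(f, lp_fc)
            + hp_stages * one_pole_highpass_db(f, hp_fc))


# Slope over one octave well above the low-pass cutoff: about 24 dB.
slope_per_octave = lfe_filter_db(1920.0) - lfe_filter_db(3840.0)

# Four identical stages are each 3 dB down at 120 Hz, about 12 dB in total.
lowpass_at_cutoff = 4 * one_pole_lowpass_db(120.0, 120.0)
```

One consequence of cascading identical stages is that the combined low-pass response is already about 12 dB down at the nominal 120 Hz cutoff, so the effective -3 dB point lies below 120 Hz.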
The EQ section of the LFE channel differs from that of the main channels in the
surround sound patterns in that it is a parametric EQ. The slope of the EQ is fixed at
one. However, the frequency band can be swept from 20 to 120Hz. This EQ
provides the ability to create focused changes in the LFE signal, as it is common to
want to trim at 60Hz or boost at 40Hz.

Figure 3.9 Basic Flow of LFE Container

The post-EQ signal, or EQ-bypassed signal, is passed to the delay unit. This is a
necessity, as the latency introduced by the convolution of the main audio signals will
cause a significant offset when using the LFE option.
The LFE unit contains dedicated level meter and fader controls. The output signal of
the LFE container bypasses the Master Delay and Master Peak Limiter containers and
goes directly to the Final Mix container.
3.6 Main Feed Pass-through Structure
The other container in Hydra that is not specific to a surround pattern is the Main Feed
Pass-through section (Figure 3.10). This is a fairly simple concept that allows the input
signals of Hydra to be directly combined with the convolved outputs of the plug-in.
This section consists of a basic power switch for the activation of the Main Feed
Pass-through as well as a delay unit. Once again, the delay unit is a necessity: the
pass-through signal does not incur the latency of the convolved signals, so it must be
delayed to stay aligned with them. A stereo level meter, fader control and mute
button are also present in this section. As with the LFE section, the Main Feed
Pass-through signal bypasses the Master Delay and Peak Limiter containers.
The addition of this section allows the user to strengthen the front channels should the
mix created using the convolved audio signals not prove to contain enough presence.
This section is directly routed to the final front left and right channels. At this time
the signals cannot be combined with any other stereo pair of the final surround sound
mix.

Figure 3.10 Main Feed Pass-through Basic Flow

3.7 Final Mixing And Metering Structure
Virtually the last stage of the Hydra plug-in is the complete output mixing stage. The
complete flow of the mixing/metering stage is shown in Figure 3.11. This area
lets the user adjust the level of the combined surround output on each of
up to eight channels. At this point in the plug-in, all
the surround patterns that have been activated and any delays or peak limiters used
have been combined for final level adjustments.
Each of the eight master channels is equipped with a Master Mute button, muting all
signals being output to that designated audio channel. As with the individual
surround patterns, the output of the eight individual channels feeds into a final master
level adjustment, allowing for the global adjustment of the final level relative to the
level adjustments made per channel. This master level control is pre-meter, so all
level adjustment changes will be displayed on the master level meters.
Another feature of the final mixing section is channel fold-down. This option is a
simple switch, which gives the user the ability to monitor the full 7.1 surround output
or a fold-down of 7.1 to 6.1. This is included for those that monitor with a rear center
channel instead of two additional surround channels. A math multiplier is used to
sum the left and right surround back channels and display the adjusted level of what
is now the rear center channel on the left surround back channel strip. No feature
exists at this time to fold-down 7.1 audio channels to 5.1, for at this point it is best to
simply mix with only 5.1 audio channels. Subsequent versions of the plug-in will
include a feature for 7.1 to 5.1 mixdown for monitoring purposes.
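The fold-down of the two surround back channels into a rear center channel can be sketched as below. The multiplier value of 0.5 is an assumption made for illustration; the value used in Hydra is not stated here.

```python
def fold_down_rear_center(lsb, rsb, multiplier=0.5):
    """Sum the surround back channels into one rear center channel."""
    return [(left + right) * multiplier for left, right in zip(lsb, rsb)]


rear_center = fold_down_rear_center([0.5, -0.25], [0.5, 0.25])  # [0.5, 0.0]
```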

Figure 3.11 Mixing And Metering Container



Once level adjustments and surround layout structure have been chosen in the mixing
section, the audio is output from the plug-in to the system's audio device. Hydra has
eight discrete outputs and follows the channel order layout of SMPTE/ITU. In a 5.1
surround setup the output order from the plug-in is Left, Right, Center, LFE, Left
Surround, Right Surround. When using 7.1 the Left and Right Surround Back signals
are output from the seventh and eighth channels. In the case of 7.1, the Left and Right
Surround speakers could also be shifted forward to be side surround speakers, as
discussed in the section on surround sound setups. If using the fold-down feature of
Hydra, the seventh channel outputs the signal for the Rear Center speaker.
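The output ordering can be written out as a simple lookup. The channel order follows the text above; the function name and layout labels are hypothetical.

```python
OUTPUT_ORDER_7_1 = [
    "Left", "Right", "Center", "LFE",
    "Left Surround", "Right Surround",
    "Left Surround Back", "Right Surround Back",
]


def output_order(layout):
    """Return the channel carried by each plug-in output for a given layout."""
    if layout == "7.1":
        return list(OUTPUT_ORDER_7_1)
    if layout == "6.1":
        # Fold-down: the seventh output carries the Rear Center signal.
        return OUTPUT_ORDER_7_1[:6] + ["Rear Center"]
    if layout == "5.1":
        return OUTPUT_ORDER_7_1[:6]
    raise ValueError("unsupported layout: " + layout)
```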
3.8 Bass Management Structure
An important aspect of surround sound monitoring is bass management. A bass-managed
system gives the mixing engineer the ability to monitor the final mix as might be
heard by a consumer using a less powerful, smaller speaker rig. In addition to
monitoring purposes, a consumer may find this feature useful if a satellite or home-
theater-in-a-box speaker system is the only system available. Bass management
allows the listener to hear the lower octaves of the main bandwidth channels through
the subwoofer that would otherwise be unheard. This is due to the fact that the
majority of main channel speakers do not possess the ability to extend down to the
lower two octaves of human hearing, namely 20 to 40 Hz and 40 to 80 Hz. The
inability to hear this information in full bandwidth speakers can lead to unwanted
noises during bass-managed playback. While professional studios will likely have a
hardware bass management system already in place, a software version has been
included in Hydra for the consumer and semi-professional studios.
The items in Figure 3.12 should be used as reference for this section. As with all
elements in Hydra, the bass management function begins with a bypass switch
(Figure 3.12, Item 1). The switch should be in the bypass mode when exporting
audio from the plug-in to a multi-channel audio file.
First, each of the seven main audio channels is sent through a 12 dB per octave high-
pass filter and sent to the output of the plug-in. The high-pass filter is set at 80Hz. At
this point, Hydra does not allow for a variable high-pass filter setting. At the same
time, the seven main channels are also summed and sent to a 12 dB per octave low-
pass filter set to 80Hz. This is seen in Figure 3.12, Items 2 and 3.
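The crossover described above can be sketched with two second-order (12 dB per octave) filters. The sketch makes several assumptions: it uses Butterworth biquads computed with the Audio EQ Cookbook formulas, a 48 kHz sample rate and hypothetical function names. The filter modules used in Hydra are built differently, so this shows the signal flow rather than the exact response.

```python
import math


def butterworth_biquad(kind, fc, fs):
    """Second-order Butterworth low-pass or high-pass coefficients."""
    w0 = 2.0 * math.pi * fc / fs
    c = math.cos(w0)
    alpha = math.sin(w0) / (2.0 * (1.0 / math.sqrt(2.0)))  # Q = 0.7071
    if kind == "lowpass":
        b = [(1 - c) / 2, 1 - c, (1 - c) / 2]
    else:
        b = [(1 + c) / 2, -(1 + c), (1 + c) / 2]
    a0 = 1 + alpha
    return [x / a0 for x in b], [1.0, -2 * c / a0, (1 - alpha) / a0]


def run_biquad(samples, b, a):
    """Filter a list of samples (Direct Form I)."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def bass_manage(main_channels, fc=80.0, fs=48000.0):
    """High-pass each main channel; low-pass their sum for the subwoofer."""
    hp_b, hp_a = butterworth_biquad("highpass", fc, fs)
    lp_b, lp_a = butterworth_biquad("lowpass", fc, fs)
    mains = [run_biquad(channel, hp_b, hp_a) for channel in main_channels]
    summed = [sum(frame) for frame in zip(*main_channels)]
    subwoofer = run_biquad(summed, lp_b, lp_a)
    return mains, subwoofer


# Two channels carrying a constant (0 Hz) signal: all of it goes to the subwoofer.
mains, subwoofer = bass_manage([[1.0] * 5000, [1.0] * 5000])
```

In this test case the high-passed main channels settle to zero while the subwoofer feed settles to the sum of the two inputs, which is the redirection of low-frequency content that bass management is meant to provide.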

Figure 3.12 Bass Management Container

Where the bass management system of Hydra differs from other bass management
systems is in its calibration compensation options. Each studio calibrates a surround
system in a different way, usually referencing a specific standard such as
SMPTE/ITU. What changes from one system to another is the manner in which the
subwoofer speaker is calibrated. The level set for the subwoofer depends on whether
the only signal sent to the subwoofer is the LFE channel or whether the subwoofer
receives both the LFE signal and a bass managed summation. When the subwoofer is
calibrated for LFE only, then the reference signal is aligned to 89 dB SPL versus a
main channel calibrated at 85 dB SPL. This is 10 dB per one-third octave band
greater than one main channel. This relates to fact that human hearing needs more
level at lower frequencies to sound as equally loud as a full bandwidth channel.
In a bass-managed system the subwoofer is calibrated to 79 dB SPL, which should
measure 85 dB SPL when paired with one bass-managed main channel. The problem
here is that the subwoofer is now calibrated for 79 dB SPL, which is 10 dB too
low when an LFE signal is fed to the same subwoofer. This is corrected in a
bass-managed system by adding 10 dB of in-band gain to the LFE channel before it is
summed with the bass-managed subwoofer signal.
In Hydra, there are two gain adjustment options to allow for correct bass management
monitoring no matter how the studio has been calibrated. Should the
studio not have a bass-managed system and have calibrated the subwoofer to receive
the LFE signal only, meaning the +10 dB of in-band gain for the LFE has already been
accounted for, then the Sub Dip switch should be engaged. See Figure 3.12, Item 3.
This switch reduces the gain of the summed bass signal from the main
channels so that, when it is mixed with the LFE signal and output to the studio
subwoofer, the overall signal receives the necessary boost through the calibrated
subwoofer's level. Conversely, if the studio has been calibrated for bass management
without factoring in the possibility of the subwoofer receiving an LFE signal, then the
LFE Boost switch should be engaged. See Figure 3.12, Item 4. This switch applies a
10 dB in-band gain boost to the LFE signal before it is summed with the
bass-managed signal. In no case should both the Sub Dip and LFE Boost switches be
engaged at the same time.
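The gain arithmetic behind the two switches can be illustrated as follows. The sketch
assumes that Sub Dip lowers the summed bass signal by the same 10 dB that LFE Boost
adds to the LFE channel; the text above specifies only the 10 dB boost, so the size of
the dip is an assumption, as are the function and parameter names.

```python
def db_to_gain(db):
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def subwoofer_feed(bass_sum, lfe, sub_dip=False, lfe_boost=False, offset_db=10.0):
    """Mix the bass-managed sum with the LFE channel for the subwoofer output.

    sub_dip:   lower the bass-managed sum (subwoofer calibrated for LFE only).
    lfe_boost: raise the LFE channel (subwoofer calibrated for bass management).
    """
    if sub_dip and lfe_boost:
        raise ValueError("Sub Dip and LFE Boost must not be engaged together")
    bass_gain = db_to_gain(-offset_db) if sub_dip else 1.0  # assumed -10 dB dip
    lfe_gain = db_to_gain(offset_db) if lfe_boost else 1.0  # +10 dB per the text
    return [b * bass_gain + l * lfe_gain for b, l in zip(bass_sum, lfe)]
```

A 10 dB change corresponds to a linear factor of roughly 3.16, so with LFE Boost
engaged a unit LFE sample contributes about 3.16 to the subwoofer feed.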
Despite the fact that the LFE channel is originally created by passing through a 24 dB
per octave low-pass filter, a second such low-pass filter is included in the bass
management scheme. This low-pass filter is applied prior to the 10 dB boost switch.
The final function in the bass management section is the ability to mute the main
bandwidth channels post-bass management. In this case, the signal to the main
loudspeaker is muted, but the bass-managed portion of the channel is still heard
through the subwoofer. Therefore, by muting all the main bandwidth channels post-
bass management, the summed bass signal can still be heard through the subwoofer.


4. Graphical User Interface Construction
The previous chapter discusses the signal flow and elements of which Hydra is
comprised. This chapter covers the construction of the graphical user interface
(GUI). The GUI is created while the DSP elements of Hydra are being programmed.
Each DSP element that requires user interactivity, such as the knobs for the EQ
functions, needs its graphical element inserted at that time. This is generally only
the initial user GUI. The final layout and adjustments to the images are completed
once all the DSP elements needed have been created. This allows for a landscape of
the plug-in to be created upon which the individual GUI elements can be placed. If
the DSP elements are not completed first, it can be difficult to find screen space to
add more user controls later.
4.1 Primary Screens - Edit And Mix
Hydra is designed with two primary user screens, Edit and Mix, and a third area,
LFE/Main, which remains the same no matter the chosen screen. These screens can
be seen in Figures 4.1 and 4.2. By default the Edit screen is shown upon loading of
the plug-in. This screen contains the header section, where elements needed to
activate the various surround patterns are located, the EQ section, and level adjust
section for individual surround pattern channels. The Mix screen contains the GUI
for the peak limiters, delay units and final mixing, metering and bass management
options. User selection of the screen view is accomplished by toggling the switch
located in the Screen Mode section of the LFE/Main window.
4.2 Multi-Layers
Each screen view has multiple layers. By adding multiple screen views and multiple
layers to each screen, it is possible to neatly arrange all the GUI elements needed for
Hydra operation without the need for multiple floating windows. Hydra uses six
surround patterns. However, the initial Edit screen only shows four patterns: Double
M/S, IRT Cross, OCT-2 and Rear-Facing Cardioids. Since the Double M/S and
Modified Decca Tree patterns, and the IRT Cross and Hamasaki Square patterns, each
share the same channel layout, they have been layered. A drop-down menu is
provided in the header of the appropriate surround pattern section to allow the user to
select which pattern to have visible on the screen as highlighted in Figure 4.3.
Changing from one layer to another does not deactivate the hidden layer. Thus,
both the IRT Cross and Hamasaki Square can be activated at the same time, but only
one surround pattern's settings can be viewed at any one time. In the Double M/S
pattern, which has only five audio channels, the sixth and seventh channel strips are
used when the Channel Flip option has been activated. Modified Decca Tree uses all
seven of the channel strips simultaneously.
Figure 4.1 Edit Screen With LFE/Main
Figure 4.2 Mix Screen With LFE/Main
Figure 4.3 Pattern Select Drop-Down
The same multi-layer structure for the surround patterns exists on the delay section of
the Mix screen. A drop-down is present in the header section providing surround
sound pattern delay viewing options. Each of the six surround patterns is present in
the drop-down. Selection of any one of the patterns displays the delay controls
available for that pattern below. As with the activation of the surround patterns on
the Edit screen, the delays that have been activated remain enabled whether or not
they are shown on the Mix screen.
4.3 Secondary Screen - LFE/Main
The final GUI screen available to users is the LFE/Main section, which can be seen in
the bottom 25 percent of both Figures 4.1 and 4.2. This screen remains at the bottom
of the plug-in whether on the Edit or Mix screen. This section contains elements that
may be needed regardless of the screen mode position. The middle of the LFE/Main
section contains the title card for the plug-in. On the left-hand side is the location of
the GUI elements needed for control of the LFE channel. To the right of that section is
the Master Level Control area. This area contains a left/right scrolling function,
allowing access to each of the master level controls for the surround patterns. This is
useful in the bottom section as it allows all final mixing to be performed on just the
Mix screen, without the need to flip back to the Edit window. Flipping the screen
should only be necessary to make last minute EQ or single channel level adjustments.
Aside from individual surround pattern master level controls, this section also
features a master level control for the final mix output.
To the right of the title section is located the Screen Mode toggle switch. To the right
of that section is the Primary Bypass section as well as the controls needed to enable
the Main Feed Pass-through function of Hydra.


5. Highlighted Features Of Hydra
Now that the fundamental architecture and GUI layout have been discussed, some of
the features of Hydra can be more easily understood. The complete user manual for
Hydra can be found in the Appendix. However, there are a few functions of the plug-in
that are highlighted here. Hydra is a versatile plug-in combining convolution with a
complete digital audio workstation. These highlighted features are simply a starting
point for Hydra's functionality.
5.1 Blending
Blending is a function by which the user may combine multiple surround sound
patterns, eliminate individual channels and create virtual surround patterns. While
basic testing of the plug-in indicates that no more than three surround patterns should
be active at any given time, computers with above average CPU and RAM should be
able to activate more than three.
Since not all six of the surround patterns contain the same number or layout of audio
channels, it is an important feature that Hydra is able to combine surround patterns.
The Modified Decca Tree is the only complete seven-channel surround pattern.
However, combining Double M/S with Rear-Facing Cardioids creates a seven-
channel layout as well. Either of the patterns can flip the left/right surround channels
to the left/right surround back channels. Combining OCT-2 and Rear-Facing
Cardioids provides a five-channel surround layout.
The combination options are not limited to whichever pairs of surround patterns
provide an exact five or seven-channel layout. For example, by adding the IRT Cross
to the OCT-2 and Rear-Facing Cardioids patterns, nine channels are active, with two
different patterns feeding the front left and right channels. By simply muting one of
the patterns' left and right feeds, a seven-channel layout is created. While it is advised
to only have one surround pattern feeding any one of the audio channels, multiple
surround patterns can feed the same channels. In this case it is important to use the
delay components to time align the respective audio channels of one pattern with the
other. For example, if using both the IRT Cross and OCT-2 patterns to feed the front
left and right channels, then a delay should be introduced to time align the channels of
those two patterns. In the current version of Hydra this is not an issue as the impulse
response recordings were made with roughly the same distances for each pattern.
Should the plug-in incorporate third-party impulse responses, it is likely the delay
would be needed. Information on the distances used in the original impulse response
recording can always be found in the header section of the respective surround pattern
by clicking on the info button.
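Where the distances of two patterns do differ, the delay needed for time alignment
follows from the speed of sound. The sketch below is an illustration only: the
343 m/s figure assumes air at roughly 20 degrees Celsius, and the function names and
example distances are hypothetical rather than taken from Hydra.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumption: dry air at about 20 degrees Celsius


def alignment_delay_ms(distance_a_m, distance_b_m):
    """Delay in milliseconds to add to the closer pattern so both arrive together."""
    return abs(distance_a_m - distance_b_m) / SPEED_OF_SOUND_M_S * 1000.0


def delay_in_samples(delay_ms, sample_rate=48000):
    """Round a delay in milliseconds to a whole number of samples."""
    return round(delay_ms * sample_rate / 1000.0)


# Hypothetical example: one pattern recorded 1.0 m farther from the source.
extra_ms = alignment_delay_ms(4.0, 3.0)
extra_samples = delay_in_samples(extra_ms)
```

A path difference of one meter works out to a little under three milliseconds, which
is the value that would be entered in the delay unit of the closer pattern.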
By combining different existing surround patterns, the user creates new virtual
patterns. For example, blending a surround channel based on a hypercardioid impulse
response with one based on a figure-8 response would create a
wide cardioid polar pattern, in which the hypercardioid pattern adds information lost
by the null point of the figure-8 microphone. Options for the creation of new surround
patterns are seemingly limited only by the imagination. A future upgrade to the plug-in
will include the generation of a polar plot for the final user output based on the
surround pattern configurations chosen.
5.2 Delay DSP Effect
Hydra does not contain a reverb unit. This decision was made in order to conserve
CPU so more surround pattern convolution could take place at one time. However,
an interesting DSP effect could be implemented by adjusting the delay units of the
various surround patterns.
A home theater receiver generally has DSP presets, such as Hall, Theater and Concert.
Each of these presets processes the audio so that playback resembles the chosen
location. This is a real-time effect as it can be done on the fly.
With Hydra, adjusting the delay components of the surround patterns introduces this
type of DSP effect. The higher the delay value of the surround channels, the larger the
sound becomes, as if it were being played in a concert hall. Of course, as the delay
time of the surround channels increases, the front channels become less prominent.
However, for an enveloping surround experience, this is an effect that could be
desirable.
5.3 Mixing
The mixing sections of Hydra have been designed to allow the greatest flexibility
possible, attempting to account for various mixing styles. It would be all too simple
to program Hydra to convolve the surround pattern and provide one fader for level
control. However, this would not allow for any fine-tuning of the surround pattern. It
is more than likely the impulse response files are not going to be of the precise level
needed for balance of each of the main channels in a surround pattern. This depends
on the care taken during the recording, but more importantly on the strength of
reflections on one side of a room versus another. The impulse response files may
have the same peak level; however, the presence of one side of the room may be more
powerful than the other. With Hydra, this can be corrected. Individual channel level
controls are provided for all of the surround patterns. Presence can be corrected
through the included EQ section for each of the channels.
Once a user is content with the mix created, it would be tedious to go back and adjust
the level of seven faders just to raise or lower the overall gain of the surround pattern
by a few decibels. This is where the master level control section is useful.
Originally, the design of Hydra included a fader link function, which when enabled
would allow the user to adjust all the faders of a surround pattern by simply adjusting
one fader, with the values of all faders in a surround pattern remaining relative to one
another. However, due to limitations in the Synthedit program, this was not possible.
To remedy this, the master control section was created. While it uses a fraction more
CPU power to have another set of level adjustment modules, the functionality of
having single fader level control over an entire surround pattern is of greater value.
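The relationship between the individual faders and the master control can be expressed
in decibel terms: the two gains add in dB, so the offsets between channels are
preserved. The following sketch uses hypothetical channel names and fader values to
show this.

```python
def channel_gains(fader_db, master_db):
    """Combine per-channel fader settings with one master level.

    fader_db:  dict of channel name -> fader setting in dB.
    master_db: master level for the whole surround pattern in dB.
    Returns a dict of channel name -> linear gain.
    """
    return {ch: 10.0 ** ((db + master_db) / 20.0) for ch, db in fader_db.items()}


# Hypothetical mix: surrounds sit 6 dB below the fronts.
faders = {"L": 0.0, "R": 0.0, "Ls": -6.0, "Rs": -6.0}
before = channel_gains(faders, master_db=0.0)
after = channel_gains(faders, master_db=-3.0)
```

Lowering the master by 3 dB scales every channel by the same factor, so the ratio
between the front and surround gains is identical in `before` and `after`.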
Since this is not a plug-in designed for live recording, all level meter displays are
post-processing in Hydra. This means that any changes to the level or frequency of
an audio signal will be displayed on the level meter. Essentially, the level meters are
post-EQ, post-Mute and post-Fader. This is an important feature as it accurately,
within 0.001 dBFS, informs the user of digital clipping at any stage of the editing and
mixing process.
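Post-processing metering of this kind can be sketched as below. The example assumes
floating-point samples in which full scale is 1.0, and the function names are
illustrative; it is not a description of the Synthedit meter module.

```python
import math


def peak_dbfs(samples):
    """Peak level in dBFS, assuming full scale is an absolute value of 1.0."""
    peak = max((abs(s) for s in samples), default=0.0)
    return -math.inf if peak == 0.0 else 20.0 * math.log10(peak)


def post_fader_peak_dbfs(samples, fader_db=0.0, muted=False):
    """Measure the peak after the mute and fader stages have been applied."""
    gain = 0.0 if muted else 10.0 ** (fader_db / 20.0)
    return peak_dbfs([s * gain for s in samples])


def is_clipping(samples):
    """True if any sample reaches or exceeds digital full scale."""
    return any(abs(s) >= 1.0 for s in samples)
```

Because the measurement is taken after the gain stages, raising a fader far enough
will show clipping on the meter even when the source material itself is below full
scale.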


6. Setup And Testing
This chapter discusses the hardware and software requirements of Hydra, as well as
discoveries made during the testing/listening phase. More details on the complete
setup necessary for this program are included in the user manual in the Appendix.
6.1 Software
Hydra was designed and created using Synthedit. This is a Windows operating
system program and as such the compiled version of Hydra is a Windows-only VST
audio plug-in. On the Windows platform, VST plug-ins are represented by the .dll
file format.
To provide greater flexibility for use by consumers and professionals, Hydra is
designed to function almost like a standalone program. All that is missing from
Hydra is an audio in/out parameter. Because of this missing feature, Hydra will only
work as a plug-in. This plug-in is supported by digital audio workstations that
support VST plug-ins, such as Cubase and Nuendo. Despite the availability of the
FXpansion VST to RTAS Wrapper tool for conversion of VST plug-ins to an Avid Pro
Tools compatible RTAS plug-in, Hydra will not function in Pro Tools. This is due to
an incompatibility with the number of input and output channels of Hydra. Pro Tools
only reads Hydra as a stereo effect plug-in. Work on fixing this bug is currently in
progress, and a fix should be available in the next version of Hydra.
Even though Hydra does work in digital audio workstations, the plug-in is designed
to interact with an open architecture audio program. An open architecture audio
program means the user is given control over the components to link together
between the input and output of the connected audio device. In this case, Plogue
Bidule is the most efficient open architecture program. The plug-in is
designed for use with this type of audio program because of the CPU usage required.
Convolution requires a great deal of DSP processing. A digital audio workstation also
requires a large amount of computer CPU and memory to operate. Combining the
two could cause instability in non-high-performance computer systems.
With Plogue Bidule, the program itself occupies virtually no CPU power. Setup of
Plogue Bidule is quite simple. The user defines the input and output hardware device.
See the next section on hardware requirements. Depending on what type of audio is
being passed to Hydra an input audio hardware device may not be necessary. It is
only required if passing audio to Hydra from a source external to the computer.
Most often pre-recorded material would be sent to Hydra from the hard drive. In this
case, Plogue Bidule is equipped with a two-channel audio file player. This program
functions a bit like Synthedit, in that pins and connections are used to send signal
from one component to another. Here, the two output pins of the audio file player
should connect to the two input pins of the Hydra plug-in, left to left and right to right.
After the Hydra plug-in, an audio output device needs to be selected. The eight
output pins of Hydra should be connected to the eight input pins of the chosen output
device. The order in which the pins are connected depends on the specific wiring of
the studio using the plug-in. The outputs of Hydra are predetermined based on the
SMPTE/ITU channel order for 7.1, thus the connections should match
accordingly.
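The routing step can be pictured as a lookup table from Hydra's output pins to the
pins of the output device. The sketch below assumes the output order Left, Right,
Center, LFE, Ls, Rs, Lsb, Rsb, and the device wiring shown is a hypothetical studio;
the actual order should be confirmed in the user manual.

```python
# Assumed Hydra output order (pin 0 to pin 7).
HYDRA_OUTPUTS = ["L", "R", "C", "LFE", "Ls", "Rs", "Lsb", "Rsb"]


def build_routing(device_inputs):
    """Return (hydra_pin, device_pin) pairs for every Hydra output.

    device_inputs: dict of channel name -> input pin index on the output device.
    """
    missing = [ch for ch in HYDRA_OUTPUTS if ch not in device_inputs]
    if missing:
        raise ValueError("no device pin assigned for: " + ", ".join(missing))
    return [(pin, device_inputs[ch]) for pin, ch in enumerate(HYDRA_OUTPUTS)]


# Hypothetical studio wired in L, C, R, Ls, Rs, LFE, Lsb, Rsb order.
studio_wiring = {"L": 0, "C": 1, "R": 2, "Ls": 3, "Rs": 4, "LFE": 5, "Lsb": 6, "Rsb": 7}
routing = build_routing(studio_wiring)
```

Each pair in `routing` corresponds to one cable drawn between a Hydra output pin and
a device input pin in Plogue Bidule.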
At this point, Plogue Bidule is nearly ready for playback. See the user manual for
advanced preference settings that should and should not be enabled for smooth audio
playback. Once an audio file has been loaded into the player, the Hydra window can
be opened and use of the plug-in can commence.
Output from Hydra is real-time, allowing for simple music listening. However, an
audio file recorder can be added to the program and connected to the output of Hydra.
This will write the output of the plug-in to a multi-channel audio file or multiple
mono audio files.
6.2 Hardware
Usage of this plug-in currently requires more of a professional collection of
equipment. However, as discussed below, it is possible for an average
consumer to run Hydra effectively on a suitably equipped personal computer. As the
plug-in completes all processing through the CPU, the only external requirement of
the hardware setup is eight discrete channels for 7.1 monitoring, six for 5.1,
five for 4.1, and so on.
There are numerous hardware devices available on the market capable of providing
the eight discrete outputs. Testing of Hydra was done using three different hardware
units, each with a different communication protocol with the Windows platform.18
Explanation of each of the protocols is beyond the scope of this project, but
information on the differences and requirements of the protocols is widely available.
18 A "communication protocol" is how an audio device communicates with the
operating system. In Windows, multiple protocols exist. The most common types are
ASIO, MME and WDM, each requiring different drivers from the soundcard
manufacturer.
The first tested device is an M-Audio ProFire Lightbridge 34-in/36-out. This device
uses ADAT Lightpipe connections to feed banks of eight audio channels to and from
the device. The M-Audio device is paired with a Behringer ADA8000 eight-channel
analog/digital and digital/analog converter. This hardware setup is capable of up to
eight channels of 24-bit, 48kHz audio. The protocol used in Plogue Bidule with the
M-Audio hardware is ASIO.
The second tested device is a MOTU 24i/o with PCIe communication card. This
device is capable of 24 analog inputs and 24 analog outputs. Communication with
the computer and A/D/A processing is done through the PCIe card, which is
connected to the 24i/o device via FireWire. This device is capable of 24 channels of
audio at 24-bit, 96kHz. Given that the impulse responses were recorded at 48kHz, the
24i/o device was locked to that sample rate. The protocol in this case is the MME
driver.
As mentioned earlier, professional audio equipment is not a necessity for the use of
Hydra. The requirement for consumer use is a computer motherboard with either 5.1
or 7.1 onboard audio output. This testing used a Gigabyte GA-X58A-UD3R
motherboard with 7.1 audio output onboard. Motherboards with this feature require
connection using mini 1/8-inch stereo plug connectors. A 7.1 computer speaker
system capable of accepting these mini plugs is required; alternatively, as in this test,
the mini plugs can be connected with Y-cables and sent unbalanced to a surround controller.
Communication with this device through Windows is accomplished using built-in
motherboard drivers, either MME or WDM.
Each of the above devices requires tweaks in the preferences of Plogue Bidule, but all
devices work in transmitting the stereo to 7.1 upmixed content to a surround speaker
setup.
6.2.1 Computer Requirements
The computer necessary to run Hydra needs to be of average performance quality or
better. There are too many CPU types to test all of them. However, Hydra performs
best on an Intel processor given the use of Intel Performance Primitives in the
convolution algorithmic module of the plug-in. While AMD Athlon processors
should work, no tests using those processors have been performed at this time. A
typical computer needed to run Hydra should contain a minimum of an Intel Core 2
Duo processor, with an Intel Quad core CPU recommended. The minimum speed of
the processor varies depending on other hardware components such as onboard or
external audio hardware, but should not fall below 2GHz, with 2.8GHz or higher
recommended.
Almost more important than the CPU type and speed is the amount of physical
memory installed in the computer. A minimum of six gigabytes of memory is needed
to run Hydra efficiently, as the plug-in was not written natively and does not contain
memory allocation checks. For example, while the plug-in works flawlessly for
several hours, eventually glitches occur in the playback due to poor memory
management. If this occurs, a simple restart of the program will empty the memory.
While two gigabytes is the absolute minimum, eight or more is recommended for
advanced usage of the plug-in. Testing was performed on systems with two and 12
gigabytes.
The only other computer requirement is the installation of a stable Windows
operating system, running Windows XP SP2 or higher. Hydra works with both 32-bit
and 64-bit operating systems. Testing was performed using Windows XP SP2 32-bit
and Windows 7 64-bit.
The two machines used in this test were: 1) a MacBook Pro Core 2 Duo at 2.33GHz
with 2GB of RAM and 2) a custom-built desktop with an Intel i7 Quad at 2.8GHz
with 12GB of RAM.
6.3 Testing
Testing of Hydra was done over the course of its creation. Each element of the plug-
in was tested and confirmed working as it was developed. Therefore, all elements
discussed throughout the documentation are known to be working. This section will
focus on testing the outcome of the different functions built into Hydra. The primary
focus will be on the variations of the impulse response recordings and the initial
setup, along with the Width and LFE functions.
6.3.1 Impulse Response Testing
As mentioned in sections 2.3 and 2.3.1 on editing of the impulse response files, the
files were altered in various ways to discover the best output possible from Hydra. It
wasn't until Hydra was finished and operational that those alterations could be
thoroughly tested. Testing of impulse response files is divided into two areas:
original speaker orientation (straight vs. toed-in) and impulse response level adjustments.
The effect of the straight and toed-in speaker positions on the resulting sound in
Hydra is noticeable, but does not yield as strong a change in audio image as
expected. Theoretically, the toed-in speaker position should result in a dominant
center image, with the left/right side signals presenting a more ambient image. While
this is the case, the effect is not pronounced enough to necessitate an entire separate
set of impulse responses. The lack of a pronounced change in image could have any
number of causes, but two explanations are most plausible. The first is the off-axis
pickup pattern of the microphones used in the impulse response recording process.
Should the microphones provide a strong presence off-axis, that would account for
the only slight shift in image. The other explanation could be the shape and
reflections of the room. Strong side reflections in the room may leave the left/right
microphones with a signal that matches the presence of the center image.
Regardless of the size of the image shift, psychoacoustically, the set of impulse
responses recorded with the speakers in the straight position are more pleasing to the
ear.
Of the two issues presented here, speaker position versus normalization alterations,
the normalization of the impulse response has the greatest impact on the output from
Hydra. As a reminder, the impulse responses were left unaltered, normalized relative
to a single peak value of -15 dBFS, normalized relative to -15 dBFS in respect to
left/right balance, and normalized in terms of the RMS value in respect to left/right
balance in which the peak value does not exceed -15 dBFS.
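Two of these alterations are sketched below for reference. The code is one
interpretation of the descriptions above, not the procedure used to prepare the Hydra
impulse responses: the function names are hypothetical, and the RMS routine assumes
that the right response is matched to the left before both are scaled under the peak
ceiling.

```python
import math


def rms(samples):
    """Root-mean-square level of a non-empty list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def normalize_set_to_peak(irs, target_dbfs=-15.0):
    """Apply one shared gain so the largest peak in the whole set hits the target."""
    target = 10.0 ** (target_dbfs / 20.0)
    peak = max(abs(s) for ir in irs for s in ir)
    return [[s * target / peak for s in ir] for ir in irs]


def balance_rms_under_peak(left, right, ceiling_dbfs=-15.0):
    """Match the RMS of two responses, then keep every peak at or below the ceiling."""
    match = rms(left) / rms(right)
    right_matched = [s * match for s in right]
    ceiling = 10.0 ** (ceiling_dbfs / 20.0)
    peak = max(abs(s) for s in list(left) + right_matched)
    scale = ceiling / peak
    return [s * scale for s in left], [s * scale for s in right_matched]
```

The first routine preserves the level relationships within a set, while the second
deliberately changes the left/right relationship, which is consistent with the larger
audible differences reported for the balanced sets.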
Surprisingly, none of the normalized alterations resulted in a dramatic signal to noise
ratio issue. All normalized sets are useable without the presence of strong noise
levels. The unaltered and peak normalized impulse response sets are very similar,
with only a slight increase in loudness for the latter set. The balanced peak
normalized set brought some changes to the forefront. Altering the peak in relation
to the left and right balance disturbed the coherency of the soundfields. Instead of
the typical front and rear soundfields in a surround sound
setup, the balanced peak normalized set created three soundfields which ran front to
back in three strips: left, center and right. Balancing the levels of left-to-left and
right-to-right increased the perceived loudness of the stereo left and right soundfields.
However, since the peak values of left-to-right and right-to-left were also matched,
this brought the dominant sound image to the center soundfield. This places the user
in an incoherent sound image that psychologically brings on a feeling of claustrophobia.
The RMS altered set resulted in the same psychoacoustic phenomenon as the
balanced peak normalized set. The change in the RMS set lies in the loudness of the
output signal. Since the values were adjusted on the RMS scale, the output from this
set results in the overall loudest sound.
After thorough testing of all the impulse response patterns, weighing the speaker
positions and altered levels against the complete functionality of Hydra, the unaltered
set of impulse responses yielded the best output from Hydra. Using the unaltered set
required only minimal boost in output levels to match the stereo listening reference
level. In addition, the unaltered set yielded a smoother output through the EQ and
peak limiter components of the plug-in.
6.3.2 Width And LFE Functions Testing
The two most important functions of Hydra for changing and strengthening the sound
image are the Width and LFE functions. The Width function provides the change in
the sound image expected from the difference in speaker position. The LFE addition
helps to improve the low end of the output.
When in its detent position, the Width switch indicates the Mix mode. In this mode,
the left-to-left and right-to-left impulse response signals are convolved and summed
together to create the left speaker feed. When flipped to the up position, the Width
function is in Wide mode. In this mode, only the impulse response of left-to-left
feeds the left speaker signal. The Mix mode results in a narrowing of the front
soundfield image. The dominant signal is pulled to the center channel. Muting of the
dedicated center channel leaves a strong phantom image placed in the center created
by the left and right speaker feeds. This is not a desired result in typical surround
sound creation. In addition to the narrowing of the front sound image, the Mix
mode yields a diffuse rear soundfield. While the narrowing of the front soundfield
is not ideal, the diffuse rear soundfield could be a wanted effect in certain types of
surround processing. For example, creating an upmix of a pop song for a film may
use the original stereo recording for the front left/right channels, in which case it
would benefit from a less directional rear soundfield signal. Where both the
narrowing of the front and diffusion of the rear could be of use would be in the upmix
of a sound effect to surround sound. In this case the sound effect would remain
largely mono in the screen channels and at the same time contain an enveloping
signal in the rear channels.
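The difference between the two modes can be made concrete with a short sketch of the
left speaker feed. It assumes that the right-to-left impulse response is convolved
with the right input channel, and it uses direct convolution for clarity rather than
the optimized routine in the plug-in; all names are illustrative.

```python
def convolve(signal, impulse_response):
    """Direct (time-domain) convolution of two non-empty sample lists."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, x in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += x * h
    return out


def left_speaker_feed(left_in, right_in, ir_left_to_left, ir_right_to_left, wide=False):
    """Left speaker feed for the Width switch.

    wide=False (Mix):  left-to-left and right-to-left results are summed.
    wide=True  (Wide): only the left-to-left result is used.
    """
    direct = convolve(left_in, ir_left_to_left)
    if wide:
        return direct
    cross = convolve(right_in, ir_right_to_left)
    length = max(len(direct), len(cross))
    direct += [0.0] * (length - len(direct))  # pad the shorter result with silence
    cross += [0.0] * (length - len(cross))
    return [d + c for d, c in zip(direct, cross)]
```

In Mix mode the right input leaks into the left feed through the cross response,
which is what pulls the image toward the center; Wide mode omits that term.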
Conversely, when the Width switch is in the Wide mode, the sound image created fits
the term, "sound all around." In this situation, the stereo fields in both the front and
rear remain spaced, yielding a large overall soundfield. Sounds are more directional
from each of the loudspeakers in the system. Due to the proximity of the microphone
setups during the impulse recording process, the soundfield remains dominant up
front, giving the listener an anchor. However, as mentioned earlier, the delay
component of the plug-in can be used to further widen the soundfield, resulting in the
lack of a listening anchor, instead placing the listener within a circle of "sound all
around." Use of the delay DSP effect works best with the Wide mode. When used
with the Mix mode, the front sound image is nearly completely overwhelmed by the
rear soundfield.
The drawback to the Wide mode is it results in more of a quadraphonic sound. The
center of the soundfield is largely empty. Therefore, it is important for the center
channel to receive more gain to fill in the stereo separation, as well as EQ to match
the presence. In quadraphonic mixes, typically there is a soundfield generated
between the left and left surround speakers and the right and right surround speakers.
Modern surround sound mixes have rotated the quadraphonic image to be front and
rear instead of left and right.
Use of the unaltered impulse responses in the Wide mode yielded the most coherent
soundfield during testing. This is excellent for the primary bandwidth channels.
However, given the frequency response and pickup patterns of the microphones used
in the recording process, music recordings put through Hydra lack a solid low-
frequency output. Therefore, the LFE function of Hydra is actually of use during
music upmixing. Since the LFE signal represents a frequency range of 20 to 120Hz,
mixing a small amount of LFE signal could help to fill out the bottom frequencies of
the bass drum and bass in music listening. However, given the latency of the
convolved signal, the LFE output must be delayed by 17 milliseconds. This is easily
accomplished by activating the delay component in the LFE section of Hydra.
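As a rough illustration of the 17-millisecond offset described above, the following Python sketch converts the delay into whole samples and pads an LFE signal accordingly. The sample rates and the helper names (ms_to_samples, delay_signal) are assumptions made for illustration; they are not taken from Hydra itself.

```python
def ms_to_samples(delay_ms, sample_rate_hz):
    """Convert a delay in milliseconds to a whole number of samples."""
    return round(delay_ms * sample_rate_hz / 1000.0)


def delay_signal(samples, delay_samples):
    """Delay a signal by prepending zeros (the output grows by the delay)."""
    return [0.0] * delay_samples + list(samples)


# Hypothetical LFE fragment; real audio would hold many more samples.
lfe = [1.0, 0.5, 0.25]

n_48k = ms_to_samples(17, 48000)   # 17 ms at an assumed 48 kHz rate
n_44k = ms_to_samples(17, 44100)   # 17 ms at an assumed 44.1 kHz rate (rounded)
delayed = delay_signal(lfe, n_48k)
```

At 44.1 kHz the delay does not fall on a whole sample, so the sketch rounds to the nearest one; whether Hydra rounds or interpolates is not stated in the text.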
All the components of Hydra are fully functional. The impulse response files work as
planned. However, the manipulation of the surround patterns and the different
functions of Hydra are left to the end user. By selecting different impulse response
sets, varying the Width function, incorporating delays and changing level balances,
any number of different soundfield environments can be created.


7. Uses For Hydra
Throughout the documentation potential uses for Hydra have been mentioned.
However, there are two primary reasons for the creation of this plug-in: professional
cinema post-production and consumer encouragement.
On the professional side, Hydra is designed to provide post-production sound
designers with a more flexible tool to upmix music and sound effects for use in a
multichannel surround film or television show. Expensive upmixing systems do
exist, but for the independent post-production house, Hydra may offer a more
economic means of creating surround material. As already discussed in the testing
section, by manipulating the functions of Hydra, music upmixes can be created that
are tailored to the surround sound design of films. Additionally, mono and stereo
sound effects can be expanded into multichannel surround to force a large impact.
The ability to integrate Hydra with professional digital audio workstations such as
Nuendo and Cubase allows the plug-in to provide an integrated solution to surround
sound upmixing for industry professionals.
Despite its professional applications, Hydra was truly created as a piece of
educational software. It was initially designed to influence the consumer to move to a
surround sound setup by providing the user with an interactive system. As future
technology continues to move toward interactivity, such as interactive menus and
commercials, Hydra's design provides this functionality as well. In its current form
Hydra functions mainly as a plug-in for real-time music upconversion. This relies on
it being installed as a third-party plug-in into a music software program. Such music
library programs are limited at this time. For the computer audio hobbyist, Hydra is
easily integrated on the Windows platform. For layman consumers, Hydra would
need to be integrated with a popular music player such as Winamp or iTunes.
While in its current state Hydra is a tool for real-time music upmixing and post-
production multichannel manipulation, the design features of Hydra make it prime
for future integration with other hardware components. As mentioned in section 1.6,
the media center PC is growing in popularity. Integrating Hydra into the output stage
of the media center PC will provide hobbyists and consumers with an interactive tool
to control the surround sound experience. While this will likely excite hobbyists with
advanced knowledge of computer audio integration, it may be overwhelming for the
typical consumer. For those users it is possible to adapt Hydra into a Lite version that
removes the advanced mixing options and provides ready made presets for activation.
Using the plug-in in this way still gives the user the effect of Hydra and can
encourage the user to use the full mixing and manipulation power that Hydra was
designed to perform.
Aside from the media center PC, Hydra still has use in other hardware systems. With
manipulation and efficiency coding, Hydra could be printed to a chip, allowing for its
integration in home theater receivers and set-top boxes. This still allows for user
interactivity via hardware menus. However, it also allows for simple DSP presets
built within the hardware. As with current home theater receivers, Hydra can be
integrated as a DSP environment effect or integrated as a new surround decoding
mode, like that of Dolby ProLogic and DTS Neo. Placement of a Hydra chip in set-
top boxes removes the interactivity of the plug-in, but gives the layman consumer an
easy surround output from their cable or satellite receiver.
To demonstrate how Hydra could work when set up through a media center PC, receiver
or set-top box, a real situation is described here. Digital broadcasting has enabled
multichannel audio to be packed in various lossy bitstreams. Dolby Stereo, Dolby
Digital and Dolby ProLogic have already been discussed in chapter one. Taking this
a step further, broadcasting companies send information to the user in a Dolby
format. Typically, the audio stream is two channels, left and right, and is flagged in
the metadata of the bitstream as either Dolby encoded or not. If Dolby encoded, then
the stereo audio stream will be automatically decoded by a ProLogic system should
one be in place. If it is not Dolby encoded then the stream is played back as stereo
unless a surround-decoding scheme such as THX is selected.
What has been noticed of late is that many broadcasting stations in high-definition are
airing two-channel content encoded into a Dolby Digital 5.1 bitstream. Therefore, a
decoding system decodes six channels, though only two exist. By transmitting the
audio data in this form it removes the option of playing back the content as anything
but two channels.
For example, a network station aired Superman: The Movie, the first modern film to
use stereo left and right surround channels, in this forced, two-channel Dolby Digital
5.1 format. By using Hydra, the stereo analog channels were routed into the
computer and sent to the Hydra plug-in. Hydra was easily able to expand the live
feed to surround sound. As a reminder, Hydra was not developed for live upmixing
when sync is involved; therefore, the latency parameters of the plug-in would need
adjusting in order to maintain relative audio/video sync. Should Hydra be built into
a hardware unit, a video delay processor in the hardware could easily keep the sync
locked. Since Hydra processes audio in data blocks, the latency of the audio is
consistent.
The more Hydra is integrated into systems, the greater the number of uses for its
surround sound upconversion and mixing features can be found.
7.1 Next Steps
There are several advancements that can be made after the initial completion of
Hydra, both to the plug-in itself and its use as a guide to somewhere new. User
feedback would lead to more general fixes and tweaks to the user interface and
functionality of the plug-in. Currently in development is a feature for altering the
output channel order so that it can differ from SMPTE/ITU. This includes output
orders for the channel layouts of Dolby/Film, various DTS mastering formats and
AAC. Expansion of the bass management options is also underway, which includes a
selectable high and low-pass setting and more advanced mute functions. A feature
that is noticeably lacking in Hydra is that of soloing. Soloing of individual channel
strips and individual surround patterns as a whole is a complicated task in Synthedit,
surprisingly requiring much CPU power through the incorporation of numerous
switches. While this feature is planned for a future release, it should not be expected
in the near future.
Aside from changes in the features in Hydra, the next important step in its
development is cross-platform compatibility. In the industry, Apple is the leader
when it comes to hardware systems for multimedia editing. As such, it is important
that Hydra be compatible with the OS X operating platform. While it would be of
use to extreme computer enthusiasts, a Linux version of Hydra is not planned at this
time.
Creating Hydra on the Java platform would create a more efficient plug-in and
provide one-step cross-platform integration. Once compatible with the Mac OS X
platform, Hydra is much more easily integrated than on the Windows platform. Use
of the Soundflower program for the Mac will enable Hydra to run as a background
process. Soundflower simply allows the user to route the audio output of one
program to the input of another program before outputting the audio to the hardware.
Use of this program would easily allow for a user's iTunes music library to be routed
to Hydra for multi-channel upconversion, one of the primary goals of the plug-in.
Beyond Hydra lies the potential for other plug-ins to be developed based on different
surround sound upconversion principles. Convolution is not the most practical in
terms of efficiency and therefore plug-ins developed using non-FFT based algorithms
could provide a more stable and enhanced surround sound experience.
By providing Hydra's interactivity to computer users, it is the hope of this author that
surround sound setups and usage in the home will grow and evolve, with the
expectation that professional sound designers will be encouraged to expand their
work into multichannel surround sound beyond the complacent 5.1 standard.


APPENDIX A
User Manual
Hydra v1.1
Stereo2Surround Conversion Suite
USER MANUAL


Contents
Welcome to Hydra v1.1
System Requirements
Installing/Uninstalling
Optimizing Windows
Configuring Your System
Hydra GUI Components and Features
    Screen 1 - Edit
    Screen 2 - Mix
    Screen 3 - LFE/Main
Interfacing with Plogue Bidule
Tips & Recommendations


Welcome to Hydra v1.1
Congratulations on your purchase of Hydra v1.1! This user manual will cover
everything you need to know to get started with the plug-in and create your own
stereo2surround conversions.
What is Hydra?
Hydra is a convolution-based VST plug-in that upconverts stereo audio content to a
maximum 7.1 surround sound layout.
Does Hydra do more than just upconvert?
Yes. Aside from performing an FFT conversion of the audio, Hydra functions as an
interactive digital audio workstation. This means you have the ability to manipulate
each channel of the converted audio. This includes level adjustments, EQ, peak
limiting, delay processing and more.
Why do I need Hydra?
Hydra offers you the ability to create your own mixes, giving you full control over
your music listening experience. If you are a professional sound designer, Hydra
provides you with the ability to create surround content for use in your projects.
Regardless of your level of expertise, Hydra gives you a way to fully utilize your
surround system with stereo audio content.
How does Hydra interface with my system?
Hydra is designed to work within any digital audio program that natively supports
VST plug-ins, including Nuendo and Cubase. In addition, open-architecture audio
programs, such as Plogue Bidule support the usage of Hydra. Plogue Bidule is the
recommended program for usage of this plug-in.


System Requirements
Dual 2GHz CPU (Quad 2.8GHz or higher recommended)
2GB of RAM (6 or more recommended)
Windows XP SP2 or higher, 32 or 64-bit (Windows 7 recommended)
Up to 8 discrete audio outputs
The number of discrete audio outputs from your system determines which surround
sound speaker layout to use. If using 6 outputs then up to 5.1 can be heard. If 8
outputs are available then a full 7.1 layout can be utilized.
Installing/Uninstalling
Make sure you are logged in as an administrator when installing and running this
program.
To Install:
Hydra comes in an archived .zip file format. Extract the one folder and one file from
the archive to somewhere on your computer hard drive. For simplicity, we
recommend placing the contents on your Desktop temporarily.
Once the folder has been extracted, move the entire folder to the root directory of the
C:\ hard drive. Future versions of this plug-in will be available as a self-extracting
executable. Relative paths will also be introduced in the next version of Hydra.
The file extracted from the archive, Hydra v1.1.dll, should be moved to your default
VST plug-in directory. Typically this is located at C:\Program
Files\Steinberg\Vstplugins. There is no need to place the file in a subfolder, as this
will be done automatically when your audio program first initializes the plug-in.
To Uninstall:
Simply delete the folder from your root directory and the file from your VST Plug-ins
directory. If an audio program has initialized the plug-in, the .dll file may also have a
subfolder of the same name in the VST Plug-ins directory. This can be deleted as
well. The Hydra plug-in does not result in any other system files being manipulated,
nor does it place extraneous temporary files on your hard drive.


Optimizing Windows
Hydra uses a great deal of CPU to perform the FFT convolutions. Therefore,
optimizing Windows is a highly recommended task should you not have an above
average computer system. Make sure you are logged on as an Administrator to
perform system changes.
Recommended Setting Changes:
Change the User Account Control (UAC) setting to "Never." This stops a
notification process from running.
Hard Drive Sleep Timer: Change the power settings of Windows to Never.
Never putting the hard drive to sleep will keep the audio transferring
smoothly.
Startup Items: Upon startup Windows loads a great many tasks, which
constantly run in the background. Disabling unnecessary system tasks is a
great way to free CPU and improve your overall system performance. Look
for items such as Task Scheduler, Event Scheduler, and Notification handling
processes.
Turn Off Unnecessary Software Tasks: Having software programs running in
the background can hinder the performance of your system. Shut down all
software programs that do not need to be running during operation of the
Hydra plug-in. This also includes software items that were automatically
loaded upon startup of Windows. Such items generally include Adobe
products and Anti-Virus programs.


Configuring Your System
Once you have optimized the performance of Windows, your audio system must be
configured to properly output surround sound from your audio program. This is
tailored specifically to your audio hardware. See your manufacturer's instructions for
interfacing your audio hardware with your computer.
For those using onboard audio outputs, a Y-cable should be used to split the stereo
audio signals and then routed to the appropriate speaker locations. For those using an
external audio hardware unit, a mixer or surround controller is necessary to route the
audio to your speakers.
The output of Hydra follows a SMPTE/ITU channel layout for 5.1 and the most
commonly accepted speaker layout for 7.1 audio output. The channel order from
Hydra is L R C LFE Ls Rs for a 5.1 layout. For 7.1, the output order is L R C LFE
Lss Rss Lsr Rsr. For those not using side speakers in the 7.1 speaker layout, the Lss
and Rss output channels correspond to the normal surround speaker positions at 110
degrees. The Lsr and Rsr channels are for the rear surround speakers. If using a 6.1
speaker layout the channel output order is L R C LFE Ls Rs Rc.
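The channel orders listed above can be captured in a small lookup table. The Python sketch below is illustrative only: the dictionary and the helper names (CHANNEL_ORDER, output_index, largest_layout) are hypothetical and do not come from the plug-in.

```python
# Output channel orders as described in the text above.
CHANNEL_ORDER = {
    "5.1": ["L", "R", "C", "LFE", "Ls", "Rs"],
    "6.1": ["L", "R", "C", "LFE", "Ls", "Rs", "Rc"],
    "7.1": ["L", "R", "C", "LFE", "Lss", "Rss", "Lsr", "Rsr"],
}


def output_index(layout, channel):
    """Return the zero-based hardware output index of a channel."""
    return CHANNEL_ORDER[layout].index(channel)


def largest_layout(num_outputs):
    """Pick the largest layout that fits the available discrete outputs."""
    fitting = [name for name, chans in CHANNEL_ORDER.items()
               if len(chans) <= num_outputs]
    if not fitting:
        return None  # fewer than six outputs: no surround layout fits
    return max(fitting, key=lambda name: len(CHANNEL_ORDER[name]))
```

With six outputs the helper selects 5.1 and with eight it selects 7.1, matching the System Requirements section.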


Hydra GUI Components and Features
Hydra offers you the ability to create your own surround sound upmixes using a
combination of different techniques. The plug-in is based on six surround sound
patterns typically used in the recording process. Impulse response files are used to
convolve an audio signal into what it would sound like in a particular room through
the surround sound microphone configuration pattern. The surround patterns
correspond directly to speaker locations once the audio has been convolved.
This section will discuss the surround sound patterns used in Hydra and how to
implement the features the plug-in offers in association with those patterns.
Hydra has two fundamental screens, with a third area that is static no matter which
screen you happen to be viewing at the time. The third area is located at the bottom
of the two primary screens, occupying about 25 percent of the space.
The three screens are labeled as Edit, Mix/Effects and LFE/Main.
Primary Screen One - Edit
The default screen of the plug-in is the Edit window. This screen gives you all the
components necessary to activate and adjust the individual surround sound patterns.
The Edit window consists of the Header, EQ, and Level Adjust sections.


Header Section - Double M/S
The first of the six surround sound patterns is the Double M/S pattern. This is the
most complex surround pattern used in Hydra as it is based on level differences
between the three microphones used in the impulse response recording process. This
pattern is capable of up to five audio channels.
1 - Power: This switch turns on the convolution processing for the Double M/S
pattern.
2 - -3dB: This switch engages a post-convolution level dip to compensate for any
possible digital clipping when summing the audio signals. This is likely not needed.
However, should the peak indicators light when the audio is convolved with no other
effects or level adjustments engaged, then the dip switch can be activated to alleviate
the issue.
3 - Flip: The Flip switch is designed to provide flexibility in channel placement. The
Double M/S pattern has two surround channels and no surround back channels. The
Flip switch allows you to send the left and right surround convolved signals to the left
and right surround back outputs. This function is either/or. The audio signals cannot
be sent to both the surround and surround back outputs at the same time.
4 - Pattern: The Pattern drop-down box selects which of the two surround patterns
will be displayed in the given area below the header. To conserve screen space the
Double M/S and Modified Decca Tree patterns were layered together. Both patterns
contain the same channel layout, with the Modified Decca Tree pattern capable of
two additional channels over Double M/S. Changing the Pattern selection does not
inactivate the non-shown surround pattern, nor does changing the selection result in
the loss of any settings made on the non-displayed surround pattern.
5 - Type: Exclusive to the Double M/S pattern is the Type drop-down. The Double
M/S pattern relies on matrix coefficients applied to each of the convolved signals,
thereby yielding left, right, center and surround channels. There is more than one
table of matrix coefficients for the pattern. Type I is the most common and thus
placed first. Type I(b) and I(c) are variations of the Type I matrix. Type II and Type
II(b) result in a Double M/S output that removes the center channel from the
equation.
6 - IR Preset: The IR Preset drop-down is where you will select which set of impulse
response recordings to use for convolution of the audio signal. There are two sets,
Straight and Toe. IR sets with the "Straight" designation indicate the speakers were
facing flush with the microphone arrangement, and likely will result in a wider stereo
image. The "Toe" designation indicates the IR set was recorded with the speakers
toed-in toward the center of the microphone configuration. This will result in a
narrower stereo image concentrated more toward the center channel.
7 - Info Button: The Info Button loads a document containing information on the IR
for the surround pattern chosen. The information in this document is related to the
recording process, containing IR setup images, speakers and microphones used as
well as distances. The distances information will factor in when calculating delays.
See the Mix Screen section on Delay Processing.
Header Section - Modified Decca Tree, IRT Cross, Hamasaki Square, OCT-2, Rear-
Facing Cardioids
The header section of the remaining five surround patterns is very similar to the
Double M/S section. There are only a couple differences as shown here.
Modified Decca Tree: This pattern is capable of up to seven audio channels, being
the only surround pattern capable of the full usage of seven-channel surround.
Because of this fact, this pattern does not include a Flip switch, as all channels are
full.
IRT Cross/Hamasaki Square: Like the Double M/S and Modified Decca Tree, the
IRT Cross and Hamasaki Square pattern are layered together given their channel
layout commonality. These patterns are capable of four audio channels: two front
channels and two surround channels. Again, the Flip switch will reroute the surround
channels to the rear surround channels.
OCT-2: This is one of two patterns in Hydra that is not layered. This pattern consists
of three audio channels: left, center and right. Like Modified Decca Tree, the
OCT-2 pattern does not contain a Flip switch, as there are no surround channels in the
pattern.
Rear-Facing Cardioids: This is the other non-layered pattern. Rear-Facing Cardioids
is capable of only two channels: left and right surround. The Flip switch can be
engaged here as well to send the audio to the rear surround outputs.
The one addition to all the surround patterns other than Double M/S is the Width
switch.
1 - Width: The Width switch changes the way in which the surround pattern convolves the
audio. When left in the off position, also called the Mix position, the audio is
convolved in a true stereo format. In this situation, the left speaker channel consists
of a convolved signal combining the left and right signals. Essentially, each channel
is comprised of two convolved signals based on the original IR recording process in
which the left microphone position recorded an impulse response from the left and
right speakers separately. In the Mix position, the audio is more center-focused as the
left and right speaker positions share more of a common signal.
When the switch is engaged, the plug-in convolves only a single signal per audio
channel. In this case, only the original left audio signal would be convolved and sent
to the left speaker, and the right to the right speaker. When a center channel is
involved in the surround pattern and the Width switch has been engaged, a multiplier
is used to maintain the level balance between the left/right and center speakers.
Using the Width switch will result in a wider surround sound field, and is generally
more conducive to music upmixing.
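The difference between the two Width positions can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: it uses direct-form convolution for readability rather than the FFT convolution Hydra performs, it assumes both inputs share one length and both impulse responses share one length, and all function and parameter names are hypothetical. The center-channel multiplier is omitted because its value is not given in the text.

```python
def convolve(x, h):
    """Direct-form linear convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def left_speaker_feed(left_in, right_in, ir_left_spk, ir_right_spk, wide):
    """Build the left speaker channel for one surround pattern.

    ir_left_spk / ir_right_spk: impulse responses captured at the left
    microphone position from the left and right speakers, respectively.
    wide=False (Mix): both inputs are convolved and summed (true stereo).
    wide=True: only the left input is convolved.
    """
    direct = convolve(left_in, ir_left_spk)
    if wide:
        return direct
    cross = convolve(right_in, ir_right_spk)
    return [d + c for d, c in zip(direct, cross)]


# Tiny made-up signals and impulse responses, for illustration only.
mix = left_speaker_feed([1.0, 0.0], [0.0, 1.0], [1.0, 0.5], [0.25, 0.0], wide=False)
wide = left_speaker_feed([1.0, 0.0], [0.0, 1.0], [1.0, 0.5], [0.25, 0.0], wide=True)
```

In the Mix result the right input leaks into the left feed through the cross impulse response, which is why the image is more center-focused; in the Wide result that contribution is absent.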


EQ Section - All Patterns
The EQ section of all six patterns is the same. Each channel strip in the Hydra plug-
in contains a Pre-Fade, 3-band tone EQ.
The EQ is engaged by clicking the EQ button at the top of each channel strip under
the Header Section.
Three knobs are stacked vertically, providing decibel control over the EQ bands, with
a range of 18dB. Below each knob is a display indicating the level change on the
EQ band to the tenth of a decibel.
The 3-band EQ consists of a high shelf, Peak EQ and low shelf filter. These are
designated next to the knobs as HF, MF and LF, respectively, from top to bottom.
The high shelf is set internally at 12kHz. The low shelf is set at 80Hz. The Peak EQ
has a broader Q than the shelf EQs and is centered at 2.5kHz.
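The manual does not state how the EQ bands are implemented inside Synthedit. As a stand-in, the Python sketch below builds the mid band as a standard biquad peaking filter in the widely published Audio EQ Cookbook form, centered at 2.5kHz. The Q of 0.7 and the 48 kHz sample rate are assumptions, as are the function names.

```python
import cmath
import math


def peaking_coeffs(gain_db, f0_hz=2500.0, q=0.7, fs_hz=48000.0):
    """Biquad peaking-EQ coefficients, normalized so that a[0] == 1."""
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0_hz / fs_hz
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha / a_lin
    b = [(1.0 + alpha * a_lin) / a0,
         -2.0 * math.cos(w0) / a0,
         (1.0 - alpha * a_lin) / a0]
    a = [1.0,
         -2.0 * math.cos(w0) / a0,
         (1.0 - alpha / a_lin) / a0]
    return b, a


def gain_db_at(b, a, f_hz, fs_hz=48000.0):
    """Magnitude response of the biquad at a single frequency, in dB."""
    z1 = cmath.exp(-1j * 2.0 * math.pi * f_hz / fs_hz)  # z^-1 on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return 20.0 * math.log10(abs(num / den))


# A +6 dB boost on the mid band; the response at the center frequency
# should match the knob setting.
b, a = peaking_coeffs(6.0)
boost_at_center = gain_db_at(b, a, 2500.0)
```

With the knob at 0 dB the numerator and denominator coefficients coincide, so the band passes audio unchanged.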


Level Adjustment Section - All Patterns
[Figure: Level Adjustment section, showing seven channel strips labeled L, R, C, Ls, Rs, Lsb and Rsb, each with a fader readout of 0.0 dB]
Under the EQ section, each channel strip is equipped with a level adjustment fader,
mute button, level meter and dB display, and a channel designation display.
Immediately after the EQ there is a Mute button. This is perhaps a bit misleading as
the Mute function is pre-EQ in order to conserve CPU should the EQ be engaged at
the same time as the Mute function.
Each channel strip is equipped with a level adjustment fader. Each fader has a -60 to
+20dB range.
Also on each channel strip is a segmented level meter. This meter has display ranges
of -60 to -12dB, -12 to -3dB, -3 to -0.001dB. The final segment is a red peak
indicator, which once activated remains lit until clicked upon to reset. The level
meters are Post-EQ, Post-Mute and Post-Fade.
Below each fader and meter is a decibel readout indicating the position of the fader.
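The relationship between the fader readout in decibels and the linear gain applied to the audio, along with the meter ranges listed above, can be sketched as follows. The function names are hypothetical, and the handling of values that fall exactly on a segment boundary is an assumption, since the manual does not specify it.

```python
FADER_MIN_DB = -60.0
FADER_MAX_DB = 20.0


def db_to_gain(db):
    """Convert a level in decibels to a linear amplitude multiplier."""
    return 10.0 ** (db / 20.0)


def clamp_fader(db):
    """Limit a requested fader value to the -60 to +20dB range."""
    return max(FADER_MIN_DB, min(FADER_MAX_DB, db))


def meter_segment(peak_db):
    """Classify a peak level into the meter ranges described above."""
    if peak_db < -60.0:
        return "off"
    if peak_db < -12.0:
        return "-60 to -12dB"
    if peak_db < -3.0:
        return "-12 to -3dB"
    if peak_db <= -0.001:
        return "-3 to -0.001dB"
    return "peak"  # red indicator; stays lit until reset in the plug-in
```

For example, a fader at +20dB multiplies the signal amplitude by 10, while the -3dB switch in the header section corresponds to a multiplier of roughly 0.71.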
The final display on each channel strip is the channel designation. L (Left), R (Right),
C (Center), Ls (Left Surround), Rs (Right Surround), Lsb (Left Surround Back), Rsb
(Right Surround Back).
Note: In the case of Double M/S and Rear-Facing Cardioids, the activation of the
Flip switch will result in the audio signal shifting from the Ls/Rs channel strips to the
Lsb/Rsb channel strips. This allows you to maintain individual EQ and level
adjustment settings for the surround and surround back channels should you find the
need to move back and forth.
Note: The IRT Cross/Hamasaki Square patterns are only equipped with four tracks
despite the ability to flip the surround channels. In this case the outputs occupy the
same channel strips [Ls(b), Rs(b)] and cannot maintain separate EQ/Level settings.