Citation
The accuracy and consistency of spectrographic analysis for voice identification

Material Information

Title:
The accuracy and consistency of spectrographic analysis for voice identification
Creator:
Smith, Jeffrey Matthew
Publication Date:
Language:
English
Physical Description:
[vii], 37 leaves ; 28 cm

Thesis/Dissertation Information

Degree:
Master's (Master of Science)
Degree Grantor:
University of Colorado Denver
Degree Divisions:
Department of Music and Entertainment Industry Studies, CU Denver
Degree Disciplines:
Recording Arts
Committee Chair:
Pritts, Roy A.
Committee Members:
Walker, Gregory T. S.
Sanders, Richard W.

Subjects

Subjects / Keywords:
Voiceprints ( lcsh )
Forensic audiology ( lcsh )
Spectrum analysis ( lcsh )
Forensic audiology ( fast )
Spectrum analysis ( fast )
Voiceprints ( fast )
Genre:
bibliography ( marcgt )
theses ( marcgt )
non-fiction ( marcgt )

Notes

Bibliography:
Includes bibliographical references (leaves 35-37).
General Note:
College of Arts and Media
Statement of Responsibility:
by Jeffrey Matthew Smith.

Record Information

Source Institution:
University of Colorado Denver
Holding Location:
Auraria Library
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
123005752 ( OCLC )
ocn123005752
Classification:
LD1193.A70 2006m S54 ( lcc )

Full Text
THE ACCURACY AND CONSISTENCY OF SPECTROGRAPHIC
ANALYSIS FOR VOICE IDENTIFICATION
by
Jeffrey Matthew Smith
B.M., University of Colorado at Boulder, 2003
A thesis submitted to the
University of Colorado at Denver
in partial fulfillment
of the requirements for the degree of
Master of Science
Recording Arts
2006


This thesis for the Master of Science
degree by
Jeffrey Matthew Smith
has been approved
by
Roy A. Pritts
Date
Richard W. Sanders


Smith, Jeffrey Matthew (M.S., Recording Arts)
THE ACCURACY AND CONSISTENCY OF SPECTROGRAPHIC
ANALYSIS FOR VOICE IDENTIFICATION
Thesis directed by College of Arts and Media Professor
and Department Chair Richard W. Sanders
ABSTRACT
This test investigated the accuracy and consistency of voice identification
comparisons made by 5 trained examiners over a three-week period.
These individuals were all students of the University of Colorado at
Denver and had taken a semester-long course in Audio Forensics with
limited training in voice identification. Each week, examiners conducted 8
closed-trial comparisons of 4 clue-phrases from both male and female
speakers. In simulating a closed set spectrographic line-up, each
comparison consisted of spectrograms from a pool of 4 known speakers
and one unknown speaker; audio recordings of the known and unknown
speakers were made 9 months apart. From the pool of known speakers,
the examiner made a positive identification match to the unknown. After
the three-week period, data revealed that examiners reached the same
conclusion in all three examinations for only 50% of the comparisons. The


average accuracy of these examinations was 65%. This paper discusses
the outcome of the experiment including interpretation of these and other
results.
This abstract accurately represents the content of the candidate's thesis. I
recommend its publication.
Signed


ACKNOWLEDGEMENTS
This thesis was funded in part by a Graduate Education Grant from the
Audio Engineering Society. Thanks to Roy Pritts for his help in making
that happen.
Thanks to my thesis advisor, Rich Sanders, for his guidance in this and
many other projects; without his support and tutelage this could not have
happened.


CONTENTS
Figures.............................................................viii
Tables................................................................ix
Chapter
1. Introduction and Background.......................................1
1.1 Method.............................................................1
1.2 History...........................................................3
2. Experiment in Voice Identification..................................7
2.1 Preparation........................................................7
2.2 Design and Experimental Procedure.................................9
2.3 Examiners........................................................10
3. Results...........................................................11
4. Interpretation of Data.............................................14
5. Conclusion.........................................................22
5.1 Qualifications of Voice Spectrogram Examiner......................22
5.2 Accuracy and Consistency.........................................23
5.3 Use of an Aural Aid..............................................23
5.4 Proposed Changes to Test Model...................................25


Appendix
A. Raw Data..................................................27
References...................................................35


FIGURES
Figure
2.1 Example of a digital spectrogram..............................8
4.1 Average consistency and accuracy of examiners.................15
4.2 Average consistency per phrase................................16
4.3 Average accuracy per phrase...................................16
4.4 Voice spectrogram of phrase "Foot met boot then let be"......18
4.5 Voice spectrogram of phrase "Vote for the thin bird".........19
4.6 Voice spectrogram of female saying "Vote for the thin bird"..20
4.7 Voice spectrogram of male saying "Vote for the thin bird"....21
5.1 Venn diagram comparison of spectrographic and aural analyses..25


TABLES
Table
2.1 Speakers and Phrases...........................................9
3.1 Each examiner's consistency versus their accuracy.............11
3.2 Average consistency versus accuracy derived from Table 3.1....12
3.3 Average accuracy of examiners from examination to examination.... 12
3.4 Number of identical conclusions among examinations............12
3.5 Examiners' average consistency and accuracy in each phrase....13
3.6 Examiners' average consistency and accuracy by sex of sample..13


1. INTRODUCTION AND BACKGROUND
Positive identification of persons is extremely important when considering
the potential for law enforcement and the apprehension of criminals.
Fingerprinting, photography, and DNA testing are some of the techniques
used to identify a perpetrator, but often the only evidence available is
an audio recording. These recordings can take the form of a phone
conversation, a hidden microphone worn on the body, a surveillance
recording, or the audio track of a security camera. When this evidence is
in an investigator's possession, great care must be taken to preserve the
recording itself and ensure its safekeeping so that its contents remain
reliable. Also, care must be taken to properly interpret the evidence for its
use as a tool to indicate identity. The Aural/Spectrographic Method of
voice identification is the technique employed by an audio forensics expert
to interpret this recording.
1.1 Method
The Aural/Spectrographic Method is the combination of an aural
comparison and a spectrographic comparison between an evidential
recording and a known recording of a suspect. It is the voice identification
expert's responsibility to make these comparisons objectively and
methodically to ensure accurate and unbiased results. The need for
extremely accurate identification is of the utmost importance because the
1


last thing anyone wants to happen is the acquittal of a criminal or the
imprisonment of an innocent person.
A spectrographic comparison consists of prepared voice spectrograms
from sample recordings. These spectrograms can be derived from and
prepared in either the analog or digital domain. The result is a graphic
representation of the voice: a "voiceprint" showing the unique
characteristics in one's speech pattern and physiology. This readout
shows time along an x-axis, frequency of speech along a y-axis, and the
relative intensity of the speech signal in the darkness of the graph. See
Figure 2.1 for an example. When a speech signal is analyzed in this way,
the spectrogram will show clusters of frequency resonances, or
concentrations of energy, in one's speech. These concentrations are
referred to as formants, and they represent the frequency characteristics in
speech that are needed to communicate vowel sounds. When analyzed,
formants can show a person's fundamental speaking frequency and the
apparent partials of the utterance (referred to as f0, f1, f2, etc.,
respectively). Note in Figure 2.1 that the frequency and trajectory of
formants determine the type of vowel sound made.
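The link between formant frequencies and vowel identity can be illustrated with a toy nearest-neighbor lookup. This is a hypothetical Python sketch; the F1/F2 reference values are rough textbook averages for adult male speech (after Peterson and Barney, 1952), not data from this thesis.

```python
# Hypothetical sketch: identifying a vowel from its first two formants
# (F1, F2) as read off a spectrogram. Reference values are rough averages
# for adult male speech; they are illustrative only.
REFERENCE_FORMANTS = {
    "i as in 'beet'": (270, 2290),
    "ae as in 'bat'": (660, 1720),
    "a as in 'father'": (730, 1090),
    "u as in 'boot'": (300, 870),
}

def classify_vowel(f1, f2):
    """Return the reference vowel whose (F1, F2) pair is nearest."""
    def dist(vowel):
        r1, r2 = REFERENCE_FORMANTS[vowel]
        return (f1 - r1) ** 2 + (f2 - r2) ** 2
    return min(REFERENCE_FORMANTS, key=dist)

print(classify_vowel(280, 2250))   # a high front vowel, near "i as in 'beet'"
```

Real examiners of course read far more than two formant values from a spectrogram, but the sketch captures the core idea: vowel identity is carried by where the formant energy sits in frequency.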
Voice identification employing these means relies on the improbability
"that two speakers would have vocal cavity dimensions and articulator use-
patterns identical enough to confound voiceprint identification methods."
(Kersta, 1962, pg 1255) Therefore, the uniqueness of a person's speech is
yielded from varying sizes of vocal cavity resonators (throat, nasal, and
two oral cavities formed by the setting of the tongue) as well as the unique
behavioral usage of the speech articulators (lips, teeth, tongue, soft
palate, and jaw muscles). These elements directly correspond to an
1


individual's unique spectrographic read-out. Graphic bandwidths, mean
frequencies, and trajectory of vowel formants; vertical striations,
distribution of formant energy and nasal resonances; stops, plosives and
fricatives; interformant features; and any peculiar acoustic patterning are
clues found in one's voice spectrogram that hold the secret to their
identity. (McDermott et al. 1996) However, one major claim made by
opponents of spectrographically aided voice identification is that, unlike the
invariant fingerprint, two voiceprints from the same individual are never
identical.
Nonetheless, the voice spectrogram can be very powerful when combined
with findings from the aural comparison of material. The examiner can
reliably reach an outcome of positive identification, probable identification,
possible identification, positive elimination, probable elimination, possible
elimination, or no decision.¹ This expert opinion is then submitted to the
court as evidence, and the voice identification practitioner can be used as
an expert witness providing testimony of their findings. The
Aural/Spectrographic Method has aided countless court cases and has
been established as a reliable method of interpreting recorded evidence.²
¹ These are the 7 conclusions prescribed by the American Board of Recorded Evidence
(1996). It is recommended by other authorities (McDermott et al. 1996) to reach one of 5
conclusions: positive identification, probable identification, positive elimination,
probable elimination, or no decision.
² While the Aural/Spectrographic Method is an established form of voice identification, its
use as evidence is contested often, and separate court systems have defined its
admissibility in different ways. (McDermott et al. 1996)
2


1.2 History
The term "voiceprint" was first published in 1944 by Gray and Kopp when
discussing their development and research of the sound spectrograph at
Bell Labs. Although other applications were prognosticated for this
device (such as an aid in teaching the deaf to speak and establishing a
telephone system for the deaf), the groundwork was laid for what has
become a tool utilized in many disciplines, one that changed the way
we study the human voice, from its use in pathological research to its aid in
criminal investigation. The future holds many applications for it as well;
among studies currently conducted on the voice spectrogram is identifying
ways it can be exploited in an automated real-time identity verification
system.³
When likening the voice spectrogram to a fingerprint and identifying its
potential for speaker identification, intentions for the device became
clearer. Research at Bell Labs intensified during WWII as possibilities for
added national security became apparent. However, the war ended
before the voice spectrogram's full potential was realized, and research on
the subject all but stopped. It wasn't until the early 1960s, when the New
York City Police Department asked Bell Labs for help with cases
involving recorded phone calls, that voice identification development once
again picked up.
Lawrence Kersta of Bell Labs headed this research and set out to develop
a method by which the spectrographic voiceprint of an individual could
³ The National Center for Voice and Speech in Denver, CO currently has research
studying digitally interpolated voice spectrograms for parameters to be used in this way.
3


reliably be assessed and matched to an unknown speaker. In 1962,
Kersta published the findings of this research and a test of his method that
showed error rates of approximately 1%. Voice researchers were
astounded and experimental activity peaked. Several experiments
followed with varying and somewhat disparate results. Young and
Campbell (1967) conducted a follow-up experiment under similar
conditions and reported an error rate of 21.6%. Stevens et al. (1968)
followed this with their own assessment of Kersta's method, achieving an
error rate ranging from 18% to 50% depending on the utterance.
Concerns were expressed among the scientific community, and all the
while Kersta provided expert testimony and training to law enforcement
using his voiceprint method.
With these inconsistent results, the only thing experts agreed on was that
more research and testing was necessary before the discussion would be
put to rest. In 1972, Tosi et al. published their findings from a two-year
experiment named the Michigan State University Voice Identification
Project. The goal of this program was not only to verify Kersta's claim
and method but to test it under other models, including the addition of
variables related to forensic tasks. This goal was borne out of a need for
experiments to parallel real-life application and hold pertinence to actual
forensic investigations. Another important aspect of this experiment was
that it was funded by the United States Department of Justice and enacted
under a contract with the Michigan Department of State Police. Trained
examiners were experienced and closely monitored, and the conditions
were controlled and consistent. In experiments modeled after Kersta's, a
less than 1% error rate was observed. Furthermore, under more
complicated forensic models, error rates were still very low: 6% false
4


identifications and 13% false eliminations. Tosi predicted that the
accuracy would increase if examiners were acting under real conditions.
Despite these validating results, experiments followed where findings once
again showed inaccuracy. In a test conducted by Barry Hazen (1973),
results led to the conclusion that "the value of spectrograms for speaker
identification purposes is limited to use as an investigative aid..." (Hazen
1973, pg 650)
As one can see, comparisons of tests from this period reveal disparate
results in two categories: conclusions that voiceprint analysis is effective
and accurate, and those whose findings reveal the method as unreliable.
The variables that separate tests like Tosi's from those with less accurate
results lie not only in the duration of the experiment but in the quality,
depth, and duration of training for examiners.
A Federal Bureau of Investigation survey of FBI-employed voice
identification examiners (Koenig 1986) helped bridge the gap between
supporters and opponents. In two thousand comparisons over a 15-year
period, these examiners registered only two false eliminations and one
false identification, an error rate of less than 1%. This survey is important
for two reasons. Firstly, it concluded that voice identification made by
properly trained individuals utilizing a full range of procedures produces
very accurate results. Secondly, the qualifications of examiners involved in
this survey set a precedent for what is now considered proper training: a
formal course of study lasting two to four weeks, at least two years of
study completing 100 voice comparison cases under advisement of a
5


recognized expert, and an examination by a board of experts in the field of
spectrographic voice identification analysis.⁴
Furthermore, these FBI examiners, in utilizing a full range of voice
identification techniques, used aural comparisons in conjunction with
spectrographic voiceprints to assess their comparisons. In reviewing
experiments like this FBI survey and the others discussed in this paper, it
is noted that where a complementary aural comparison was not involved,
accuracy results were lower. This element of aural comparison in
conjunction with carefully produced and inspected spectrographic read-
outs constitutes the proper means of voice identification and the
culmination of the Aural/Spectrographic Method.
Presently, with the method of spectrographically aided voice identification
more refined and more widely accepted, experiments like Hazen's that
linger from the forensic spectrogram's past should not stand up to more
contemporary results like the FBI survey, where a standard procedure and
proper training are implemented.
⁴ Although Bruce Koenig gathered and published this data validating the
Aural/Spectrographic Method, he has since changed his stance on its worthiness, saying it
will "...not produce conclusive results, but meaningful findings are possible with careful
analysis of speech samples collected under forensic conditions." (Koenig 2003)
6


2. EXPERIMENT IN VOICE IDENTIFICATION
If a voice identification practitioner were asked to blindly reevaluate a
comparison they had made at some earlier date, would their assessment
bear the same conclusion? The test presented in this paper was designed
to answer this question: to quantify the consistency of spectrographically
aided voice identification and how it relates to its inherent accuracy.
This hypothesis was inspired by a discussion of examiner bias in a paper
by Poza and Begault (2005), in which the writers defined bias and applied
it to the Aural/Spectrographic Method. In recognizing that examiner bias
exists and cannot be extinguished, they went on to offer methodology
designed to reduce it. Poza and Begault's paper prompts interesting
thoughts regarding the value of not just the method but also the examiner.
Though the test presented in the present paper does not necessarily
quantify bias, it follows Poza and Begault's line of thinking by attempting to
shed light on the practitioner of the method and not just the method itself.
2.1 Preparation
Unknown speaker samples used for this experiment were part of a
subject database previously recorded by the author of this paper for
another publication. (Smith et al. 2005) Known samples were recorded
in an identical setting 9 months later. Speakers were made up of 4
females and 4 males, all recorded through a high-quality, high-bandwidth
7


transmission line in an isolated environment free of noise. Digital
spectrograms were produced using Multispeech Model 3700 software by
Kay Elemetrics Corp. (http://www.kayelemetrics.com/). These were
contemporary spectrograms showing a person's speech graphically with
time along the x-axis, frequency along the y-axis, and amplitude
represented as darkness. See Figure 2.1 for an example.
To ensure consistency between spectrograms, standard analysis settings
were used. These settings were as follows:
Filter Order = 36
Pre-Emphasis Factor = 1.00
Window Weighting = Blackman
Analysis Method = Autocorrelation
FFT Analysis Window Size = 256 points or 252.34 Hz
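The settings above can be approximated in open tooling. Below is a rough sketch, not the Multispeech software itself, of producing a digital spectrogram in Python with scipy, mirroring two of the listed settings: a Blackman window and a 256-point analysis window. The sample rate and test tone are assumptions for illustration only.

```python
# Sketch of a digital spectrogram computation (assumed parameters, not
# the thesis's exact Multispeech configuration).
import numpy as np
from scipy.signal import spectrogram

fs = 8000                                # assumed sample rate in Hz
t = np.arange(fs) / fs                   # one second of samples
signal = np.sin(2 * np.pi * 440 * t)     # stand-in for a speech recording

# Frequency bins (f), time frames (frames), and power (Sxx): time runs
# along the x-axis, frequency along the y-axis, and Sxx supplies the
# intensity that a printed spectrogram renders as darkness.
f, frames, Sxx = spectrogram(signal, fs=fs, window="blackman",
                             nperseg=256, noverlap=128)

peak = f[np.argmax(Sxx.mean(axis=1))]    # dominant frequency in the graph
print(peak)                              # near 440 Hz for this test tone
```

Plotting `Sxx` on a log scale against `frames` and `f` would reproduce the familiar time-versus-frequency readout of Figure 2.1.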
Figure 2.1 Example of a digital spectrogram:
the diphthong "oy" from the word "Boys".
8


Once digital spectrograms were created, high quality prints were made
and laminated for multiple uses.
2.2 Design and Experimental Procedure
Examiners engaged in 3 identical examinations over the course of 3
weeks. Each examination consisted of 8 closed-trial comparisons in a
closed set line-up (Poza and Begault 2005, pg 26) where examiners could
freely browse through the 4 known speakers' spectrograms to compare
to the unknown spectrogram until a conclusion was reached; there
existed one positive match to the unknown. There were four clue
phrases that represented a variety of sounds and were chosen for their
wealth of vowels and diphthongs, elements essential for effective voice
identification. These phrases can be seen in Table 2.1.
Comparison Sex Phrase
1 Male She hates my azure key
2 Female She hates my azure key
3 Male Few boys fear slow newts
4 Female Few boys fear slow newts
5 Male Vote for the thin bird
6 Female Vote for the thin bird
7 Male Foot met boot then let be
8 Female Foot met boot then let be
Table 2.1 The comparisons made by examiners by speaker and phrase.
9


Examiners were charged with coming to a conclusion of positive
identification and were made aware that a positive match did exist. They
were also allowed as much time as necessary though no single
comparison exceeded 15 minutes. Results were tabulated after each
examination and examiners were not given information regarding their
performance or the performance of their peers. One week later, the same
8 comparisons were administered to the same examiners, and again the
following week. It should be noted here that the examiners were not
allowed an aural inspection of material.
2.3 Examiners
The examiners were graduate-level students at the University of Colorado
at Denver and Health Sciences Center, and all had a very similar
background with regard to their experience with voice spectrograms. They
had all concluded a semester-long study of audio forensics in which they
gained a fundamental knowledge of vocal spectrograms and were
familiarized with the Aural/Spectrographic Method. Also in this course,
they had engaged in one voice identification examination. Lastly, they are
all published authors in the field of voice identification research.⁵
However, they did not have training equal to the prescribed qualifications
of an expert in voice identification analysis (see pg 6).
⁵ To preserve the subjects' anonymity, these publications will not be cited.
10


3. RESULTS
All values in the tables below are derived from the performance of examiners
shown in Appendix A: Raw Data. Examiners involved in this experiment
will be identified as E1-E5. Here, "consistency" refers to how often an
examiner produced identical conclusions for a comparison in all three
examinations.
The first two tables of results show the overall performance of the
examiners. In Table 3.1 we see each examiner's percentage of consistency
against their percentage of accuracy, as averaged from all three examinations.
Table 3.2 then shows averages of these results.
Examiner Consistency Accuracy
E1 38% 59%
E2 63% 84%
E3 38% 67%
E4 50% 63%
E5 63% 54%
Table 3.1 Each examiner's consistency versus their accuracy.
Consistency represents how often the examiner made the
same positive identification in all three examinations.
11


Average Consistency Average Accuracy
50% 65%
Table 3.2 Averages derived from Table 3.1.
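The way these two measures are computed can be sketched in a few lines of Python. The conclusions below are hypothetical stand-ins, not the raw data of Appendix A; only the Table 3.1 column values at the end are taken from this paper.

```python
# Sketch of the consistency and accuracy calculations behind
# Tables 3.1 and 3.2, using hypothetical conclusions.
from statistics import mean

# conclusions[comparison] = speaker chosen in each of the 3 examinations
conclusions = {1: ["A", "A", "B"], 2: ["C", "C", "C"]}
answer_key = {1: "A", 2: "C"}            # the true match per comparison

def consistency(trials):
    """Fraction of comparisons answered identically in all 3 examinations."""
    return mean(len(set(runs)) == 1 for runs in trials.values())

def accuracy(trials, key):
    """Fraction of all individual conclusions that were correct."""
    calls = [(comp, c) for comp, runs in trials.items() for c in runs]
    return mean(c == key[comp] for comp, c in calls)

print(consistency(conclusions))                      # 0.5
print(round(accuracy(conclusions, answer_key), 2))   # 0.83

# Table 3.2 is then the mean of the per-examiner columns in Table 3.1:
assert round(mean([38, 63, 38, 50, 63])) == 50   # average consistency
assert round(mean([59, 84, 67, 63, 54])) == 65   # average accuracy
```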
The next two tables give information regarding the examiners'
performance from examination to examination. Table 3.3 shows the
average accuracy of examiners in each individual examination. Table
3.4 tallies the number of positive identifications that were consistent
between the 1st and 2nd examinations and the 1st and 3rd examinations.
Examination Ave. Accuracy
1st 70%
2nd 63%
3rd 63%
Table 3.3 Average accuracy of examiners from
examination to examination.
Examination Consistent Comparisons
1st to 2nd 28
1st to 3rd 25
Table 3.4 Number of identical conclusions in comparisons
between the 1st and 2nd examinations and
between the 1st and 3rd examinations.
The next two tables explore results regarding the examiners' performance
related to both clue phrases and differing sexes of comparison samples.
Table 3.5 shows the average consistency of examiners per phrase. This
12


table also shows the average accuracy of examiners per phrase. Table
3.6 shows the examiners' average consistency and accuracy in
comparisons of male speakers versus female speakers.
Sentence Ave. Consistency Ave. Accuracy
She hates my azure key 40% 80%
Few boys fear slow newts 50% 57%
Foot met boot then let be 30% 43%
Vote for the thin bird 80% 80%
Table 3.5 A comparison of the examiners' average consistency and
accuracy with each individual phrase.
Sex Ave. Consistency Ave. Accuracy
Male 55% 67%
Female 45% 63%
Table 3.6 A comparison of the examiners' average consistency and
accuracy separated by sex of comparison samples.
13


4. INTERPRETATION OF DATA
To interpret these data, they should be considered from several angles.
Most importantly, the ability of the examiners needs to be taken into
consideration. As stated earlier, the examiners all had a similar
background with regard to spectrographically aided voice identification,
which left them short of meeting the prescribed qualifications of a voice
identification expert (see pg 6). This is the first factor to consider when
comparing these accuracy results to others in the past. Also,
comparisons were made without the aid of an aural comparison. Upon
reviewing the decades of tests made in this field, it is clear that when either
of these two elements is compromised (and more so when, as in this
experiment, both are), the accuracy of results will diminish. Therefore, the low
average accuracy of examiners shown in Table 3.2 is expected. But how
do this accuracy and the implied inability of the examiners relate to their
consistency? This chapter will refer to and interpret tables from Chapter 3
to answer this question.
Figure 4.1 shows that in 4 out of 5 examiners, consistency increases as
accuracy increases. This would lead one to believe, though not
definitively, that better qualified examiners would also produce more
consistent results. This proposition is further supported when Tables 3.3
and 3.4 are taken into consideration. We see that as accuracy declines in
successive examinations, so, too, does the number of consistent answers.
14


Figure 4.1 Average consistency and accuracy of examiners.
In Table 3.5, it stands out that the phrase "Foot met boot then let be" was
more problematic for examiners than the other phrases, especially when
compared to "Vote for the thin bird". This can be seen more clearly in
Figures 4.2 and 4.3, where the consistency and accuracy for each
particular phrase are shown graphically.
15


Figure 4.2 Average consistency per phrase as spoken by both males
and females.
Figure 4.3 Average accuracy per phrase as spoken by both males
and females.
The dip in both consistency and accuracy indicates the difficulty examiners
had in making spectrographic comparisons of this phrase. Note that E2
was the only examiner who did not have difficulty in comparisons of the
phrase "Foot met boot then let be"; this person is also the examiner with
the highest degree of accuracy and consistency in Table 3.1. The
difficulty of this phrase has implications for the necessity of clarity and
substance in words chosen for spectrographic comparison. As noted
16


earlier, a wealth of vowels and diphthongs in a recorded phrase is
necessary for effective spectrographic voice identification. Compare
spectrograms from "Foot met boot then let be" and "Vote for the thin bird"
in Figures 4.4 and 4.5 on pages 18 and 19. Note differences between a
successful phrase and an unsuccessful phrase; specifically, the length of
vowel sounds and useful formant trajectories.
Lastly, Table 3.6 compares results in consistency and accuracy between
male comparison samples and female comparison samples. It is
reassuring to note that performance results between these two groups are
very similar. However, both accuracy and consistency results for female
comparison samples are slightly lower. This gives an indication that
female spectrograms are harder to analyze than those of males, if only
slightly. Any number of factors could contribute to this, but I would like to
offer that the possible increase in difficulty is related to the higher
fundamental speaking frequency (on average) of females. Based on the
physiological properties of speech, a woman's smaller build leads to
thinner and less defined spectrographic striations. These striations are
defined as the vertical lines in a voice spectrogram that relate to openings
and closings of the vocal folds. The resulting clarity of the spectrogram is
blurred and more difficult to interpret. See Figures 4.6 and 4.7 on pages
20 and 21 for a comparison between male and female spectrograms.
17


Figure 4.4 Female voice spectrogram of phrase "Foot met boot then let be".


Figure 4.5 Female voice spectrogram of phrase "Vote for the thin bird".


Figure 4.6 Voice spectrogram of female saying "Vote for the thin bird".


Figure 4.7 Voice spectrogram of male saying "Vote for the thin bird".


5. CONCLUSION
In reviewing the data collected from this experiment, it is clear that results
and, thus, conclusions were hindered by the lack of qualification in
examiners. However, there are important implications gleaned from this
test.
5.1 Qualifications of Voice Spectrogram Examiner
These results show that an unqualified examiner will produce neither
consistent nor accurate results. This is very important in the area of law
enforcement where the misinterpretation of spectrographic data could lead
to the erroneous acquittal of a perpetrator, or worse, conviction of an
innocent person. Therefore, persons engaged in voice identification for
legal purposes should be rigorously trained and, by all indications, the
examiners involved in this study were not. As put forth in previous
experiments, it is best to rely on voice identification comparisons made by
an individual meeting qualifications outlined in the FBI survey (Koenig
1986).
Similar tests on the accuracy of voice identification can be compared to
the one presented in this paper to support this claim. In a test involving
spectrographic comparison without the use of an aural aid, Tosi et al.
(1972) produced accuracy results of approximately 80%. In another test
without the use of an aural aid, Reich et al. (1976) came to results of
22


56.67%. The accuracy of results in the experiment presented in this
paper reached 65%. Why did Tosi's experiment yield such better
accuracy? The answer is simple: examiners in his test were more
rigorously screened and trained in spectrographic voice identification.
5.2 Accuracy and Consistency
Another important implication drawn from this study is the relationship
between consistency and accuracy. In a majority of examiners, it was
seen that as accuracy improved, so did consistency. The flip side of this
relationship is seen in E5, where we see a higher degree of consistency
than accuracy, meaning this examiner consistently provided inaccurate
conclusions. This examiner's performance should be seen as a fluke and
not contradictory to implications drawn from the majority of examiners.
Accepting this claim would lead to the assumption that a qualified
examiner would not only provide accurate results from one comparison to
another, but produce the same consistent identifications if asked to repeat
these comparisons.
5.3 Use of an Aural Aid
It has been concluded in previous studies of spectrographic voice
comparison that results will suffer if an aural aid is not in use. For
example, in the aforementioned experiment conducted by Tosi et al.
(1972), results were not as accurate as the results in other contemporary
experiments. Tosi's conclusion was that "...if, in addition to visual
23


comparisons of spectrograms, the examiners had been allowed to listen to
the unknown and known voices, these errors might have been further
reduced." (Tosi et al. 1972, pg 2041) Results seen in the present paper
confirm this conclusion.
As a secondary test, the examiners in this study were asked at the end of
the third session to repeat all of the same comparisons from the previous
examinations using only the audio samples, without spectrographic
comparison. The accuracy results were higher than with the
spectrographically aided comparisons: 75% compared to 65%.
In addition to the increased accuracy of the aural-only comparison,
comparing results between these assessments further attests to the
importance of an aural aid in spectrographic analysis. Figure 5.1 is a
Venn diagram comparing the average accuracy of all three sets of
spectrographic analyses to the accuracy of the aural examination.6 In
Figure 5.1, X marks inside the left circle represent correct conclusions
through spectrographic analysis, and X marks inside the right circle
represent correct conclusions through aural analysis. X marks in the
overlap of the two circles represent conclusions that were correct in both
analyses. Lastly, X marks outside the circles represent conclusions that
were incorrect in both types of analysis.
When the data are viewed in this way, it can be concluded that had the
examiners been given an aural comparison to accompany their
spectrographic analyses, they could potentially have reached an error rate
of less than 8%. This is much better than the 35% actually yielded in this
experiment, though not yet as strong as the less-than-1% error rates seen
in other successful studies employing the full range of forensic
techniques (Kersta 1962, Tosi et al. 1972). Although these data were not
presented in the results section, they are included here as a matter of
interest.
6 The average accuracy of each spectrographic comparison was determined as follows: if
the examiner concluded correctly in 2 or 3 of the 3 assessments for a phrase, it was
counted as a correct conclusion through spectrographic analysis; if the examiner
concluded incorrectly in 2 or 3 of the 3 assessments, it was counted as an incorrect
conclusion. This majority-vote average accuracy could then be compared in proportion to
the accuracy of the aural comparison.
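The majority-vote averaging described in the footnote, and the "outside both circles" reading of the Venn diagram, amount to a simple computation. The sketch below is mine, not the thesis procedure; the four-phrase flag lists at the end are hypothetical values chosen only to show the mechanics, while the `majority_correct` example row is taken from Examiner 1's table in Appendix A.

```python
def majority_correct(conclusions, correct):
    # A phrase counts as spectrographically correct when the examiner
    # named the right speaker in at least 2 of the 3 assessments.
    return sum(c == correct for c in conclusions) >= 2

def combined_error_rate(spec_correct, aural_correct):
    # A conclusion lies outside both Venn circles only when neither the
    # majority spectrographic result nor the aural result was correct.
    misses = sum(not s and not a
                 for s, a in zip(spec_correct, aural_correct))
    return misses / len(spec_correct)

# Examiner 1, "(F) She Hates...": concluded KS, KS, KM; correct match KS.
print(majority_correct(["KS", "KS", "KM"], "KS"))  # True (2 of 3 correct)

# Hypothetical correctness flags for four phrases (True = correct):
spec  = [True, False, True, False]
aural = [True, True, False, False]
print(combined_error_rate(spec, aural))  # 0.25: only the last phrase missed both
```

Only phrases missed by both methods contribute to the combined error rate, which is why the union of the two circles can push the error below what either method achieves alone.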
Figure 5.1 Venn diagram comparing the averaged accuracy of the
spectrographic analyses to the aural comparison.
5.4 Proposed Changes to Test Model
While the experiment presented in this paper is useful for interested
persons and to the benefit of the authors education, here are ways it
could have been made better. Firstly, qualified examiners would have
25


shaped the results better. Though it would be nearly impossible to put
together a group of examiners with qualifications akin to FBI forensic
scientists, using unqualified examiners limited the scope of results and
conclusions yielded from this data. Secondly, orienting examiners to a
standard procedure to limit their technique would provide results based on
the consistency of the group and not just the performance of an individual.
Lastly, the experiment would be further enhanced with the incorporation of
an aural aid. This way, the Aural/Spectrographic Method would be more
fully tested for accuracy and consistency.
APPENDIX
Appendix A: Raw Data
In the tables below, each examiner's conclusions are shown for all three
spectrographic assessments and the one aural assessment. In the first
column, the letter in parentheses indicates the sex of the speaker. In the
second and third columns, correct matches and conclusions are
represented by the initials of the speaker. A "Y" in the fourth column
means that the examiner came to a correct conclusion; an "N" means they
did not. The last column notes the examiner's conclusion from the one
and only aural examination as "Y" or "N" as well.
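The table layout just described maps directly onto a simple data structure. As an illustrative sketch (the dictionary transcribes Examiner 1's table below; the scoring code and names are mine, not part of the thesis):

```python
# Each entry: sentence -> (correct match, [assessments 1-3], aural Y/N).
# Data transcribed from Examiner 1's table.
examiner1 = {
    "(F) She Hates...": ("KS", ["KS", "KS", "KM"], "N"),
    "(F) Few Boys...":  ("ES", ["ES", "ES", "ES"], "N"),
    "(F) Foot Met...":  ("JS", ["ES", "KS", "JS"], "Y"),
    "(F) Vote For...":  ("KM", ["ES", "KM", "ES"], "Y"),
    "(M) She Hates...": ("LL", ["LL", "LL", "LL"], "Y"),
    "(M) Few Boys...":  ("JH", ["LL", "NZ", "LL"], "N"),
    "(M) Foot Met...":  ("NZ", ["LL", "LL", "NZ"], "Y"),
    "(M) Vote For...":  ("PD", ["PD", "PD", "PD"], "Y"),
}

# Spectrographic score: 8 sentences x 3 assessments = 24 conclusions.
spec_hits = sum(c == correct
                for correct, concs, _ in examiner1.values()
                for c in concs)
# Aural score: one Y/N conclusion per sentence.
aural_hits = sum(aural == "Y" for _, _, aural in examiner1.values())

print(f"spectrographic: {spec_hits}/24")  # 14/24
print(f"aural:          {aural_hits}/8")  # 5/8
```

The same structure applies to each of the five examiners' tables, so per-examiner accuracies can be tallied mechanically from the raw data.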
Examiner 1

(Sex) Sentence     Correct Match  Assess. 1  Assess. 2  Assess. 3  Audio
(F) She Hates...   KS             KS (Y)     KS (Y)     KM (N)     N
(F) Few Boys...    ES             ES (Y)     ES (Y)     ES (Y)     N
(F) Foot Met...    JS             ES (N)     KS (N)     JS (Y)     Y
(F) Vote For...    KM             ES (N)     KM (Y)     ES (N)     Y
(M) She Hates...   LL             LL (Y)     LL (Y)     LL (Y)     Y
(M) Few Boys...    JH             LL (N)     NZ (N)     LL (N)     N
(M) Foot Met...    NZ             LL (N)     LL (N)     NZ (Y)     Y
(M) Vote For...    PD             PD (Y)     PD (Y)     PD (Y)     Y
Examiner 2

(Sex) Sentence     Correct Match  Assess. 1  Assess. 2  Assess. 3  Audio
(F) She Hates...   KS             KS (Y)     KS (Y)     ES (N)     Y
(F) Few Boys...    ES             ES (Y)     KM (N)     KM (N)     Y
(F) Foot Met...    JS             JS (Y)     JS (Y)     JS (Y)     Y
(F) Vote For...    KM             KM (Y)     KM (Y)     KM (Y)     Y
(M) She Hates...   LL             LL (Y)     LL (Y)     LL (Y)     Y
(M) Few Boys...    JH             JH (Y)     JH (Y)     NZ (N)     N
(M) Foot Met...    NZ             NZ (Y)     NZ (Y)     NZ (Y)     Y
(M) Vote For...    PD             PD (Y)     PD (Y)     PD (Y)     N
Examiner 3

(Sex) Sentence     Correct Match  Assess. 1  Assess. 2  Assess. 3  Audio
(F) She Hates...   KS             KS (Y)     KS (Y)     ES (N)     Y
(F) Few Boys...    ES             ES (Y)     ES (Y)     ES (Y)     Y
(F) Foot Met...    JS             JS (Y)     KM (N)     JS (Y)     Y
(F) Vote For...    KM             KM (Y)     ES (N)     KM (Y)     Y
(M) She Hates...   LL             LL (Y)     LL (Y)     LL (Y)     Y
(M) Few Boys...    JH             NZ (N)     NZ (N)     PD (N)     N
(M) Foot Met...    NZ             PD (N)     LL (N)     NZ (Y)     Y
(M) Vote For...    PD             PD (Y)     PD (Y)     PD (Y)     Y
Examiner 4

(Sex) Sentence     Correct Match  Assess. 1  Assess. 2  Assess. 3  Audio
(F) She Hates...   KS             KS (Y)     KS (Y)     ES (N)     N
(F) Few Boys...    ES             ES (Y)     ES (Y)     ES (Y)     Y
(F) Foot Met...    JS             ES (N)     KM (N)     KM (N)     Y
(F) Vote For...    KM             KM (Y)     KM (Y)     KM (Y)     Y
(M) She Hates...   LL             LL (Y)     NZ (N)     LL (Y)     Y
(M) Few Boys...    JH             NZ (N)     NZ (N)     NZ (N)     N
(M) Foot Met...    NZ             NZ (Y)     NZ (Y)     LL (N)     Y
(M) Vote For...    PD             PD (Y)     PD (Y)     PD (Y)     Y
Examiner 5

(Sex) Sentence     Correct Match  Assess. 1  Assess. 2  Assess. 3  Audio
(F) She Hates...   KS             KS (Y)     ES (N)     KS (Y)     Y
(F) Few Boys...    ES             ES (Y)     ES (Y)     ES (Y)     Y
(F) Foot Met...    JS             KM (N)     KM (N)     KM (N)     Y
(F) Vote For...    KM             ES (N)     ES (N)     ES (N)     Y
(M) She Hates...   LL             LL (Y)     LL (Y)     LL (Y)     N
(M) Few Boys...    JH             NZ (N)     JH (Y)     JH (Y)     N
(M) Foot Met...    NZ             NZ (Y)     PD (N)     LL (N)     Y
(M) Vote For...    PD             PD (Y)     PD (Y)     PD (Y)     Y
REFERENCES
1. Gray, C.H.G.; Kopp, George A. Voiceprint Identification. Report
presented to the Bell Telephone Laboratory, Inc. (1944) 1-14.
2. Potter, Ralph K.; Kopp, George A.; Kopp, Harriet G. Visible
Speech. New York: Dover Publications, 1966. (Reprint from 1947
edition.)
3. Kersta, Lawrence G. Voiceprint Identification. Nature 196 (1962)
1253-57.
4. Young, Martin A.; Campbell, Richard A. Effects of Context on
Talker Identification. Journal of the Acoustical Society of America
42 (1967) 1250-54.
5. Stevens, Kenneth N.; Williams C.E.; Carbonell, J.R.; Woods,
Barbara. Speaker Authentication and Identification: A Comparison
of Spectrographic and Auditory Presentations of Speech Material.
Journal of the Acoustical Society of America 44 (1968) 1596-1607.
6. Bolt, Richard H.; Cooper, Franklin S.; David, Edward E. Jr.; Denes,
Peter B.; Pickett, James M.; Stevens, Kenneth N. Speaker
Identification by Speech Spectrograms: A Scientist's View of its
Reliability for Legal Purposes. Journal of the Acoustical Society of
America 47 (1970) 597-612.
7. Tosi, Oscar; Oyer, Herbert; Lashbrook, William; Pedrey, Charles;
Nicol, Julie; Nash, Ernest. Experiment on Voice Identification.
Journal of the Acoustical Society of America 51 (1972) 2030-43.
8. Hazen, Barry. Effects of Differing Phonetic Contexts on
Spectrographic Speaker Identification. Journal of the Acoustical
Society of America 54 (1973) 650-60.
9. Hollien, Harry. Peculiar Case of Voiceprints. Journal of the
Acoustical Society of America 56 (1974) 210-13.
10. Reich, Alan R.; Moll, Kenneth L.; Curtis, James F. Effects of
Selected Vocal Disguises upon Spectrographic Speaker
Identification. Journal of the Acoustical Society of America 60
(1976) 919-25.
11. Reich, Alan R.; Duke, James E. Effects of Selected Vocal
Disguises upon Speaker Identification by Listening. Journal of the
Acoustical Society of America 66 (1979) 1023-28.
12. Koenig, Bruce E. Spectrographic Voice Identification: A Forensic
Survey. Journal of the Acoustical Society of America 79 (1986)
2088-91.
13. McDermott, Michael C.; Owen, Tom; McDermott, Frank M. Voice
Identification: The Aural/Spectrographic Method. Owl
Investigations, Inc. 1996.
<.../aural_spectrographic/fulltext.html>
14. American Board of Recorded Evidence- Voice Comparison
Standards. From a meeting of the American Board of Recorded
Evidence of the American College of Forensic Examiners. San
Diego, CA. December, 1996. Available on the World Wide Web at
<.../voice_id/standards.html>
15. Koenig, Bruce E.; Lacey, Douglas S.; Herold, Noel. Equipping the
Modern Audio-Video Forensic Laboratory. Forensic Science
Communications 5 (2003).
16. Smith, Jeffrey M.; Fanberg, Bradd; Wright, Rebecca.
Spectrographic Analysis of Vocal Alterations. Audio Forensics in
the Digital Age: The Proceedings of the AES 26th International
Conference. New York: Audio Engineering Society, 2005.
17. Poza, Fausto; Begault, Durand R. Voice Identification and
Elimination Using Aural-Spectrographic Protocols. Audio Forensics
in the Digital Age: The Proceedings of the AES 26th International
Conference. New York: Audio Engineering Society, 2005.