Citation
A method for performance verification of freeware HASH utilities using MATLAB

Material Information

Title:
A method for performance verification of freeware HASH utilities using MATLAB
Creator:
Jacobson, Sean Robert ( author )
Place of Publication:
Denver, CO
Publisher:
University of Colorado Denver
Publication Date:
Language:
English
Physical Description:
1 electronic file. : ;

Subjects

Subjects / Keywords:
Hashing (Computer science) ( lcsh )
Computer security ( lcsh )
Computer security ( fast )
Hashing (Computer science) ( fast )
Genre:
bibliography ( marcgt )
theses ( marcgt )
non-fiction ( marcgt )

Notes

Abstract:
HASH values are the result of a cryptographic HASH function that can be used for security or file authentication. Due to the rise of file exchange on the internet this value has become even more important when working with digital evidence. In recent years freeware programs have become available online for both commercial and personal use to check for this value. However, there has not yet been a set way to verify these programs and to check that the results they are giving is in fact accurate. This thesis proposes a series of tests that allows for a user or agency to check the program and see if it is indeed accurate.
System Details:
System requirements: Adobe Reader.
Thesis:
Recording arts
General Note:
College of Arts and Media
Statement of Responsibility:
by Sean Robert Jacobson.

Record Information

Source Institution:
|University of Colorado Denver
Holding Location:
|Auraria Library
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
879676780 ( OCLC )
ocn879676780

Downloads

This item has the following downloads:


Full Text
A METHOD FOR PERFORMANCE VERIFICATION OF FREEWARE HASH
UTITLIES USING MATLAB
by
SEAN ROBERT JACOBSON
B.S., University of Colorado Denver, 2011
A thesis submitted to the
Faculty of the Graduate School of the
University of Colorado in partial fulfillment
of the requirements for the degree of
Master of Science
Recording Arts
2013


This thesis for the Master of Science degree by
Sean Robert Jacobson
has been approved for the
Recording Arts Program
by
Catalin Grigoras, Chair
Jeff Smith
Leslie M Gaston Bird
Date: 7/11/2013


Jacobson, Sean Robert, M.S., Recording Arts
A Method for Performance Verification of Freeware HASH Utilities Using MATLAB
Thesis directed by Associate Professor Catalin Grigoras
ABSTRACT
HASH values are the result of a cryptographic HASH function that can be used for security
or file authentication. Due to the rise of file exchange on the internet this value has become
even more important when working with digital evidence. In recent years freeware
programs have become available online for both commercial and personal use to check for
this value. However, there has not yet been a set way to verify these programs and to check
that the results they are giving is in fact accurate. This thesis proposes a series of tests that
allows for a user or agency to check the program and see if it is indeed accurate.
The form and content of this abstract are approved. I recommend its publication.
Approved: Catalin Grigroas
m


ACKNOWLEDEMENTS
Firstly I want to say thank you to my mom and dad, Teri and David Jacobson. You put me
through school and kept encouraging me. This thesis wouldnt be possible without your
support. Secondly to my teachers Jeff Smith and Catalin Grigoras. For giving me great
advice on where to find information, putting up with frustrations, and being there when I
needed help, thanks. Leslie Gaston Bird a long time teacher for being on my committee
and reviewing my multiple drafts. And to my co-workers at the Bookworm, for putting up
with me during the final process and my stressful moments. Thanks so much for
everything!
IV


TABLE OF CONTENTS
CHAPTER
I. INTRODUCTION.........................................1
What are HASH Values.................................1
Common HASH Algorithms...............................4
Problems with HASH Values............................6
NIST and NIJ work with HASH..........................8
II. PROPOSED FRAMEWORK AND EXPERIMENTAL PROCESS.........10
III. CONCLUSION..........................................25
BIBLIOGRAPHY.............................................26
APPENDIX.................................................27
v


LIST OF TABLES
Table
1: Excel Document MD5 Results................................................15
2: Excel Document SHA-1 Results..............................................15
3: JPEG Image File MD5 Results...............................................16
4: JPEG Image File SHA-l Results.............................................16
5: PDF File MD5 Results......................................................17
6: PDF File SHA-1 Results....................................................17
7: Text File MD5 Results.....................................................18
8: Text File SHA-1 Results...................................................18
9: Word Document MD5 Results.................................................19
10: Word Document SHA-1 Results..............................................19
11: WAVE File MD5 Results....................................................20
12: WAVE File SHA-1 Results..................................................20
13: MP3 File MD5 Results.....................................................21
14: MP3 File SHA-1 Results...................................................21
15: WMA File MD5 Results.....................................................22
16: WMA File SHA-1 Results...................................................22
17: AVI Video File MD5 Results...............................................23
18: AVI Video File SHA-1 Results.............................................23
19: QuickTime Video File MD5 Results.........................................24
20: QuickTime Video File SHA-1 Results.......................................24
vi


LIST OF FIGURES
Figure
1-1 The Original File and its HASH..........................................2
1-2 The Copy of the Orignal File and its HASH...............................3
1-3 The Original File and its HASH...........................................3
1-4 The Copied File Altered..................................................4
1-5 WHIRLPOOL Algorithm Diagram..............................................6
1-6 The Letter of Recommendation............................................7
1- 7 The Altered Document...................................................7
2- 1 Excel Spreadsheet Example..............................................12
2-2 All HASH Values Match...................................................12
2-3 All HASH Values Match, columns..........................................13
2-4 HASH Value Differences..................................................13
2-5 HASH Value Differences, column..........................................14
vii


CHAPTERI
INTRODUCTION
In the emerging field of media forensics digital files are an important part of evidence
analysis. When working with a file such as video or an image file it is important for a
forensic examiner to know that the file that they are working with is the file that was
originally collected at the crime scene. In order to verify evidence integrity upon seizure
and throughout processing, HASH values should be calculated and compared.
In todays society the internet has opened up avenues to free exchange of information. This
has also caused some problems with knowing whether or not the file received is the original
file or if there have been alterations made to it. The HASH value is a tool used to determine
whether or not the files contents is the same or if it is different. There are many different
kinds of programs that allow for a person to check the HASH value, some of them have
been authenticated by the forensic community and some are freeware programs. In this
paper eight different freeware programs are compared to two forensically sound programs
to determine whether or not they function the same.
What are HASH Values?
A HASH value is a product of cryptographic HASH function used to authenticate a digital
file [1], The HASH function will take information from the digital file and run through an
algorithm to generate the value. In digital media each digital file should have a unique
HASH value. No other digital file should have the same HASH value unless it is a clone.
[2]
1


A HASH value should be unique to the file as it currently is. If a person were to make
alterations to the file the value would also be changed. In digital forensics the HASH value
has come to be more important with the rise in file exchange sites and download utilities
on the internet. Below is an example of this using a program called WinHex to check for
the HASH value. In Figure 1-1 and Figure 1-2 an example of this is given. A text file called
Test File was created and placed in WinHex to get its HASH value. Figure 1-1 is the
original file and Figure 1-2 is its copy.
WinHex [Test File.txt]
ggj File Edit Search Navigation View Tools Specialist Options Window Help
& y m is1 & o % m % s? m ?;E. & m - -a 4= 1 ji % o p
2


WinHex [Test File Copy.txt]
lygg File Edit Search Navigation View Tools Specialist Options Window Help
D tfHifiS .o ife ffl Ote & M -< -B <$= n
File Edit
Test File.txt Test File Copy.txt
Off set
0
9 10 11 12 13
00000000 54 68 69 73 20 69 73 20 61 20 74 65 73 74
00000016 69 6C 65
Figure 1-2: The Copy of the Orignal File and its HASH.
In Figure 1-3 and Figure 1-4 one of the files contents have been changed. This has caused
the HASH value of that file to have also changed.
Test File.txt TestFile-Copy.txt
Of f set
0 1
8 9 10 11 12 13 14 15
00000000 54 68 69 73 20 69 73 20 61 20 74 65 73 74 20 66 This is a test f
00000016 69 6C 65 ile
Figure 1-3: The Original File and its HASH.
3


Test File.txt Test File -Copy.txt
Of fset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
00000000 54 68 69 73 20 69 73 20 61 6E 20 61 6C 74 65 72 This is an alter
00000016 65 64 20 74 65 73 74 20 66 69 6C 65 ed test file
MD5 (128 bit)
..for Test File Copy.txt:
59988623B1E D 84340CE 24D CB E 4A84F64|
Qose
Figure 1-4: The Copied File Altered.
When working with digital information it is important for an examiner to know that the file
they are working with is the original. A HASH value, when checked, gives the examiner
the assurance that this is indeed a bit-for-bit copy and accurately reflects the original
evidence. The tables in Chapter two are the results of testing for errors in multiple HASH
checker programs.
Common HASH Algorithms
There are many different kinds of HASH algorithms used today. Some of the most common
have been CRC32, Adler32, MD5, SHA1, and WHIRLPOOL.
The cyclic redundancy check (CRC) was first invented in 1961 by W. Wesley Peterson, an
American mathematician and computer scientist. The CRC operates by working on
multiple blocks of data at one time. However, the CRC is not a cryptographic function. It
is a linear function. This means that there is a series of steps that the function needs to go
through in order to find the errors. The following is an example of a CRC algorithm [3]:
F(X) = Xn-kG(X) + R(X) = Q(X)P(X),
4


The Adler32 algorithm was developed by Mark Adler in 1995. Here the algorithm
calculates two 16-bit checksums and then links them together to form a 32-bit result. The
problem with this is that if the data to be analyzed (called a message) is too small than
errors will occur [4],
The Message-Digest-Algorithm 5 (MD5) is a cryptographic function designed by Professor
Ronal Rivest in 1991 as a stronger version, more secure version of MD4. The algorithm
breaks down the message into 512-bits and processes it through the following function [5]:
Hi+1 0 The Secure HASH Algorithm (SHA-1) is another cryptographic function developed by the
National Institute of Science and Technology (NIST) as a processing standard. The
algorithm has been widely used in government and industry security checks. The algorithm
also processes messages in 512-bit blocks. The following is an example of the SHA-1
algorithm [6]:
mi = (mi-3 mi-8 mt-u m^ie) << L
WHIRLPOOL is one of the first freeware algorithms in the market today. Designed by
Paulo S. L. M. Barreto, a cryptographer from Brazil, it is a 512-bit hashing function that
works with messages less than 2256 in length. Figure 1-5 is a diagram of the algorithm [7]:
5


in-
H
i-l
W

Figure 1-5: WHIRLPOOL Algorithm Diagram.
Source: http://www. larc. usp. br/~pbarreto/WhirlpoolPage.html
Problems with HASH Values
Some of the most commonly used HASH values are the MD5 and SHA-1. However, it is
known that neither HASH algorithm is infallible. In 1993 collisions were found that caused
some concern with the MD5s algorithm [8], Collision means that two different documents
when put through the HASHS algorithm will come up with the same value. Thus new
algorithms have been created to improve upon security. Newer values like SHA-2,
WHIRLPOOL and Tiger-192 have not caused problems just yet in testing.
A famous example of HASH collision is The Story of Alice and her Boss, created by
Magnus Daum and Stefan Lucks as a way of illustrating how two files can have the same
HASH value. Alice is an intern with Caesar and brings a letter of recommendation for
Caesar to sign [9], The HASH value is shown in Figure 1-6.
6


Ietter_af_rec.ps
12 13 14 15
2E 30 0D 0A KIFS-Adobe-l.0
78 3A 20 30 XFJBoundingBox: 0
20 20 20 20 0 612 792
20 0D 0A 28 (
E5 9C Cl A7 eB|jbM aQ|_izra|A§
B1 FB 7F 08 /£ | F~^A T >u
13 A3 78 52 Em3 uSa[i I £xR
52 84 79 EB !Z3t6| I nEZRlye
OF 0B D9 CA /Vi loWav : id.- tl£
D4 36 IE 20 psYE2o} ,^06
86 D8 6B C8 yi;1io}e£6 Sl0kE
94 B5 03 90 oloNa | 11' IoIjj
5F F8 E5 9C )(eB|jM a0|_Bal
54 3E B1 FB A§/£- 1 F~U T>u
AO 83 13 A3 Ein3 uSa[i 1 £
45 5A 52 84 xR!Z3l6l 1 nEZRl
PIT A A np np rra /Xi. 1 ^T.Tatr 'Hi
Of fset
8 9 10 11 12
0000 0 Jinn
71____cn 99 7F| ,11___SJ___&£_____99 cc on o
0000C MD5 (128 bit)
OOOOC
OOOOC
0000C
OOOOC
OOOOC
OOOOC
OOOOC
OOOOC
OOOOC
..for letter_of_rec.ps:
A25F7F0B 29E E 0B 3968C8S0738533A4B 9|
Close
UUUULA^^^ .j
00000192 ' 29 28 E8 42 A6 6A DE 4D 00 E0 D5 89
00000208 Cl A7 2F CA B7 97 0A 46 7E AA CO 03
00000224 7F 08 45 6D 33 05 01 FC 53 E0 5B EF
00000240 78 52 ED 5A 33 CE 36 99 0D 9C 07 6E
nnnnn^ct 7 Q PP ?P pn np QC PC C 7 PC 7 £ 7 A p r
Figure 1-6: The Letter of Recommendation.
In the scenario Alice also wants to have a security clearance and sets up an algorithm so
that two documents will have the same MD5. The altered document for security is shown
below in Figure 1-7.
letter_of_nec.ps
onder.ps
Offset ,01234567 8 9 10 11 12 13 14 15
31 2E 30 0D 0A
6F 78 3A 20 30
20 20 20 20 20
20 20 0D 0A 28
F8 E5 9C Cl A7
3E B1 FB 7F 08
83 13 23 79 52
DA 52 84 79 EB
AA OF 0B D9 CA
76 D4 36 IE 20
53 86 58 6B C8
EF 94 B5 03 90
89 5F F8 E5 9C
00000208 Cl A7 2F CA B7 97 0A 46 7E AA CO 03 54 3E B1 FB
00000224 7F 08 45 6D 33 05 01 FC 53 E0 5B EF A0 83 13 A3
nnnnn74n 7fi 97 FPi qi T9 nr 96 qq nn qr n? 6F 49 94 97 R4
MD5 (128 bit)

or
0(
0!
Of
Of
Of
Of
Of
Of
Of
Of
Ofruuui/b 1 54' Ft If 7C BS BU WO c/
00000192 29 28 E8 42 A6 6A DE 4D 00 E0 D5
...for order.ps:
A25F7F0B29E E 0B3968C8G0733533A4B 8
Close
% I PSAdobe1.0
%5iBoundingBox: 0
0 612 792
(
eB|jbM aO|_sa|AS
/£ F~Sl T>u |=|
Em3 iiSa[i I #yR
iZ3l6l | nEfjRlye
lo¥av:id^ tf£
jjsYH2o} 1 £v06
yi;Iio}e£6 SlXkE
oloNa | I I Ii Ijj
)(eB|jM! aO|_sa|
A§/£-I F~ai T>u
Em 3 iiSa[i I £
vnS71ff,l V.K7PI
Figure 1-7: The Altered Document.
How could this happen? Two files are not supposed to have the same HASH value right?
7


In theory a HASH algorithm is supposed to be collision proof. Two cryptographers, Bert
von Boer and Antoon Bosselaers, published a paper in 1994 explaining an algorithm that
could find collisions within the MD5 algorithm. In the paper they talk about how the search
algorithm can search for collisions in the MD5 algorithm by going through four different
rounds [9], After its publication a group of scientists took it a step further.
Xiaoyun Wang and Hongbo Yu are both scientists at Shandong University China. In 2005
they published a paper about the different ways to create collisions and break HASH
functions such as MD5. Their results showed how easily the algorithm could be broken by
putting a file through different test rounds and generating the result [5], Wang and Yu went
on to publish another paper about collision attacks against SHA-1 with Yiqun Lisa Yin
later in 2005 [6],
NIST and NIJ work with HASH
There has been significant work done with HASH values aside from testing for collisions.
The National Institute of Justice (Nil) began work with the National Software Reference
Library (NSRL) to create a program that would help a computer forensic examiner [10],
When working on a case with digital files a computer forensic specialist must determine
what files are the most important for analysis. The new Reference Data Set (RDS) contains
software profiles that can help an examiner find these files. In this dataset a file is given a
profile and a HASH value unique to the dataset allowing for faster results. This is an
ongoing project and is continually updated [11],
Because of the need for security, there is a need for collision proof algorithms. The
National Institute of Science and Technology (NIST) had a competition in 2007 for the
8


generation of the SHA algorithm, SHA-3. Sixty-four submissions were received for the
first round and were narrowed down to five finalists with the help of public opinion. The
hope was that after the selection the algorithm would be able to be collision-proof for at
least twenty years. The finalists were placed through a series of tests that would test for
strength and compatibility. The algorithms not only had to be able to handle large
messages, but shorter ones as well [12], In 2012 the algorithm Keccak, designed by Guido
Bertoni et al., was chosen as the new SHA-3 algorithm.
When using a HASH algorithm through a software program it is important for the user to
know that they are getting accurate results. There is some good forensically sound software,
like WinHex, that is used by law enforcement when analyzing cases. Unfortunately, this
software is often limited to only law enforcement use. Freeware software allows for
personal and professional use. This paper proposes a way to test some of these programs
for validity.
9


CHAPTER II
PROPOSED FRAMEWORK AND EXPERIMENTAL PROCESS
In the forensic sciences, it necessary to employ techniques and tools that are known to
generate repeatable and reproducible results. This means that the testing procedure must
be able to generate the same results when done by another person. During testing a series
is run more than once in order to make sure that accuracy is maintained without bias to
allow for reliability and consistency.
The following is a proposal when testing different freeware HASH programs for validity
and accuracy.
In this series of tests ten different kinds of file types were used. The files were chosen based
on commonality and what can be easily found in a personal computer. For each file type
one hundred files were created. The files were numbered from one to one hundred.
Example: BookOOl.xlsx, this is the first file for the Excel Documents. The file types made
were:
Excel Documents
Word Documents
JPEG Image Files
Notepad Text File
PDF Document
MP3 Audio File
WAVE Audio Fie
WMA File
10


AVI Video File
QuickTime Video File
Once each file was created ten different kinds of HASH checker utilities were used to check
both the MD5 and SHA-1 HASH values. Eight of the programs were freeware programs
that can be found and downloaded from the internet and two programs were forensically
sound programs commonly used by forensic examiners. The reason for this was to provide
the ground truth when comparing HASH values to each other. These values will be called
root values. The programs used were:
FTK, forensically sound program
WinHex, forensically sound program
Advanced Hash Calculator (AHC)
Arpoon
Febooti Hash-CRC
HashTab
HashGenerator
MD5-SHA1 Hash Utility
IgorWare Hasher
SFVNinja
11


Each of the different file types were put into a program one at a time and then run three
times to for later comparison. The value was copied and placed into an Excel spreadsheet.
Three columns were labeled Test 1, Test 2, and Test 3 for each of the different runs.
1 File Name
2 BookOOl
3 Book002
4 8ook003
5 Book004
6 Book005
7 Book006
8 Book007
9 8ook008
10 Book009
44 n i.m n
_________________B________________
_______________Testl______________
a2363b2057eeb8f5f33a2150ee380fl4
3c5fb96eed88d02aa6c800351a5fd32e
0294a1546a21f54368f58b0e 25153925
61dc6135505a045c3f496e93546b9dbl
8e60cea5d634646bbd69794275084dl3
Ifa78edadab9c018ae42ade44e3f6246
281dlac905cll6d497eeblae0e6872f7
ef49e68d7flda885dcdfcf6837931867
2921c6466bb35e4005b0dfaf9bl7861a
I Test 2
a2363b2057eeb8f5f33a2150ee380fl4
3c5fb96eed88d02aa6c800351a5fd32e
0294a1546a21f54368f58b0e25153925
61dc6135505a045c3f496e93546b9dbl
8e60cea5d634646bbd69794275084dl3
Ifa78edadab9c018ae42ade44e3f6246
281dlac905cll6d497eeblae0e6872f7
ef49e68d7flda886dcdfcf6837931867
2921c6466bb35e4005b0dfaf9bl7861a
Figure 2-1: Excel Spreadsheet Example.
________________D_________________
Test 3
a2363b2057eeb8f5f33a2150ee380fl4
3c5fb96eed88d02aa6c800351a5fd32e
0294a1546a21f54368f58b0e25153925
61dc6135505a045c3f496e93546b9dbl
8e60cea5d634646b bd69794275084d 13
Ifa78edadab9c018ae42ade44e3f6246
281dlac905cll6d497eeblae0e6872f7
ef49e68d7flda886dcdfcf6837931867
2921c6466bb35e4005bOdf a f9b17861a
After all the HASH values were gathered and placed into Excel MATLab was used to
compare each of the value to ensure that they were the same. A script created by Catalin
Grigoras was used for this. The full script can be found in the Appendix.
In MATLab each freeware value was compared to the two root values, first FTK then
WinHex. The Excel documents were loaded in MATLab and the script was run. First the
three columns were compared to see if the values matched. If they did a 1 would appear in
the row and a message All the HASH values match would appear.
EDU SeanThesisOl
All the ell HASH values match.
All the cl2 HASH values match.
All the cl3 HASH values match.
All the c21 HASH values match.
All the c22 HASH values match.
All the c23 HASH values match.
All the HASH values match.
/* EDU
Figure 2-2: All HASH Values Match.
12


cl
111
111
111
111
111
111
111
111
111
111
111
111
111
Figure 2-3: All HASH Values Match, columns.
If the values did not match, a 0 would appear in the work and a message Check the HASH
differences would appear.
EDU SeanThesisQl
All the ell HASH values match.
All the cl2 HASH values match.
All the cl3 HASH values match.
All the c21 HASH values match.
All the c22 HASH values match.|
All the c23 HASH values match.
Check the HASH differences.
fx EDU |
Figure 2-4: HASH Value Differences.
13


cl
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Figure 2-5: HASH Value Differences, column.
If a difference was detected the file would be run again through that HASH program and
then run through MATLab again. If a value of 0 appeared again then the file was run
through the program once more as well as MATLab. If the value was again 0 then it would
be assumed that there was something wrong within the freeware HASH checker and it
would fail the test.
Once all three columns had matching values the freeware file values were compared to the
root value. If the values all matched a value of 1 was given and a message All the HASH
values match would appear. If the values did not match a value of 0 was given and a
message Check the HASH differences would appear. If a difference was detected the file
would be run again through that HASH program and then run through MATLab again. If
a value of 0 appeared again then the file was run through the program once more as well
as MATLab. If the value was again 0 then it would be assumed that there was something
14


wrong within the freeware HASH checker. Results were recorded in an Excel spreadsheet.
A green X was used to indicate that all files matched and a red O was used to indicate
where files did not match despite being tested three different times.
Table 1: Excel Document MD5 Results
Checker Name FTK WinHex
FTK X X
WinHex X X
AHC X X
Arpoon X X
Febooti Flash-CRC X X
FlashTab X X
FlashGenerator X X
MD5 SHA1 X X
IgorWare Flasher X X
SFVNinja X X
Table 2: Excel Document SHA-1 Results
Checker Name FTK WinHex
FTK X X
WinHex X X
AHC X X
Arpoon X X
Febooti Hash-CRC X X
HashTab X X
HashGenerator X X
MD5 SHA1 X X
IgorWare Hasher X X
SFVNinja X X
15


Table 3: JPEG Image File MD5 Results
Checker Name FTK WinHex
FTK X X
WinHex X X
AHC X X
Arpoon X X
Febooti Flash-CRC X X
FlashTab X X
FlashGenerator X X
MD5 SHA1 X X
IgorWare Flasher X X
SFVNinja X X
Table 4: JPEG Image File SHA-1 Results
Checker Name FTK WinHex
FTK X X
WinHex X X
AHC X X
Arpoon X X
Febooti Hash-CRC X X
HashTab X X
HashGenerator X X
MD5 SHA1 X X
IgorWare Hasher X X
SFVNinja X X
16


Table 5: PDF File MD5 Results
Checker Name FTK WinHex
FTK X X
WinHex X X
AHC X X
Arpoon X X
Febooti Flash-CRC X X
FlashTab X X
FlashGenerator X X
MD5 SHA1 X X
IgorWare Flasher X X
SFVNinja X X
Table 6: PDF File SHA-1 Results
Checker Name FTK WinHex
FTK X X
WinHex X X
AHC X X
Arpoon X X
Febooti Hash-CRC X X
HashTab X X
HashGenerator X X
MD5 SHA1 X X
IgorWare Hasher X X
SFVNinja X X
17


Table 7: Text File MD5 Results
Checker Name FTK WinHex
FTK X X
WinHex X X
AHC X X
Arpoon X X
Febooti Flash-CRC X X
FlashTab X X
FlashGenerator X X
MD5 SHA1 X X
IgorWare Flasher X X
SFVNinja X X
Table 8: Text File SHA-1 Results
Checker Name FTK WinHex
FTK X X
WinHex X X
AHC X X
Arpoon X X
Febooti Hash-CRC X X
HashTab X X
HashGenerator X X
MD5 SHA1 X X
IgorWare Hasher X X
SFVNinja X X
18


Table 9: Word Document MD5 Results
Checker Name FTK WinHex
FTK X X
WinHex X X
AHC X X
Arpoon X X
Febooti Flash-CRC X X
FlashTab X X
FlashGenerator X X
MD5 SHA1 X X
IgorWare Flasher X X
SFVNinja X X
Table 10: Word Document SHA-1 Results
Checker Name FTK WinHex
FTK X X
WinHex X X
AHC X X
Arpoon X X
Febooti Hash-CRC X X
HashTab X X
HashGenerator X X
MD5 SHA1 X X
IgorWare Hasher X X
SFVNinja X X
19


Table 11: WAVE File MD5 Results
Checker Name FTK WinHex
FTK X X
WinHex X X
AHC X X
Arpoon X X
Febooti Flash-CRC X X
FlashTab X X
FlashGenerator X X
MD5 SHA1 X X
IgorWare Flasher X X
SFVNinja X X
Table 12: WAVE File SHA-1 Results
Checker Name FTK WinHex
FTK X X
WinHex X X
AHC X X
Arpoon X X
Febooti Hash-CRC X X
HashTab X X
HashGenerator X X
MD5 SHA1 X X
IgorWare Hasher X X
SFVNinja X X
20


Table 13: MP3 File MD5 Results
Checker Name FTK WinHex
FTK X X
WinHex X X
AHC X X
Arpoon X X
Febooti Flash-CRC X X
FlashTab X X
FlashGenerator X X
MD5 SHA1 X X
IgorWare Flasher X X
SFVNinja X X
Table 14: MP3 File SHA-1 Results
Checker Name FTK WinHex
FTK X X
WinHex X X
AHC X X
Arpoon X X
Febooti Hash-CRC X X
HashTab X X
HashGenerator X X
MD5 SHA1 X X
IgorWare Hasher X X
SFVNinja X X
21


Table 15: WMA File MD5 Results
Checker Name FTK WinHex
FTK X X
WinHex X X
AHC X X
Arpoon X X
Febooti Flash-CRC X X
FlashTab X X
FlashGenerator X X
MD5 SHA1 X X
IgorWare Flasher X X
SFVNinja X X
Table 16: WMA File SHA-1 Results
Checker Name FTK WinHex
FTK X X
WinHex X X
AHC X X
Arpoon X X
Febooti Hash-CRC X X
HashTab X X
HashGenerator X X
MD5 SHA1 X X
IgorWare Hasher X X
SFVNinja X X
22


Table 17: AVI Video File MD5 Results
Checker Name FTK WinHex
FTK X X
WinHex X X
AHC X X
Arpoon X X
Febooti Flash-CRC X X
FlashTab X X
FlashGenerator X X
MD5 SHA1 X X
IgorWare Flasher X X
SFVNinja X X
Table 18: AVI Video File SHA-1 Results
Checker Name FTK WinHex
FTK X X
WinHex X X
AHC X X
Arpoon X X
Febooti Hash-CRC X X
HashTab X X
HashGenerator X X
MD5 SHA1 X X
IgorWare Hasher X X
SFVNinja X X
23


Table 19: QuickTime Video File MD5 Results
Checker Name FTK WinHex
FTK X X
WinHex X X
AHC X X
Arpoon X X
Febooti Flash-CRC X X
FlashTab X X
FlashGenerator X X
MD5 SHA1 X X
IgorWare Flasher X X
SFVNinja X X
Table 20: QuickTime Video File SHA-1 Results
Checker Name FTK WinHex
FTK X X
WinHex X X
AHC X X
Arpoon X X
Febooti Hash-CRC X X
HashTab X X
HashGenerator X X
MD5 SHA1 X X
IgorWare Hasher X X
SFVNinja X X
24


CHAPTER III
CONCLUSION
As seen in the previous tables there were no errors when gathering the HASH values from
these freeware programs. However, this represents only a fraction of the different kinds of
programs available online for download off of the internet. It is recommended that new
programs are tested in a similar way, if not the same way, before they are considered to be
verified. Testing other HASH algorithms, such as WHIRLPOOL, can help in the
verification process. As a reminder MD5 and SHA-1 used together helps to safeguard
against errors when safeguards are checked.
Freeware software is something that can be used to verify digital evidence from a working
case. Proper protocol for handling digital files should still be followed depending on the
Standard Operating Procedure (SOP) setup by a department. It is still recommended to
double check the HASH values with a forensically based program like FTK, but when
gathering evidence at a crime scene freeware programs can be used.
This thesis has proposed a way of testing freeware software for errors, but not for collisions.
Future tests can be done with the dataset to see if collisions occur when the HASH of each
file is compared to the other.
25


BIBLIOGRAPHY
1. What is a Hash Function? 2012, EMC Corporation. 7 Jul 2013

2. Lu, Wenjun, Avinash L. Varna and Min Wu. "Forensic Hash for Multimedia
Information." Media Forensics and Secutiry 7/7541 (2010).
3. Peterson, William Wesley, and Daniel T. Brown. "Cyclic codes for error
detection." Proceedings of the IRE 49.1, 1961: 228-235.
4. Maxino, Theresa. "Revisiting Fletcher and Adler Checksums." (2006).
5. Wang, Xiayon, and HongBo Yu. "How to Break MD5 and Other Hash Functions."
Lecture Notes in Computer Science 3494 (2005): 19-35
6. Wang, Xiaoyun, Yiqun Lisa Yin, and Hongbo Yu. "Finding collisions in the full
SHA-1." Advances in Cryptology-CRYPTO 2005. Springer Berlin Heidelberg, 2005.
7. The WHIRLPOOL Hash Function. Ed. Paulo S. L. M. Barreto. 2008. 7 Jul 2013

8. den Boer, Bert, and Antoon Bosselaers. "Collisions for the compression function of
MD5." Lecture Notes in Computer Science 765 (1994): 293-304
9. Daum, Magnus and Stefan Lucks. Hash Collisions (The Poisoned Message Attack):
"The Story of Alice and her Boss". 26 May 2013 mannheim.de/people/lucks/HashCollisions/>
10. National Institute of Standards and Technology (NIST), and United States of
America. "Forensic Examination of Digital Evidence: A Guide for Law
Enforcement." 2004
11. National Software Reference Library. 20 Aug. 2003. National Institute of Software
and Technology. 7 Jul. 2013.
12. Turan, Meltem Sonmez et al. Status Report on the Second Round of the SHA-3
Cryptographic Hash Algorithm Competition. NIST Interagency Report 7764, 2011.
26


APPENDIX
HASH Programs
WinHex
£ WinHex [Thesis Audio 001.mp3] J

J2[ File Edit Search Navigation View Tools Specialist Options Window Help
QGJH§i?&
ft, 1 4 & ft $ % $ Ml p S o di
Thesis Audio 001 np3

Offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
00000000 49 44 33 04 00 00 00 00 00 17 54 S3 53 45 00 00 ID3 T5SE
00000016 00 0D 00 00 03 4C 61 76 66 35 32 2E 39 33 2E 30 Lavf 52.93.0
00000032 00 FF FB B0 44 00 0C 83 A9 54 43 13 09 1A FO 79 yuB |TC 3y
00000048 8F 07 A0 3C 63 6E 52 B5 8B 18 4C A5 OD C2 4B 3C 00000064 E0 18 F3 19 B8 01 01 95 ID 2D FA CO 29 AO 50 06 a o | -uA) P
00000080 62 2A 05 CE 84 B0 CO 44 63 40 60 6C 01 9F 41 71 b* IriDcn |Aq
00000096 23 6F 54 40 81 8F 6D D5 A3 7A 91 6E 33 92 8C 2E #oT@ t5iz'n3'|.
00000112 8E 90 23 47 3E BB 75 08 4F DO 40 02 00 08 4E B9 1 #G>u QD@ N1
00000128 C2 E0 B3 FA EE EF A0 01 3F F9 DO EF C4 CF 4D DC iaTuii ?uBiAIMij
00000144 DC SB 92 22 10 71 67 FI 38 95 DC 42 21 39 C4 24 U['" qgn8|tIB!9A5
00000160 44 FC FD DD CE BB BB 96 9F A7 ID DC 38 BF OF C3 DSyflIIS t8i I
00000176 13 EE CB BF AA 50 E7 FF E7 35 8F C5 3A 8D FB F6 iE£JPQy$5 A: uo
00000192 35 7C 7F 7F 4A 53 4E EE 94 OC SB C4 42 40 82 OB S| JSNil [ABf|
00000208 E0 41 0E 06 2F 0E 06 EE 6E 7B 9F A2 44 E2 7F 00 aA / ia{|cDa
00000224 C5 FB BC 2F 4D F4 44 43 FD FD DC BC AB FD 08 22 Au^/M6DCyyti^y "
00000240 87 03 73 88 06 06 06 2C CA C2 EB BB 81 BB 9B 9F 1 s| ,£Ae ||
00000256 FB BB A7 B9 D7 70 E2 09 DC E7 E9 45 9E 84 88 6F u§xpa UqeEl I o
00000272 FA 7F AO 00 82 C2 26 E7 FC 28 8E 6E 61 04 17 D3 u | A&gu (| na. 6
00000288 88 4E 7F F5 E8 18 18 IB FC 4D 3D IF 44 BE E6 EE |N oe uM= DJieei
00000304 EE 10 00 40 0B 57 9D 07 82 B2 B8 A9 8B A8 59 F2 i @
00000320 DE B1 94 AC 6E E9 9A A9 97 E3 66 8A 04 A6 06 IB fc+|-ne|@|af! |
00000336 12 OF 94 6D 22 86 5B 2E 52 C8 E2 26 OA 12 A9 91 |ni"|[ REa& '
00000352 6B 08 D9 44 F5 8F A8 82 79 27 C2 4D CA OD EE DD k (IDS ly1 AMfi if
00000368 21 53 FE C3 El 0A 66 1C EA 59 15 11 6C 15 FE C5 !Sb& i a 1 ti
00000384 11 C4 41 39 56 E3 4E B1 AA 2B 10 4E 3D E2 68 4A AA9VaN+*+ N=ahJ
00000400 26 61 95 4D 23 8A 1C BD 17 56 6D 77 64 D9 CA FO ialBl H VwdMS
00000416 3A DA 26 52 D5 D1 4F 10 6C BE 8A SB 22 AC B4 19 :04BE)
00000432 71 8F EB 96 77 47 72 AD 3A DE 13 6A 84 C7 05 96 q e|wGr-:t= jlC I
00000448 07 0A 82 22 1A 3B FE 9D 03 F6 OD 08 01 12 C3 8C ln 6 l\
00000464 EA 2E 2D BC 43 9E B3 C5 7C 4D CD 33 07 D6 El 16 e.-kCl3A|Ml3 04
00000480 3A 11 4F SB 64 B6 DO 19 ED 4D 4A CE 81 8E 08 4F ; Old iliJl O
00000496 AC E2 0C 99 A0 C3 0E 2C 71 BE A3 12 54 35 F5 05 -a 1 1 ,q|£ T5o
00000512 45 99 E9 9D A2 ED ED 46 20 7B DD B6 1C 45 05 6A Ele ciiF {? E j
00000528 47 1C CF 79 A3 D6 A1 D1 A4 6E EF 29 CE 87 EE 83 G Iy£OiSni)llil
00000544 B4 B4 C4 DD 67 7C 35 E7 77 72 3F 34 AF AB EB 79 ''A?g|5g¥r?4-ey
00000560 BD 68 77 F7 F7 65 El 33 FC 36 D1 OD 86 83 2F 4A fchw-H-ea3u6S I |/J
00000576 A2 37 36 F5 EA 61 B6 3A AD 0B 08 BB AB Cl 06 E9 c76oeaf:- A e
00000592 FE 35 2D A9 7D 8C 39 9E CA 3B 21 1C 00 40 14 92 t>5-8}[9l£;! 1 '
00000608 72 05 B0 46 C7 FI C2 4D E0 42 2B A9 9E 8F 58 93 r Ti;aiH4B+8| K|
Page 1 of 386 Offset: 0
[unregistered}
Thesis Audio 001 jnp3
J:\TTiess\TestJ3ocunerts\N
File size: 235 KB
240,777 bytes
DOS name: THESIS'1.MP3
Default Edit Mode
State: original
Undo level: 0
Undo reverses: n/a
Creation time: 04/12/2013
15:04:29
Last write time: 04/09/2013
20:53:07
Attributes: A
Icons: 0
Mode: hexadecimal
Character set: CP 1252
Offsets: decimal
Bytes per page. Windowtt No. of windows: 39(16=624 1 1
CSpboard: available
TEMP folder 448 G8 free
NJA~1\AppData\Locai\Temp
Data Interpreter [s
SBt$ isat(; 32Bt(i 73 17481 70468681
27


IgorWare Hasher
0 IgorWare Hasher
&
Options Help
File Text
Browse
Verification data
SHA-1: | Verification file not found y
MD5: | Verification file not found
CRC32: | Verification file not found
Hash Result
g] SHA-1:
[V] MD5:
CRC32:
Copy Save
Ready.
Calculate
MD5-SHA1 Utility
28


Advanced Hash Calculator
Advanced Hash Calculator
Is)
File Action Options Help
L? X 3 .1. a
New List Add Files Clear Calculate Settings About
Selected Files
| Hash
Drag and drop files from Windows Explorer or dick [Add Files] button
Hash: SHA-1
//.
29


Febooti HASH-CRC
30


HashTab

BookOOl.xIsx Properties
General | Hash / CRC" File Hashes Security Details | Previous Versions
Name Hash Value
CRC32 CFS7EA63
MD5 A2363B2D57EEB 3F5F33A2150EE330F14
SHA-1 2E742D FE45EB3A1C2D B A99D E332A6930FEC234...
Settings
Hash Comparison:
Compare a file...
HashTab v5.1.0 :: 2010 Implbits Software [http:/Vimplbits.com]
[ OK
Cancel
Apply
31


Hash Generator
^ Hash Generator www.SecurityXploded.com
Han

Hash Generator
All-in-one Hash Generator Software
Show Help About
Select Input Type: File Text
Select File Location: C: \windowsViotepad.exe
Generate Hash
.............tt
Hash Type
Hash Length Hash Value
Download More Free Tools from SecurityXploded
Report
32


MATLab Script
clear
% cd directory
cd J:\Thesis\Test Documents!
%cd J:\Thesis\Test Documents\Excel Files\Excel Program HASHes\SHAl
% cd
%Get root file
[ExcelRoot, Root] = uigetfile ('*.xlsx', 'Select Root Excel File');
addpath (Root);
%Get comparison file
[ExcelCompare, Compare] = uigetfile('*.xlsx', 'Select Comparison Excel
File');
addpath (Compare)
% read the Excel files
%[seanOl,txtOl]=xlsread('Checksums.xlt') ;
%[sean02,txt02]=xlsread('HASH CRC.xlt');
%[seanOl,txtOl]=xlsread('Excel File MD5 HASH FTK.xls');
%[sean02,txt02]=xlsread('Excel File MD5 HASH WinHex.xls');
%MD5
%[seanOl,txtOl]=xlsread('Excel File MD5 WinHex');
%[sean02,txt02]=xlsread(ExcelCompare);
%SHA1
[seanOl,txtOl]=xlsread(ExcelRoot);
[sean02,txt02]=xlsread(ExcelCompare);
% compute the tables lines and columns numbers
[a01,b01]=size(txtOl);
[a02,b02]=size(txtOl);
% compare the HASH values of the Testl...Test3 columns
% txtOl
for kl=2:aOl
ell(kl-1)=strcmpi(txtOl(kl,2),txtOl(kl, 3) ) ;
cl2(kl-1)=strcmpi(txtOl(kl,2),txtOl(kl, 4) ) ;
cl3(kl-1)=strcmpi(txtOl(kl,3),txtOl(kl, 4) ) ;
end
if prod(ell(:))==1
disp('All the ell HASH values match.')
elseif prod(ell(:))==0
disp('Check the ell HASH differences.')
end
if prod(cl2(:))==1
disp('All the cl2 HASH values match.')
33


elseif prod(cl2(:))==0
disp('Check the cl2 HASH differences.')
end
if prod(cl3(:))==1
disp('All the cl3 HASH values match.')
elseif prod(cl3(:))==0
disp('Check the cl3 HASH differences.')
end
% txt02
for kl=2:a02
c21(kl-1)=strcmpi(txt02(kl,2) txt02(kl, 3) ) ;
c22(kl-1)=strcmpi(txt02(kl, 2) txt02(kl, 4 ) ) ;
c2 3(kl-1)=strcmpi(txt02(kl, 3) txt02(kl,4) ) ;
end
if prod(c21(:))==1
disp('All the c21 HASH values match.')
elseif prod(c21(:))==0
disp ('Check the c21 HASH differences.')
end
if prod(c22(:))==1
disp('All the c22 HASH values match.')
elseif prod(c22(:))==0
disp('Check the c22 HASH differences.')
end
if prod(c23(:))==1
disp('All the c23 HASH values match.')
elseif prod(c23(:))==0
disp ('Check the c23 HASH differences.')
end
% compare the HASH values of two different Excel files
for kl=2:min(aOl,a02)
for k2=2:min(bOl,b02)
cl(kl-1,k2-l)=strcmpi(txtOl(kl,k2),txt02(kl,k2))
end
end
if prod(cl(:))==1
disp('All the HASH values match.')
elseif prod(cl(:))==0
disp('Check the HASH differences.')
end
34


Full Text

PAGE 1

A METHOD FOR PERFORMANCE VERIFICATION OF FREEWARE HASH UTITLIES USING MATLAB by SEAN ROBERT JACOBSON B.S., University of Colorado Denver, 2011 A thesis submitted to the Faculty of the Graduate School of the University of Colorado in partial fulfillment of the requirements for the degree of Master of Science Recording Arts 2013

PAGE 2

ii This thesis for the Master o f Science degree by Sean Robert Jacobson has been approved for the Recording Arts Program by Catalin Grigoras, Chair Jeff Smith Leslie M Gaston Bird Date: 7/11/2013

PAGE 3

iii J acobson, Sean Robert M.S., Recording Arts A Method for Performance Verification of Freeware HASH Utilities Using MATLAB Thesis directed by Associate Profes sor Catalin Grigoras ABSTRACT HASH values are the result of a cryptographic HASH function that can be used for security or file authentication. Due to the rise of file exchange on the internet this value has become even more important when working with digital evidence. In recent years freeware programs have become available online for both commercial and personal use to check for this value. However, there has not yet been a set way to verify these programs and to check that the results they are giving is in fact accurate. This thesis proposes a series of tests that allows for a user or agency to check the program and see if it is indeed accurate. The form and content of this abstract are approved. I recommend its publication. Approved: Catalin Grigroas

PAGE 4

iv ACKNOWLEDEMENTS Firstly I want to say thank you to my mom and dad, Teri and David Jacobson. You put me through school and kept encouraging m support. Secondly to my teachers Jeff Smith and Catalin Grigoras. For giving me great advice on where to find information, putting up with frustrations, and being there when I needed help, thanks Leslie Gaston Bird a long time teacher for being on my committee and reviewing my multiple drafts And to my co workers at the Bookworm, for putting up with me during the final process and my stressful moments. Thanks so much for everything!

PAGE 5

v TABLE OF CONTENTS C HAPTER I. INTRODUCTION 1 What are HASH 1 Common HASH Algo 4 Problems with H 6 NIST and NIJ work II. PROPOSED FRAMEWORK AND EXP ERIMENTAL PROCESS 10 III. CONCLUSIO N 25 BIBLIOGRAPHY 26 27

PAGE 6

vi LIST OF TABLES Table 1: Excel Document MD5 15 2: Excel Document SHA 15 4: JPEG Image File SHA 16 5: 17 6: PDF File SHA 17 18 8: Text File SHA 18 9: Word Document MD5 Results 19 10: Word Document SHA 19 11: WAVE File MD5 20 12: WAVE File SHA 20 21 14: MP3 File SHA 21 22 16: WMA File SHA 22 23 18: AVI Video File SHA 23 24 20: QuickTime Video File SHA 24

PAGE 7

vii LIST OF FIGURES Figure 1 1 The Original File and its HASH 2 1 2 The Copy of the Orignal File and its HASH 3 1 3 The Original File and its HASH 3 1 4 The Copied File Altered 4 1 5 WHIRLPOOL Algorithm Diagram 6 1 6 The Letter of Recommendation 7 1 7 The Altered Document 7 2 1 Excel Spreadsheet Example 2 2 All HASH Values Match 12 2 3 All HASH Values Match, columns 13 2 4 HASH Value Differences 13 2 5 HASH Value Differences, column 14

PAGE 8

1 CHAPTER I INTRODUCTION In the emerging field of media forensics digital files are an important part of evidence analysis. When working with a file such as video or an image file it is important for a forensic examiner to know that the file that they are working with is the file that was originally collected at the crime scene. In order to verify evidence in tegrity upon seizure and throughout processing, HASH values should be calculated and compared. has also caused some problems with knowing whether or not the file re ceived is the original file or if there have been alterations made to it. The HASH value is a tool used to determine whether or not the file is the same or if it is different. There are many different kinds of programs that allow for a person to check the HASH value, some of them have been authenticated by the forensic community and some are freeware programs. In this paper eight different freeware programs are compared to two forensically sound programs to determine whether or not they function the same. What are HASH V alues? A HASH value is a product of cryptographic HASH function used to authenticate a digital file [1 ] The HASH function will take information from the digital file and run through an algorithm to generate the value. In digital media each digital file should have a unique HASH value. No other digital file should have the same HASH value unless it is a clone [2 ]

PAGE 9

2 A HASH value should be unique to the file as it currently is. If a person were to make alterations to the file the val ue would also be changed. In digital forensics the HASH value has come to be more important with the rise in file exchange sites and download utilities on the internet. Below is an example of this using a program call ed WinHex to check for the HASH value. In Figure 1 1 and Figure 1 2 an example of this is given. A text file called Test File was created and placed in WinHex to get its HASH value. Figure 1 1 is the original file and Figure 1 2 is its copy Figure 1 1: The Original File and its HASH.

PAGE 10

3 Figure 1 2 : The Copy of the Orignal File and its HASH I n Figure 1 3 and Figure 1 4 one of the file s contents have been changed. This has caused the HASH value of that file to have also changed. Figure 1 3: The Original File and its HASH

PAGE 11

4 Figure 1 4 : The Copied File Altered When working with digital information it is important for an examiner to know that the file they are working with is the original. A HASH value, when checked, gives the examiner the assurance that this is indeed a bit for bit co py and accurately reflects the original evidence. The tables in Chapter two are the results of testing for errors in multiple HASH checker programs. Common HASH Algorithms There are many different kinds of HASH algorithms used today. Some of the most commo n have been CRC32, Adler32, MD5, SHA1, and WHIRLPOOL The cyclic redundancy check (CRC) was first invented in 1961 by W. Wesley Peterson, an American mathematician and computer scientist. The CRC operates by working on multiple blocks of data at one time However, the CRC is not a cryptographic function I t is a linear function. This means that there is a series of steps that the function needs to go through in order to find the errors. The following is an example of a CRC algorithm [3 ]: F(X) = Xn kG(X) + R(X) = Q(X)P(X),

PAGE 12

5 The Adler32 algorithm was developed by Mark Adler in 1995. Here the algorithm calculates two 16 bit checksums and then links them together to form a 32 bit result. The problem with this is that if the data to be analyzed (called a message ) is t oo small than errors will occur [4 ]. The Message Digest Algorithm 5 ( MD5 ) is a cryptographic function designed by Professor Ronal Rivest in 1991 as a stronger version, more secure version of MD4. The algorithm breaks down the message into 512 bits and processes it through the following function [5 ] : Hi +1 = f ( Hi,Mi ) 0 i t 1 The Secure HASH Algorithm (SHA 1) is another cryptographic function developed by the National Institute of Science and Technology (NIST) as a processing standard. The algorithm has been widely used in government and industry security checks. The algorithm also processes messages in 512 bit blocks. The following is an example of the SHA 1 algorithm [6 ] : mi = ( m i 3 m i 8 m i 14 m i 16 ) << 1 : WHIRLPOOL is one of the first freeware algorithms in the market today. Designed by Paulo S. L. M. Barreto a cryptographer from Brazil, it is a 512 bit hashing function that works with messages less than 2 256 in length. Figure 1 5 is a diagram of the algorithm [7 ]:

PAGE 13

6 Figure 1 5: WHIRLPOOL Algorithm Diagram. Source: http://www.larc.usp.br/~pbarreto/WhirlpoolPage.html Problems with HASH Values Some of t he most commonly used HASH values are the MD5 and SHA 1. However, it is known that neither HASH algorithm is infallible. In 1993 collisions were found that caused [8] Collision means that two different documents Thus new algorithms have been created to i mprove upon security Newer values like SHA 2, WHIRLPOOL and Tiger 192 have not caused problems just yet in testing. A famous example of HASH collision Magnus Daum and Stefan Lucks as a way of illustrating h ow two files can have the same HASH value. Alice is an intern with Caesar and brings a letter of recommendation for Caesar to sign [ 9 ] The HAS H value is shown in Figure 1 6

PAGE 14

7 Figure 1 6 : The Letter of Recommendation In the scenario Alice also wants to have a security clearance and sets up an algorithm so that two documents will have the same MD5. The altered document for security is shown below in Figure 1 7 Figure 1 7: The Altered Document. How could this happen? Two files are not supposed to have the same HASH value right?

PAGE 15

8 In theory a HASH algorithm is supposed to be collision proof. Two cryptographers, Bert von Boer and Antoon Bosselaers, published a paper in 1994 explaining an algorithm that could find collisions with in the MD5 algorithm In the paper they talk about how the search algorithm can search for collisions in the MD5 algorithm by going throu gh four different rounds [ 9 ]. After its publication a group of scientists took it a step further. Xiaoyun Wang and Hongbo Yu are both scientists at Shandong University China. In 2005 they published a paper about the different ways to create collisions and break HASH function s such as MD5. Their results showed how easily the algorithm could be broken by putting a file through different test rounds and generating the result [5 ]. Wang and Yu went on to publish another paper about collision attacks against SHA 1 with Yiqun Lisa Yin later in 2005 [6 ] NIST and NIJ work with HASH There has been significant wo rk done with HASH values aside from testing for collisions. The Nat ional Institute of Justice (NIJ ) began work with the National Software Reference Library (NSRL) to create a program that would help a computer forensic examiner [10 ] When working on a case with digital files a computer forensic specialist must determine what files are the most important for analysis. The new Reference Data Set (RDS) contains software profiles that can help an examiner find these files In this dataset a file is given a profile and a HASH value unique to the dataset allowing for faster results. This is an ongoing project and is continually updated [11 ] Because of the need for security, National Institute of Science and Technology (NIST) had a competition in 2007 for the

PAGE 16

9 generation of the SHA algorithm, SHA 3. Sixty four submissions were received for the first round and were narrowed down to five finalists with the help of public opinion The hope was that after the selection the algorithm would be able to be collision proof for at least twenty years The finalists were placed through a series of tests that would test for strength and compatibility. The algorithms not only had to be able t o handle large messages, but shorter ones as well [12 ] In 2012 the algorithm Keccak, designed by Guido Bertoni et al., was ch osen as the new SHA 3 algorithm. When using a HASH algorithm through a software program it is important for the user to know that they are getting accurate results. There is some good forensically sound software, like WinHex, that is used by law enforcement when analyzing cases. Unfortunately, this software is often limited to only law enforcement use. Freeware software allows for pe rsonal and professional use. This paper proposes a way to test some of these programs for validity.

PAGE 17

10 CHAPTER II PROPOSED FRAMEWORK A ND EXPERIMENTAL PROCESS In the forensic sciences, it necessary to employ techniques and tools that are known to generate repeatable and reproducible results This means that the testing procedure must be able to generate the same results when done by another person. During testing a series is run more than once in order to make sure that accuracy is maintained without bias to allow for reliability and consistency. The following is a proposal when testing different freeware HASH programs for validity and accuracy. In this series of tests ten different kinds of file types were used. The files were chosen based on commonality and what can be easily found in a personal computer. For each file type one hundred files were created. The files were numbered from one to one hu ndred. Example: Book001.xlsx, this is the first file for the Excel Documents. The file types made were: Excel Documents Word Documents JPEG Image Files Notepad Text File PDF Document MP3 Audio File WAVE Audio Fie WMA File

PAGE 18

11 AVI Video File QuickTime Video File Once each file was created ten different kinds of HASH checker utilities were used to check both the MD5 and SHA 1 HASH values. Eight of the programs were freeware programs that can be found and downloaded from the internet and two programs were foren sically sound programs commonly used by forensic examiners. The reason for this was to provide the ground truth when comparing HASH values to each other. These values will be called root values. The programs used were: FTK, forensically sound program WinHe x, forensically sound program Advanced Hash Calculator (AHC) Arpoon Febooti Hash CRC HashTab HashGenerator MD5 SHA1 Hash Utility IgorWare Hasher SFVNinja

PAGE 19

12 Each of the different file types were put into a program one at a time and then run three times to fo r later comparison. The value was copied and placed into an Excel spreadsheet. Three columns were labeled Test 1, Test 2, and Test 3 for each of the different runs. Figure 2 1: Excel Spreadsheet Example. After all the HASH values were gathered and placed i nto Excel MATLab was used to compare each of the value to ensure that they were the same. A script created by Catalin Grigoras was used for this. The full script can be found in the Appendix. In MATLab each freeware value was compared to the two root value s, first FTK then WinHex. The Excel documents were loaded in MATLab and the script was run. First the three columns were compared to see if the values matched. If they did a 1 would appear in Figure 2 2: All HASH Values Match.

PAGE 20

13 Figure 2 3: All HASH Values Match, columns. Figure 2 4: HASH Value Differences.

PAGE 21

14 Figure 2 5: HASH Value Differences, column. If a difference was detected the file would be run again through that HASH program and then run through MATLab again. If a value of 0 appeared again then the file was run through the program once more as well as MATLab. If the value was again 0 then it would be assumed that there was something wrong within the freeware HASH checker and it would fail the test Once all three columns had matching values the freeware file values were compared to the root value. If the values al a difference was detected the file would be run again through that HASH program and then run through MATLab again. If a value of 0 appeared again then the file was run through the program once more as well as MATLab. If the value was again 0 then it would be assumed that there was something

PAGE 22

15 wrong within the freeware HASH checker. Results were recorded in an Excel spreadsheet. A green X was used to indicate that all files matched and a red O was used to indicate where files did not match despite being tested three different times. Table 1 : Excel Document MD5 Results Checker Name FTK WinHex FTK X X WinHex X X AHC X X Arpoon X X Febooti Hash CRC X X HashTab X X HashGenerator X X MD5 SHA1 X X IgorWare Hasher X X SFVNinja X X Table 2: Excel Document SHA 1 Results Checker Name FTK WinHex FTK X X WinHex X X AHC X X Arpoon X X Febooti Hash CRC X X HashTab X X HashGenerator X X MD5 SHA1 X X IgorWare Hasher X X SFVNinja X X

PAGE 23

16 Table 3: JPEG Image File MD5 Results Checker Name FTK WinHex FTK X X WinHex X X AHC X X Arpoon X X Febooti Hash CRC X X HashTab X X HashGenerator X X MD5 SHA1 X X IgorWare Hasher X X SFVNinja X X Table 4 : JPEG Image File SHA 1 Results Checker Name FTK WinHex FTK X X WinHex X X AHC X X Arpoon X X Febooti Hash CRC X X HashTab X X HashGenerator X X MD5 SHA1 X X IgorWare Hasher X X SFVNinja X X

PAGE 24

17 Table 5 : PDF File MD5 Results Checker Name FTK WinHex FTK X X WinHex X X AHC X X Arpoon X X Febooti Hash CRC X X HashTab X X HashGenerator X X MD5 SHA1 X X IgorWare Hasher X X SFVNinja X X Table 6 : PDF File SHA 1 Results Checker Name FTK WinHex FTK X X WinHex X X AHC X X Arpoon X X Febooti Hash CRC X X HashTab X X HashGenerator X X MD5 SHA1 X X IgorWare Hasher X X SFVNinja X X

PAGE 25

18 Table 7 : Text File MD5 Results Checker Name FTK WinHex FTK X X WinHex X X AHC X X Arpoon X X Febooti Hash CRC X X HashTab X X HashGenerator X X MD5 SHA1 X X IgorWare Hasher X X SFVNinja X X Table 8 : Text File SHA 1 Results Checker Name FTK WinHex FTK X X WinHex X X AHC X X Arpoon X X Febooti Hash CRC X X HashTab X X HashGenerator X X MD5 SHA1 X X IgorWare Hasher X X SFVNinja X X

PAGE 26

19 Table 9 : Word Document MD5 Results Checker Name FTK WinHex FTK X X WinHex X X AHC X X Arpoon X X Febooti Hash CRC X X HashTab X X HashGenerator X X MD5 SHA1 X X IgorWare Hasher X X SFVNinja X X Table 10 : Word Document SHA 1 Results Checker Name FTK WinHex FTK X X WinHex X X AHC X X Arpoon X X Febooti Hash CRC X X HashTab X X HashGenerator X X MD5 SHA1 X X IgorWare Hasher X X SFVNinja X X

PAGE 27

20 Table 11 : WAVE File MD5 Results Checker Name FTK WinHex FTK X X WinHex X X AHC X X Arpoon X X Febooti Hash CRC X X HashTab X X HashGenerator X X MD5 SHA1 X X IgorWare Hasher X X SFVNinja X X Table 12 : WAVE File SHA 1 Results Checker Name FTK WinHex FTK X X WinHex X X AHC X X Arpoon X X Febooti Hash CRC X X HashTab X X HashGenerator X X MD5 SHA1 X X IgorWare Hasher X X SFVNinja X X

PAGE 28

21 Table 13 : MP3 File MD5 Results Checker Name FTK WinHex FTK X X WinHex X X AHC X X Arpoon X X Febooti Hash CRC X X HashTab X X HashGenerator X X MD5 SHA1 X X IgorWare Hasher X X SFVNinja X X T able 14 : MP3 File SHA 1 Results Checker Name FTK WinHex FTK X X WinHex X X AHC X X Arpoon X X Febooti Hash CRC X X HashTab X X HashGenerator X X MD5 SHA1 X X IgorWare Hasher X X SFVNinja X X

PAGE 29

22 Table 15 : WMA File MD5 Results Checker Name FTK WinHex FTK X X WinHex X X AHC X X Arpoon X X Febooti Hash CRC X X HashTab X X HashGenerator X X MD5 SHA1 X X IgorWare Hasher X X SFVNinja X X Table 16 : WMA File SHA 1 Results Checker Name FTK WinHex FTK X X WinHex X X AHC X X Arpoon X X Febooti Hash CRC X X HashTab X X HashGenerator X X MD5 SHA1 X X IgorWare Hasher X X SFVNinja X X

PAGE 30

23 Table 17 : AVI Video File MD5 Results Checker Name FTK WinHex FTK X X WinHex X X AHC X X Arpoon X X Febooti Hash CRC X X HashTab X X HashGenerator X X MD5 SHA1 X X IgorWare Hasher X X SFVNinja X X Table 18 : AVI Video File SHA 1 Results Checker Name FTK WinHex FTK X X WinHex X X AHC X X Arpoon X X Febooti Hash CRC X X HashTab X X HashGenerator X X MD5 SHA1 X X IgorWare Hasher X X SFVNinja X X

PAGE 31

24 Table 19 : QuickTime Video File MD5 Results Checker Name FTK WinHex FTK X X WinHex X X AHC X X Arpoon X X Febooti Hash CRC X X HashTab X X HashGenerator X X MD5 SHA1 X X IgorWare Hasher X X SFVNinja X X Table 20 : QuickTime Video File SHA 1 Results Checker Name FTK WinHex FTK X X WinHex X X AHC X X Arpoon X X Febooti Hash CRC X X HashTab X X HashGenerator X X MD5 SHA1 X X IgorWare Hasher X X SFVNinja X X

PAGE 32

25 CHAPTER III CONCLUSION As seen in the previous tables there were no errors when gathering the HASH values from these freeware programs. However, this represents only a fraction of the different kinds of programs available online for download off of the internet. I t is recommended that new programs are tested in a similar way if not th e same way, before they are considered to be verified. Testing other HASH algorithms, such as WHIRLPOOL, can help in the verification process As a reminder MD5 and SHA 1 used together helps to safeguard against errors when safeguards are checked. Freeware software i s something that can be used to verify digital evidence from a working case. P roper protocol for handling digital files should still be followed depending on the Standard Operating Proced ure (SOP) setup by a department. It is still recommended to double ch eck the HASH values with a forensically based program like FTK, but when gathering evidence at a crime scene freeware programs can be used. This thesis has proposed a way of testing freeware software for errors, but not for collisions. Future tests can be done with the dataset to see if collisions occur wh en the HASH of each file is compared to the other.

PAGE 33

26 BIBLIOGRAPHY 1. What is a Hash Function? 2012, EMC Corporation. 7 Jul 2013 < http://www.rsa.com/rsalabs/node.asp?id=2176 > 2. Lu, Wenjun, Avinash L. Varna and Min Wu. "Forensic Hash for Multimedia Information." Media Forensics and Secutiry II 7541 (2010). 3. Peterson, William Wesley, and Daniel T. Brown. "Cyclic codes for error detection." Proceedings of the IRE 49.1, 1961 : 228 235. 4. Maxino, Theresa. "Revisiting Fletcher and Adler Checksums." (2006). 5. Wang, Xiayon, and HongBo Yu. How to Break MD5 and Other Hash Functions." Lecture Notes in Computer Science 3494 (2005): 19 35 6. Wang, Xiaoyun, Yiqun Lisa Yin, and Hongbo Yu. "Finding collisions in the full SHA 1." Advances in Cryptology CRYPTO 2005 Springer Berlin Heidelberg, 2005. 7. The WHIRLPOOL Hash Function. Ed. Paulo S. L. M. Barreto 2008. 7 Jul 2013 < http://www.larc.usp.br/~pbarreto/WhirlpoolPage.html > 8. den Boer, Bert, and Antoon Bosselaers. "Collisions for the compression function of MD5." Lecture Notes in Computer Science 765 (1994): 293 304 9. Daum, Magnus and Stefan Lucks. Hash Collisions (The Poisoned Message Attack): "The Story of Alice and her Boss". 26 May 2013 < http://th.informatik.uni mannheim.de/people/lucks/HashCollisions/ > 10. National Institute of Standards and Technology (NIST), and United States of America. "Forensic Examination of Digital Evidence: A Gu ide for Law Enforcement." 2004 11. National Software Reference Library. 20 Aug. 2003. National Institute of Software and Technology. 7 Jul. 2013. < http://www.nsrl.nist.gov/ > 12. Turan, Meltem Sonmez et al. Status Report on the Second Round of the SHA 3 Cryptographic Hash Algorithm Competition NIST Interagency Report 7764, 2011.

PAGE 34

27 APPENDIX HASH Programs WinHex

PAGE 35

28 IgorWare Hasher MD5 SHA1 Utility

PAGE 36

29 Advanced Hash Calculator

PAGE 37

30 Febooti HASH CRC

PAGE 38

31 HashTab

PAGE 39

32 Hash Generator

PAGE 40

33 MATLab Script clear % cd directory cd J: \ Thesis \ Test_Documents \ %cd J: \ Thesis \ Test_Documents \ Excel_Files \ Excel_Program_HASHes \ SHA1 % cd %Get root file [ExcelRoot, Root] = uigetfile ( '*.xlsx' 'Select Root Excel File' ); addpath (Root); %Get comparison file [ExcelCompare, Compare] = uigetfile( '*.xlsx' 'Select Comparison Excel File' ); addpath (Compare) % read the Excel files %[sean01,txt01]=xlsread('Checksums.xlt'); %[sean02,txt02]=xlsread('HASH_CRC.xlt'); %[sean01,txt01]=xlsread('Excel File MD5 HASH FTK.xls'); %[sean02,txt02]=xlsread('Excel File MD5 HASH WinHex.xls'); %MD5 %[sean01,txt01]=xlsread('Excel File MD5 WinHex'); %[sean02,txt02]=xlsread(ExcelCompare); %SHA1 [sean01,txt01]=xlsread(ExcelRoot); [sean02,txt02]=xlsread(ExcelCompare); % compute the tabl es lines and columns numbers [a01,b01]=size(txt01); [a02,b02]=size(txt01); % compare the HASH values of the Test1...Test3 columns % txt01 for k1=2:a01 c11(k1 1)=strcmpi(txt01(k1,2),txt01(k1,3)); c12(k1 1)=strcmpi(txt01(k1,2),txt01(k1,4)); c13(k1 1)=strcmpi(txt01(k1,3),txt01(k1,4)); end if prod(c11(:))==1 disp( 'All the c11 HASH values match.' ) elseif prod(c11(:))==0 disp( 'Check the c11 HASH differences.' ) end if prod(c12(:))==1 disp( 'All the c12 HASH values match.' )

PAGE 41

34 elseif prod(c12(:))==0 disp( 'Check the c12 HASH differences.' ) end if prod(c13(:))==1 disp( 'All the c13 HASH values match.' ) elseif prod(c13(:))==0 disp( 'Check the c13 HASH differences.' ) end % txt02 for k1=2:a02 c21(k1 1)=strcmpi(txt02(k1,2),txt02(k1,3)); c22(k1 1)=strcmpi(txt02(k1,2),txt02(k1,4)); c23(k1 1)=strcmpi(txt02(k1,3),txt02(k1,4)); end if prod(c21(:))==1 disp( 'All the c21 HASH values match.' ) elseif prod(c21(:))==0 disp( 'Check the c 21 HASH differences.' ) end if prod(c22(:))==1 disp( 'All the c22 HASH values match.' ) elseif prod(c22(:))==0 disp( 'Check the c22 HASH differences.' ) end if prod(c23(:))==1 disp( 'All the c23 HASH values match.' ) elseif prod(c23(:))==0 disp( 'Check the c23 HASH differences.' ) end % compare the HASH values of two different Excel files for k1=2:min(a01,a02) for k2=2:min(b01,b02) c1(k1 1,k2 1)=strcmpi(txt01(k1,k2),txt02(k1,k2)); end end if prod(c1(:))==1 disp( 'All the HASH values match.' ) elseif prod(c1(:))==0 disp( 'Check the HASH differences.' ) end