Citation
Automated authentication of exemplar media in a database

Material Information

Title:
Automated authentication of exemplar media in a database
Creator:
Larsen, Brent M. ( author )
Place of Publication:
Denver, Colo.
Publisher:
University of Colorado Denver
Language:
English
Physical Description:
1 electronic file (23 pages)

Thesis/Dissertation Information

Degree:
Master's (Master of Science)
Degree Grantor:
University of Colorado Denver
Degree Divisions:
Department of Music and Entertainment Industry Studies, CU Denver
Degree Disciplines:
Recording Arts
Committee Chair:
Grigoras, Catalin
Committee Members:
Smith, Jeffrey
Lewis, Jason

Subjects

Subjects / Keywords:
Authentication ( lcsh )
Database management ( lcsh )
Authentication ( fast )
Database management ( fast )
Genre:
bibliography ( marcgt )
theses ( marcgt )
non-fiction ( marcgt )

Notes

Review:
The focus of this work is to document the creation of a database containing original audio recordings and images. Content for the database is managed through a web based application in which information can be passed to the database from the user. A fundamental feature of the media database application is the automated verification or rejection of a file based on its originality. By performing a string search of the file’s metadata, signs of non-originality due to prior processing from a software editor can be detected thus preventing the file from being added to the database. Additionally, the media database includes user restricted access for both submissions to and retrieval of database information.
Thesis:
Thesis (M.S.)-University of Colorado at Denver.
Bibliography:
Includes bibliographic references
System Details:
System requirements: Adobe Reader
Statement of Responsibility:
by Brent M. Larsen.

Record Information

Source Institution:
University of Colorado Denver
Holding Location:
Auraria Library
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
958838962 ( OCLC )
ocn958838962

Full Text
AUTOMATED AUTHENTICATION OF
EXEMPLAR MEDIA IN
A DATABASE
By
BRENT M. LARSEN
B.S., University of Colorado, 2013
A thesis submitted to the
Faculty of the Graduate School of the
University of Colorado in partial fulfillment
of the requirements for the degree of
Master of Science
Recording Arts
2016


2016
BRENT MICHAEL LARSEN
ALL RIGHTS RESERVED


This thesis for the Master of Science degree by
Brent Michael Larsen
has been approved for the
Recording Arts Program
by
Catalin Grigoras, Chair
Jeffery Smith
Jason Lewis
April 29, 2016


Larsen, Brent M. (M.S. Recording Arts)
Framework for Automating Authentication of Digital Media Files for an Exemplar Database
Thesis directed by Professor Catalin Grigoras
ABSTRACT
The focus of this work is to document the creation of a database containing original audio recordings and images. Content for the database is managed through a web-based application through which users can pass information to the database. A fundamental feature of the media database application is the automated verification or rejection of a file based on its originality. By performing a string search of the file's metadata, signs of non-originality due to prior processing by a software editor can be detected, thus preventing the file from being added to the database. Additionally, the media database restricts user access for both submissions to and retrieval of database information.
The form and content of this abstract are approved. I recommend its publication.
Approved: Catalin Grigoras


TABLE OF CONTENTS
CHAPTER
I. INTRODUCTION
Purpose
Authentication
Methods for Authentication Using Python
II. GUIDELINES FOR MULTIPLE USER CONTRIBUTIONS
III. DATABASE FRAMEWORK
Front and Backend Structures
User Identity Management
IV. TOOL VALIDATION METHODS AND MATERIALS
V. RESULTS
VI. CONCLUSION
Discussion
Future Research
REFERENCES


LIST OF FIGURES
FIGURE
1. Example of EXIF information displayed in a hex viewer
2. EXIF device information of an original file in hex viewer
3. EXIF software editor traces in hex viewer
4. Python exifread operation
5. Text output of EXIF
6. Python script for converting first 5000 bytes to ASCII
7. Terminal output of the first 5000 bytes in ASCII of image
8. Blacklisted word dictionary
9. Stormpath user account management page
10. Stormpath user group management page
11. MySQL table containing successful media file submission


CHAPTER I
INTRODUCTION
Purpose
Due to the ever-changing nature of digital media, the field of digital forensics must stay current with new digital devices. One method that can aid digital examiners is the use of reference media, otherwise known as exemplars. Exemplar media can provide valuable metadata about the device on which a media file was created. By building a large collection of exemplars, an examiner can compare these references against a media file of unknown origin to help identify key similarities. For this method to be effective, a large collection of authentic original files must be maintained. A database becomes a valuable resource once the number of exemplars grows and easy access is needed. A properly implemented database can also simplify the addition of media created on newer digital devices, thus addressing the need to stay up to date with digital devices.
To meet the needs of its intended users, a media database should be efficient in its operation. In our proposed exemplar database, the process by which new exemplar media files are added must be simplified. Authenticating a large quantity of media files can be an arduous task for a single individual. In this instance, much of the work can be divided up by automating a base level of authentication and incorporating a multiple-user submission method. In addition to reducing the time it takes to submit a media file, automated authentication also acts as a gatekeeper for the media entering the database. If a submitted file is determined to be consistent with an original, it is permitted to enter the database. If the file appears to be inconsistent with an original, it is rejected.
Authentication
One of the critical components this database relies upon is that the media contained within must be authentic to have any inherent value. In determining whether a file is authentic, we are looking for whether the file is consistent with the operation of the recording device used to make the media. [1] One method of analysis for authenticating a digital media file is to evaluate its file structure. A digital media file consists of two parts: the actual media that represents the audio, video, or image, and information about that audio, video, or image. In Scott Anderson's Analytical Framework for Authenticating Digital Images, he likens the digital file to a can of soup. In his analogy, the soup represents the actual media, the can represents the container in which the media is stored, and the label represents the information about the media. [2]
The metadata stored inside a file gives instructions about how the file is assembled, the contents of the file, information about the media, and information about the file. A key feature within this metadata is the Exchangeable Image File Format, or EXIF. The EXIF of a media file may include specific information on the creation date and time, the device on which the file was created, f-stop and ISO speed for images, sample rate and number of channels for audio, and other relevant information about the media.
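To make the soup-can analogy concrete for JPEG images, the sketch below walks the header segments at the start of a file. In a typical JPEG, EXIF metadata sits in an APP1 segment that begins with the identifier "Exif" (part of the label), other segments describe how the image is encoded (part of the can), and the compressed image data (the soup) follows the Start of Scan (SOS) marker. This is a minimal illustration assuming a well-formed baseline JPEG; the function names are hypothetical, and the parser ignores edge cases such as fill bytes between segments.

```python
import struct

# Names for common JPEG markers; any other marker is shown in hex
MARKER_NAMES = {0xD8: "SOI", 0xE0: "APP0", 0xE1: "APP1", 0xDB: "DQT",
                0xC0: "SOF0", 0xC4: "DHT", 0xDA: "SOS", 0xFE: "COM"}

def list_jpeg_segments(data: bytes) -> list:
    """Return (name, offset, length) for each header segment of a JPEG.

    Walking stops at the Start of Scan (SOS) segment; the compressed
    image data (the "soup") follows it.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file (missing SOI marker)")
    segments = [("SOI", 0, 0)]
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        # The 2-byte big-endian length counts itself but not the marker
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        segments.append((MARKER_NAMES.get(marker, f"0x{marker:02X}"), pos, length))
        if marker == 0xDA:
            break
        pos += 2 + length
    return segments

def has_exif(data: bytes) -> bool:
    """True if an APP1 segment starts with the EXIF identifier."""
    for name, offset, _ in list_jpeg_segments(data):
        if name == "APP1" and data[offset + 4:offset + 10] == b"Exif\x00\x00":
            return True
    return False
```

For example, reading an image file in binary mode and passing its bytes to `has_exif` reports whether the camera's EXIF block is present, and `list_jpeg_segments` shows where it sits relative to the image data.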
A common method of assessing a file's structure is to view it in hexadecimal notation. A hex viewer is a valuable tool that converts the raw binary data into a more human-readable format. The converted text itself is for the most part unintelligible, but certain bits of EXIF information may prove easier to interpret [Figure 1]. In addition to a media device populating the metadata of a digital file, alterations to the file may be present as a result of a software editor. For example, we could evaluate the metadata of an mp3 audio recording created by a Tascam GT-R1 portable recorder and expect to find information about the device on which it was recorded [Figure 2]. However, if this mp3 file were then opened in a software editor such as Adobe Audition and saved under the exact same name, we would see that new information had been added to the metadata [Figure 3]. Whether or not any editing of the mp3 takes place, Adobe Audition adds its own creation information to the file. On this criterion, the mp3 can no longer be considered an original recording, as it is no longer consistent with the operation of the device on which it was created.
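The hex dumps in Figures 2 and 3 both begin with an ID3v2.3 tag (the bytes "ID3" followed by version byte 03), which stores metadata as a sequence of frames such as TIT2 (title) and TALB (album). Listing the frame IDs of a device original and of a resaved copy is one way to surface what an editor added; in Figure 3, a PRIV frame carrying Adobe XMP data appears after the original frames. The sketch below is a minimal, hypothetical parser that assumes an ID3v2.3 tag without an extended header; other versions, such as ID3v2.4, encode frame sizes differently and are not handled.

```python
import struct

def syncsafe_to_int(b: bytes) -> int:
    """Decode a 4-byte ID3v2 'syncsafe' integer (7 significant bits per byte)."""
    value = 0
    for byte in b:
        value = (value << 7) | (byte & 0x7F)
    return value

def list_id3v23_frames(data: bytes) -> list:
    """Return (frame_id, size) for each frame of an ID3v2.3 tag at the start of data."""
    if len(data) < 10 or data[:3] != b"ID3":
        return []  # no ID3v2 tag present
    if data[3] != 3:
        raise ValueError(f"this sketch only handles ID3v2.3, found v2.{data[3]}")
    if data[5] & 0x40:
        raise ValueError("extended header present; not handled in this sketch")
    end = 10 + syncsafe_to_int(data[6:10])  # tag size excludes the 10-byte header
    frames = []
    pos = 10
    while pos + 10 <= end:
        frame_id = data[pos:pos + 4]
        if frame_id == b"\x00\x00\x00\x00":
            break  # the rest of the tag is padding
        # ID3v2.3 frame sizes are plain big-endian 32-bit integers
        (size,) = struct.unpack(">I", data[pos + 4:pos + 8])
        frames.append((frame_id.decode("latin-1"), size))
        pos += 10 + size  # 10-byte frame header plus frame body
    return frames
```

Comparing the output for an original recording with the output for its resaved copy highlights any frame IDs that appear only in the copy.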
[Hex viewer screenshot of GOPR1923.JPG (offsets 1000 to 1440). Readable EXIF strings include "DCIM\101GOPRO", "GoPro", "HERO3+ Silver Edition", the firmware version "HD3.10.02.00", and the timestamp "2016:03:11 15:09:25".]
FIGURE 1: Example of EXIF information displayed in a hex viewer


[Hex viewer screenshot of TASCAM-GT-R1-GT000053.mp3. The file begins with an ID3 tag ("ID3", version 03) containing a TIT2 frame with the text "GT000053.mp3" and a TALB frame with the text "TASCAM GT-R1", followed by the audio data.]
FIGURE 2: EXIF device information of an original audio file in hex viewer
[Hex viewer screenshot of TASCAM-GT-R1-GT000053.mp3 after resaving in Adobe Audition. The TIT2 ("GT000053.mp3") and TALB ("TASCAM GT-R1") frames remain, but they are now followed by a PRIV frame containing an XMP packet that includes the string "Adobe XMP Core 5.6-c067 79.157747, 2015/03/30-23:40:42" and Adobe namespaces such as "http://ns.adobe.com/xmp/1.0/DynamicMedia/".]
FIGURE 3: EXIF of audio file from FIGURE 2 after being saved in Adobe Audition


Methods for Authentication using Python
Scripting the analysis of a media file speeds up the authentication process and mitigates potential human error. Python modules provide a customizable set of tools that can perform specific tasks efficiently. For example, with a few lines of code we can analyze the EXIF of a JPEG image with the help of the exifread module [Figure 4]. This code performs the basic function of opening a file, using the exifread module to extract EXIF tag information, and then writing the findings to a text file [Figure 5].
```python
import exifread

# exifread expects the image to be opened in binary mode
with open("4586005925.jpg", "rb") as f:  # image file
    exif = exifread.process_file(f)

# Write each EXIF tag and its value on its own line
with open("exif.txt", "w") as exifdata:
    for tag, value in exif.items():
        exifdata.write(f"{tag}: {value}\n")
```
FIGURE 4: Python exifread operation


exif.txt
ExposureTime (1, 320)
FNumber (8, 1)
CustomRendered 0
MaxApertureValue (3, 1)
FocalLength (61, 1)
ShutterSpeedValue (8321928, 1000000)
Flash 16
WhiteBalance 0
ExifOffset 70
MeteringMode 5
DateTimeDigitized 2015:07:09 08:58:59
DateTimeOriginal 2015:07:09 08:58:59
ApertureValue (6, 1)
ISOSpeedRatings 100
Model Canon EOS 5DS
Make Canon
ExposureMode 1
ExposureBiasValue (0, 1)
FIGURE 5: Organized text output of EXIF
At this point, the results generated with Python must still be examined manually. We can further automate this process by parsing the file and instructing Python to interpret the results for us. The next process we can apply is a string search of the media file's metadata to look for traces left by software editors. Because relevant metadata is typically stored near the beginning of a digital file, and some formats also place metadata at the end (for example, ID3v1 tags at the end of mp3 files), the search can be expedited by limiting it to the first and last 5,000 bytes of the file. Figure 6 shows a Python script that opens an original image made with a Nikon D3300 camera, converts the first 5,000 bytes to ASCII, and prints them to the terminal [Figure 7].


```python
# test.py: print the first 5,000 bytes of an image as ASCII text
filename = "DSC_0023.JPG"

with open(filename, "rb") as f:  # binary mode: read the raw bytes
    content = f.read(5000)

# Bytes outside the ASCII range become U+FFFD instead of raising an
# error, so readable metadata strings stand out in the output
text = content.decode("ascii", errors="replace")
print(text)
```
FIGURE 6: Python script for converting first 5000 bytes to ASCII
[Terminal screenshot of "python test.py" run in the Testing directory. Most bytes print as unreadable characters, but legible strings include "Exif", "NIKON CORPORATION", "NIKON D3300", "Ver.1.00", "2016:03:10 10:54:02", "Nikon", "FINE", "AUTO", "AF-S", and "STANDARD".]
FIGURE 7: Terminal output of the first 5000 bytes in ASCII of image


These results could be organized into a more human-readable format; however, for automation purposes this is not necessary. For the next step of this process, a dictionary of words commonly left in metadata by software editors is created. Figure 8 shows an example dictionary of blacklisted words commonly left by image viewing and editing software. Python then searches the ASCII string converted from binary in the previous steps for each entry in this dictionary. If any of the blacklisted words are found within the media file, Python is instructed to make a decision based on its findings.
words_black_list.txt
Adobe
adobe
AppleMark
Corel
Corel
GIMP
Gimp
gimp
http://www.iec.ch
JFIF
Lightroom
picasa
Picasa
Photoshop
quicktime
Quicktime
QuickTime
Windows
windows
FIGURE 8: Sample of blacklisted word dictionary
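The string search and decision described above can be sketched in a few lines of Python. This is not the thesis's own code, which is not reproduced here; the function names, the 5,000-byte window constant, and the word list (based on Figure 8) are illustrative assumptions. The search is case-sensitive, which is why the dictionary lists variants such as "Adobe" and "adobe" separately, and the bytes are decoded as Latin-1 so that every byte maps to a character and decoding cannot fail.

```python
import os

# Illustrative blacklist based on the dictionary in Figure 8 (case-sensitive)
BLACKLIST = ["Adobe", "adobe", "AppleMark", "Corel", "GIMP", "Gimp", "gimp",
             "http://www.iec.ch", "JFIF", "Lightroom", "picasa", "Picasa",
             "Photoshop", "quicktime", "Quicktime", "QuickTime",
             "Windows", "windows"]

WINDOW = 5000  # bytes examined at each end of the file (hypothetical name)

def head_and_tail(path: str, window: int = WINDOW) -> bytes:
    """Return the first and last `window` bytes of a file (the whole file if small)."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        if size <= 2 * window:
            return f.read()
        head = f.read(window)
        f.seek(size - window)
        # A NUL separator keeps a word from spanning the join
        return head + b"\x00" + f.read(window)

def find_traces(data: bytes, blacklist=BLACKLIST) -> list:
    """Return the blacklisted words found in `data`, in blacklist order."""
    # Latin-1 maps every byte value to one character, so decoding never fails
    text = data.decode("latin-1")
    return [word for word in blacklist if word in text]

def authenticate(data: bytes, blacklist=BLACKLIST) -> tuple:
    """Return (accepted, traces): accept only if no blacklisted word is found."""
    traces = find_traces(data, blacklist)
    return (len(traces) == 0, traces)

def scan_file(path: str, blacklist=BLACKLIST) -> tuple:
    """Apply the blacklist search to the head and tail of a file on disk."""
    return authenticate(head_and_tail(path), blacklist)
```

A submission handler could call `scan_file(path)` and reject the file whenever the first value is False, reporting the matched words back to the user.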


CHAPTER II
GUIDELINES FOR MULTIPLE USER CONTRIBUTIONS
Crowdsourcing materials for use in any scientific research carries the risk of introducing error into the results. Allowing multiple users to contribute new media to the database may introduce a level of uncertainty regarding a file's originality. By implementing specific control measures, this uncertainty can be reduced. Four proposed methods of ensuring that the integrity of the database remains intact are:
1. User accounts are assigned by the administrators of the database. Those who wish to gain access to the database must obtain their credentials by submitting a request to the administrators.
2. Authentication of media submitted to the database is performed automatically, to mitigate potential error or bias.
3. When a user successfully submits a file to the database, their user ID and the time of submission are also added to the database entry. This provides a record in case any suspicious activity occurs.
4. If a user attempts to upload a media file that is determined not to be authentic, they will be given a warning. A limited number of infractions will be tolerated before the user's privileges are suspended.
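Guidelines 3 and 4 amount to a small piece of bookkeeping: record who submitted an accepted file and when, and count rejected submissions per user until a limit is reached. A minimal sketch follows; the class names and the limit of three infractions are hypothetical, since the guidelines only say that a "limited number" of infractions will be tolerated.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

MAX_INFRACTIONS = 3  # hypothetical limit; the guidelines leave the number open

@dataclass
class UserRecord:
    user_id: str
    infractions: int = 0
    suspended: bool = False

@dataclass
class SubmissionLog:
    # Each entry: (filename, user_id, UTC submission time)
    entries: list = field(default_factory=list)

    def submit(self, user: UserRecord, filename: str, is_authentic: bool) -> str:
        """Apply guidelines 3 and 4 to one submission and return its outcome."""
        if user.suspended:
            return "suspended"
        if is_authentic:
            # Guideline 3: record who submitted the file and when
            self.entries.append((filename, user.user_id, datetime.now(timezone.utc)))
            return "accepted"
        # Guideline 4: warn the user; suspend once the limit is reached
        user.infractions += 1
        if user.infractions >= MAX_INFRACTIONS:
            user.suspended = True
            return "suspended"
        return "warning"
```

In a real deployment these records would live in the database rather than in memory; the in-memory classes only illustrate the decision logic.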


CHAPTER III
DATABASE FRAMEWORK
Front and Backend Structures
Having defined the basis on which media files are to be authenticated, this section documents the development of a working media exemplar database for use by the National Center for Media Forensics. The database is built around a web-accessible application in which authorized users can access and/or upload new media, depending on their privileges.
The front end of the application is handled by Flask, a Python micro-framework that provides tools for building web-based applications in the Python language. [3] Flask offers several benefits, including database integration through extensions and the ability to incorporate advanced Python modules. The backend of the database is handled by MySQL, an open-source relational database management system built on a client-server model. MySQL is also one of the most widely used database management systems and communicates effectively with Flask. [4]
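As an illustration of the backend's role, the sketch below defines a media table and an insert helper. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a database server; with a MySQL driver the SQL would need small changes (for example, AUTO_INCREMENT instead of AUTOINCREMENT, and %s placeholders in most drivers). The column names follow the tbl_media table shown in Figure 11; the submitted_at column is an assumption added to hold the submission time called for in Chapter II, and the helper names are hypothetical.

```python
import sqlite3

# Columns follow tbl_media in Figure 11; submitted_at is an assumed extra column
SCHEMA = """
CREATE TABLE IF NOT EXISTS tbl_media (
    media_id      INTEGER PRIMARY KEY AUTOINCREMENT,
    media_user_id TEXT NOT NULL,
    media_type    TEXT NOT NULL CHECK (media_type IN ('audio', 'video', 'image')),
    media_make    TEXT,
    media_model   TEXT,
    submitted_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""

def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure the media table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn

def add_media(conn, user_id: str, media_type: str, make: str, model: str) -> int:
    """Insert a record for a file that passed authentication; return its media_id."""
    # Placeholders keep user-supplied values out of the SQL text itself
    cur = conn.execute(
        "INSERT INTO tbl_media (media_user_id, media_type, media_make, media_model) "
        "VALUES (?, ?, ?, ?)",
        (user_id, media_type, make, model),
    )
    conn.commit()
    return cur.lastrowid
```

The web application would call `add_media` only after a file has passed the automated authentication step.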
User Identity Management
Even though MySQL and Flask are fully capable of handling user identity and access management, an additional layer of security can be added by using an outside user authorization service. User authorization for this database is handled by Stormpath. This service was chosen for its ease of implementation within Flask's framework, as well as the high standard of security and best practices it adheres to. [5] Many of the best practices followed by Stormpath could also be followed without its services; however, it should be noted that development deadlines helped shape this decision.
Stormpath allows administrative control over user accounts to be exercised through a web-based interface [Figure 9]. User access controls can be changed at any time to define whether a specific user is limited to access-only privileges or has permission to contribute original files to the database [Figure 10]. Stormpath also handles a number of other account management features key to a database's success, such as user password resets and email communications.
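Whatever service manages the accounts, the application ultimately has to map a user's group membership to what that user may do. The sketch below shows that mapping for the two groups shown in Figure 10. It does not use Stormpath's API; the group names are taken from the figure, while the permission names and functions are hypothetical, and in practice the list of groups would come from the identity service.

```python
# Group names as shown in Figure 10; permission names are hypothetical
GROUP_PERMISSIONS = {
    "Full Access": {"view", "submit"},
    "Limited Access": {"view"},
}

def permissions_for(groups) -> set:
    """Union of the permissions granted by every group the user belongs to."""
    allowed = set()
    for group in groups:
        allowed |= GROUP_PERMISSIONS.get(group, set())
    return allowed

def is_allowed(groups, action: str) -> bool:
    """True if any of the user's groups grants `action`; unknown groups grant nothing."""
    return action in permissions_for(groups)
```

An upload view would check `is_allowed(groups, "submit")` before accepting a file, so a Limited Access user can browse the database but not contribute to it.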
[Screenshot of the Stormpath administration console, Applications > Media Database > Accounts. Two enabled accounts, Brent Larsen and John Smith, are listed in the Media Database Directory.]
FIGURE 9: Stormpath user account management page


[Screenshot of the Stormpath Groups page. Two enabled groups are listed in the Media Database Directory: Full Access, described as allowing users to access and submit, and Limited Access, described as limiting users to access-only privileges.]
FIGURE 10: Stormpath user group management page


CHAPTER IV
TOOL VALIDATION METHODS & MATERIALS
The Scientific Working Group on Digital Evidence calls for the validation of all tools, techniques, and procedures used in the performance of digital forensics. [6] Validation of the database is key to ensuring proper functionality and that repeated use will yield consistent results. The primary function this testing addresses is the ability of the automated authentication to detect media that is not original.
Because the database accepts audio, video, and image files, it was important to test each file type. Five different devices from each category were selected to provide the original media content.
1. Handheld audio recorders used:
Olympus DM-620
Tascam DR-07
Zoom H1
Sony ICD-SX750
Tascam DR-100mkII
2. Digital cameras used (images):
Panasonic DMC-FS7
Casio EX-Z150 [7]
Sony DSC-H50 [7]
Nikon D200 [7]
Canon EOS 30D
3. Cameras used (video):
Nikon CoolPix L18
Sony Cyber-shot DSC-S650
GoPro Hero 3+
Canon PowerShot Pro1
Olympus E-PL1
The media files created on these digital devices were then manually authenticated by performing a metadata analysis. Copies of these files were then made, and each copy was opened in one of several software editors and resaved under the same name. Metadata analysis was performed again on the newly created files to manually identify the traces left by the software editors. To test the function of the database, an admin user logged into the database application and attempted to submit each file.
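The outcome of a validation run like this one can be summarized by comparing each file's known status with the database's decision. The sketch below counts true and false acceptances and rejections; the function and key names are hypothetical. Applied to the results reported in Chapter V, all 15 originals would count as true accepts and all 15 edited copies as true rejects.

```python
from collections import Counter

def summarize(results):
    """Summarize a validation run.

    `results` holds one (is_original, was_accepted) pair per test file.
    A false rejection is an original file that was rejected; a false
    acceptance is an edited file that was allowed into the database.
    """
    counts = Counter()
    for is_original, was_accepted in results:
        if is_original:
            counts["true_accept" if was_accepted else "false_reject"] += 1
        else:
            counts["false_accept" if was_accepted else "true_reject"] += 1
    originals = counts["true_accept"] + counts["false_reject"]
    edited = counts["true_reject"] + counts["false_accept"]
    return {
        **counts,
        "false_reject_rate": counts["false_reject"] / originals if originals else 0.0,
        "false_accept_rate": counts["false_accept"] / edited if edited else 0.0,
    }
```

Keeping both rates visible matters for repeated validation: a blacklist that is too broad raises the false rejection rate, while one that misses an editor's traces raises the false acceptance rate.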


CHAPTER V
RESULTS
Each of the 15 original media files submitted to the database met the requirements of the algorithm and successfully entered the database. Figure 11 is a view of the MySQL database table containing the 15 media files.
tbl_media
media_id  media_user_id              media_type  media_make         media_model
1         brent.larsen@ucdenver.edu  audio       TASCAM             DR-100MK2
2         brent.larsen@ucdenver.edu  audio       TASCAM             DR-07
3         brent.larsen@ucdenver.edu  audio       ZOOM               H1
4         brent.larsen@ucdenver.edu  audio       SONY
5         brent.larsen@ucdenver.edu  audio       OLYMPUS            DM620
6         brent.larsen@ucdenver.edu  video       OLYMPUS            E-PL1
7         brent.larsen@ucdenver.edu  video       CANON
8         brent.larsen@ucdenver.edu  video       SONY               DSC-S65
9         brent.larsen@ucdenver.edu  video       NIKON              L18
10        brent.larsen@ucdenver.edu  video       GoPro
11        brent.larsen@ucdenver.edu  image       Panasonic          DMC-FS7
12        brent.larsen@ucdenver.edu  image       CANON              EOS 30D
13        brent.larsen@ucdenver.edu  image       CASIO              EX-Z150
14        brent.larsen@ucdenver.edu  image       NIKON CORPORATION  D200
15        brent.larsen@ucdenver.edu  image       SONY               DSC-H50
FIGURE 11: MySQL table containing successful media file submission
Each of the 15 media files that had been opened and resaved in various software editors was rejected when queued for submission. In each case, the algorithm determined that the newly saved media file contained traces left by a software editor.


CHAPTER VI
CONCLUSION
Discussion
The development of the exemplar media database framework documented in this paper outlines a practical method for authenticating audio, video, and image files created on digital devices. In addition, validation testing of the database's functionality supports its use as a tool that is both reliable and reproducible in its results. It should be noted that maintaining the blacklisted word dictionary is an important task, and the dictionary will need to be evaluated regularly.
Future Research
This exemplar database was designed so that it could be expanded in the future to incorporate additional features. The current abundance of openly available Python modules means that further customization and additions can be made to meet the needs of the database. The overall scope of the database could also be redefined as new features are added. Currently, the database serves the purpose of providing reference material to aid comparisons against unknown media, but potential future features worth considering include:
Structure Analysis
Quantization Table extraction from cameras [8]
Clone Detection for images [9]
Photo Response Non-Uniformity (PRNU) mapping [10]
Transition and zero-level analysis of audio recordings
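As a starting point for the quantization table feature, the sketch below reads the tables stored in a JPEG file's DQT (0xFFDB) segments. Quantization tables are a common subject of image forensics because they often differ between camera models and editing software, so comparing them with those of exemplars can help assess a file's origin. This is a hypothetical, minimal parser assuming a well-formed JPEG; it returns the 64 values of each table in the zigzag order in which they are stored, and it is not the ImageMagick implementation cited in [8].

```python
import struct

def quantization_tables(data: bytes) -> dict:
    """Extract quantization tables from a JPEG's DQT (0xFFDB) segments.

    Returns {table_id: [64 values in the zigzag order stored in the file]}.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file (missing SOI marker)")
    tables = {}
    pos = 2
    while pos + 4 <= len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        if marker == 0xDA:  # Start of Scan: header segments are over
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker == 0xDB:
            seg = data[pos + 4:pos + 2 + length]  # payload after the length field
            i = 0
            while i < len(seg):  # one DQT segment may hold several tables
                precision, table_id = seg[i] >> 4, seg[i] & 0x0F
                i += 1
                if precision == 0:  # 8-bit entries
                    tables[table_id] = list(seg[i:i + 64])
                    i += 64
                else:  # 16-bit big-endian entries
                    tables[table_id] = list(struct.unpack(">64H", seg[i:i + 128]))
                    i += 128
        pos += 2 + length
    return tables
```

Tables extracted this way from a questioned image could then be compared with those stored for exemplars from the claimed camera model.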


REFERENCES
[1] Scientific Working Groups on Digital Evidence and Imaging Technology (May 27, 2015). SWGDE and SWGIT Digital & Multimedia Evidence Glossary v2.8. Retrieved March 14, 2016, from the Scientific Working Group on Digital Evidence website: https://www.swgde.org/documents/Current%20Documents/
[2] Anderson, Scott Dale (2011). Digital Image Analysis: Analytical Framework for Authenticating Digital Images. University of Colorado Denver, 18.
[3] Ronacher, Armin (2013). Flask Documentation. Retrieved March 2, 2016, from the Flask website: http://flask.pocoo.org/docs/0.10/
[4] Oracle (2016). Oracle Products and Services Overview. Retrieved March 17, 2016, from the Oracle website: http://www.oracle.com/us/products/mysql/overview/index.html
[5] Stormpath (2016). Flask-Stormpath Documentation. Retrieved March 14, 2016, from the Stormpath website: https://flask-stormpath.readthedocs.org/en/latest/index.html
[6] Scientific Working Group on Digital Evidence (September 5, 2014). SWGDE Recommended Guidelines for Validation Testing v2.0. Retrieved March 15, 2016, from the Scientific Working Group on Digital Evidence website: https://www.swgde.org/documents/Current%20Documents/
[7] Gloe, T., & Böhme, R. (2010). The Dresden Image Database for benchmarking digital image forensics. In Proceedings of the 25th Symposium on Applied Computing (ACM SAC 2010) (Vol. 2, pp. 1585-1591).
[8] Cristy, John (1992). Read/Write JPEG Image Format. Retrieved April 26, 2016, from GitHub: https://github.com/ImageMagick/ImageMagick/blob/05d2ff7ebf21f659f5b11e45afb294e152f4330c/coders/jpeg.c#L818
[9] Vasiliauskas, Agnius (2009). Detecting Copy-move Forgery in Images. Retrieved April 27, 2016, from Vasiliauskas blog: http://coding-experiments.blogspot.com/2009/03/detecting-copy-move-forgery-in-images.html
[10] Rosales Corripio, Jocelin, Sandoval Orozco, Ana Lucila, Garcia Villalba, Luis Javier, Hernandez-Castro, Julio, & Gibson, Stuart James (2013). Source Smartphone Identification Using Sensor Pattern Noise and Wavelet Transform. University of Kent. Retrieved April 27, 2016, from https://kar.kent.ac.uk/37195/1/corripio2013.pdf
17


Full Text

PAGE 1

! AUTOMATED AUTHENTICATION OF EXEMPLAR MEDIA IN A DATABASE By BRENT M LARSEN B.S., University of Colorado, 2013 A thesis submitted to the Faculty of the Graduate School of the University of Colorado in partial fulfillment Of the requirements for the degree of Masters of Science Recording Arts 2016

PAGE 2

! ii 2016 BRENT MICHAEL LARSEN ALL RIGHTS RESERVED

PAGE 3

! iii This thesis for the Master of Science degree by Brent Michael Larsen has been approved for the Recording Arts Program b y Catalin Grigoras Chair Jeffery Smith Jason Lewis April 29 2016

PAGE 4

! iv Larsen, M. Brent (M.S. Recording Arts) Framework for Automating Authentication of Digital Media Files for an Exemplar Database Thesis directed by Professor Catalin Grigoras ABSTRACT The focus of this work is to document the creation of a database containing origin al audio recordings and images Content for the database is managed through a web based application in which information can be passed to the database from the user A fundamental feature of the media database application is the automated verificatio n or rejection of a file based on its originality. By performing a string search of the file 's metadata signs of non originality due to prior processing from a software editor can be detected thus preventing the file from being added to the database. Addi tionally, the media database includes user restricted access for both submissions to and retrieval of database information. The form and content of this abstract are approved. I recommend its publication Approved: Catalin Grigoras

PAGE 5

! v TABLE OF CONTENTS CHAPTER I. INTRODUCTIO N ... 1 Purpose .. 1 Authentication ... 2 Methods for Authentication Using Python ... 5 II. GUIDELINES FOR MULTIPLE USER CONTRIBUTIONS .... 9 III. DATABASE FRAMEWORK 10 Front and Backend Structures ... 10 User Identity Management 10 IV. TOOL VALIDATION METHODS AND MATERIALS .. 13 V. RESULTS .... 15 VI. CONCLUSION .. 16 Discussion .. 16 Future Research .. ... 16 REFERENCES .. 17

PAGE 6

! vi LIST OF FIGURES FIGURE 1 Example of EXIF information displayed in a hex viewer ... 3 2. EXIF device information of an original file in hex viewer .. 4 3. EXIF software editor traces in hex viewer ... 4 4. Python Exifread operation 5 5. Text output of EXIF 6 6 Python script for converting first 5000 bytes to ASCII .. 7 7 Terminal output of the first 5000 bytes in ASCII of image 7 8 Black listed word dictionary ... 8 9 Stormpath user account management page 11 10 Stormpath user group management page .. .. 12 11. MySQL table containing successful media file submission .. ..15


CHAPTER I
INTRODUCTION

Purpose

Due to the ever-changing nature of digital media, the field of digital forensics must stay current with new digital devices. One method that can aid digital examiners is the use of reference media, otherwise known as exemplars. Exemplar media can provide valuable metadata about the device on which a media file was created. By building a large collection of exemplars, an examiner can compare these references against a media file of unknown origin to help identify key similarities. For this method to be effective, a large collection of authentic original files must be maintained. A database becomes a valuable resource once the number of exemplars grows and a need for easy access arises. A properly implemented database can also simplify the addition of media created on newer digital devices, addressing the need to stay up to date with digital devices.

To meet the needs of its intended users, a media database should be efficient in its operation. In our proposed exemplar database, the process by which new exemplar media files are added must be simplified. Authenticating a large quantity of media files can be an arduous task for a single individual. Much of this work can be divided up by automating a base level of authentication and incorporating a multiple-user submission method. In addition to reducing the time it takes to submit a media file, automated authentication also acts as a gatekeeper for the media entering the database. If a submitted file is determined to be consistent with an original, it is permitted to enter the database. If the file appears to be inconsistent with an original, it is rejected.

Authentication

One of the critical components this database relies upon is that the media it contains must be authentic to have any inherent value. In determining whether a file is authentic, we look for whether the file is consistent with the operation of the recording device used to make the media [1]. One method of authenticating a digital media file is to evaluate its file structure. A digital media file comprises two parts: one part is the actual media that represents the audio, video, or image, and the other part contains information about the audio, video, or image. In Scott Anderson's Analytical Framework for Authenticating Digital Images, he likens the digital file to a can of soup. In his analogy, the soup within represents the actual media, the can is the container in which the media is stored, and the label is the information about the media [2]. This metadata stored inside a file gives instructions about how the file is assembled, the contents of the file, information about the media, and information about the file. A key feature within this metadata is the Exchangeable Image File Format, or EXIF. The EXIF of a media file may include specific information on the creation date and time, the device on which the file was created, f-stop and ISO speed for images, sample rate and number of channels for audio, along with other relevant information about the media.

A common method of assessing a file's structure is to view it as hexadecimal data. A hex viewer is a valuable tool that converts the raw binary data into a more human-readable format. The converted text itself is for the most part unintelligible, but certain pieces of EXIF information may be easier to interpret [Figure 1]. In addition to the metadata a media device writes into a digital file, alterations to the file may be present as a result of a software editor. For example, if we evaluate the metadata of an mp3 audio recording created by a Tascam GT-R1 portable recorder, we can expect to find information about the device on which it was recorded [Figure 2]. However, if this mp3 file is then opened in a software editor such as Adobe Audition and saved under the exact same name, we see that new information has been added to the metadata [Figure 3]. Whether or not any editing of the mp3 takes place, Adobe Audition adds its own creation information to the file. On this criterion, we must no longer consider the mp3 an original recording, as it is no longer consistent with the operation of the device on which it was created.

FIGURE 1: Example of EXIF information displayed in a hex viewer
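To make the hex-viewer idea concrete, the following is a minimal Python sketch of what such a tool displays. It is our own illustration, not the viewer used to produce Figures 1 to 3, and the function name `hex_dump` is hypothetical. Each row shows the byte offset, sixteen bytes in hexadecimal, and the same bytes as ASCII, with non-printable bytes shown as dots, which is why readable strings such as a device make or an editor name stand out even when the surrounding data is unintelligible.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Render bytes like a typical hex viewer: offset, hex values, ASCII column."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Printable ASCII (0x20-0x7E) is shown as-is; every other byte becomes '.'
        ascii_part = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in chunk)
        # Pad the hex column so the ASCII column lines up on a short final row
        lines.append(f"{offset:08X}  {hex_part:<{width * 3 - 1}}  {ascii_part}")
    return "\n".join(lines)
```

Passing it the first few hundred bytes of a file opened in binary mode, for example `hex_dump(f.read(256))`, prints a view comparable in layout to Figure 1.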


FIGURE 2: EXIF device information of an original audio file in hex viewer

FIGURE 3: EXIF of audio file from FIGURE 2 after being saved in Adobe Audition


Methods for Authentication Using Python

Scripting the analysis of a media file speeds up the authentication process and mitigates potential human error. Python modules allow for a customizable set of tools that can perform specific tasks very efficiently. For example, with a few lines of code we can analyze the EXIF of a JPEG image with the help of the exifread module [Figure 4]. This code performs the basic function of opening a file, using the exifread module to extract EXIF tag information, and then writing the findings to a text file [Figure 5].

FIGURE 4: Python exifread operation
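Figure 4 is not reproduced as text, so the following is only a sketch of a comparable script. It assumes the third-party exifread package (`pip install exifread`) and its `process_file` function; the helper names `format_exif_report` and `write_exif_report` are hypothetical. `process_file` returns a dictionary keyed by tag names such as `Image Make`, and the sketch writes one `name: value` line per tag, skipping embedded thumbnail data.

```python
from typing import Mapping


def format_exif_report(tags: Mapping[str, object]) -> str:
    """Format tag-name/value pairs as sorted 'name: value' lines."""
    # Thumbnail entries, if present, hold raw image bytes that are useless as text
    skip = {"JPEGThumbnail", "TIFFThumbnail"}
    return "\n".join(
        f"{name}: {value}" for name, value in sorted(tags.items()) if name not in skip
    )


def write_exif_report(image_path: str, report_path: str) -> None:
    """Extract EXIF tags with exifread and write them to a text file."""
    import exifread  # third-party package, imported only when this function runs

    with open(image_path, "rb") as f:
        # details=False skips MakerNote parsing, keeping the report to standard tags
        tags = exifread.process_file(f, details=False)
    with open(report_path, "w", encoding="utf-8") as out:
        out.write(format_exif_report(tags) + "\n")
```

Keeping the formatting separate from the extraction means the report layout can be checked without an image file or the exifread package installed.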


FIGURE 5: Text output of EXIF, organized

At this point, the information generated with Python still produces results that must be examined manually. We can further automate this process by parsing the file and instructing Python to interpret the results for us. The next step is a string search of the media file's metadata to look for traces left by software editors. Because relevant metadata generally occurs near the beginning of a digital file (and, in some formats, near the end), we can expedite the search by limiting it to the first and last 5,000 bytes of the file. Figure 6 shows a Python script that opens an original image made with a Nikon D3300 camera, converts its first 5,000 bytes to ASCII, and outputs them to the terminal [Figure 7].


FIGURE 6: Python script for converting first 5000 bytes to ASCII

FIGURE 7: Terminal output of the first 5000 bytes in ASCII of image
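The head-and-tail limit described above can be sketched with standard-library Python. This is an illustration under stated assumptions rather than the script in Figure 6: the function names are hypothetical, the 5,000-byte window follows the text, and bytes outside the 7-bit ASCII range are simply dropped, which is sufficient for a later substring search. Accepting any binary stream rather than only a path lets the same code read a file on disk or an upload held in memory.

```python
import io
import os
from typing import BinaryIO

WINDOW = 5000  # bytes read from each end of the file, per the text


def head_and_tail(f: BinaryIO, n: int = WINDOW) -> bytes:
    """Return the first n and last n bytes of a binary stream (all of it if short)."""
    size = f.seek(0, os.SEEK_END)  # seek() returns the new absolute position
    f.seek(0)
    if size <= 2 * n:
        return f.read()
    head = f.read(n)
    f.seek(-n, os.SEEK_END)
    return head + f.read(n)


def to_ascii(data: bytes) -> str:
    """Decode as ASCII: bytes 0x80-0xFF are dropped, 7-bit bytes are kept."""
    return data.decode("ascii", errors="ignore")


def metadata_text(path: str, n: int = WINDOW) -> str:
    """Read a file from disk and return its head and tail as ASCII text."""
    with open(path, "rb") as f:
        return to_ascii(head_and_tail(f, n))


def metadata_text_from_bytes(raw: bytes, n: int = WINDOW) -> str:
    """Same as metadata_text, for file contents already held in memory."""
    return to_ascii(head_and_tail(io.BytesIO(raw), n))
```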


These results can then be organized into a more human-readable format; however, for automation purposes this is not necessary. For the next step of this process, a dictionary of words commonly left in metadata by software editors is created. Figure 8 shows an example dictionary of blacklisted words that are commonly left by image viewing and editing software. Python then searches the ASCII string converted from binary in the previous steps for each word in this dictionary. If any of the blacklisted words is found within the media file, the file is judged inconsistent with an original and rejected; otherwise it is accepted.

FIGURE 8: Sample of blacklisted word dictionary
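The dictionary search and decision can be sketched as follows. The blacklist entries are hypothetical examples of editor names, not the contents of Figure 8, and the case-insensitive match is our assumption rather than a documented detail of the thesis tool. A file is accepted only when none of the words appears in the ASCII text produced in the previous step.

```python
from typing import Iterable, List, Tuple

# Hypothetical sample entries; Figure 8's actual dictionary is not reproduced here
BLACKLIST = ("Adobe", "Photoshop", "Lightroom", "GIMP", "Picasa", "Audition")


def find_editor_traces(text: str, blacklist: Iterable[str] = BLACKLIST) -> List[str]:
    """Return the blacklisted words found in the text (case-insensitive match)."""
    haystack = text.lower()
    return [word for word in blacklist if word.lower() in haystack]


def decide_originality(text: str) -> Tuple[bool, List[str]]:
    """Accept (True) only when no blacklisted word appears; also return the hits."""
    hits = find_editor_traces(text)
    return (len(hits) == 0, hits)
```

Returning the matched words alongside the decision lets a rejection message tell the contributor which editor trace was found. Short entries raise the chance of accidental matches inside compressed media data, which is worth weighing whenever the dictionary is reviewed.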


CHAPTER II
GUIDELINES FOR MULTIPLE USER CONTRIBUTIONS

Crowdsourcing materials for use in any scientific research carries the risk of introducing a margin of error into the results. Allowing multiple users to contribute new media to the database may introduce a level of uncertainty regarding a file's originality. By implementing specific control measures, this level of uncertainty can be reduced. Four proposed methods of ensuring that the integrity of the database remains intact are:

1. User accounts are assigned by administrators of the database. Those who wish to gain access to the database must obtain their credentials by submitting a request to the administrators.
2. Authentication of media being submitted to the database is performed automatically. This is to ensure that potential error or bias is mitigated.
3. When a user successfully submits a file to the database, their user ID and the time of submission are also added to the database entry. This provides a record in case any suspicious activity occurs.
4. If a user attempts to upload a media file that is determined not to be authentic, they will be given a warning. A limited number of infractions will be tolerated before the user's privileges are suspended.
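Guideline 4 leaves the number of tolerated infractions unspecified. A minimal sketch of the warning-then-suspend rule, assuming a hypothetical limit of three rejected uploads and a hypothetical in-memory `Contributor` record (in the application itself this state would live in the user management service or the database), could look like this:

```python
from dataclasses import dataclass

MAX_INFRACTIONS = 3  # hypothetical limit; guideline 4 does not give a number


@dataclass
class Contributor:
    """Hypothetical per-user record of rejected uploads."""
    user_id: str
    infractions: int = 0
    suspended: bool = False


def record_rejection(user: Contributor) -> str:
    """Count one non-original upload; warn until the limit, then suspend."""
    if user.suspended:
        return "suspended"  # already blocked; nothing further is counted
    user.infractions += 1
    if user.infractions >= MAX_INFRACTIONS:
        user.suspended = True
        return "suspended"
    return f"warning {user.infractions}/{MAX_INFRACTIONS}"
```

With this limit, the first two rejected uploads return a warning, the third suspends the account, and later attempts keep returning `suspended`.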


CHAPTER III
DATABASE FRAMEWORK

Front and Backend Structures

Having defined the basis on which media files are to be authenticated, this section documents the development of a working media exemplar database for use by the National Center for Media Forensics. The database is built around a web-accessible application in which authorized users can access and/or upload new media, depending on their privileges. The front end of the application is handled by Flask. Flask is a Python microframework that provides tools for creating a web-based application around the Python language [3]. Flask provides several benefits, including database integration through its extensions and the ability to incorporate advanced Python modules. The backend of the database is handled by MySQL. MySQL is an open-source relational database management system that handles client-server responsibilities. It is also one of the most widely used database management systems and communicates effectively with Flask [4].

User Identity Management

Even though MySQL and Flask are fully capable of handling user identity and access management, an additional layer of security can be added by using an outside user authorization service. User authorization for this database is handled by Stormpath. The decision to choose this service was based upon its ease of implementation within Flask's framework as well as the high standard of security and best practices it adheres to [5]. Many of the best practices followed by Stormpath can also be followed without its services; however, it should be noted that development deadlines helped shape this decision. Stormpath allows administrative control over user accounts to be managed through a web-based interface [Figure 9]. User access controls can be changed at any time to define whether a specific user is limited to access-only privileges or has permission to contribute original files to the database [Figure 10]. Stormpath also handles a number of other account management features key to a database's success, such as user password resets and email communications.

FIGURE 9: Stormpath user account management page


FIGURE 10: Stormpath user group management page
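The backend table and the group-based permission check described in this chapter can be sketched together. The sketch substitutes Python's built-in sqlite3 module for MySQL so that it runs without a database server, and the table name, columns, and `contributors` group name are all hypothetical. With MySQL, the same statements would go through a MySQL driver, whose parameter placeholder is typically `%s` rather than `?`. Following guideline 3 in Chapter II, each accepted row records the submitting user's ID and a UTC timestamp.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Collection

# Hypothetical schema; sqlite3 stands in for the MySQL backend
SCHEMA = """
CREATE TABLE IF NOT EXISTS exemplars (
    id           INTEGER PRIMARY KEY,
    filename     TEXT NOT NULL,
    media_type   TEXT NOT NULL CHECK (media_type IN ('audio', 'image', 'video')),
    device       TEXT,
    submitted_by TEXT NOT NULL,
    submitted_at TEXT NOT NULL
)
"""
CONTRIBUTOR_GROUP = "contributors"  # hypothetical group name


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and make sure the exemplar table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def can_contribute(user_groups: Collection[str]) -> bool:
    """Access-only users may browse; only contributors may add files."""
    return CONTRIBUTOR_GROUP in user_groups


def add_exemplar(conn: sqlite3.Connection, user_id: str, user_groups: Collection[str],
                 filename: str, media_type: str, device: str, passed_check: bool) -> int:
    """Insert an accepted file, recording who submitted it and when."""
    if not can_contribute(user_groups):
        raise PermissionError(f"{user_id} has access-only privileges")
    if not passed_check:
        raise ValueError(f"{filename} rejected: software editor traces found")
    with conn:  # commits on success, rolls back if the insert fails
        cur = conn.execute(
            "INSERT INTO exemplars (filename, media_type, device, submitted_by, submitted_at) "
            "VALUES (?, ?, ?, ?, ?)",
            (filename, media_type, device, user_id, datetime.now(timezone.utc).isoformat()),
        )
    return cur.lastrowid
```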


CHAPTER IV
TOOL VALIDATION METHODS AND MATERIALS

The Scientific Working Group on Digital Evidence calls for the validation of "all tools, techniques and procedures utilized in the performance of digital forensics" [6]. Validation of the database is key to ensuring proper functionality and that repeated use will yield similar results. The primary function this testing addresses is the ability of the automated authentication to detect media that is not original. Because the database accepts audio, video, and image files, it was important to test each file type. Five different devices from each category were selected to provide the original media content.

1. Handheld audio recorders used: Olympus DM-620, Tascam DR-07, Zoom H1, Sony ICD-SX750, Tascam DR-100mkII
2. Digital cameras used (images): Panasonic DMC-FS7, Casio EX-Z150 [7], Sony DSC-H50 [7], Nikon D200 [7], Canon EOS 30D
3. Cameras used (video): Nikon Coolpix L18, Sony Cyber-shot DSC-S650, GoPro Hero 3+, Canon PowerShot Pro1, Olympus E-PL1

The media created on these digital devices were then manually authenticated by performing a metadata analysis. Copies of these files were then made, and each copy was opened in one of various software editors. Each file was then re-saved under the same name. Metadata analysis was again performed on the newly created files to manually identify the traces left by software editors. To test the function of the database, an admin user was logged into the database application and submission of each file was attempted.
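The validation procedure above, in which originals should pass and re-saved copies should be rejected, can be expressed as a small tally. The sketch assumes a hypothetical `check` callable that returns True when a file is accepted (for example, a function built from the head-and-tail read and blacklist search described in Chapter I), and labels each sample with whether it is an original. Any sample whose outcome disagrees with its label is listed as a failure.

```python
from typing import Callable, Iterable, Tuple

Sample = Tuple[str, bytes, bool]  # (file name, file contents, is_original)


def tally_validation(check: Callable[[bytes], bool], samples: Iterable[Sample]) -> dict:
    """Run an accept/reject check over labeled samples and count the outcomes."""
    result = {"accepted_originals": 0, "rejected_originals": 0,
              "accepted_edited": 0, "rejected_edited": 0, "failures": []}
    for name, data, is_original in samples:
        accepted = check(data)
        kind = "originals" if is_original else "edited"
        result[("accepted_" if accepted else "rejected_") + kind] += 1
        if accepted != is_original:  # an edited file got in, or an original was refused
            result["failures"].append(name)
    return result
```

For the testing reported in Chapter V, the expected tally is 15 accepted originals, 15 rejected edited copies, and no failures.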


CHAPTER V
RESULTS

Of the 15 original media files submitted to the database, each met the requirements of the algorithm and successfully entered the database. Figure 11 shows the MySQL database table containing the 15 media files.

FIGURE 11: MySQL table containing successful media file submissions

Each of the 15 media files that had been opened and re-saved in various software editors and then queued for submission was rejected. In each case, the algorithm determined that the newly saved media file contained traces left by software editors.


CHAPTER VI
CONCLUSION

Discussion

This paper documents the development of an exemplar media database framework and outlines a practical method for authenticating audio, video, and images created on digital devices. In addition, validation testing of this database indicates that it can be considered a tool that is both reliable and reproducible in its results. It should be noted that maintaining the blacklisted word dictionary is an important task that will need to be evaluated regularly.

Future Research

This exemplar database was designed so that it could be expanded in the future to incorporate additional features. The current abundance of openly available Python modules means that further customization and additions can be made to meet the needs of the database. The overall scope of the database could also be redefined as new features are added. Currently the database serves the purpose of providing reference material to aid comparisons against unknown media, but potential future features worth considering include:

- Structure analysis
- Quantization table extraction from cameras [8]
- Clone detection for images [9]
- Photo Response Non-Uniformity (PRNU) mapping [10]
- Transition and zero-level analysis of audio recordings
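As an indication of how the quantization table feature might begin, the sketch below reads the quantization tables from a JPEG file's DQT (0xFFDB) segments using only the standard library. It is our illustration, not a port of the ImageMagick code cited in [8], and the function name is hypothetical. It assumes a well-formed header, stops at the start-of-scan marker, and returns each table's 64 values in the zigzag order in which they are stored. Because camera firmware and software editors often use different quantization tables, these values could be compared against exemplars from the same camera model.

```python
def extract_quantization_tables(data: bytes) -> dict[int, list[int]]:
    """Return {table_id: 64 quantization values} from a JPEG's DQT segments."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    tables: dict[int, list[int]] = {}
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker
            pos += 1
            continue
        if marker in (0xDA, 0xD9):  # start of scan or end of image: headers are done
            break
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:  # markers without a length field
            pos += 2
            continue
        seg_len = int.from_bytes(data[pos + 2:pos + 4], "big")  # includes its own 2 bytes
        segment = data[pos + 4:pos + 2 + seg_len]
        if marker == 0xDB:  # DQT: a segment may hold several tables
            i = 0
            while i < len(segment):
                precision, table_id = segment[i] >> 4, segment[i] & 0x0F
                size = 2 if precision else 1  # 16-bit or 8-bit entries
                i += 1
                tables[table_id] = [
                    int.from_bytes(segment[i + k * size:i + (k + 1) * size], "big")
                    for k in range(64)
                ]
                i += 64 * size
        pos += 2 + seg_len
    return tables
```

Reading the file with `open(path, "rb").read()` and passing the bytes in is enough for a first comparison, since the tables sit in the header ahead of the compressed image data.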


REFERENCES

[1] Scientific Working Groups on Digital Evidence and Imaging Technology. (May 27, 2015). SWGDE and SWGIT digital & multimedia evidence glossary v2.8. Retrieved March 14, 2016, from the Scientific Working Group on Digital Evidence website: https://www.swgde.org/documents/Current%20Documents/
[2] Anderson, S. D. (2011). Digital Image Analysis: Analytical Framework for Authenticating Digital Images. University of Colorado Denver.
[3] Ronacher, A. (2013). Flask's Documentation. Retrieved March 2, 2016, from Flask website: http://flask.pocoo.org/docs/0.10/
[4] Oracle. (2016). Oracle Products and Services Overview. Retrieved March 17, 2016, from Oracle website: http://www.oracle.com/us/products/mysql/overview/index.html
[5] Stormpath. (2016). Flask-Stormpath Documentation. Retrieved March 14, 2016, from Stormpath website: https://flask-stormpath.readthedocs.org/en/latest/index.html
[6] Scientific Working Group on Digital Evidence. (September 5, 2014). SWGDE Recommended Guidelines for Validation Testing v2.0. Retrieved March 15, 2016, from the Scientific Working Group on Digital Evidence website: https://www.swgde.org/documents/Current%20Documents/
[7] Gloe, T., & Böhme, R. (2010). The Dresden Image Database for benchmarking digital image forensics. In Proceedings of the 25th Symposium on Applied Computing (ACM SAC 2010) (Vol. 2, pp. 1585-1591).
[8] Christy, J. (1992). Read/Write JPEG Image Format. Retrieved April 26, 2016, from GitHub: https://github.com/ImageMagick/ImageMagick/blob/05d2ff7ebf21f659f5b11e45afb294e152f4330c/coders/jpeg.c#L818
[9] Vasiliauskas, A. (2009). Detecting Copy-move Forgery in Images. Retrieved April 27, 2016, from Vasiliauskas' blog: http://coding-experiments.blogspot.com/2009/03/detecting-copy-move-forgery-in-images.html
[10] Rosales Corripio, J., Sandoval Orozco, A. L., Garcia Villalba, L. J., Hernandez Castro, J., & Gibson, S. J. (2013). Source Smartphone Identification Using Sensor Pattern Noise and Wavelet Transform. University of Kent. Retrieved April 27, 2016, from: https://kar.kent.ac.uk/37195/1/corripio2013.pdf