Citation
A Framework for analyzing the impact of distance on neural network object detection accuracy

Material Information

Title:
A Framework for analyzing the impact of distance on neural network object detection accuracy
Creator:
Hawronsky, David
Place of Publication:
Denver, CO
Publisher:
University of Colorado Denver
Publication Date:
Language:
English

Thesis/Dissertation Information

Degree:
Master's
Degree Grantor:
University of Colorado Denver
Degree Divisions:
Department of Electrical Engineering, CU Denver
Degree Disciplines:
Electrical engineering
Committee Chair:
Connors, Dan
Committee Members:
Liu, Chao
Lei, Tim

Record Information

Source Institution:
University of Colorado Denver
Holding Location:
Auraria Library
Rights Management:
All applicable rights reserved by the source institution and holding location.


Full Text

A FRAMEWORK FOR ANALYZING THE IMPACT OF DISTANCE ON NEURAL NETWORK OBJECT DETECTION ACCURACY

by

DAVID HAWRONSKY
B.S., Electrical Engineering, Texas Tech University, 2012

A thesis submitted to the Faculty of the Graduate School of the University of Colorado in partial fulfillment of the requirements for the degree of Master of Science, Electrical Engineering, 2018

This thesis for the Master of Science degree by David Hawronsky has been approved for the Electrical Engineering Program by Dan Connors, Advisor; Chao Liu; and Tim Lei.

December 14, 2018

Hawronsky, David (M.S., Electrical Engineering Program)
A Framework for Analyzing the Impact of Distance on Neural Network Object Detection Accuracy
Thesis directed by Professor Dan Connors

ABSTRACT

As hardware continues to push the envelope in both performance and mobility advancements, an ever-increasing number of neural networks continue to be developed and modified to take advantage of these new platforms. As a result, there now exists a myriad of neural networks with differing performance characteristics, hardware requirements, and latencies. There is a need to test and benchmark these networks, not only in performance, but also under varying environmental conditions placed on the data being fed into them, such as analyzing different distances for input images. The results of these tests will give implementations deploying these networks better guidance for selecting an architecture, and will show researchers where networks could be improved. The objective of this thesis is to compile a set of benchmarks to compare how well various convolutional neural networks (CNNs) perform at different distances from various target objects.

The form and content of this abstract are approved. I recommend its publication.

Approved: Dan Connors

ACKNOWLEDGMENT

This thesis would not have been possible without the generous support of Professor Dan Connors. His guidance throughout the writing process was instrumental in being able to finish the thesis in the time available for the work.

Thank you to my friend Jay White for getting me up to speed on convolutional neural networks and for his advice during the thesis process.

I would also like to express my gratitude to my friends Derick and Amanda Gunderson for all their help with the image capturing process and for providing some level of escape from the stress of non-stop thesis work.

TABLE OF CONTENTS

Figures
Tables

CHAPTER
I. INTRODUCTION
II. BACKGROUND
   Neural Network
   Regional CNN
   GoogleNet
   YOLO
   SSD
   Expected Performance of the Networks
III. APPROACH
   Experiment Components
   Capturing Object Images
      Object Selection
      Image Capture Process
      Object Image Examples
   Running the Neural Network Benchmark
      Benchmark Process Flow
      Neural Network Wrappers
      Neural Network Code Changes
      Benchmark Run Script
      Result Formatting Script
   Compiling Results
IV. EXPERIMENT RESULTS

   Machine Learning Datasets
   Performance of Object Detection for each Object Type
   Successful Detection Hit Rate Analysis
   Network Performance Comparisons
   Objects Analyzed from Multiple Perspectives
   Network Run-Time Analysis
V. CONCLUSION
REFERENCES

FIGURES

II.1 Neural Network Structure and Perceptron Diagram
II.2 Inception Module [14]
II.3 YOLO Architecture [9]
II.4 SSD Architecture [8]
III.1 Experiment Process Flow
III.2 Image Capture Procedure Diagram
III.3 Vehicles 1, 2, and 3 front profile at 10 Feet
III.4 Vehicle 2 front profile at 10 and 50 Feet
III.5 Vehicles 1, 2, and 3 side profile at 10 Feet
III.6 Vehicle 2 side profile at 10 and 50 Feet
III.7 Bicycles 1, 2, and 3 at 10 Feet
III.8 Bicycle 2 at 10 and 50 Feet
III.9 Dogs 1, 2, and 3 at 10 Feet
III.10 Dog 2 at 10 and 50 Feet
III.11 People 1, 2, and 3 at 10 Feet
III.12 Person 2 at 10 and 50 Feet
III.13 Street Signs 1, 2, and 3 at 10 Feet
III.14 Street Sign 2 at 10 and 50 Feet
III.15 Benchmark Run Procedure
III.16 Dynamic Object Expandability
III.17 Dynamic Network Expandability
III.18 Benchmark Run Script Diagram
III.19 Example of Processed Data
III.20 Result Formatting Script Diagram
IV.1 "Bicycle" Bounding Box Examples

IV.2 "Bicycle" Detection confidence over distance
IV.3 "Vehicles front" Bounding Box Examples
IV.4 "Vehicles front" Detection confidence over distance
IV.5 "Vehicles side" Bounding Box Examples
IV.6 "Vehicles side" Detection confidence over distance
IV.7 "Dog" Bounding Box Examples
IV.8 "Dogs" Detection confidence over distance
IV.9 "Person" Bounding Box Examples
IV.10 "Person" Detection confidence over distance
IV.11 "Street Sign" Bounding Box Examples
IV.12 "Street Sign" Detection confidence over distance
IV.13 "Bicycles" classification success over confidence threshold
IV.14 "Vehicles" classification success over confidence threshold
IV.15 "Dogs" classification success over confidence threshold
IV.16 "People" classification success over confidence threshold
IV.17 "Street Sign" classification success over confidence threshold
IV.18 Overall network classification success over confidence threshold
IV.19 "Vehicles" Detection confidence over distance

TABLES

IV.1 Average Network Run Times

List of Abbreviations

CPU - Central Processing Unit
CNN - Convolutional Neural Network
COCO - Common Objects in Context
CSV - Comma Separated Values
GIMP - GNU Image Manipulation Program
GPU - Graphics Processing Unit
IoU - Intersection over Union
JSON - JavaScript Object Notation
MP - Megapixels
OpenCV - Open Source Computer Vision Library
PC - Personal Computer
SSD - Single Shot MultiBox Detector
VGG16 - Visual Geometry Group 16
YOLO - You Only Look Once

CHAPTER I
INTRODUCTION

Computer vision is a field that focuses on extracting features from digital images and acting on the extracted details. Those actions include identifying where and what an object is within an image. Being able to automatically extract features from an image unlocks applications in autonomous navigation systems, security systems, manufacturing quality control, and endless other possibilities for making systems more self-sufficient. Deep learning systems are trained on datasets by supplying a series of target results, such as images of a specific object, and finding common features that can identify said object. The training does not need any specific rules or prior guidelines set in order to match observed parameters to a desired target. These deep learning systems are composed of highly parallelizable matrix operations which can be efficiently processed by modern Graphics Processing Units (GPUs).

Convolutional Neural Networks (CNNs) are a subset within the field of deep learning and are the backbone of many vision processing schemes. CNNs are modeled after the functions of biological brains, with the neurons modeled by small processing units known as perceptrons. These perceptrons are arranged in layers, with each perceptron output from the previous layer feeding into each of the inputs of the next layer. There are various ways to implement these neural networks, including adjusting the input weights, changing the dimensions and size of the layers, and using a different number of layers. Many implementations have been created over the years, each with its own efficiency and accuracy characteristics.

With the development of modern deep learning frameworks such as TensorFlow and Caffe, and with advancements in and increasing availability of GPU systems, the number of deep learning architectures being developed continues to increase. As hardware developers continue to push performance specifications and create new advancements in smaller, standalone systems that once required entire high-end computers to run, possibilities are opened to implement both more powerful frameworks and more efficient, more deployable versions of those systems. This ever-increasing development of new and more efficient neural networks creates the need to perform detailed comparisons of the various networks. This thesis defines an experimental framework to compare deep learning object detection networks side-by-side and study their results on objects present in various distance-based scenarios. The principal goal of the work is to compare the efficacy of multiple neural networks and summarize the findings across object types and scenarios.

This thesis is organized as follows: Chapter II covers the background of neural network learning models (GoogleNet, YOLO, SSD) and the optimization of those models. Chapter III examines the approach to capturing the object images, running those images through the various networks, and outputting the results. The experiment results section, Chapter IV, shows the outcomes of the processed object images organized into detection confidence and run-time analysis. Finally, Chapter V summarizes these results and discusses potential further additions to the benchmarking.

CHAPTER II
BACKGROUND

Neural Network

Neural networks are highly parallelizable computational structures influenced by the interconnection of neurons in biological brains. In this type of deep learning system, data is fed through a series of filters consisting of many smaller calculation units known as perceptrons. Each output of a perceptron from the previous layer is multiplied by a trained weight and fed into each subsequent perceptron in the next layer, often with a bias term added. These perceptrons receive independent weighted inputs, sum those weighted inputs, and apply an activation function [1]. During training, as input training images continue to be fed through the system, these weights, which represent different features of the object, are adjusted until the weighted outputs more closely represent the target object. Loss functions are used to calculate the extent to which predicted results match expected targets, and weights are adjusted to the extent that the loss is caused by the weight values. The loss is backpropagated through the network to adjust the weights via gradient calculations that determine the extent to which each weight is responsible for the total loss. Once a network has been successfully trained, these weights can be installed into the existing network and used to identify other objects with similar features.

Figure II.1: Neural Network Structure and Perceptron Diagram
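The perceptron computation described above (weighted sum plus bias, passed through an activation function) can be sketched in a few lines. This is a minimal illustration, not code from the thesis; the sigmoid is used here as one common choice of activation:

```python
import math

def perceptron(inputs, weights, bias):
    """Weighted sum of inputs plus a bias, squashed by a sigmoid activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid maps the sum into (0, 1)
```

With zero weights and zero bias the sigmoid sits at its midpoint, 0.5; training would adjust the weights until the output tracks the target.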

Convolutional neural network layers are organized in 3 dimensions (depth, width, and height) and only connect to a smaller subsection of the perceptrons in the next layer. The output will be a flattened 2-dimensional layer which represents the probability scores. The two components of CNNs are the hidden layers, which are where the feature extraction occurs, and the classifier, which assigns a probability for each of the target objects. A frequently used activation function for generating relative probabilities of output classifications is the softmax [2].

The feature extraction portion of a CNN is where the convolution occurs. An input image is represented by a two-dimensional matrix of pixels. A filter of shape NxN is convolved across the input matrix and produces a smaller, filtered matrix as an output. Each stride, the distance in pixels each filter jumps on each step of the convolution, produces a single element in the output matrix. Once each of the filters traverses the entire input matrix, the output matrices are then fed into the next layer with its own set of filters, producing potentially even smaller matrices as a result. Padding can also be added to the edges of the output to keep the output matrix the same size as the input [2].

The output of a convolutional network is typically adjusted to be fed into fully connected layers in order to classify the outputs. The two-dimensional outputs of convolutional layers are typically flattened. Flattening the two-dimensional outputs entails concatenating all of the rows of all of the output matrices in the order in which the network outputs the rows and filters, respectively. After flattening, the convolutional outputs are one-dimensional and able to be fed into the perceptrons of a traditional fully connected set of layers. The output layer typically has perceptrons for each of the anticipated outputs. For a discrete, multiclass classification, a softmax activation is typically applied. This function normalizes the output to relative probabilities based on a ratio of exponential functions. The output from a typical CNN may be a series of relative probabilities that represent the likelihood of the presence of certain target objects.
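Two of the quantities above can be made concrete with a short sketch (illustrative helpers, not the thesis's code): the spatial size of a convolution output given filter size, stride, and padding, and the softmax's ratio-of-exponentials normalization:

```python
import math

def conv_output_size(input_size, filter_size, stride=1, padding=0):
    """Spatial size of a convolution output: floor((W - F + 2P) / S) + 1."""
    return (input_size - filter_size + 2 * padding) // stride + 1

def softmax(scores):
    """Normalize raw scores into relative probabilities via a ratio of exponentials."""
    shifted = [s - max(scores) for s in scores]  # subtract the max for numerical stability
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]
```

For example, a 3x3 filter with stride 1 and padding 1 keeps a 5x5 input at 5x5, matching the "same size as the input" padding case described above, and softmax outputs always sum to 1.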

Regional CNN

There are two parameters needed to achieve object detection within an image: the class of the object, and the location and size of the bounding box for the object. A brute-force and inefficient technique for achieving this is the sliding box method, where many pre-sized bounding boxes traverse the image, each one classifying the objects within it. Regional CNN (R-CNN) algorithms were developed to alleviate the inefficiencies of this approach by using selective search to reduce the number of regions to check, performing image classification within those selected regions, and discarding the regions which failed to find a class.

Selective search is performed in a series of steps. First, regions are generated based on a myriad of similarities, such as color and texture, resulting in a large number of regions being generated. Then the smaller regions are merged together until the desired number of regions to test is reached. Those regions are then run through the desired CNN to determine object classification. Although faster than the sliding box method, selective search is still rather slow and is not viable for real-time applications [15].

Faster R-CNNs, and similarly structured R-CNNs, replace the selective search process by running the image through a network which predicts region proposals. Those proposals are fed through a region of interest pooling layer, then a softmax layer, to find the bounding regions which are to be run through the CNN for classification. This method increases the speed at which regions are generated by a large degree, enabling near real-time prediction [4].

GoogleNet

GoogleNet is one of the earlier architectures to be developed. It contains 22 layers and utilizes the Inception module, which allows multiple feature extractors to be present in a single layer [13].

Figure II.2: Inception Module [14]

The input from the previous layer is fed into 4 separate filters: a 1x1 convolution, a 3x3 convolution, a 5x5 convolution, and a 3x3 max pooling filter, which are then concatenated together before feeding the next layer. The filters larger than 1x1 have their data passed through a 1x1 filter for dimensionality, and therefore computation, reduction. These different convolutional scales allow features of various cluster sizes to be grouped using a single Inception module [14].

YOLO

The "You Only Look Once" (YOLO) architecture uses a single shot detection model, which utilizes activation maps for predicting bounding boxes. This architecture subdivides the image into pre-defined boxes and optimally merges any boxes where it identifies the target object. YOLO uses 24 convolution layers followed by 2 fully connected layers. The final output is a function of the grid size SxS, the number of boxes B, and the number of labelled classes C, equaling an SxS x (B*5 + C) tensor of predictions, with the original version having 2 bounding boxes, a 7x7 grid size, and 20 labels, for a 7x7x30 tensor [9].
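The S x S x (B*5 + C) output shape is easy to verify with a one-line helper (illustrative only):

```python
def yolo_output_shape(grid_size, num_boxes, num_classes):
    """Shape of YOLO's prediction tensor: S x S x (B*5 + C).
    Each of the B boxes contributes 5 values: x, y, w, h, and a confidence."""
    return (grid_size, grid_size, num_boxes * 5 + num_classes)
```

Plugging in the original YOLO configuration (S=7, B=2, C=20) reproduces the 7x7x30 tensor cited above.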

Figure II.3: YOLO Architecture [9]

YOLOv3 (YOLO version 3) has iterated a couple of generations past the original YOLO implementation. It now uses a series of 1x1 and 3x3 filters, based on the Inception modules of the GoogleNet architecture, for a total of 53 layers. This scheme is known as Darknet-53. The bounding boxes are predicted via anchor boxes centered using a sigmoid function filter. Logistic regression is used to determine how likely the item within that box is an object. A softmax filter was not used in this structure, as its computation increase was deemed to outweigh the negligible performance increase seen with its implementation [11].

SSD

The Single Shot MultiBox Detector (SSD) is designed for real-time detection and, like YOLO, is a single shot detection architecture. The SSD structure utilizes the VGG16 (Visual Geometry Group 16-layer) filter for feature extraction, followed by 6 decreasingly sized object detection filters. The different sizes of filters allow for detections at multiple object scales [8].

Figure II.4: SSD Architecture [8]

Each object has a default bounding box created using k-means clustering on the training dataset. The feature layers start at the lowest scale, and therefore highest resolution, to detect the smallest objects, and progress through larger scales, picking up the larger objects. All of the boundary boxes with an IoU (Intersection over Union) greater than 0.5 when compared to the ground truth boxes are then used for object detection [5].

Expected Performance of the Networks

Based on the characteristics of the architectures of the various CNNs described earlier, predictions can be made for how well each network type should perform.

GoogleNet

The GoogleNet architecture tested in this experiment does not utilize any regional bounding but analyzes the image as a whole [3]. As a result, this network should do a better job classifying images that fill the entire frame but should classify with decreasing effectiveness as the distance from the target object increases.

YOLO

YOLOv3 claims a detection accuracy on par with SSD, but at 3 times the speed. The YOLOv3 publication shows larger objects have reduced accuracies while smaller objects have increased accuracies relative to previous YOLO iterations [11]. Therefore, larger objects such as vehicles should not perform as well as smaller objects such as dogs.

SSD

Due to the first layer having a spatial dimension of 38x38, many smaller objects could be missed by SSD [5]. As a result, SSD is likely to perform poorly on very small objects and those that are farther away.
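The IoU criterion referenced in the SSD discussion (boxes matched to ground truth when IoU exceeds 0.5) can be computed directly from box corners. A minimal sketch, assuming boxes given as (x1, y1, x2, y2) tuples:

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)  # zero if boxes do not overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)
```

Identical boxes score 1.0, disjoint boxes 0.0, and SSD's matching step would keep pairs scoring above 0.5.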

CHAPTER III
APPROACH

Figure III.1: Experiment Process Flow

Figure III.1 shows the overall process, from start to finish, of the execution of the experiment. The first step is to capture the images of the objects to benchmark. Those images are then transferred to a PC where the resolution is downsampled. They are then transferred again to the TX2 platform and run through the various CNNs. That process produces the output products as both text files and images. The images show overlays reporting the objects detected, and the text files are fed into the data processing scripts. Those scripts convert the data into more organized and usable data, which is then used to generate the plots and figures that aid in the analysis of the CNN performance.

Experiment Components

The Python and Bash scripts and the CNN installs, including the modified source code, are located at the following location: http://bit.ly/hawronsky_thesis

This benchmark requires the installation of the NVIDIA TensorRT GoogleNet, SSD, and YOLOv3 CNNs onto the NVIDIA Jetson TX2 platform. The YOLOv3 [7] and SSD [6] networks were installed using the directions found at JK Jung's blog [7, 6]. The GoogleNet installation was performed using the directions found on Dustin Franklin's GitHub page [3]. The full list of items utilized to perform this experiment is as follows.

Hardware
- NVIDIA Jetson TX2
  - CPU: Heterogeneous Multi-Processor (Dual Denver & Quad ARM A57)
  - NVIDIA Pascal GPU with 256 CUDA cores
  - RAM: 8 GB 128-bit LPDDR4
  - Running Ubuntu 16.04 LTS
- Image Capture Items
  - Motorola Moto x4
    - Android 8.1.0
    - Camera: 12 MP, f/2.0, 1/2.55", 1.4um, dual pixel PDAF
    - Default Camera App: 4032x3024 resolution in HDR mode
  - Tripod: Sunpak TravelSmart 50
  - Vastar Mobile Phone Tripod Mount
  - Measuring Tape: Pittsburgh 165 ft. x 1/2 in.
  - CamKix Wireless Bluetooth Camera Shutter Remote Control
- Windows PC

Software
- Python 2.7
- Neural Networks for TX2
  - NVIDIA TensorRT GoogleNet [3]
  - SSD on Jetson TX2 [6]
  - YOLOv3 Darknet [7]
- GIMP 2.10 Image Processor
- Batch Image Manipulation Plugin
- Microsoft Excel 2016

Capturing Object Images

This experiment requires test data consisting of a series of images of an object captured between 5 and 50 feet away in 5-foot increments, for a total of 10 images per object. In addition, in order to maintain consistency in the results, 3 versions of each object were also captured. Since the images needed were very specific to this experiment, the images had to be captured firsthand rather than being found in a pre-compiled database.

Object Selection

The objects tested in this experiment were chosen for two primary reasons. One, they are trained on at least two of the CNNs without needing to retrain the networks. Two, the capture of these object images must be achievable: an uninterrupted path up to 50 feet away needs to be able to be drawn to the object in order to perform the capture process in a safe and unobtrusive manner, and this process must be repeatable on 3 different versions of the object. The object categories of vehicles (both front and side profiles), bicycles, and dogs were trained on all 3 CNNs; people were trained on YOLOv3 and SSD; and stop signs and street signs were trained on YOLOv3 and GoogleNet. Speed limit signs were trained only on GoogleNet but were included in the results due to false positives of capturing other kinds of signs in some of the YOLOv3 results.

In order to maintain consistency between the images, all the images were captured in sunny or partially sunny conditions in the Littleton and Centennial suburbs of Denver, Colorado, United States. Any objects that were not in a fixed position, such as street signs, had all their image versions captured in the same location.

Image Capture Process

Prior to capturing an object image, the capture rig must be assembled. This process involves connecting the phone to the tripod via the cell phone mount adapter. The measuring tape is drawn straight out for 50 feet starting from the edge of the object. The wireless camera shutter is powered on and synced to the phone, and the camera app on the phone is opened. The legs of the tripod are deployed, and at this point the setup is ready to begin capturing images.

Figure III.2: Image Capture Procedure Diagram

Figure III.2 shows the process involved to capture the images. The camera on the phone is placed at the 5-foot marker on the measuring tape, and the picture button is pressed on the remote, resulting in the phone taking a picture. The tripod is then moved back 5 feet, and the process is repeated until a picture is taken at the 50-foot marker, for a total of 10 pictures captured of the object. This entire process is repeated until all 3 versions of each object at all 10 distances are captured.

The images are then transferred to a Windows PC where they are resized, organized, and renamed to fit into the file structure required by the experiment. Using the GIMP software with the Batch Image plugin, the images are downsampled from 4032x3024 to 1920x1440 resolution. A second trial was also performed with the images downsampled to 480x360, but those results will be discussed in their own section, and all other references in this paper will be to the 1920x1440 run. The images are then renamed and placed in the master image folder in the following folder structure: [object name]/[object iteration]/[distance].jpg, with a real-world example being bicycle/3/25.jpg, representing an image of the 3rd bicycle taken at 25 feet away. Once all the images have been formatted, they are transferred to the TX2 platform for processing through the neural networks.

Object Image Examples

The following are samples of the images captured for this experiment.
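The [object name]/[object iteration]/[distance].jpg layout described above can be enumerated programmatically. This is a hypothetical helper for illustration, not part of the thesis code; the default arguments encode the 3 versions and the 5-to-50-foot, 5-foot-increment capture scheme:

```python
import os

def expected_image_paths(object_names, versions=3, distances=range(5, 55, 5)):
    """Enumerate every path in the [object name]/[object iteration]/[distance].jpg layout."""
    return [os.path.join(name, str(version), "%d.jpg" % distance)
            for name in object_names
            for version in range(1, versions + 1)
            for distance in distances]
```

For a single object category this yields 3 versions x 10 distances = 30 expected files, e.g. bicycle/3/25.jpg.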

Vehicles front profile

Figure III.3: Vehicles 1, 2, and 3 front profile at 10 Feet

Figure III.4: Vehicle 2 front profile at 10 and 50 Feet

Vehicles side profile

Figure III.5: Vehicles 1, 2, and 3 side profile at 10 Feet

Figure III.6: Vehicle 2 side profile at 10 and 50 Feet

Bicycles

Figure III.7: Bicycles 1, 2, and 3 at 10 Feet

Figure III.8: Bicycle 2 at 10 and 50 Feet

Dogs

Figure III.9: Dogs 1, 2, and 3 at 10 Feet

Figure III.10: Dog 2 at 10 and 50 Feet

People

Figure III.11: People 1, 2, and 3 at 10 Feet

Figure III.12: Person 2 at 10 and 50 Feet

Speed Limit, Stop, and Street Signs

Figure III.13: Street Signs 1, 2, and 3 at 10 Feet

Figure III.14: Street Sign 2 at 10 and 50 Feet

Running the Neural Network Benchmark

The neural network benchmarking is run on the NVIDIA TX2 platform. A specialty script, the benchmark run script, was written to bulk process all the images without additional user interaction after the benchmark has started. This benchmark takes each image, runs it through each of the CNNs, and records the detection confidences and run-times of each of those CNN runs. The data is then run through another script, the run results formatting

script, which handles the filtering and sorting of the data to allow for cleaner and easier analysis and plotting.

Figure III.15: Benchmark Run Procedure

Figure III.15 shows the benchmark run procedure on the TX2 platform. First, the TX2 is restarted in order to clear out any additional processes that may affect the run-time performance of the benchmark. The system is then put into high performance mode, and the benchmark run script is started. This portion of the benchmark takes a couple of hours, so the system is left alone until it finishes. Once the previous script has completed and the output data has been produced, the data processing script is run, which conditions the data for analysis.

Benchmark Process Flow

Since new neural networks are frequently being developed, additional interesting objects to be tested are plentiful, and new and interesting test combinations can come to light, the experiment code was designed with ease of expandability in mind. In order to achieve this goal, the run code was designed to be configurable simply by adding or removing either object images or network run scripts in their designated folders.

Neural Network Wrappers

Figure III.16: Dynamic Object Expandability

Figure III.17: Dynamic Network Expandability

Each of the neural network code sets was independently designed by third parties, which utilized various coding languages and run commands. As a result, a specialized run script had to be developed for each neural network to allow for uniform commanding. This uniformity was key for unlocking the ability to allow dynamic selection of which networks to run simply by adding or removing bash scripts from the designated folder. In addition to wrapping an individual network's run commands, the scripts also move the output images and result reports from their respective install folders to a uniform "results" folder at the top level of the experiment file hierarchy when the original run commands do not have input parameters for this action.
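The add-or-remove-a-script selection mechanism amounts to scanning the wrapper folder at run time. A minimal sketch of that discovery step, assuming a hypothetical "network_scripts" folder name (the thesis's actual folder name is not shown here):

```python
import glob
import os

def discover_network_scripts(script_dir):
    """Return the network wrapper scripts currently present in script_dir.
    Dropping in or deleting a .sh file changes which networks the benchmark runs."""
    return sorted(glob.glob(os.path.join(script_dir, "*.sh")))
```

Because the driver only ever iterates over what this scan returns, no configuration file has to change when a new network's wrapper is added.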

Neural Network Code Changes

In order to extract the detection confidences of each image run through the networks and to control the output formatting, it was necessary to make changes to the 3 neural networks' source code. These changes involved adding file print statements where the confidences are reported and creating uniform locations within each of the install directories where the results are saved. In addition, within the GoogleNet trained image database, some slight formatting and adjustment of the object names occurred in order to have slight variants of subcategories of the same broader type of object report successful matches with the Result Formatting Script.

Benchmark Run Script

Figure III.18: Benchmark Run Script Diagram

The main experiment driver utilizes a Python script to control the test preparation, image loading, and network commanding required to run a complete trial through every combination of test object and neural network that is configured for the system. When the script is started, the previous results are removed, and new files are created, since the results will be appended to their respective files as they are generated. Deleting the old results automatically ensures stale data is not mistakenly incorporated into the new results. There are two nested iterative looping structures within this script: one for accessing every

image, one at a time, within all the subdirectories of the input image folder, and the other for iteratively looping through each of the bash scripts that run on the currently loaded image. The first level of the nested loop loads the next image to process, and then the second level of the nested loop begins its function. Once all loop iterations of the second level have completed, the script will check if any more images remain. If there are any images remaining, the script will loop back and repeat the process with a new image. Otherwise, the loop ends and the network running portion of the experiment completes. The second level of the nested loop is where the actual neural network processing occurs. First, the input and output paths are calculated. A run-time counter is started, and the next neural network bash script is then queued with those paths passed as input parameters. The image loaded in the first loop iteration is then run through the current network, which adds to the object classification report for the current network and creates an output image with confidences and/or bounding boxes superimposed around the objects. Once the bash script ends, the run time is gathered, and the following information is appended to the benchmark file: the image used, the network run, the distance the object is away from the camera, and the run time of the bash script. The loops repeat until there are no more network bash scripts, after which control returns to the first level loop.

Result Formatting Script

Figure III.19: Example of Processed Data
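The nested loop structure above (outer loop over images, inner loop over network wrapper scripts, with a run-time counter around each run) can be sketched as follows. This is an illustrative reconstruction, not the thesis's actual driver; the real script invokes each bash wrapper via a `runner` equivalent to the `subprocess`-based default shown here:

```python
import subprocess
import time

def run_network(script, image):
    """Invoke one network wrapper script on one image (the wrapper hides each network's own CLI)."""
    subprocess.call(["bash", script, image])

def run_benchmark(image_paths, network_scripts, runner=run_network):
    """Outer loop: every image; inner loop: every network script; time each combination."""
    records = []
    for image in image_paths:              # first loop level: load the next image
        for script in network_scripts:     # second loop level: run each configured network
            start = time.time()            # run-time counter started
            runner(script, image)
            elapsed = time.time() - start  # run time gathered once the script ends
            records.append((image, script, elapsed))
    return records
```

Making the runner an argument keeps the loop testable without the TX2 or the networks installed; the driver's real behavior (appending image, network, distance, and run time to the benchmark file) maps onto the returned records.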

Once the main experiment driver has completed, the process results script can be run in order to gather the results into an organized and usable format. Each network run previously produced a file containing a list of results for each image that was processed. Each line of this file contains the information about a single object image that was run, including the filename and a pair of fields with the name of the object detected and its detection confidence. That pair exists for every single detection that occurred within the image. The processed data is converted to the following format: network run, target object, object iteration, distance, target object variant reported (if applicable), and confidence of detection (if applicable).

Figure III.20: Result Formatting Script Diagram

In order to achieve this formatting, the data processing script dynamically loops through each of the result files, one for each network run, and reformats that information. The Comma Separated Values (CSV) results file is parsed, and the results are filtered. When each new line is read, the contents are split into categories delimited by commas. The static items (object type, object iteration, and distance) are extracted from the file path. The script then loops through the detection pairs and checks whether any word, delimited by underscores, in the object name appears in any word of the object variant reported. This action is performed because some of the networks are trained on many subcategories of the same

broader items, such as breeds of dogs, and the scope of this experiment is interested in the broader category. If a match is found, that detection and confidence will either be saved, if no match exists yet, or, if a match already exists, the higher confidence of the current and previous matches will be kept and the other discarded. Once all the detection pairs have been narrowed down to the single best match, the network type, static items, and remaining detection pair (or a blank if no matches are found) are appended to the master output list.

After the CSV file has been parsed and the data filtered, the results are sorted, since they may be read, and subsequently output, in no particular order by the folder-looping Python libraries. The data is then processed into dictionaries so that it can be easily organized and processed in the future. The results are appended to the master dictionary of all the parsed network results. The script then checks for any remaining results files and repeats the process until all the results files have been processed. Once completed, the script will have produced two output files. One is the master dictionary output saved as a JavaScript Object Notation (JSON) file, and the other is the filtered data prior to any sorting saved as a CSV file. Both contain the same data with different formatting.

Compiling Results

A script for plotting the data into bar graphs was developed; however, it was difficult to read and was abandoned in favor of using Excel-generated plots.

Once the benchmark procedure has been completed on the TX2 and the data has been formatted, the resulting text reports are transferred to the Windows PC. The data is loaded into Microsoft Excel where it is sorted, and various plots are generated for analysis.
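The two filtering steps described above (extracting the static items from the file path, and narrowing each image's detection pairs to the single best-confidence match on the broader category) can be sketched as follows. This is an illustrative reconstruction, not the thesis's script; the word-containment test mirrors the underscore-delimited matching described, e.g. "dog" matching the variant "sheepdog":

```python
def parse_static_items(image_path):
    """Extract (object type, iteration, distance) from an [object]/[iteration]/[distance].jpg path."""
    parts = image_path.split("/")
    return parts[-3], int(parts[-2]), int(parts[-1].split(".")[0])

def best_match(object_name, detections):
    """From (variant, confidence) pairs, keep the highest-confidence detection whose
    reported variant contains any underscore-delimited word of the target object name."""
    target_words = object_name.lower().split("_")
    best = None
    for variant, confidence in detections:
        variant_words = variant.lower().split("_")
        if any(tw in vw for tw in target_words for vw in variant_words):
            if best is None or confidence > best[1]:
                best = (variant, confidence)
    return best  # None when no variant matches, i.e. the "left blank" case
```

Returning None for a non-match corresponds to the blank detection field appended to the master output list.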

CHAPTER IV
EXPERIMENT RESULTS

This experiment was performed in order to compare the object detection performance of the different convolutional neural networks at varying distances from the object. In order to achieve this goal, the output data was compiled into results focusing on the performance of each object and CNN combination, the hit rate of the objects being detected, performance comparisons between the networks, objects from multiple perspectives, and the run-time performance. In order to keep like results grouped in this experiment, both the different street sign types and the vehicle perspectives will be analyzed as groups in their respective sections.

Machine Learning Datasets

The trained datasets used for object detection in this experiment were those supplied with the CNN installs. The GoogleNet dataset was the ILSVRC12 ImageNet database, which contains 1000 object classifications [12]. The SSD network was run using the pre-trained SSD300 Caffe model [6]. The YOLOv3 network was run using the yolov3.cfg weights, which were trained on the Common Objects in Context (COCO) dataset [10]. No additional training was performed, as the default datasets were sufficient to prove out the experiment.

Performance of Object Detection for each Object Type

This section looks at the detection confidence of each object type as the distance from the object increases. Each object is analyzed using all versions of the object type as well as each iteration of the network on which it was run.

Figures IV.2, IV.4, IV.6, IV.8, IV.10, and IV.12 show, on the right, the confidence of detection for each object and neural network combination across the distance at which the object image was captured, and show the confidence of detection averaged over all object


versions for each neural network across the distance at which the object image was captured on the left.

Bicycle Analysis

Figure IV.1 shows the bounding boxes for (left: YOLOv3, middle: SSD, right: GoogleNet). Note: GoogleNet does not draw bounding boxes.

Figure IV.1: "Bicycle" Bounding Box Examples

Figure IV.2: "Bicycle" Detection confidence over distance

For the bicycle images, the YOLOv3 detection confidence remained very high throughout all object versions and distances. The confidence never dropped below 90% and only started to move away from 100% after 35 feet for all objects. SSD performed well at the shorter distances and started to drop from around 100% after 15 feet. The confidences then began to fall almost linearly until 45 feet, where it failed to make any detections at all for any of the object versions. GoogleNet averaged 60% detection confidence at 5 feet but dropped to at or near 0% for the remainder of the distances. For bicycles, YOLOv3 easily outperformed the


other two, with SSD's performance being mediocre and GoogleNet coming in last, making almost no detections throughout the experiment.

Automobile (front perspective) Analysis

Figure IV.3 shows the bounding boxes for (left: YOLOv3, middle: SSD, right: GoogleNet). Note: GoogleNet does not draw bounding boxes.

Figure IV.3: "Vehicles (front)" Bounding Box Examples

Figure IV.4: "Vehicles (front)" Detection confidence over distance

For the automobile images taken from the front perspective, the SSD network performed very well. Except for a single missed detection on one of the images at 50 feet, the results were near 100% detection confidence at all distances. This object was the only example of SSD outperforming YOLOv3 on an image set. YOLOv3 performed quite well though, maintaining an average detection confidence above 80% at all distances. For Object 3 past distances of 20 feet, GoogleNet and YOLOv3 perform similarly; however, on average GoogleNet was the worst performing of the three. It is interesting to note that GoogleNet performed better at the


middle distance ranges, producing a bell curve that peaked at 80% at 30 and 35 feet while performing worse at the shorter and longer distances.

Automobile (side perspective) Analysis

Figure IV.5 shows the bounding boxes for (left: YOLOv3, middle: SSD, right: GoogleNet). Note: GoogleNet does not draw bounding boxes.

Figure IV.5: "Vehicles (side)" Bounding Box Examples

Figure IV.6: "Vehicles (side)" Detection confidence over distance

For the automobile images taken from the side perspective, the SSD and YOLOv3 networks performed admirably, maintaining near 100% detection confidences for all objects across all distances. GoogleNet did not fare as well, maintaining an average confidence of around 70% up to 15 feet, then a quick drop to 30% at 20 feet with a linear fall toward 20% at 50 feet.


Dog Analysis

Figure IV.7 shows the bounding boxes for (left: YOLOv3, middle: SSD, right: GoogleNet). Note: GoogleNet does not draw bounding boxes.

Figure IV.7: "Dog" Bounding Box Examples

Figure IV.8: "Dogs" Detection confidence over distance

None of the networks performed with stellar results for the images of dogs. This result is likely due to the selection of very small breeds for dogs 2 and 3. YOLOv3 outperformed both other networks. It is the only one to detect the smaller dogs past 5 feet, but it did have a steep performance drop after 15 feet. It detected Dog 1 up to 20 feet with 100% confidence, and 84.77% confidence at 25 feet, but dropped under 10% for the remaining distances. SSD detected Dog 1 with greater than 80% confidence up to 15 feet but failed to detect it at larger distances. GoogleNet averaged a detection confidence of 40% at 5 feet but failed to detect at any other distances.


People Analysis

Figure IV.9 shows the bounding boxes for (left: YOLOv3, right: SSD).

Figure IV.9: "Person" Bounding Box Examples

Figure IV.10: "Person" Detection confidence over distance

For the people images, YOLOv3 maintained near 100% confidence for all distances. SSD had strong confidences up to 15 feet but fluctuated between 0 and about 60% for two of the people images every 5 feet before leveling out to either 0 or 60% at 40 feet. It failed to detect the third at all past 15 feet. GoogleNet was not trained against people, and its results were omitted.


Street Signs Analysis

Figure IV.11 shows the bounding boxes for (left: YOLOv3, right: GoogleNet). Note: GoogleNet does not draw bounding boxes.

Figure IV.11: "Street Sign" Bounding Box Examples


Figure IV.12: "Street Sign" Detection confidence over distance

Note: YOLOv3 was not specifically trained on speed limit or street name signs. Any detections that occurred were either false positives classified as stop signs or actual stop signs that were also in the image. SSD was not trained for signs in general, so it was left out of the analysis.

For both speed limit and street signs, GoogleNet detections had a quick performance


drop after 10 feet. For stop signs, YOLOv3 maintained near 100% confidence for image sets 2 and 3 and only dropped to 80% at 50 feet for image set 1. The detection confidence for stop sign 1 began to drop off about 20% per 5 feet after 30 feet. GoogleNet averaged about 50% confidence at 5 feet but dropped to near 0%, with the exception of one anomalous spike to about 35% at 25 feet for stop sign 1.

Successful Detection Hit Rate Analysis

This section covers the overall hit rate of all object versions and distances at detection confidence thresholds from 50% to 100%. "Hits" are defined as the detection confidence being equal to or greater than the threshold value. The analysis in this section does not distinguish between the different distances at which the images were captured. Each object had 30 total samples (3 object versions at 10 distances). Note that SSD does not report 100% confidences, only percentages exceedingly close to them, so SSD always reports a 0% hit rate when the threshold is at 100%.

Figures IV.13, IV.14, IV.15, IV.16, and IV.17 show the successful detections of objects over confidence thresholds.

Bicycle Analysis

Figure IV.13: "Bicycles" classification success over confidence threshold

For the hit rate analysis on the images of bicycles, YOLOv3 maintained a 100% hit rate at the lower confidence thresholds, dropping to 93.3% and 70% hit rates at thresholds of 95% and 100%, respectively.


SSD began with a hit rate of 73.3% and slowly dropped as the threshold increased, ending at a 36.67% hit rate at a 95% threshold. GoogleNet performed poorly, with only a 6.67% hit rate at the lowest threshold, and dropped to no hits at all at an 80% threshold.

Automobile Analysis

Figure IV.14: "Vehicles" classification success over confidence threshold

For the hit rate analysis on the images of vehicles from the front perspective, both SSD and YOLOv3 maintained 100% hit rates until a 95% threshold, where they both had a steep performance decrease; however, YOLOv3 did report a 43.3% hit rate at a 100% threshold. GoogleNet began with a 30% hit rate but was at near 0% once the threshold hit 75%.

For the hit rate analysis on the images of the vehicles from the side perspective, both YOLOv3 and SSD maintained a 96.7% or higher hit rate until an 80% threshold, where SSD held its high hit rate but YOLOv3 had a near-linear fall as it reached a 0% hit rate at a 100% threshold. GoogleNet performed quite a bit better on images from the side perspective than images from the front perspective, but still didn't manage to keep up with the other networks, starting at a 60% hit rate and gradually falling to 0% as the threshold hit 100%.


Dog Analysis

Figure IV.15: "Dogs" classification success over confidence threshold

For the hit rate analysis on the images of dogs, all networks' hit rates held fairly flat, with only small drops as the threshold was increased. This result is likely due to the low hit rates overall in this object category, with YOLOv3, SSD, and GoogleNet having 36.7%, 16.7%, and 3.33% hit rates, respectively, at a 50% detection threshold.

People Analysis

Figure IV.16: "People" classification success over confidence threshold

For the hit rate analysis on the images of people, YOLOv3 held a 100% hit rate for all thresholds other than 100%, which had a 76.7% hit rate. SSD held a 56.7% hit rate until falling to 40% at the 60% threshold. From there, the hit rate fell gradually until settling at 20% starting at an 85% threshold. GoogleNet was not trained against people and was therefore excluded from this analysis.
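The hit-rate metric used throughout these sections is simple to state precisely: the fraction of an object's samples whose detection confidence meets or exceeds the threshold, with missed detections counting as 0% confidence. A minimal Python sketch (the sample values below are illustrative, not data from the experiment):

```python
def hit_rate(confidences, threshold):
    """Fraction of samples whose detection confidence is equal to or
    greater than the threshold; a missed detection contributes 0.0."""
    hits = sum(1 for c in confidences if c >= threshold)
    return hits / len(confidences)

# Toy confidences for one object class; the experiment used 30 samples
# per object (3 object versions at 10 distances).
samples = [1.00, 0.95, 0.90, 0.60, 0.0, 0.80]
rates = {t: hit_rate(samples, t) for t in (0.50, 0.75, 0.95, 1.00)}
print(rates)
```

Sweeping the threshold from 0.50 to 1.00 in this way produces exactly the curves plotted in Figures IV.13 through IV.17.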


Street Signs Analysis

Figure IV.17: "Street Sign" classification success over confidence threshold. Top left: results for speed limit signs. Top right: results for stop signs. Bottom: results for street name signs.

For the hit rate analysis on the images of speed limit signs and street name signs, only GoogleNet was trained to detect these, but it never had a hit rate greater than 23.3%.

For the hit rate analysis on the images of stop signs, GoogleNet was not specifically trained on stop signs, but it did pick them up as generic street signs. As a result, the detection on these was fairly weak, in that it only had a 6.67% hit rate at a 50% confidence threshold and dropped to 0% at a 75% threshold. YOLOv3 was trained to detect stop signs, however, and performed much better, maintaining a greater than 80% hit rate up to a 90% threshold and only falling to 46.7% at a 100% confidence threshold.


Network Performance Comparisons

This section covers the overall comparison of the hit rate performance of all the different object types across the three CNNs. Figure IV.18 shows the successful detections of each object for each of the networks over confidence thresholds.

Figure IV.18: Overall network classification success over confidence threshold

YOLOv3 performed the best out of the three networks, maintaining a near 100% hit rate for 3 of 8 objects (bicycle, vehicle front perspective, and people) when compared against a detection threshold of 90%, and 4 of 8 at a threshold of 75% (vehicle side perspective being the fourth). SSD also had similar results for both the vehicle perspectives. YOLOv3 had the highest hit rate for dogs with a max of 36.7%, compared to 16.7% for SSD and only 3.3% for GoogleNet at a 50% confidence threshold. SSD performed middle of the road for people and bicycle hit rates and performed about half as well as YOLOv3 on dog hit rates. GoogleNet consistently underperformed on all objects, with only mediocre results on the vehicle front


perspective.

Objects Analyzed from Multiple Perspectives

An interesting comparison was to see how taking images of the same object from multiple perspectives might affect the outcome of the object detections. This comparison was achieved by using the images of the vehicles. One set of images was taken from the front, and the other set was taken from the passenger side.

Figure IV.19 shows the confidence of detection for each automobile and neural network combination across the distance from the object at which the image was captured on the right, and shows the confidence of detection averaged over all automobiles for each neural network across the distance from the object at which the image was captured on the left.

Figure IV.19: "Vehicles" Detection confidence over distance

For both the front and the side perspectives, SSD performed near 100% for all distances, with the exception of a single missed detection at 50 feet on vehicle 1. YOLOv3


performed near 100% on the side perspective; however, its average detection confidence fluctuated between 80% and 90% on the front. GoogleNet performed quite a bit better on the front perspective than on the side. An interesting note is that from the front perspective, GoogleNet's average confidence actually increased as the distance increased. However, the confidence fell with distance from the side perspective, which is the expected result.

Network Run-Time Analysis

This section looks at the run times of each of the networks running on the Nvidia Jetson TX2 platform. The times include some cleanup operations, such as moving the bounding box images to the common results folder. There is also a small amount of overhead from switching between the networks rather than performing the bulk image operations that some of the CNN installs have available.

Table IV.1: Average Network Run Times (seconds)

Object \ Network      GoogleNet          SSD                YOLO
Bicycle               4.106 +/- 0.093    8.100 +/- 0.024    15.448 +/- 0.044
Automobile (front)    3.990 +/- 0.066    8.197 +/- 0.105    15.590 +/- 0.114
Automobile (side)     3.768 +/- 0.113    8.180 +/- 0.028    15.599 +/- 0.064
Dog                   3.764 +/- 0.083    8.188 +/- 0.018    15.353 +/- 0.029
Person                4.101 +/- 0.129    8.159 +/- 0.023    15.536 +/- 0.045
Speed Limit Sign      4.089 +/- 0.127    8.127 +/- 0.020    15.449 +/- 0.031
Stop Sign             4.217 +/- 0.130    8.160 +/- 0.020    15.429 +/- 0.034
Street Name Sign      3.685 +/- 0.065    8.120 +/- 0.014    15.647 +/- 0.046

Table IV.1 shows the average run time, with standard deviation, of each object type running on each neural network.

The run time for all the networks remained very consistent between the runs. Whether finding a detection or not, variation in the target object, or the number of detections found within


a given image did not seem to have any bearing on the time it took to complete a run for any of the networks. YOLOv3 averaged 15.506 seconds, SSD averaged 8.154 seconds, and GoogleNet averaged 3.965 seconds across all their respective run times. The detection performance in this experiment appears to be inversely proportional to the run times of the CNNs.

These run times for all the networks appear to be quite high and impractical for use in a real-time system. At first it was postulated that the slow run times were a result of running high-resolution images through the networks. After downscaling the images from 1920x1440 to 480x360 and running the experiment again, there was some improvement to the run times, about a 1 second decrease per network run. However, the overall performance was still very slow and not practical for real-time applications.

Digging into the unexpectedly slow performance, the console output showed benchmark run times that did not match what was being reported by the experiment benchmarks. The built-in benchmark for the YOLOv3 system reports an average prediction time of 0.5 seconds to the console; however, when viewing the console output in real time as it was reported to the screen, it is clear most of the actual run time is spent in the 105 resolution-resizing and convolution steps prior to running the YOLOv3 network itself. The run time of the CNN prediction is quite a bit faster than the overall time. Therefore, these results should be viewed only in the light of overall network times, including all overhead involved, and not as a comparison of inference times.
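The distinction above matters for how Table IV.1 was produced: the benchmark measures end-to-end wall-clock time per run, overhead included, not model inference time. A minimal sketch of such a timing harness is below; the function name and the stand-in workload are hypothetical, since the actual run scripts invoked the network installs themselves.

```python
import statistics
import time

def time_runs(run_fn, n):
    """Wall-clock n full runs of run_fn (overhead and all, as in the
    experiment's benchmark) and summarize as (mean, sample stdev)."""
    times = []
    for _ in range(n):
        start = time.perf_counter()
        run_fn()  # a real harness would launch the detection run here
        times.append(time.perf_counter() - start)
    return statistics.mean(times), statistics.stdev(times)

# Stand-in workload in place of an actual network invocation.
mean, std = time_runs(lambda: sum(range(10000)), 5)
print(f"{mean:.6f} +/- {std:.6f} s")
```

Because everything between `start` and the second `perf_counter()` call is counted, file moves, image resizing, and model loading all land in the measurement, which is exactly why the Table IV.1 figures run far above the networks' reported inference times.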


CHAPTER V
CONCLUSION

The objective of this thesis was to evaluate the performance of various convolutional neural networks as the objects they are attempting to detect change distance from the camera that captured them. The objective is to deliver an analysis of the networks so that a better and more accurate selection of which neural network to deploy for a given application can be made. The work expands the knowledge of estimating how well different neural networks perform at various distances and with various target detection objects. The contributions of this work include creating platform-agnostic and dynamically configurable benchmarks for testing different image sets and neural networks, modifications to the neural network source code to extract relevant detection data, building a run-time benchmark, and building data filtering and analysis scripts to better sort through the data.

As the experiments progressed, it became clear that there were some elements that could be added to improve the findings that were beyond the scope of this experiment. It would be beneficial to add training to the networks, so that all the object types could be compared across each of the network types. Additional networks or differing weights could be configured for the previous networks to compare performance metrics. As the run scripts were designed to dynamically add networks, this would require minimal rework to the run process of the experiment other than the initial network installations, if necessary. YOLOv3 could simply point to new training weights in the run script wrapper code. Another helpful element would be code to automatically graph the results. A foundation of this code was developed, but as the complexity and variety of the needed graphs grew, the effort to update the graphing code outgrew the progress of the experiment, and the data had to be manually entered into the graphing tool instead.

Overall, the YOLOv3 network outperformed the others by large margins. It was the only network to consistently detect objects at ranges greater than 25 feet with only minimal drop-off in detection confidence. However, it was much slower than the others, so that fact should be


considered when selecting a network to use. The SSD network performed well at distances less than 20 feet but quickly began to fall off afterwards in most, but not all, cases. It was in the middle in terms of run time for the networks. GoogleNet tended to have less than stellar performance across all objects. It rarely broke the 90% confidence threshold and had steep drops in performance as the distances increased. It was, however, the fastest, finishing 2 times faster than SSD and 4 times faster than YOLOv3 in this experiment.

The vehicles had the most consistent detections across all the networks. This result is likely due to them being much larger than the other objects tested, making them easier to detect as the distances increased. As the networks were trained in this experiment, YOLOv3 topped detection confidences for bicycles, dogs, people, and stop signs. SSD slightly outperformed YOLOv3 on the detection confidences on the front of vehicles, and it performed on par on the sides of vehicles. GoogleNet only outperformed on street name signs and speed limit signs, which the other networks were not specifically trained to detect.


REFERENCES

[1] C. Berger. Perceptrons - the most basic form of a neural network. https://appliedgo.net/perceptron/, Jun 2016. Accessed: December 11, 2018.

[2] D. Cornelisse. An intuitive guide to convolutional neural networks. https://medium.freecodecamp.org/an-intuitive-guide-to-convolutional-neural-networks-260c2de0a050, Apr 2018. Accessed: December 11, 2018.

[3] D. Franklin. Deep vision tutorial, Jetson AGX Xavier. https://github.com/dusty-nv/jetson-inference, Dec 2018. Accessed: December 09, 2018.

[4] R. Gandhi. R-CNN, Fast R-CNN, Faster R-CNN, YOLO - object detection algorithms. https://towardsdatascience.com/r-cnn-fast-r-cnn-faster-r-cnn-yolo-object-detection-algorithms-36d53571365e, Jul 2018. Accessed: December 16, 2018.

[5] J. Hui. SSD object detection: Single Shot MultiBox Detector for real-time processing. https://medium.com/@jonathan_hui/ssd-object-detection-single-shot-multibox-detector-for-real-time-processing-9bd8deac0e06, Mar 2018. Accessed: December 11, 2018.

[6] J. K. Jung. Single Shot MultiBox Detector (SSD) on Jetson TX2. Online course, Nov 2017. Accessed: December 09, 2018.

[7] J. K. Jung. YOLOv3 on Jetson TX2. https://jkjung-avt.github.io/yolov3/, Mar 2018. Accessed: December 09, 2018.

[8] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. E. Reed, C. Y. Fu, and A. C. Berg. SSD: Single shot multibox detector. CoRR, abs/1512.02325, 2015.

[9] J. Redmon, S. K. Divvala, R. B. Girshick, and A. Farhadi. You only look once: Unified, real-time object detection. CoRR, abs/1506.02640, 2015.

[10] J. Redmon and A. Farhadi. YOLO: Real-time object detection. https://pjreddie.com/darknet/yolo/, 2018. Accessed: December 11, 2018.

[11] J. Redmon and A. Farhadi. YOLOv3: An incremental improvement. CoRR, abs/1804.02767, 2018.

[12] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet large scale visual recognition challenge. International Journal of Computer Vision (IJCV), 115:211-252, 2015.


[13] F. Shaikh. 10 advanced deep learning architectures data scientists should know! https://www.analyticsvidhya.com/blog/2017/08/10-advanced-deep-learning-architectures-data-scientists, Aug 2017. Accessed: December 11, 2018.

[14] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. E. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. CoRR, abs/1409.4842, 2014.

[15] S. H. Tsang. Review: R-CNN (object detection). https://medium.com/coinmonks/review-r-cnn-object-detection-b476aba290d1, Aug 2018. Accessed: December 16, 2018.