GPU - Agenda

GPUs for Statistical Data Analysis in HEP:
a performance study of GooFit on GPUs vs RooFit on CPUs
Adriano Di Florio
Università degli Studi di Bari "Aldo Moro" & I.N.F.N. Sezione di Bari
CCR Meeting & Giulia Finzi Symposium
5th July, Rome
Outline
Introduction to GPU computing & GooFit
Pseudo-experiments for p-value estimation: GooFit vs RooFit performance study
Exploring the applicability limits of Wilks' theorem
Summary & Outlook
CCR Meeting / July 5th
Adriano Di Florio (Bari University & INFN)
Introduction: GPU computing & GooFit
Why GPU computing? Moore's Law
Physical limit: heat dissipation
$P = C \times V^2 \times f$
V: operating voltage
C: capacitance
f: clock frequency
Future developments can no longer rely on an exponential growth of the clock frequency.
A new approach is needed: a possible solution is GPU computing.
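A small numerical sketch of why frequency scaling runs into the heat-dissipation limit. It assumes the textbook CMOS dynamic-power relation P = C·V²·f, and the further (crude, purely illustrative) assumption that doubling the clock also requires doubling the voltage; all values are in arbitrary units.

```python
def dynamic_power(capacitance, voltage, frequency):
    """Textbook CMOS dynamic power, P = C * V^2 * f (arbitrary units)."""
    return capacitance * voltage ** 2 * frequency


base = dynamic_power(1.0, 1.0, 1.0)
# Illustrative assumption: twice the clock needs about twice the voltage.
faster = dynamic_power(1.0, 2.0, 2.0)
power_ratio = faster / base
print(f"power grows by a factor {power_ratio:.0f} for a 2x clock")
```

Under these assumptions the dissipated power grows much faster than the clock, which is the motivation for spreading the work over many slower cores.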
GPU architecture
"If you were plowing a field, which would you rather use: two strong oxen or 1024 chickens?"
Seymour Cray
We definitely choose the chickens.
GPU architecture
What is a GPU? Graphics Processing Unit
- 1970s: the first graphical user interfaces required dedicated microchips
- Video games and 3D graphics: a strong economic stimulus for GPU development
Consequences for the GPU architecture:
- Thousands of cores
- Large volumes of data
- Low clock frequency (~1 GHz)
- Arithmetic operations in a single clock cycle (sin, cos, sqrt, 1/x, …)
[Figure: schematic comparison of CPU and GPU architectures]
Introduction to GPU-accelerated computing
Heterogeneous GPU-accelerated computing is the use of a Graphics Processing Unit to accelerate scientific applications (among other apps).
Application performance is enhanced by offloading the compute-intensive portions to the GPU (the device), while the remainder of the code still runs on the CPUs (the host).
[Diagram: application code split into a sequential portion (CPU) and a compute-intensive portion (GPU)]
From the user's perspective? Applications simply run significantly faster!
How much faster? It depends, of course, on the application…
We want to explore it in the context of 'end-user HEP analyses' by using GooFit.
GooFit framework
GooFit is a data analysis tool for HEP that interfaces ROOT/RooFit to the CUDA parallel computing platform on nVidia GPUs. It also supports OpenMP.
A GooFit program has 4 main components:
- a GooPdf object representing the PDF modelling the physical process
- the fit parameters (Variable objects contained in the GooPdf)
- the data (DataSet object)
- a FitManager object forming the interface between MINUIT and the GooPdf
[Figure: control & data flow of a GooFit program (device side)]
GooFit: a library for massively parallelising maximum-likelihood fits, R. Andreassen et al., J. Phys.: Conf. Ser. 513 (2014) 052003
It is an open source project, under development and funded by the US NSF.
GooFit framework
The FitManager object forms the interface between MINUIT (running on the CPU) and a GPU, which allows a PDF representing the physical model (GooPdf object) to be evaluated in parallel.
Control & data flow of a GooFit program:
Fit parameters are estimated at each negative-log-likelihood (NLL) minimization step on the host side (CPU), while the PDF/NLL is evaluated on the device side (GPU). This is repeated until convergence:
CPU (fit parameter tuning) <-> [memory transfers] <-> GPU (PDF/NLL evaluation)
This can be seen by analysing a cycle with the monitoring tool nVIDIA Visual Profiler [nvvp].
The FitControl object allows switching between χ² fits and ML fits (either unbinned or binned).
GooFit: a library for massively parallelising maximum-likelihood fits, R. Andreassen et al., J. Phys.: Conf. Ser. 513 (2014) 052003
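The host/device loop described above can be mimicked in plain Python. This is only a sketch under stated assumptions, not GooFit code: `device_nll` stands in for the per-event PDF evaluation that a GPU would run in parallel, `host_minimise` is a simple golden-section search standing in for MINUIT, and the exponential model, sample size and function names are all hypothetical.

```python
import math
import random


def device_nll(rate, events):
    """Stand-in for the device side: one log-PDF term per event, then a sum.
    On a GPU each term would be evaluated by its own thread."""
    return -sum(math.log(rate) - rate * x for x in events)


def host_minimise(nll, lo, hi, tol=1e-6):
    """Stand-in for the host side: a golden-section search (not MINUIT)
    that proposes parameter values and asks the 'device' for the NLL."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = nll(c), nll(d)
    n_calls = 2
    while b - a > tol:
        if fc < fd:
            # Minimum lies in [a, d]: shrink from the right.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = nll(c)
        else:
            # Minimum lies in [c, b]: shrink from the left.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = nll(d)
        n_calls += 1
    return (a + b) / 2.0, n_calls


random.seed(1)
true_rate = 2.0
events = [random.expovariate(true_rate) for _ in range(20000)]
best_rate, n_calls = host_minimise(lambda r: device_nll(r, events), 0.1, 10.0)
print(f"fitted rate = {best_rate:.3f} after {n_calls} NLL evaluations")
```

Each NLL evaluation loops over the whole data sample while the minimiser itself does very little work, which is why moving only the PDF/NLL evaluation to the device pays off.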
GooFit profiling
Example snapshot of the profile of a GooFit process provided by the Nvidia Visual Profiler:
[Figure: profiler timeline with three annotated regions]
- Memory transfers: fit parameters exchanged between CPU and GPU
- GPU calculation processes: p.d.f. evaluation on the GPU
- CPU: parameter tuning to minimise the negative log-likelihood
A preliminary example of GooFit/GPU capabilities
Parameter estimation is a crucial part of many physics analyses.
PDF evaluation on large datasets is usually the bottleneck in the MINUIT algorithm.
GooFit acts as an interface between the MINUIT minimization algorithm and a parallel processor, which allows a Probability Density Function to be evaluated in parallel.
A preliminary test was done with an unbinned ML fit, using either a single CPU or an additional GPU (an nVIDIA Tesla C2070 hosted at the Bari T2).
Events are generated according to a Voigtian model (the convolution is CPU-intensive) and fitted. The time needed (the negligible generation time is not included) is studied as a function of the number of events:
[Plot: fit time [s] vs number of events]
For 10M events: RooFit needs 61 h 23 min and GooFit takes 4 min 39 s: speed-up ~750
For 1M fitted events with RooFit … you need to wait overnight.
For 10M fitted events with GooFit … you need to take an espresso!
MC toys for p-value estimation: GooFit vs RooFit
Test application: the physics case
To test the computing capabilities of GPUs with respect to CPU cores, a high-statistics toy Monte Carlo technique has been implemented in both the ROOT/RooFit and GooFit frameworks, with the aim of estimating the (local) statistical significance of the structure observed by CMS close to the kinematical boundary of the $J/\psi\,\phi$ invariant mass in the 3-body decay $B^+ \to J/\psi\,\phi\,K^+$ [PLB 734 (2014) 261].
[Plot: $\Delta m = m(\mu^+\mu^-K^+K^-) - m(\mu^+\mu^-)$ [GeV]; $2480 \pm 160$ $B^\pm$]
Structure parameters [compatible with the Y(4140) reported by CDF]:
$m = 4148.0 \pm 2.4\,(\mathrm{stat.}) \pm 6.3\,(\mathrm{syst.})$ MeV
$\Gamma = 28^{+15}_{-11}\,(\mathrm{stat.}) \pm 19\,(\mathrm{syst.})$ MeV
Test application: the toy MC method
MC pseudo-experiments are used to estimate the probability (p-value) that background fluctuations alone would give rise to a signal at least as significant as that seen in the data.
Toy MC fit cycle (for each generated fluctuation):
- Generation of a fluctuated background binned distribution (3-body phase-space model) [total number of entries fixed by the data; fits with non-extended ML]
- Null hypothesis: binned ML fit performed with the phase-space model only
- Alternative hypothesis: binned ML fit performed with the phase-space model + Voigtian PDF [the latter is truncated to correctly account for the kinematical threshold; the Gaussian resolution function has its width fixed at 2 MeV]. Signal yield constrained > 0.
- Note: for each bin, the PDF value is estimated by ROOT integration over the bin [time-consuming but needed: steep signal w.r.t. the bin size]
- The fit is performed 8 times within the region of interest (taken from CDF: no look-elsewhere effect), trying different starting values (2 masses & 4 widths).
- For each fit a $\Delta\chi^2$ w.r.t. the null hypothesis fit is calculated; the best $\Delta\chi^2$ among the 8 alternative fits is chosen.
A distribution of $\Delta\chi^2$ (our test statistic) is obtained over the sample of MC toys.
[Plot: example toy in $\Delta m$ [GeV], with $\Delta\chi^2 \cong 31.44$]
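The logic of the toy cycle above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the analysis code: it assumes a flat background model with a fixed number of entries, replaces the Δχ² between the two fits with a hypothetical excess statistic for the most populated bin, and uses made-up sample sizes chosen only to keep the example fast.

```python
import random


def toy_statistic(n_events, n_bins, rng):
    """Throw one background-only toy (flat model, fixed number of entries)
    and return a simple excess statistic for its most populated bin."""
    counts = [0] * n_bins
    for _ in range(n_events):
        counts[rng.randrange(n_bins)] += 1
    expected = n_events / n_bins
    return (max(counts) - expected) ** 2 / expected


def p_value(observed, n_toys, n_events, n_bins, seed=12345):
    """Fraction of toys whose statistic is at least as large as the observed one."""
    rng = random.Random(seed)
    n_above = sum(
        toy_statistic(n_events, n_bins, rng) >= observed for _ in range(n_toys)
    )
    return n_above / n_toys


# Hypothetical numbers, not taken from the analysis.
p = p_value(observed=9.0, n_toys=1000, n_events=500, n_bins=20)
print(f"estimated p-value = {p:.4f}")
```

The smallest p-value that can be resolved this way is of order 1/n_toys, which is why a 5σ claim needs millions of toys and why the time per toy matters so much.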
Hardware set-up
Used: 1 server hosting 2 nVIDIA Tesla K20 & 1 server hosting 1 nVIDIA Tesla K40

Tesla K20 @ BC2S (*)
- Number of GPUs: 2 x GK110
- Number of CUDA cores: 2 x 2,496
- Memory per GPU (GDDR5): 2 x 5 GB
- Memory bandwidth per board: 208 GB/s
- CPU: 16 cores, E5-2640 v2 @ 2.00 GHz (32 with HT)
- 64 GB RAM

Tesla K40 @ ReCaS (*)
- Number of GPUs: 1 x GK110B
- Number of CUDA cores: 2,880
- Memory per GPU (GDDR5): 12 GB
- Memory bandwidth per board: 288 GB/s
- CPU: 20 cores, E5-2640 v2 @ 1.70 GHz (40 with HT)
- 256 GB RAM

(*) http://www.recas-bari.it
Performance of GooFit vs ROOT/RooFit: a preliminary result
A first result is a simple comparison between the MC toys procedure running on a single GPU via GooFit and on a single CPU. The speed-ups:
S = 62 (Tesla K40)
S = 48 (Tesla K20)
for 15k MC toys produced (highly time-consuming for ROOT: ~6 days).
This kind of application (binned fit & few parameters) doesn't exploit the whole GPU computational capability.
[Figure: example snapshot of nvidia-smi (the nvidia monitoring tool, similar to top) for a single process, showing 66%]
How to exploit the full computational power of a GPU?
nVidia Multi-Process Service
The nVidia Multi-Process Service (MPS) is a tool developed by nVidia that allows multiple processes (up to 16) to be executed on the same GPU chip. It acts as a scheduler: it manages the access to memory and CUDA cores.
Here is an example of how it affects the occupancy of a Tesla K40 GPU:
[Figure: GPU occupancy of a Tesla K40 with MPS]
Performance of GooFit on nVIDIA Multi-Process Service
Each process uses:
- 1 (shared) GPU and 1 (exclusively assigned) CPU
There is a saturation effect (Amdahl's law).
Speed-up with N concurrent processes under MPS:
$S^{\mathrm{MPS}}_{0<N\le 16} = \dfrac{T_1}{\frac{1}{N}\sum_{n=1}^{N} T_n}$
[Plot: speed-up vs number of independent concurrent processes per single GPU; series: 5000 toys and 15000 toys on Tesla K20, 5000 toys and 15000 toys on Tesla K40]
The 1st (2nd) group of processes (0 < N ≤ 16) is assigned to the 1st (2nd) GPU (the 2 Tesla K20 GPUs are on the same server, hosting 32 CPUs via Hyper-Threading).
[Plot: 2 GPUs vs 1 GPU, number of MC toys in 1 h (thousands) vs number of independent concurrent processes per single GPU]
[Plot: 2 GPUs / 1 GPU ratio vs number of independent concurrent processes per single GPU]
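The saturation attributed to Amdahl's law can be illustrated with a short Python sketch. The parallel fraction used here (95%) is a hypothetical value, not a measurement from these slides, and `measured_speedup` is an illustrative helper that takes the single-process time over the mean time of the concurrent processes.

```python
def amdahl_speedup(n_workers, parallel_fraction):
    """Amdahl's law: ideal speed-up when only a fraction of the work scales."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_workers)


def measured_speedup(t_single, t_concurrent):
    """Single-process time over the mean time of the concurrent processes."""
    mean_time = sum(t_concurrent) / len(t_concurrent)
    return t_single / mean_time


# Hypothetical parallel fraction of 95%, not a measured value.
for n in (1, 2, 4, 8, 16):
    print(n, round(amdahl_speedup(n, 0.95), 2))
```

Even with 95% of the work scaling perfectly, 16 workers give a speed-up of roughly 9 rather than 16, and adding more workers brings progressively less.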
Performance of RooFit on CPUs with PROOF-Lite
To efficiently run RooFit MC toys in parallel on the 72 CPUs available on the 2 servers hosting the GPUs, we use PROOF-Lite, a dedicated version of PROOF optimized for single multi-core machines [*].
This ROOT/RooFit extension implements a 2-tier architecture with the master merged into the client, directly controlling the workers (workers are processes, not threads).
PROOF has a Pull architecture: all workers end at the same time, avoiding the long tails that are unavoidable when running RooFit on a cluster with a Push approach (where the last job determines the total execution time).
Check of the speed-up performance on the 2 servers:
$S^{\mathrm{PROOF\text{-}Lite}}_{0<n\le 32(40)} = \dfrac{T_1}{T_n}$
- the server hosting the TK20s has 32 CPUs
- the server hosting the TK40 has 40 CPUs
Good scaling with the number of MC toys.
No difference between the 2 servers (as expected).
The speed-up scales perfectly up to ~8 workers; then there is a saturation effect (Amdahl's law).
[*] G. Ganis et al., PoS ACAT08 (2008) 007; A. Pompili et al., J. Phys.: Conf. Ser. 396 (2012) 032043, CHEP12
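The difference between the Push and Pull approaches can be made concrete with a toy scheduling model. This Python sketch is not PROOF code: the task durations, the number of workers and both function names are hypothetical, and "push" is modelled as a static split into equal-sized contiguous chunks.

```python
import heapq


def push_makespan(durations, n_workers):
    """Push: tasks are split up front into equal-sized contiguous chunks."""
    chunk = -(-len(durations) // n_workers)  # ceiling division
    loads = [sum(durations[i:i + chunk]) for i in range(0, len(durations), chunk)]
    return max(loads)


def pull_makespan(durations, n_workers):
    """Pull: each worker asks for the next task as soon as it is free."""
    free_at = [0.0] * n_workers
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)      # earliest available worker
        heapq.heappush(free_at, start + d)  # busy until the task ends
    return max(free_at)


# Hypothetical workload: 8 short tasks followed by 4 long ones, 4 workers.
durations = [1.0] * 8 + [5.0] * 4
push_time = push_makespan(durations, 4)
pull_time = pull_makespan(durations, 4)
print(push_time, pull_time)
```

In this example the static split leaves one worker with all the long tasks, so it alone determines the total time, while with the pull model all workers finish together.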
Performance comparison: RooFit/PROOF-Lite vs GooFit/MPS - I
A first performance comparison can be carried out on the server hosting 32 CPUs and 2 TK20 GPUs, as a function of the number of pseudo-experiments produced.
We can compare:
- 1 PROOF-Lite job using 30 workers (on 30 CPU cores)
with:
- 2 GooFit/MPS jobs (each one running 15 simultaneous processes)
$S^{2\text{-}TK20}_{n=30=N} = \dfrac{T^{\mathrm{RooFit}}_{n=30}}{T^{\mathrm{GooFit}}_{N=30,\,2\text{-}TK20}} \sim 45$
Good scaling with an extended number of MC toys:
- $S^{\mathrm{PROOF\text{-}Lite}}_{n=30} \sim 20$: 1 PROOF-Lite job using 30 workers vs 1 RooFit job using 1 CPU
- $S^{\mathrm{MPS\text{-}TK20}}_{N=15} \sim 9$: 1 GooFit/MPS job (running 15 simultaneous processes) vs 1 GooFit job
[Plot: speed-ups vs number of processed MC toys (per application)]
Performance comparison: RooFit/PROOF-Lite vs GooFit/MPS - II
A second performance comparison can be carried out on both servers, hosting both types of GPUs (TK20 & TK40), as a function of the number of pseudo-experiments produced.
Here we limit the comparison to 16 independent processes (due to the MPS limit for the single TK40).
We can compare:
- 1 PROOF-Lite job using 16 workers (on 16 CPU cores)
with:
- 1 GooFit/MPS job running 16 simultaneous processes on a single TK40/TK20
$S^{TK40}_{n=16=N} = \dfrac{T^{\mathrm{RooFit}}_{n=16}}{T^{\mathrm{GooFit}}_{N=16,\,TK40}} \sim 60$
$S^{TK20}_{n=16=N} = \dfrac{T^{\mathrm{RooFit}}_{n=16}}{T^{\mathrm{GooFit}}_{N=16,\,TK20}} \sim 40 < S^{2\text{-}TK20}_{n=30=N} \sim 45$ (effect of PROOF-Lite saturation)
Gain within the micro-architecture, TK40 vs TK20:
$S^{\mathrm{GPU}}_{N=16} = \dfrac{T_{N=16,\,TK20}}{T_{N=16,\,TK40}} \sim 1.5$
[Plot: speed-ups (log scale) vs number of processed MC toys (per application)]
Performance comparison: RooFit/PROOF-Lite vs GooFit/MPS - III
A third performance comparison can be made from the point of view of the end-user/analyst and the time needed to deliver the pseudo-experiments task.
Let us assume the analyst has at their disposal the full computational power used in these studies: 2 servers equipped with 3 GPUs (2 TK20 & 1 TK40) and 72 CPU cores (36 physical cores + Hyper-Threading).
[Plot: elapsed time [s] (log scale) vs total number of processed MC toys; for 1M toys: ~11 days vs ~6 hours; total speed-up $S \approx 41.0$]
To get a signal significance > 5σ, a p-value < $3\times10^{-7}$ is needed, namely at least 3.3M toys are needed.
To estimate a signal significance, many more toys are needed (see next slide).
P-value & statistical significance estimation
The final obtained $\Delta\chi^2$ distribution (MC toys production was stopped once a fluctuation with $\Delta\chi^2 > \Delta\chi^2_{DATA}$ was found):
[Plot: $\Delta\chi^2$ distribution of the MC toys; $\Delta\chi^2_{DATA} \cong 53.0$; fluctuation found at $\Delta\chi^2 \cong 56.9$]
The p-value estimation is straightforward:
$P = \int_{\Delta\chi^2_{DATA}}^{+\infty} f(\Delta\chi^2)\, d(\Delta\chi^2) \approx \frac{1}{57.7\cdot 10^{6}} \cong 1.73\cdot 10^{-8}$
where $f$ denotes the normalized $\Delta\chi^2$ distribution obtained from the toys.
Equivalent (Gaussian) statistical significance:
$Z\sigma = \Phi^{-1}(1-P)\,\sigma \cong 5.52\sigma$
where $\Phi^{-1}$ is the inverse function of the cumulative distribution of the standard Gaussian.
Compatible with the lower limit of 5σ for the statistical significance quoted in the CMS paper PLB 734 (2014) 261 on the basis of 50.5 million MC toys (by RooFit).
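The conversion from the toy-based p-value to the equivalent Gaussian significance can be checked with the Python standard library. This is a minimal sketch: the helper name is hypothetical, and the only inputs are the numbers quoted above (one exceeding fluctuation in 57.7 million toys).

```python
from statistics import NormalDist


def significance_from_p_value(p: float) -> float:
    """Z = Phi^{-1}(1 - p), evaluated as -Phi^{-1}(p) to avoid rounding 1 - p."""
    return -NormalDist().inv_cdf(p)


n_toys = 57.7e6          # toys generated before one exceeded the data value
p_value = 1.0 / n_toys   # one more-extreme fluctuation out of n_toys
z = significance_from_p_value(p_value)

if __name__ == "__main__":
    print(f"p-value = {p_value:.3e}, Z = {z:.2f} sigma")
```

Using the symmetry $\Phi^{-1}(1-p) = -\Phi^{-1}(p)$ keeps the full floating-point precision of such a small p-value.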
Exploring the applicability limits of Wilks theorem
Wilks theorem & the need of MC toys - I
The Wilks [*] theorem is often used to estimate the p-value associated to a new/unexpected signal.
Given two hypotheses:
- Null hypothesis $H_0$ with $\nu_0$ d.o.f.
- Alternative hypothesis $H_1$ with $\nu_1$ d.o.f.
…any test statistic $t$, defined as a likelihood ratio
$-2\ln\lambda = -2\ln\left(\frac{L_{H_0}}{L_{H_1}}\right)$
[or similarly (in the asymptotic limit) as a $\Delta\chi^2 = \chi^2_{H_0} - \chi^2_{H_1}$],
approaches a $\chi^2$ distribution with $\nu = \nu_1 - \nu_0$ d.o.f., provided that these regularity conditions hold:
- $H_0$ and $H_1$ are nested ($H_1$ “includes” $H_0$)
- while $H_1 \to H_0$ the parameters of $H_1$ are well behaved (defined and not approaching some limit)
- asymptotic limit (of a large data sample)
When this theorem holds, the p-value associated to the signal is given by:
$P = \int_{t_{obs}}^{\infty} \chi^2_{\nu_1-\nu_0}(t)\, dt$
The use of pseudo-experiments to estimate the p-value is not needed (but still suggested).
When the null hypothesis is background-only and the alternative is background+signal, often the above regularity conditions are not all satisfied, and MC toys are mandatory!
[*] S.S. Wilks, Ann. Math. Stat. 9 (1938) 60-62
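When the regularity conditions hold, the integral above is just the upper tail of a $\chi^2$ distribution. The sketch below evaluates it with the standard library only, using the recurrence $Q(a+1,x) = Q(a,x) + x^a e^{-x}/\Gamma(a+1)$ for the regularized upper incomplete gamma function. The function name `chi2_sf` is an assumption made for illustration and is not part of RooFit or GooFit.

```python
import math


def chi2_sf(t_obs: float, ndof: int) -> float:
    """Upper-tail probability P(t >= t_obs) of a chi-square with ndof d.o.f."""
    if t_obs < 0 or ndof < 1:
        raise ValueError("need t_obs >= 0 and ndof >= 1")
    x = 0.5 * t_obs
    if ndof % 2 == 1:
        a = 0.5
        q = math.erfc(math.sqrt(x))   # Q(1/2, x), i.e. 1 d.o.f.
    else:
        a = 1.0
        q = math.exp(-x)              # Q(1, x), i.e. 2 d.o.f.
    # Climb from a to ndof/2 in unit steps with the recurrence.
    while a < 0.5 * ndof:
        q += x ** a * math.exp(-x) / math.gamma(a + 1.0)
        a += 1.0
    return q


if __name__ == "__main__":
    # The familiar 95% critical values should all give P close to 0.05.
    for ndof, t in [(1, 3.841), (2, 5.991), (3, 7.815)]:
        print(ndof, round(chi2_sf(t, ndof), 4))
```

For the case discussed here ($\nu_1 - \nu_0$ small) this closed form is enough; a general-purpose statistics library would normally be used instead.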
Wilks theorem & the need of MC toys - II
Indeed this is the case we are dealing with, here!
The signal parameters in the model of hypothesis $H_1$ are mass ($m$), width ($\Gamma$) and yield ($\mu \geq 0$).
When $H_1 \to H_0$ the problem is that: 1) $m$ and $\Gamma$ are not well defined, 2) $\mu$ tends to the null limit.
This explains why we have used pseudo-experiments.
The distributions of the test statistic are in general not predictable and can be extracted from MC toys!
The possible distributions in the different cases are shown & two special cases will be discussed.
[Plots: $\Delta\chi^2$ distributions for the cases:
- $m$, $\Gamma$ fixed; $\mu$ free
- $m$, $\Gamma$ fixed; $\mu > 0$
- $m$, $\Gamma$ free; $\mu$ free
- $m$, $\Gamma$ free; $\mu > 0$
- $m$ free, $\Gamma$ fixed; $\mu$ free
- $m$ free, $\Gamma$ fixed; $\mu > 0$]
Special case in which Wilks theorem holds
Consider the test statistic $t_\mu = -2\ln\lambda(\mu)$ [$\mu$: strength parameter] as the basis of the statistical test.
This could be a test of $\mu = 0$ for purposes of establishing the existence of a signal process, or of $\mu \neq 0$ for purposes of obtaining a confidence interval.
In the latter case, following Cowan et al. [*] the PDF of the test statistic approaches a chi-square distribution for 1 d.o.f. [in agreement with Wilks theorem!]:
$f(t_\mu|\mu) = \frac{1}{\sqrt{2\pi}} \frac{1}{\sqrt{t_\mu}}\, e^{-t_\mu/2}$
Let us fix the $m$ & $\Gamma$ parameters (to the CMS estimates from the fit to data) while leaving $\mu$ free in our ML fits ($\mu$ is not properly a signal yield).
By fitting our likelihood ratio distribution we indeed get:
d.o.f. ≈ 1.014 ± 0.001, normalized $\chi^2$ = 1.009, P(fit) = 0.118
[Plot: likelihood ratio distribution ($-2\ln\lambda$) with fit pull]
[*] Cowan et al., EPJ C71 (2011) 1554
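As a quick numerical cross-check of the 1 d.o.f. formula, one can integrate the PDF over an interval and compare with the closed-form tail probability $\mathrm{erfc}(\sqrt{t/2})$. This is a sketch with hypothetical helper names and a simple midpoint rule; it does not reproduce the fit shown on the slide.

```python
import math


def chi2_1dof_pdf(t: float) -> float:
    """f(t) = exp(-t/2) / sqrt(2*pi*t), the chi-square PDF for 1 d.o.f."""
    return math.exp(-0.5 * t) / math.sqrt(2.0 * math.pi * t)


def chi2_1dof_tail(t_obs: float) -> float:
    """Closed-form upper tail P(t >= t_obs) for 1 d.o.f."""
    return math.erfc(math.sqrt(0.5 * t_obs))


def integrate_pdf(lo: float, hi: float, steps: int = 200_000) -> float:
    """Midpoint-rule integral of the PDF between lo and hi (lo > 0)."""
    h = (hi - lo) / steps
    return h * sum(chi2_1dof_pdf(lo + (i + 0.5) * h) for i in range(steps))


if __name__ == "__main__":
    numeric = integrate_pdf(1.0, 9.0)
    exact = chi2_1dof_tail(1.0) - chi2_1dof_tail(9.0)
    print(f"numeric = {numeric:.6f}, closed form = {exact:.6f}")
```

The interval starts at 1 rather than 0 because the PDF has an integrable singularity at the origin, which a plain midpoint rule handles poorly.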
Special case: asymptotic formula by Cowan et al. [*] holds
Consider the special case of the test statistic $t_\mu$ with the purpose to test $\mu = 0$ in a class of models where we assume $\mu \geq 0$. Rejecting $\mu = 0$ (the null hypothesis) leads to the discovery of a new signal.
In this case, following Cowan et al., the test statistic is:
$q_0 = \begin{cases} -2\ln\lambda(0) & \hat{\mu} \geq 0 \\ 0 & \hat{\mu} < 0 \end{cases}$
Cowan et al. derive analytically that the PDF of $q_0$ is an equal mixture of a delta function at 0 & a chi-square distribution for 1 d.o.f.:
$g(q_0|\mu = 0) = \frac{1}{2}\,\delta(q_0) + \frac{1}{2}\left(\frac{1}{\sqrt{2\pi}} \frac{1}{\sqrt{q_0}}\, e^{-q_0/2}\right)$
Let us fix the $m$ & $\Gamma$ parameters (to the CMS estimates from the fit to data) while constraining $\mu \geq 0$ in our ML fits ($\mu$ represents a signal yield here).
By fitting our likelihood ratio distribution with a delta + chi-square p.d.f. we indeed get:
d.o.f. ≈ 0.992 ± 0.001, weight $C_{\chi^2}$ ≈ 0.507 ± 0.01, normalized $\chi^2$ = 1.013, P(fit) = 0.035
[Plot: likelihood ratio distribution ($-2\ln\lambda$) with fit pull]
[*] Cowan et al., EPJ C71 (2011) 1554
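In this half-$\chi^2$ regime the p-value and significance follow directly from $q_0$: the tail of the mixture is half the 1 d.o.f. tail, which gives $Z = \sqrt{q_0}$ (Cowan et al.). The sketch below uses hypothetical function names and an arbitrary example value $q_0 = 25$; it is valid only when $m$ and $\Gamma$ are fixed as described above, not for the fits with free mass and width.

```python
import math


def q0_statistic(minus_two_ln_lambda_0: float, mu_hat: float) -> float:
    """q0 = -2 ln lambda(0) if the fitted yield is non-negative, else 0."""
    return minus_two_ln_lambda_0 if mu_hat >= 0.0 else 0.0


def p_value_from_q0(q0: float) -> float:
    """Asymptotic p-value: half of the chi-square (1 d.o.f.) upper tail."""
    return 0.5 * math.erfc(math.sqrt(0.5 * q0))


def significance_from_q0(q0: float) -> float:
    """Asymptotic significance Z = sqrt(q0)."""
    return math.sqrt(q0)


if __name__ == "__main__":
    q0 = q0_statistic(25.0, mu_hat=1.0)   # illustrative numbers only
    print(f"q0 = {q0}, p = {p_value_from_q0(q0):.3e}, "
          f"Z = {significance_from_q0(q0):.1f} sigma")
```

A downward fluctuation ($\hat{\mu} < 0$) maps to $q_0 = 0$, i.e. to the delta-function half of the mixture.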
Summary & Outlook
Summary
In order to test the computing capabilities of GPUs with respect to traditional CPU cores, a high-statistics toy Monte Carlo technique has been implemented in both the ROOT/RooFit and GooFit frameworks, with the purpose of estimating the local statistical significance of a (possibly exotic, charmonium-like) signal recently confirmed by CMS (it was first observed by CDF).
The optimized GooFit applications running, by means of the MPS, on the GPUs hosted by the servers used in the presented test provide a striking speed-up with respect to the RooFit application parallelized on multiple CPUs by means of PROOF-Lite.
By means of GooFit it has also been easier to explore the (asymptotic) behaviour of a likelihood ratio test statistic in different situations in which the Wilks theorem may apply, or does not apply because its regularity conditions are not satisfied.
Outlook
The presented method can be extended to situations with a new unexpected signal, where a global statistical significance must be estimated.
To properly include the Look-Elsewhere Effect, a sort of scanning technique over the relevant mass spectra needs to be implemented.
This will certainly both…
- increase the execution time of the fits to be performed on each single fluctuation, and…
- require trying different scan models (and repeating the whole procedure) in order to evaluate the associated systematic uncertainty.
In this situation:
- the RooFit approach would be unbearable (highly time-consuming!),
- turning to GPUs would be mandatory,
- GooFit would be the reliable & crucial tool.
If you are interested in starting to learn & work with GooFit …
1) you can take the tutorial by R. Andreassen: http://indico.cern.ch/conferenceDisplay.py?confId=235992
2) GooFit source code lives in a GitHub repository: https://github.com/GooFit
3) you may want to exchange useful feedback on the GooFit Google Group.
Thank you for your attention
Let me thank in particular:
- my supervisor at CMS-Bari: Alexis Pompili (University of Bari & INFN)
- Leonardo Cristella (University of Bari & INFN) & Giacinto Donvito (INFN-Bari, Tier2 manager) & the support by Italian Project 20108T4XTM - MIUR PRIN 2010-2011 - STOA LHC
- Mike Sokoloff (University of Cincinnati), coordinator of the GooFit project funded by NSF (NSF-1414736 Enabling HEP at the Information Frontier Using GPUs and Other Many/Multi-Core Architectures)
- Brad Hittle (Ohio Supercomputer Center) & Tommaso Dorigo (INFN-Padova)
BACKUP
Amdahl's Law
In computer architecture, Amdahl's law gives the theoretical speedup when using multiple processors as a function of the fraction (P) of the code that can be parallelised and of the number of multiprocessors (n) used.
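In its standard form the law reads $S(n) = \frac{1}{(1-P) + P/n}$. A minimal sketch follows; the function name and the example values of P and n are chosen only for illustration and are not measurements from this study.

```python
def amdahl_speedup(parallel_fraction: float, n_processors: int) -> float:
    """Theoretical speedup S(n) = 1 / ((1 - P) + P / n)."""
    if not 0.0 <= parallel_fraction <= 1.0 or n_processors < 1:
        raise ValueError("need 0 <= P <= 1 and n >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_processors)


if __name__ == "__main__":
    # Even with 95% parallel code the speedup saturates below 1/(1-P) = 20.
    for n in (2, 16, 1024):
        print(n, round(amdahl_speedup(0.95, n), 2))
```

The serial fraction $1-P$ sets the ceiling $1/(1-P)$ on the speedup, whatever the number of processors.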