ShoreMT

In-Memory Performance for Big Data
Goetz Graefe, Haris Volos, Hideaki Kimura, Harumi Kuno, Joseph Tucek, Mark Lillibridge, Alistair Veitch
VLDB 2014, presented by Nick R. Katsipoulakis
A Preliminary Experiment
• B-Tree nodes
• 10 GB of memory
• Buffer pool
• Disk pages
• In-memory
• Direct pointers between nodes
Related Work – In-memory databases
• Workload fits
• e.g. Oracle TimesTen, SQL Server Hekaton, MonetDB, SAP HANA, VoltDB, etc.
• Workload does not fit
• OS VM layer
• Poor eviction decisions
• Data integrity issues
• Compression (frozen data)
• Identify hot and cold data
• Stoica and Ailamaki's work on VoltDB
• Decrease the cost of collecting statistics
• Anti-Caching
Motivation
• Combine the best of both worlds
• Near in-memory performance (workload fits)
• Buffer-pool performance (workload does not fit)
• Buffer pool
• Benefits
• Large working sets
• Support for write-ahead logging
• Insulation from cache-coherence issues
• Drawbacks:
• Level of indirection
But first, the System Model
• Transactional storage manager
• ACID guarantees
• Modern hardware (multi-core architecture)
• Data storage
• B-Tree (one node to one disk page)
• Leaf nodes maintain data
• Buffer pool
• Copies of pages
• Latches and locks
• Write-ahead logging
A flashback to the observation of Harizopoulos et al.
• Data set in memory
• Observations
• The buffer manager takes up ~30% of total instructions and cycles
• Idea
• Faster buffer pool
• Correctness guarantees
A closer look – the source of all evil
[Diagram: a lookup passes through the buffer pool's hash table; pages are fetched from “the disk”. Labelled steps:]
1 - lookup(key1, root)
2 - H(root)
3 - Key1 maps to p_id_1
4 - fetch(d_mem_1)
5 - pin(p_id_1, &buffer)
6 - H(p_id_1)
Mapping structure:
long pid_to_mem(long pid) {
…
return mem_addr;
}
long mem_to_pid(long mem_addr) {
…
return page_id;
}
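To make the cost of this indirection concrete, here is a minimal sketch of a page-ID-to-frame mapping, assuming a small fixed-size hash table with linear probing. The names page_map, map_put, map_get and the hash function are hypothetical and chosen only for illustration; this is not Shore-MT's actual structure, and it does not fill in the elided bodies on the slide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy mapping structure: page id -> buffer frame index.
 * Open addressing with linear probing; all sizes are assumptions. */
#define MAP_SLOTS 64            /* must be a power of two */
#define NO_FRAME  (-1)

typedef struct {
    uint64_t pid[MAP_SLOTS];
    int      frame[MAP_SLOTS];  /* NO_FRAME marks an empty slot */
} page_map;

static void map_init(page_map *m) {
    for (size_t i = 0; i < MAP_SLOTS; i++) m->frame[i] = NO_FRAME;
}

/* H(pid): a simple multiplicative hash (an assumption, not Shore-MT's).
 * The top 6 bits of the product select one of the 64 slots. */
static size_t map_hash(uint64_t pid) {
    return (size_t)((pid * 0x9E3779B97F4A7C15ULL) >> 58) & (MAP_SLOTS - 1);
}

/* Record that page `pid` lives in frame `frame`.
 * Returns 0 on success, -1 if the table is full. */
static int map_put(page_map *m, uint64_t pid, int frame) {
    assert(frame >= 0);
    size_t h = map_hash(pid);
    for (size_t i = 0; i < MAP_SLOTS; i++) {
        size_t s = (h + i) & (MAP_SLOTS - 1);
        if (m->frame[s] == NO_FRAME || m->pid[s] == pid) {
            m->pid[s] = pid;
            m->frame[s] = frame;
            return 0;
        }
    }
    return -1;
}

/* The lookup that every page access pays for in a traditional buffer pool.
 * Returns the frame index, or NO_FRAME on a miss (page must be fetched). */
static int map_get(const page_map *m, uint64_t pid) {
    size_t h = map_hash(pid);
    for (size_t i = 0; i < MAP_SLOTS; i++) {
        size_t s = (h + i) & (MAP_SLOTS - 1);
        if (m->frame[s] == NO_FRAME) return NO_FRAME;
        if (m->pid[s] == pid) return m->frame[s];
    }
    return NO_FRAME;
}
```

Every step down the B-Tree repeats map_get for the child's page ID, which is the hash computation H(...) shown twice in the diagram. A real buffer pool would also need latching around the table and support for removing entries, both omitted here.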
Their proposal for improving the buffer pool
• Decrease buffer pool overhead
• Remove the accesses to the common mapping structure
• Pointer swizzling
• Lazy
• Not all page IDs are swizzled
• Contribution
• Buffer pool re-design. Support pointer (un-)swizzling
• Eviction policy
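Lazy swizzling can be illustrated with a short sketch. It assumes a child reference is a 64-bit word whose top bit says whether the remaining bits hold a buffer frame index (swizzled) or an on-disk page ID (not swizzled). The encoding, the names child_ref, resolve and toy_lookup, and the frame descriptor are all hypothetical; toy_lookup merely stands in for the mapping-structure lookup and page fetch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encoding: top bit set => low bits are a frame index;
 * top bit clear => the word is a page id (so page ids must be < 2^63). */
#define SWIZZLED_BIT (UINT64_C(1) << 63)

typedef uint64_t child_ref;

static int is_swizzled(child_ref r) { return (r & SWIZZLED_BIT) != 0; }

static child_ref swizzle(uint32_t frame) { return SWIZZLED_BIT | frame; }

static uint32_t frame_of(child_ref r) {
    assert(is_swizzled(r));
    return (uint32_t)(r & ~SWIZZLED_BIT);
}

/* Hypothetical frame descriptor: remembers the page id so that a
 * swizzled reference can be turned back into a page id. */
typedef struct { uint64_t pid; } frame_desc;

static child_ref unswizzle(child_ref r, const frame_desc *frames) {
    return frames[frame_of(r)].pid;
}

static int slow_lookups = 0;   /* counts trips through the mapping structure */

/* Stand-in for the hash-table lookup plus fetch; purely illustrative. */
static uint32_t toy_lookup(uint64_t pid) {
    slow_lookups++;
    return (uint32_t)(pid % 8);
}

/* Follow the child reference stored in a parent node.
 * Fast path: already swizzled, no mapping-structure access.
 * Slow path: look the page up once, then swizzle the slot in place. */
static uint32_t resolve(child_ref *slot) {
    if (is_swizzled(*slot))
        return frame_of(*slot);
    uint32_t frame = toy_lookup(*slot);
    *slot = swizzle(frame);
    return frame;
}
```

Only references that are actually followed get swizzled, which is why not every page ID in the pool ends up swizzled. The sketch ignores latching and the bookkeeping needed before a swizzled page can be evicted.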
But, wait. What about Virtual Memory?
• Correctness requirements might be violated
• Write too early
• e.g. write a page before the log has concluded
• Write too late
• e.g. miss a checkpoint because a dirty page has not been written to the backing store
• Recycle non-persistent logs
• e.g. a log page is recycled by the OS VM manager, but its changes have not yet been persisted to actual storage
• msync() and mlock() do not support:
• Asynchronous read-ahead
• Concurrent multiple writes
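The "too early" and "recycle" cases can be expressed as two small checks that a buffer manager controls and an OS pager does not. This is a sketch of the textbook write-ahead logging rules under simplifying assumptions (a single log, no active-transaction tracking); the struct fields and function names are hypothetical and not taken from the paper or from Shore-MT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t lsn_t;   /* log sequence number */

/* Hypothetical per-page bookkeeping kept by the buffer manager. */
typedef struct {
    lsn_t page_lsn;  /* LSN of the latest log record applied to the page */
    lsn_t rec_lsn;   /* LSN of the first record that dirtied it since its last write */
    bool  dirty;
} page_state;

/* "Write too early" guard: a page may go to disk only once the log is
 * durable up to the page's latest change. */
static bool may_write_page(const page_state *p, lsn_t durable_lsn) {
    return p->page_lsn <= durable_lsn;
}

/* Oldest log record still needed for redo: the minimum rec_lsn over all
 * dirty pages, or `durable_lsn` when no page is dirty. */
static lsn_t redo_low_water_mark(const page_state *pages, int n,
                                 lsn_t durable_lsn) {
    lsn_t low = durable_lsn;
    for (int i = 0; i < n; i++)
        if (pages[i].dirty && pages[i].rec_lsn < low)
            low = pages[i].rec_lsn;
    return low;
}

/* "Recycle" guard: log space up to and including `upto` may be reused
 * only if no dirty page still depends on it. */
static bool may_recycle_log_upto(lsn_t upto, const page_state *pages, int n,
                                 lsn_t durable_lsn) {
    return upto < redo_low_water_mark(pages, n, durable_lsn);
}
```

An OS virtual-memory manager pages data out without consulting either check, which is the reason the slide lists these as possible violations.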
A look at traditional B-Tree nodes and the buffer pool
Flow-charts for locating pages
Traditional buffer pool:
In-memory:
Proposed buffer-pool design with pointer swizzling
Buffer Pool
Flow-chart
Proposed design with swizzling
• Pointers are swizzled one at a time
• Not all pointers are swizzled
• Pool eviction
• Generalized clock scheme
• Sweep the B-Tree using depth-first search
• Pages with no recent usage are un-swizzled unless they contain swizzled parent-to-child pointers
• Child-to-parent pointers
• Expedite un-swizzling
• Include the parent frame in metadata
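The eviction rules above can be sketched as a toy clock sweep. Two simplifications are assumptions of this sketch rather than the paper's design: the hand walks a flat array of frames instead of sweeping the B-Tree depth-first, and un-swizzling and choosing the victim happen in the same step. The names frame_meta, clock_step and pick_victim are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NFRAMES   4
#define NO_PARENT (-1)

/* Hypothetical per-frame metadata. */
typedef struct {
    int  use_count;          /* generalized clock counter, decremented by the sweep */
    int  swizzled_children;  /* swizzled parent-to-child pointers held by this page */
    int  parent;             /* frame of the parent page (child-to-parent pointer) */
    bool swizzled;           /* does the parent hold a swizzled pointer to this frame? */
} frame_meta;

/* Un-swizzle the parent's pointer to frame f. The parent is found directly
 * through the metadata; rewriting the pointer itself is omitted here. */
static void unswizzle_in_parent(frame_meta *fr, int f) {
    assert(fr[f].swizzled && fr[f].parent != NO_PARENT);
    fr[fr[f].parent].swizzled_children--;
    fr[f].swizzled = false;
}

/* Advance the clock hand by one frame.
 * Returns the frame chosen for eviction, or -1 if this frame is kept. */
static int clock_step(frame_meta *fr, int *hand) {
    int f = *hand;
    *hand = (*hand + 1) % NFRAMES;
    if (fr[f].use_count > 0) {          /* recently used: age it and move on */
        fr[f].use_count--;
        return -1;
    }
    if (fr[f].swizzled_children > 0)    /* children must be un-swizzled first */
        return -1;
    if (fr[f].swizzled)
        unswizzle_in_parent(fr, f);
    return f;
}

/* Sweep until a victim is found or max_steps frames have been visited. */
static int pick_victim(frame_meta *fr, int *hand, int max_steps) {
    for (int i = 0; i < max_steps; i++) {
        int v = clock_step(fr, hand);
        if (v >= 0) return v;
    }
    return -1;
}
```

With a root, one inner node and two leaves, the sweep evicts a cold leaf before its parent, because the parent still holds a swizzled pointer until the leaf has been un-swizzled. The parent field is what lets un-swizzling avoid a search for the page that holds the pointer.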
Experimental Evaluation
• Shore-MT
• Pointer-swizzling buffer pool
• Traditional buffer pool
• In-memory
• Testbed: Intel Xeon (4 sockets, 24 cores), 256 GB RAM, RAID-10 with 10K rpm drives
• 10 GB buffer pool with O_DIRECT enabled
• 100 GB database size
• Key size 20 bytes
• Value size 20 bytes
Buffer pool performance – Query performance
Buffer pool performance – Insert performance
• 24 threads
• 50 million records
• Initially 10 million records
• Randomly chosen keys
Buffer pool performance – Drifting working set
TPC-C Benchmark
Conclusion – Thoughts
• A way to combine the best of both worlds
• In-memory performance (workload fits)
• Buffer pool performance (workload does not fit)
• Questions
• Was the “mapping data structure” really the bottleneck?
• If an NVM database were used, would pointer swizzling be the answer? Do we still need a buffer manager, or do we need a general “memory manager”?
• Thank you!