ShoreMT
In-Memory Performance for Big Data
Goetz Graefe, Haris Volos, Hideaki Kimura, Harumi Kuno, Joseph Tucek, Mark Lillibridge, Alistair Veitch
VLDB 2014, presented by Nick R. Katsipoulakis

A Preliminary Experiment
• B-Tree nodes
• 10 GB of Memory
• Buffer pool
  • Disk pages
• In-Memory
  • Direct pointers between nodes

Related Work – In-memory databases
• Workload fits
  • e.g. Oracle TimesTen, SQL Server Hekaton, MonetDB, SAP Hana, VoltDB etc.
• Workload does not fit
  • OS VM layer
    • Poor eviction decisions
    • Data integrity issues
  • Compression (frozen data)
  • Identify hot and cold data
    • Stoica and Ailamaki work on VoltDB
    • Decrease statistic cost
  • Anti-Caching

Motivation
• Combine best of both worlds
  • Near in-memory performance (workload fits)
  • Buffer-pool performance (workload does not fit)
• Buffer Pool
  • Benefits
    • large working sets
    • support for write-ahead logging
    • Insulation from cache-coherence issues
  • Drawbacks:
    • Level of indirection

But first, the System Model
• Transactional Storage Manager
  • ACID guarantees
  • Modern hardware (multi-core architecture)
• Data Storage
  • B-Tree (one node to one disk page)
  • Leaf nodes maintain data
• Buffer pool
  • copies of pages
• Latches and Locks
• Write-ahead Logging

A flashback at Harizopoulos' et al. observation
• Dataset in-memory
• Observations
  • Buffer manager takes up ~30% of both instructions and cycles total
• Idea
  • Faster buffer pool
  • Correctness guarantees

A closer look – the source of all evil
[Diagram: a key lookup passing through the hash table (mapping structure) and "the disk"]
1 - lookup(key1, root)
2 - H(root)
3 - Key1 maps to p_id_1
4 - fetch(d_mem_1)
5 - pin(p_id_1, &buffer)
6 - H(p_id_1)

long pid_to_mem(long pid) { … return mem_addr; }
long mem_to_pid(long mem_addr) { … return page_id; }

Their proposal for improving the buffer pool
• Decrease buffer pool overhead
  • Remove the accesses to the common mapping structure
• Pointer swizzling
  • lazy
  • not all page-IDs are swizzled
• Contribution
  • Buffer pool re-design. Support pointer (un-)swizzling
  • Eviction policy

But, wait. What about Virtual Memory?
• Correctness requirements might be violated
  • write too early
    • e.g. write a page before the log has concluded
  • write too late
    • e.g. miss a checkpoint because a dirty page has not been written to the backing store
  • recycle non-persistent logs
    • e.g. a log page is recycled by the OS VM manager, but its changes have not yet been persisted to actual storage
• msync() & mlock() do not support:
  • asynchronous read-ahead
  • concurrent multiple writes

A look at traditional B-Tree Nodes and the buffer pool

Flow-charts for locating pages
[Flow charts: Traditional buffer pool; In-memory]

Proposed buffer-pool design with pointer swizzling
[Flow chart: Buffer Pool]

Proposed design with swizzling
• Pointers are swizzled one at a time
• Not all pointers are swizzled
• Pool eviction
  • Generalized clock scheme
  • Sweep B-Tree using depth-first search
  • Pages with no recent usage are un-swizzled unless they contain swizzled parent-to-child pointers
• Child-to-parent pointers
  • Expedite un-swizzling
  • Include parent-frame in metadata

Experimental Evaluation
• Shore-MT
  • pointer-swizzling buffer pool
  • traditional buffer pool
  • in-memory
• Testbed: Intel Xeon (4 socket, 24 cores), 256 GB RAM, RAID-10 with 10K rpm drives
• 10 GB Buffer pool with O_DIRECT enabled
• 100 GB database size
• Key size 20 bytes
• Value size 20 bytes

Buffer pool performance – Query performance

Buffer pool performance – Insert performance
• 24 threads
• 50 million records
• initially 10 million records
• Randomly chosen keys

Buffer pool performance - Drifting working set

TPC-C Benchmark

Conclusion - Thoughts
• A way to combine the best of both worlds
  • In-memory performance (workload fits)
  • Buffer pool performance (workload does not fit)
• Questions
  • Was the "mapping data structure" really the bottleneck?
  • If an NVM database were used, is pointer swizzling the answer? Do we still need a buffer manager, or do we need a general "memory manager"?
• Thank you!
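
Backup sketch: lazy pointer swizzling
The lazy swizzling idea from the proposal slides (swizzle a parent-to-child reference on first use, un-swizzle it before the child is evicted) can be illustrated with a minimal C sketch. Everything here is an assumption made for illustration and is not Shore-MT code: the names `pool_t`, `frame_t`, `follow_child` and `unswizzle_child` are hypothetical, the "hash table" is a linear scan, each page has a single child reference, and the high bit of a 64-bit value is assumed to tag a swizzled reference (so page IDs must keep that bit clear).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_FRAMES 4
/* Assumption: the high bit marks a reference as swizzled (frame index). */
#define SWIZZLED_BIT (UINT64_C(1) << 63)

typedef struct {
    uint64_t page_id;   /* on-disk page ID cached in this frame */
    int      in_use;    /* non-zero if the frame holds a page */
    uint64_t child_ref; /* page ID, or SWIZZLED_BIT | frame index */
} frame_t;

typedef struct {
    frame_t       frames[POOL_FRAMES];
    unsigned long table_lookups; /* accesses to the mapping structure */
} pool_t;

/* Stand-in for the hash table: map a page ID to a frame index. */
static int pid_to_frame(pool_t *pool, uint64_t pid) {
    pool->table_lookups++;
    for (int i = 0; i < POOL_FRAMES; i++)
        if (pool->frames[i].in_use && pool->frames[i].page_id == pid)
            return i;
    return -1; /* not resident; a real pool would fetch from disk */
}

/* Place a page in the first free frame (the disk read is omitted). */
static int load_page(pool_t *pool, uint64_t pid, uint64_t child_pid) {
    for (int i = 0; i < POOL_FRAMES; i++) {
        if (!pool->frames[i].in_use) {
            pool->frames[i] = (frame_t){ .page_id = pid, .in_use = 1,
                                         .child_ref = child_pid };
            return i;
        }
    }
    return -1; /* pool full; eviction is out of scope for this sketch */
}

/* Follow the parent's child reference, swizzling it on first use. */
static int follow_child(pool_t *pool, int parent) {
    uint64_t ref = pool->frames[parent].child_ref;
    if (ref & SWIZZLED_BIT)
        return (int)(ref & ~SWIZZLED_BIT); /* fast path: no table access */
    int child = pid_to_frame(pool, ref);   /* slow path: mapping lookup */
    if (child >= 0)
        pool->frames[parent].child_ref = SWIZZLED_BIT | (uint64_t)child;
    return child;
}

/* Restore the page ID in the parent so the child frame can be evicted. */
static void unswizzle_child(pool_t *pool, int parent) {
    uint64_t ref = pool->frames[parent].child_ref;
    if (ref & SWIZZLED_BIT) {
        int child = (int)(ref & ~SWIZZLED_BIT);
        pool->frames[parent].child_ref = pool->frames[child].page_id;
    }
}
```

With a zero-initialized `pool_t`, loading a root page (ID 1, child ID 2) and a leaf page (ID 2) and then calling `follow_child` on the root twice touches the mapping structure only on the first call. The second call takes the swizzled fast path, which is the saving the proposal is after.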