Large-scale log compressing system based on differential compression

Journal on Communications, Vol.36 No.Z1, November 2015
doi:10.11959/j.issn.1000-436x.2015300

TANG Qiu1,2,3, JIANG Lei1,2, DAI Qiong1,2
(1. National Engineering Laboratory for Information Security Technologies, Beijing 100093, China; 2. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China; 3. University of Chinese Academy of Sciences, Beijing 100049)

Abstract: The volume of log data produced by large-scale information systems is growing rapidly, which makes compressing and storing these logs at line speed a major challenge. An analysis of massive network log data shows that logs are redundant in structure and that the content of logs close together in time is similar. Based on these observations, a template-based differential log compression architecture is proposed, in which fine-grained differential compression strategies can be configured for a specific kind of log data. Experimental results show that, compared with gzip, the proposed architecture compresses 2~10 times faster and improves the compression ratio by close to 10%.
Key words: log; differential compression; fine grain; template

Foundation Item: Special Pilot Research of the Chinese Academy of Sciences (XDA06031000)

1 Introduction

Log data produced by large-scale information systems is widely collected and analyzed [1~4], and its volume makes storage a problem. General-purpose compressors are the usual answer; for example, the Linux log management tool logrotate compresses logs with gzip. Methods designed specifically for log data include the Web log compression of reference [5], the work on Apache Web logs in references [6~8], the firewall log compression of Hätönen et al. [9], the DNS log compression algorithm of reference [10], the approach of reference [11], which divides log data into homogeneous buckets and then compresses each bucket with bzip2 or gzip, and the FPGA implementation of LZ4 in reference [12].

2 Log format analysis

The most common log format is syslog [13], defined in RFC 3164. A syslog message has three parts: a priority value that combines facility and severity, a header that carries the timestamp and the host name, and the message content. Some devices use formats of their own; firewalls, for example, often use the WELF log format proposed by NetIQ.
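The analysis in this paper is based on 21 GB of log data collected from firewalls, IDS, VPN and other systems. The data shows three characteristics.

1) Similar formats. Most of the log data follows either the syslog format of RFC 3164 or a vendor format such as WELF. As Table 1 shows, a log message is a sequence of fields, and each field is a pair of field name and field value.

2) Redundant patterns. Log messages of the same type share a fixed pattern. In Table 1, messages 1 and 2 follow the pattern "ASA*Deny IP due to Land Attack from*to*", and messages 3 and 4 follow the pattern "id=*time=*fw=*pri=*".

3) Temporal locality and similarity of field values. In messages of the same type, fields such as the time, the MAC address and the IP address take values that are equal or close to those of recently recorded messages.

Table 1 Log samples
No.  Source    Sample
1    Firewall  Apr 29 2015 23:49:55 ASA5585: %ASA-2-106017: Deny IP due to Land Attack from 10.0.1.5 to 10.0.1.6
2    Firewall  Apr 29 2015 23:00:09 ASA5585: %ASA-2-106017: Deny IP due to Land Attack from 10.0.1.7 to 10.0.1.8
3    Firewall  id=tos time="2015-04-30 08:01:49" fw=TopsecOS pri=6 type=conn recorder=session src=10.0.1.5 dst=10.0.1.6 proto=tcp sport=49726 dport=10050 inpkt=9 outpkt=10
4    Firewall  id=tos time="2015-04-30 15:01:49" fw=TopsecOS pri=6 type=conn recorder=session src=10.0.1.5 dst=10.0.1.6 proto=tcp sport=2450 dport=88 inpkt=5 outpkt=4
5    IDS       time:2015-04-30 12:11:02;danger_degree:1;breaking_sighn:0; event:[30061]DNS …; src_addr: 10.0.1.5;src_port:58729;dst_addr: 10.0.1.6

3 Template-based differential log compression

3.1 Template extraction

Based on the analysis of the 21 GB of log data in Section 2, the redundant pattern of each type of log is extracted as a template. The following definitions are used.

Definition 1 (field). A field consists of a field name $fld$ and a field value $val$: $field = fld\,\lambda\,val$, where $\lambda$ is the separator between the field name and the field value. For example, the source IP address field of the firewall connection logs in Table 1 is "src=x.x.x.x", where src is the field name, "x.x.x.x" is the field value, and "=" is the separator.

Definition 2 (log message, log_msg). A log message consists of several fields joined by the field separator $\theta$: $log\_msg = field_1\,\theta\,field_2\,\theta \cdots \theta\,field_N$, where $field_i = fld_i\,\lambda\,val_i$.

Definition 3 (log). A log is a set of log messages: $log = \{log\_msg_i \mid i \in \{1,2,\dots,M\};\ log\_msg_i = field_1^i\,\theta\,field_2^i\,\theta \cdots \theta\,field_N^i;\ field_j^i = fld_j^i\,\lambda\,val_j^i,\ j \in \{1,2,\dots,N\}\}$.

Two messages of the same type, $log\_msg_1$ and $log\_msg_2$, share the part $fld_1\lambda,\ \theta fld_2\lambda,\ \dots,\ \theta fld_N\lambda$ and differ only in their field values $val_1, val_2, \dots, val_N$. The shared part is stored once, as a template.

Definition 4 (template). A template consists of the field names, the two separators and a template ID $tid$: $template = \{tid, \lambda, \theta, fld_1, fld_2, \dots, fld_N\}$.

Template-based compression of a log then proceeds as follows.

Step1 Read the log.
Step2 For each log message $log\_msg_i$:
1) template matching: find the template $template_k$ that $log\_msg_i$ matches;
2) remove the fixed part given by $template_k$ from $log\_msg_i$ and put the template ID in front of what remains, so that $log\_msg_i$ is stored as "$tid_k\,\theta\,val_1^i\,\theta\,val_2^i\,\theta \cdots \theta\,val_N^i$".

3.2 Fine-grained differential encoding

Template-based compression removes the redundant pattern, but the field values that remain still show temporal locality and similarity. To reduce the storage further, each field is given its own differential strategy diff_strgy, chosen according to the characteristics of its values, and each field value is encoded against the value of the same field in the previous log message of the same type.

3.2.1 Field differential encoding descriptor (FFDE)

The fine-grained field differential encoding (FFDE) of a field is described by a five-tuple $ffde = (fld, fld\_type, diff\_strgy, initVal, size)$ with the following elements.

fld: the field name.

fld_type: the type of the field value, either string or integer. Integers are 8-bit, 16-bit, 32-bit or 64-bit.

diff_strgy: the differential strategy of the field. The system provides the 4 strategies listed in Table 2; fields with different characteristics use different strategies, and the fourth strategy can be applied to a field of any type.

Table 2 Differential strategies
Strategy  Description
const     The field value is constant, so it is not stored.
copy      If the current field value val equals the value val' of the same field in the previous log message of the same type, nothing is stored; otherwise the current value is stored.
delta     The difference $\Delta = val_i - val'$ between the current field value and the value val' of the same field in the previous log message of the same type is stored instead of the value itself. This suits fields whose values change by small amounts, such as timestamps like val = 20150501001324 and val' = 20150501002326.
other     The field value is processed by another compression method.

initVal: the initial value of the field. When the first log message is encoded, initVal is used as the previous value val'. When the strategy of the field is "const", initVal is the value of the field.

size: the length of the encoded field value. An encoded value is stored with either fixed-length or variable-length coding. With fixed-length coding it occupies size bytes. When size=0, variable-length coding with LEB128 [14] is used: each byte carries 7 bits of data and its highest bit is a flag, which is 0 in the last byte of a value. Fixed-length coding suits values with a known range; each segment of an IP address lies in 0~255 and fits in 1 byte, so an IP address takes 4 bytes. Variable-length coding suits the other cases and saves space when values are small: a 32-bit integer whose value lies in [0, 127] takes 1 byte instead of 4.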
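3.2.2 Field value presence bitmap (FVPB)

After differential encoding, the value of a field is in one of 3 presence states: present, absent, or conditionally present. "Present" means that a value is always stored for the field, as with the "delta" strategy. "Absent" means that no value is ever stored, as with the "const" strategy. "Conditionally present" means that a value is stored for some messages only, as with the "copy" strategy: nothing is stored when the value equals that of the previous message, and the value is stored otherwise. The encoded form of a log message therefore does not always contain the same number of field values.

To make the encoded message decodable, each encoded log message carries a field value presence bitmap (FVPB). Each bit of the FVPB corresponds to one "conditionally present" field of the log: a bit set to 1 means that the field value is present, and a bit set to 0 means that it is absent. Fields that are "present" or "absent" need no bit in the FVPB.

3.3 Compression algorithm

Sections 3.1 and 3.2 are combined as follows: template matching removes the redundant pattern, and fine-grained differential encoding is then applied to the field values of the log.

Definition 5 (fine-grained differential encoding template, template'). A fine-grained differential encoding template consists of the field value presence bitmap layout, the differential encoding descriptors of the fields, the template ID and the separators: $template' = \{ffde_{tid}, \lambda, \theta, ffde_1, ffde_2, \dots, ffde_N\}$. The template ID is treated as a special field $ffde_{tid}$ that uses the "copy" strategy, which removes the repeated template IDs of consecutive log messages of the same type.

Because each field value $val$ is encoded against the value $val'$ of the same field in the previous log message of the same type, the algorithm maintains an encoding dictionary that stores the field values of the previous message of each type. Entry $dict[k][0..N]$ belongs to template $k$ and is initialized with the initial values initVal of the template (step 1). After a field of a log message has been encoded, its value replaces the dictionary entry (step 6).

Algorithm 1 Log compression based on fine-grained differential encoding templates
Input: the log $log = \{log\_msg_i \mid i = 1,2,\dots,M\}$; the set of encoding templates $temp\_set = \{template'_k \mid k = 1,2,\dots,K\}$, where $template'_k = \{ffde_{tid}^k, \lambda, \theta, ffde_1^k, \dots, ffde_N^k\}$ and $ffde_j^k = (fld_j^k, fld\_type_j^k, diff\_strgy_j^k, initVal_j^k, size_j^k)$, $j = 1,2,\dots,N$
Output: the encoded log data
1) initialize the field dictionary of every template: $dict[k][0..N] = \{k, initVal_1^k, \dots, initVal_N^k\}$;
2) for each log message $log\_msg_i$ do
3) match $log\_msg_i$ against the patterns and find its template $template'_k$;
4) for the template ID $k$ and each field value $val_j^i$ of $log\_msg_i$ do
5) encode $val_j^i$ against $dict[k][j]$ according to $diff\_strgy_j^k$; if the field is "conditionally present", set $FVPB[j'] = 1$ when an encoded value is produced and $FVPB[j'] = 0$ otherwise; /* $j'$ is the bit of the current field in the field value presence bitmap */
6) update the dictionary with the current value: $dict[k][j] = val_j^i$;
7) output the FVPB and the encoded field values.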
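Algorithm 1 and the variable-length coding of Section 3.2.1 can be illustrated with a short Python sketch. It is not the authors' implementation and rests on several assumptions: the class `FFDE` keeps only three of the five descriptor elements, the functions `leb128`, `parse` and `compress` and the sample log are hypothetical, template matching is reduced to a lookup on the field names, the template ID is emitted directly instead of being "copy"-encoded, and the "other" strategy is left out. The unsigned LEB128 shown here covers non-negative values only; negative differences would need a signed variant.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class FFDE:
    """Reduced field descriptor (fld, diff_strgy, initVal); hypothetical helper."""
    fld: str
    diff_strgy: str  # "const", "copy" or "delta"
    init_val: str


def leb128(n: int) -> bytes:
    """Unsigned LEB128: 7 data bits per byte, high bit 0 only in the last byte."""
    if n < 0:
        raise ValueError("unsigned LEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # last byte of the value
            return bytes(out)


def parse(msg: str, lam: str = "=", theta: str = " ") -> Tuple[Tuple[str, ...], List[str]]:
    """Split a message into its field names and its field values."""
    pairs = [field.split(lam, 1) for field in msg.split(theta)]
    return tuple(p[0] for p in pairs), [p[1] for p in pairs]


def compress(log: List[str], templates: Dict[int, List[FFDE]]) -> list:
    # Assumption: a template is matched by its exact tuple of field names.
    by_names = {tuple(f.fld for f in t): k for k, t in templates.items()}
    # Step 1: dictionary of previous field values, one row per template.
    prev = {k: [f.init_val for f in t] for k, t in templates.items()}
    encoded_log = []
    for msg in log:                                    # step 2
        names, vals = parse(msg)
        k = by_names[names]                            # step 3
        fvpb, encoded = [], []
        for j, (ffde, val) in enumerate(zip(templates[k], vals)):  # step 4
            if ffde.diff_strgy == "const":             # absent: never stored
                pass
            elif ffde.diff_strgy == "copy":            # conditionally present
                changed = val != prev[k][j]
                fvpb.append(1 if changed else 0)
                if changed:
                    encoded.append(val)
            elif ffde.diff_strgy == "delta":           # present: stored as a difference
                encoded.append(int(val) - int(prev[k][j]))
            else:
                raise ValueError(f"unknown strategy {ffde.diff_strgy!r}")
            prev[k][j] = val                           # step 6
        encoded_log.append((k, fvpb, encoded))         # step 7
    return encoded_log


# Hypothetical template and log, loosely modelled on the firewall samples in Table 1.
templates = {1: [FFDE("fw", "const", "TopsecOS"), FFDE("src", "copy", ""),
                 FFDE("time", "delta", "0"), FFDE("inpkt", "delta", "0")]}
sample_log = ["fw=TopsecOS src=10.0.1.5 time=1430380909 inpkt=5",
              "fw=TopsecOS src=10.0.1.5 time=1430380912 inpkt=9"]
result = compress(sample_log, templates)
```

In this sketch the second message shrinks to its template ID, a one-bit FVPB and two small differences, each of which `leb128` stores in a single byte. A real encoder would also serialize the FVPB and the string values, and would check that a "const" field really holds its initial value.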
4 Experiments

The proposed template-based fine-grained differential compression system is compared with the general-purpose tool gzip. gzip is chosen because the widely used log management tool logrotate compresses logs with gzip, and because most of the log compression literature [6~9] also uses gzip as its baseline. Three different gzip modes are used in the comparison.

The experiments run on a machine with an Intel Core i3-3240 processor and 2 GB of memory under Fedora 14. The data sets are taken from the firewall connection logs of the collected log data; the 4 data sets are listed in Table 3.

Table 3 Data sets
Data set      Number of log messages (×10^4)  Size/KB
fw_log(1w)    1                               607
fw_log(10w)   10                              6 071
fw_log(20w)   20                              12 143
fw_log(150w)  150                             94 831

Fig. 1 shows that the proposed system compresses 2~5 times as fast as gzip. Fig. 2 shows that its compression ratio is better than that of gzip in the 3 modes by about 10.5%. Fig. 3 compares the two methods on the fw_log(150w) data set: the proposed system is 2~10 times as fast as gzip, and its compression ratio is also better than that of gzip.

5 Conclusion

The template-based differential log compression system first removes the redundant pattern of the log data. It then exploits the temporal locality and similarity of the log data by configuring a differential strategy for each field according to the characteristics of its values, which further reduces the storage the log data needs. Compared with gzip, the proposed system has a clear advantage in speed. Because the differential strategies are configurable, the method is flexible and can be applied to any kind of log.

References

[1] YEN T F, et al. Beehive: large-scale log analysis for detecting suspicious activity in enterprise networks[A]. Proceedings of the 29th Annual Computer Security Applications Conference[C]. 2013. 199-208.
[2] BREIER J, BRANIŠOVÁ J. Anomaly detection from log files using data mining techniques[A]. Information Science and Applications[C]. 2015. 449-457.
[3] DUMAIS S, et al. Understanding user behavior through log data and analysis[A]. Ways of Knowing in HCI[C]. 2014. 349-372.
[4] SRIVASTAVA M, GARG, MISHRA P K. Analysis of data extraction and data cleaning in Web usage mining[A]. Proceedings of the 2015 International Conference on Advanced Research in Computer Science Engineering Technology (ICARCSET 2015)[C]. 2015. 1-6.
[5] SKIBIŃSKI P, SWACHA J. Fast and efficient log file compression[A]. Proceedings of CEUR Workshop of 11th East-European Conference on Advances in Databases and Information Systems (ADBIS 2007)[C]. 2007.
[6] GRABOWSKI S, DEOROWICZ S. Web log compression[J]. Automatyka/Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie, 2007, (11): 417-424.
[7] DEOROWICZ S, GRABOWSKI S. Efficient preprocessing for Web log compression[J]. International Journal of Computing, 2008, 7(1): 35-42.
[8] DEOROWICZ S, GRABOWSKI S. Sub-atomic field processing for improved Web log compression[A]. Proceedings of IEEE International Conference on Modern Problems of Radio Engineering, Telecommunications and Computer Science[C]. 2008. 551-556.
[9] HÄTÖNEN K, et al. Comprehensive log compression with frequent patterns[A]. Data Warehousing and Knowledge Discovery[C]. 2003. 360-370.
[10] WANG Y F, WANG Z, YAN B P. High efficient DNS log compression algorithm[J]. Computer Engineering, 2010, 36(15): 32-35.
[11] CHRISTENSEN R. Improving compression of massive log data[EB/OL]. http://www.erg.utal.edu, 2013.
[12] JANG J H, et al. Accelerating forex trading system through transaction log compression[A]. SoC Design Conference (ISOCC), 2014 International[C]. IEEE, 2014. 24-75.
[13] LONVICK C. RFC 3164: The BSD Syslog Protocol[S]. Network Working Group.
[14] LEB128[EB/OL]. http://en.wikipedia.org/wiki/LEB128, 2015.