ElasticSearch Log3

Experiences in Traffic Logging and Visualization with ELK and D3.js
Surasak Sanguanpong
Department of Computer Engineering
Faculty of Engineering, Kasetsart University
U-Bahn Station Candidplatz, Munich, Germany
Tech Talk Session, WUNCA 33rd, Chulalongkorn University, July 14, 2016
In This Talk
About Traffic Log
Search Platform with ELK
Real Time Visualization with D3.js
Lessons Learnt
Log Monitoring
Collecting
Processing
Analysing
Visualising
Image: https://www.flickr.com/photos/sbeebe/4772418919
At What Scale?
Hmm..Large..
http://www.24hourcampfire.com/ubbthreads/ubbthreads.php/topics/5976731/all/That_s_a_load_of_l
Traffic Logging Solution
Splunk?
Great, but commercial and proprietary
Graylog?
Excellent, but too automatic
Elasticsearch, Logstash, Kibana, D3
That's it! A lot of fun to play with
Chapter I
Log Architecture and
Raw Log Management:
A Case Study
Evolution of KU Traffic Logging Design
2008-2015: Simple GUI, MySQL, Raw Log
2015-: Kibana/D3, Elasticsearch, Raw Log
Logging Architecture
Diagram components: Network (mirror packets), Login (login/logout), Logging Engine, Login Log, Web Log, Packet Log, Search GUI
Login Log Format
Date Time Action IP UserName LogServer
Jul 1 10:04:57 login 158.108.X.X [email protected] 192.168.1.1
Jul 1 10:04:58 logout 158.108.X.X [email protected] 192.168.1.2
Jul 1 10:04:59 timeout 158.108.X.X [email protected] 192.168.1.2
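A minimal sketch of how one login log line could be split into named fields, assuming whitespace-separated columns in the order shown above. The function name `parse_login_line`, the `year` argument (the log carries no year), and the placeholder username are illustrative assumptions, not part of the original system.

```python
from datetime import datetime

# Columns that follow the date and time, in the order shown on the slide.
FIELDS = ("action", "ip", "username", "log_server")

def parse_login_line(line, year=2016):
    """Split one login log line into a dict of named fields.

    The log carries no year, so the caller supplies one (an assumption).
    """
    month, day, clock, *rest = line.split()
    if len(rest) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields after the time, got {len(rest)}")
    record = dict(zip(FIELDS, rest))
    record["timestamp"] = datetime.strptime(
        f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S"
    )
    return record

# Placeholder username; the slide's usernames are redacted.
sample = "Jul 1 10:04:58 logout 158.108.X.X user@example.org 192.168.1.2"
print(parse_login_line(sample))
```

Month names are parsed with `%b`, so this assumes an English locale.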
Web Log Format
UnixTime SrcIPv4 SrcIPv6 DstIPv4 DstIPv6 SrcPort DstPort URL Referer/HTTPS
20151103010000 192.55.X.X - 158.108.X.X - 17490 80 mirror1.ku.ac.th/fedoraepel/6/i386/jday-devel-2.4-5.el6.i686.rpm http://mirror1.ku.ac.th/fedoraepel/6/i386/
20151103010000 10.X.X.X - 203.104.175.X - 62635 80 sg-nvapis.line.me/ping?&msgpad=1446487199964&md=9LMRXqv1Nb8P07aj0Vo%3D –
20151103010000 - 2406:3100:1018:1::XX - 2600:1417:a::174c:XX 61154 443 fbcdn-photos-g-a.akamaihd.net HTTPS
20151103010000 - 2406:3100:1018:1::XX - 2a03:2880:f002:105:fa:b0:0:YYXX 59960 443 edge-mqtt.facebook.com HTTPS
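The nine-column layout above can be parsed with a plain whitespace split. The sketch below assumes every record is a single line, that `-` (or an en dash) marks an empty field, and that the first column is a `YYYYMMDDhhmmss` string, as the samples suggest. The name `parse_web_line` and the output keys are hypothetical.

```python
from datetime import datetime

WEB_FIELDS = ("time", "src_ipv4", "src_ipv6", "dst_ipv4", "dst_ipv6",
              "src_port", "dst_port", "url", "referer")

def parse_web_line(line):
    """Parse one web log line; '-' or an en dash marks an empty field."""
    parts = line.split()
    if len(parts) != len(WEB_FIELDS):
        raise ValueError(f"expected {len(WEB_FIELDS)} fields, got {len(parts)}")
    rec = {k: (None if v in ("-", "\u2013") else v)
           for k, v in zip(WEB_FIELDS, parts)}
    # The sample timestamps look like YYYYMMDDhhmmss, not epoch seconds.
    rec["time"] = datetime.strptime(rec["time"], "%Y%m%d%H%M%S")
    rec["src_port"] = int(rec["src_port"])
    rec["dst_port"] = int(rec["dst_port"])
    # For HTTPS traffic the last column holds the literal marker "HTTPS".
    rec["is_https"] = rec["referer"] == "HTTPS"
    return rec

sample = ("20151103010000 - 2406:3100:1018:1::XX - 2600:1417:a::174c:XX "
          "61154 443 fbcdn-photos-g-a.akamaihd.net HTTPS")
print(parse_web_line(sample))
```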
Packet Log Format (Header Log)
TimeStamp SrcIP DstIP Size Proto SrcPort DstPort [Flag]
2009-07-16 17:53:59.999206 208.117.8.X 158.108.234.X 1514 TCP 80 1371 0x10
2009-07-16 17:53:59.999209 158.108.2.X 202.143.136.X 90 UDP 123 123
TimeStamp SrcIP DstIP Proto Code
2009-07-16 17:53:59.999210 158.108.184.X 218.164.54.X ICMP 168
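A sketch of a parser for the two header layouts, under the assumption (read off the sample rows) that TCP/UDP lines are ordered Size, Proto, SrcPort, DstPort, optional Flag, and ICMP lines are Proto, Code. The function name and output keys are hypothetical.

```python
from datetime import datetime

def parse_packet_line(line):
    """Parse one packet header log line (TCP/UDP or ICMP layout)."""
    date, clock, src, dst, *rest = line.split()
    rec = {
        "timestamp": datetime.strptime(f"{date} {clock}", "%Y-%m-%d %H:%M:%S.%f"),
        "src_ip": src,
        "dst_ip": dst,
    }
    if rest[0] == "ICMP":
        # ICMP layout: Proto Code
        rec.update(proto="ICMP", code=int(rest[1]))
    else:
        # Assumed TCP/UDP layout: Size Proto SrcPort DstPort [Flag]
        size, proto, sport, dport, *flag = rest
        rec.update(size=int(size), proto=proto,
                   src_port=int(sport), dst_port=int(dport),
                   flags=int(flag[0], 16) if flag else None)
    return rec

tcp = parse_packet_line(
    "2009-07-16 17:53:59.999206 208.117.8.X 158.108.234.X 1514 TCP 80 1371 0x10")
udp = parse_packet_line(
    "2009-07-16 17:53:59.999209 158.108.2.X 202.143.136.X 90 UDP 123 123")
icmp = parse_packet_line(
    "2009-07-16 17:53:59.999210 158.108.184.X 218.164.54.X ICMP 168")
print(tcp, udp, icmp, sep="\n")
```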
Example of Log Folder
Time-based hierarchical folder: Year / Month / Day / Hour, with one file per minute
2015/01/01/00/
201501010000.txt
201501010001.txt
:
201501010059.txt
2015/01/01/23/
201501012300.txt
201501012301.txt
:
201501012359.txt
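The folder layout maps a timestamp directly to a file path. A minimal sketch, assuming zero-padded Year/Month/Day/Hour directories and a `YYYYMMDDhhmm.txt` file name as in the listing; the root directory `/logs/http` and the function name are hypothetical.

```python
from datetime import datetime
from pathlib import PurePosixPath

def minute_log_path(root, when):
    """Return root/YYYY/MM/DD/HH/YYYYMMDDhhmm.txt for the given minute."""
    return (PurePosixPath(root)
            / when.strftime("%Y") / when.strftime("%m")
            / when.strftime("%d") / when.strftime("%H")
            / (when.strftime("%Y%m%d%H%M") + ".txt"))

# Hypothetical root directory.
print(minute_log_path("/logs/http", datetime(2015, 1, 1, 23, 59)))
```

Because the path is a pure function of the timestamp, a search over a time range only needs to open the files whose names fall inside that range.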
Minutely HTTP Log
11 days (11 x 24 x 60 = 15,840 data points)
Request Rate and Log Sizing
Accumulated Log Request and Size
Records Size #Files
20M 2.04 GB 120
14.1B 2.57 TB 172,800
3.27T 28.03 TB 172,800
Log Processing and Search Services
• On-the-fly text-based log to MySQL converter
• Slow processing/searching time
• Simple search
Chapter II
ELK Stack Testbed
What is Elasticsearch?
Real-time Search/Analytic Engine SW
Document-Oriented
REST API & JSON
Java/Lucene based
Distributed
Scalable
Plugin Architecture
Open Source, Apache 2 License
REST: Representational State Transfer
JSON: JavaScript Object Notation
What does Elasticsearch offer?
Full Text Search
Very Fast
Fault Tolerance
High Availability
How is the world using Elasticsearch?
Full-text search with highlighted search snippets
Providing search across GitHub's code
Analytics solution on 40 million documents per day to deliver real-time visibility
Full-text search to find related questions and answers
Elasticsearch and Big Data
ES-Hadoop: Connectivity of Hadoop's big data analytics and the real-time search of Elasticsearch.
https://www.elastic.co/products/hadoop
ELK stack from Elastic
Logstash: Log transport and processing daemon
Elasticsearch: High-performance scalable search engine
Kibana: Visualisation dashboard
ELK Stack
Logstash
Log aggregator and parser
Transferring parsed data to Elasticsearch
Configuration file for specifying input, filtering (parsing) and output
input { stdin { } }
filter {
  grok {
    # Parse each line as an Apache combined-format access log entry
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    # Use the parsed request time as the event timestamp
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}
output {
  elasticsearch { hosts => ["localhost:9200"] }
  stdout { codec => rubydebug }
}
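To make the grok and date steps concrete, here is a rough Python equivalent of what they do to one Apache combined-format line. It is only an illustration of the parsing idea, not how Logstash is implemented. The regular expression is a simplified stand-in for the grok pattern, and the field names are chosen for readability.

```python
import re
from datetime import datetime

# Simplified stand-in for the combined log format (not the real grok pattern).
COMBINED = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_combined(line):
    """Return a dict of fields for one access log line, or None if no match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    event = m.groupdict()
    # Equivalent of the date filter: turn the text timestamp into a datetime.
    event["@timestamp"] = datetime.strptime(
        event["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    return event

sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
          '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"')
print(parse_combined(sample))
```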
Kibana
General purpose query UI
Includes many widgets
Query Elasticsearch without coding
Alternative Stack
ELK
EFK
Elasticsearch Indexing Performance
• Single Dell R220
• Xeon E3-1271v3 3.6 GHz 4C/8T
• 32 GB RAM
• 2x6 TB NLSAS
• Elasticsearch 2.3.2
• 10 Shards/0 Replica
• Hyper-threading off
• Web Log Indexing
Figure: Daily Indexing Performance (#Records in millions and Records/s in thousands, days 1-25)
Search Performance
Search keyword: “face” against each daily log
Not yet optimized
Figure: Search Performance and Hits (search time in ms and number of hits per daily log, days 1-19)
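A keyword search like the one measured above is sent to Elasticsearch as a JSON body over its REST API. The sketch below only builds the request and does not send it. The index name `weblog-2015.11.03` and the field name `url` are hypothetical, since the talk does not give the actual index naming or mapping.

```python
import json
from urllib.request import Request

def build_search_request(base_url, index, keyword, field="url"):
    """Build (but do not send) a full-text match query for one daily index."""
    body = {"query": {"match": {field: keyword}}, "size": 10}
    return Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical index and field names.
req = build_search_request("http://localhost:9200", "weblog-2015.11.03", "face")
print(req.get_method(), req.full_url)
print(req.data.decode())
```

Passing the request to `urllib.request.urlopen` would execute it against a running node. The JSON response includes a `took` field giving the query time in milliseconds.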
Kibana: Main Dashboard
Kibana: per IP Log
Kibana: Login Profile
Kibana: Concurrent Login View
Chapter III
Playing with D3.js
Real Time Visualization with D3.js
• Data-Driven Documents (D3)
• JavaScript library for manipulating documents based on data
• Developed by Mike Bostock
https://d3js.org/
D3 Architecture
• Input data to build visualizations (JSON, CSV, …)
• Dynamic manipulation of HTML elements with JavaScript
node.js
socket.io
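On the server side, the JSON input for a visualization can be prepared in any language. As a sketch, the Python below turns a list of requested hostnames into a nested name/children/value tree, a shape that D3's hierarchical layouts (a treemap, for example) commonly consume. Grouping by the last domain label and the function name `host_hierarchy` are illustrative choices, not the project's actual pipeline.

```python
import json
from collections import Counter

def host_hierarchy(hosts):
    """Group hostnames by their last domain label into a nested tree."""
    counts = Counter(hosts)
    by_tld = {}
    for host, n in counts.items():
        tld = host.rsplit(".", 1)[-1]
        by_tld.setdefault(tld, []).append({"name": host, "value": n})
    return {
        "name": "web access",
        "children": [{"name": tld, "children": kids}
                     for tld, kids in sorted(by_tld.items())],
    }

hosts = ["sg-nvapis.line.me", "edge-mqtt.facebook.com",
         "edge-mqtt.facebook.com", "mirror1.ku.ac.th"]
tree = host_hierarchy(hosts)
print(json.dumps(tree, indent=2))
```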
Sample Gallery
Real-time makes impression
Norse Live Attack Map
http://map.norsecorp.com/#/
Concurrent Login
IP Matrix Occupied
Tree Map Web Access
Traffic Connectivity
Chapter IV
New Log Design
New Logging Architecture
Diagram components: Network (mirror packets), Login (login/out event), DHCP, RADIUS, Logging Engine, Session Tracking & Accounting, Login Log, Web Log, Flow Log, Elasticsearch (real-time indexing), Elasticsearch GUI/Analytics
Logging Redesign
User identification
Legal Logging
Real-time Accounting
User Session Control
Traffic Analytics
SIEM Supports
Performance Management
New Login Log Format
• Real-time logging, one file per day
• Fields
login_session_id user login_timestamp logout_timestamp mac_address ipv4 ipv6
agent_ip agent_type via_ip ipv4_byte_in ipv4_byte_out
ipv4_pkt_in ipv4_pkt_out ipv6_byte_in ipv6_byte_out ipv6_pkt_in ipv6_pkt_out
• Sample Log
67686345 [email protected] 1467551484.163681 0 001122334455 192.0.2.1 2001:db8::1 203.0.113.5 login – 0 0 0 0 0 0 0 0
67686346 [email protected] 1467551490.524125 0 - 192.0.5.5 - 203.0.113.1 login – 0 0 0 0 0 0 0 0
67686345 [email protected] 1467551484.163681 1467551833.754636 001122334455 192.0.2.1 2001:db8::1 203.0.113.5 login – 234342 423442 5522 6622 233456 22334 445 665
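A sketch of reading one record of the new login log into typed fields, assuming the 18 whitespace-separated columns listed under Fields and one record per line. Treating a `logout_timestamp` of 0 as "session still open" is an assumption, and the username is a placeholder because the slide's usernames are redacted.

```python
LOGIN_FIELDS = (
    "login_session_id", "user", "login_timestamp", "logout_timestamp",
    "mac_address", "ipv4", "ipv6", "agent_ip", "agent_type", "via_ip",
    "ipv4_byte_in", "ipv4_byte_out", "ipv4_pkt_in", "ipv4_pkt_out",
    "ipv6_byte_in", "ipv6_byte_out", "ipv6_pkt_in", "ipv6_pkt_out",
)
COUNTERS = LOGIN_FIELDS[10:]  # the eight byte/packet counters

def parse_session(line):
    """Parse one login session record and derive its duration in seconds."""
    parts = line.split()
    if len(parts) != len(LOGIN_FIELDS):
        raise ValueError(f"expected {len(LOGIN_FIELDS)} fields, got {len(parts)}")
    rec = dict(zip(LOGIN_FIELDS, parts))
    rec["login_timestamp"] = float(rec["login_timestamp"])
    rec["logout_timestamp"] = float(rec["logout_timestamp"])
    for name in COUNTERS:
        rec[name] = int(rec[name])
    # Assumption: logout_timestamp == 0 means the session is still open.
    rec["duration"] = (rec["logout_timestamp"] - rec["login_timestamp"]
                       if rec["logout_timestamp"] else None)
    return rec

closed = parse_session(
    "67686345 user@example.org 1467551484.163681 1467551833.754636 "
    "001122334455 192.0.2.1 2001:db8::1 203.0.113.5 login \u2013 "
    "234342 423442 5522 6622 233456 22334 445 665")
print(closed["duration"], closed["ipv4_byte_in"])
```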
New Web Log Format
• Real-time logging, one file per minute
• Fields
request_timestamp {flow link fields} {login link fields} {ip info fields}
{tcp info fields} method host path referrer agent
• Sample Log
554455 1467551484.180000 67686345 [email protected] 1467551484.163681 4 192.0.2.1 198.51.100.1 tcp 5566 80 GET www.domain.com /index.html - “Linux”
Traffic Flow Log
• Logs are committed periodically (configurable interval, 1 minute to 1 hour)
• Fields
flow_id flow_start_timestamp {segment info fields} {login link fields}
{ip info fields} {tcp info fields} {tcp additional info fields} {tcp stat fields}
• Sample Log
554455 1467551484.180000 1467551484.180000 1467551492.954258 18 20 1628 25456 223344 f 67686345 [email protected] 1467551484.163681 4 192.0.2.1 198.51.100.1 tcp 5566 80 1 - 1428 1428 864 24522 3 17 2 2 0 30000 0 30000
Chapter V
Lessons Learned
Lessons Learned
Elasticsearch offers very fast full-text search services
Index size may be 3x to 5x larger than the source data
Use Elasticsearch for search services, not for data archiving
Lessons Learned
Logstash: A powerful tool for manipulating logs
Kibana: Simple and useful for visualizing data
D3 pros
Flexible, Fascinating Visualization
D3 cons
Low Level, Steep Learning Curve, CPU intensive
Lessons Learned
Combination of
Lawful Log,
Security information and event management (SIEM) and
Accounting
Thank you for your attention
Q & A Time
Core Log Design and Development: Kasom Koth-Arsa
Web GUI Development: Jautuporn Chuchuay, Peerapol Boonthaganon
Web/Elasticsearch Development: Sataporn Techaaramwong
Kibana Development: Peerapong Thongpubeth, Jiradech Sirijantadilok
D3 Development: Poomipat Thongudom, Nichapat Nattee
Project Coordinator: Surachai Chitpinijyol
Project Director: Surasak Sanguanpong
Sunset at Narita Airport
Special Thanks to Kasetsart Office of Computer Services for supporting traffic data
