CyberEagle: Automated Discovery, Attribution, Analysis and Risk Assessment of Information Security Threats
Summer 2011 Company Meeting
Saby Saha, Narus
Lei Liu, Michigan State University
Prakash Mandayam, Michigan State University
CyberEagle
• Motivation and Challenges
• Project Layout
• Architecture
• Statistical Machine Learning / Data Mining
• Results
• Conclusion & Future Work
Increasing Security Threats
• Zeus: 3.6 million machines [HTML injection]
• Koobface: 2.9 million machines [social networking sites]
• TidServ: 1.5 million machines [email spam attachment]
• Continuous and increasing attacks on infrastructure
• Threats to business and national security
• Huge financial stakes (Conficker: 10 million machines, $9.1 billion loss)
• Attacks are becoming more advanced and sophisticated
• Honeypots, IDS/IPS, and email/IP reputation systems are inadequate
More Sophisticated Attacks
Host-Based Security
• Complete monitoring of end-host behavior and system state
• Often analyzes a malware program in a controlled environment to build a model of its behavior
• Pros
– Information-rich view: high detection rate with a low false positive rate
– Can reverse engineer the properties of the threat
• Cons
– After-the-fact approach
• Requires the malicious code for analysis
– Fails to identify evolved threats
– Not effective at identifying zero-day threats
Network Security
• Firewall systems
• IDS/IPS
• Network behavior anomaly detection (NBAD)
• Pros:
– Complete macro view of the network
– With knowledge of good traffic, it can identify anomalies
– Able to identify new threats as anomalies
• Cons:
– Generates a large number of false positives
– Unsupervised approach, lacks ground truth
Bringing Them Together
• Leverage the advantages of both approaches
• Host security tags flows with threat signatures
– Generates ground truth associated with the flows
• Network security can then learn a rich statistical model for all threats from the flow data tagged with ground truth
• Develop a comprehensive end-to-end data security system for real-time discovery, analysis, and risk assessment of security threats
Enhanced Comprehensive Security System
• Discover common and persistent behavioral patterns for all security threats
– Even when sessions are encrypted (where IDS/IPS fails)
• Generate precise threat alerts in real time
– Reduce the false positive rate
• Identify new threats that have some similarity to previous ones
– Newly evolved version of a threat
– New threat with a similar behavioral pattern
• Inform the host security system about newly identified threats
System Overview
Model Generation
Classification
Validation
Assessment
Information Flow
• Extract a set of transport-layer features
• Generate statistical models
• Flush the model out to the streaming classification path
• Redirect packets matching the model to the binary analysis module
• Extract the executable and execute it
• Analyze the information touched
• Assess the risk
• Increase confidence and alert
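As a rough illustration of this information flow, a minimal sketch follows with each stage as a placeholder stub; every name, signature, and piece of stub logic is an assumption for illustration only, not the actual CyberEagle interface.

```python
# Minimal sketch of the information-flow stages above. All names and the stub
# logic are illustrative placeholders, not the real CyberEagle implementation.
from dataclasses import dataclass

@dataclass
class Alert:
    features: dict
    risk: float

def extract_transport_features(packets) -> dict:
    # Placeholder: derive transport-layer features (sizes, timing, ports, ...).
    return {"n_packets": len(packets)}

def matches_model(model: dict, features: dict) -> bool:
    # Placeholder for the streaming classification path.
    return features["n_packets"] >= model.get("min_packets", 0)

def binary_analysis(packets) -> float:
    # Placeholder: extract the executable, run it in a controlled environment,
    # analyze the information it touches, and return a risk score.
    return 0.9

def process_flow(packets, model: dict):
    features = extract_transport_features(packets)
    if matches_model(model, features):      # redirect matching flows
        risk = binary_analysis(packets)     # binary analysis + risk assessment
        return Alert(features, risk)        # increase confidence and alert
    return None
```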
Supervised Threat Classification
• Data
– Network flow features
• Kernel
– Define similarity between different flows
• Classifier
– Binary to separate good from bad
– Multiclass to further separate bad flows
• Scalability issues
– Hierarchy
Challenges
• Irregular data
– Missing values
– Imbalanced data
– Heterogeneous features
– Non-applicable features
• Large number of classes (the number of threats reaches hundreds of thousands)
• New classes
• Noise in the data
• All threat classes may not be captured
• Minimize false positives
Preprocessing
• Normalization
• Dealing with missing values
– Case deletion
– Mean imputation
• Over all classes
• For each individual class
– Median imputation
• Over all classes
• For each individual class
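A minimal sketch of the imputation options above, assuming pandas; the column names, the `label_col` argument, and the helper name `impute` are illustrative, not the actual CyberEagle code.

```python
# Minimal sketch of mean/median imputation, either over all classes or per
# class, as listed above. Column and label names are illustrative.
import pandas as pd

def impute(df: pd.DataFrame, label_col: str, stat: str = "mean",
           per_class: bool = True) -> pd.DataFrame:
    """Fill missing feature values with the mean/median computed over all
    classes, or separately within each class."""
    features = df.columns.drop(label_col)
    filled = df.copy()
    if per_class:
        filled[features] = df.groupby(label_col)[features].transform(
            lambda col: col.fillna(getattr(col, stat)()))
    else:
        filled[features] = df[features].fillna(getattr(df[features], stat)())
    return filled

# Example: impute(flows, label_col="threat_class", stat="median", per_class=False)
```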
Classifier Framework
• Flows are passed through SNORT, which tags bad flows with threat signatures
– 13,935 bad flows across 76 different classes (Class 1: Shellcode, Class 2: Spambot_Proxy_Control_Channel, …, Class 76: Exploit_Suspected_PHP_Injection_Attack)
– 44,427 unknown flows
• Learning/training of a supervised classifier on the tagged flows
– Macro-level classifier: separates bad flows from unknown flows
– Micro-level classifier: further separates bad flows into threat classes CL_A, CL_B, …, CL_N
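A minimal sketch of the macro/micro hierarchy above, assuming scikit-learn; the model choices, label encoding, and function names are illustrative, not the actual CyberEagle classifiers.

```python
# Minimal sketch of the two-level classifier hierarchy: a macro-level model
# separates bad flows from unknown flows, and a micro-level model assigns bad
# flows to one of the 76 threat classes. Model choices are illustrative.
import numpy as np
from sklearn.svm import SVC

def train_hierarchy(X_bad, y_threat_class, X_unknown):
    """X_bad: flows tagged by SNORT; y_threat_class: one of the 76 class labels;
    X_unknown: untagged flows used as the other side of the macro classifier."""
    X_macro = np.vstack([X_bad, X_unknown])
    y_macro = np.hstack([np.ones(len(X_bad)), np.zeros(len(X_unknown))])
    macro = SVC(kernel="rbf", class_weight="balanced").fit(X_macro, y_macro)
    micro = SVC(kernel="rbf").fit(X_bad, y_threat_class)  # multiclass (one-vs-one)
    return macro, micro

def classify_flow(macro, micro, x):
    """Macro level decides bad vs. unknown; micro level names the threat class."""
    if macro.predict(x.reshape(1, -1))[0] == 0:
        return "unknown"
    return micro.predict(x.reshape(1, -1))[0]
```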
Binary Classifier Results
• Kernel learning
• Biased SVM performance comparison with different kernels

Metric         | Linear Kernel | RBF Kernel | Poly Kernel
Precision good | 79.75         | 87.46      | 78.70
Recall good    | 87.07         | 90.42      | 97.79
F1 good        | 83.25         | 88.9347    | 87.2126
Precision bad  | 79.75         | 69.33      | 79.78
Recall bad     | 37.17         | 62.55      | 24.81
F1 bad         | 42.74         | 65.7657    | 34.8495
Accuracy       | 74.08         | 83.26      | 78.79
G-mean         | 56.89         | 75.21      | 49.25
Binary Classifier Results
• Parameter selection for the Biased SVM with the RBF kernel
– With gamma = 10 and C+/C- = 0.5, the best F1 (bad) = 0.6494
– With gamma = 10 and C+/C- = 0.55, the best F1 (bad) = 0.657657
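Reading "Biased SVM" as an SVM with asymmetric misclassification costs for the two classes, here is a minimal sketch of the kernel and parameter search, assuming scikit-learn; the grid values and the use of `class_weight` to realize the C+/C- ratio are illustrative assumptions.

```python
# Minimal sketch of a biased (cost-asymmetric) SVM with an RBF kernel and a
# grid search over gamma and the C+/C- ratio, selecting on F1 of the bad class.
# Labels are 1 = bad, 0 = good; the grid values are illustrative.
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import make_scorer, f1_score

f1_bad = make_scorer(f1_score, pos_label=1)   # F1 on the "bad" class

param_grid = {
    "gamma": [0.1, 1, 10, 100],
    # class_weight scales C per class, which plays the role of the C+/C- ratio.
    "class_weight": [{1: r, 0: 1.0} for r in (0.4, 0.5, 0.55, 0.6)],
}

search = GridSearchCV(SVC(kernel="rbf", C=1.0), param_grid, scoring=f1_bad, cv=5)
# search.fit(X_train, y_train)
# print(search.best_params_, search.best_score_)
```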
Binary Classifier Results
• F1 (bad) comparison of the methods for the binary classifier
[Bar charts: F1 (bad) with and without noise for KNN, Biased SVM, SMO, Decision Tree, Bagging Decision Tree, and Bagging Adaboost SMO.]
• Best F1 (bad) performance with/without noise: 79.07% / 88.7%
Preprocessing (Multiclass)
• Tree-based generated features
– For each class k:
• Repeat c times:
– Collect samples from class k, label them +1
– Collect samples from the complement of class k, label them -1
– Build a regression tree on the resulting binary data
– Store the tree as T_ik
• End
– End
• Example (a code sketch of the procedure follows the tables):

Original features
Home owner | Marital status | Annual income | Number of children | Age
-          | married        | 125K          | -                  | 41
No         | Not married    | 70K           | N/A                | 22
No         | -              | 59K           | 1                  | 55
yes        | Not married    | -             | N/A                | 23
yes        | married        | 100K          | 1                  | -

transformation ↓

Tree-based features
Tree 1   | Tree 2   | Tree 3   | Tree 4   | Tree 5
-0.25    | -0.25    | -1       | 0.5      | -0.25
-1       | -1       | 0.2      | 0.714286 | -0.33333
-0.5     | -0.5     | 1        | 0.5      | -0.5
-1       | -0.33333 | 1        | 0.25     | 0.777778
-0.14286 | -0.14286 | 0.142857 | -1       | -0.14286
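A minimal sketch of the tree-based feature generation above, assuming scikit-learn; the resampling scheme, tree depth, and function name are illustrative, and categorical/missing-value handling is omitted.

```python
# Minimal sketch of tree-based feature generation: for each class k, fit c
# regression trees on (class k = +1, rest = -1) and use each tree's prediction
# as one generated feature. Parameters are illustrative.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def tree_features(X, y, classes, c=5, max_depth=3, rng=np.random.default_rng(0)):
    trees = []
    for k in classes:
        target = np.where(y == k, 1.0, -1.0)
        for _ in range(c):
            idx = rng.choice(len(X), size=len(X), replace=True)  # resample each round
            t = DecisionTreeRegressor(max_depth=max_depth).fit(X[idx], target[idx])
            trees.append(t)
    # New representation: one column per tree T_ik, values are the tree outputs.
    return np.column_stack([t.predict(X) for t in trees]), trees
```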
Preprocessing
• Multiclass results comparison with
– Original features
– Tree-based generated features

Average performance of 6 majority classes
Metric    | Original features | Tree-based features
Precision | 76.40             | 80.21
Recall    | 77.43             | 81.75
F1        | 76.84             | 80.93

Performance of 6 majority classes
Class ID | Original features          | Tree-based features
         | Precision | Recall | F1    | Precision | Recall | F1
24       | 77.65     | 78.30  | 77.97 | 86.12     | 88     | 87.05
25       | 63.62     | 70.02  | 66.67 | 79.3      | 82     | 80.63
28       | 99.36     | 99.70  | 99.53 | 100       | 100    | 100
48       | 82.16     | 73.95  | 77.84 | 79.68     | 77.9   | 78.78
68       | 69.05     | 71.38  | 70.20 | 67.7      | 76     | 71.61
76       | 66.58     | 71.23  | 68.83 | 68.45     | 66.6   | 67.51
Multi-class Classification
• Identify individual threats
• Identify new classes and provide their properties
• Classifiers
– K-Nearest Neighbor
• No training involved
• Computationally intensive at test time
– Ensemble methods
• Fail to scale up to a huge number of classes
– Sphere-based SVM
• Encapsulate each class in a hypersphere
• Transform the data into an appropriate space so that each class clusters into a single cohesive unit (see the sketch below)
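A minimal sketch of sphere-based classification with new-class detection, in a simplified Euclidean form (class mean as the sphere center, a distance quantile as the radius); the actual system presumably learns the kernel/transformation first, which is omitted here.

```python
# Minimal sketch of sphere-based classification: wrap each class in a sphere,
# classify a test point to the nearest sphere, and flag points outside every
# sphere as a potential new class. The Euclidean construction is illustrative.
import numpy as np

def fit_spheres(X, y, quantile=0.95):
    spheres = {}
    for k in np.unique(y):
        Xk = X[y == k]
        center = Xk.mean(axis=0)
        dists = np.linalg.norm(Xk - center, axis=1)
        spheres[k] = (center, np.quantile(dists, quantile))
    return spheres

def predict(spheres, x):
    """Tests against K spheres (K = number of classes) instead of N training points."""
    best_class, best_margin = None, np.inf
    for k, (center, radius) in spheres.items():
        margin = np.linalg.norm(x - center) - radius
        if margin < best_margin:
            best_class, best_margin = k, margin
    return best_class if best_margin <= 0 else "new_class"
```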
Building Kernel
• Let (X_i, Y_i) be the data points, where Y_i ∈ {+1, -1}
• Construct the ground-truth kernel K
– K_ij = Y_i Y_j (i.e., K = Y Yᵀ)
• Now learn a parametric kernel f_θ such that
– K_ij ≈ f_θ(X_i, X_j)
Example: for six samples with labels Y = (+1, +1, +1, -1, -1, -1), the ground-truth kernel K = Y Yᵀ is

     1   2   3   4   5   6
1   +1  +1  +1  -1  -1  -1
2   +1  +1  +1  -1  -1  -1
3   +1  +1  +1  -1  -1  -1
4   -1  -1  -1  +1  +1  +1
5   -1  -1  -1  +1  +1  +1
6   -1  -1  -1  +1  +1  +1

[Slide also shows the example feature table (Home owner, Marital status, Annual income, Number of children, Age) with its ±1 label column Y.]
Once θ is learned, it can be applied to the test set.
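In symbols, with the fitting criterion left unspecified on the slide (a squared-error fit is shown here as one plausible choice):

$$
K_{ij} = Y_i Y_j = (\mathbf{Y}\mathbf{Y}^{\top})_{ij},
\qquad
\hat{\theta} = \arg\min_{\theta} \sum_{i,j} \big( K_{ij} - f_{\theta}(X_i, X_j) \big)^{2}
$$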
Kernel for Multi Class
• For each class, do the following:
– Collect samples belonging to the class and label them +1
– Collect samples from the rest of the data and label them -1
– Build a separate kernel for each class: K_ij ≈ f_θ(X_i, X_j)

[Slide shows the resulting ±1 ground-truth kernel matrix for one class, with the same block structure as above.]
Boosted Trees for Kernel Learning
Each regression tree t in the boosted ensemble produces a ±1 output vector y_t over the training samples. The kernel matrix contributed by tree t is the outer product y_t y_tᵀ: its (i, j) entry is +1 when the tree places samples i and j on the same side and -1 otherwise.

[Slide shows the ±1 output vectors y_1 and y_2 of the first two trees and the corresponding 6 × 6 kernel matrices y_1 y_1ᵀ and y_2 y_2ᵀ.]
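A minimal sketch of this construction, assuming scikit-learn and a gradient-boosting-style fit to the ±1 labels; the learning rate, number of trees, and the equal-weight averaging of the per-tree kernels are illustrative assumptions.

```python
# Minimal sketch of boosted trees for kernel learning: fit trees to the ±1
# labels, take each tree's signed output vector y_t, and average the rank-one
# kernels y_t y_t^T to approximate the ground-truth kernel K = y y^T.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boosted_tree_kernel(X, y, n_trees=10, max_depth=2, lr=0.5):
    """y in {+1, -1}; returns the learned kernel matrix and the fitted trees."""
    residual = y.astype(float)
    trees, outputs = [], []
    for _ in range(n_trees):
        t = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        pred = t.predict(X)
        residual = residual - lr * pred          # gradient-boosting style update
        outputs.append(np.sign(pred))            # ±1 output vector of this tree
        trees.append(t)
    K = np.mean([np.outer(o, o) for o in outputs], axis=0)
    return K, trees

def kernel_between(trees, X_train, X_test):
    """Apply the learned trees to new data: entries approximate f_theta(x_test, x_train)."""
    outs_tr = [np.sign(t.predict(X_train)) for t in trees]
    outs_te = [np.sign(t.predict(X_test)) for t in trees]
    return np.mean([np.outer(te, tr) for te, tr in zip(outs_te, outs_tr)], axis=0)
```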
Multi-class Results
• Spheres require only K = 6 comparisons (one per class), whereas KNN requires N comparisons (one per training sample).
Classification + New Class Detection
• Build a separate kernel for each class
• For each class, find a transformation that separates that class from the rest of the data
[Diagram: one learned transformation per class (shown for the classes marked x, +, -, and ^), each separating its class from the remaining data.]
New Class Generation
Conclusion
• CyberEagle: an enhanced, comprehensive security system
– Brings host and network security together to fight security threats
• Identifies threats that IDS/IPS fails to detect (encrypted, evolved)
• Identifies new threats at the earliest stage
• Generates signatures for new threats and alerts the host security system in an automated way
Future Work
• Improve classification accuracy
• Scale up to a huge number of classes
• Reduce computation during classification
– Learn a class hierarchy
– Increase speed without sacrificing accuracy
• Validate with diverse data
• Reputation analysis of IP addresses
• Online updates of the classifier
• MapReduce implementations
Summer 2011 Company Meeting
Thank You
Prakash, Lei, Saby