Data Security - Big Data Everywhere

Transcription

Data Security as a Business Enabler – Not a Ball & Chain
Big Data Everywhere
May 21, 2015
Les McMonagle
Protegrity – Director, Data Security Solutions
Les has over twenty years of experience in information security. He has held the position of Chief Information Security Officer (CISO) for a credit card company and ILC bank, founded a computer training and IT outsourcing company in Europe, and helped several security technology firms develop their initial product strategy.
Les founded and managed Teradata's Information Security, Data Privacy and Regulatory Compliance Center of Excellence and is currently Director of Data Security Solutions for Protegrity.
Les holds a BS in MIS, along with CISSP, CISA, ITIL and other relevant industry certifications.
Les McMonagle (CISSP, CISA, ITIL)
Mobile: (617) 501-7144
Email: [email protected]
The Problem . . .
The cost of cybercrime is staggering:
•  The annual cost to the global economy is in excess of $400 billion.
•  Businesses that are victims of cybercrime need an average of 18 days to resolve the problem and suffer average costs of over $400K.
•  The tangible and intangible costs associated with some of the recent high-profile cases exceed $400M.
•  Traditional network security, firewalls, IDS, SIEM, AV and monitoring solutions do not offer the comprehensive security needed to protect the target data against current, new and evolving threats.
Typical Phases of an Attack
[Diagram: anatomy of a data breach – source: http://eval.symantec.com/mktginfo/enterprise/white_papers/b-anatomy_of_a_data_breach_WP_20049424-1.en-us.pdf]
Factors to Consider
"   Bad guys search for the easy targets
•  Large repositories of valuable, un-protected data
•  Systems with weaker controls and/or more access paths
•  Financial Data or Personally Identifiable Information (PII)
"   Blurring or Network Boundaries
•  Where does your company network end and another begin?
•  BYOD
•  Cloud
•  IoT (Internet of Things)
"   Insider threats remain the biggest threat
"   Advanced Persistent Threats (APTs)
•  Coordinated, comprehensive attack strategies
5
Types of Sensitive Data Potentially Stored in Hadoop
Credit Card PAN, SSN, DOB, Bank Account Numbers, Customer Lists, PIN, Best Practices, Pending Patents, Trade Secrets, Health History, Order History, Health Records, Accounts Receivable, Accounts Payable, Production Planning, Prescriptions, Employee Personnel Records, Home Addresses, Location Data, Passwords, Sales Forecasts, Payroll Data, R&D, Customer Contact Information, Income Data, Salary Data, Project Plans
What to do about it
"   Engage Information Security
"   Work with Legal and Compliance
"   Establish Good Data Governance Program
"   Adhere to generally accepted privacy principles *
"   Apply consistent protection throughout the data flow
"   Limit access on a Need-to-Know basis
"   Protect the actual data itself (regardless of where it is)
"   De-Identify data ─ without losing analytics value
7
* See reference slide(s) at end of presentation
Engage InfoSec, Legal, Compliance, Privacy
"   Engage Information Security – rather than avoid them
"   CISO’s and InfoSec ultimately have the same goals
"   Will help fund and implement effective data protection
"   Legal, Privacy and Compliance
•  Identify/interpret regulatory and compliance requirements
•  Helping protect the business by identifying risks to consider
•  Incorporate generally accepted Privacy Principles*
8
* See reference slide(s) at end of presentation
Data Governance Program
"   Establish good data governance program
•  Identified Data Owners
•  Identified Data Stewards
•  Identified Data Custodians
•  RACI – Roles and Responsibilities
"   Data Governance subject areas
•  Data Ownership
•  Data Quality
•  Data Integration
•  Metadata Management
•  Master Data Management
•  Data Architecture
•  Data Security & Privacy
9
Protect sensitive data consistently wherever it goes
•  At Rest
•  In Transit
•  In Use
Ideally with a single, centralized enterprise solution.
What Data to Tokenize or Encrypt ?
"   Important questions to ask . . .
•  What policy and regulatory compliance requirements apply?
•  What risks must be mitigated?
•  How/Why are protected columns accessed/used?
•  What other mitigating controls are available?
•  Appropriate balance between business and data privacy/security?
•  When is Tokenization or Encryption most appropriate?
"   Utilization and access control limitations of Hadoop / Hive
"   Alternative protection options to consider
•  Full Disk Encryption (FTE)
Important Data Security Architecture Questions
To Encrypt or Tokenize . . . This is the Question
[Chart: plots common data elements – SSN; PIN, CID, CV2; Password; CC-PAN; Bank Acct No.; Customer ID #; DOB; Patient ID #; HIV-Pos*; X-Ray; Cat Scan; Healthcare Records; Diagnosis report – along a Tokenization-versus-Encryption spectrum. The deciding factors shown are the field size relative to the width of the token lookup table, how structured the data is, how much logic is embedded in portions of the data element, the percentage of access requiring clear text, and increasing data sensitivity.]
* With Initialization Vector (IV)
Potential Additional Controls to Consider
■  Tokenization or Encryption farther upstream in the Data Flow
■  Do not load unnecessary regulated data to Hadoop
■  Access Hadoop Hive Tables through Teradata (QueryGrid)
■  HDFS file-level access control
■  Accumulo cell-level access control (Row/Column intersection) – see the sketch after this list
■  Knox Gateway (authentication for multiple Hadoop clusters)
■  Coarse-grained HDFS File Encryption
■  XASecure (now HDP Advanced Security)
■  Ambari (Hadoop Cluster Management)
■  Kerberos (Authentication) – all or nothing
These are piecemeal, independent security tools for Hadoop.
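As a concrete illustration of the Accumulo option above, the following sketch writes a customer record with per-cell visibility labels and reads it back with limited authorizations. It is a minimal example assuming an Accumulo 1.x client library on the classpath; the instance name, ZooKeeper host, table name, credentials and label names ("pii", "analyst") are placeholders for illustration, not part of the original material.

import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.BatchWriterConfig;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.accumulo.core.security.ColumnVisibility;
import org.apache.hadoop.io.Text;

import java.util.Map.Entry;

public class CellLevelAclDemo {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details -- replace with your cluster's values.
        Connector conn = new ZooKeeperInstance("myInstance", "zk1:2181")
                .getConnector("appUser", new PasswordToken("appPassword"));

        // Write a customer row; the SSN cell carries a stricter visibility label
        // than the name cell, so only users holding the "pii" authorization see it.
        BatchWriter writer = conn.createBatchWriter("customers", new BatchWriterConfig());
        Mutation m = new Mutation("cust-0001");
        m.put(new Text("profile"), new Text("name"),
              new ColumnVisibility("analyst|pii"), new Value("Jane Doe".getBytes()));
        m.put(new Text("profile"), new Text("ssn"),
              new ColumnVisibility("pii"), new Value("123-45-6789".getBytes()));
        writer.addMutation(m);
        writer.close();

        // An analyst scanning with only the "analyst" authorization gets the
        // name cell back but never sees the SSN cell.
        Scanner scan = conn.createScanner("customers", new Authorizations("analyst"));
        for (Entry<Key, Value> e : scan) {
            System.out.println(e.getKey().getColumnQualifier() + " = " + e.getValue());
        }
    }
}

Note that scan-time authorizations are also capped by what the Accumulo administrator has granted to the account server-side, so the visibility label alone is not the whole access-control story.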
Reduce your Exposure and Risk
[Diagram: the large population of users who have access to the full SSN today shrinks to the small population who actually need the full SSN to perform their job function; everyone else works with a token exposing only the last 4 digits of the SSN.]
Vaultless Tokenization is a form of data protection that converts sensitive data into fake data. The real data can be retrieved only by authorized users. It is often a more usable form of protection than encryption.
Improve Security Posture Without Impacting Analytics Value
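To make the "last four digits" idea concrete, here is a toy sketch of format-preserving tokenization: it keeps the final four digits of an SSN in the clear and replaces the rest with digits derived from a keyed HMAC. This is plain Java with no vendor dependencies and is not Protegrity's algorithm; real vaultless tokenization uses secret static lookup tables and remains reversible for authorized users, which this one-way illustration is not.

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;

/** Toy format-preserving "tokenizer" for SSNs: keeps the last four digits and
 *  replaces the leading digits with digits derived from an HMAC. Illustrative
 *  only -- see the caveats in the surrounding text. */
public class SsnTokenSketch {
    private final SecretKeySpec key;

    public SsnTokenSketch(byte[] secret) {
        this.key = new SecretKeySpec(secret, "HmacSHA256");
    }

    public String tokenize(String ssn) throws Exception {
        String digits = ssn.replaceAll("\\D", "");            // strip dashes
        String head = digits.substring(0, 5);                 // digits to be replaced
        String lastFour = digits.substring(5);                 // stays in the clear

        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(key);
        byte[] h = mac.doFinal(head.getBytes(StandardCharsets.UTF_8));

        StringBuilder fake = new StringBuilder();
        for (int i = 0; i < 5; i++) {
            fake.append((h[i] & 0xFF) % 10);                   // map HMAC bytes to digits
        }
        return fake.substring(0, 3) + "-" + fake.substring(3, 5) + "-" + lastFour;
    }

    public static void main(String[] args) throws Exception {
        SsnTokenSketch t = new SsnTokenSketch("demo-secret-key".getBytes(StandardCharsets.UTF_8));
        System.out.println(t.tokenize("123-45-6789"));         // format preserved, last 4 intact
    }
}

Because the token keeps the SSN's shape and its last four digits, joins, indexing and most analytics keep working, which is the "without impacting analytics value" point the slide makes.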
What to look for in a good Enterprise Solution
Critical core requirements:
■  A single solution that works across all core platforms
■  Scalable, centralized, enterprise-class solution
■  Segregation of duties between DBA and Security Admin
■  Good Encryption Key or Token Lookup Table management
■  Data-layer solution
■  Tamper-proof audit trail
■  Transparent (as possible) to authorized end-users
■  High Availability (HA)
■  Optional in-database versus ex-database encryption/tokenization
Other "nice to have" features
"   Flexible protection options (Encrypt, Tokenize, DTP/FPE, Masking)
"   Broadest possible support for a range of data types
"   Built in DR, Dual Active, Key and system recovery capability
"   Minimal performance impact to applications/end users
"   Optimized operations to minimize CPU utilization
"   Proven Implementation methodology
"   PCI-DSS compliant solution (meeting all relevant requirements)
"   Deep partnership with Teradata and other database providers
"   Minimal impact on system upgrades
"   Maintain consistent referential integrity and indexing capability
"   Low Total Cost of Ownership (TCO)
16
What to look for in a good solution for Hadoop
"   Course Grained and Fine Grained Protection Capability
•  HDFS File Encryption, Multi-Tennant File Encryption, HDFS FP (HDFS Codec)
• 
Column/Field Level “Fine Grained” Protection
"   Multi-Tennant Row Level Protection
• 
Allow authorized users access to specific rows only
• 
Unprotect columns for authorized users only
"   Heterogeneous Protection Capabilities
• 
Protect Upstream sources of data and Downstream targets of data
• 
Vaultless Tokenization, often less intrusive than encryption, reversible protection
•  Reversible – where masking is not
• 
Deployed on the (Data) Nodes
•  Leverage MPP architecture of Hadoop
•  Avoid Appliance based solutions that can slow down Hadoop
"   Tokenization capability for Hive access to HDFS Files/Tables
• 
17
Hive does not support VarByte data type (Encryption = Binary Ciphertext)
Hadoop security controls are playing catch-up
[Diagram: a traditional RDBMS sits behind many security layers – firewalls and IDS/IPS, authentication (Kerberos), authorization, RBAC, row-level security (RLS), column-level security (CLS), audit, encryption and tokenization – while Hadoop (Hive/HDFS) has fewer layers: firewalls and IDS/IPS, authentication (Kerberos), RBAC via Accumulo or Knox, and tokenization only, with the remaining layers marked "Future?".]
Heavier reliance on Tokenization with Hadoop.
Granularity of Protecting Sensitive Data
Coarse Grained Protection (File/Volume)
•  Methods: File or Volume encryption, OS File System Encryption, HDFS Encryption
•  "All or nothing" approach
•  Secures data at rest and in transit
•  Does NOT secure file contents in use
Fine Grained Protection (Data/Field)
•  Operates at the individual field level
•  Methods: Vaultless Tokenization, Masking, Encryption (Strong, Format Preserving)
•  Data is protected in use and wherever it goes
•  Business logic can be retained
Data Security Platform
[Diagram: a central Enterprise Security Administrator distributes policy to, and collects audit logs from, protectors deployed across Applications, RDBMS, EDW, Big Data, File Servers, File and Cloud Gateway Servers, the IBM Mainframe Protector and Netezza, all supported by Protection Servers.]
Protegrity’s Big Data Protector for Hadoop
[Diagram: a Hadoop cluster node running Hive, Pig, MapReduce, YARN, HBase and other services over HDFS and the OS file system, with policy and audit integration on every node.]
■  Protegrity Big Data Protector for Hadoop delivers protection at every node and is delivered with our own cluster management capability.
■  All nodes are managed by the Enterprise Security Administrator, which delivers policy and accepts audit logs.
■  The Protegrity Data Security Policy contains information about how data is de-identified and who is authorized to have access to that data.
■  Policy is enforced at different levels of protection in Hadoop.
Rich Security Layer over the Hadoop Ecosystem
•  UDF Support for Pig
•  UDF Support for Hive (illustrated below)
•  Hive – Tokenization
•  Java API Support for MapReduce
•  HBase – Coprocessor support via UDFs
•  Cassandra – UDT
•  HDFS Encryption through the HDFS Codec
•  HDFS Commands Extended for Security Functions
•  HDFS Interface for Java Programs
•  De-identify before Ingestion into HDFS
•  OS File System Encryption: Folder/File or Volume
[Diagram: protection layered across Pig/Hive, MapReduce, YARN, HBase, HDFS and the OS file system.]
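The Hive UDF route mentioned in the list above can be sketched as follows. This is a generic, hypothetical UDF written against the classic org.apache.hadoop.hive.ql.exec.UDF API to show the mechanism only; a vendor protector UDF would delegate to the tokenization engine and enforce policy rather than hard-code a mask.

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/** Hypothetical Hive UDF that redacts all but the last four digits of an SSN.
 *  A real protector UDF would call out to the tokenization engine instead. */
@Description(name = "mask_ssn", value = "_FUNC_(ssn) - returns XXX-XX-1234 style output")
public class MaskSsnUdf extends UDF {
    public Text evaluate(Text ssn) {
        if (ssn == null) {
            return null;                       // pass NULLs through untouched
        }
        String digits = ssn.toString().replaceAll("\\D", "");
        if (digits.length() != 9) {
            return new Text("XXX-XX-XXXX");    // malformed input: redact fully
        }
        return new Text("XXX-XX-" + digits.substring(5));
    }
}

Once the jar is added to the Hive session, such a function would typically be registered with CREATE TEMPORARY FUNCTION mask_ssn AS 'MaskSsnUdf'; and used as SELECT mask_ssn(ssn) FROM customers; so the protection travels with the query rather than the storage layer.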
Coarse Grained Protection: File / Volume Encryption
[Diagram: a file containing identifiable data elements is ingested and the entire file is encrypted at the HDFS / OS file-system layer; above that layer, to Pig/Hive, MapReduce, YARN and HBase, all fields are in the clear.]
The volume encryption option encrypts the entire volume rather than the individual files.
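A minimal sketch of the coarse-grained approach, assuming a standard Hadoop client and the JDK's javax.crypto: the whole file is AES-encrypted as it is written into HDFS, so it is protected at rest, but any consumer that decrypts it sees every field in the clear. The paths, the in-memory demo key and the absence of real key management are simplifications for illustration, not anything prescribed by the slide.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import javax.crypto.Cipher;
import javax.crypto.CipherOutputStream;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;

/** Coarse-grained protection sketch: encrypt a local file with AES while
 *  writing it into HDFS. Everything in the file is protected at rest, but any
 *  job that decrypts it sees every field -- the "all or nothing" trade-off. */
public class EncryptIntoHdfs {
    public static void main(String[] args) throws Exception {
        SecretKey key = KeyGenerator.getInstance("AES").generateKey();   // demo key only

        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, key);
        byte[] iv = cipher.getIV();                                      // must be kept for decryption

        FileSystem fs = FileSystem.get(new Configuration());
        try (InputStream in = new FileInputStream("/local/staging/customers.csv");
             OutputStream raw = fs.create(new Path("/protected/customers.csv.enc"));
             OutputStream out = new CipherOutputStream(raw, cipher)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);                                    // ciphertext lands in HDFS
            }
        }
        System.out.println("Wrote encrypted file; IV length = " + iv.length);
    }
}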
Coarse Grained with HDFS Staging Area
[Diagram: files are ingested into an HDFS staging area and then processed by MapReduce jobs into the protected part of the cluster, with Pig/Hive, YARN and HBase operating above HDFS and the OS file system.]
Coarse Grained Multi-Tenant Protection
[Diagram: each tenant (T1, T2, T3) ingests into its own HDFS folder, and each folder is encrypted with its own key (Key 1, Key 2, Key 3); a separate folder remains in the clear. Pig/Hive, MapReduce, YARN and HBase run above HDFS and the OS file system.]
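The multi-tenant variant largely comes down to key selection: each tenant folder gets its own key so that compromising one key exposes only that tenant's files. The registry below is an illustrative, in-memory sketch; the folder names are hypothetical and a real deployment would hold the keys in a key manager, not a HashMap.

import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import java.util.HashMap;
import java.util.Map;

/** Illustrative per-tenant key selection for coarse-grained multi-tenant
 *  protection: one AES key per tenant landing folder. */
public class TenantKeyRegistry {
    private final Map<String, SecretKey> keysByFolder = new HashMap<>();

    public void register(String tenantFolder) throws Exception {
        keysByFolder.put(tenantFolder, KeyGenerator.getInstance("AES").generateKey());
    }

    public SecretKey keyFor(String hdfsPath) {
        // Pick the key of the tenant folder the file is being written into.
        return keysByFolder.entrySet().stream()
                .filter(e -> hdfsPath.startsWith(e.getKey()))
                .map(Map.Entry::getValue)
                .findFirst()
                .orElseThrow(() -> new IllegalArgumentException("No tenant key for " + hdfsPath));
    }

    public static void main(String[] args) throws Exception {
        TenantKeyRegistry registry = new TenantKeyRegistry();
        registry.register("/tenants/t1/");
        registry.register("/tenants/t2/");
        System.out.println(registry.keyFor("/tenants/t1/orders.csv").getAlgorithm()); // AES
    }
}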
Fine Grained Protection
Production Systems
Encryption
•  Reversible
•  Policy Control (Authorized / Unauthorized Access)
•  Lacks Integration Transparency
•  Not searchable or sortable
•  Complex Key Management
•  Example: !@#$%a^.,mhu7///&*B()_+!@
Vaultless Tokenization / Pseudonymization
•  Reversible, with Policy Control (Authorized / Unauthorized Access), or Not Reversible
•  No Complex Key Management
•  In either case: Integrates Transparently, Searchable and sortable
•  Business Intelligence example: 0389 3778 3652 0038
Non-Production Systems
Masking
•  Not reversible
•  No Policy – Everyone Can Access the Data
•  Integrates Transparently
•  No Complex Key Management
•  Example: Date of Birth 2/15/1967 masked as xx/xx/1967
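For the non-production masking case, the transformation is deliberately irreversible and key-free. The small sketch below mirrors the slide's date-of-birth example (2/15/1967 becomes xx/xx/1967) in plain Java; the date format is an assumption for illustration.

import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/** Non-production masking sketch: irreversibly blank the month and day of a
 *  date of birth while keeping the year. No key and no policy -- once masked,
 *  the original value cannot be recovered. */
public class DobMasker {
    private static final DateTimeFormatter US_DATE = DateTimeFormatter.ofPattern("M/d/yyyy");

    public static String mask(String dob) {
        LocalDate parsed = LocalDate.parse(dob, US_DATE);
        return "xx/xx/" + parsed.getYear();
    }

    public static void main(String[] args) {
        System.out.println(mask("2/15/1967"));   // prints xx/xx/1967
    }
}

Keeping the year preserves some analytic value (age cohorts, for example) while removing the ability to re-identify the individual from the masked field alone.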
Enterprise-wide Protection
[Diagram: data flows from internal and external source systems through a File Protector Gateway (FPG) into the Hadoop cluster, where ecosystem components (Pig, Hive, MapReduce, YARN, HBase, Sqoop, ETL) run over HDFS and the OS file system on each node. Protected data moves on to database servers (Database Protector), edge nodes running Java programs (File Protector, Application Protector), consumption/BI systems and downstream target systems. The ESA handles policy deployment and audit collection throughout. If the edge node is a Hadoop node, Hadoop resources can be used.]
Traditional IT Environment: Protegrity Protection
[Diagram: a typical enterprise today, with the Internet outside and apps, EDW, files, databases and Hadoop inside the firewall.]
Today’s IT Environment: Protegrity Protection
[Diagram: the same enterprise with Protegrity protection added: a Cloud Protector Gateway in front of cloud apps and files, a File Protector Gateway in front of file stores, and protection on the EDW, databases, apps and Hadoop (HG Apps), all coordinated by the ESA.]
Summarize what to do
"   Establish Good Data Governance
"   Protect the actual data Itself
"   Maintain referential integrity
"   De-Identify data ─ while maintaining analytics capability
"   Apply consistent protection throughout the data flow
"   Engage Information Security, Legal and Compliance
Build security in rather than bolt it on later
30
Sign Up for a Free, Half-Day Risk Assessment Workshop
Protegrity is proud to offer free, half-day risk assessment workshops designed to help companies evaluate their security posture. This is a no-obligation offer.
These workshops are a unique, low-cost opportunity to gain valuable insight into where you stand from a risk management perspective relative to your peers.
For more information or to schedule a free half-day workshop, please email: [email protected]
The End . . .
Q&A
Convergence of Data Privacy Regulations
•  Government and industry groups are regularly releasing new data privacy laws, requirements and recommendations.
•  Each leverages the best of previous privacy laws and discards what has proven not to work.
•  New regulations and standards are converging on a standard set of data privacy principles.
•  The International Security, Trust and Privacy Alliance (ISTPA) has published a comparison of leading privacy frameworks.
Privacy Principles (1/2)
■  Accountability – requires that the entity define, document, communicate, and assign accountability for its privacy policies and procedures and be accountable for PII under its control.
■  Notice – requires that the entity provide notice about its privacy policies and procedures and identify the purpose for which personal information is collected, used, retained, and disclosed.
■  Choice and Consent – requires that the entity describe the choices available to the individual and obtain implicit or explicit consent with respect to the collection, use, and disclosure of personal information.
■  Collection Limitation – requires that the entity collect personal information only for the purposes identified in the notice.
■  Use Limitation – requires that the entity limit the use of personal information to the purpose identified in the notice and for which the individual has provided implicit or explicit consent.
Comparable lists from: International Security, Trust and Privacy Alliance (ISTPA); Association of Insurance Compliance Professionals (AICP)
Privacy Principles (2/2)
■  Access – requires that the entity provide individuals with access to their personal information for review and update.
■  Disclosure – requires that the entity disclose personal information to third parties only for the purposes identified in the notice and only with the implicit or explicit consent of the individual.
■  Security – requires that the entity protect personal information against unauthorized access or alteration (both physical & logical).
■  Data Quality – requires that the entity maintain accurate, complete, and relevant personal information for the purposes identified in the notice.
■  Enforcement – requires that the entity monitor compliance with its privacy policies and procedures and have procedures to address privacy-related inquiries and disputes.
These must be captured in business/technical requirements.
Plethora of Global Privacy Regulations
Legislation and Regulations:
•  European Union – 95/46/EC Directive on Data Privacy
•  Germany – Federal Data Protection Act
•  Sweden – Personal Data Act
•  United Kingdom – Data Protection Act
•  Australia – Privacy Act
•  Japan – Personal Information Protection Act
•  United States – SOX, GLBA, HIPAA, COPPA, SB 1386