Using Data Classification to Manage File Servers
Adi Oltean, Senior SDE, Microsoft Corporation
Ran Kalach, Principal Dev Manager, Microsoft Corporation
Storage Developer Conference 2009 © 2009 Insert Copyright information here. All rights reserved.

Agenda
• Customer challenges
• Solution: File Classification
  • Manage data based on business value
  • Grow the ecosystem in classification solutions
• File Classification Infrastructure
  • The classification pipeline
  • Aggregation, conflict resolution
  • Incremental classification
• Challenges, Mitigations & Best Practices
• Conclusions

Customer challenges – file servers
• Storage growth
• Storage cost
• Data sharing and search
• Compliance
• Security and information leakage
• Increasing data management needs / many data management tools: security, HSM, backup, replication, archive, encryption, expiration

File shares and business requirements (Business / IT)
• Need a per-project share
• Make sure high business impact files do not leak out
• Back up files with personal information to an encrypted store
• Expire low business impact files created three years ago and not touched for a year

Some time later …

Classify and apply policy
Step 1: Classify data. Classification methods:
• Automatic classification (by location, content, owner)
• IT scripts
• Manual
• Line-of-business application
Step 2: Apply policy based on classification. Actions based on classification:
• Backup
• Expiration
• Search
• Archive
• Replication
• HSM
• Security
• Reports
• Encryption
• Leakage prevention

File shares and business requirements, with the classification properties Personal Information and Business Impact
• Need a per-project share
• Make sure high business impact files do not leak out
• Back up files with personal information to an encrypted store
• Expire low business impact files created three years ago and not touched for a year

Customer benefits - Summary
Apply policies based on classification = manage data based on business value!
Reduce cost
• Expire files to reduce storage purchasing needs
• Move files to less expensive storage
• Optimize backup SLAs
• Replicate only business-related files
Manage risk
• Find sensitive files on public servers
• Watermark documents
• Keep files containing personal information encrypted in backup
• Apply rights management to high-secrecy files
• Comply with retention policies

File Classification Infrastructure
• Set classification properties: API for external applications
• Get classification properties: API for external applications
• File classification pipeline, with extensibility points:
  • Discover data
  • Extract classification properties
  • Classify data
  • Store classification properties
  • Apply policy based on classification

Classification pipeline – an example
This example pipeline is set up with one storage module and two classifiers. Each component passes property bag objects to the next one.
• Discovery: Scanner, gets basic file properties (in the Classification Runtime Process)
• Load properties: Office Storage [Load]
• Classification: Folder Classifier, Content Classifier
• Save properties: Office Storage [Save]
• Run policies: Reporting Engine
Most modules are hosted within a separate hosting process. Property bags can cross processes; security checks are performed on cross-process data transfers.

Aggregation and Conflict Resolution
Problem:
• A classification rule may produce a value that conflicts with the value already stored in the file
• Two classification rules may produce conflicting values for the same property
• Example: the admin creates a “Business Impact” property with possible values (LBI, MBI, HBI)
  • A file previously classified as MBI is copied to folder x:\foo
  • The Folder rule for x:\foo classifies all files as LBI
  • The Content classifier scans the file and classifies it as HBI
  • What is the correct value?
Solution:
• Provide several types of classification rules:
  • Default: the rule runs only if the property is not already present in the file
  • Otherwise, rules can either explicitly aggregate with or overwrite previously stored properties
• Value aggregation depends on the property type
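The pipeline and conflict-resolution ideas above can be illustrated with a small Python sketch. It is not the FCI API: the names `Rule`, `RuleMode`, `run_pipeline` and `aggregate_ordered` are hypothetical, a property bag is modeled as a plain dictionary handed from rule to rule, and it assumes that aggregating an ordered property such as Business Impact means keeping the highest value (HBI over MBI over LBI).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Optional

# Hypothetical ordered values of the "Business Impact" property (lowest to highest).
BUSINESS_IMPACT = ["LBI", "MBI", "HBI"]


class RuleMode(Enum):
    DEFAULT = "default"      # runs only if the property is not already present
    AGGREGATE = "aggregate"  # combines its value with the stored one
    OVERWRITE = "overwrite"  # replaces the stored value


@dataclass
class Rule:
    name: str
    mode: RuleMode
    classify: Callable[[str], Optional[str]]  # file path -> proposed value, or None


def aggregate_ordered(old: str, new: str) -> str:
    # Assumed aggregation for an ordered property: keep the higher-impact value.
    return max(old, new, key=BUSINESS_IMPACT.index)


def run_pipeline(path: str, stored: Dict[str, str], rules: List[Rule],
                 prop: str = "Business Impact") -> Dict[str, str]:
    bag = dict(stored)  # property bag as loaded by a storage module (copied)
    for rule in rules:  # each rule sees the bag produced by the previous one
        if rule.mode is RuleMode.DEFAULT and prop in bag:
            continue  # a Default rule never touches an existing value
        value = rule.classify(path)
        if value is None:
            continue  # the rule does not apply to this file
        if rule.mode is RuleMode.AGGREGATE and prop in bag:
            bag[prop] = aggregate_ordered(bag[prop], value)
        else:
            bag[prop] = value
    return bag


# The slide's scenario: a file already classified as MBI is copied to x:\foo.
folder_rule = Rule("Folder x:\\foo", RuleMode.DEFAULT,
                   lambda p: "LBI" if p.lower().startswith("x:\\foo\\") else None)
content_rule = Rule("Content", RuleMode.AGGREGATE, lambda p: "HBI")
result = run_pipeline("x:\\foo\\plan.docx", {"Business Impact": "MBI"},
                      [folder_rule, content_rule])
print(result["Business Impact"])  # HBI
```

Under these assumptions the Default folder rule is skipped because the file already carries MBI, and the aggregating content classifier raises the value to HBI. Making the folder rule an Overwrite rule instead would set LBI first, and the content classifier would still aggregate it up to HBI.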
Incremental Classification
Goal: minimize re-classification of already classified files. This is crucial for scalability (large numbers of files).
Automatic classification (scheduled)
• Cache classification results in an ADS (alternate data stream):
  • The ADS contains a hash of certain file properties (last-modify time, file path, file ID, etc.)
  • The ADS contains the last classification time
  • Together, these allow determining whether the cached classification is up to date
• Re-classify the file only if:
  • The file changed or was added since the previous classification (the hash is different), or
  • A rule has changed since the previous classification, or
  • The configuration of a classifier has been updated since the previous classification
Get Property API (on demand)
• If the cache is present and up to date, return the cached properties
• Otherwise (out-of-date classification), the application can choose:
  • Accuracy: classify the file “on the fly”
  • Performance: return the stored properties

Challenges, Mitigations & Best Practices 1 - Performance
• Content classification is expensive (I/O, CPU)
  • Must optimize to scan and classify only when needed
  • Must be able to cache results
• Minimize the performance impact on the host of the data being classified
  • Classify on another machine
  • When classifying locally, throttle machine resource usage and back off when the machine becomes non-idle
  • Be smart about how you schedule classification, and support pause/resume
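A minimal sketch of the up-to-date check described above, assuming the cached record holds a SHA-256 hash over the file path, file ID and last-modify time, plus the last classification timestamp. The names `FileState`, `CachedClassification`, `needs_reclassification` and `get_properties` are hypothetical, and reading or writing the actual alternate data stream is left out.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class FileState:
    path: str
    file_id: int
    last_modified: float  # last-modify time, e.g. seconds since the epoch


@dataclass(frozen=True)
class CachedClassification:
    # Hypothetical content of the cache record kept in the ADS.
    properties_hash: str
    classified_at: float  # time of the last classification


def properties_hash(state: FileState) -> str:
    # Hash over a few file properties; any change produces a different digest.
    key = f"{state.path}|{state.file_id}|{state.last_modified}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def needs_reclassification(state: FileState,
                           cache: Optional[CachedClassification],
                           rules_changed_at: float,
                           classifier_config_changed_at: float) -> bool:
    if cache is None:
        return True  # file added since the previous classification
    if cache.properties_hash != properties_hash(state):
        return True  # file changed since the previous classification
    if rules_changed_at > cache.classified_at:
        return True  # a rule changed after the file was classified
    return classifier_config_changed_at > cache.classified_at


def get_properties(state: FileState,
                   cache: Optional[CachedClassification],
                   stored: Dict[str, str],
                   classify: Callable[[FileState], Dict[str, str]],
                   prefer_accuracy: bool,
                   rules_changed_at: float = 0.0,
                   classifier_config_changed_at: float = 0.0) -> Dict[str, str]:
    # On-demand read: return stored values if current, else honor the caller's choice.
    if not needs_reclassification(state, cache, rules_changed_at,
                                  classifier_config_changed_at):
        return stored
    return classify(state) if prefer_accuracy else stored
```

For a file whose hash still matches and that was last classified at time 2000, a rule change at time 1500 leaves it alone, while a rule change at time 2500 triggers re-classification. The `prefer_accuracy` flag mirrors the slide's accuracy-versus-performance choice for out-of-date results.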
Challenges, Mitigations & Best Practices 2 - Accuracy
• Automatic classification can almost never be 100% accurate
• Tune your rules toward false positives or false negatives according to the scenario
  • Example: secure files: tolerate false positives; expire files: tolerate false negatives
• Make policy execution revertible in case of a classification error
  • Example: back up files one last time just before you expire them
• Examine classification results periodically
  • Modify your rules or classifiers until they're optimized for your data set
• Enable manual classification
• Use a clear and consistent policy for aggregating values and resolving conflicts
• Support flexible rules that allow tuning by the administrator or application
One answer doesn't fit all!

Challenges, Mitigations & Best Practices 3 - Real-time Classification and Policies
• Some policies require real-time or near-real-time execution
  • Example: removing a confidential file from an unsecured share
• Solution: event-based classification
  • File-system activity can be a trigger
  • Needs a hook into file-system operations (many implementation options exist)
• Consider:
  • Classifying only when the file content is “stable”
  • Avoiding overloading the server with overly aggressive classification

Examples of FCI-enabled solutions (Solution / Example)
• Classification solutions
  Example: an LOB app that maintains special classification rules for the PII data it generates
• Custom “classifiers” that extract metadata from files
  Example: a medical imaging classifier extracts embedded metadata from scanned images
• Custom “storage modules” that load/store custom metadata in files
  Example: load/store metadata in your custom file formats (e.g., videos)
• Add “classification awareness” to existing data management solutions
  Example: a backup app can have special backup policies for HBI data
• Build “intelligent” policy-based data management solutions
  Example: define a policy to automatically encrypt HBI data

Opportunities for you
Why participate in the File Classification Infrastructure ecosystem?
• Use FCI for existing software
  • Enhance existing data-producing apps to also attach classification to the files they generate (e.g., LOB applications)
  • Enhance existing data management apps to consume classification
• Use FCI for new software solutions
  • Develop solutions on top of FCI
  • Develop components for the FCI ecosystem: classifiers, storage modules
How can I develop against it?
• File Classification Infrastructure can be consumed through a rich, scriptable COM API
• FCI can be extended using C++/C# code, or PowerShell scripts
When can I start?
• Now: FCI is part of the latest Server releases (starting with Windows Server 2008 R2)

More information about FCI
General information
• Home page: http://www.microsoft.com/windowsserver2008/en/us/fci.aspx
• Team blog: http://blogs.technet.com/filecab
• API documentation on MSDN: http://msdn.microsoft.com/en-us/library/bb972746(VS.85).aspx
Sample code
• Windows SDK: http://msdn.microsoft.com/en-us/windows/bb980924.aspx
  • Sample FCI clients (C++, C#)
  • Sample classifiers (C++, C#)
• Code Gallery: http://code.msdn.microsoft.com/fci
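Returning to the real-time classification advice to classify only when the file content is “stable”: a hypothetical debouncing queue in Python makes the idea concrete. Each file-system change event restarts a per-file quiet timer, and a file is released for classification only after it has been quiet for `quiet_period` seconds. The `limit` parameter caps how many files are released per call, one simple way to avoid overloading the server. The class and its parameters are illustrative assumptions, not part of FCI; timestamps are passed in explicitly (a real caller might use `time.monotonic()`).

```python
from typing import Dict, List


class StableFileQueue:
    """Releases a changed file for classification only after it stays quiet."""

    def __init__(self, quiet_period: float) -> None:
        self.quiet_period = quiet_period
        self._last_event: Dict[str, float] = {}  # path -> time of latest change

    def on_change(self, path: str, now: float) -> None:
        # Called for each file-system change event; restarts the quiet timer.
        self._last_event[path] = now

    def ready(self, now: float, limit: int) -> List[str]:
        # Files quiet for at least quiet_period, oldest change first, at most `limit`.
        stable = sorted((t, p) for p, t in self._last_event.items()
                        if now - t >= self.quiet_period)[:limit]
        for _, path in stable:
            del self._last_event[path]  # handed off to the classifier
        return [path for _, path in stable]


q = StableFileQueue(quiet_period=30.0)
q.on_change("report.docx", now=0.0)
q.on_change("notes.txt", now=10.0)
q.on_change("report.docx", now=20.0)  # still being written
print(q.ready(now=45.0, limit=10))    # ['notes.txt']
print(q.ready(now=50.0, limit=10))    # ['report.docx']
```

A file that keeps being written never becomes ready, so the classifier is not invoked repeatedly on half-written content; files that are not yet ready stay queued for a later call.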