Visualisation of Large Datasets with Houdini
Transcription
Visualisation of Large Datasets with Houdini
Ben Simons, Data Arena Lead Developer, University of Technology, Sydney
[email protected] / [email protected]
New UTS Broadway Building, UTS Data Arena ~ April 2014

Today's Outline - Big Data
1. Some strategies used in Film Visual FX
2. Visualisation Techniques in Houdini
3. VFX Data Formats & Disk Systems

Happy Feet 2
● 2 Petabytes (2,000,000 GB)
● 3D stereo HD images
● Render: 18,000 CPU cores
● Parallel access to data
● HDF5 data on BlueArc & Isilon NAS disk systems
● Linux software: Maya, Houdini, Naiad, Nuke, 3Delight
● Entirely made at Carriageworks in Sydney, at Dr D Studios

Resident Evil 3: Extinction
● The Desert Undead: 18-layer images (RenderMan AOVs)
● Each single image frame was split into 96 tiles
● Rendered on 96 machines, then each frame tile-joined

Houdini – www.sidefx.com

Houdini across 2 screens

Houdini Object Nodes

Houdini Procedural Network

Houdini Parameters

Houdini Chops
● A channel is a column of data
● Plain text files are OK – separate columns with tabs
● Interactive channel graph (zoom in)
● Visual programming
● Filtering, sampling, shading, instancing, and rendering
● Hands-on tomorrow will be Chops & Vops

Spitzer Glimpse Dataset
http://data.spitzer.caltech.edu/popular/glimpse/20070416_enhanced_v2/source_lists/south/

Spitzer Space Telescope GLIMPSE Dataset
● South: ~300 files, 78 different channels, 145K rows
● gzipped .tbl data loaded into Houdini
● Houdini Chops used to filter & calculate 'colours'
● Show difference of infra-red magnitude bands
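As an illustration of the filtering step on the GLIMPSE slides above, here is a minimal NumPy sketch that computes an infra-red "colour" as the difference of two magnitude bands and drops sources with missing values. It is not the Chops network from the talk, just the same idea expressed outside Houdini; the filename, column indices and the 90.0 "null magnitude" cutoff are assumptions about the .tbl layout and should be checked against the table header.

```python
# Hypothetical GLIMPSE source-list processing: colour index + filtering.
import gzip
import numpy as np

MAG_3_6_COL = 12   # assumed column index of the 3.6 micron magnitude
MAG_4_5_COL = 16   # assumed column index of the 4.5 micron magnitude

# Placeholder filename; the real files live under the URL on the slide.
with gzip.open("glimpse_south_example.tbl.gz", "rt") as f:
    # IPAC .tbl header lines begin with '\' or '|'; keep only data rows.
    rows = [line.split() for line in f if line[:1] not in ("\\", "|")]

mag36 = np.array([float(r[MAG_3_6_COL]) for r in rows])
mag45 = np.array([float(r[MAG_4_5_COL]) for r in rows])

# Treat out-of-range magnitudes as nulls and drop them (the talk's setup
# filtered ~36M points down to fewer than 12M in a similar spirit).
valid = (mag36 < 90.0) & (mag45 < 90.0)
colour = mag36[valid] - mag45[valid]      # [3.6] - [4.5] colour index

# Normalise to 0..1 so it can drive point colour / point scale downstream.
cnorm = (colour - colour.min()) / np.ptp(colour)
print(f"kept {valid.sum()} of {valid.size} sources; "
      f"colour range {colour.min():.2f} to {colour.max():.2f}")
```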
Point colours and scales calculated by VOPs (SIMD shaders)

Houdini Movie Rendered (Mantra PBR)
– 36M points, filtered to <12M

Shading & VOPs
● A shader is a mini-program which makes data
● It can be better to generate data than to load it
● Shaders allow an additional level of management
● Geometry shaders on Happy Feet 2 generated 1 billion snow particles per image frame (impossible to load)
● Houdini VOPs are SIMD

Houdini VOP Network

Instancing
● Saves memory & I/O by re-using geometry
● Copies generated at render time
● Each instance can be varied based on point attributes
● Referencing one "instance object" provides a massive data reduction

Adaptive Meshes, LOD, Caching & Filtering
● Data reduction techniques
● Level of Detail (distance from camera)
● Adaptive meshes
● Cache common files locally
● Filter textures (images) – mipmapping

Other Tricks: Baked Lighting & Shadows
● Pre-calculate lighting & shadows
● "Bake" new textures & reapply onto geometry
● Sydney Harbour multi-beam sonar survey, 30 cm data
● Interactive 3D flythrough

Know Your Limits: Memory & I/O
● I/O will bottleneck – partition the problem & then scale it up
  – Split the job across many independent machines (eg. render)
  – Segment data access for each machine (eg. HDF5)
● Alternate memory hardware
● Vector (array) processor – SIMD
  – As in Cray, now Intel SSE/MMX and Nvidia GPUs
  – The IBM Cell Processor has a vector processor
● Content-Addressable Memory
  – "Associative arrays" are used by network routers

Types of System Memory
● Virtual Memory
  – Swapping is good, thrashing is bad
● SMP vs MPI
  – SMP (Symmetric Multiprocessing): multiple CPUs with common/shared memory; multi-threaded apps. eg. Intel Xeon and Core 2 Duo are SMP.
  – Cache coherency, snooping bus (on distributed shared memory); ccNUMA
  – MPI (Message Passing), PVM, clusters, Beowulf, etc. (memory not shared)

Data Formats
● HDF5 "Hierarchical Data Format" – www.hdfgroup.org
● Browsable container of data (HDFView)
● Has "groups & datasets" like "dirs & files"
● Data stored in B-trees
● Can also store binary data

HDF5 for Python – www.h5py.org
● Operate on HDF5 data via Python dictionaries & NumPy arrays (www.numpy.org)
● (A short h5py sketch is appended at the end of this transcription.)

Disk Systems
● Network Attached Storage (NAS)
  – BlueArc (now Hitachi), implemented via FPGA
  – Isilon (now EMC), clustered filesystem, 100 GB/s; multiple SSD nodes, maintains global file coherency
● Lustre Filesystem
  – Experimental parallel distributed filesystem – can have multiple copies of a file, one master
● Venti (Bell Labs Plan-9 & Inferno)
  – WORM archive; shares blocks by secure SHA-1 hash

Data Formats 2
● OpenVDB – www.openvdb.org
● Hierarchical structure for volumetric data ("clouds")
● Good for sparse, time-varying volumetric data
● Fast (constant-time) access to voxels
● Large set of operators (level-set tools, filters, transforms & morphological operators)
● (A short pyopenvdb sketch is appended at the end of this transcription.)

Data Formats 3
● Disney Ptex eliminates UV texture assignment – http://ptex.us/
● No (u,v)'s required! No visible seams
● Works on sub-d/poly faces
● Stores face adjacency data & filters
● Efficiently stores 10^6 mipmapped texture files
● Multi-channel, each channel compressed separately
● Used in Disney's "Bolt"

"D3" Data-Driven Documents
● D3 – an amazing data visualisation web framework (JavaScript)
● http://d3js.org
● See: https://github.com/mbostock/d3/wiki/Gallery
● Offers parallel coordinates
● Demo?

Nutrient Contents – an interactive visualization of the USDA Nutrient Database
● http://exposedata.com/parallel/
● Parallel co-ordinates: protein, calcium, sodium, fibre, vitamin c, potassium, carbohydrate, sugar, fat, water, calories, saturated, ...
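To make the "HDF5 for Python" slide above concrete, here is a minimal h5py sketch: groups behave like dictionaries, datasets read back as NumPy arrays, and slicing pulls only part of a dataset off disk, which is what makes the per-machine segmented access mentioned earlier practical. The file, group and dataset names are made up for illustration.

```python
# Minimal h5py sketch: dictionary-style groups, NumPy-backed datasets.
import numpy as np
import h5py

# Write: one group ("dir") holding a chunked, compressed dataset ("file").
with h5py.File("points.h5", "w") as f:
    grp = f.create_group("frame_0001")
    pts = np.random.rand(1_000_000, 3).astype(np.float32)
    grp.create_dataset("P", data=pts, chunks=(65536, 3), compression="gzip")

# Read: slicing reads only the requested rows, not the whole array --
# each render machine can read just its own segment of the data.
with h5py.File("points.h5", "r") as f:
    dset = f["frame_0001/P"]
    first_block = dset[:65536]
    print(dset.shape, dset.dtype, float(first_block.mean()))
```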
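Similarly, for the OpenVDB slide ("Data Formats 2"), the sketch below copies a mostly-empty dense NumPy block into a sparse FloatGrid and writes it to disk. It assumes OpenVDB was built with its optional pyopenvdb Python bindings; the grid name and output filename are illustrative only.

```python
# Minimal pyopenvdb sketch: dense NumPy block -> sparse VDB grid on disk.
import numpy as np
import pyopenvdb as vdb

# A mostly-empty dense volume: a small "cloud" inside a 128^3 block.
dense = np.zeros((128, 128, 128), dtype=np.float32)
dense[40:60, 40:60, 40:60] = 1.0

grid = vdb.FloatGrid()                     # default background value is 0.0
grid.copyFromArray(dense, ijk=(0, 0, 0))   # background voxels stay inactive
grid.name = "density"

print(grid.activeVoxelCount(), "active voxels out of", dense.size)
vdb.write("cloud.vdb", grids=[grid])
```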