VSIPL Design Principles - Object Management Group Portals

VSIPL Design Principles
James Lebak
Massachusetts Institute of Technology
Lincoln Laboratory
VSIPL Forum
Outline
VSIPL is an API that is…
• Portable
• Object-based
• For signal and image processing applications
• On embedded platforms
VSIPL Portability Goals
• Enable source-code portability between platforms
– Require only a simple re-compile of application
• Achieve portability by use of a standard API
– ANSI C API
– Agreed to and maintained by the VSIPL forum
– Forum membership consists of both users and implementors
– Test suite to verify conformance to API
• Maintain performance
– API must not restrict ability of implementations to optimize
Achieving Portability
• VSIPL gives good portability for the computation portion of applications
• Multiple implementations exist
– Vendor-optimized for specific platforms
– Third-party optimized for single-board G4 systems
– TASP reference implementation for general workstation use
• Implementations exist with very little performance penalty
Object-Based Libraries
• Traditional libraries are functional
– e.g. BLAS (Basic Linear Algebra Subprograms, FORTRAN scientific library)
saxpy(n, alpha, x, incx, y, incy)    computes y = αx + y
(n: length; x, y: data arrays; incx, incy: strides)
• VSIPL is object-based
– Application uses VSIPL abstract data types
– Implementation of these data types is hidden to allow vendor-private optimizations
– Primary abstract data types are blocks and views
vsip_vadd_f(x, y, y)
(x, y: views of data, which include data arrays, lengths, strides)
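The difference in calling convention can be sketched in plain C. This is a minimal illustration, not VSIPL or BLAS source: `my_saxpy`, `my_vview_f`, and `my_vadd_f` are hypothetical names, and the view fields are exposed only to keep the sketch self-contained, whereas a real VSIPL view is opaque to the application.

```c
#include <assert.h>
#include <stddef.h>

/* Functional style (BLAS-like): the caller passes the length and the
   strides explicitly on every call. Computes y = alpha*x + y. */
void my_saxpy(size_t n, float alpha,
              const float *x, size_t incx,
              float *y, size_t incy)
{
    size_t i;
    for (i = 0; i < n; ++i)
        y[i * incy] += alpha * x[i * incx];
}

/* Object-based style: a hypothetical view bundles the data array with
   its offset, length, and stride. */
typedef struct {
    float *data;
    size_t offset;
    size_t length;
    size_t stride;
} my_vview_f;

/* r = a + b, element by element. All three views must have the same
   length; the output view may be the same as an input view. */
void my_vadd_f(const my_vview_f *a, const my_vview_f *b,
               const my_vview_f *r)
{
    size_t i;
    assert(a->length == b->length && b->length == r->length);
    for (i = 0; i < r->length; ++i)
        r->data[r->offset + i * r->stride] =
            a->data[a->offset + i * a->stride] +
            b->data[b->offset + i * b->stride];
}
```

With views, the layout is described once when the view is created, so each later call carries only the operands.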
Blocks and Views
• Blocks and views are the primary VSIPL abstract data types
– A block is a contiguous storage area
– Data in a block may be viewed as a vector, matrix, or 3-tensor
– Calculations are performed using views
• View characteristics:
– Offset from start of block
– Length (number of elements in view)
– Stride (spacing between elements in view)
Example data in a block:
1 2 3 4 5 6 7 8 9
View of entire block as a vector (offset 0, length 9, stride 1):
1 2 3 4 5 6 7 8 9
View of even-numbered elements (offset 1, length 4, stride 2):
2 4 6 8
View of block as 3 by 3 matrix (offset 0, row length 3, row stride 1, column length 3, column stride 3):
1 2 3
4 5 6
7 8 9
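The index arithmetic behind these views can be sketched in a few lines of C. The types and functions below (`vec_view`, `mat_view`, `vec_get`, `mat_get`) are hypothetical and stand in for VSIPL's opaque view objects; the sketch assumes zero-based offsets, which is what the slide's numbers imply.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vector view: a window onto a block of floats. */
typedef struct {
    const float *block;
    size_t offset;   /* index of the first element within the block */
    size_t length;   /* number of elements in the view */
    size_t stride;   /* spacing between consecutive view elements */
} vec_view;

/* Element i of a vector view lives at block[offset + i*stride]. */
float vec_get(const vec_view *v, size_t i)
{
    assert(i < v->length);
    return v->block[v->offset + i * v->stride];
}

/* Hypothetical matrix view over the same kind of block. */
typedef struct {
    const float *block;
    size_t offset;
    size_t row_length;   /* elements in one row */
    size_t row_stride;   /* step between neighbours within a row */
    size_t col_length;   /* elements in one column */
    size_t col_stride;   /* step between neighbours within a column */
} mat_view;

/* Element (r, c) lives at block[offset + r*col_stride + c*row_stride]. */
float mat_get(const mat_view *m, size_t r, size_t c)
{
    assert(r < m->col_length && c < m->row_length);
    return m->block[m->offset + r * m->col_stride + c * m->row_stride];
}
```

For the block 1..9, a `vec_view` with offset 1, length 4, stride 2 selects 2, 4, 6, 8, and a `mat_view` with row stride 1 and column stride 3 reads the block as the 3 by 3 matrix shown above. No data is copied in either case.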
VSIPL Functionality
VSIPL contains a wide range of functions including:
Vector/Matrix Elementwise Ops
• Arithmetic (+,-,*,/)
• Comparison operations (<,>,=)
• Selection operations
– Min, max
• Boolean operations
– AND, OR, NOT
• Data conversion
Signal Processing
• FFT
– 1D, 2D, 3D
– Multiple 1D
– In-place and out-of-place
– Real-to-complex and complex-to-real
• Convolution (1D, 2D)
• Correlation (1D, 2D)
• FIR/IIR Filters
• Window functions
Linear Algebra
• Inner, Outer, Kronecker product
• Matrix-Vector, Matrix-Matrix Multiply
• QR, LU, Cholesky, Singular Value Decompositions
• Solvers based on above
Supported Data Types
Boolean, integer, floating-point, and index types
• Implementations must support at least one integer and one floating-point type
Complex integer and floating-point data
• Input data may be stored split or interleaved
• Implementation allowed to choose preferred order for VSIPL space (must support import of data in either order)
[Figure: input data layout examples showing split and interleaved storage of the real and imaginary parts]
Standard defines portable precision specifiers for user-defined types
• Float
– At least n decimal digits of accuracy
• Integer
– Exactly n bits
– At least n bits
– Fastest type of at least n bits
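The two complex layouts can be illustrated with a pair of conversion routines in plain C. This is a sketch under the usual convention that interleaved storage alternates real and imaginary parts in one array while split storage keeps them in two arrays; the function names are hypothetical and are not part of VSIPL.

```c
#include <assert.h>
#include <stddef.h>

/* Interleaved layout: re0, im0, re1, im1, ... (2*n floats in one array).
   Split layout: n real parts in one array, n imaginary parts in another. */
void interleaved_to_split(const float *inter, size_t n,
                          float *re, float *im)
{
    size_t i;
    assert(inter != NULL && re != NULL && im != NULL);
    for (i = 0; i < n; ++i) {
        re[i] = inter[2 * i];
        im[i] = inter[2 * i + 1];
    }
}

/* Inverse conversion: merge the two split arrays back into one. */
void split_to_interleaved(const float *re, const float *im, size_t n,
                          float *inter)
{
    size_t i;
    assert(inter != NULL && re != NULL && im != NULL);
    for (i = 0; i < n; ++i) {
        inter[2 * i] = re[i];
        inter[2 * i + 1] = im[i];
    }
}
```

An implementation that prefers one layout internally could apply a conversion like this when data is imported in the other layout, which is why the application does not need to know the preferred order.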
Object-Based Linear Algebra
qrdObject = vsip_qrd_create_f(M, N, VSIP_QRD_SAVEQ);   /* Allocate memory; setup to use Q later */
...
vsip_qrd_f(qrdObject, A);                              /* A = QR */
vsip_qrd_prodq_f(qrdObject,
                 VSIP_MAT_HERM,
                 VSIP_MAT_LSIDE,
                 w);                                   /* w = Q^H w */
• Algorithm setup is done before inner loop calls
• Implementation is free to choose algorithm
– Gram-Schmidt or Householder method could be used
– Choice can be made by create call
VSIPL Profiles
• VSIPL Profiles provide functionality for specific areas
– Core lite targeted at vector signal processing
– Core targeted at adaptive signal processing
[Figure: chart mapping the functionality below to the Core Lite Profile and the Core Profile]
Functionality
• float, complex, signed int types
• FFT, FIR Filters
• Vector arithmetic
• Matrix arithmetic
• Random numbers
• Convolution
• Correlation
• Matrix decomposition and solvers
Outline
VSIPL is an API that is…
• Portable
• Object-based
• For signal and image processing applications
• On embedded platforms
– Early binding
– Separate memory spaces
– Separate development and performance modes
Early Binding
Principle of Early Binding: Allocate resources for an operation as early as possible for better performance
Examples:
• FFT
– Setup phase: Calculate coefficients and store in FFT object
– Calculation phase: Calculate the FFT using stored coefficients
• Object and data memory allocation
– Setup phase: Allocate block and views and bind to data
– Calculation phase: Operate on data
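The setup/calculation split can be sketched with a small filter object in plain C. A FIR filter is used here instead of an FFT only to keep the example short; the pattern is the same. All names (`fir_obj`, `fir_create`, `fir_execute`, `fir_destroy`) are hypothetical and are not the VSIPL API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    size_t ntaps;
    float *taps;    /* private copy of the coefficients */
    float *state;   /* delay line, newest sample first, kept between calls */
} fir_obj;

/* Setup phase: all allocation and coefficient storage happens here. */
fir_obj *fir_create(const float *taps, size_t ntaps)
{
    fir_obj *f;
    if (taps == NULL || ntaps == 0)
        return NULL;
    f = malloc(sizeof *f);
    if (f == NULL)
        return NULL;
    f->ntaps = ntaps;
    f->taps = malloc(ntaps * sizeof *f->taps);
    f->state = calloc(ntaps, sizeof *f->state);
    if (f->taps == NULL || f->state == NULL) {
        free(f->taps);
        free(f->state);
        free(f);
        return NULL;
    }
    memcpy(f->taps, taps, ntaps * sizeof *f->taps);
    return f;
}

/* Calculation phase: no allocation, only arithmetic on stored data. */
void fir_execute(fir_obj *f, const float *in, float *out, size_t n)
{
    size_t i, k;
    assert(f != NULL);
    for (i = 0; i < n; ++i) {
        float acc = 0.0f;
        /* Shift the delay line and insert the newest sample at the front. */
        for (k = f->ntaps - 1; k > 0; --k)
            f->state[k] = f->state[k - 1];
        f->state[0] = in[i];
        for (k = 0; k < f->ntaps; ++k)
            acc += f->taps[k] * f->state[k];
        out[i] = acc;
    }
}

void fir_destroy(fir_obj *f)
{
    if (f != NULL) {
        free(f->taps);
        free(f->state);
        free(f);
    }
}
```

In an embedded application, `fir_create` would run once during initialization and `fir_execute` would be the only call inside the real-time loop.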
VSIPL Data Spaces
VSIPL has two logical memory spaces
User Data Space
• User manipulates data using
– Direct access
– I/O functions
– Other math libraries
– Communication libraries (e.g. MPI, MPI/RT)
• VSIPL will not operate on data in user space
VSIPL Data Space
• User manipulates data using VSIPL functions (only)
• Memory hierarchy details hidden
• Implementation may optimize memory use
– Chaining
– Deferred execution
– Strip-mining
These logical spaces may be the same physical address space, depending on the implementation
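VSIPL moves user-bound data between the two spaces with block admit and release operations. The toy model below illustrates that ownership hand-off in plain C. It is only a sketch: the `toy_` names are hypothetical, and the copy into a private buffer is an assumption standing in for whatever an implementation does internally (a workstation implementation might copy nothing at all).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    float *user_data;   /* memory owned by the application (user space) */
    float *lib_data;    /* private buffer owned by the library */
    size_t n;
    int admitted;       /* nonzero while the library owns the data */
} toy_block;

/* Bind user memory to a block; the data stays in user space for now.
   Returns 0 on success, -1 if the private buffer cannot be allocated. */
int toy_bind(toy_block *b, float *user_data, size_t n)
{
    b->user_data = user_data;
    b->n = n;
    b->admitted = 0;
    b->lib_data = malloc(n * sizeof *b->lib_data);
    return b->lib_data != NULL ? 0 : -1;
}

/* Admit: hand the data to the library, which may move or reformat it. */
void toy_admit(toy_block *b)
{
    memcpy(b->lib_data, b->user_data, b->n * sizeof *b->lib_data);
    b->admitted = 1;
}

/* A library operation: only legal on admitted data. */
void toy_scale(toy_block *b, float k)
{
    size_t i;
    assert(b->admitted);
    for (i = 0; i < b->n; ++i)
        b->lib_data[i] *= k;
}

/* Release: return the data to user space and give back the pointer. */
float *toy_release(toy_block *b)
{
    memcpy(b->user_data, b->lib_data, b->n * sizeof *b->user_data);
    b->admitted = 0;
    return b->user_data;
}

void toy_unbind(toy_block *b)
{
    free(b->lib_data);
    b->lib_data = NULL;
}
```

While a block is admitted, the application should not touch the user array directly, because the library's copy may be the only up-to-date one.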
Example VSIPL Implementations
• Workstation
– User space = VSIPL space
– No special memory management needed
• Digital Signal Processor
– e.g. SHARC
– User space is in DRAM, VSIPL space is in SRAM
– Library may manage movement of data through SRAM
• Vector Processor
– e.g. Altivec
– User can store data in either complex format (e.g. interleaved complex in user space)
– Library can store data internally in best format (e.g. split complex in VSIPL space)
• VSIPL code is portable to different platforms
• Vendor can optimize for each platform
VSIPL Error Checking
• VSIPL provides separate modes for debugging and for deployment
– Vendors may provide either or both modes
– May be one library or two
– Operate the same except for error reporting and timing
• Development mode
– Extensive error checking
– All errors are fatal
• Production mode
– Expected to be faster
– Implies no error checking
– Programming errors may have unpredictable results
Summary
• VSIPL was designed to be
– Portable, without sacrificing performance
– Object-based
– Useful for signal and image processing
– Targeted at embedded systems
• Other talks today will explore VSIPL in more detail