RDM Server 8.4 SQL User's Guide - Online Documentation
RDM Server 8.4 SQL User's Guide

Trademarks

Raima®, Raima Database Manager®, RDM®, RDM Embedded® and RDM Server® are trademarks of Raima Inc. and may be registered in the United States of America and/or other countries. All other names referenced herein may be trademarks of their respective owners.

This guide may contain links to third-party Web sites that are not under the control of Raima Inc., and Raima Inc. is not responsible for the content on any linked site. If you access a third-party Web site mentioned in this guide, you do so at your own risk. Inclusion of any links does not imply Raima Inc. endorsement or acceptance of the content of those third-party sites.

Contents

1. Introduction
   1.1 Overview of Supported SQL Features
   1.2 About This Manual
2. A Language for Describing a Language
3. A Simple Interactive SQL Scripting Utility
4. RDM Server SQL Language Elements
   4.1 Identifiers
   4.2 Reserved Words
   4.3 Constants
       Numeric Constants
       String Constants
       Date, Time, and Timestamp Constants
       System Constants
5. Administrating an SQL Database
   5.1 Device Administration
   5.2 User Administration
   5.3 Database and File Maintenance
       5.3.1 Database Initialization
       5.3.2 Extension Files
       5.3.3 Flushing Inmemory Database Files
       5.3.4 SQL Optimization Statistics
   5.4 Security Logging
   5.5 Miscellaneous Administrative Functions
       5.5.1 Login/Logout Procedures
       5.5.2 RDM Server Console Notifications
6. Defining a Database
   6.1 Create Database
   6.2 Create File
   6.3 Create Table
       6.3.1 Table Declarations
       6.3.2 Table Column Declarations
           Data Types
           Default and Auto-Incremented Values
           Column Constraints
       6.3.3 Table Constraint Declarations
       6.3.4 Primary and Foreign Key Relationships
       6.3.5 System-Assigned Primary Key Values
   6.4 Create Index
   6.5 Create Join
   6.6 Compiling an SQL DDL Specification
   6.7 Modifying an SQL DDL Specification
       6.7.1 Adding Tables, Indexes, Joins to a Database
       6.7.2 Dropping Tables and Indexes from a Database
       6.7.3 Altering Databases and Tables
       6.7.4 Schema Versions
   6.8 Example SQL DDL Specifications
       6.8.1 Sales and Inventory Databases
       6.8.2 Antiquarian Bookshop Database
       6.8.3 National Science Foundation Awards Database
   6.9 Database Instances
       6.9.1 Creating a Database Instance
       6.9.2 Using Database Instances
       6.9.3 Stored Procedures and Views
       6.9.4 Drop Database Instance
       6.9.5 Restrictions
7. Retrieving Data from a Database
   7.1 Simple Queries
   7.2 Conditional Row Retrieval
       7.2.1 Retrieving Data from a Range
       7.2.2 Retrieving Data from a List
       7.2.3 Retrieving Data by Wildcard Checking
       7.2.4 Retrieving Rows by Rowid
   7.3 Retrieving Data from Multiple Tables
       7.3.1 Old Style Join Specifications
           Inner Joins
           Outer Joins
           Correlation Names
           Column Aliases
       7.3.2 Extended Join Specifications
   7.4 Sorting the Rows of the Result Set
   7.5 Retrieving Computational Results
       7.5.1 Simple Expressions
       7.5.2 Built-in (Scalar) Functions
       7.5.3 Conditional Column Selection
       7.5.4 Formatting Column Expression Result Values
   7.6 Performing Aggregate (Grouped) Calculations
   7.7 String Expressions
   7.8 Nested Queries (Subqueries)
       7.8.1 Single-Value Subqueries
       7.8.2 Multi-Valued Subqueries
       7.8.3 Correlated Subqueries
       7.8.4 Existence Check Subqueries
   7.9 Using Temporary Tables to Hold Intermediate Results
   7.10 Other Select Statement Features
   7.11 Unions of Two or More Select Statements
       7.11.1 Specifying Unions
       7.11.2 Union Examples
8. Inserting, Updating, and Deleting Data in a Database
   8.1 Transactions
       8.1.1 Transaction Start
       8.1.2 Transaction Commit
       8.1.3 Transaction Savepoint
       8.1.4 Transaction Rollback
   8.2 Inserting Data
       8.2.1 Insert Values
       8.2.2 Insert from Select
       8.2.3 Importing Data into a Table
       8.2.4 Exporting Data from a Table
   8.3 Updating Data
   8.4 Deleting Data
9. Database Triggers
   9.1 Trigger Specification
   9.2 Trigger Execution
   9.3 Trigger Security
   9.4 Trigger Examples
   9.5 Accessing Trigger Definitions
10. Shared (Multi-User) Database Access
   10.1 Locking in SQL
       10.1.1 Row-Level Locking
       10.1.2 Table-Level Locking
       10.1.3 Lock Timeouts and Deadlock
   10.2 Transaction Modes
11. Stored Procedures and Views
   11.1 Stored Procedures
       11.1.1 Create a Stored Procedure
       11.1.2 Call (Execute) a Stored Procedure
   11.2 Views
       11.2.1 Create View
       11.2.2 Retrieving Data from a View
       11.2.3 Updateable Views
       11.2.4 Drop View
       11.2.5 Views and Database Security
12. SQL Database Access Security
   12.1 Command Access Privileges
       12.1.1 Grant Command Access Privileges
       12.1.2 Revoke Command Access Privileges
   12.2 Database Access Privileges
       12.2.1 Grant Table Access Privileges
       12.2.2 Revoke Table Access Privileges
13. Using SQL in a C Application Program
   13.1 Overview of the RDM Server SQL API
   13.2 Programming Guidelines
   13.3 ODBC API Usage Elements
       13.3.1 Header Files
       13.3.2 Data Types
       13.3.3 Use of Handles
       13.3.4 Buffer Arguments
   13.4 SQL C Application Development
       13.4.1 RDM Server SQL and ODBC
       13.4.2 Connecting to RDM Server
       13.4.3 Basic SQL Statement Processing
       13.4.4 Using Parameter Markers
       13.4.4 Premature Statement Termination
       13.4.5 Retrieving Date/Time Values
       13.4.6 Retrieving Decimal Values
       13.4.7 Retrieving Decimal Data
       13.4.8 Status and Error Handling
       13.4.9 Select Statement Processing
       13.4.10 Positioned Update and Delete
   13.5 Using Cursors and Bookmarks
       13.5.1 Using Cursors
           Rowset
           Types of Cursors
       13.5.2 Static Cursors
           Using Static Cursors
           Limitations on Static Cursors
       13.5.3 Using Bookmarks
           Activate a Bookmark
           Turn Off a Bookmark
           Retrieve a Bookmark
           Return to a Bookmark
       13.5.4 Retrieving Blob Data
14. Developing SQL Server Extensions
   14.1 User-Defined Functions (UDF)
       14.1.1 UDF Implementation
           UDF Module Header Files
           Function udfDescribeFcns
           SQL Data VALUE Container Description
           Function udfInit
           Function udfCheck
           Function udfFunc
           Function udfReset
           Function udfCleanup
       14.1.2 Using a UDF as a Trigger
       14.1.3 Invoking a UDF
           Calling an Aggregate UDF
           Calling a Scalar UDF
       14.1.4 UDF Support Library
   14.2 User-Defined Procedures
       14.2.1 UDP Implementation
           Function udpDescribeFcns
           Function ModInit
           Function udpInit
           Function udpCheck
           Function udpExecute
           Function udpMoreResults
           Function udpColData
           Function udpCleanup
           Function ModCleanup
       14.2.2 Calling a UDP
   14.3 Login or Logout UDP Example
   14.4 Transaction Triggers
       14.4.1 Transaction Trigger Registration
       14.4.2 Transaction Trigger Implementation
15. Query Optimization
   Overview of the Query Optimization Process
   Cost-Based Optimization
       Update Statistics
       Restriction Factors
   Table Access Methods
       Sequential File Scan
       Direct Access Retrieval
       Indexed Access Retrieval
       Index Scan
       Primary To Foreign Key Join
       Foreign To Primary Key Join
       Foreign Thru Indexed/Rowid Primary Key Predefined Join
   Optimizable Expressions
   How the Optimizer Determines the Access Plan
       Selecting Among Alternative Access Methods
       Selecting the Access Order
       Sorting and Grouping
       Returning the Number of Rows in a Table
       Select * From Table
   Query Construction Guidelines
   User Control Over Optimizer Behavior
       User-Specified Expression Restriction Factor
       User-Specified Index
       Optimizer Iteration Threshold (OptLimit)
       Enabling Automatic Insertion of Redundant Conditionals
   Checking Optimizer Results
       Retrieving the Execution Plan (SQLShowPlan)
       Using the SqlDebug Configuration Parameter
   Limitations
       Optimization of View References
       Merge-Scan Join Operation is Not Supported
       Subquery Transformation (Flattening) Unsupported

1. Introduction

The RDM Server SQL User's Guide instructs application developers in how to build C applications that use the RDM Server SQL database language. Developers who have SQL experience will find much familiar material here. While this guide is not intended to provide complete training in the use of SQL, it does give the novice SQL programmer sufficient information to get a good start on RDM Server SQL programming.

Other SQL-related RDM Server documentation includes:

SQL Language Reference: A complete description of the SQL language and statements provided in RDM Server.
SQL C API Reference: Descriptions of all SQL-related C application programming interface (API) functions.
ODBC User's Guide: Describes the use of ODBC with RDM Server SQL.
JDBC User's Guide: Describes the use of the RDM Server JDBC API.
ADO.NET User's Guide: Describes the use of the RDM Server ADO.NET API.

1.1 Overview of Supported SQL Features

RDM Server supports a subset of the ISO/IEC 9075:2003 SQL standard, including referential integrity and column constraint checks, as well as extensions that provide transparent network model database support and full relational access to combined model databases. Specific RDM Server SQL features include the following.

- Full automatic referential integrity checking.
- Automatic checking of column and table constraints that conform to the SQL standard column and table constraint features.
- Support for b-tree and hash indexes. Support for optional indexes that can be activated on demand is also provided.
- Ability to specify high-performance, predefined joins using the proprietary create join DDL statement.
Used with foreign and primary key specifications to indicate that direct access methods are to be used in maintaining inter-table relationships.
- Support for the definition of standard SQL triggers.
- Searched and positioned update and delete, used in conjunction with the RDM Server SQL ODBC API.
- Support for date, time, and timestamp data types.
- A full complement of built-in scalar functions that include math, string, and date manipulation capabilities.
- Support for null column values.
- Data insertion statements. RDM Server SQL provides the insert values statement to insert a single row into a specified table. Your application can use the insert from select statement to insert one or more rows from one table into another. The insert from file statement can be used to perform a bulk load from data contained in an ASCII text file.
- Support for select statements including group by, order by, subqueries, unions, and extended join syntax specification.
- Support for database security through standard grant and revoke statements.
- Full transaction processing capabilities, including the capability for partial rollbacks.
- Ability to create multiple instances of the same database schema.
- Capability to define and access C structure and array columns manipulated using the RDM Server Core API (d_ prefix functions).
- A cost-based query optimizer that uses data distribution statistics to generate query execution plans based on use of indexes, predefined joins, and direct access.
- Support for user-defined functions (UDF) that can be used in SQL statements. UDFs are extension modules that implement scalar and/or aggregate functions. You can extend the SQL functionality of the server, for example, by writing a function that does bitwise operations, or a function that performs an aggregate calculation (e.g., standard deviation) not provided in the built-in functions.
- Support for stored procedures written in SQL and user-defined procedures (UDP) written in C that execute on the database server.

1.2 About This Manual

The RDM Server SQL User's Guide is organized into the following sections.

- Chapter 2, "A Language for Describing a Language" describes the "meta-language" that is used to represent SQL statement syntax.
- Chapter 3, "A Simple Interactive SQL Scripting Utility" introduces a simple, command-line utility called "rsql" that can be used to interactively execute RDM Server SQL statements. We encourage you to use it to execute for yourself many of the SQL examples provided in this document.
- Chapter 4, "RDM Server SQL Language Elements" defines the basic elements of RDM Server SQL, including identifiers, reserved words, and constants.
- Chapter 5, "Administrating an SQL Database" provides descriptions of the SQL statements that can be used to perform a variety of administration functions such as creating and dropping users and devices.
- Chapter 6, "Defining a Database" explains how to create an SQL database definition (called a schema) using SQL database definition language statements. It also describes how to make changes to an existing database definition that contains data. The SQL DDL specifications for the example databases used throughout this manual are provided as well.
- Chapter 7, "Retrieving Data from a Database" describes all of the query capabilities available in the RDM Server SQL select statement as well as how to specify a union of two or more select statements. It also describes how you can predefine a specific query using the create view statement.
- Chapter 8, "Inserting, Updating, and Deleting Data in a Database" explains the use of the SQL insert, update, and delete statements.
- Chapter 9, "Database Triggers" provides a detailed description of how to implement database triggers, in which predetermined database actions can be automatically "triggered" whenever certain database modifications occur.
- Chapter 10, "Shared (Multi-User) Database Access" describes the important features of RDM Server SQL that can be used to manage concurrent access to a database from multiple users, balancing high access performance with the need to guarantee the integrity of the database data (through transactions).
- Chapter 11, "Stored Procedures and Views" shows you how to develop SQL stored procedures, which encapsulate one or more SQL statements in a single, parameterized procedure. Stored procedures are precompiled, which avoids recompiling the statements each time they are executed.
- Chapter 12, "SQL Database Access Security" explains the use of the SQL grant and revoke statements to restrict access to portions of the database or restrict the use of certain SQL commands for specific users.
- Chapter 13, "Using SQL in a C Application Program" provides detailed guidelines on how to write an RDM Server SQL application in the C programming language using the SQL C API functions (based on ODBC, with several non-ODBC extensions also provided).
- Chapter 14, "Developing SQL Server Extensions" contains how-to guidelines for writing C-language server extensions for use by SQL. These include user-defined functions (UDF), user-defined procedures (UDP), user-defined import/export filters (IEF), login/logout procedures, and transaction triggers.
- Chapter 15, "Query Optimization" provides a detailed description of how the RDM Server SQL query optimizer determines the "best" way to execute a particular query. Don't skip this chapter! Writing efficient and correct select statements is not always easy to do. Moreover, the "optimizer" is not as smart as that designation may lead you to think. The more you understand how queries are optimized, the better you will be able not only to create quality queries but also to figure out why certain queries do not work quite the way you thought they should.
2. A Language for Describing a Language

SQL stands for "Structured Query Language." You have probably seen many different methods used in programming manuals to show how to use a specific programming language. The two most common methods use syntax flow diagrams and what is known as Backus-Naur Form (BNF), a formal language for describing a programming language. In this document we use a simplified BNF method that seeks to represent the language in a way that closely matches the way you will code your own SQL statements for your application. For example, the following select statement:

    select sale_name, company, city, state from salesperson natural join customer;

can be described by this syntax rule:

    select_stmt:
        select identifier [, identifier]... from identifier [natural join identifier] ;

where "select_stmt" is the name of the rule (sometimes called a non-terminal); the bold-faced identifiers select, from, natural, and join are keywords (sometimes called terminal symbols); and identifier is like a function argument that stands in place of a user-specified value (technically, it too is the name of a rule that is matched by any user-specified value that begins with a letter followed by any sequence consisting of letters, digits, and the underscore ("_") character).

Rule names are identifiers, and their definitions are specified by giving the rule name beginning in column 1 and terminating it with a colon (":") as shown above.

There are also special meta-symbols that are part of the syntax descriptor language. Two are shown in the select_stmt syntax rule above. The brackets ("[" and "]") enclose optional elements. The ellipsis ("...") specifies that the preceding item can be repeated zero or more times. Other meta-symbols include a vertical bar (an "or" symbol) that is used to separate alternative elements, and braces ("{" and "}"), which enclose a set of alternatives from which one must always be matched.
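To see how these meta-symbols combine, consider a small made-up rule (the rule name lock_stmt and its keywords are ours for illustration, not taken from the RDM Server grammar) in which a keyword choice is mandatory and a bracketed item may repeat:

```
lock_stmt:
    lock table tabname [, tabname]... in {shared | exclusive} mode ;
```

Under this rule, "lock table customer in shared mode;" and "lock table customer, salesperson in exclusive mode;" both match: the braces force exactly one of shared or exclusive to appear, while the bracketed, ellipsis-marked list allows any number of additional table names (including none).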
All other special characters (e.g., the "," and ";" in the select_stmt rule) are considered to be part of the language definition. Meta-symbols that are themselves part of the language will be enclosed in single quotes (e.g., '[') in the syntax rule. Rule names can be used in other rules. For example, the syntax for a stored procedure that can contain multiple select statements could be described by the following rule:

    create_proc:
        create procedure identifier as select_stmt [; select_stmt]... end proc;

In order to make the syntax more readable, any non-bold, italicized name is considered to be matched as an identifier. Thus, the select_stmt rule can also be written as follows:

    select_stmt:
        select colname [, colname]... from tabname [natural join tabname] ;

where colname represents identifiers that correspond to table column names and tabname represents identifiers that correspond to table names. Some italicized terms are used to match specific text patterns. For example, number matches any text pattern that can be used to represent a number (either integer or decimal), and integer matches any pattern that represents an integer number. These rules are summarized in the table below.

Table 2-1. Syntax Description Language Elements

    Syntax Element        Description
    keyword               Bold-faced words that identify the special words used in the language that specify
                          actions and usage. Sometimes called reserved words. Examples: select, insert, create, using.
    identifier            Italicized word corresponding to an identifier: sequences of letters, digits, and "_"
                          that begin with a letter.
    number                Any text that corresponds to an integer or decimal number.
    integer               Any text that corresponds to an integer.
    [option1 | option2]   A selection in which either nothing or option1 or option2 is specified.
    {option1 | option2}   Either option1 or option2 must be specified.
    element...            Repeat element zero or more times.
    identifier (non-italic)  Normal-faced identifiers correspond to the names of syntax rules. Syntax rules are
                          defined by the name starting in column 1 and ending with a ":".

Text for programming and SQL examples is shown in courier font in a shaded box as in the following example.

    RSQL Utility - RDM Server 8.4.1 [22-Mar-2012]
    A Raima Database Manager Utility
    Copyright (c) 1992-2012 Raima Inc. All Rights Reserved.
    Enter ? for list of interface commands.

    001 rsql: .c 1 p admin secret
    *** Connected to RDM Server Version 8.4.1 [22-Mar-2012] using statement handle 1 of connection 1
    001 rsql: select * from salesperson;
    sale_id   sale_name              dob          commission   region
    BCK       Kennedy, Bob           1956-10-29   0.075        0
    BNF       Flores, Bob            1943-07-17   0.100        0
    BPS       Stouffer, Bill         1952-11-21   0.080        2
    CMB       Blades, Chris          1958-09-08   0.080        3
    DLL       Lister, Dave           1999-08-30   0.075        3
    ERW       Wyman, Eliska          1959-05-18   0.075        1
    GAP       Porter, Greg           1949-03-03   0.080        1
    GSN       Nash, Gail             1954-10-20   0.070        3
    JTK       Kirk, James            2100-08-30   0.075        3
    SKM       McGuire, Sidney        1947-12-02   0.070        1
    SSW       Williams, Steve        1944-08-30   0.075        3
    SWR       Robinson, Stephanie    1968-10-11   0.070        0
    WAJ       Jones, Walter          1960-06-15   0.070        2
    WWW       Warren, Wayne          1953-04-29   0.075        2
    002 rsql:

3. A Simple Interactive SQL Scripting Utility

Okay, we know that this is the world of point-and-click, easy-to-use applications. In fact, many abound for doing just that with SQL. So what value can there possibly be in providing a text-based, command-line-oriented, interactive SQL utility? Well, for one thing, you can keep both hands on the keyboard and never have to touch the mouse! Novel concept, isn't it? It also has provided us here at Raima with something that was easy to write and is easily ported to any platform. Hence, the interface works identically on all platforms. It also provides us (and, presumably, you as well) with the ability to generate test cases that can be easily and automatically executed.
Since we also share the source code to the program, it allows you to more easily see how to call the RDM Server SQL API functions without getting bogged down by object-oriented layers and user-interface calls. There is an educational benefit as well. You will more effectively learn how to properly formulate SQL statements by actually typing them in than by simply pointing to icons that do the job for you.

The name of this program is rsql (the standalone version is named rsqls). To start rsql, open an OS command window and enter a command that conforms to the following syntax. Note that an RDM Server that manages the SQL databases to be accessed must be running and available.

Table 3-1. RSQL Command Options

    rsql [-? | -h] [-B] [-V] [-e] [-u] [-c num] [-H num] [-s num] [-w num] [-l num]
         [-o filename] [-b [@hostname:port]] [startupfile [arg]...]

    -? | -h               Display command usage information.
    -B                    Do not display program banner on startup.
    -V                    Display operating system version information.
    -e                    Do not echo commands contained in a script file.
    -u                    Display result set column headings in upper case.
    -c num                Set maximum number of possible connections to num.
    -H num                Set size of statement history list to num.
    -s num                Set maximum number of statement handles per connection to num.
    -w num                Set page width to num characters.
    -l num                Set number of lines per display page to num.
    -o filename           Output errors to filename.
    startupfile [arg]...  Name of text file containing startup rsql/SQL commands and any needed
                          script file arguments (see .r startupfile [arg]... command below).

4. RDM Server SQL Language Elements

This section defines all of the basic elements of RDM Server SQL that are used throughout this User's Guide, including identifiers, reserved words, and constants.

4.1 Identifiers

Identifiers are used to name a wide variety of SQL language objects including databases, tables, columns, indexes, joins, devices, views, and stored procedures.
An identifier is formed as a combination of letters, digits, and the underscore character ('_'), always beginning with a letter or an underscore. An identifier in RDM Server can be from 1 to 32 characters in length. Unless otherwise noted in the User's Guide, identifiers are case-insensitive (upper and lower case characters are indistinguishable). Thus, CUSTOMER, customer, and Customer all refer to the same item. Identifiers cannot be a reserved word (see below).

4.2 Reserved Words

Reserved words are predefined identifiers that have special meaning in RDM Server SQL. As with identifiers, RDM Server SQL does not distinguish between uppercase and lowercase letters in reserved words. Table 4-1 lists the RDM Server SQL reserved words. Some of the listed words are not described in this document but have been retained for compatibility with other SQL systems. Note that none of the words listed in this table can be used in any context other than that indicated by the use of the word in the SQL grammar.

Table 4-1. RDM Server SQL Reserved Words

    ABS            COUNTS              HAVING       NAME           SET
    ACOS           CREATE              HEADINGS     NATURAL        SHARED
    ACTONFAIL      CROSS               HOUR         NEW            SHORT
    ADD            CURDATE             IF           NEXT           SHOW
    ADMIN          CURRENCY            IFNULL       NOINIT         SIGN
    ADMINISTRATOR  CURRENT             IGNORE       NOINITIALIZE   SIN
    AFTER          CURRENT_DATE        IMPORT       NON_VIRTUAL    SMALLINT
    AGE            CURRENT_TIME        IN           NOSORT         SOME
    AGGREGATE      CURRENT_TIMESTAMP   INDEX        NOT            SQRT
    ALL            CURTIME             INIT         NOTIFY         START
    ALTER          C_DATA              INITIALIZE   NOW            STATEMENT
    AND            DATA                INMEMORY     NULL           SUBSTRING
    ANY            DATABASE            INNER        NULLIF         SUM
    AS             DATABASES           INSERT       NUMERIC        SWITCH
    ASC            DATE                INSTANCE     NUMRETRIES     TABLE
    ASCENDING      DAYOFMONTH          INT          OBJECT         TABLES
    ASCII          DAYOFWEEK           INT16        OCTET_LENGTH   TAN
    ASIN           DAYOFYEAR           INT32        OF             THEN
    ATAN              DB_ADDR      INT64            OFF           THROUGH
    ATAN2             DEACTIVATE   INT8             OLD           TIME
    ATOMIC            DEBUG        INTEGER          ON            TIMESTAMP
    AUTHORIZATION     DEC          INTO             ONE           TINYINT
    AUTO              DECIMAL      IP               ONLY          TO
    AUTOCOMMIT        DEFAULT      IPADDR           OPEN          TODAY
    AUTOLOG           DELETE       IS               OPTION        TRAILING
    AUTOSTART         DESC         ISOLATION        OPTIONAL      TRIGGER
    AVG               DESCENDING   JOIN             OPT_LIMIT     TRIM
    BEFORE            DEVICE       KEY              OR            TRUE
    BEGIN             DIAGNOSTICS  LARGE            ORDER         TRUNCATE
    BETWEEN           DISABLE      LAST             OUTER         TYPEOF
    BIGINT            DISPLAY      LCASE            OWNER         UCASE
    BINARY            DISTINCT     LEADING          PAGESIZE      UINT16
    BIT               DOUBLE       LEFT             PARAM         UINT32
    BLOB              DROP         LENGTH           PARAMETER     UINT64
    BOOLEAN           EACH         LEVEL            PI            UINT8
    BOTH              ELSE         LIKE             POSITION      UNICODE
    BTREE             ENABLE       LN               PRECISION     UNION
    BUT               ENCRYPTION   LOCALTIME        PRIMARY       UNIQUE
    BY                END          LOCALTIMESTAMP   PROC          UNLOCK
    BYTE              ERRORS       LOCATE           PROCEDURE     UNSIGNED
    CALL              ESCAPE       LOCK             PUBLIC        UPDATE
    CASCADE           EXCLUSIVE    LOG              QUARTER       UPPER
    CASE              EXEC         LOGFILE          RAND          USE
    CAST              EXECUTE      LOGGING          REAL          USER
    CEIL              EXISTS       LOGIN            REFERENCES    USING
    CEILING           EXP          LOGOUT           REFERENCING   VALUES
    CHAR              EXTENSION    LONG             REMOVE        VARBINARY
    CHARACTER         FALSE        LOWER            REP           VARBYTE
    CHARACTER_LENGTH  FILE         LTRIM            REPEAT        VARCHAR
    CHAR_LENGTH       FILTER       MARK             REPLACE       VARYING
    CHECK             FIRST        MASTER           REVOKE        VIRTUAL
    CLOSE             FLOAT        MASTERALIAS      RIGHT         WAITSECONDS
    COALESCE          FLOOR        MAX              ROLLBACK      WCHAR
    COLUMN            FLUSH        MAXCACHESIZE     ROUND         WCHARACTER
    COMMANDS          FOR          MAXPGS           ROW           WEEK
    COMMIT            FOREIGN      MAXTRANS         ROWID         WHEN
    COMMITTED         FROM         MEMBER           ROWS          WHERE
    COMPARE           FULL         MIN              RTRIM         WITH
    CONCAT            FUNCTION     MINIMUM          RUN           WORK
    CONVERT           FUNCTIONS    MINUTE           SAVE          WVARCHAR
    COS               GRANT        MOD              SAVEPOINT     XML
    COT               GROUP        MODE             SECOND        YEAR
    COUNT             HASH         MONTH            SELECT

4.3 Constants

An RDM Server SQL constant is a number or string value that is used in a statement. The following sections describe how to specify each type of constant value.

Numeric Constants

The RDM Server SQL numeric data types are smallint, integer, float, double, and decimal. Numeric constants are formed as specified in the following syntax.
    numeric_constant:
        [+|-]digits[.digits]
    digits:
        d[d]...
    d:
        0|1|2|3|4|5|6|7|8|9

If you specify a constant with a decimal portion (that is, [.digits]), RDM Server stores the constant as a decimal. If you do not use the decimal part, the constant is stored as an integer. The following examples show several types of numeric constants.

    1021
    -50
    3.14159
    453.75
    -81.75

Floating-point constants (data type real, float, or double) can be specified as a numeric_constant or as an exponential formed as specified below.

    exponential_constant:
        [+|-]digits[.digits]{E|e}[+|-]ddd

Shown below are several examples of floating-point constants.

    6.02E23
    1.8E5
    -3.776143e-12

String Constants

ASCII string constants are formed by enclosing the characters of the string in single quotation marks ('string') or double quotation marks ("string"). To form a wide character (Unicode) string constant, the initial quotation mark must be immediately preceded by "L". If the string itself contains the quotation mark character used to delimit the string, that character must be immediately preceded by a backslash (\). To include a backslash character in the string, enter a double backslash (\\). The following are examples of string constants.

    "This is an ASCII string constant"
    L"This is a Unicode string constant"
    "this string contains \"quotation\" marks"
    'this string contains "quotation" marks too'
    'this string contains a backslash (\\)'

The default maximum length of an RDM Server SQL string constant is 256 characters. You can change this value by modifying the MaxString configuration parameter in the [SQL] section of rdmserver.ini. Refer to the RDM Server Installation / Administration Guide for more information.

Date, Time, and Timestamp Constants

The following syntax shows the formats for date, time, and timestamp constants.
    date_constant:
        date "YYYY-MM-DD" | @"[YY]YY-MM-DD"
    time_constant:
        time "HH:MM[:SS[.dddd]]" | @"HH:MM[:SS[.dddd]]"
    timestamp_constant:
        timestamp "YYYY-MM-DD HH:MM[:SS[.dddd]]" | @"YYYY-MM-DD [HH:MM[:SS[.dddd]]]"

The formats following the date, time, and timestamp keywords conform to the SQL standard. In the format for date constants, YYYY is the year (you must specify all four digits), MM is the month number (1 to 12), and DD is the day of the month (1 to 31). The @ symbol represents a nonstandard alternative. When only two digits are specified for the year using the nonstandard format, the century is assumed to be 1900 where YY is greater than or equal to 50; where YY is less than 50 in this format, the century is assumed to be 2000.

In the format for time constants, HH is hours (0 to 23), MM is minutes (0 to 59), SS is seconds (0 to 59), and .dddd is the fractional part of a second, with up to four decimal places of accuracy. If you specify more than four places, the value rounds to four places. The format for timestamp constants simply combines the formats for date and time constants.

You can use three alternative characters as separators in declaring date, time, and timestamp constants. Besides hyphen ("-"), RDM Server accepts slash ("/") and period (".").

The following are examples of the use of date, time, and timestamp constants.

    insert into sales_order(ord_num, ord_date, amount) values(20001, @"93/9/23", 1550.00);
    insert into note values("HI-PRI", timestamp "1993-9-23 15:22:00", "SKM", "SEA");
    select * from sales_order where ord_date >= date "1993-9-1";
    insert into event(event_id, event_time) values("Marathon", time "02:53:44.47");

The set date default statement, shown below, can be used to change the separator character and the order of month, day, and year.
    set_date_default:
        set {date default | default date} to {"MM-DD-YYYY" | "YYYY-MM-DD" | "DD-MM-YYYY"}

One of the three date format options must be specified exactly as shown, except that the "-" separator can be any special character you choose. This statement sets the date format for both input and output. Note that the specified separator character will be accepted for date constants in addition to the built-in separator characters hyphen, slash, and period.

System Constants

RDM Server SQL also recognizes three built-in literal constants, as described in Table 4-2.

Table 4-2. Literal System Constants

    Constant   Value
    user       The name of the user who is executing the statement.
    today      The current date at the execution time of the statement.
    now        The current timestamp at the execution time of the statement.

The following examples illustrate the use of the literal system constants.

    .. a statement that could be executed from an extension module or
    .. stored procedure that is always executed when a connection is made.
    insert into login_log(user_name, login_time) values(user, now);

    .. check today's action items
    select cust_id, note_text from action_items where tickle_date = today;

5. Administrating an SQL Database

This chapter contains information pertinent to the administration of SQL databases. For complete RDM Server administration details please refer to the RDM Server Installation and Administration Guide.

Many of the capabilities described in this chapter have alternative methods. For example, users and devices can be defined outside of SQL through use of the rdsadmin utility. However, it is often convenient (e.g., for regression testing) to be able to perform basic administrative actions through SQL statements. Hence, RDM Server SQL includes a variety of administration-related statements. Note that administrator user privileges are required in order to use the SQL statements described below.
5.1 Device Administration

An RDM Server device specifies a logical name for a file system directory in which the server will manage database-related files. A device can be created through SQL using the create device statement with the following syntax.

create_device:
    create [readonly] device devname as "directory_path"

This statement creates a device named devname with the specified directory_path, which usually will be a fully-qualified path name to an existing directory. Relative path names, which are interpreted as being relative to the catalog directory specified by the CATPATH environment variable, can also be used, as in the following (Windows) example.

create device importdev as ".\impdata";

It is important to repeat that the directory specified in the as clause must already exist. Otherwise, the system will return the error "invalid device: Illegal Physical Path for Device".

A readonly device is one in which the RDM Server managed files are only allowed to be read. Any attempt to write to a file contained on a readonly device will result in an error.

Note that before any SQL DDL specification can be processed in order to define and use an SQL database, devices will need to be created for the directories that contain the DDL specification files and that will contain the created database files.

create device sqldev as "c:\rdms\sqlscripts";
create device salesdb as ".\saledb";

Devices can be dropped, but only when there are no RDM Server managed files contained in the directory associated with the device. The syntax for the drop device statement is very simple.

drop_device:
    drop device devname

Successful execution of this statement drops the logical device named devname from the RDM Server system. The directory to which it refers, however, will remain, as will any files in it that are not managed by RDM Server.
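For example, assuming the importdev device created earlier still exists and its .\impdata directory no longer contains any RDM Server managed files, the device could be dropped as follows (a sketch only; the device name is taken from the earlier example):

```sql
-- drops only the logical device; the .\impdata directory itself,
-- and any files in it not managed by RDM Server, remain on disk
drop device importdev;
```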
You can retrieve a list of all of the devices defined for the RDM Server to which you are connected by executing the predefined stored procedure named ShowDevices, as shown in the example below.

execute ShowDevices;

NAME       TYPE        PATH
catdev     Read/Write  .\
emsamp     Read/Write  ..\examples\em\
importdev  Read/Write  .\impdata\
mlbdev     Read/Write  c:\aspen\mlbdb\
mlbimpdev  Read/Write  c:\aspen\mlbdb\impfiles\
rdsdll     Read/Write  ..\dll\nt_i\
samples    Read/Write  ..\examples\tims.nt_i\
sqldev     Read/Write  ..\sqldb.nt_i\
sqlsamp    Read/Write  ..\examples\emsql\
sysdev     Read/Write  ..\syslog.nt_i\

RDM Server manages disk space in such a way as to protect against a server shutdown in the event that needed external disk storage requirements are not satisfied (i.e., the system runs out of disk space). When this happens, RDM Server automatically switches into read-only mode until sufficient disk space is freed and made available to RDM Server. A minimum available space attribute can be associated with all RDM Server devices, allowing an administrator some control over this low disk space system behavior. SQL provides the set device minimum statement in order to set the minimum number of bytes of free space that must be available on every device in order for RDM Server to operate in its normal, read-write mode. The syntax for this statement is as follows.

set_device:
    set device minimum to nobytes

where nobytes is the number of bytes of free space that must be available on each RDM Server device. Note that this overrides the MinFreeDiskSpace parameter in the [Engine] section of rdmserver.ini.

set device minimum to 100000000;

This example sets the device minimum free space threshold to 100 megabytes.

5.2 User Administration

The create user statement can be used to create a new RDM Server user, allowing a user with the specified name to log in to the RDM Server associated with the connection that is issuing the create user statement.
The syntax for create user is shown below.

create_user:
    create [admin[istrator]] user username password "password" [with encryption] on devname

The login name for the user is username and the login password is specified as "password", with devname as the user's default device (the device which will be used with SQL statements for which an optional on devname clause has been omitted). The with encryption option indicates that an encrypted form of the password is to be stored in the system catalog. A username is a case-sensitive identifier, so that "Sam", "SAM", and "sam" are three different user names.

Administrator users have full access rights to all databases and commands. The access rights for normal (non-administrator) users must be specified through use of the grant statement (see Chapter 11). The create user statement can only be executed by administrator users.

The password for an existing user can be changed using the alter user statement.

alter_user:
    alter user username {authorization | password} "password"

Normal users can use this statement only to change their own password. Administrator users can use it to change the password for any user.

Administrator users can remove a user from an RDM Server using the drop user statement:

drop_user:
    drop user username

IMPORTANT: RDM Server is delivered with some predefined users. Of particular importance is user "admin" with password "secret". We highly recommend that this user be dropped (or at least the password changed) once you have defined your own administrator users.

An administrator can get a list of the names of all users of an RDM Server by executing the predefined stored procedure named ShowUsers, as shown in the following example.
create user randy password "RJ29j32r34s36k38" on sqldev;
create admin user paul password "SaulOrPaul" with encryption on catdev;

exec ShowUsers;

USER_NAME  RIGHTS  HOME_DEVICE
admin      Admin   catdev
guest      Normal  catdev
paul       Admin   catdev
randy      Normal  sqldev
wayne      Normal  samples

5.3 Database and File Maintenance

5.3.1 Database Initialization

Administrators can initialize a database by issuing the following statement.

initialize_database:
    init[ialize] [database] dbname

Before initializing a database, the database must be closed by all users (including you) who currently have the database opened. Execution of this statement is unrecoverable; the only way to restore the database is to restore from your last backup. If a database contains rows that are referenced from another database, initializing the referenced database will invalidate the referential integrity between those databases.

5.3.2 Extension Files

Extension files can be created to allow database files to grow to sizes larger than can be accommodated in a single operating system file. The feature was first added to RDM Server to overcome what was then the 2 gigabyte maximum size limitation for files on some operating systems. As that may still be the case on some RDM Server OS installations, extension files are necessary for those database files where the possible size limitation can be exceeded. Extension files can also be used to partition the contents of a database file into multiple files contained on separate devices. The partitioning is defined based strictly on the specified maximum size for the data file (which can be set using the alter [extension] file statement) and the range of database addresses whose associated record occurrences are stored in a given file. The syntax for the create extension file statement is given below.
create_extension_file:
    create extension file "extname" on extdev for "basename" on basedev

The name of the extension file is specified by extname and must be a legal, non-existent file name for the operating system on which RDM Server is installed. The extension file will be stored on the RDM Server device named extdev. This file will contain all data associated with the standard database file "basename" located on the device named basedev. Device extdev can be the same as basedev as long as extname is not the same as basename. If more than one extension file is needed, you can issue as many create extension file statements on the basename as necessary. For example, the following statements create two extension files for file "sales.000" in the sales database.

create extension file "sales.0x0" on sqldev for "sales.000" on sqldev;
create extension file "sales.0x1" on sqldev for "sales.000" on sqldev;

The alter [extension] file statement can be used to specify a variety of file sizing options for base and extension files, as shown in the following syntax.

alter_file:
    alter [extension] file extno for "basename" on basedev
        set {[maxsize=maxsize] | [cresize=cresize] | [extsize=extsize]}...

The specified file size settings apply to the extension file whose number is extno, where an extno of 0 (zero) refers to the base file itself. The maxsize option specifies the maximum file size in bytes. The cresize option specifies the initial size of the file when the file is first created. The extsize option specifies how much additional file space is to be allocated when data is added to the end of the file. It is best that all of these values be integer multiples of the file's page size (see the create file statement). If maxsize is less than the current amount of allocated space in the file, or if the file is fully allocated to its current maximum, the request is denied. The new extsize value takes effect the next time the file is extended.
The new cresize value is used the next time the file is initialized. You can use the maxsize value to control partitioning of the data among a set of extension files.

You can execute the ShowDBFiles predefined procedure to get a complete list of all of the files for a specified database, as in the example below.

exec ShowDBFiles("sales");

FILENO  EXTNO  DEVNAME  FILENAME
0       0      sqldev   sales.000
0       1      sqldev   sales.0x0
0       2      sqldev   sales.0x1
1       0      sqldev   sales.001
2       0      sqldev   sales.002
3       0      sqldev   sales.003
4       0      sqldev   sales.004
5       0      sqldev   sales.005
6       0      sqldev   sales.006

5.3.3 Flushing Inmemory Database Files

RDM Server provides the ability for specific database files to be kept entirely in memory while a database is open. This is particularly important for files whose contents are accessed often and which need to have as fast a response time as possible. RDM Server inmemory files can be specified to be volatile (meaning that the file always starts empty), read (meaning that the data is initially read from the file but no changes are ever written back), or persistent (meaning that the data file is completely loaded when the database is first opened and all changes are written back to the database when the database is last closed).

For persistent (and even read) inmemory files, it may be necessary for changes to those files to be written back to the database while the database remains open. The flush database statement provides the ability to do just that.

flush:
    flush [database] dbname[, dbname]...

This statement flushes the updated contents of the persistent or read inmemory files for the specified databases to the physical database files. Note that use of the flush database statement is the only way in which changes made to inmemory read files can be written to the database.
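As a brief sketch (assuming a database named sales, as used elsewhere in this chapter, that contains persistent or read inmemory files), a periodic checkpoint of inmemory changes could be issued as:

```sql
-- write pending changes in the sales database's persistent and
-- read inmemory files out to the physical database files
flush database sales;
```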
WARNING: The contents written to the database files with a flush database command are permanent and cannot be rolled back with a transaction rollback.

5.3.4 SQL Optimization Statistics

The RDM Server SQL query optimizer (see Chapter 14) utilizes data distribution statistics to assist its process of determining the best methods to use to efficiently execute a given select (or update/delete) statement. It is important that these statistics be kept up to date so that they provide a reasonable estimation of the characteristics of the data stored in the database. These statistics are generated from the current state of the database contents by executing the update statistics statement on the specified database as follows.

update_stats:
    update {stats | statistics} on dbname

The statistics are collected and stored in the SQL system catalog by executing the update statistics statement. The histogram for each column is collected from a sampling of the data files. The other statistics are maintained by the RDM Server runtime system. The histogram for each column contains a sampling of values (25 by default, controlled by the rdmserver.ini OptHistoSize configuration parameter) and a count of the number of times each value was found in the sampled number of rows (1000 by default, controlled by the rdmserver.ini OptSampleSize configuration parameter). The sampled values are taken from rows evenly distributed throughout the table.

When update statistics has not been performed on a database, RDM Server SQL uses default values that assume each table contains 1000 rows. It is highly recommended that you always execute update statistics on every production database. The execution time for an update statistics statement is not excessive in RDM Server and does not vary significantly with the size of the database.
Therefore, we suggest regular executions (possibly once per week or month, or following significant changes to the database).

5.4 Security Logging

RDM Server SQL provides the ability to log all grant and revoke statements that are issued. The SecurityLogging configuration parameter is used to activate (=1) or deactivate (=0) this feature. SecurityLogging is disabled by default. When enabled, RDM Server SQL records the information associated with each successfully executed grant and revoke statement in a row stored in the system catalog (syscat) table sysseclog, along with a copy of the text of the command stored in table systext. The following example shows a query that can be used to display the security log, showing when the command was issued, the name of the issuing user, and a copy of the grant/revoke statement.

select issued, user_name, txtln from sysseclog natural join systext;

ISSUED                    USER_NAME  TXTLN
2012-04-26 13:59:27.3160  admin      grant all privileges on item to wayne;
2012-04-26 13:59:16.5520  admin      grant all privileges on product to wayne;
2012-04-26 13:58:59.1110  admin      grant all privileges on outlet to wayne;

5.5 Miscellaneous Administrative Functions

5.5.1 Login/Logout Procedures

Login/logout procedures are stored procedures that the SQL system calls automatically whenever a user connects to or disconnects from a server. Two types of login/logout procedures are available:

- Public login/logout procedures are called whenever any user connects to or disconnects from the server.
- Private login/logout procedures are associated with particular users and are only called when those users connect or disconnect.

If both a public and a private procedure have been defined for a user, both procedures are called; the public procedure is called before the private procedure. Login/logout procedures cannot return a result set and cannot have arguments.
They are typically used for setting user environment values (e.g., display formats) or for performing specialized security functions. A login or logout procedure can be written either as a standard SQL stored procedure or as a C-based, user-defined procedure (UDP). Login/logout procedures are registered using the set login procedure statement with the following syntax.

set_login_procedure:
    set {login | logout} proc[edure] for {public | username [, username]...}
        to {procname | null}

The public option means that the login/logout procedure will be called whenever anyone logs in to or out of the RDM Server associated with the connection on which this statement is executed. Otherwise, the username list identifies the specific users to which the procedure applies. Only one private login/logout procedure can be associated with a user. Hence, a subsequent set login/logout procedure call will replace the previous one.

The following example creates a stored procedure called set_germany, which is to be used as a login procedure that defines the user environment for German users.

create proc set_germany as
    set currency to "€";
    set date display(12, "yyyy mmm dd");
    set decimal to ",";
    set thousands to ".";
    set decimal display(20, "#.#,##' €'");
end proc;

The following statement registers the set_germany procedure as the login procedure for users Kurt, Wolfgang, Helmut, and Werner.

set login proc for "Kurt", "Wolfgang", "Helmut", "Werner" to set_germany;

The use of login/logout procedures can be enabled or disabled using the set login statement as follows.

set_login:
    set login [to | =] {on | off}

The effect of this statement is system-wide and will persist until the next set login is issued by this or another administrator user. Use of login procedures is initially turned off.

5.5.2 RDM Server Console Notifications

The notify statement can be used to display a message on the RDM Server console. The syntax is shown below.
notify:
    notify {"message" | procvar | trigvar | ?}

A stored procedure variable (procvar) can be specified when the notify statement is executed within a stored procedure. A trigger variable (trigvar) that references an old or new row column value can be specified when the notify statement is executed within a trigger. If a parameter marker is specified, then the bound parameter value must be of type char or varchar. The following example shows how the notify statement can be used in a trigger.

create trigger grade_watch
    before update of grade on course
    referencing old row as bc new row as nc
    for each row
begin atomic
    notify "Grade change made to: "
    notify bc.student_course
end;

6. Defining a Database

A poorly designed database can create all kinds of difficulties for the user of a database application. Unfortunately, the blame for those difficulties is often laid at the feet of the database management system which, try as it might, simply cannot use non-existent access paths to quickly get at the needed data. Good database design is as much an art as it is engineering, and a solid understanding of the application requirements is a necessary prerequisite. However, it is not the purpose of this document to teach you how to produce good database designs. But you do need to understand that designing a database is a complex task and that the quality of the application in which it is to be used is highly dependent on the quality of the database design. If you are not experienced in designing databases, it is highly recommended that you first consult any number of good books on the subject before setting out to develop your RDM Server SQL database.

Information in a relational database is stored in tables. Each table is composed of columns that store a particular type of information and rows that correspond to a particular record in the table.
A simple but effective analogy can be made with a file cabinet, as illustrated in Figure 6-1.

Figure 6-1. A File Cabinet is a Database

A file cabinet contains drawers. Each drawer contains a set of files organized around a common theme. For example, one drawer might contain customer information while another drawer might contain vendor information. Each drawer holds individual file folders for each customer or vendor, sorted in customer or vendor name order. Each customer file contains specific information about the customer. The cabinet corresponds to a database, each drawer is like a table, and each folder is like a row in the table.

Typically, tables are viewed as shown in Figure 6-2, where the basic components of a database table are identified in an example customer table. Each column of the table has a name that identifies the kind of information it contains. Each row gives all of the information relating to a particular customer.

Figure 6-2. Definition of a "Table"

Suppose that you want to expand this example further and define a simple sales order database that, initially, keeps track of salespersons and their customers. Figure 6-3 shows how this information could be stored in a table.

Figure 6-3. Salesperson Accounts Table

There are columns for each salesperson's name and commission rate. Each salesperson has one or more customer accounts. The customer's company name, city, and state are also stored with the data of the salesperson who services that customer's account. Note that the salesperson's name and commission are replicated in all of the rows that identify the salesperson's customers. Such duplicated data is called redundant data. One of the goals in designing a database is to minimize the amount of redundant data that must be stored in the database. A description of how this is done is given below in section 6.3.4.
A database schema is the definition of what kind of data is to be stored and how that data is to be organized in the database. The Database Definition Language (DDL) consists of the SQL statements that are used to describe a particular database schema (also called the database definition). Five DDL statements are provided in RDM Server SQL: create database (schema), create file, create table, create index, and create join. The example below shows the RDM Server SQL DDL specification that corresponds to the TIMS Core API database definition.

create database tims on sqldev
    disable null values
    disable references count;

create file tims_d1;
create file tims_d2;
create file tims_k1;
create file tims_k2;

create table author(
    name char(31) primary key
) in tims_d2;
create unique index author_key on author(name) in tims_k2;

create table info(
    id_code char(15) primary key,
    info_title char(79),
    publisher char(31),
    pub_date char(11),
    info_type smallint,
    name char(31) references author
) in tims_d2;
create unique index info_key on info(id_code) in tims_k1;
create join has_published order last on info(name);

create table borrower(
    myfriend char(31),
    date_borrowed date,
    date_returned date,
    id_code char(15) references info
) in tims_d2;
create index borrower_key on borrower(myfriend) in tims_k2;
create join loaned_books order last on borrower(id_code);

create table text(
    line char(79),
    id_code char(15) references info
) in tims_d2;
create join abstract order last on text(id_code);

create table keyword(
    word char(31) primary key
) in tims_d1;
create unique index keyword_key on keyword(word) in tims_k2;

create table intersect(
    info_type smallint,
    id_code char(15) references info,
    word char(31) references keyword
) in tims_d1;
create join key_to_info order last on intersect(word);
create join info_to_key order last on intersect(id_code);

Detailed explanations of the use of each of the statements in the above example are given in the following sections of this chapter. Section 6.1 explains the use of the create database (schema) statement, which names the database that will be defined by the DDL statements that follow it. The create file statement, which can be used to define the files in which database data is stored, is described in section 6.2. The create table statement, described in section 6.3, is used to define the characteristics of a table that will be stored in the database. The create index and create join statements are used to define methods to quickly access database data and are described in section 6.4 and section 6.5, respectively. Instructions on how to compile an SQL DDL specification follow in section 6.6. The kinds of changes that can be made to the schema of an existing (and operational) database are described in section 6.7. Finally, the database definitions for the example databases provided with RDM Server are described in section 6.8.

6.1 Create Database

A complete DDL specification begins with a create database statement that conforms to the following syntax.
create_database:
      create {database | schema} dbname db_attributes
    | create {database | schema} dbname authorization username db_attributes

db_attributes:
    [pagesize bytes]
    [slotsize {4 | 6 | 8}]
    [on devname]
    [{enable | disable} null values]
    [{enable | disable} references count]

The name of the database to be created is specified by the dbname identifier, which is case-insensitive, meaning that "Sales", "sales", and "SALES" all refer to the same database. The create schema form follows the SQL standard. If the authorization username clause is specified, then the owner of the database will be the user named username. Otherwise, the owner is the user submitting this statement.

The pagesize clause specifies that the default database file page size is to be set to bytes, an integer constant. It is recommended that this value be set to a multiple of the standard block size for the file system on which RDM Server is running. The default page size is 1024.

The slotsize clause specifies the number of bytes to be used for the record (row) slot number used in an RDM Server database address. The slotsize defines the maximum number of rows that can be stored in a database file as the maximum unsigned 4, 6, or 8 byte integer value. The default slotsize is 4.

The on clause is used to specify the default device on which the database files will be stored. The create file statement can be used to locate database files on separate devices if desired.

RDM Server SQL maintains in each table row a bitmap that keeps track of that row's null column values. A column value is null when the bit in the bitmap associated with that column is set to 1. One byte is allocated for this bitmap for every 8 columns declared in that row's table. These bitmaps are automatically allocated and invisibly maintained by RDM Server SQL.
However, some applications (e.g., those designed for Core API use) do not require the use of SQL null column values. Hence, the disable null values clause can be specified to disable the allocation and use of the null values bitmap for the database.

SQL requires that referential integrity be enforced for foreign and primary key relationships. This means that all rows in the primary key table that are referenced by foreign key values in the rows of the referencing table must exist. This is automatically handled by SQL for those foreign keys on which a create join has been defined. For the other foreign keys, SQL maintains in each referenced primary key table row a count of the number of current references to that row. RDM Server SQL enforces referential integrity by only allowing primary key rows to be deleted, or primary key values to be updated, when the row's references count is zero. The references count value is automatically allocated and invisibly maintained by the SQL system for each row in the referenced primary key table. The allocation and use of the references count can be disabled by specifying the disable references count clause on the create database statement.

NOTE: When disable references count is specified, it will not be possible to delete rows from (or update the primary key value in) a primary key table that is referenced by a foreign key for which a create join has not been defined.

The following example shows a create database statement for the bookshop database with a default page size of 4096 bytes, located on device booksdev.

create database bookshop pagesize 4096 on booksdev;

The create database statement for the RDM Server system catalog database is as follows.

create database syscat on catdev disable references count disable null values;

Note that this database is actually a Core API database, as RDM Server SQL is itself a Core API application. Hence, the use of both null values and the references count is disabled.
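Pulling the db_attributes clauses together, a hypothetical create database statement might look like the following (the bigdata database and bigdev device names are illustrative only, not part of the example databases):

```sql
-- 8192-byte default page size, 6-byte row slot numbers (allowing far
-- more rows per file than the 4-byte default), files on device bigdev;
-- null-value bitmaps disabled because this application never stores nulls
create database bigdata pagesize 8192 slotsize 6 on bigdev
    disable null values;
```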
6.2 Create File

The create file statement is used to define a logical file in which the contents of one or more table rows, indexes, or blob values will be stored. The table or index data that will be stored in the file is specified using the in clause of a subsequent create table or create index statement. The syntax for create file is as follows.

create_file:
    create {file | tablespace} filename [pagesize bytes] [on devname]

The filename is a case-insensitive identifier to be referenced in an in clause of a later DDL statement. The pagesize clause can be used to specify the page size to be used for this particular file. If not specified, the default page size for the database will be used. The on clause specifies the name of the RDM Server device on which the file will be located. If not specified, the file is located on the default device for the database.

Use of the create file statement is not required. However, it must be used when a page size other than the database default is needed or when the file needs to be located on a device other than the database's default device. Files referenced in an in clause must be created before the statement that references them is compiled.

A given file can only contain one kind of content. In other words, a file can either contain the rows of one or more tables (a data file), the occurrences of one or more index keys (a key file), the occurrences of a single hash index, or the occurrences of one or more blob (e.g., long varchar) columns (a blob file).

A portion of the RDM Server system catalog SQL DDL specification is shown in the example below, illustrating the use of the create file statement.

create database syscat on catdev disable references count disable null values;
...
/* index files */
create file sysnames;   // all name indexes
create file syspfkeys;  // primary and foreign key column indexes
...
/* table files */
create file systabs;    // systable, ...
create file syspkeys;   // syskey
create file sysdbs;     // sysparms, sysdb, sysindex
...
/* blob files */
create file syscblobs pagesize 128;  // long varchar data
...
create table sysdb "database definition" (
    name char(32) not null unique compare(nocase) "name of database",
    ...
) in sysdbs;
create unique index db_name on sysdb(name) in sysnames;
...
create table syskey "primary or unique key definition" (
) in syspkeys;
create unique index pkey on syskey(cols) in syspfkeys;
...
create table systable "table definition" (
    table_addr db_addr primary key,
    name char(32) not null compare(nocase) "table name",
    dbid integer not null "database identifier",
    ...
    defn long varchar in syscblobs "definition string",
    ...
) in systabs;
create unique index tab_name on systable(name, dbid) in sysnames;

Note that RDM Server accepts both types of C-style comments (/* */ and //) embedded in an SQL script.

6.3 Create Table

An SQL table is the basic container for all data in a database. It consists of a set of rows, each comprised of a fixed number of columns. A simple example of a table declaration and the contents of a table is given below. The example shows the create table declaration for the author table in the bookshop example database.

create table author(
    last_name char(13) primary key,
    full_name char(35),
    gender char(1),
    yr_born smallint,
    yr_died smallint,
    short_bio varchar(250)
);

The bookshop database contains 67 rows in the author table. Each row has values for each of the 6 columns declared in the table. Some of the rows from this table are shown below. Note that the short_bio column values are truncated due to the size of the display window.

LAST_NAME   FULL_NAME                      GENDER  YR_BORN  YR_DIED  SHORT_BIO
AlcottL     Alcott, Louisa May             F       1832     1888     American novelist. She is ...
AustenJ     Austen, Jane                   F       1775     1817     English novelist whose wor...
BaconF      Bacon, Francis                 M       1561     1626     English philosopher, state...
BarrieJ     Barrie, J. M. (James Matthew)  M       1860     1937     Scottish author and dramat...
BaumL       Baum, L. Frank (Lyman Frank)   M       1856     1919     American author of childre...
BronteC     Bronte, Charlotte              F       1816     1855     English novelist and poet,...
BronteE     Bronte, Emily                  F       1818     1848     English novelist and poet,...
BurnsR      Burns, Robert                  M       1759     1796     Scottish poet and a lyrici...
BurroughsE  Burroughs, Edgar Rice          M       1875     1950     American author, best know...
CarlyleT    Carlyle, Thomas                M       1795     1881     Scottish satirical writer,...
CarrollL    Carroll, Lewis                 M       1832     1898     (Charles Lutwidge Dodgson)...
CatherW     Cather, Willa                  F       1873     1947     a Pulitzer Prize-winning A...
. . .
TolstoyL    Tolstoy, Leo                   M       1828     1910     Russian writer widely rega...
TrollopeA   Trollope, Anthony              M       1815     1882     One of the most...respecte...
TwainM      Twain, Mark                    M       1835     1910     (Samuel Clemens) American ...
VerneJ      Verne, Jules                   M       1828     1905     French author who helped p...
WellsH      Wells, H. G. (Herbert George)  M       1866     1946     English author, now best k...
WhartonE    Wharton, Edith                 F       1862     1937     Pulitzer Prize-winning Ame...
WhitmanW    Whitman, Walt                  M       1819     1892     American poet, essayist, j...
WildeO      Wilde, Oscar                   M       1854     1900     Irish writer, poet, and pr...
WoolfV      Woolf, Virginia                F       1882     1941     English author, essayist, ...

Details on how to properly define a table using the create table statement are provided in the following sections of this chapter.

6.3.1 Table Declarations

The create table statement is used to define a table and must conform to the following syntax.

create_table:
    create table [dbname.]tabname ["description"]
        (column_defn [, column_defn]... [, table_constraint]...)
        [in filename]
        [inmemory [persistent | volatile | read] [maxpgs = maxpages]]

The table will be contained in the database defined by the most recently executed create database statement. The name of the table is given by tabname, which is an identifier of up to 32 characters in length.
It is case-insensitive, so "salesperson" and "SALESPERSON" both refer to the same table. The table name must be unique; there can be no other table defined in the database with the same name. An optional "description" can be specified to provide additional descriptive information about the table, which will be stored in the system catalog entry for this table.

The in filename clause specifies a file, previously declared using create file, into which the rows of the table will be stored. If no in clause is specified, the system will automatically create a file for the table's rows using the database's default page size and storing it on the database's device.

The inmemory clause indicates that all of the rows in the table are to be maintained in the RDM Server computer's memory while the database containing the table is open. The read, persistent, and volatile options control whether the table's rows are read from disk when the database is opened (read, persistent), and whether they are written to the disk when the database is closed (persistent). The default inmemory option is volatile, which means that the table is always empty when the database is first opened. The read option means that all of the table's rows are read from the file when the database is opened; changes to the data are allowed but are not written back to the file on closing. The persistent option means that the table's changes that were made while the database was open are written back to the file when the database is closed.

The maxpgs parameter is used to specify the maximum number of database pages allowed for the table. (A database page is the basic unit of file input/output in RDM Server. A page contains one or more rows from a table. The number of rows per page is computed based on the physical size of the table's row and the page size defined for the database file in which the table's rows are stored.)
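As a sketch of how these clauses combine, consider the following declaration. The file and table names here are illustrative only (they are not part of the example databases); the clause syntax follows the create table grammar above.

```sql
create file cfgfile pagesize 2048;

// A small lookup table kept entirely in memory while the database is open.
// With the read option, its rows are loaded from cfgfile when the database
// is opened; they may be modified in memory, but changes are not written
// back to the file when the database is closed.
create table config "in-memory configuration lookup"
(
    cfg_name char(32) primary key,
    cfg_value char(64)
) in cfgfile inmemory read maxpgs = 100;
```

Choosing persistent instead of read would additionally write any in-memory changes back to cfgfile on close, while volatile (the default) would start the table empty on every open.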
6.3.2 Table Column Declarations

A table is comprised of one or more column definitions. Each column definition must follow the syntax shown below.

column_defn:
    colname basic_type [default {constant | null | auto}] [not null]
        [primary key | unique]
        [references [dbname.]tabname [(colname[, colname]...)]]
        [check(cond_expr)]
        [compare({nocase | wnocase | cmdFcnId})]
        ["description"]
  | colname long {varchar | wvarchar | varbinary}
        [default {constant | null}] [not null] [in filename]

The name of the column is given by colname, which is a case-insensitive identifier. Only one column with that name can be declared in the table, but the same name can be used in other tables. A good practice when naming columns is to use the same names for the primary and foreign key columns (except, of course, when the foreign key references the same table, as in the salesperson table in the sales database example; see section 6.8 below). Keeping all other column names unique across all the tables in the database will allow you to use the natural join operation in your select statements.

Data Types

Table columns can be declared to contain values of one of the following data types as specified in the syntax below.

basic_type:
    {char | varchar | wchar | wvarchar} [(length)]
  | {binary | varbinary} [(length)]
  | {double [precision] | float}
  | real
  | tinyint
  | {smallint | short}
  | {int | integer | long}
  | bigint
  | rowid ['[' {4 | 6 | 8} ']']
  | decimal [(precision[, scale])]
  | date
  | time [(precision)]
  | timestamp [(precision)]

Descriptions for each of these data types are given in the following table.

Table 6-1. RDM Server SQL Data Types

Data Type           Description
char, varchar       ASCII characters. The length specifies the maximum number of characters that can be
                    stored in the column, which will be represented and stored as a null-terminated
                    string. If no length is specified (char only), a single character only is stored.
wchar, wvarchar     Wide character data in which the storage format is operating system dependent. On
                    Windows, wchar is stored as UTF-16 characters. On Linux, they are stored as UCS4
                    characters. The length specifies the maximum number of characters (not bytes) that
                    can be stored in the column, which will be represented and stored as a
                    null-terminated string.
binary, varbinary   Binary data where the length specifies the number of bytes that are stored in the
                    column.
double, float       A 64-bit floating point number.
real                A 32-bit floating point number.
tinyint             An 8-bit, signed integer.
smallint            A 16-bit, signed integer.
int, integer, long  A 32-bit, signed integer.
bigint              A 64-bit, signed integer.
rowid               A 32-bit or 64-bit (depending on the slotsize value) unsigned integer that holds
                    the address of a particular table row in the database.
decimal             A binary-coded decimal in which precision specifies the maximum number of
                    significant digits (default 32) and scale specifies the number of decimal digits
                    (default 16).
date                Date values are stored as a 32-bit unsigned integer containing the number of
                    elapsed days since Jan 1, 1 A.D.
time                Time values are stored as a 32-bit unsigned integer containing the elapsed time
                    since midnight (to 4 decimal places => # seconds * 10000).
timestamp           A structure containing a date and a time as defined above.
long varchar        A blob data column containing up to 2.1 gigabytes of ASCII character data which
                    will be represented and stored as a null-terminated string.
long wvarchar       A blob data column containing up to 2.1 gigabytes of wide character data which will
                    be represented and stored as a null-terminated string.
long varbinary      A blob data column containing up to 2.1 gigabytes of binary data.

Default and Auto-Incremented Values

column_defn:
    colname basic_type [default {constant | null | auto}]

The default clause can be used to specify a default value for a column when one has not been provided on an insert statement for the table.
The default is specified as a literal constant that is appropriate for that particular data type (see section 4.3) or it can be set to null (the default).

A column of type integer is designated as an auto-increment column by specifying the default auto clause. This will cause SQL to automatically assign the next monotonically increasing non-negative integer value to the column when a value is not specified for the column in an insert statement. For example, the log_num column of the ship_log table is declared with default auto in the following create table statement.

create table ship_log (
    LOG_NUM integer default auto primary key,
    ORD_DATE timestamp default now "date/time when order was entered",
    ORD_NUM smallint not null "order number",
    PROD_ID smallint not null "product id number",
    LOC_ID char(3) not null "outlet location id",
    QUANTITY integer not null "quantity of item to be shipped from loc_id",
    BACKORDERED smallint default 0 "set to 1 when item is backordered",
    check(OKayToShip(ord_num, prod_id, loc_id, quantity, backordered) = 1)
);

When executing an insert statement, SQL automatically generates a value for log_num if no value has been specified. For example, in the insert statement below, SQL supplies the value for the log_num column.

insert into ship_log values(, date "1998-07-02", 3710, 17419, "SEA", 1, 0);

However, if you supply a value, then that value will be stored. In the example below, the log_num value stored will be 12.

insert into ship_log values(12, date "1998-07-02", 3710, 17419, "SEA", 1, 0);

You should have little reason to assign your own values. But if you do, be sure to assign a value lower than the most recently auto-generated value. The automatically generated integer values do not necessarily increase in strict monotonic order (that is, exactly by 1 each time).
If a table's rows are stored in a file that also contains rows from other tables, the next number might exceed the current number by more than 1. Values from deleted rows are not reused.

The use of auto-increment default values does not incur any additional performance cost. RDM Server has implemented them as part of the standard file header, which uses special high-performance logging and recovery mechanisms.

Column Constraints

Column constraints restrict the values that can be legally stored in a column. The clauses used to do this are shown in the syntax below.

column_defn:
    colname basic_type [not null]
        [primary key | unique]
        [references [dbname.]tabname [(colname[, colname]...)]]
        [check(cond_expr)]

Specifying not null indicates that the column cannot be assigned a null value. This means that either a default clause must be specified for the column (of course, default null is not allowed) or a value for the column must always be specified in an insert statement on the table.

A column that is declared to be a primary key or unique means that only one row in the table can have any specific value in that column. SQL enforces this through the creation of a unique index in which a copy of each row's column value is contained. An "integrity constraint violation: unique" error is returned for any insert or update statement that attempts to assign a column value that is already being used in another row of the table. Note that primary key and unique columns are automatically treated as not null columns even when the not null clause is omitted in the column declaration.

A column that is declared with the references clause identifies it as a foreign key column referencing the primary key column in the referenced table, tabname, which can be in a separate database (dbname). This means that there must exist a row in the referenced table with a primary key value that matches the column value being assigned by the insert or update statement.
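The effect of a references constraint can be seen with a small pair of tables. The table and column names here are hypothetical, chosen only to illustrate the behavior described above.

```sql
create table dept(
    dept_id char(4) primary key,
    dept_name char(30) not null
);

create table employee(
    emp_id integer default auto primary key,
    emp_name char(30) not null,
    dept_id char(4) references dept
);

insert into dept values("ENG", "Engineering");

// Accepted: a dept row with primary key "ENG" exists.
insert into employee values(, "Smith, Pat", "ENG");

// Rejected with an integrity constraint violation: no dept row
// has the primary key value "QA".
insert into employee values(, "Jones, Lee", "QA");
```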
The check clause is used to specify a conditional expression that must evaluate to true for every row that is stored in the table. The specified conditional can only reference this column name and will typically check that the value belongs to a certain range or set of values. Built-in or user-defined functions can be called from the conditional expression. Conditional expressions are specified in the usual way as given in the syntax below.

cond_expr:
    rel_expr [bool_oper rel_expr]...

rel_expr:
    expression [not] rel_oper {expression | [{any | some} | all] (subquery)}
  | expression [not] between constant and constant
  | expression [not] in {(constant[, constant]...) | (subquery)}
  | [tabname.]colname is [not] null
  | string_expr [not] like "pattern"
  | not rel_expr
  | ( cond_expr )
  | [not] exists (subquery)
  | [tabname.]colname *= [tabname.]colname
  | [tabname.]colname =* [tabname.]colname

subquery:
    select {* | expression} from {table_list | path_spec} [where cond_expr]

expression:
    arith_expr | string_expr

arith_expr:
    arith_operand [arith_operator arith_operand]...

arith_operand:
    constant | [tabname.]colname | arith_function | ( arith_expr )

arith_operator:
    + | - | * | /

arith_function:
    {sum | avg | max | min} (arith_expr)
  | count ({* | [tabname.]colname})
  | if ( cond_expr, arith_expr, arith_expr )
  | numeric_function
  | datetime_function
  | system_function
  | user_defined_function

string_expr:
    string_operand [^ string_operand]

string_operand:
    "string"
  | [tabname.]colname
  | if ( cond_expr, string_expr, string_expr )
  | string_function
  | user_defined_function

rel_oper:
    = | == | < | > | <= | >= | <> | != | /=

bool_oper:
    & | && | and | "|" | "||" | or

Descriptions of all supported SQL built-in functions can be found in Chapter 5 of the SQL Language Reference. The following example gives the declaration of the salesperson table in the example sales database.
create table salesperson(
    sale_id char(3) primary key,
    sale_name char(30) not null,
    dob date,
    commission decimal(4,3) check(commission between 0.0 and 0.15),
    region smallint check(region in (0,1,2,3)),
    sales_tot double,
    office char(3) references invntory.outlet(loc_id),
    mgr_id char(3) references salesperson
);

This table contains a number of column constraint definitions. The sale_id column is defined as the table's primary key. The sale_name column has the not null constraint, meaning that a salesperson's name must always be specified on an insert statement. The commission column can only contain values in the range specified in its check clause. The region column must contain a value equal to 0, 1, 2, or 3. The office column value, if not null (null is okay), must be the same as the loc_id column of a row from the outlet table in the invntory database. And the mgr_id column, if not null (null is also okay), must be the same as a sale_id in another row of the same salesperson table (this is a self-referencing table and is valid; note that it is not possible for a row to reference itself).

6.3.3 Table Constraint Declarations

Following all column definitions, table constraints can be defined. Table constraints are similar to column constraints and are used to specify multi-column primary/unique and foreign key definitions and/or a check clause that can be used to specify a conditional expression involving multiple columns in the table that must be true for all rows in the table. The syntax for specifying table constraints is as follows.

table_constraint:
    {primary key | unique} ( colname[, colname]... )
  | foreign key ( colname[, colname]... )
        references [dbname.]tabname ( colname[, colname]... )
  | check ( cond_expr )

The columns that comprise a unique or primary key cannot have null values. The example below shows the create table statement for the note table in the sales database, which declares a primary key consisting of three of the table's columns.
create table note(
    note_id char(12) not null,
    note_date date not null,
    sale_id char(3) not null references salesperson,
    cust_id char(3) references customer,
    primary key(sale_id, note_id, note_date)
);

The note_line table declaration in the example below contains a table constraint that declares a foreign key to the note table shown above.

create table note_line(
    note_id char(12) not null,
    note_date date not null,
    sale_id char(3) not null,
    txtln char(81) not null,
    foreign key(sale_id, note_id, note_date) references note
);

Note that no column names are specified in the "references note" clause. The references clause usually references the primary key of the referenced table but it could reference a unique column(s) declaration too. When the column names are not provided, the references clause will always refer to the table's primary key.

NOTE: The number and data types of the columns specified in a foreign key must match exactly with their corresponding referenced primary key (unique) counterparts.

A portion of the sales_order table declaration is shown below which includes a check clause that ensures that the specified amount is greater than or equal to the tax.

create table sales_order (
    cust_id char(3) not null references customer,
    ...
    amount double,
    tax real default 0.0,
    ...
    check(amount >= tax)
);

A side note needs to be mentioned here. The amount and tax columns are declared as floating point types, which may be okay for this simple example database but is not recommended for columns that are intended to contain monetary values. Floating point arithmetic is too prone to computational errors to be used for monetary calculations. Instead, always use decimal types.

6.3.4 Primary and Foreign Key Relationships

Consider the create table statement below and its contents as shown in Table 6-2.

create table customer(
    sale_name char(30),
    comm decimal(4,3),
    office char(3),
    company varchar(30),
    city char(17),
    state char(2),
    zip char(5)
);

Table 6-2. Example Un-normalized Customer Table

sale_name        comm   office  company                     city           state  zip
Kennedy, Bob     0.075  DEN     Broncos Air Express         Denver         CO     80239
Kennedy, Bob     0.075  DEN     Cardinals Bookmakers        Phoenix        AZ     85021
Flores, Bob      0.100  SEA     Seahawks Data Services      Seattle        WA     98121
Flores, Bob      0.100  SEA     Forty-niners Venture Group  San Francisco  CA     94127
...

(The full table contains one row per customer account; only a few representative rows are shown here.)

This table shows a customer list for a fictional company. Each customer entry contains information about the salesperson who services that company. Notice that there are duplicate salesperson entries (the sale_name, comm, and office columns) because most salespersons manage multiple customer accounts. Those duplicates comprise what is referred to as redundant data.

Conceptually, an entire database can be viewed as a single table in which there is a great deal of redundant data among the rows of that database. Hence, an important aspect of database design is the need to significantly reduce the amount of redundant data in order to reduce disk space consumption, which will also result in improved data access performance. The database design technique that does this is called normalization.

Normalization transfers the columns containing the same redundant data into a separate table and then defines two new columns that will be used to associate the old data in the new table with its original data in the old one. The new column in the new table is called the primary key. The new column in the old table is called the foreign key. For the example above, the create table declarations for the two tables would be as follows.

create table salesperson(
    sale_id char(3) primary key,
    sale_name char(30),
    comm decimal(4,3),
    office char(3)
);

create table customer(
    company varchar(30),
    city char(17),
    state char(2),
    zip char(5),
    sale_id char(3) references salesperson
);

The sale_id column in the salesperson table is the primary key. Each row of the salesperson table must have a unique sale_id value. The sale_id column in the customer table is a foreign key that references the specific salesperson row that identifies the salesperson who services that customer.
The amount of redundant data per customer row has been reduced from about 40 bytes down to 3 bytes. Table 6-3 shows the contents of the two tables after normalization.

Table 6-3. Example Normalized Customer and Salesperson Tables

Each customer's salesperson is found from the row in the salesperson table that has a matching sale_id column value. In order to see the name of the salesperson who services any particular customer you must perform a join (specifically, an equi-join) between the two tables. An example of a join between the salesperson and customer tables is shown in the following select statement, which displays the customers and their salespersons for the companies located in California.

select sale_name, company, city, state
from salesperson, customer
where salesperson.sale_id = customer.sale_id and state = "CA";

sale_name            company                     city
Robinson, Stephanie  Raiders Development Co.     Los Angeles
Robinson, Stephanie  Rams Data Processing, Inc.  Los Angeles
Robinson, Stephanie  Chargers Credit Corp.       San Diego
Flores, Bob          Forty-niners Venture Group  San Francisco

A one-to-many relationship is formed between two tables through primary and foreign key column declarations in which, for a given row in the primary key table, there can be many rows in the foreign key table with the same value. It is often very helpful to refer to a graphical representation of a database schema in order to see all of the foreign and primary key relationships that have been defined between the database tables. There are some very sophisticated standard ways to graphically depict a database design. We prefer, however, a simpler method using an arrow between the two related tables, where the arrow starts at the primary key table (the "one" side of the one-to-many relationship) and the arrow ends at the foreign key table (the "many" side of the one-to-many relationship). The arrow is labeled with the name of the foreign key column.
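The "many" side of such a relationship can also be summarized directly in SQL. The following query is a sketch based on the normalized salesperson and customer tables above; it assumes the standard group by aggregation behavior and counts the customer accounts serviced by each salesperson.

```sql
select sale_name, count(*)
from salesperson, customer
where salesperson.sale_id = customer.sale_id
group by sale_name;
```

Each result row pairs one salesperson (the "one" side) with the number of matching customer rows (the "many" side).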
The sales database example referred to in this documentation will be described in more detail later, but a diagram of the schema showing all of the foreign and primary key relationships is shown in the figure below.

Figure 6-4. Sales and Inventory Database Schema Diagram

Note that the sales example is actually comprised of two databases named sales and invntory. As you can see in the above example, foreign and primary key relationships can even be declared between tables defined in separate databases. It is usually a good design practice for primary and foreign key columns to have the same name. Moreover, while it is possible to declare multi-column primary and foreign keys, it is better to define single column, unique primary keys. If there is already data that uniquely identifies each row of a table (e.g., social security number, driver's or vehicle license number, etc.) then you should make that the primary key. If not, RDM Server provides two easy-to-use methods that automatically assign primary key values for you when rows are inserted into the table.

6.3.5 System-Assigned Primary Key Values

You can declare an integer column primary key to be auto-generated. The use of auto-generated integer column values was described earlier in the "Default and Auto-Incremented Values" paragraph in section 6.3.2. In the following create table statement the log_id column is declared to be an auto-generated, integer primary key.

create table activity_log(
    log_id integer default auto primary key,
    userid char(32),
    act_time timestamp,
    act_code tinyint,
    act_desc varchar(256)
);

The example below shows an insert statement into the above table and the select statement that shows the log_id value that was assigned.
insert into activity_log values(,user,now,1,"created auto-gen primary key example");
select log_id, act_desc from activity_log;

log_id  act_desc
     1  created auto-gen primary key example

Alternatively, you can declare a column to be a rowid primary key. A rowid primary key column uses a row's physical location in its data file to uniquely identify the row. Related tables would contain a rowid column foreign key referencing the primary key row. This provides for the fastest possible method of locating rows based on the primary key value. Use of rowid primary keys is much the same as auto-generated primary keys, as shown in the example below.

create table activity_log(
    log_id rowid primary key,
    userid char(32),
    act_time timestamp,
    act_code tinyint,
    act_desc varchar(256)
);

The example below shows an insert statement into the above table and the select statement that shows the log_id value that was assigned.

insert into activity_log values(,user,now,1,"created auto-gen primary key example");
select log_id, act_desc from activity_log;

log_id  act_desc
     1  created auto-gen primary key example

A value can be assigned even to a rowid primary key, but the value must be for a non-existent row in the database. This allows you to export a table (in rowid order) including the rowid values so that they can be imported into an empty table keeping the same rowid primary key column values. This is important because tables that have rowid foreign key references to the rowid primary key table must also maintain their values for export/import purposes. The primary difference between the two methods is that an auto-generated integer primary key has an index whereas no index is needed for the rowid.

6.4 Create Index

An index is a separate file containing the values of one or more table columns that can be used to quickly locate a row or rows in the table. Two indexing methods are supported in RDM Server. The standard indexing method is a Btree, which
organizes the indexed column values so that they are stored in sorted order. This allows fast access to all the rows that match a specified range of values. It also provides the ability to retrieve the table rows in the column order defined by the create index statement, avoiding the need to do a separate sort when a select statement includes an order by clause for those columns. The time required to locate a specific row using a Btree access depends on factors such as the size of the index key and the total number of rows in the database but typically will require from 3 to 5 disk reads.

The second supported index method is called hashing, in which the location of a row is determined by performing a "hash" of the indexed column value. This method can often locate a particular row in 1 disk access and so can be used to provide very fast access to a row based on the indexed column value. However, the values are not sorted and, hence, hash indexes are only used to find rows that match a specific value.

You do not directly use an index from SQL, but indexes are used by the RDM Server SQL optimizer in the selection of an access plan for retrieving data from the database. More indexes provide the optimizer with more alternatives and can greatly improve select execution performance. Unfortunately, the cost associated with a large number of indexes is a large amount of required storage and lower performance incurred by insert, update, and delete statements. Therefore, your selection of table columns to include in an index requires careful consideration. In general, create an index on the columns through which the table's rows typically will be accessed or sorted. Do not create an index for every possible sort or query that may be of interest to a user. SQL can sort when the select statement is processed, so it is unnecessary to create all indexes in advance.
Create indexes on the columns you expect will be used most often in order to speed access to the rows or to order the result rows. The syntax for create index is shown in the following syntax specification:

create_index:
    create [unique | optional] index [dbname.]ndxname ["description"]
        [using {btree | hash {(hashsize) | for hashsize rows}}]
        on tabname ( colname [asc | desc] [, colname [asc | desc]]... )
        [in filename]
        [inmemory [maxpgs = maxpages]]

Each index declared in the database has a unique name specified by the identifier ndxname. As with all table and column names, index names are case-insensitive. The dbname qualifier is only specified when the index is being added to a database that already exists. You can optionally include a "description" of the index which will be stored in the system catalog with the other index information.

The create unique index statement is used to create an index that cannot contain any duplicate values. In addition, null column values are not allowed in columns that participate in a unique index.

The create optional index creates an index (always non-unique) that can be deactivated so that the overhead incurred during the execution of an insert statement can be avoided. Optional indexes that have been activated behave just like any other index. The values of the columns on which the index is based are stored in the index during the processing of an insert statement. Use of an activated optional index is also taken into consideration by the SQL query optimizer. When an optional index is deactivated, the index values are not created when new rows are inserted nor does the optimizer use the deactivated optional index. Note, however, that a delete or update of a row in which an index value has been stored in the optional index will properly maintain it (i.e., the index value will be deleted/updated) even when the index is deactivated.
Hence, the active or inactive state of an optional index only affects the use of the index in the processing of an insert or select statement. Optional indexes are useful when the use of the index by the optimizer is important only when executing queries that do not regularly occur. For example, an accounting system may activate optional indexes to improve the performance of month-end reporting requirements but then deactivate them at all other times to improve performance of transactions in which new rows are being added. To enable or disable use of an optional index, use the activate index and deactivate index statements. Execution of an activate index statement will read each row of the table and store an index value only for those rows that had been inserted since the index was last deactivated. Initially, an optional index is deactivated.

You can specify the indexing method with the using clause. The btree method is the default when the using clause is not specified. For hash indexes, the maximum number of rows that will (eventually) be stored in the table must be specified as the hashsize. Note that this does not need to be exact, as it is unlikely that you can actually know this value in advance. The hash algorithm relies on this information, so it needs to be sufficiently large to minimize the average number of rows that hash to the same value. Note that hash indexes must also be unique.

The on clause specifies the table and table columns on which the index is to be created. For btree indexes you can also specify whether an indexed column is to be sorted in the index in either ascending (asc) or descending (desc) order.

Use the in clause to identify the file that contains the index. If not specified, the index will be maintained in a separate file using the default page size (1024 bytes). For hash indexes, the file specified in the in clause can only be used to store one hash index.
For btree indexes, the file specified in the in clause can contain any number of other btree indexes. However, it is recommended that each index be contained in its own file, as this will generally produce better performance. Note, though, that there are some embedded operating systems with older (or simpler) file management capabilities in which having too many files can also degrade performance.

The inmemory clause indicates that the index is to be maintained in the RDM Server computer's memory while the database containing the table is open. The read, persistent, and volatile options control whether the index is read from disk when the database is opened (read, persistent), and whether it is written to the disk when the database is closed (persistent). The default inmemory option is volatile, which means that the index is always empty when the database is first opened. The read option means that the entire index is read from the file when the database is opened; changes to the index are allowed but are not written back to the file on closing. The persistent option means that the index's changes that were made while the database was open are written back to the file when the database is closed.

The maxpgs option is used to specify the maximum number of database pages allowed for the index. (A database page is the basic unit of file input/output in RDM Server. A page contains one or more keys in the index. The number of keys per page is computed based on the size of the indexed columns and the page size defined for the database file in which the index is stored.)

All unique and primary key columns (except those of the rowid data type) are indexed. If you do not specify a create index for a unique or primary key, SQL will automatically create one for you.
You only need to specify a create unique index for unique or primary key table column(s) when 1) you want to use a hash index, 2) some of the columns in the btree index need to be in desc order, 3) you need to use the in clause to specify an index file for which create file was used to specify a page size other than the default, or 4) the index needs to be specified as an inmemory index.

In the following index example, the outlet table in our inventory database has two indexes. The loc_key index is an inmemory index for the primary key and loc_geo is an optional index.

create table outlet (
    loc_id char(3) primary key,
    city char(17) not null,
    state char(2) not null,
    region smallint not null "regional U.S. sales area"
);
create unique index loc_key on outlet(loc_id) inmemory persistent;
create optional index loc_geo on outlet(state, city);

The create table for the sales_order table in the sales database example is shown below along with a multi-column create index on the ord_date, amount, and ord_time columns.

create table sales_order (
    cust_id char(3) not null references customer,
    ord_num smallint primary key,
    ord_date date default today,
    ord_time time default now,
    amount double,
    tax real default 0.0,
    ship_date timestamp default null,
    check(amount >= tax)
) in salesd0;

create index order_ndx on sales_order(ord_date, amount desc, ord_time) in salek1;

6.5 Create Join

Using a create join statement, you can declare predefined joins that RDM Server SQL will automatically maintain on each insert, update, and delete statement issued by your application. Predefined joins are used to directly link all of the rows from a referencing table together with the referenced primary key row.
Thus, queries that include a join (that is, an equijoin) between tables related through foreign and primary key relationships can directly access the related rows. This results in optimal join performance. Like an index, a join is implicitly used by the RDM Server SQL optimizer in optimizing data access. This means that no RDM Server SQL data manipulation statement refers directly to the predefined join.

A predefined join provides direct access from the primary key's row to all referencing foreign key rows, as well as from the foreign key rows to the referenced primary key row. Thus, bi-directional direct access is available without the necessity of an index on the foreign key. This bi-directional access also provides efficient outer-join processing. Suppose that the salesperson table illustrated in Figure 6-8 contains rows for newly hired salespersons who do not yet have any customers. An "inner join" results in a virtual table that includes only the salespersons who have at least one customer (new hires are excluded). In this case, Figure 6-8 corresponds to an inner natural join of the salesperson and customer tables. An "outer join" created for these tables results in a virtual table that includes all salespersons and their customers. New hires appear in the table with empty (or null) customer column values, as illustrated in Figure 6-2.

Figure 6-5. Example of Outer Join Result

Access from the table row containing a foreign key to another table row containing the corresponding primary key entry is always available through the primary key index. You can simply index the foreign key column to allow quick access to the foreign key row from the primary key table. However, doing so can use a large amount of disk storage because many foreign keys can be associated with a single primary key. If you create a join instead, without indexing the foreign key, RDM Server uses direct access methods to form the table relationship.
This strategy results in better performance and saves considerable disk storage. The foreign key columns used in a create join are virtual columns (that is, columns for which RDM Server does not store the data values). The application can access a value in a virtual column just as it does any column. For virtual columns, RDM Server automatically extracts the data value from the primary key column of the referenced row through a pointer to that row that is associated with the predefined join and maintained by RDM Server.

Since values in a foreign key column come from the corresponding primary key column of the referenced table, no redundant data is required. However, if an index uses one of the foreign key columns or if you have specified the non_virtual attribute in your create join, the foreign key column values will be stored. In this case, redundant data is maintained in the referencing (foreign key) table.

When all foreign keys that reference a particular primary key are virtual, RDM Server allows the primary key to be modified, even if there are still active foreign keys that reference it. This is the only case where RDM Server allows a primary key column to be modified with references still active. Thus, changing the primary key value will instantly change it in all the foreign key rows that reference it.

Using a join in your schema guarantees that only a single logical disk access is necessary to retrieve a row in the referenced table. Thus, performance is optimal for referential integrity checking, for select statement processing, and for locating all rows of the tables with a particular foreign key value. In addition, the database can use either one-to-many or many-to-one data retrieval. As with indexes, you should take care in deciding what foreign keys to use in predefined joins.
Since a join is implemented by using database address links stored in each row, RDM Server must use resources to maintain the links during execution of database modification calls. Therefore, you should only use a join for situations in which the application needs to access tables by going from primary key to foreign key, as well as from foreign key to primary key. When the access direction will only be from the foreign key to the table containing the primary key, simply using the primary key index usually achieves acceptable performance.

The syntax for create join is shown in the following grammar specification:

create_join:
    create join joinname order {first | last | next | sorted}
        on foreign_key [and foreign_key]...

foreign_key:
    [virtual | non_virtual] tabname [ ( colname [, colname]... ) ]
        [by colname [asc | desc] [, colname [asc | desc]]... ]

The name of the join is specified by joinname, which is a case-insensitive identifier. Even though the join is named, no other SQL statement refers to a join by name, as use of joins is handled automatically by RDM Server SQL. Each foreign_key clause specifies a join on a foreign key declared in table tabname. If only one foreign key is declared, then no colname list needs to be specified. If specified, the colname list must exactly match the colname list in a foreign key declared in table tabname or, if only one column is specified, the colname column declaration in table tabname must itself have a references clause specified. Columns of foreign keys on which the create join is specified are by default virtual—meaning that the column value is not stored in the foreign key table but is always retrieved from its referenced, primary key table row. This reduces the amount of data redundancy in the database. However, you can declare the join to be non_virtual, indicating that the foreign key values are to also be stored in the foreign key table.
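As a sketch of the virtual/non_virtual choice in the grammar above (using the customer.sale_id foreign key from the sales database example later in this chapter; only one of the two forms would actually be declared):

```sql
-- Default: the foreign key is virtual, so customer.sale_id values are
-- fetched from the referenced salesperson row rather than stored.
create join accounts order last on customer(sale_id);

-- Alternative: non_virtual also stores sale_id in each customer row,
-- trading redundant storage for the ability to index the column.
create join accounts order last on non_virtual customer(sale_id);
```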
RDM Server implements a predefined join by maintaining all of the rows that have the same foreign key values (the referencing rows) in a linked list connected to the referenced primary key row (the referenced row). The order clause specifies the order in which the referencing rows are maintained in this linked list as follows:

order first     Newly inserted foreign key rows are placed at the front of the list.
order last      Newly inserted foreign key rows are placed at the end of the list.
order next      Newly inserted foreign key rows are placed following the current position in the list.
order sorted    Newly inserted foreign key rows are placed in the order specified in the foreign_key clause.

When you define a join as order sorted, you need to specify the by clause with either asc or desc to describe the sort order for each column as ascending or descending, respectively. Sort orders of mixed ascending and descending columns can be specified but are not yet supported. Currently, ordering of all sort columns is based on the ordering of the first sort field. The performance of an insert or update operation involving a joined foreign key will degrade when a large number of matching foreign key values exist. This is because the linked list implementation of predefined joins must be scanned to locate the proper insertion place. The larger the list, the longer the time of the scan.

The and clause allows multiple tables that contain foreign key declarations referencing the same primary key table to share the same predefined join. This means that rows from each table that reference the same primary key row will be maintained in the join's linked list. If the join is order sorted, then the type and length of the sort columns in each of the and'd tables must match exactly. Use of the and reduces the amount of space allocated to each row of the primary key table needed to maintain the predefined join lists.
However, the cost of accessing related rows from one of the tables will be increased, as an access cost is incurred for any intervening rows from the other table(s) that are in the linked list.

The following example shows a portion of the salesperson and customer tables containing their respective primary and foreign key declarations.

create table salesperson (
    sale_id char(3) primary key,
    sale_name char(30),
    ...
);
create table customer (
    cust_id char(3) primary key,
    company varchar(30),
    ...
    sale_id char(3) references salesperson
);
create join salesperson_customers order last on customer(sale_id);

The order last specification will place a given salesperson's newly inserted rows at the end of the list so that a subsequent select statement that includes a join of the two tables will return the rows in the same order in which they were inserted.

insert into salesperson values "WHG", "Gates, Bill";
insert into customer values "IBM", "IBM Corporation", "WHG";
insert into customer values "DLL", "Dell, Inc.", "WHG";
insert into customer values "INT", "Intel Corporation", "WHG";
insert into customer values "UW", "University of Washington", "WHG";
commit;

select sale_name, cust_id, company from salesperson, customer
    where salesperson.sale_id = customer.sale_id and sale_id = "WHG";

sale_name      cust_id   company
Gates, Bill    IBM       IBM Corporation
Gates, Bill    DLL       Dell, Inc.
Gates, Bill    INT       Intel Corporation
Gates, Bill    UW        University of Washington

Now consider, on the other hand, that the create join was specified with order sorted as follows.

create join salesperson_customers order sorted on customer(sale_id) by company;

Then the rows from that same select statement would be returned in company name order as shown below.
select sale_name, cust_id, company from salesperson, customer
    where salesperson.sale_id = customer.sale_id and sale_id = "WHG";

sale_name      cust_id   company
Gates, Bill    DLL       Dell, Inc.
Gates, Bill    IBM       IBM Corporation
Gates, Bill    INT       Intel Corporation
Gates, Bill    UW        University of Washington

6.6 Compiling an SQL DDL Specification

There are several ways to compile an SQL DDL specification. Each DDL statement can be individually compiled and executed (in the correct sequence, of course) through whatever method you would typically choose to use (e.g., the rdsadmin utility's SQL Browser). Usually, however, the SQL DDL specification will be contained in a text file, as is the case with all of the RDM Server example database specifications. The SQL DDL specification text can include C-style comments, where the text between an opening "/*" up through a closing "*/" (which can span multiple lines) is ignored, as well as the text from a "//" to the end of the text line.

Two command-line utilities are provided that you can use to process an SQL DDL specification file. The sddlp utility is provided just for that purpose. You can also use the rsql utility's ".r" command to process the DDL file as a script file. If you do that, be sure to subsequently submit a commit statement (assuming, of course, there were no errors in the DDL specification). The sddlp utility is executed from a command line according to the usage shown below.

sddlp [-?|-h] [-B] [-V] [-2] [-6] [-f] [-L server;user;password] ddlfile

Option                    Description
-?                        Displays this usage information
-B                        Do not display the start-up banner
-V                        Display the version information
-2                        Align records like version RDM Server 2.1
-6                        Align BLOB files like version RDM Server 6.X
-f                        Return database header file to client (for core API program use of SQL database)
-L server;user;password   Log in to the RDM Server named server with user name user and password password. Each is separated by a semicolon (;) with no intervening spaces.
If not specified, sddlp will attempt to use the values specified in the RDSLOGIN environment variable and, failing that, will issue command-line prompts for the information.

ddlfile                   The name of the text file containing the SQL DDL specification.

6.7 Modifying an SQL DDL Specification

RDM Server allows the schema for an existing (i.e., populated) database to be modified by adding new tables or indexes, dropping existing tables or indexes, or changing the definition of a table. Each of these types of DDL modification is described in the following sections.

6.7.1 Adding Tables, Indexes, Joins to a Database

You can add a new table or index to a database simply by issuing a create table/index statement with the table/index name qualified by the database name, as indicated in the earlier syntax specifications reproduced below.

create_table:
    create table [dbname.]tabname ["description"]
        (column_defn [, column_defn]... [, table_constraint]...)
        [in filename]
        [inmemory [persistent | volatile | read] [maxpgs = maxpages]]

create_index:
    create [unique | optional] index [dbname.]ndxname ["description"]
        [using {btree | hash {(hashsize) | for hashsize rows}}]
        on tabname ( colname [asc | desc] [, colname [asc | desc]]... )
        [in filename]
        [inmemory [maxpgs = maxpages]]

The table/index being created will be added to the database named dbname. If dbname is not specified, then the table/index is added to the most recently opened/accessed database. Note that the new table can contain a foreign key declaration that references an existing table; however, it is not possible to add a create join on the foreign key. A create join can be added only when the join being defined is between two tables that are also being added in the same transaction. The syntax for the create join statement is reproduced below.

create_join:
    create join joinname order {first | last | next | sorted}
        on foreign_key [and foreign_key]...
foreign_key:
    [virtual | non_virtual] tabname [ ( colname [, colname]... ) ]
        [by colname [asc | desc] [, colname [asc | desc]]... ]

6.7.2 Dropping Tables and Indexes from a Database

You can use the drop table statement to remove a table from a database as shown in the syntax below.

drop_table:
    drop table [dbname.]tabname

The table will be dropped from database dbname. If dbname is not specified, then table tabname will be dropped from the most recently opened/accessed database that contains a table named tabname.

Tables to which foreign key references exist in other tables cannot be dropped. Nor can tables be dropped that have foreign keys on which a create join has been declared.

Indexes can be dropped from a table using the drop index statement as follows:

drop_index:
    drop index [dbname.]ndxname

The index will be dropped from database dbname. If dbname is not specified, then index ndxname will be dropped from the most recently opened/accessed database that contains an index named ndxname.

Predefined joins (defined by the create join statement) cannot be dropped from a database. Any create table, create index, drop table, or drop index statements that are issued do not take effect until the next commit statement is issued.

6.7.3 Altering Databases and Tables

If you will be making more than one change to a database schema, it is best to encapsulate the changes in an alter database transaction. The syntax for the alter database statement is shown below.

alter_database:
    alter {database | schema} dbname

The alter database statement is followed by a series of create file, create table, create index, drop table, drop index, or alter table statements that describe the changes you wish to make to the database schema. All the changes will be processed when a subsequent commit statement is submitted.
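As a minimal sketch of the drop statements, assuming the sales database defined later in this chapter (ship_log has no inbound foreign key references and no predefined joins, so it can be dropped; remember that the drops take effect only at the commit):

```sql
alter database sales;
drop index cust_order_key;   -- an index on sales_order
drop table ship_log;         -- no other table references ship_log
commit;                      -- the schema changes take effect here
```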
For example, the following alter database script will add an index on column contact in the customer table, drop the cust_order_key index in sales_order, and add a new table called sales_office.

alter database sales;
create file salesd4;
create file salek3;
create index contact_key on customer(contact) in salek3;
drop index cust_order_key;
create table sales_office(
    office_id char(3) primary key,
    address char(30),
    city char(20),
    state char(2),
    zip char(10),
    phone char(12)
);
create unique index office_key on sales_office(office_id) in salek3;
commit;

The alter table statement is used to change the definition of an existing table. It can be used to add, drop, or modify column definitions, rename the table or its columns, modify the inmemory maxpgs value, drop a foreign key, or change the table description string. The syntax for the alter table statement is as follows.

alter_table:
    alter table [dbname.]tabname alter_table_action

alter_table_action:
      add [column] column_defn [, column_defn]...
    | alter [column] column_defn
    | drop column colname
    | inmemory maxpgs = maxpages
    | rename table tabname ["description"]
    | rename [column] oldname newname ["description"]
    | drop foreign key ( colname [, colname]... )
    | "description"

Execution of this statement will modify the definition of the table named tabname in database dbname. If dbname is not specified, then the table must be defined in the database identified in a previously submitted and active alter database statement. The add column clause is used to add one or more columns to the table. A complete column_defn must be specified for each one. The added columns will be placed at the end of the table in the order specified in the list. The alter column clause is used to change the definition of an existing column. The column_defn must be complete; that is, it must still include all of the original column definition entities that are to be retained.
If not null is added to the column definition, a default must be given. If the type or length of the column changes, any index that uses the column must have been previously dropped, as must any foreign key references that include the column. The check and compare clauses of the column_defn cannot be changed, added, or removed when altering a column. Type conversion from double, float, real, numeric, decimal, date, time, or timestamp into varchar, char, wvarchar, or wchar will use the default display format for the type as defined by the user (e.g., set date display(14, "mmm. d, yyyy")).

The drop column action removes the column from the table. The rename action can be used to change the table name or the name of a column. The drop foreign key clause removes the foreign key table constraint with the specified column names. Foreign keys on which a create join has been declared cannot be dropped.

6.7.4 Schema Versions

All of the DDL statements that have been submitted after the alter database statement which initiated the schema modification take effect upon execution of a commit statement. RDM Server assigns a new version number to the newly changed database schema. Versioning allows DDL changes to take immediate effect without having to apply those changes to the existing database data. All SQL statements that access the database which are submitted after the schema has been changed must conform to the new DDL specification. Any C applications or stored procedures that reference changed or dropped DDL tables or columns must be changed and recompiled. Database files that contained only tables, indexes, or blob column data that have been dropped are deleted.

Columns that have been added to tables will have null values returned for the table's rows that existed prior to the DDL changes being put into effect.
If the newly added column was specified as not null, then the column's default value will be returned. If any tables are dropped or columns are changed or dropped by an alter database transaction, an update stats should be submitted on the database after the DDL changes have been committed.

6.8 Example SQL DDL Specifications

Several example databases are provided with RDM Server. The two example databases primarily used in this documentation to illustrate the use of RDM Server SQL are for a hypothetical computer components sales company. (Since this example has been in use in the RDM Server documentation since 1992, perhaps we should call it an antique computer components sales company.) Also provided is an example database for a hypothetical bookshop that only sells high-end, rare antiquarian books. The other example database contains actual data derived from over 130,000 National Science Foundation (USA) research grants that were awarded during the years 1990 through 2003.

6.8.1 Sales and Inventory Databases

The example inventory database is defined in the following schema. This database consists of three tables. The product table contains information about each product, including an identification code, description, pricing, and wholesale cost. The outlet table identifies all company office and warehouse locations. The on_hand table links the other two tables by defining the quantity of a specific product (through prod_id) located at a particular outlet (through loc_id). This specification is available in the text file named "invntory.sql".
create database invntory on sqldev;

create table product (
    prod_id smallint primary key "product identification code",
    prod_desc char(39) not null "product description",
    price float "retail price",
    cost float "wholesale cost"
);
create unique index prod_key on product(prod_id);
create index prod_pricing on product(price desc, prod_id);

create table outlet (
    loc_id char(3) primary key,
    city char(17) not null,
    state char(2) not null,
    region smallint not null "regional U.S. sales area"
);
create unique index loc_key on outlet(loc_id);
create optional index loc_geo on outlet(state, city);

create table on_hand (
    loc_id char(3) not null references outlet(loc_id) "warehouse containing this product",
    prod_id smallint not null references product "id of product at this warehouse",
    quantity integer not null "units of product on hand at this warehouse",
    primary key(loc_id, prod_id)
);
create unique index on_hand_key on on_hand(loc_id, prod_id);
create join inventory order last on on_hand(loc_id);
create join distribution order last on on_hand(prod_id);

The example sales database definition given below is more complex than the inventory database. The salesperson table contains specific information about a salesperson, including sales ID code, name, commission rate, etc. The customer table contains standard data identifying each customer. The sale_id column in this table is a foreign key for the salesperson who services the customer account. Note that sales orders made by a customer are identified through the cust_id foreign key in the sales_order table. This DDL specification is contained in text file "sales.sql".
create database sales on sqldev;
create file salesd0;
create file salesd1;
create file salesd2;
create file salesd3;
create file salek0;
create file salek1 pagesize 2048;
create file salek2 pagesize 4096;

create table salesperson (
    sale_id char(3) primary key,
    sale_name char(30) not null,
    dob date "date of birth",
    commission decimal(4,3) check(commission between 0.0 and 0.15) "salesperson's commission rate",
    region smallint check(region in (0,1,2,3)) "regional U.S. sales area",
    office char(3) references invntory.outlet(loc_id) "location where salesperson works",
    mgr_id char(3) references salesperson "salesperson id of sales mgr"
) in salesd0;
create unique index sale_key on salesperson(sale_id) in salek0;
create optional index sales_regions on salesperson(region, office) in salek1;
create optional index sales_comms on salesperson(commission desc, sale_name) in salek2;
create join manages order last on salesperson(mgr_id);

create table customer (
    cust_id char(3) primary key,
    company varchar(30) not null,
    contact varchar(30),
    street char(30),
    city char(17),
    state char(2),
    zip char(5),
    sale_id char(3) not null references salesperson "salesperson who services customer account"
) in salesd0;
create unique index cust_key on customer(cust_id) in salek0;
create optional index cust_geo on customer(state, city) in salek2;
create join accounts order last on non_virtual customer;

create table sales_order (
    cust_id char(3) not null references customer "customer who placed order",
    ord_num smallint primary key "order number",
    ord_date date default today "date order placed",
    ord_time time default now "time order placed",
    amount double "total base amount of order",
    tax real default 0.0 "state/local taxes if appl.",
    ship_date timestamp default null,
    check(amount >= tax)
) in salesd0;
create unique index order_key on sales_order(ord_num) in salek0;
create index order_ndx on sales_order(ord_date, amount desc, ord_time) in salek1;
create index cust_order_key on sales_order(cust_id) in salek0;
create join purchases order last on sales_order;

create table item (
    ord_num smallint not null references sales_order,
    prod_id smallint not null references invntory.product,
    loc_id char(3) not null references invntory.outlet,
    quantity integer not null "number of units of product ordered",
    check( HaveProduct(ord_num, prod_id, loc_id, quantity) = 1 )
) in salesd1;
create index item_ids on item(prod_id, quantity desc) in salek1;
create join line_items order last on item(ord_num);

create table ship_log (
    log_num integer default auto primary key,
    ord_date timestamp default now "date/time when order was entered",
    ord_num smallint not null "order number",
    prod_id smallint not null "product id number",
    loc_id char(3) not null "outlet location id",
    quantity integer not null "quantity of item to be shipped from loc_id",
    backordered smallint default 0 "set to 1 when item is backordered",
    check(OKayToShip(ord_num,prod_id,loc_id,quantity,backordered) = 1)
) in salesd0;
create index ship_order_key on ship_log(ord_num, prod_id, loc_id) in salek1;

create table note (
    note_id char(12) not null,
    note_date date not null,
    sale_id char(3) not null references salesperson,
    cust_id char(3) references customer,
    primary key(sale_id, note_id, note_date)
) in salesd2;
create unique index note_key on note(sale_id, note_id, note_date) in salek1;
create join tickler order sorted on note(sale_id) by note_date desc;
create join actions order sorted on note(cust_id) by note_date desc;

create table note_line (
    note_id char(12) not null,
    note_date date not null,
    sale_id char(3) not null,
    txtln char(81) not null,
    foreign key(sale_id, note_id, note_date) references note
) in salesd3;
create join comments order last on note_line;

For the sales database, the item table contains product and quantity data pertaining to the sales order identified through the ord_num foreign key. Notes that can serve as a tickler for the salesperson or that indicate actions to be performed for the customer are stored in the note table. Each line of the note text is stored as a row of the note_line table. An additional table, called ship_log, contains information about sales orders that have been booked but not yet shipped. Your application will create rows in this table through a trigger function, which is a special use of a user-defined function (UDF).

The schema diagram for the sales and inventory databases was given earlier in Figure 6-3 but is also shown below. Recall that the boxes represent tables. Each arrow represents the foreign and primary key relationship between two tables, where the arrow starts at the primary key table (the "one" side of the one-to-many relationship) and ends at the foreign key table (the "many" side of the one-to-many relationship). The arrow is labeled with the name of the foreign key column.

Figure 6-6. Sales and Inventory Databases Schema Diagram

6.8.2 Antiquarian Bookshop Database

Our fictional bookshop is located in Hertford, England (a very real and charming town north of London). It is located in a building constructed around 1735 and has two rather smallish rooms on two floors with floor-to-ceiling bookshelves throughout. Upon entering, one is immediately transported to a much earlier era, being quite overwhelmed by the wonderful sight and
odor of the ancient mahogany wood in which the entire interior is lined, along with the rare and ancient books that reside on its shelves. There is a little bell that announces one's entrance into the shop, but it is not really needed, as the delightfully squeaky floor boards quite clearly make your presence known. In spite of the ancient setting and very old and rare books, this bookshop has a very modern Internet storefront through which it sells and auctions off its expensive inventory. A computer system contains a database describing the inventory and manages the sales and auction processes.

The database schema for our bookshop is given below. It is contained in text file "bookshop.sql".

create database bookshop on booksdev;

create table author(
    last_name char(13) primary key,
    full_name char(35),
    gender char(1),
    yr_born smallint,
    yr_died smallint,
    short_bio varchar(250)
);

create table genres(
    text char(31) primary key
);

create table subjects(
    text char(51) primary key
);

create table book(
    bookid char(14) primary key,
    last_name char(13) references author,
    title varchar(255),
    descr char(61),
    publisher char(136),
    publ_year smallint,
    lc_class char(33),
    date_acqd date,
    date_sold date,
    price double,
    cost double
);
create join authors_books order last on book(last_name);
create index year_ndx on book(publ_year);

create table related_name(
    bookid char(14) references book,
    name char(61)
);
create join book_names order last on related_name(bookid);

create table genres_books(
    bookid char(14) references book,
    genre char(31) references genres
);
create join genre_book_mm order last on genres_books(genre);
create join book_genre_mm order last on genres_books(bookid);

create table subjects_books(
    bookid char(14) references book,
    subject char(51) references subjects
);
create join subj_book_mm order last on subjects_books(subject);
create join book_subj_mm order last on subjects_books(bookid);

create table acctmgr(
    mgrid char(7) primary key,
    name char(24),
    hire_date date,
    commission double
);

create table patron(
    patid char(3) primary key,
    name char(30),
    street char(30),
    city char(17),
    state char(2),
    country char(2),
    pc char(10),
    email char(63),
    phone char(15),
    mgrid char(7)
);
create index patmgr on patron(mgrid);
create index phone_ndx on patron(phone);

create table note(
    noteid integer primary key,
    bookid char(14) references book,
    patid char(3) references patron
);
create join book_notes order last on note(bookid);
create join patron_notes order last on note(patid);

create table note_line(
    noteid integer references note,
    text char(81)
);
create join note_text order last on note_line(noteid);

create table sale(
    bookid char(14) references book,
    patid char(3) references patron
);
create join book_sale order last on sale(bookid);
create join book_buyer order last on sale(patid);

create table auction(
    aucid integer primary key,
    bookid char(14) references book,
    mgrid char(7) references acctmgr,
    start_date date,
    end_date date,
    reserve double,
    curr_bid double
);
create join book_auction order last on auction(bookid);
create join mgd_auctions order last on auction(mgrid);

create table bid(
    aucid integer references auction,
    patid char(3) references patron,
    offer double,
    bid_ts timestamp
);
create join auction_bids order last on bid(aucid);
create join patron_bids order last on bid(patid);

Descriptions for each of the above tables are given below.

Table 6-5. Bookshop Database Table Descriptions

author           Each row contains biographical information about a renowned author.
book             Contains information about each book in the bookshop inventory.
The last_name column genres subjects related_name genres_books subjects_books note note_line acctmgr patron sale auction bid associates the book with its author. Books with a non null date_sold are no longer available. Table of genre names (e.g., "Historical fiction") with which particular books are associated via the genres_books table. Table of subject names (e.g., "Cape Cod") with which particular books are associated via the subjects_books table. Related names are names of individuals associated with a particular book. The names are usually hand-written in the book’s front matter or on separate pages that were included with the book (e.g., letters) and identify the book’s provenance (owners). Only a few books have related names. However, their presence can significantly increase the value of the book. Used to create a many-to-many relationship between genres and books. Used to create a many-to-many relationship between subjects and books. Connects each note_line to its associated book. Notes include edition info and other comments (often coded) relating to its condition. One row for each line of text in a particular note. Account manager are the bookshop employees responsible for servicing the patrons and managing auctions. Bookshop customers and their contact info. Connected to their purchases/bids through their relationship with the sale and auction tables. Contains one row for each book that has been sold. Connects the book with the patron who acquired through the bookid and patid columns. Some books are auctioned. Those that have been (or currently being) auctioned have a row in this table that identifies the account manager who oversees the auction. The reserve column specifies the minimum acceptable bid, curr_bid contains the current amount bid. Each row provides the bid history for a particular auction. A schema diagram depicting the intertable relationships is shown below. SQL User Guide 51 6. Defining a Database Figure 6-7. 
Bookshop Database Schema Diagram

6.8.3 National Science Foundation Awards Database

The data used in this example has been extracted from the University of California Irvine Knowledge Discovery in Databases Archive (http://kdd.ics.uci.edu/). The original source data can be found at http://kdd.ics.uci.edu/databases/nsfabs/nsfawards.html. The data was processed by a Raima-developed RDM SQL program that, in addition to pulling out the data from each award document, converted all personal names to a "last name, first name, ..." format and, where possible, identified each person's gender from the first name.

The complete DDL specification for the NSF awards database is shown below. It is contained in the text file "nsfawards.sql".

    create database nsfawards on sqldev;

    create table person(
        name char(35) primary key,
        gender char(1),
        jobclass char(1)
    );

    create table sponsor(
        name char(27) primary key,
        addr char(35),
        city char(24),
        state char(2),
        zip char(5)
    );
    create index geo_loc on sponsor(state, city);

    create table nsforg(
        orgid char(3) primary key,
        name char(40)
    );

    create table nsfprog(
        progid char(4) primary key,
        descr char(40)
    );

    create table nsfapp(
        appid char(10) primary key,
        descr char(30)
    );

    create table award(
        awardno integer primary key,
        title char(182),
        award_date date,
        instr char(3),
        start_date date,
        exp_date date,
        amount double,
        prgm_mgr char(35) references person,
        sponsor_nm char(27) references sponsor,
        orgid char(3) references nsforg
    );
    create join manages order last on award(prgm_mgr);
    create join sponsor_awards order last on award(sponsor_nm);
    create join org_awards order last on award(orgid);
    create index award_date_ndx on award(award_date);
    create index exp_date_ndx on award(exp_date);
    create index amount_ndx on award(amount);

    create table investigator(
        awardno integer references award,
        name char(35) references person
    );
    create join award_invtgrs order last on investigator(awardno);
    create join invtgr_awards order last on investigator(name);

    create table field_apps(
        awardno integer references award,
        appid char(10) references nsfapp
    );
    create join award_apps order last on field_apps(awardno);
    create join app_awards order last on field_apps(appid);

    create table progrefs(
        awardno integer references award,
        progid char(4) references nsfprog
    );
    create join award_progs order last on progrefs(awardno);
    create join prog_awards order last on progrefs(progid);

Descriptions for each of the tables declared in the nsfawards database are given in the following table.

Table 6-6. NSF Awards Database Table Descriptions

Table Name    Description
person        Contains one row for each investigator or NSF program manager. An investigator (jobclass = "I") is a person who is doing the research. The NSF program manager (jobclass = "P") oversees the research project on behalf of the NSF. An award can have more than one investigator but only one program manager. The gender column is derived from the first name but has three values: "M", "F", and "U" for "unknown", used when the gender could not be determined from the first name (about 13%).
sponsor       Institution that is sponsoring the research. Usually where the principal investigator is employed. Each award has a single sponsor.
nsforg        NSF organization. The highest-level NSF division or office under which the grant is awarded.
nsfprog       Specific NSF programs responsible for funding research grants.
nsfapp        NSF application areas that the research impacts.
award         Specific data about the research grant. The columns are fairly self-explanatory. For clarity, the exp_date column contains the award expiration date (i.e., when the money runs out). The amount column contains the total funding amount. The instr column is a code indicating the award instrument (e.g., "CTG" = "continuing", "STD" = "standard", etc.).
investigator  The specific investigators responsible for carrying out the research. This table is used to form a many-to-many relationship between the person and award tables.
field_apps    NSF application areas for which the research is intended. This table is used to form a many-to-many relationship between the nsfapp and award tables.
progrefs      Specific programs under which the research is funded. This table is used to form a many-to-many relationship between the nsfprog and award tables.

Note that the interpretations given in the above descriptions are Raima's and may not be completely accurate (e.g., it could be that NSF programs are not actually responsible for funding research grants). However, our intent is simply to use this data for the purpose of illustration. A schema diagram for the nsfawards database is shown below.

Figure 6-8. NSF Awards Database Schema Diagram

6.9 Database Instances

A database instance is a database that shares the schema (DDL specification) of another database. There can be any number of database instances that share the same schema definition.
One principal use for database instancing is a situation in which mutually exclusive data sets have differing archiving requirements. For example, retrieving and deleting all database information related to a particular account or client record can be tedious to program and expensive to process. However, if each client's or account's data is placed in a separate database instance, it is easy both to archive (simply copy the database files) and to delete (simply reinitialize the database or delete it altogether).

Time-oriented applications can also benefit from database instancing. Consider the example of a company that uses a separate instance for each day of the current year. In this setup, each day's transactions can simply be stored in the instance for that day.

Instancing is also useful in some replication applications. For example, assume a large corporation has a mainframe computer that stores all accounts from all its branch offices. Each branch office performs a daily download of the new and modified accounts into separate database instances for each account. For each modified account, the branch office simply reinitializes that account's instance before receiving the new account information; for each new account, it creates a new instance.

Database instancing requires that the database definition be considered as distinct from the database itself, since there can be more than one instance of a schema and each instance has a different name. The original instance has the same name as the schema; subsequent instances have different names. Once a database instance has been created, it can be used in exactly the same manner as any database.

6.9.1 Creating a Database Instance

When SQL processes an SQL DDL specification, the database name specified in the create database statement names both the schema and the first instance of the schema, and that first instance is created automatically.
Other instances can then be created and dropped. A new instance of a database is created by a successful execution of the create database instance statement shown below.

    create_database_instance:
        create database instance newdb from sourcedb
            [with data | [no]init[ialize]] [on devname]

New database instances are created from existing databases. The name of the new database is given by newdb, which must be unique among all databases on the server. The existing database instance from which the new instance is created is sourcedb. The database device name, devname, must be specified and must be a valid RDM Server database device. In addition, that device cannot be the device that contains database sourcedb or any other instance of the same schema. All database files will be stored on that device and, since the file names for all instances are identical, each instance must be stored in a separate database device.

If specified, the with data option opens the source database for exclusive access and causes all database files and optimizer statistics from the source database to be copied into the new database. The init option (the default) ensures that the database files for the instance are initialized. The noinit option can be specified to defer initialization to some later time, when an initialize database statement will be performed.

The create database instance statement can only be executed by administrators or the owner of the schema (that is, the user who issued the original create database statement).

The initial instance of a database is created when a database definition is processed. The name of the instance is specified in the create database statement. Other instances can then be created from the original database. All instances share the same database definition information from the system catalog.
However, database statistics used by the SQL query optimizer, collected during execution of the update stats statement, are maintained separately for each database instance.

6.9.2 Using Database Instances

Database instances are referenced just as you reference any database. You can explicitly open a database instance using the open statement or implicitly open one through a qualified table name. For example, assume that wa_sales, ca_sales, and mi_sales are each instances of the sales database, containing the sales for Washington, California, and Michigan, respectively. The following example shows how these instances can be created and populated.

    create database instance wa_sales from sales on wadev;
    insert into wa_sales.customer from file "customrs.wa" on salesdev;

    create database instance ca_sales from sales on cadev;
    insert into ca_sales.customer from file "customrs.ca" on salesdev;

    create database instance mi_sales from sales on midev;
    insert into mi_sales.customer from file "customrs.mi" on salesdev;

    update stats on wa_sales, ca_sales, mi_sales;

The next example returns the customers from the Michigan instance of sales.

    open mi_sales;
    select * from customer;

This same query could have been executed using a single statement as follows.

    select * from mi_sales.customer;

You can have any number of instances of the same schema open at a time. An unqualified reference to a table in the schema will use the most recently opened instance by default. If you are not sure which instance is open, it is best to explicitly qualify the table name with the database name. An unqualified reference to a table from a schema of which there is more than one instance will use the oldest instance (usually the original) when none has been opened.
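The with data and noinit options described in section 6.9.1 fit naturally into this kind of multi-instance setup. The following is an illustrative sketch only: the instance names wa_archive and nv_sales, and the devices archdev and nvdev, are assumptions, and the exact form of the initialize database statement is covered elsewhere in this guide.

```sql
-- snapshot an existing instance: with data copies the wa_sales files
-- and optimizer statistics into the new instance (archdev is a
-- hypothetical archive device)
create database instance wa_archive from wa_sales with data on archdev;

-- create an empty instance but defer file initialization until later
create database instance nv_sales from sales noinit on nvdev;
initialize database nv_sales;  -- assumed form of the initialize statement
```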
6.9.3 Stored Procedures and Views

Views and stored procedure definitions are maintained based on the schema definition and are not dependent on a particular database instance, except when a database instance is explicitly referenced in the view or stored procedure declaration. However, the execution plan generated for the view or stored procedure is based on the optimization statistics associated with whatever database instance was open at the time the view or stored procedure was compiled. Thus, if a view or stored procedure will be used with more than one database instance, it is important that the instance used during compilation contain a representative set of data on which an update stats has been run.

The example below creates a view called in_state_by_zip that will list the customers in a database instance in zip code order. The mi_sales database was opened for the create view because it contained a large number of customers. Thus, the optimizer would be sure to use the index on zip (assuming that, in this example, zip is indexed). The subsequent open on wa_sales followed by the select of in_state_by_zip will return the results from the wa_sales database.

    -- Lots of customers in Michigan, should provide good stats
    open mi_sales;
    create view in_state_by_zip as
        select * from customer order by zip;

    open wa_sales;
    select * from in_state_by_zip;

Note that for views referenced in a select statement qualified with an instance name, the instance name is used to identify the schema to which the view is attached. It does not specify which instance to use with any unqualified table names in the view definition itself. Thus, in the following example, the result set will contain Washington, not Michigan, customers.

    open wa_sales;
    select * from mi_sales.in_state_by_zip;

6.9.4 Drop Database Instance

The drop database statement can be used to delete database instances. The syntax is shown below.
    drop_database:
        drop database dbname

This statement can only be executed by administrators or the database owner. Also, database dbname must not be open by any other users. The system drops the database instance by removing its instance-specific information from the system catalog. The database definition information associated with the schema is not deleted. Dropping the original database after all other database instances based on it have been dropped will remove the database completely from the system, including the schema definition.

6.9.5 Restrictions

A database instance cannot be created for any database that contains explicitly declared foreign key references to a different database. For example, the example sales database schema provided with RDM Server contains foreign key references to the invntory database. Any attempt to create an instance of either sales or invntory will return an error. This restriction exists because it is impossible for RDM Server to reliably manage inter-database reference counts for multiple database instances. The reliability of such operations would be based on the correctness of the application's use of those databases, thus violating the very concept of DBMS-enforced referential integrity.

Inter-database relationships can still be maintained by the application program by using undeclared foreign keys. Shown below is an excerpt from sales.sql with the declared foreign keys to the invntory database highlighted. By simply removing the indicated references clauses, it is possible to create multiple instances of both sales and invntory. Referential integrity will not be enforced by SQL, but the inter-database relationships can still exist with no effect on how joins between the databases are processed.

    create table salesperson (
        sale_id char(3) primary key,
        ...
        office char(3) references invntory.outlet(loc_id),
        mgr_id char(3) references salesperson
    );
    ...
    create table item (
        ord_num smallint not null references sales_order,
        prod_id smallint not null references invntory.product,
        loc_id char(3) not null references invntory.outlet,
        ...
    );

7. Retrieving Data from a Database

Data is stored in a database so that it can later be retrieved and examined. However, in order to do something intelligent with that data, it must first be intelligently retrieved. This is often much easier said than done, and that is particularly true with a language like SQL. Data is retrieved from RDM Server databases using the SQL select statement. A completely specified select statement is commonly referred to as a query. The complete set of rows returned by a select statement is called the result set.

This chapter explains how to properly formulate select statements to view data contained in one or more RDM Server databases. We will begin with the simplest queries and progress to more complex ones. The select statement syntax specification will be developed incrementally throughout this chapter in order to show only the syntax that is relevant to the select statement feature being explained.

7.1 Simple Queries

The most basic of queries retrieves all of the rows and columns of a table. The easiest way to do this is to use the following statement:

    select:
        select * from tabname

The "*" indicates that all of the columns declared in tabname are to be returned. Thus, the following statement retrieves all of the rows and columns of the salesperson table in the example sales database.
    select * from salesperson;

    SALE_ID SALE_NAME            DOB        COMMISSION REGION SALES_TOT  OFFICE MGR_ID
    BCK     Kennedy, Bob         1957-10-29 0.075      0      736345.32  DEN    BNF
    BNF     Flores, Bob          1943-07-17 0.100      0      173102.02  SEA    *NULL*
    BPS     Stouffer, Bill       1952-11-21 0.080      2      29053.3    SEA    *NULL*
    CMB     Blades, Chris        1958-09-08 0.080      3      0          SEA    *NULL*
    DLL     Lister, Dave         1999-08-30 0.075      3      0          ATL    *NULL*
    ERW     Wyman, Eliska        1959-05-18 0.075      1      566817.01  NYC    GAP
    GAP     Porter, Greg         1949-03-03 0.080      1      439346.5   SEA    *NULL*
    GSN     Nash, Gail           1954-10-20 0.070      3      306807.26  DAL    CMB
    JTK     Kirk, James          2100-08-30 0.075      3      0          ATL    *NULL*
    SKM     McGuire, Sidney      1947-12-02 0.070      1      208432.11  WDC    GAP
    SSW     Williams, Steve      1944-08-30 0.075      3      247179.99  ATL    CMB
    SWR     Robinson, Stephanie  1968-10-11 0.070      0      374904.47  LAX    BNF
    WAJ     Jones, Walter        1960-07-15 0.070      2      422560.55  CHI    BPS
    WWW     Warren, Wayne        1953-04-29 0.075      2      212638.5   MIN    BPS

Of course, if you only need to see some but not all of the columns in a table, those columns can be individually listed as indicated in the following syntax.

    select:
        select colname[, colname]... from tabname

Each specified colname must identify a column that is declared in tabname. The next example retrieves the salesperson name, sales total, commission, and region code for each salesperson.
    select sale_name, sales_tot, commission, region from salesperson;

    SALE_NAME            SALES_TOT  COMMISSION REGION
    Kennedy, Bob         736345.32  0.075      0
    Robinson, Stephanie  374904.47  0.070      0
    Flores, Bob          173102.02  0.100      0
    Wyman, Eliska        566817.01  0.075      1
    Porter, Greg         439346.5   0.080      1
    McGuire, Sidney      208432.11  0.070      1
    Jones, Walter        422560.55  0.070      2
    Warren, Wayne        212638.5   0.075      2
    Stouffer, Bill       29053.3    0.080      2
    Williams, Steve      247179.99  0.075      3
    Kirk, James          0          0.075      3
    Lister, Dave         0          0.075      3
    Nash, Gail           306807.26  0.070      3
    Blades, Chris        0          0.080      3

7.2 Conditional Row Retrieval

If you need to retrieve only table rows that meet particular selection criteria, you can issue a select statement using the where clause to specify a condition indicating just the rows you want. The where clause contains a conditional expression consisting of one or more relational expressions separated by operators as specified in the syntax given below.

    select:
        select {* | colname[, colname]...} from tabname where cond_expr

    cond_expr:
        rel_expr [bool_oper rel_expr]...

    rel_expr:
          expression [not] rel_oper {expression | [{any | some} | all] (subquery)}
        | expression [not] between constant and constant
        | expression [not] in {(constant[, constant]...) | (subquery)}
        | [tabname.]colname is [not] null
        | string_expr [not] like "pattern"
        | not rel_expr
        | ( cond_expr )
        | [not] exists (subquery)
        | [tabname.]colname *= [tabname.]colname
        | [tabname.]colname =* [tabname.]colname

    expression:
        arith_expr | string_expr

    arith_expr:
        arith_operand [arith_operator arith_operand]...

    arith_operand:
        constant | [tabname.]colname | arith_function | ( arith_expr )

    arith_operator:
        + | - | * | /
    string_expr:
        string_operand [^ string_operand]

    string_operand:
          "string" | [tabname.]colname
        | if ( cond_expr, string_expr, string_expr )
        | string_function | user_defined_function

    rel_oper:
          = | ==
        | <
        | >
        | <=
        | >=
        | <> | != | /=

    bool_oper:
          & | && | and
        | "|" | "||" | or

For example, the following query chooses only customer accounts in the customer table (sales database) that are serviced by Sidney McGuire (that is, accounts with sale_id equal to "SKM").

    select sale_id, cust_id, company, city, state from customer
    where sale_id = "SKM";

    SALE_ID CUST_ID COMPANY                      CITY          STATE
    SKM     PHI     Eagles Electronics Corp.     Philadelphia  PA
    SKM     PIT     Steelers National Bank       Pittsburgh    PA
    SKM     WAS     Redskins Outdoor Supply Co.  Arlington     VA

The next query example lists the sales_order rows for those orders that have not yet shipped (indicated by a null in the ship_date column) and where the amount exceeds $50,000.

    select cust_id, ord_num, ord_date, amount from sales_order
    where ship_date is null and amount > 50000.00;

    CUST_ID ORD_NUM ORD_DATE    AMOUNT
    BUF     2205    1997-01-03  150871.2
    DEN     2207    1997-01-06  274375
    GBP     2211    1997-01-10  53634.12
    NOS     2218    1997-01-24  81375
    DET     2219    1997-01-27  74034.9
    HOU     2226    1997-01-30  54875
    ATL     2230    1997-02-04  62340
    LAA     2234    1997-02-10  124660
    DEN     2237    1997-02-12  103874.8
    KCC     2241    1997-02-21  82315
    DET     2250    1997-03-06  82430.85
    PHO     2253    1997-03-16  143375
    CIN     2257    1997-03-23  62340
    NYJ     2270    1997-04-02  54875
    NEP     2281    1997-04-13  66341.5
    SFF     2284    1997-04-20  74315.16
    DET     2288    1997-04-24  252425
    GBP     2292    1997-04-30  77247.5
    NOS     2324    1997-07-30  104019.5

Note that "ship_date is null", and not "ship_date != null", is required in order for the query to return the correct results.
The SQL standard specifies that the result of a normal relational comparison with a null value is indeterminate and that only those rows for which the where clause evaluates to true are returned by a select statement. Since "ship_date != null" is, according to standard SQL, indeterminate, no rows would be returned by that select statement.

7.2.1 Retrieving Data from a Range

The between operator returns those rows where the left-hand expression evaluates to a value between the two values on the right, inclusive. In the following example, the between operator restricts the select result set to only those sales orders made from January 1 to January 31, 1997, inclusive.

    select cust_id, ord_num, ord_date from sales_order
    where ord_date between date "1997-1-1" and date "1997-1-31";

    CUST_ID ORD_NUM ORD_DATE
    CHI     2201    1997-01-02
    MIN     2202    1997-01-02
    KCC     2203    1997-01-02
    CIN     2204    1997-01-02
    BUF     2205    1997-01-03
    LAN     2206    1997-01-02
    DEN     2207    1997-01-06
    PHI     2208    1997-01-07
    PHO     2209    1997-01-07
    IND     2210    1997-01-09
    GBP     2211    1997-01-10
    ATL     2212    1997-01-15
    NYG     2213    1997-01-16
    LAA     2214    1997-01-16
    SEA     2215    1997-01-17
    KCC     2216    1997-01-21
    SDC     2217    1997-01-24
    NOS     2218    1997-01-24
    DET     2219    1997-01-27
    DEN     2220    1997-01-27
    NEP     2221    1997-01-27
    CLE     2222    1997-01-28
    MIN     2223    1997-01-28
    TBB     2224    1997-01-28
    SEA     2225    1997-01-29
    HOU     2226    1997-01-30
    IND     2227    1997-01-31

7.2.2 Retrieving Data from a List

You can use the in operator to choose only those rows that match one of the column values specified in the list. The example shows a select statement that retrieves all customers located in Pacific Coast states from the customer table.

    select cust_id, company, city, state from customer
    where state in ("CA", "OR", "WA");

    CUST_ID COMPANY                     CITY           STATE
    SEA     Seahawks Data Services      Seattle        WA
    SFF     Forty-Niners Venture Group  San Francisco  CA
    LAA     Raiders Development Co.     Los Angeles    CA
    LAN     Rams Data Processing, Inc.  Los Angeles    CA
    SDC     Chargers Credit Corp.       San Diego      CA

7.2.3 Retrieving Data by Wildcard Checking

The where clause can include a like operator to retrieve the rows where a character column's value matches the wildcard pattern specified in the like string constant. Two wildcard characters are defined in standard SQL.

Table 7-1. LIKE Operator Wildcard Character Descriptions

Character       Description
% (percent)     Matches zero or more characters.
_ (underscore)  Matches any single character.

The next example includes a select statement that retrieves from the customer table all customers who have "Data" as part of their company name.

    select cust_id, company, city, state from customer
    where company like "%Data%";

    CUST_ID COMPANY                     CITY         STATE
    SEA     Seahawks Data Services      Seattle      WA
    DAL     Cowboys Data Services       Dallas       TX
    TBB     Bucs Data Services          Tampa        FL
    LAN     Rams Data Processing, Inc.  Los Angeles  CA

The application can change these match characters using the set wild statement.

7.2.4 Retrieving Rows by Rowid

RDM Server SQL provides a feature whereby rowid primary key columns can be declared in a table. The primary key value is automatically assigned by the system based on the row's location in the database file. This allows rows from that table to be accessed directly through the primary key column. Even when no rowid primary key column has been declared in the table, RDM Server SQL exposes the rowid of each row of the table through use of the rowid keyword. All a user needs to do is reference a column called "rowid" in the select statement, as shown in the example queries below.

    select rowid, sale_id, sale_name, region, office, mgr_id from salesperson;

    ROWID SALE_ID SALE_NAME            REGION OFFICE MGR_ID
    6     BCK     Kennedy, Bob         0      DEN    BNF
    1     BNF     Flores, Bob          0      SEA    *NULL*
    3     BPS     Stouffer, Bill       2      SEA    *NULL*
    4     CMB     Blades, Chris        3      SEA    *NULL*
    14    DLL     Lister, Dave         3      ATL    *NULL*
    7     ERW     Wyman, Eliska        1      NYC    GAP
    2     GAP     Porter, Greg         1      SEA    *NULL*
    11    GSN     Nash, Gail           3      DAL    CMB
    13    JTK     Kirk, James          3      ATL    *NULL*
    8     SKM     McGuire, Sidney      1      WDC    GAP
    12    SSW     Williams, Steve      3      ATL    CMB
    5     SWR     Robinson, Stephanie  0      LAX    BNF
    9     WAJ     Jones, Walter        2      CHI    BPS
    10    WWW     Warren, Wayne        2      MIN    BPS

The rowid column should be qualified by a table name if there is more than one table listed in the from clause, as shown below.

    select salesperson.rowid, sale_name, customer.rowid, company
    from salesperson, customer
    where salesperson.sale_id = customer.sale_id;

    salesperson.rowid  sale_name            customer.rowid  company
    6                  Kennedy, Bob         17              Broncos Air Express
    6                  Kennedy, Bob         34              Cardinals Bookmakers
    1                  Flores, Bob          15              Seahawks Data Services
    1                  Flores, Bob          31              Forty-niners Venture Group
    3                  Stouffer, Bill       29              Colts Nuts & Bolts, Inc.
    7                  Wyman, Eliska        23              Browns Kennels
    7                  Wyman, Eliska        26              Jets Overnight Express
    7                  Wyman, Eliska        27              Patriots Computer Corp.
    7                  Wyman, Eliska        30              'Bills We Pay' Financial Corp.
    7                  Wyman, Eliska        32              Giants Garments, Inc.
    2                  Porter, Greg         39              Lions Motor Company
    11                 Nash, Gail           19              Saints Software Support
    11                 Nash, Gail           25              Oilers Gas and Light Co.
    11                 Nash, Gail           33              Cowboys Data Services
    8                  McGuire, Sidney      24              Steelers National Bank
    8                  McGuire, Sidney      35              Redskins Outdoor Supply Co.
    8                  McGuire, Sidney      42              Eagles Electronics Corp.
    12                 Williams, Steve      28              Dolphins Diving School
    12                 Williams, Steve      36              Falcons Microsystems, Inc.
    12                 Williams, Steve      41              Bucs Data Services
    5                  Robinson, Stephanie  16              Raiders Development Co.
    5                  Robinson, Stephanie  18              Chargers Credit Corp.
    5                  Robinson, Stephanie  20              Rams Data Processing, Inc.
    9                  Jones, Walter        21              Chiefs Management Corporation
    9                  Jones, Walter        22              Bengels Imports
    9                  Jones, Walter        38              Bears Market Trends, Inc.
    10                 Warren, Wayne        37              Vikings Athletic Equipment
    10                 Warren, Wayne        40              Packers Van Lines

If more than one table is listed in the from clause and the rowid column is not qualified with a table name, the system will return the rowid from the first listed table. As with standard column references, the qualifier should be the correlation name when a correlation name has been specified, as shown in the example below.

    select s.rowid, s.sale_name, c.rowid, c.city, c.state
    from salesperson s, customer c
    where s.sale_id = c.sale_id and s.region = 0;

    S.ROWID S.SALE_NAME          C.ROWID C.CITY         C.STATE
    6       Kennedy, Bob         17      Denver         CO
    6       Kennedy, Bob         34      Phoenix        AZ
    5       Robinson, Stephanie  16      Los Angeles    CA
    5       Robinson, Stephanie  18      San Diego      CA
    5       Robinson, Stephanie  20      Los Angeles    CA
    1       Flores, Bob          15      Seattle        WA
    1       Flores, Bob          31      San Francisco  CA

Direct access retrieval will occur for queries of the following form:

    select ... from ... where [tabname.]rowid = constant

    select s.rowid, sale_name, company, city, state
    from salesperson s, customer c
    where s.sale_id = c.sale_id and s.rowid = 7;

    S.ROWID SALE_NAME      COMPANY                         CITY         STATE
    7       Wyman, Eliska  Browns Kennels                  Cleveland    OH
    7       Wyman, Eliska  Jets Overnight Express          New York     NY
    7       Wyman, Eliska  Patriots Computer Corp.         Foxboro      MA
    7       Wyman, Eliska  'Bills We Pay' Financial Corp.  Buffalo      NY
    7       Wyman, Eliska  Giants Garments, Inc.           Jersey City  NJ

7.3 Retrieving Data from Multiple Tables

A join associates two tables through common columns. Typically, but not always, the common columns have the same names. Join relationships can be explicitly defined between tables in the database definition through the specification of primary and foreign key clauses. But even where explicit joins have not been defined in the schema, joins between tables with common columns can still be specified in a select statement. RDM Server supports two different methods for specifying joins.
Old style join specifications are based on the 1989 ANSI SQL standard, in which all of the inter-table join relationships are specified in the select statement's where clause. Extended join specifications are based on the join enhancements originally introduced in the 1992 ANSI SQL standard, in which the join relationships are specified in the from clause.

7.3.1 Old Style Join Specifications

Inner Joins

It is often necessary for an application to retrieve data from several related tables using a join. To form a join, issue a select statement that specifies each table name in the from clause. In the where clause, include an equality comparison of the associated columns (that is, the foreign and primary key columns) from the two tables. This comparison is called a join predicate. To differentiate between join columns of the same name in the two tables, the select statement must prefix the table names to the column names in the comparison. An inner join is one in which only those rows from the two tables with matching values are returned. Join predicates are specified in the where clause as a relational expression according to the following syntax.

    rel_expr:
          [tabname.]colname = [tabname.]colname
        | ...

The example below retrieves and lists the customer accounts (customer table) for each salesperson (salesperson table).
    select sale_name, company, city, state from salesperson, customer
    where salesperson.sale_id = customer.sale_id;

    SALE_NAME            COMPANY                         CITY           STATE
    Kennedy, Bob         Broncos Air Express             Denver         CO
    Kennedy, Bob         Cardinals Bookmakers            Phoenix        AZ
    Flores, Bob          Seahawks Data Services          Seattle        WA
    Flores, Bob          Forty-niners Venture Group      San Francisco  CA
    Stouffer, Bill       Colts Nuts & Bolts, Inc.        Baltimore      IN
    Wyman, Eliska        Browns Kennels                  Cleveland      OH
    Wyman, Eliska        Jets Overnight Express          New York       NY
    Wyman, Eliska        Patriots Computer Corp.         Foxboro        MA
    Wyman, Eliska        'Bills We Pay' Financial Corp.  Buffalo        NY
    Wyman, Eliska        Giants Garments, Inc.           Jersey City    NJ
    Porter, Greg         Lions Motor Company             Detroit        MI
    Nash, Gail           Saints Software Support         New Orleans    LA
    Nash, Gail           Oilers Gas and Light Co.        Houston        TX
    Nash, Gail           Cowboys Data Services           Dallas         TX
    McGuire, Sidney      Steelers National Bank          Pittsburgh     PA
    McGuire, Sidney      Redskins Outdoor Supply Co.     Arlington      VA
    McGuire, Sidney      Eagles Electronics Corp.        Philadelphia   PA
    Williams, Steve      Dolphins Diving School          Miami          FL
    Williams, Steve      Falcons Microsystems, Inc.      Atlanta        GA
    Williams, Steve      Bucs Data Services              Tampa          FL
    Robinson, Stephanie  Raiders Development Co.         Los Angeles    CA
    Robinson, Stephanie  Chargers Credit Corp.           San Diego      CA
    Robinson, Stephanie  Rams Data Processing, Inc.      Los Angeles    CA
    Jones, Walter        Chiefs Management Corporation   Kansas City    MO
    Jones, Walter        Bengels Imports                 Cincinnati     OH
    Jones, Walter        Bears Market Trends, Inc.       Chicago        IL
    Warren, Wayne        Vikings Athletic Equipment      Minneapolis    MN
    Warren, Wayne        Packers Van Lines               Green Bay      WI

Your application can join any number of tables using the select statement. The next example illustrates a three-table join from the sales database that shows the January sales orders booked by Stephanie Robinson ("SWR").
select sale_name, cust_id, ord_date, ord_num, amount
  from salesperson, customer, sales_order
 where salesperson.sale_id = "SWR"
   and salesperson.sale_id = customer.sale_id
   and customer.cust_id = sales_order.cust_id
   and ord_date between date "1997-1-1" and date "1997-1-31";

SALE_NAME           CUST_ID  ORD_DATE    ORD_NUM  AMOUNT
Robinson,Stephanie  LAN      1997-01-02  2206     15753.190000
Robinson,Stephanie  LAA      1997-01-16  2214     12614.340000
Robinson,Stephanie  SDC      1997-01-24  2217     705.980000

Outer Joins

An outer join between two tables includes those rows in one table that do not have any matching rows in the other table. A left outer join includes the rows for which the column on the left side of the join predicate does not have matching right-side column values. A right outer join does just the opposite. RDM Server SQL supports both left outer joins and right outer joins as specified below.

rel_expr:
    ... | [tabname.]colname *= [tabname.]colname
        | [tabname.]colname =* [tabname.]colname

Table 7-2. Outer Join Relational Operators

Type of Join      Operator
left outer join   *=
right outer join  =*

The "outer" side column of an outer join predicate must be indexed or be a foreign key column on which a create join has been declared in order for RDM Server SQL to be able to perform the outer join. Otherwise, a "No access path between outer joined tables" error will be returned by SQL. The select statement in the following example uses the left outer join operator to retrieve the customers for each salesperson, whether or not that salesperson has any customers. The result set in this case will contain rows for all salespersons, with null in the customer table columns for those salespersons who do not manage any customer accounts (e.g., salesperson managers).
select sale_name, company, city, state
  from salesperson, customer
 where salesperson.sale_id *= customer.sale_id;

SALE_NAME            COMPANY                         CITY           STATE
Kennedy, Bob         Broncos Air Express             Denver         CO
Kennedy, Bob         Cardinals Bookmakers            Phoenix        AZ
Flores, Bob          Seahawks Data Services          Seattle        WA
Flores, Bob          Forty-niners Venture Group      San Francisco  CA
Stouffer, Bill       Colts Nuts & Bolts, Inc.        Baltimore      IN
Blades, Chris        *NULL*                          *NULL*         *NULL*
Lister, Dave         *NULL*                          *NULL*         *NULL*
Wyman, Eliska        Browns Kennels                  Cleveland      OH
Wyman, Eliska        Jets Overnight Express          New York       NY
Wyman, Eliska        Patriots Computer Corp.         Foxboro        MA
Wyman, Eliska        'Bills We Pay' Financial Corp.  Buffalo        NY
Wyman, Eliska        Giants Garments, Inc.           Jersey City    NJ
Porter, Greg         Lions Motor Company             Detroit        MI
Nash, Gail           Saints Software Support         New Orleans    LA
Nash, Gail           Oilers Gas and Light Co.        Houston        TX
Nash, Gail           Cowboys Data Services           Dallas         TX
Kirk, James          *NULL*                          *NULL*         *NULL*
McGuire, Sidney      Steelers National Bank          Pittsburgh     PA
McGuire, Sidney      Redskins Outdoor Supply Co.     Arlington      VA
McGuire, Sidney      Eagles Electronics Corp.        Philadelphia   PA
Williams, Steve      Dolphins Diving School          Miami          FL
Williams, Steve      Falcons Microsystems, Inc.      Atlanta        GA
Williams, Steve      Bucs Data Services              Tampa          FL
Robinson, Stephanie  Raiders Development Co.         Los Angeles    CA
Robinson, Stephanie  Chargers Credit Corp.           San Diego      CA
Robinson, Stephanie  Rams Data Processing, Inc.      Los Angeles    CA
Jones, Walter        Chiefs Management Corporation   Kansas City    MO
Jones, Walter        Bengels Imports                 Cincinnati     OH
Jones, Walter        Bears Market Trends, Inc.       Chicago        IL
Warren, Wayne        Vikings Athletic Equipment      Minneapolis    MN
Warren, Wayne        Packers Van Lines               Green Bay      WI

As you can see, the outer join result includes rows from the salesperson table that do not have any customers. This is exactly what the outer join does. Compare with the results from the earlier query. As long as the table names are unique, you need do nothing different to perform a join between tables in different databases.
The following example retrieves the descriptions of the specific products (product table in the invntory database) ordered in the Stephanie Robinson ("SWR") sales orders.

select cust_id, ord_num, prod_id, prod_desc
  from customer, sales_order, item, product
 where sale_id = "SWR"
   and ord_date between @"97-1-1" and @"97-1-31"
   and customer.cust_id = sales_order.cust_id
   and sales_order.ord_num = item.ord_num
   and item.prod_id = product.prod_id;

CUST_ID  ORD_NUM  PROD_ID  PROD_DESC
LAA      2214     13016    RISC 16MB computer
LAA      2214     17419    19in SVGA monitor
LAA      2214     18060    60MB cartridge tape drive
LAA      2214     19100    flat-bed plotter
SDC      2217     22024    1200/2400 baud modem
SDC      2217     23401    track ball
LAN      2206     10450    486/50 computer
LAN      2206     15750    750 MB hard disk drive
LAN      2206     17214    14in VGA monitor
LAN      2206     18120    120MB cartridge tape drive
LAN      2206     18121    120MB tape cartridge
LAN      2206     23200    enhanced keyboard
LAN      2206     23400    mouse

If both databases have a table with the same name, the table names listed in the from clause will need to be qualified with the database name as indicated by the syntax shown below.

from_clause:
    from [dbname.]tabname [, [dbname.]tabname]...

For example, assume that both the sales database and the invntory database contain a table named "product". In the from clause of the select statement, the name of the product table is prefixed with the database name "invntory". However, note that the prod_id column in the select column list is not qualified. RDM Server assumes that an unqualified duplicate column name is from the first table in the from list that contains a column of that name. Since the prod_id column values from both tables are the same, it does not matter which column is returned by the select statement.
select cust_id, ord_num, prod_id, prod_desc
  from customer, sales_order, item, invntory.product
 where sale_id = "SWR"
   and ord_date between @"97-1-1" and @"97-1-31"
   and customer.cust_id = sales_order.cust_id
   and sales_order.ord_num = item.ord_num
   and item.prod_id = product.prod_id;

CUST_ID  ORD_NUM  PROD_ID  PROD_DESC
LAA      2214     13016    RISC 16MB computer
LAA      2214     17419    19in SVGA monitor
LAA      2214     18060    60MB cartridge tape drive
LAA      2214     19100    flat-bed plotter
SDC      2217     22024    1200/2400 baud modem
SDC      2217     23401    track ball
LAN      2206     10450    486/50 computer
LAN      2206     15750    750 MB hard disk drive
LAN      2206     17214    14in VGA monitor
LAN      2206     18120    120MB cartridge tape drive
LAN      2206     18121    120MB tape cartridge
LAN      2206     23200    enhanced keyboard
LAN      2206     23400    mouse

Correlation Names

Sometimes an application must use the same select statement to reference two tables with the same name from separate databases. In that case, the from clause must include correlation names to distinguish between the two table references. Correlation names are aliased identifiers specified following the table name as shown in the following from clause syntax.

from_clause:
    from [dbname.]tabname [[as] corrname][, [dbname.]tabname [[as] corrname]]...

The correlation name, corrname, is an identifier defined as an alias for the table name that can be used to qualify column names in that table that are referenced in the select statement. Suppose that the product table in the invntory database is named item instead of product. Then the information in the example above would be specified as follows.
select cust_id, ord_num, prod_id, prod_desc
  from customer, sales_order, sales.item s_item, invntory.item i_item
 where sale_id = "SWR"
   and ord_date between @"97-1-1" and @"97-1-31"
   and customer.cust_id = sales_order.cust_id
   and sales_order.ord_num = s_item.ord_num
   and s_item.prod_id = i_item.prod_id;

In this example, the correlation name for the sales database item table is s_item, and the correlation name for the invntory database item table is i_item. Correlation names are required when processing a self-join. A self-join is a join of a table with itself. The mgr_id column in the salesperson table is a foreign key to the salesperson table. A self-join can be used to list all salespersons along with their managers as shown in the following example. Notice how correlation names are used to distinguish between the manager's row and the salesperson's row.

select emp.sale_name, mgr.sale_name
  from salesperson emp, salesperson mgr
 where emp.mgr_id = mgr.sale_id;

EMP.SALE_NAME        MGR.SALE_NAME
Kennedy, Bob         Flores, Bob
Warren, Wayne        Stouffer, Bill
Williams, Steve      Blades, Chris
Wyman, Eliska        Porter, Greg
Jones, Walter        Stouffer, Bill
McGuire, Sidney      Porter, Greg
Nash, Gail           Blades, Chris
Robinson, Stephanie  Flores, Bob

Column Aliases

The columns specified in a select result column list can be assigned aliases as specified below.

select:
    select select_item [, select_item]... from_clause [where cond_expr]

select_item:
    [{tabname | corrname}.]colname [identifier | "headingstring"]

The identifier or "headingstring" will be displayed in the result set heading instead of the column name. The last example is shown below using the column aliases "employee" and "manager".
select emp.sale_name employee, mgr.sale_name manager
  from salesperson emp, salesperson mgr
 where emp.mgr_id = mgr.sale_id;

EMPLOYEE             MANAGER
Kennedy, Bob         Flores, Bob
Warren, Wayne        Stouffer, Bill
Williams, Steve      Blades, Chris
Wyman, Eliska        Porter, Greg
Jones, Walter        Stouffer, Bill
McGuire, Sidney      Porter, Greg
Nash, Gail           Blades, Chris
Robinson, Stephanie  Flores, Bob

7.3.2 Extended Join Specifications

The 1992 ANSI SQL standard introduced a new method by which joins between tables can be specified. This method separates the information needed to form the joins from the where clause and places it in the from clause of the select statement. In addition, the 1992 standard enhanced join handling to allow the specification of left and right outer joins (the 1989 standard only allowed for inner joins; the "*=" and "=*" outer join operators described in the last section are non-standard). The enhanced syntax for the select statement from clause that incorporates join specifications is given below.

select:
    select {* | select_item [, select_item]...}
      from table_ref [, table_ref]...

table_ref:
    table_primary | table_join

table_primary:
    table_name_spec | ( table_join )

table_name_spec:
    [dbname.]tabname [[as] corrname]

table_join:
    natural_join | qualified_join | cross_join

natural_join:
    table_ref natural [inner | {left | right} [outer]] join table_primary

qualified_join:
    table_ref [inner | {left | right} [outer]] join table_primary
        {using (colname[, colname]...) | on cond_expr}

cross_join:
    table_ref cross join table_primary

The natural join specification indicates that the join is to be performed based on the common columns (names and types) from the two tables. RDM Server will perform the join based on the columns from the table (or tables) specified on the left side of "natural ... join" with those columns from the table (or tables) on the right side that have the same name.
The example below gives a natural inner join between the salesperson and customer tables.

select sale_name, company from salesperson natural inner join customer;

SALE_NAME            COMPANY
Kennedy, Bob         Broncos Air Express
Kennedy, Bob         Cardinals Bookmakers
Flores, Bob          Seahawks Data Services
Flores, Bob          Forty-niners Venture Group
Stouffer, Bill       Colts Nuts & Bolts, Inc.
Wyman, Eliska        Browns Kennels
Wyman, Eliska        Jets Overnight Express
Wyman, Eliska        Patriots Computer Corp.
Wyman, Eliska        'Bills We Pay' Financial Corp.
Wyman, Eliska        Giants Garments, Inc.
Porter, Greg         Lions Motor Company
Nash, Gail           Saints Software Support
Nash, Gail           Oilers Gas and Light Co.
Nash, Gail           Cowboys Data Services
McGuire, Sidney      Steelers National Bank
McGuire, Sidney      Redskins Outdoor Supply Co.
McGuire, Sidney      Eagles Electronics Corp.
Williams, Steve      Dolphins Diving School
Williams, Steve      Falcons Microsystems, Inc.
Williams, Steve      Bucs Data Services
Robinson, Stephanie  Raiders Development Co.
Robinson, Stephanie  Chargers Credit Corp.
Robinson, Stephanie  Rams Data Processing, Inc.
Jones, Walter        Chiefs Management Corporation
Jones, Walter        Bengels Imports
Jones, Walter        Bears Market Trends, Inc.
Warren, Wayne        Vikings Athletic Equipment
Warren, Wayne        Packers Van Lines

The common column between the two tables is sale_id, so the natural inner join example above is equivalent to the following old style join:

select sale_name, company from salesperson, customer
 where salesperson.sale_id = customer.sale_id;

A natural left (right) outer join includes the results of the inner join plus those rows of the left (right) table that do not have a corresponding matching row in the joined table. This is illustrated below where the last example is changed from a natural inner join to a natural left outer join.
select sale_name, company from salesperson natural left outer join customer;

SALE_NAME            COMPANY
Kennedy, Bob         Broncos Air Express
Kennedy, Bob         Cardinals Bookmakers
Flores, Bob          Seahawks Data Services
Flores, Bob          Forty-niners Venture Group
Stouffer, Bill       Colts Nuts & Bolts, Inc.
Blades, Chris        *NULL*
Lister, Dave         *NULL*
Wyman, Eliska        Browns Kennels
Wyman, Eliska        Jets Overnight Express
Wyman, Eliska        Patriots Computer Corp.
Wyman, Eliska        'Bills We Pay' Financial Corp.
Wyman, Eliska        Giants Garments, Inc.
Porter, Greg         Lions Motor Company
Nash, Gail           Saints Software Support
Nash, Gail           Oilers Gas and Light Co.
Nash, Gail           Cowboys Data Services
Kirk, James          *NULL*
McGuire, Sidney      Steelers National Bank
McGuire, Sidney      Redskins Outdoor Supply Co.
McGuire, Sidney      Eagles Electronics Corp.
Williams, Steve      Dolphins Diving School
Williams, Steve      Falcons Microsystems, Inc.
Williams, Steve      Bucs Data Services
Robinson, Stephanie  Raiders Development Co.
Robinson, Stephanie  Chargers Credit Corp.
Robinson, Stephanie  Rams Data Processing, Inc.
Jones, Walter        Chiefs Management Corporation
Jones, Walter        Bengels Imports
Jones, Walter        Bears Market Trends, Inc.
Warren, Wayne        Vikings Athletic Equipment
Warren, Wayne        Packers Van Lines

This statement is equivalent to the old style outer join:

select sale_name, company from salesperson, customer
 where salesperson.sale_id *= customer.sale_id;

An inner join is the default, so the specification "natural join" produces a natural inner join. For outer joins, "outer" does not need to be specified. The following example requests a natural inner join between salesperson and customer and a natural left outer join between customer and sales_order.
select sale_name, company, ord_num, ord_date, amount
  from salesperson natural join customer natural left join sales_order;

SALE_NAME      COMPANY                     ORD_NUM  ORD_DATE    AMOUNT
Kennedy, Bob   Broncos Air Express         2207     1997-01-06  274375
Kennedy, Bob   Broncos Air Express         2220     1997-01-27  49980
Kennedy, Bob   Broncos Air Express         2237     1997-02-12  103874.8
Kennedy, Bob   Broncos Air Express         2264     1997-04-01  21950
Kennedy, Bob   Broncos Air Express         2282     1997-04-14  21950
Kennedy, Bob   Broncos Air Express         2304     1997-05-26  19995
Kennedy, Bob   Broncos Air Express         2321     1997-06-24  6827.96
Kennedy, Bob   Cardinals Bookmakers        2209     1997-01-07  3715.83
Kennedy, Bob   Cardinals Bookmakers        2253     1997-03-16  143375
Kennedy, Bob   Cardinals Bookmakers        2269     1997-04-02  35119.46
Kennedy, Bob   Cardinals Bookmakers        2301     1997-05-14  16227.27
Kennedy, Bob   Cardinals Bookmakers        2313     1997-06-12  38955
Flores, Bob    Seahawks Data Services      2215     1997-01-17  16892
Flores, Bob    Seahawks Data Services      2225     1997-01-29  2987.5
Flores, Bob    Seahawks Data Services      2229     1997-02-04  8824.56
...
Jones, Walter  Bears Market Trends, Inc.   2249     1997-03-04  28570
Jones, Walter  Bears Market Trends, Inc.   2271     1997-04-03  49584.65
Jones, Walter  Bears Market Trends, Inc.   2295     1997-05-06  31580
Warren, Wayne  Vikings Athletic Equipment  2202     1997-01-02  25915.86
Warren, Wayne  Vikings Athletic Equipment  2223     1997-01-28  408
Warren, Wayne  Vikings Athletic Equipment  2248     1997-02-28  3073.54
Warren, Wayne  Vikings Athletic Equipment  2266     1997-04-01  5190.42
Warren, Wayne  Vikings Athletic Equipment  2296     1997-05-07  2790.99
Warren, Wayne  Vikings Athletic Equipment  2315     1997-06-17  12082.39
Warren, Wayne  Packers Van Lines           2211     1997-01-10  53634.12
Warren, Wayne  Packers Van Lines           2235     1997-02-11  8192.38
Warren, Wayne  Packers Van Lines           2292     1997-04-30  77247.5
Warren, Wayne  Packers Van Lines           2327     1997-06-30  24103.3

Natural joins form the join based on equal values from all columns in the joined tables that have the same name.
In the examples above, there is only one common column name between salesperson and customer (sale_id), and one between customer and sales_order (cust_id). The customer table in the sales database shares two common columns with the outlet table in the invntory database: city and state. In the next example, a natural join between the customer and outlet tables will produce those customers that are located in the same city and state where there is a distribution outlet.

select company, city, state from customer natural join outlet;

COMPANY                         CITY           STATE
Seahawks Data Services          Seattle        WA
Raiders Development Co.         Los Angeles    CA
Rams Data Processing, Inc.      Los Angeles    CA
Chargers Credit Corp.           San Diego      CA
Forty-niners Venture Group      San Francisco  CA
Cowboys Data Services           Dallas         TX
Oilers Gas and Light Co.        Houston        TX
Patriots Computer Corp.         Foxboro        MA
Bears Market Trends, Inc.       Chicago        IL
Chiefs Management Corporation   Kansas City    MO
Chiefs Management Corporation   Kansas City    MO
'Bills We Pay' Financial Corp.  Buffalo        NY
Jets Overnight Express          New York       NY
Falcons Microsystems, Inc.      Atlanta        GA
Vikings Athletic Equipment      Minneapolis    MN
Broncos Air Express             Denver         CO

A qualified join is like a natural join except that it requires that the columns on which the join is to be formed be explicitly specified. Two specification methods are provided. The using clause requires you to name the common columns of the joined tables that are to be used to form the join. With the on clause you specify the join predicates as conditional expressions exactly as they would be specified in the where clause under the old style joins. The on clause is necessary whenever the join is to be performed between columns that do not have the same name. The using clause allows you to choose only the matching columns on which you want the join formed.
So, for example, to list those customers located in the same state, but not necessarily the same city, as a distribution outlet you would use the following statement:

select company, city, state from customer inner join outlet using(state);

COMPANY                         CITY           STATE
Seahawks Data Services          Seattle        WA
Raiders Development Co.         Los Angeles    CA
Rams Data Processing, Inc.      Los Angeles    CA
Chargers Credit Corp.           San Diego      CA
Forty-niners Venture Group      San Francisco  CA
Cowboys Data Services           Dallas         TX
Oilers Gas and Light Co.        Houston        TX
Patriots Computer Corp.         Foxboro        MA
Bears Market Trends, Inc.       Chicago        IL
Chiefs Management Corporation   Kansas City    MO
Chiefs Management Corporation   Kansas City    MO
'Bills We Pay' Financial Corp.  Buffalo        NY
Jets Overnight Express          New York       NY
Falcons Microsystems, Inc.      Atlanta        GA
Vikings Athletic Equipment      Minneapolis    MN
Broncos Air Express             Denver         CO

It is usually a good database design principle for the columns on which different tables could be joined to have the same names. Doing so greatly simplifies the select statement join specifications. However, there are situations in which this is just not possible and a join is needed between columns that cannot have the same name. One such situation occurs in a self-referencing join, a join that is performed on the same table. For example, the salesperson table's primary key is sale_id, but salesperson also contains a column named mgr_id that is a foreign key reference to the row in the salesperson table associated with that salesperson's manager. The following example gives a select statement that lists all managers along with those salespersons that they manage. Note that correlation names must be specified for the two salesperson references in the from clause in order to differentiate the manager rows from the employee rows.
select mgr.sale_name, emp.sale_name
  from salesperson mgr join salesperson emp on mgr.sale_id = emp.mgr_id;

MGR.SALE_NAME   EMP.SALE_NAME
Flores, Bob     Robinson, Stephanie
Flores, Bob     Kennedy, Bob
Stouffer, Bill  Jones, Walter
Stouffer, Bill  Warren, Wayne
Blades, Chris   Nash, Gail
Blades, Chris   Williams, Steve
Porter, Greg    Wyman, Eliska
Porter, Greg    McGuire, Sidney

Parentheses are sometimes needed to group joins when more than two tables are involved in the from clause. They are required when one table needs to be joined with two or more tables. For example, the statement below produces a list of product orders for those customers who are located in cities where a distribution outlet is also located. A natural join between the customer table and both the sales_order table (based on the cust_id column) and the outlet table (based on the city and state columns) will accomplish this.

select company, city, prod_id, quantity
  from customer natural join (sales_order natural join item natural join outlet);

COMPANY                 CITY     PROD_ID  QUANTITY
Seahawks Data Services  Seattle  16311    20
Seahawks Data Services  Seattle  18061    200
Seahawks Data Services  Seattle  18121    500
Seahawks Data Services  Seattle  16511    250
...
Broncos Air Express     Denver   15340    2
Broncos Air Express     Denver   17214    2
Broncos Air Express     Denver   20303    2
Broncos Air Express     Denver   23200    2
Broncos Air Express     Denver   23400    2

By grouping the natural joins between sales_order, item, and outlet together with parentheses, the group is treated like a single table to which a natural join with customer is then formed. The common columns between customer and sales_order (cust_id), item (none), and outlet (city and state) become the basis on which the natural join is performed. There can be no duplicate common column names between the table (or tables) on the left side of a join and the table (or tables) on the right side of the join.
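A natural join of this kind depends entirely on matching column names. As an illustration only (this rewrite is not taken from the manual, but it uses the column names from the sales and invntory schemas referenced earlier: customer.cust_id, sales_order.ord_num, item.prod_id, and outlet.city/state), the same rows should be obtainable with explicit qualified joins, where each on clause names its own join columns:

```sql
-- Sketch: the grouped natural join rewritten with explicit on clauses.
-- Assumes customer(cust_id, city, state), sales_order(cust_id, ord_num),
-- item(ord_num, prod_id, quantity), and the invntory outlet(city, state).
select company, customer.city, prod_id, quantity
  from customer
       inner join sales_order on customer.cust_id = sales_order.cust_id
       inner join item on sales_order.ord_num = item.ord_num
       inner join outlet on customer.city = outlet.city
                        and customer.state = outlet.state;
```

Because every join predicate is spelled out, no parentheses are needed to control which common columns participate in each join.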
A cross join is simply a cross product of the two tables in which each row of the left table is joined with each row of the right table, so the cardinality of the result (i.e., the number of result rows) is equal to the product of the cardinalities of the two tables. An on clause cannot be specified with a cross join; however, nothing restricts including join conditions in the where clause. In practice, there are very few times when a cross join is needed, and since it can be a very expensive operation that can potentially produce huge result sets, its use should be avoided.

7.4 Sorting the Rows of the Result Set

You can sort the result set produced by the select statement by using an order by clause that conforms to the following syntax.

select:
    select [first | all | distinct] {* | select_item [, select_item]...}
      from table_ref [, table_ref]...
     [where cond_expr]
     [order by {number | colname} [asc | desc] [, {number | colname} [asc | desc]]...]

The order by clause identifies the result set columns that are to be sorted and whether each column value is to be sorted in ascending or descending order. A sort column is identified either by the ordinal position in which it appears in the select result column list (beginning with 1) or by the name (or alias) of the column. For example, the statement shown below sorts the salesperson table in alphabetical order by salesperson name (sale_name column).

select * from salesperson order by sale_name;

SALE_ID  SALE_NAME            DOB         COMMISSION  REGION  OFFICE  MGR_ID
CMB      Blades, Chris        1958-09-08  0.080       3       SEA     *NULL*
BNF      Flores, Bob          1943-07-17  0.100       0       SEA     *NULL*
WAJ      Jones, Walter        1960-06-15  0.070       2       CHI     BPS
BCK      Kennedy, Bob         1956-10-29  0.075       0       DEN     BNF
JTK      Kirk, James          2100-08-30  0.075       3       ATL     *NULL*
DLL      Lister, Dave         1999-08-30  0.075       3       ATL     *NULL*
SKM      McGuire, Sidney      1947-12-02  0.070       1       WDC     GAP
GSN      Nash, Gail           1954-10-20  0.070       3       DAL     CMB
GAP      Porter, Greg         1949-03-03  0.080       1       SEA     *NULL*
SWR      Robinson, Stephanie  1968-10-11  0.070       0       LAX     BNF
BPS      Stouffer, Bill       1952-11-21  0.080       2       SEA     *NULL*
WWW      Warren, Wayne        1953-04-29  0.075       2       MIN     BPS
SSW      Williams, Steve      1944-08-30  0.075       3       ATL     CMB
ERW      Wyman, Eliska        1959-05-18  0.075       1       NYC     GAP

As noted above, you can specify columns listed in the order by clause by name or by number. The following lists the salesperson names and birth dates in birth date order.

select sale_name, dob from salesperson order by 2;

SALE_NAME            DOB
Flores, Bob          1943-07-17
Porter, Greg         1949-03-03
Stouffer, Bill       1952-11-21
Kennedy, Bob         1956-10-29
Blades, Chris        1958-09-08
Robinson, Stephanie  1968-10-11

You can use the order by clause to sort on more than one column. Additionally, the clause can be used to specify whether each column is in ascending (the default) or descending order. In the following example, column 1 is the primary sort column.

select commission, sale_name from salesperson order by 1 desc, 2 asc;

COMMISSION  SALE_NAME
0.100       Flores, Bob
0.080       Blades, Chris
0.080       Porter, Greg
0.080       Stouffer, Bill
0.075       Kennedy, Bob
0.070       Robinson, Stephanie

The query below returns the sale total for the sales orders entered on or after 1997-06-01, where the "amount+tax" result column is assigned the alias sale_tot, which is referenced in the order by clause.

select ord_num, amount+tax sale_tot from sales_order
 where ord_date >= date "1997-06-01" order by sale_tot desc;

ORD_NUM  SALE_TOT
2324     104019.5
2310     51283.9700292969
2317     49778.7600683594
2313     38955
2323     35582.5
2308     32675.6899902344
2311     32589.6000976563
2319     31602.1500976562
2320     27782
2318     27239.1000976563
2322     25231.98
2327     24103.3
2326     22887.96
2325     21532.0899902344
2314     20780
2309     17388.6600341797
2316     16986.99
2312     16598.0000048828
2315     12940.2399755859
2321     7251.28998657227
2307     4487.76

As you can see, select statement result columns can be computational, as described in the next section.

7.5 Retrieving Computational Results

Besides retrieving the values of individual columns, a select statement allows you to specify expressions that perform arithmetic operations on the columns of a table. The normal arithmetic operators (+, -, *, /) along with a wide range of built-in functions can be included in a select column expression. The complete syntax for column expressions is given below.

select:
    select [first | all | distinct] {* | select_item [, select_item]...}
      from table_ref [, table_ref]...
     [where cond_expr]
     [order by col_ref [asc | desc] [, col_ref [asc | desc]]...]

select_item:
    {{tabname | corrname}.* | expression} [identifier | "headingstring"]

expression:
    arith_expr | string_expr

arith_expr:
    arith_operand [arith_operator arith_operand]...

arith_operand:
    constant | [tabname.]colname | arith_function | ( arith_expr )

arith_operator:
    + | - | * | /

arith_function:
    numeric_function | datetime_function | system_function | user_defined_function

string_expr:
    string_operand [^ string_operand]

string_operand:
    "string" | [tabname.]colname | if ( cond_expr, string_expr, string_expr )
    | string_function | user_defined_function

numeric_function:   See Table 7-3.
datetime_function:  See Table 7-4.
string_function:    See Table 7-5.
system_function:    See Table 7-6.

7.5.1 Simple Expressions

The query example below shows the salespersons' orders with the largest earned commissions. The select statement computes the commission earned by multiplying the commission rate by the amount of the order. It accesses the name of each salesperson by using a three-table join and sorts the result in descending order by earned commission.
select sale_name, ord_num, amount*commission
  from salesperson, customer, sales_order
 where salesperson.sale_id = customer.sale_id
   and customer.cust_id = sales_order.cust_id
 order by 3 desc;

SALE_NAME            ORD_NUM  AMOUNT*COMMISSION
Kennedy, Bob         2207     20578.125000
Porter, Greg         2288     20194.000000
Wyman, Eliska        2205     11315.340000
Kennedy, Bob         2253     10753.125000
Robinson, Stephanie  2234     8726.200000
Kennedy, Bob         2237     7790.610000
Flores, Bob          2284     7431.516000
Nash, Gail           2324     7281.365000
Porter, Greg         2250     6594.468000
Porter, Greg         2219     5922.792000
Warren, Wayne        2292     5793.562500
Jones, Walter        2241     5762.050000

Note that, because column 3 contains an expression rather than a simple column name, the order by clause must refer to it by column number. You could also use a column alias, as shown in the equivalent query below.

select sale_name, ord_num, amount*commission earned
  from salesperson, customer, sales_order
 where salesperson.sale_id = customer.sale_id
   and customer.cust_id = sales_order.cust_id
 order by earned desc;

SALE_NAME      ORD_NUM  EARNED
Kennedy, Bob   2207     20578.125000
Porter, Greg   2288     20194.000000
Wyman, Eliska  2205     11315.340000
...

In the next example, the select statement retrieves the amount that the company receives from each of the orders shown in the example above. The amount to the company is simply the order amount minus the commission.

select sale_name, ord_num, amount-amount*commission "NET REVENUE"
  from salesperson, customer, sales_order
 where salesperson.sale_id = customer.sale_id
   and customer.cust_id = sales_order.cust_id
 order by 3 desc;

SALE_NAME            ORD_NUM  NET REVENUE
Kennedy, Bob         2207     255168.749918
Porter, Greg         2288     232231.000451
Kennedy, Bob         2253     133338.749957
Robinson, Stephanie  2234     115933.799963
Kennedy, Bob         2237     96603.563969
Porter, Greg         2250     75836.382147
Porter, Greg         2219     68112.108132

Arithmetic operators specified in an expression are evaluated based on the precedence given in the following table.

Table 7-2. Precedence of Arithmetic Operators

Priority  Operator  Use
Highest   ()        Parenthetical expressions
High      +         Unary plus
High      -         Unary minus
Medium    *         Multiplication
Medium    /         Division
Lowest    +         Addition
Lowest    -         Subtraction

7.5.2 Built-in (Scalar) Functions

RDM Server SQL provides many built-in functions that can be used in select statement expressions. Four classes of built-in functions are provided, as noted in the select statement syntax shown above: numeric, datetime, system, and string functions. These functions are called scalar functions because, for a given set of argument values, each returns a single value. The functions described in the next section are called aggregate functions because they perform computations over a set (group) of rows. The built-in numeric functions provided in RDM Server SQL are described in the following table.

Table 7-3. Built-in Numeric Functions

Function                       Description
abs(arith_expr)                Returns the absolute value of an expression.
acos(arith_expr)               Returns the arccosine of an expression.
asin(arith_expr)               Returns the arcsine of an expression.
atan(arith_expr)               Returns the arctangent of an expression.
atan2(arith_expr)              Returns the arctangent of an x-y coordinate pair.
{ceil | ceiling}(arith_expr)   Finds the upper bound for an expression.
cos(arith_expr)                Returns the cosine of an angle.
cot(arith_expr)                Returns the cotangent of an angle.
exp(arith_expr)                Returns the value of an exponential function.
floor(arith_expr)              Finds the lower bound for an expression.
{ln | log}(arith_expr)         Returns the natural logarithm of an expression.
mod(arith_expr1, arith_expr2)  Returns the remainder of arith_expr1/arith_expr2.
pi()                           Returns the value of pi.
rand(num)                      Returns the next random floating-point number; non-zero num is the seed.
sign(arith_expr)               Returns the sign of an expression (-1, 0, +1).
sin(arith_expr)                Returns the sine of an angle.
sqrt(arith_expr)               Returns the square root of an expression.
tan(arith_expr)                Returns the tangent of an angle.

The example below calls the floor function to truncate the cents portion from the amount column in the sales_order table for all orders made by Seahawks Data Services.

select ord_num, ord_date, floor(amount) from sales_order where cust_id = "SEA";

ORD_NUM  ORD_DATE    FLOOR(AMOUNT)
2215     1997-01-17  16892
2225     1997-01-29  2987
2229     1997-02-04  8824
2258     1997-03-23  1365
2273     1997-04-03  650
2311     1997-06-05  30036

Table 7-4. Built-in Date/Time Functions

Function                    Description
age(dt_expr)                Returns the age (in full years).
{curdate | current_date}()  Returns the current date.
{curtime | current_time}()  Returns the current time.
current_timestamp()         Returns the current date and time.
dayofmonth(dt_expr)         Returns the day of the month.
dayofweek(dt_expr)          Returns the day of the week.
dayofyear(dt_expr)          Returns the day of the year.
hour(dt_expr)               Returns the hour.
minute(dt_expr)             Returns the minute.
month(dt_expr)              Returns the month.
now()                       Returns the current date and time.
quarter(dt_expr)            Returns the quarter.
second(dt_expr)             Returns the second.
week(dt_expr)               Returns the week.
year(dt_expr)               Returns the year.

The next query returns the age of each salesperson on April 19, 2012. As you can see, it is an experienced sales staff, except for Dave Lister, who is only 12, and James T. Kirk, who will not be born for another 89 years!
select sale_name, dob, curdate(), age(dob) from salesperson;

SALE_NAME            DOB         CURDATE()   AGE(DOB)
Flores, Bob          1943-07-17  2012-04-19  68
Blades, Chris        1958-09-08  2012-04-19  53
Porter, Greg         1949-03-03  2012-04-19  63
Stouffer, Bill       1952-11-21  2012-04-19  59
Kennedy, Bob         1956-10-29  2012-04-19  55
Kirk, James          2100-08-30  2012-04-19  -89
Lister, Dave         1999-08-30  2012-04-19  12
Warren, Wayne        1953-04-29  2012-04-19  58
Williams, Steve      1944-08-30  2012-04-19  67
Wyman, Eliska        1959-05-18  2012-04-19  52
Jones, Walter        1960-06-15  2012-04-19  51
McGuire, Sidney      1947-12-02  2012-04-19  64
Nash, Gail           1954-10-20  2012-04-19  57
Robinson, Stephanie  1968-10-11  2012-04-19  43

Table 7-5. Built-in String Functions

Function                                           Description
ascii(string_expr)                                 Returns the numeric ASCII value of a character.
char(num)                                          Returns the ASCII character with numeric value num.
concat(string_expr1, string_expr2)                 Concatenates two strings.
insert(string_expr1, num1, num2, string_expr2)     Replaces num2 characters in string_expr1, beginning at position num1, with string_expr2 (the 1st position is 1, not 0).
lcase(string_expr)                                 Converts a string to lowercase.
left(string_expr, num)                             Returns the leftmost num characters from the string.
length(string_expr)                                Returns the length of the string.
locate(string_expr1, string_expr2, num)            Locates string_expr1 in string_expr2, beginning the search at position num.
ltrim(string_expr)                                 Removes all leading spaces from the string.
repeat(string_expr, num)                           Repeats the string num times.
replace(string_expr1, string_expr2, string_expr3)  Replaces string_expr2 with string_expr3 in string_expr1.
right(string_expr, num)                            Returns the rightmost num characters from the string.
rtrim(string_expr)                                 Removes all trailing spaces from the string.
substring(string_expr1, num1, num2)                Returns num2 characters from string_expr1, beginning at position num1.
ucase(string_expr)                                 Converts a string to uppercase.
unicode(string_expr)                               Returns the numeric Unicode value of a character.
wchar(num)                                         Returns the Unicode character with numeric value num.
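As a small sketch (not taken from the example databases), the insert and replace functions can be tried from a calculator-style select with no from clause; the expected values below follow from the function descriptions in Table 7-5.

```sql
-- insert replaces 0 characters of "RDM SQL" at position 5 (1-based) with
-- "Server ", which should yield "RDM Server SQL"; replace removes the
-- space from "data base", which should yield "database".
select insert("RDM SQL", 5, 0, "Server "), replace("data base", " ", "");
```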
The next query displays the customer company names and their lengths with the longest listed first.

select company, length(company) from customer order by 2 desc;

COMPANY                         LENGTH(COMPANY)
'Bills We Pay' Financial Corp.  30
Chiefs Management Corporation   29
Redskins Outdoor Supply Co.     27
Falcons Microsystems, Inc.      26
Forty-niners Venture Group      26
Rams Data Processing, Inc.      26
...
Broncos Air Express             19
Lions Motor Company             19
Bucs Data Services              18
Packers Van Lines               17
Bengels Imports                 15
Browns Kennels                  14

The built-in system functions provided in RDM Server SQL are described in the table below.

Table 7-6. Built-in System Functions

Function                                            Description
convert(expression, type)                           Converts the expression result to the specified data type.
convert(expression, {char | wchar}, width, format)  Converts the expression to a string of no more than width characters according to the specified format.
database()                                          Returns a string containing a comma-separated list of the currently opened databases.
if(cond_expr, expression1, expression2)             Returns the result of expression1 if cond_expr is true, otherwise expression2.
ifnull(expression1, expression2)                    Returns the result of expression1 if not null, otherwise expression2.
user()                                              Returns the user name as a string.

One of the features of the RDM Server select statement is that you can use it as a simple calculator by not specifying a from (or any other) clause. For example, the user and database functions return values that do not derive from any particular database, so the following select simply returns their current values.

select user(), database();

USER()  DATABASE()
admin   invntory,sales

Use of the if and convert functions is described in detail in the next two sections.

7.5.3 Conditional Column Selection

The conditional if function allows you to select an expression result based on a specified condition applied to each result row of a select statement. The if function syntax is as follows.
if(cond_expr, expression1, expression2)

For each row in which the conditional expression, cond_expr, evaluates to true, the function returns the result of evaluating the first expression. If the condition is false, the second expression is evaluated and its result is returned. The following example uses the if function to identify which customers are located "In-state" or "Out-of-state", where the state is the beautiful state of Washington located in the great Pacific Northwest of the USA!

select company, if(state = "WA", "In-state", "Out-of-state") location from customer;

COMPANY                         LOCATION
Cardinals Bookmakers            Out-of-state
Raiders Development Co.         Out-of-state
Rams Data Processing, Inc.      Out-of-state
Chargers Credit Corp.           Out-of-state
Forty-niners Venture Group      Out-of-state
Broncos Air Express             Out-of-state
Dolphins Diving School          Out-of-state
Bucs Data Services              Out-of-state
Falcons Microsystems, Inc.      Out-of-state
Bears Market Trends, Inc.       Out-of-state
Colts Nuts & Bolts, Inc.        Out-of-state
Saints Software Support         Out-of-state
Patriots Computer Corp.         Out-of-state
Lions Motor Company             Out-of-state
Vikings Athletic Equipment      Out-of-state
Chiefs Management Corporation   Out-of-state
Giants Garments, Inc.           Out-of-state
'Bills We Pay' Financial Corp.  Out-of-state
Jets Overnight Express          Out-of-state
Bengels Imports                 Out-of-state
Browns Kennels                  Out-of-state
Eagles Electronics Corp.        Out-of-state
Steelers National Bank          Out-of-state
Cowboys Data Services           Out-of-state
Oilers Gas and Light Co.        Out-of-state
Redskins Outdoor Supply Co.     Out-of-state
Seahawks Data Services          In-state
Packers Van Lines               Out-of-state

7.5.4 Formatting Column Expression Result Values

The convert function listed in Table 7-6 can be used to do simple type conversions or sophisticated formatting of expression result values into a char or wchar string. The syntax for the convert function is given below.
convert_function:
      convert(expression, type)
    | convert(expression, {char | wchar}, width, format)

data_type:
      char | wchar | smallint | integer | real | double | date | time | timestamp | tinyint | bigint

format_spec:
      numeric_format | datetime_format

numeric_format:
      "[<< | >> | ><]['text' | $][- | (][#,]#[.#[#]...][e | E]['text' | $ | %]"

datetime_format:
      "[<< | >> | ><]['text' | spchar | date_code | time_code]..."

date_code:
      m | mm | mmm | mon | mmmm | month | d | dd | ddd | dddd | day | yy | yyyy

time_code:
      h | hh | m | mm | s | ss | .f[f]... | [a/p | am/pm | A/P | AM/PM]

The expression specifies the SQL expression to be converted. In the first convert function form, type specifies the data type to be returned. It must be a type for which a legal conversion can be performed. The second form of the convert function converts the expression result into either a char or a wchar string. The maximum length of the result string is specified by width, which must be an integer constant greater than 1. The result is formatted as specified by a format string that conforms to the syntax shown above.

The format specifier for numeric values is represented as shown in the table below. The minimum specifier that must be used for a numeric format is "#". If the display field width is too small to contain a numeric value, the convert function formats the value in exponential format (for example, 1.759263e08).

Table 7-7. Numeric Format Specifiers

Element            Description
[ << | >> | >< ]   The justification specifier. You can specify left-justified text (<<), right-justified text (>>), or centered text (><). The default for numeric values is right-justified.
[ 'text' | $ ]     A text character or string to use as a prefix for the result string. You must enclose the character or text with single quotation marks unless the prefix is one dollar sign. A set currency statement will change the symbol that is accepted by convert for the $.
[ - | ( ]          The display specifier for negative values. You can show negative values with a minus sign or with parentheses around the value. If parentheses are used, positive values are shown with an ending space to ensure alignment of the decimal point.
[#,]#[.#[#]...]    The numeric format specifier. You can specify whether to show commas every third place before the decimal point. Also, you can specify how many digits (if any) to show after the decimal point. A set thousands or set decimal statement will change the symbol that is accepted by convert for the "," or the ".".
[e | E]            Whether to use exponential format to show numeric values. If this option is omitted, exponential format is used only when the value is too large or small to be shown otherwise. You can specify display of a lowercase or uppercase exponent indicator.
['text' | $ | %]   A text character or string to use as a suffix for the result string. You must enclose the character or text with single quotation marks unless the suffix is one dollar or percent sign. A set currency statement will change the symbol that is accepted by convert for the $.

The format specifier elements for date/time values are described in the next table. The date/time format specifier can contain any number of text items or special characters that are interspersed with the date or time codes. You can arrange these items in any order, but a time specifier must adhere to the ordering rules described in the syntax under "time_code". For the minute codes to be interpreted as minutes (and not months) they must follow the hour codes. You cannot specify the minutes of a time value without also specifying the hour. You can specify the hour by itself. Similarly, you cannot specify the seconds without having specified minutes, and you cannot specify fractions of a second without specifying seconds. Thus, the order "hours, minutes, seconds, fractions" must be preserved.

Table 7-8. Date and Time Format Specifiers

General Formatting Elements

Element              Description
[ << | >> | >< ]     The justification specifier. You can specify left-justified text (<<), right-justified text (>>), or centered text (><). The default is right-justified.
[ 'text' | spchar ]  A string or a special character (for example, "-", "/", or ".") to be copied into the result string. The special character is often useful in separating the entities within a date and time.

Date-Specific Formatting Elements

Element  Description
m        Month number (1-12) without a leading zero.
mm       Month number with a leading zero.
mmm      Three-character month abbreviation (e.g., "Jan").
mon      Same as mmm.
mmmm     Fully spelled month name (e.g., "January").
month    Same as mmmm.
d        Day of month (1-31) without a leading zero.
dd       Day of month with a leading zero.
ddd      Three-character day of week abbreviation (e.g., "Wed").
dddd     Fully spelled day of week (e.g., "Wednesday").
day      Same as dddd.
yy       Two-digit year AD with leading zero if the year is between 1950 and 2049; otherwise same as yyyy.
yyyy     Year AD up to four digits without a leading zero.

Time-Specific Formatting Elements

Element                    Description
h                          Hour of day (0-12 or 0-23) without a leading zero.
hh                         Hour of day with a leading zero.
m                          Minute of hour (0-59) without a leading zero (only after h or hh).
mm                         Minute of hour with a leading zero (only after h or hh).
s                          Second of minute (0-59) without a leading zero (only after m or mm).
ss                         Second of minute with a leading zero (only after m or mm).
.f[f]...                   Fraction of a second: four decimal place accuracy (only after s or ss).
a/p | am/pm | A/P | AM/PM  Hour of day is 0-12; an AM or PM indicator is output to the result string (only after the last time code element).

The following examples show numeric format specifiers and their results.
Function                                       Result
convert(14773.1234, char, 10, "#.#")           "   14773.1"
convert(736620.3795, char, 12, "#,#.###")      "736,620.380"
convert(736620.3795, char, 12, "$#,#.##")      "$736,620.38"
convert(736620.3795, char, 13, "<<#.######e")  "7.366204e+005"
convert(56.75, char, 10, "#.##%")              "    56.75%"
convert(56.75, char, 18, "#.##' percent'")     "    56.75 percent"

The examples below show date/time format specifiers and corresponding results. These examples show how the constant timestamp "1951-10-23 04:42:27.175" can be returned. The format specifier, rather than the entire function, is shown here in the left column.

Format Spec.                      Result
mmm dd, yyyy                      Oct 23, 1951
hh' hours on' ddd                 04 hours on Tue
month dd, yyyy                    October 23, 1951
dd 'of' month 'of the year' yyyy  23 of October of the year 1951
dddd                              Tuesday
hh.mm.ss.ffff                     04.42.27.1750
mm-dd-yyyy                        10-23-1951
'date:'yyyy.mm.dd 'at' hh:mm A/P  date:1951.10.23 at 04:42 A

7.6 Performing Aggregate (Grouped) Calculations

All of the select statements shown thus far have produced detail rows where each row of the result set corresponds to a single row from the table (a base table or a table formed from the set of joined tables in the from clause). There are often times when you want to perform a calculation on one or more columns from a related set of rows, returning only a summary row that includes the calculation result. The set of rows over which the calculations are performed is called the aggregate. The select statement group by clause is used to identify the column or columns that define each aggregate—those rows that have identical group by column values. The syntax for the select statement including group by is as follows.

select:
    select [first | all | distinct] {* | select_item [, select_item]...}
        from table_ref [, table_ref]...
        [where cond_expr]
        [group by col_ref [, col_ref]... [having cond_expr]]
        [order by col_ref [asc | desc] [, col_ref [asc | desc]]...]

select_item:
    {{tabname | corrname}.* | expression} [identifier | "headingstring"]

table_ref:
    table_primary | table_join

table_primary:
    table_name_spec | ( table_join )

table_name_spec:
    [dbname.]tabname [[as] corrname]

table_join:
    natural_join | qualified_join | cross_join

natural_join:
    table_ref natural [inner | {left | right} [outer]] join table_primary

qualified_join:
    table_ref [inner | {left | right} [outer]] join table_primary
        {using (colname[, colname]...) | on cond_expr}

cross_join:
    table_ref cross join table_primary

cond_expr:
    rel_expr [bool_oper rel_expr]...

rel_expr:
      expression [not] rel_oper {expression | [{any | some} | all] (subquery)}
    | expression [not] between constant and constant
    | expression [not] in {(constant[, constant]...) | (subquery)}
    | [tabname.]colname is [not] null
    | string_expr [not] like "pattern"
    | not rel_expr
    | ( cond_expr )
    | [not] exists (subquery)
    | [tabname.]colname *= [tabname.]colname
    | [tabname.]colname =* [tabname.]colname

subquery:
    select {* | expression} from {table_list | path_spec} [where cond_expr]

expression:
    arith_expr | string_expr

arith_expr:
    arith_operand [arith_operator arith_operand]...

arith_operand:
    constant | [tabname.]colname | arith_function | ( arith_expr )

arith_operator:
    + | - | * | /

arith_function:
      {sum | avg | max | min} (arith_expr)
    | count({* | [tabname.]colname})
    | if( cond_expr, arith_expr, arith_expr )
    | numeric_function
    | datetime_function
    | system_function
    | user_defined_function

string_expr:
    string_operand [^ string_operand]

string_operand:
      "string"
    | [tabname.]colname
    | if( cond_expr, string_expr, string_expr )
    | string_function
    | user_defined_function

The five built-in aggregate functions shown in the arith_function syntax rule above are defined in the table below.

Table 7-9. Built-in Aggregate Function Descriptions

Function                                     Description
count( [distinct] {* | [tabname.]colname} )  Returns the number of (distinct) rows in the aggregate.
sum( [distinct] expression )                 Returns the sum of the (distinct) values of expression in the aggregate.
avg( [distinct] expression )                 Returns the average of the (distinct) values of expression in the aggregate.
min( expression )                            Returns the minimum expression value in the aggregate.
max( expression )                            Returns the maximum expression value in the aggregate.

The following example shows how grouped calculations are used to formulate a select statement that produces the year-to-date earnings for each salesperson. All orders for each salesperson are summarized, the total amount of all orders is computed, and the total commissions are calculated.

select sale_name, sum(amount), sum(amount*commission)
  from salesperson, customer, sales_order
  where salesperson.sale_id = customer.sale_id and customer.cust_id = sales_order.cust_id
  group by sale_name;

SALE_NAME            SUM(AMOUNT)  SUM(AMOUNT*COMMISSION)
Flores, Bob          173102.02    17310.202
Jones, Walter        422560.55    29579.2385
Kennedy, Bob         736345.32    55225.899
McGuire, Sidney      208432.11    14590.2477
Nash, Gail           306807.26    21476.5082
Porter, Greg         439346.5     35147.72
Robinson, Stephanie  374904.47    26243.3129
Stouffer, Bill       29053.3      2324.264
Warren, Wayne        212638.5     15947.8875
Williams, Steve      247179.99    18538.49925
Wyman, Eliska        566817.01    42511.27575

The set display statement lets you specify a default display format for a data type. The example below uses the set display statement to specify a two decimal place, fixed-point format for all double and real type columns.

set double display(14, "#,#.##");
set real display(14, "#,#.##");

Re-executing the previous query now produces the following results.

SALE_NAME            SUM(AMOUNT)  SUM(AMOUNT*COMMISSION)
Flores, Bob          173,102.02   17,310.20
Jones, Walter        422,560.55   29,579.24
Kennedy, Bob         736,345.32   55,225.90
McGuire, Sidney      208,432.11   14,590.25
Nash, Gail           306,807.26   21,476.51
Porter, Greg         439,346.50   35,147.72
Robinson, Stephanie  374,904.47   26,243.31
Stouffer, Bill       29,053.30    2,324.26
Warren, Wayne        212,638.50   15,947.89
Williams, Steve      247,179.99   18,538.50
Wyman, Eliska        566,817.01   42,511.28

Most of the remaining examples use the above specified formats (the amount column is double and the tax column is real in the example databases). Figure 7-1 illustrates the retrieved aggregate function results. The sales orders for Bob Flores are totaled in the amount and amount*commission columns.

Figure 7-1. Group By Calculations

If the group by clause is omitted, calculations are performed on all rows as a single aggregate, producing a single summary result row. The following example illustrates a select statement that calls the count, min, max, and avg aggregate functions without the group by clause. The statement retrieves the total number of sales orders, along with the minimum, maximum, and average order amounts.

select count(*), min(amount), max(amount), avg(amount) from sales_order;

COUNT(*)  MIN(AMOUNT)  MAX(AMOUNT)    AVG(AMOUNT)
127       68.750000    274375.000000  29269.189213

The next example illustrates use of the sum function. The function computes total year-to-date sales for all salespersons in the sales database.

select sum(amount) from sales_order;

SUM(AMOUNT)
3,717,187.03

The count function is used to calculate the number of detail rows from which the aggregate is comprised. The next query shows the number of orders placed by each salesperson, sorted by the number of orders (most listed first).
select sale_name, count(ord_num)
  from salesperson, customer, sales_order
  where salesperson.sale_id = customer.sale_id and customer.cust_id = sales_order.cust_id
  group by 1 order by 2 desc;

SALE_NAME            COUNT(ORD_NUM)
Wyman, Eliska        24
Jones, Walter        15
Robinson, Stephanie  15
Kennedy, Bob         12
McGuire, Sidney      11
Warren, Wayne        10
Flores, Bob          9
Nash, Gail           9
Williams, Steve      9
Stouffer, Bill       8
Porter, Greg         5

The argument in count can be any of the column names. Or, since any column you choose will give the same result, you can simply write "count(*)". A special form of the count function can also retrieve the total number of rows in a table, as shown below.

select count(*) from on_hand;

COUNT(*)
744

The result returned by "select count(*) from tablename" may include uncommitted records. However, if the select query contains additional columns or clauses, the returned result set will not include uncommitted records. RDM Server SQL maintains on-line statistics that include the total number of rows per table, allowing the above query to return its result instantly. However, if you specify a column instead of "count(*)" in this query (as shown below), RDM Server scans the entire table counting each row, using much more time for the query.

select count(quantity) from on_hand;

COUNT(QUANTITY)
744

If you do not want duplicates included in aggregate calculations, you can specify distinct in an avg, count, or sum function. Use of distinct is shown in the following query, which retrieves both the total number of items and the total number of distinct products sold by each salesperson.
select sale_name, count(prod_id), count(distinct prod_id)
  from salesperson, customer, sales_order, item
  where salesperson.sale_id = customer.sale_id and customer.cust_id = sales_order.cust_id
    and sales_order.ord_num = item.ord_num
  group by 1;

SALE_NAME            COUNT(PROD_ID)  COUNT(DISTINCT PROD_ID)
Flores, Bob          2               24
Jones, Walter        62              29
Kennedy, Bob         40              27
McGuire, Sidney      41              27
Nash, Gail           20              16
Porter, Greg         17              14
Robinson, Stephanie  67              34
Stouffer, Bill       19              15
Warren, Wayne        59              38
Williams, Steve      25              17
Wyman, Eliska        79              41

SQL provides the having clause to restrict result rows based on aggregate functions. The next example uses the having clause to limit the result set to only those companies with more than five orders for the year.

select company, count(ord_num), sum(amount)
  from customer natural join sales_order
  group by company having count(ord_num) > 5;

COMPANY                     COUNT(ORD_NUM)  SUM(AMOUNT)
Broncos Air Express         7               $498,952.76
Browns Kennels              7               $43,284.54
Colts Nuts & Bolts, Inc.    8               $29,053.30
Patriots Computer Corp.     6               $120,184.69
Rams Data Processing, Inc.  8               $172,936.31
Seahawks Data Services      6               $60,756.36
Vikings Athletic Equipment  6               $49,461.20

Note that your application cannot use a where clause in place of the having clause. The where clause restricts detail rows before they affect the summary calculations, while the having clause restricts aggregate result rows after the calculations are performed. Consider the following query.

select sale_name, sum(amount)
  from salesperson, customer, sales_order
  where salesperson.sale_id = customer.sale_id and customer.cust_id = sales_order.cust_id
    and sale_id in ("BNF","GAP")
    and ord_date between date "1997-06-01" and date "1997-06-30"
  group by 1 having sum(amount) > 50000.0;

Figure 7-2 shows the different application of the where and having clauses during the processing of the above query.

Figure 7-2.
Use of Where and Having Clauses

The following example uses the ucase and substring functions to retrieve all customers that have a customer identifier (cust_id) equal to the first three characters of the company name.

select cust_id, company from customer where cust_id = ucase(substring(company,1,3));

CUST_ID  COMPANY
SEA      Seahawks Data Services

7.7 String Expressions

The concat function is used to concatenate strings. It can be called from the select statement as shown in the following example.

select sale_name, concat(city, concat(", ", concat(state, concat(" ", zip)))) locality
  from accounts;

SALE_NAME            LOCALITY
Flores, Bob          Seattle, WA 98121
Flores, Bob          San Francisco, CA 94127
Porter, Greg         Detroit, MI 48243
Stouffer, Bill       Baltimore, MD 46219
Robinson, Stephanie  Los Angeles, CA 92717
Robinson, Stephanie  San Diego, CA 92126
Robinson, Stephanie  Los Angeles, CA 90075
Kennedy, Bob         Denver, CO 80239
Kennedy, Bob         Phoenix, AZ 85021

In the previous example, the concat function requires several nested calls to construct the customer locality string. Alternatively, your application can use the string concatenation operator ^ as shown below.

select sale_name, city ^ ", " ^ state ^ " " ^ zip from accounts;

The query example below uses the select statement with the dayofweek scalar function, which retrieves the day of the week (for example, 1 = Sunday, 7 = Saturday). It retrieves the distribution of sales orders, from our sales database, based on the day of the week when the orders were placed.

select dayofweek(ord_date), count(*) from sales_order group by 1;

DAYOFWEEK(ORD_DATE)  COUNT(*)
2                    22
3                    25
4                    25
5                    29
6                    26

In the next example, a select statement calls the month scalar function, which retrieves the month number from the order date. The sum aggregate function computes the sales totals for each month.
select month(ord_date), sum(amount) from sales_order group by 1;

MONTH(ORD_DATE)  SUM(AMOUNT)
1                $969,467.02
2                $529,401.19
3                $415,894.50
4                $953,985.82
5                $249,299.81
6                $599,138.69

You can use the convert scalar function to change an expression to a character string according to a specified format. Using this function overrides the default display format.

In the next example, the select statement uses convert to compute the total amount of all orders and the total commissions for salespersons in the example sales database. Note that identifiers (for example, total_amt) are used to rename the columns.

select sale_name,
       convert(sum(amount), char, 12, "$#,#.##") total_amt,
       convert(sum(amount*commission), char, 12, "$#,#.##") total_comm
  from acct_sale group by sale_name;

SALE_NAME            TOTAL_AMT    TOTAL_COMM
Flores, Bob          $173,102.02  $17,310.20
Kennedy, Bob         $736,345.32  $51,544.17
Porter, Greg         $439,346.50  $35,147.72
Robinson, Stephanie  $374,904.47  $26,243.31
Stouffer, Bill       $29,053.30   $2,324.26

The application can use the convert function to format a result so that it is easier for users to read. In the following example, the "dddd" format indicates that the full spelling of the day is to be retrieved.

select convert(ord_date, char, 10, "dddd") "DAY OF ORDER", count(*)
  from sales_order group by 1;

DAY OF ORDER  COUNT(*)
Monday        22
Tuesday       25
Wednesday     25
Thursday      29
Friday        26

7.8 Nested Queries (Subqueries)

Subqueries allow SQL statements to restrict where clause results based on the evaluated result of a select statement nested within the SQL statement. Using its nested query capability, a single SQL select statement can perform a task that might take many statements in a procedural programming language such as C. Subqueries are specified as a where clause relational expression, as defined by the syntax below.
rel_expr:
      expression [not] rel_oper {expression | [{any | some} | all] (subquery)}
    | expression [not] in {(constant[, constant]...) | (subquery)}
    | not rel_expr
    | ( cond_expr )
    | [not] exists (subquery)

subquery:
    select {* | expression} from {table_list | path_spec} [where cond_expr]

rel_oper:
    = | == | < | > | <= | >= | <> | != | /=

RDM Server SQL can evaluate the following subquery classes.

- Simple, single-value subquery
- Multi-value subquery
- Complex, correlated subquery
- Existence check subquery

Each of these types of subqueries is described in the following sections.

7.8.1 Single-Value Subqueries

A single-value subquery is the simplest and most often used subquery. This subquery retrieves a single value (often computed from an aggregate function). A single-value subquery has the following form:

select ... from ... where expression rel_oper (select expression from ...)

The subquery's select statement must return only one row. The following example shows the use of a single-value subquery in a select statement that retrieves customer orders with order amounts larger than the average sales order. The subquery itself retrieves the average sales order amount.

select company, amount
  from customer, sales_order
  where customer.cust_id = sales_order.cust_id
    and amount > (select avg(amount) from sales_order);

COMPANY                         AMOUNT
Falcons Microsystems, Inc.      62,340.00
Falcons Microsystems, Inc.      38,750.00
'Bills We Pay' Financial Corp.  150,871.20
'Bills We Pay' Financial Corp.  46,091.44
Bears Market Trends, Inc.       46,740.00
Bears Market Trends, Inc.       49,584.65
. . .
Eagles Electronics Corp.        37,408.52
Eagles Electronics Corp.        47,370.00
Cardinals Bookmakers            143,375.00
Cardinals Bookmakers            35,119.46
Cardinals Bookmakers            38,955.00
Seahawks Data Services          30,036.50
Forty-niners Venture Group      74,315.16
Bucs Data Services              39,675.95
Bucs Data Services              35,582.50
Redskins Outdoor Supply Co.     47,309.94

You can nest subqueries within other subqueries. The next example uses two nested subqueries to retrieve the orders (by salesperson) that are larger than the average of the orders placed after the date of the last order from a New Jersey customer.

select sale_name, amount, ord_date
  from salesperson, customer, sales_order
  where salesperson.sale_id = customer.sale_id and customer.cust_id = sales_order.cust_id
    and amount > (select avg(amount) from sales_order
                   where ord_date > (select max(ord_date) from sales_order, customer
                                      where state = "NJ"
                                        and sales_order.cust_id = customer.cust_id));

SALE_NAME      AMOUNT      ORD_DATE
Kennedy, Bob   274,375.00  1997-01-06
Kennedy, Bob   49,980.00   1997-01-27
Kennedy, Bob   103,874.80  1997-02-12
Kennedy, Bob   143,375.00  1997-03-16
Kennedy, Bob   35,119.46   1997-04-02
Kennedy, Bob   38,955.00   1997-06-12
Flores, Bob    30,036.50   1997-06-05
Flores, Bob    74,315.16   1997-04-20
Wyman, Eliska  32,925.00   1997-02-06
Wyman, Eliska  54,875.00   1997-04-02
Wyman, Eliska  66,341.50   1997-04-13
. . .
Jones, Walter  46,740.00   1997-01-02
Jones, Walter  28,570.00   1997-03-04
Jones, Walter  49,584.65   1997-04-03
Jones, Walter  31,580.00   1997-05-06
Warren, Wayne  53,634.12   1997-01-10
Warren, Wayne  77,247.50   1997-04-30

7.8.2 Multi-Valued Subqueries

A multi-value subquery retrieves more than one value and has two forms of syntax, as shown below.

select ... from ... where expression rel_oper {{any | some} | all} (select expression from ...)

or

select ... from ... where expression [not] in (select expression from ...)

The any or some qualifier (they are synonyms) indicates that the relational operation is true if it is true for at least one row of the subquery's result set.
The all qualifier indicates that the relational operation is true only when it is true for every row of the subquery's result set. The in (subquery) relational operation is true if there is one row from the subquery result set that is equal to the value of the left-side expression. If not in is specified, the relational operation is true when the value of the left-side expression does not equal any of the subquery result row values. Note that

where expression in (select expression from ...)

is the same as

where expression = some (select expression from ...)

For example, the following query uses a subquery to retrieve the customer orders with amounts larger than all orders booked in May.

select company, ord_num, ord_date, amount
  from customer, sales_order
  where customer.cust_id = sales_order.cust_id
    and amount > all (select amount from sales_order
                       where ord_date between date "1997-05-01" and date "1997-05-31");

COMPANY                         ORD_NUM  ORD_DATE    AMOUNT
Falcons Microsystems, Inc.      2230     1997-02-04  62,340.00
'Bills We Pay' Financial Corp.  2205     1997-01-03  150,871.20
'Bills We Pay' Financial Corp.  2317     1997-06-18  46,091.44
Bears Market Trends, Inc.       2201     1997-01-02  46,740.00
Bears Market Trends, Inc.       2271     1997-04-03  49,584.65
Bengels Imports                 2257     1997-03-23  62,340.00
Broncos Air Express             2207     1997-01-06  274,375.00
Broncos Air Express             2220     1997-01-27  49,980.00
Broncos Air Express             2237     1997-02-12  103,874.80
Lions Motor Company             2219     1997-01-27  74,034.90
Lions Motor Company             2250     1997-03-06  82,430.85
Lions Motor Company             2288     1997-04-24  252,425.00
Packers Van Lines               2211     1997-01-10  53,634.12
Packers Van Lines               2292     1997-04-30  77,247.50
Oilers Gas and Light Co.        2226     1997-01-30  54,875.00
Chiefs Management Corporation   2241     1997-02-21  82,315.00
Raiders Development Co.         2234     1997-02-10  124,660.00
Patriots Computer Corp.         2281     1997-04-13  66,341.50
Saints Software Support         2218     1997-01-24  81,375.00
Saints Software Support         2324     1997-06-30  104,019.50
Jets Overnight Express          2270     1997-04-02  54,875.00
Eagles Electronics Corp.        2290     1997-04-29  47,370.00
Cardinals Bookmakers            2253     1997-03-16  143,375.00
Forty-niners Venture Group      2284     1997-04-20  74,315.16
Redskins Outdoor Supply Co.     2310     1997-06-04  47,309.94

The next example demonstrates two ways to use multi-value subqueries. The subqueries show customers who are located in states that also have a sales office.

select company, city, state from customer
  where state = any (select state from outlet);

.. or ..

select company, city, state from customer
  where state in (select state from outlet);

COMPANY                         CITY           STATE
Raiders Development Co.         Los Angeles    CA
Rams Data Processing, Inc.      Los Angeles    CA
Chargers Credit Corp.           San Diego      CA
Forty-niners Venture Group      San Francisco  CA
Broncos Air Express             Denver         CO
Falcons Microsystems, Inc.      Atlanta        GA
Bears Market Trends, Inc.       Chicago        IL
Patriots Computer Corp.         Foxboro        MA
Vikings Athletic Equipment      Minneapolis    MN
Chiefs Management Corporation   Kansas City    MO
'Bills We Pay' Financial Corp.  Buffalo        NY
Jets Overnight Express          New York       NY
Cowboys Data Services           Dallas         TX
Oilers Gas and Light Co.        Houston        TX
Seahawks Data Services          Seattle        WA

The following example illustrates a select statement using the first form of the multi-value subquery to retrieve companies located in states without a sales office.

select company, city, state from customer
  where state <> all (select state from outlet);

COMPANY                      CITY          STATE
Cardinals Bookmakers         Phoenix       AZ
Dolphins Diving School       Miami         FL
Bucs Data Services           Tampa         FL
Colts Nuts & Bolts, Inc.     Baltimore     IN
Saints Software Support      New Orleans   LA
Lions Motor Company          Detroit       MI
Giants Garments, Inc.        Jersey City   NJ
Bengels Imports              Cincinnati    OH
Browns Kennels               Cleveland     OH
Eagles Electronics Corp.     Philadelphia  PA
Steelers National Bank       Pittsburgh    PA
Redskins Outdoor Supply Co.  Arlington     VA
Packers Van Lines            Green Bay     WI

7.8.3 Correlated Subqueries

A correlated subquery is one that refers to a column from the outer query, called an outer reference. RDM Server SQL performs a correlated subquery by executing the inner query for each row of the outer query. Processing a subquery of this type can take some time. An alternative is to create temporary tables and indexes that are then joined using the select statement to retrieve the desired information.

The following is an example of a correlated subquery used to retrieve the customers who are located in cities that also have an outlet. Note that the inner query references the state column from the outer query by including the table name shown in the outer query.

select company, city, state from customer
  where city in (select city from outlet where outlet.state = customer.state);

COMPANY                        CITY         STATE
Raiders Development Co.        Los Angeles  CA
Rams Data Processing, Inc.     Los Angeles  CA
Broncos Air Express            Denver       CO
Falcons Microsystems, Inc.     Atlanta      GA
Bears Market Trends, Inc.      Chicago      IL
Vikings Athletic Equipment     Minneapolis  MN
Chiefs Management Corporation  Kansas City  MO
Jets Overnight Express         New York     NY
Cowboys Data Services          Dallas       TX
Seahawks Data Services         Seattle      WA

The query below retrieves the average sales order amounts for each sales manager's department.

select mgr_id, avg(amount)
  from salesperson join customer using(sale_id) natural join sales_order
  where mgr_id is not null
  group by 1;

MGR_ID  AVG(AMOUNT)
BNF     41,157.40
BPS     25,407.96
CMB     30,777.07
GAP     22,149.97

The next example retrieves salespersons' order amounts greater than the average order amount for the department.
You can compare the amounts in the result set with the averages shown above to confirm that the query returned the correct results. Also note the use of extended join syntax in the from clause.

select sale_name, mgr_id, ord_num, amount
from salesperson sp1 join customer using(sale_id) natural join sales_order
where mgr_id is not null
  and amount > (select avg(amount)
                from salesperson sp2 join customer using(sale_id)
                     natural join sales_order
                where sp2.mgr_id = sp1.mgr_id);

SALE_NAME      MGR_ID  ORD_NUM  AMOUNT
Kennedy, Bob   BNF     2207     274,375.00
Kennedy, Bob   BNF     2220     49,980.00
Kennedy, Bob   BNF     2237     103,874.80
Kennedy, Bob   BNF     2253     143,375.00
Wyman, Eliska  GAP     2231     32,925.00
Wyman, Eliska  GAP     2270     54,875.00
Wyman, Eliska  GAP     2306     25,002.78
Wyman, Eliska  GAP     2281     66,341.50
Wyman, Eliska  GAP     2205     150,871.20
Wyman, Eliska  GAP     2259     24,990.00
Wyman, Eliska  GAP     2317     46,091.44
. . .
Jones, Walter  BPS     2241     82,315.00
Jones, Walter  BPS     2257     62,340.00
Jones, Walter  BPS     2201     46,740.00
Jones, Walter  BPS     2249     28,570.00
Jones, Walter  BPS     2271     49,584.65
Jones, Walter  BPS     2295     31,580.00
Warren, Wayne  BPS     2202     25,915.86
Warren, Wayne  BPS     2211     53,634.12
Warren, Wayne  BPS     2292     77,247.50

Since the outer and inner queries reference different occurrences of the same table, correlation names (for example, "sp1" and "sp2") must be specified for each separate salesperson table. This is necessary in order for SQL to determine which of the two salesperson tables (the inner or the outer) the mgr_id refers to.

7.8.4 Existence Check Subqueries

A subquery can also be used to simply check whether a select statement retrieves any rows at all. The format of the existence check subquery is as follows:

select ... from ... where [not] exists (select * from ...)

The existence check subquery does not retrieve a result set; it simply returns true if the subquery retrieves at least one row, and false otherwise.
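As a minimal sketch of an uncorrelated existence check (using the tables of the sample sales database from this chapter), the following query returns the customer list only when at least one row exists anywhere in the sales_order table:

```sql
-- Sketch: an uncorrelated existence check. The subquery does not reference
-- the outer query, so it is evaluated once; if sales_order is empty the
-- outer query returns no rows.
select company, city, state from customer
where exists (select * from sales_order);
```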
The following example uses a correlated existence check subquery to return the list of outlets that are warehouses only and not a sales office.

select * from outlet
where not exists (select * from salesperson where office = loc_id);

LOC_ID  CITY         STATE  REGION
BOS     Boston       MA     1
KCM     Kansas City  MO     2
STL     St. Louis    MO     2

7.9 Using Temporary Tables to Hold Intermediate Results

It is sometimes just not possible to formulate a single select statement to perform a complex query. At those times, the complex query can sometimes be broken into separate, simpler queries whose intermediate results are stored in temporary tables, which are then joined together in a final query to produce the originally desired results. RDM Server provides the create temporary table statement just for this purpose with the following syntax.

create_temporary_table:
    create temporary table tabname (temp_col_defn [, temp_col_defn]...)

temp_col_defn:
    colname type_spec [default {constant | null | auto}]

The tabname is a case-insensitive identifier that can be any name except that of another temporary table already defined in the same connection. The table is comprised of the specified columns, which can be declared to be any standard RDM Server SQL data type. The default clause can be used to specify a default value for a column. You can use the create index statement to create an index on a temporary table. A commit statement must be issued after the create temporary table and the create index statements associated with it before you can use the temporary table. You can use initialize table to re-initialize the table to contain other intermediate results (this is much faster than delete from tabname). Temporary tables are visible only to the connection that creates them. They exist until the connection is terminated. Also, you must have at least one database open in order to create a temporary table.
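The temporary-table life cycle just described can be sketched as follows. This is a hypothetical example (the table and index names are invented); note the commit required after the DDL statements before the table can be used:

```sql
-- Hypothetical sketch of the temporary-table life cycle.
create temporary table state_totals(state char(2), total double default 0.0);
create index st_ndx on state_totals(state);
commit;                       -- required before the table can be used

insert into state_totals values "CA", 125000.00;

init temp state_totals;       -- re-initialize; much faster than
                              -- "delete from state_totals"
```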
Temporary tables can be used as an alternative to a correlated subquery when the performance penalty incurred by the subquery is too great. While performance is not really an issue with the following query, the example below shows how this can be done. Suppose you want a list of the salespersons' order amounts greater than the average order amount for the department (this example was given earlier in section 7.8.3). You could solve this by first storing the department averages in a temporary table indexed on mgr_id, and then simply joining the salesperson table with that temporary table to get the desired list. The following SQL script shows how to do this.

set double display as (12,"#,#.##"); // produces cleaner output
create temp table mgravg(mgr_id char(3), avgsale double);
create index mgravgid on mgravg(mgr_id);
commit;
insert into mgravg
    select mgr_id, avg(amount)
    from salesperson join customer using(sale_id) natural join sales_order
    where mgr_id is not null group by 1;
select * from mgravg;

MGR_ID  AVGSALE
BNF     41,157.40
BPS     25,407.96
CMB     30,777.07
GAP     22,149.97

select sale_name, mgr_id, ord_num, amount
from salesperson s, mgravg m, customer c, sales_order o
where s.mgr_id = m.mgr_id and s.sale_id = c.sale_id
  and c.cust_id = o.cust_id and amount > avgsale;

SALE_NAME      MGR_ID  ORD_NUM  AMOUNT
Kennedy, Bob   BNF     2207     274,375.00
Kennedy, Bob   BNF     2220     49,980.00
Kennedy, Bob   BNF     2237     103,874.80
Kennedy, Bob   BNF     2253     143,375.00
Wyman, Eliska  GAP     2231     32,925.00
Wyman, Eliska  GAP     2270     54,875.00
Wyman, Eliska  GAP     2306     25,002.78
Wyman, Eliska  GAP     2281     66,341.50
. . .
Jones, Walter  BPS     2201     46,740.00
Jones, Walter  BPS     2249     28,570.00
Jones, Walter  BPS     2271     49,584.65
Jones, Walter  BPS     2295     31,580.00
Warren, Wayne  BPS     2202     25,915.86
Warren, Wayne  BPS     2211     53,634.12
Warren, Wayne  BPS     2292     77,247.50

7.10 Other Select Statement Features

There are a few other select statement features that need to be described.
These are shown in the select statement syntax grammar below.

select:
    select [first | all | distinct] {* | select_item [, select_item]...}
        from table_ref [, table_ref]... [where cond_expr] [with exclusive lock]
        [group by col_ref [, col_ref]... [having cond_expr]]
        [order by col_ref [asc | desc] [, col_ref [asc | desc]]...]

select_item:
    {{tabname | corrname}.* | expression} [identifier | "headingstring"]

You can indicate that a select statement is to return just the first row of the result set, all rows of the result set (the default), or only the distinct result set rows in which duplicate rows have been eliminated. Some examples are shown below.

select first * from salesperson;

SALE_ID  SALE_NAME     DOB         COMMISSION  REGION  SALES_TOT   OFFICE  MGR_ID
BCK      Kennedy, Bob  1956-10-29  0.075       0       736,345.32  DEN     BNF

The next query returns the list of salespersons who have at least one customer account. Try it without the distinct and see what you get.

select distinct sale_name from salesperson join customer using(sale_id);

SALE_NAME
Kennedy, Bob
Flores, Bob
Stouffer, Bill
Wyman, Eliska
Porter, Greg
Nash, Gail
McGuire, Sidney
Williams, Steve
Robinson, Stephanie
Jones, Walter

Note that select distinct usually requires SQL to sort the rows of the result set. This can be expensive for large result sets, so make sure that you really need the distinct rows before using this feature.

As in select * from tabname, you can specify tabname.* to have SQL include all of the columns declared in tabname in the select column list. This is useful when more than one table is listed in the from clause. For example, the following select displays all of the note table entries made by each salesperson.

select sale_name, note.* from salesperson join note using(sale_id);

SALE_NAME      NOTE.NOTE_ID  NOTE.NOTE_DATE  NOTE.SALE_ID  NOTE.CUST_ID
Kennedy, Bob   FOLLCALL1     1996-12-27      BCK           DEN
Kennedy, Bob   FOLLCALL1     1997-02-06      BCK           DEN
Kennedy, Bob   FOLLCALL1     1997-03-05      BCK           PHO
Kennedy, Bob   FOLLCALL1     1997-03-18      BCK           PHO
Kennedy, Bob   FOLLCALL1     1997-04-03      BCK           DEN
Kennedy, Bob   FOLLCALL1     1997-05-08      BCK           PHO
. . .
Warren, Wayne  INITMEET      1996-11-11      WWW           GBP
Warren, Wayne  INITMEET      1996-12-30      WWW           MIN
Warren, Wayne  QUOTE1        1996-12-09      WWW           GBP
Warren, Wayne  QUOTE1        1997-04-08      WWW           GBP
Warren, Wayne  SALESLIT1     1997-03-10      WWW           MIN
Warren, Wayne  SALESLIT1     1997-04-01      WWW           GBP
Warren, Wayne  SALESLIT1     1997-05-01      WWW           MIN
Warren, Wayne  SALESLIT2     1997-03-23      WWW           MIN

The with exclusive lock clause causes SQL to place a write-lock (as opposed to its usual read-lock) on all of the select statement result rows. It is not usually a good idea to do this, but there can be certain processing requirements where exclusive access is needed for all of the accessed rows even when only some of them may actually end up being changed.

7.11 Unions of Two or More Select Statements

Situations sometimes exist where needed information is stored in different tables or databases. It may be the case that data has not been normalized, so that redundant data co-resides in those separate tables or databases, and the easiest way to access that information is to submit separate queries on each table/database. Ideally, one wants to have the result sets from those separate queries grouped into a single result set. This can be done through use of temporary tables and the insert from select statement to collect the results of each query into a single table. But a much easier method is to use the union operator to have SQL do that work for you. This section describes the use of the union operator to combine the results of separate select statements into a single result set.

7.11.1 Specifying Unions

The result sets of two or more similar select statements can be combined into a single result set through use of the union operator.
The syntax for the union of multiple select statements is shown below.

union:
    query_expr [order by {colname | num} [asc | desc]
                   [, {colname | num} [asc | desc]]...]

query_expr:
    query_term | query_expr union [all] query_term

query_term:
    query_spec | ( query_expr )

query_spec:
    select [first | all | distinct] {* | select_item [, select_item]...}
        from tab_ref [, tab_ref]... [where cond_expr]

All select statements that are involved in each specified union must have the same number of result columns, and each of the corresponding columns must have compatible data types. The results of each of the select statements are combined into a single result set. The results from each pair of select statements that are unioned together will have any duplicate rows removed by default. You can specify union all in order to keep any duplicate rows in the result set. Because RDM Server SQL must maintain a separate index in order to locate the duplicate rows to be removed, the best performance will result from always specifying union all. The unions of more than two select statements are processed in left-to-right order, but this can be changed by using parentheses. The size of the final result set can be affected by how the unions are parenthesized if duplicate rows are being eliminated in some of the unions (i.e., all is specified in some but not all of the union operations).

Standard SQL assigns no column headings to the result columns. RDM Server SQL, however, by default assigns the result column headings based on the column names or headings specified in the first select statement. You can turn this feature on or off using the following set statement.

set_union_headings:
    set union headings {on | off}

7.11.2 Union Examples

The FBI's National Crime Information Center maintains the national Integrated Automated Fingerprint Identification System (IAFIS) for use by law enforcement agencies throughout the United States.
This database contains the fingerprint records for over 55 million criminal subjects, as well as civilian subjects, many of whom are current or former employees of various local, state, and federal law enforcement agencies. The FBI also manages the COmbined DNA Index System (CODIS) and the National DNA Index System, containing over 6.7 million offender DNA profiles and almost 260,000 forensic DNA profiles extracted from crime scenes (as of February 2009). The CODIS database contains information on convicted felons, arrestees, and missing persons and their biologically related relatives.

In the investigation of a crime, fingerprints and DNA samples are often found that can lead to the identification and apprehension of the perpetrators of the crime. In the following example, a set of fingerprints and human DNA samples that were extracted from a hypothetical crime scene are submitted to these various databases in order to identify any individuals from those databases that match any of the provided fingerprint and DNA codes. All tables contain the name, date of birth (dob), gender, height, weight, race, hair color, eye color, and distinguishing scars and marks (dsm) of each person in their respective databases. Each record also contains a unique NCIC identification number. The following query shows how the union operator can be used to return a single result set from these databases containing a list of all persons who match at least one of the specified fingerprint or DNA codes.
select name, dob, gender, height, weight, race, hair, eye, dsm, ncic, "IAFIS Criminal"
    from iafis.criminal where fpid in (fpcode1, fpcode2, ..., fpcodeN)
union all
select name, dob, gender, height, weight, race, hair, eye, dsm, ncic, "IAFIS Civilian"
    from iafis.civil where fpid in (fpcode1, fpcode2, ..., fpcodeN)
union all
select name, dob, gender, height, weight, race, hair, eye, dsm, ncic, "CODIS Felon"
    from codis.felon where dnacode in (dnacode1, dnacode2, ..., dnacodeN)
union all
select name, dob, gender, height, weight, race, hair, eye, dsm, ncic, "CODIS Arrestee"
    from codis.arrestee where dnacode in (dnacode1, dnacode2, ..., dnacodeN)
order by 1, 2;

The union all ensures that SQL will not have to do the extra work to check for duplicate rows. The character literal specified as the last entry in each select column list simply identifies the source in which the matching row was found. The final result set is sorted by name and date of birth. The ncic identification number is returned in the ncic column, which can then be used to retrieve the entire record for each result row if desired.

Unions can also be used to simplify the select statement needed to retrieve a desired result. Our sales database example stores only one address for each customer. But there are often situations where the customer has one address for billing and another for shipping. One way this can be implemented is to separate customer address data into a separate table and maintain, in the customer table, a billing address foreign key and a shipping address foreign key referencing the address table. The DDL which implements this scheme is given below.
create table address
(
    addrid rowid primary key,
    address1 char(30),
    address2 char(30),
    city char(20),
    state char(2),
    zip char(5)
);
create table customer
(
    cust_id char(3) primary key,
    company varchar(30) not null,
    contact varchar(30),
    billingaddr rowid not null references address,
    shippingaddr rowid references address
);

The billingaddr column contains the rowid of the address table row that contains the customer's billing address information. The shippingaddr column contains the rowid of the address table row that contains the shipping address information. When the billing and the shipping address are the same, shippingaddr is null.

Now suppose that each customer is to be sent a package of promotional material. A list of each customer's shipping address is to be retrieved and used to produce mailing labels. One way to do this is shown in the example below.

select company, contact,
    if (shippingaddr is null, b.address1, s.address1) address1,
    if (shippingaddr is null, b.address2, s.address2) address2,
    if (shippingaddr is null, b.city, s.city) city,
    if (shippingaddr is null, b.state, s.state) state,
    if (shippingaddr is null, b.zip, s.zip) zip
from (customer inner join address b on (billingaddr = b.addrid))
    left outer join address s on (shippingaddr = s.addrid)
order by 7;

This works well but is pretty complex. The same result, however, can be achieved using a union with a much simpler construction as follows.

select company, contact, address1, address2, city, state, zip
    from customer inner join address on (shippingaddr = addrid)
union all
select company, contact, address1, address2, city, state, zip
    from customer inner join address on (billingaddr = addrid)
    where shippingaddr is null
order by 7;

8. Inserting, Updating, and Deleting Data in a Database

The SQL insert statement is used to add new data into the database.
Data that already exists in the database can be changed using the SQL update statement. You delete data from the database using the SQL delete statement. Use of these three statements is described in detail in this chapter. Changes made by one or more of these statements are not stored in the database until a commit statement is executed. A commit causes all of the database changes made in the current transaction to be safely written to the database. Before describing the use of the SQL statements that you can use to change the data stored in the database, it is necessary to first describe the use of transactions.

8.1 Transactions

It is very important that any database management system (DBMS) ensures that the data stored in a database satisfies the ACID criteria: Atomicity, Consistency, Isolation, and Durability. Atomicity means that a set of interrelated database modifications are all made together at the same time; if one modification from the set fails, then all fail. Consistency means that a database never contains errant data or relationships and that a transaction always transforms the database from one consistent state into another. Consistency is primarily the responsibility of the application, because the database cannot be certain that all of the necessary modifications have been properly included in any given transaction. In SQL, consistency rules are specified through DDL foreign and primary key declarations and the check clause, and RDM Server SQL does ensure that all database data adheres to those rules. Isolation means that the changes being made during a transaction are visible only to the user (connection) making them. Not until the transaction's changes have been committed to the database are other users (connections) able to see them. Durability refers to the DBMS's ability to ensure that the changes made by all committed transactions survive any kind of system failure.
The work necessary to ensure that a DBMS supports "ACIDicity" makes it among the most complex of all system software components. The challenge is to maintain ACIDicity and yet allow the database data to be easily accessed by as many users as possible, as fast as possible. There is an unavoidable and severe performance cost in maintaining an ACID-compliant database. When enforcement of these properties is relaxed, data can be updated and accessed much more quickly, but the consistency and integrity of the data will certainly be impaired should a system failure occur.

A transaction is a group of related database modifications (i.e., a sequence of insert, update, and/or delete statements) that are written to the database during execution of a commit statement in such a way as to guarantee that, even in the event of a system failure while the commit is being processed, either all of the modifications are successfully written to the database or none are. Should the application detect an error (e.g., invalid user input) or RDM Server SQL detect an integrity error prior to the commit, a rollback statement can be executed to discard all of the changes made since the start of the transaction. Transactions are controlled through the use of four SQL statements: start transaction, commit, rollback, and savepoint.

8.1.1 Transaction Start

The start transaction statement is used to mark the beginning of a new database transaction. Use of this statement is not strictly necessary, as a transaction is implicitly started by SQL on execution of the first insert, update, or delete statement that follows the most recently executed commit (or call to SQLConnect). However, it is best to explicitly start each transaction, as that will clearly delineate transaction boundaries in your application. The syntax for start transaction is given below.

start_trans:
    start trans[action]
The start transaction statement initiates a new database modification transaction in which the changes made by any subsequent insert, update, or delete statements (as well as changes made by any triggers that have been defined on the modified tables, see Chapter 8) will be atomically written to the database as a unit upon execution of a commit statement. In earlier versions of RDM Server, the begin transaction statement was used to start (begin) a transaction. Its syntax is shown below and is still accepted by RDM Server SQL.

begin [trans[action] | work] [trans_id]

The optional trans_id is an identifier that can be used to label the transaction.

8.1.2 Transaction Commit

The commit statement is used to atomically write all of the changes made by insert, update, and delete statements executed since the most recently executed start transaction statement. The syntax for commit is as follows.

commit:
    commit

A simple transaction used to insert a single row into the salesperson table is shown in the following example.

start trans;
insert into salesperson values
    "MMB", "Bryant, Mike", date "1960-11-14", 0.05, 0, "SEA", "BNF";
commit;

8.1.3 Transaction Savepoint

The savepoint statement is used to mark a transaction savepoint identified by savepoint_id. The savepoint can be the target of a subsequently executed rollback to savepoint savepoint_id statement, which causes all of the database modifications made after the savepoint to be discarded while keeping intact all changes made in the transaction prior to the savepoint. The syntax for the savepoint statement is shown below.

savepoint:
    savepoint savepoint_id

Of course, this statement requires that a transaction has been started. Savepoints are discarded through execution of a rollback to a prior savepoint, or a rollback or commit of the transaction.

8.1.4 Transaction Rollback

The rollback statement is used to discard (undo) database modifications made during the current transaction.
The syntax for rollback is shown below.

rollback:
    rollback [trans[action]]
  | rollback to savepoint savepoint_id

The first form is used to terminate the transaction and discard all of the changes made by all insert, update, and delete statements that were executed during the transaction. The second form is used to discard all of the changes made by all insert, update, and delete statements that were executed after execution of the savepoint statement with a matching savepoint_id. Changes made during the transaction prior to the savepoint remain in place. The example below illustrates the use of savepoint and rollback.

start trans;
insert into salesperson ...       // new salesperson
savepoint new_customer;
insert into customer ...          // new customer for new salesperson
insert into customer ...          // another for the new salesperson
...                               // discover problem with new customers
rollback to savepoint new_customer;
commit;                           // commit new salesperson to database

8.2 Inserting Data

The insert statement is used to insert new rows into a table. Three different methods for inserting rows into a table are supported in RDM Server SQL. The insert values statement is the most common and is used to insert a single row into a table. The insert from select statement can be used to insert the results from a select statement into a table. Finally, the insert from file statement allows you to insert rows into a table from a comma-delimited text file or from an XML file. Use of each of these methods is described in the following sections.

8.2.1 Insert Values

The insert values statement is used to insert a row into a table. The syntax for the insert values statement is:

insert_values:
    insert into [dbname.]tabname [( colname [, colname]... )]
        values col_value [, col_value]...

col_value:
    constant | null | ?
    | proc_arg

The insert values statement is used to insert a single row into the table tabname, which must identify a table declared in a database managed by RDM Server. If more than one database has a table named tabname, then dbname should be specified to identify the database containing the desired table. If a colname list is specified, it must include every column that requires a value (a primary key column, or a column that is declared not null and has no default value). For each column, there must be a value specified in the same corresponding position in the values list. If no colname list is specified, then there must be a value listed for each column declared in the table, in the order in which the columns were declared in the create table statement for tabname.

The values specified in the values list will usually simply be a constant of a data type that is compatible with the data type of the corresponding column, or null if allowed by the corresponding column definition. However, insert values can include parameter marker references (each designated by a "?") or, if the insert statement is contained within a create procedure statement, procedure argument names (proc_arg). For example, the following statement inserts a new salesperson into the example salesperson table.

insert into salesperson values
    "MMB", "Bryant, Mike", date "1960-11-14", 0.05, 0, "SEA", "BNF";

In the salesperson table, the mgr_id column is a foreign key referencing the salesperson row of the manager. For this example, if RDM Server finds no salesperson row with "BNF" as the value of sale_id, it rejects the insert statement with a referential integrity violation.

When using the insert values statement, the application does not have to specify all columns for the table. Only not null columns that do not have default values must be specified.
For the omitted columns, the insert statement stores either the declared default value or null. If your application does not include a column list, it must specify values for all table columns, in the order in which the columns are declared in the create table (or create view) statement.

Any SQL program that inserts rows into a table with a rowid primary key must add a place-holder (",,") for the rowid primary key column in any values list, as well as in any text file that is used for importing rows into the table. This affects only the use of the insert statement. Rowid primary key values cannot be modified by using the update statement.

The next example shows the insert statements needed to store a complete sales order in the database.

start trans;
insert into sales_order values("SEA",2311,date "1997-06-30",time "13:17:00",30036.50,2553.10);
insert into item values(2311,16311,30);
insert into item values(2311,18061,200);
insert into item values(2311,18121,1000);
commit;

The columns in the sales_order table are cust_id, ord_num, ord_date, ord_time, amount, tax, and ship_date, respectively. The columns in the item table are ord_num, prod_id, loc_id and quantity. Note that this is a single transaction, which contains four insert statements. Hence, there is a single commit statement.

The following example illustrates the use of RDM Server SQL system literal constants in insert statements.

.. a statement that could be executed from an extension module or
.. stored procedure that is always executed when a connection is made.
insert into login_log(user_name, login_time) values(user, now);

.. check today's action items
select cust_id, note_text from action_items where tickle_date = today;

See the RDM Server Language Reference for information about specifying constant values, including date and time constants and system literal constants such as those in the example above.
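The parameter marker ("?") form of insert values mentioned above is typically prepared once by a host application and executed repeatedly with different bound values. The following is a minimal sketch; the column list shown here assumes the sales_tot column has a default and can be omitted:

```sql
-- Sketch: an insert with parameter markers. Each "?" is bound to a value by
-- the host application (e.g., through the ODBC API) before each execution.
insert into salesperson (sale_id, sale_name, dob, commission, region, office, mgr_id)
    values ?, ?, ?, ?, ?, ?, ?;
```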
8.2.2 Insert from Select

You can also insert new rows into a table from another table using the insert from select statement. The syntax for the insert from select statement is given below. The select statement was described in detail in the last chapter, and the examples that follow show the basics of how the two can be used together.

insert_from_select:
    insert into [dbname.]tabname [( colname [, colname]... )] [from] select

The number of result columns returned from the select statement must equal the number of columns specified in the colname list or, if not specified, the number of columns declared in the table. The data type of each result column must also be compatible with its corresponding table column.

Your application can create a temporary table to hold temporary results from which additional queries can be processed. A temporary table is visible only to the application session that creates it. It can be queried just like any other table. To create a temporary table, execute a create temporary table statement that conforms to the following syntax.

create_temporary_table:
    create temporary table tabname (temp_col_defn [, temp_col_defn]...)

temp_col_defn:
    colname type_spec [default {constant | null | auto}]

The basics of table creation were described earlier in section 6.3. However, no foreign or primary keys or check constraints can be declared for a temporary table. Before any rows are inserted into the temporary table, you can create one or more indexes on the temporary table using the create index statement. Note that the optional attribute and the in clause are not allowed in the create index statement for a temporary index. Use of the create index statement is described in section 6.4.
Once the temporary table has been created, rows can be inserted into it using any form of the insert statement, and it can be referenced just like any table in other select, update, and delete statements. The following example uses an insert statement to fill a temporary table called sp_sales with the customer orders processed by Sidney McGuire ("SKM").

    create temporary table sp_sales(
        company char(30),
        city char(17),
        state char(2),
        ord_date date,
        amount float
    );
    create index skm_ndx on sp_sales(state);

    insert into sp_sales
        select company, city, state, ord_date, amount
        from customer, sales_order
        where customer.sale_id = "SKM" and customer.cust_id = sales_order.cust_id
        order by 1, 4;

The select statement in the insert statement above contains an order by clause that causes the natural ordering of the rows in sp_sales to be sorted in company, ord_date order. Any select statement issued for sp_sales that does not itself have an order by clause specified reports its results in the same order.

A temporary table can be reinitialized using the initialize temporary table statement as shown below.

initialize_temporary_table:
    init[ialize] temp[orary] table tabname[, tabname]...

Each tabname must be the name of a previously created temporary table. The following example shows an initialize statement used to reload the temporary table sp_sales with customers for Bob Flores ("BNF").

    init temp sp_sales;
    insert into sp_sales
        select company, city, state, ord_date, amount
        from customer, sales_order
        where customer.sale_id = "BNF" and customer.cust_id = sales_order.cust_id
        order by 1, 4;

8.2.3 Importing Data into a Table

Your application can use a single insert from file statement to import multiple rows from a comma-delimited text file into a table. This statement can be used to perform a bulk load.
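The temporary-table workflow above (create, index, fill from a select, then reinitialize) can be sketched with SQLite. SQLite has no initialize temporary table statement, so an unqualified delete is used as the analogue here; the table contents are invented for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table customer (cust_id text, company text, state text, sale_id text);
create table sales_order (cust_id text, ord_date text, amount real);
insert into customer values ('ATL', 'Peach Co', 'GA', 'SKM');
insert into customer values ('SEA', 'Rain Inc', 'WA', 'BNF');
insert into sales_order values ('ATL', '1997-06-30', 500.0);
insert into sales_order values ('SEA', '1997-07-01', 750.0);
""")

# A temporary table is visible only to this connection (session).
conn.execute("""create temporary table sp_sales
                (company text, state text, ord_date text, amount real)""")
conn.execute("create index skm_ndx on sp_sales (state)")

conn.execute("""insert into sp_sales
                select company, state, ord_date, amount
                  from customer, sales_order
                 where customer.sale_id = 'SKM'
                   and customer.cust_id = sales_order.cust_id
                 order by 1, 3""")
loaded = conn.execute("select count(*) from sp_sales").fetchone()[0]

conn.execute("delete from sp_sales")  # reinitialize before the next load
emptied = conn.execute("select count(*) from sp_sales").fetchone()[0]
print(loaded, emptied)  # 1 0
```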
If any of the rows in the text file violate any of the integrity constraints defined for the table, the load terminates with an error. All rows inserted up to that point are rolled back automatically. The syntax for this form of the insert statement is as follows:

insert_from_file:
    insert [with auto commit] into [db_name.]tabname [from]
        [ascii | unicode] file "filename" [, "delimiter"]... [on devname]
  | insert [with auto commit] into [db_name.]tabname [from]
        xml file "filename"[, xml_option]... [on devname]

xml_option:
    "blobs={no | yes}"
  | "tags={columnnames | numbers}"
  | "attribs={no | only}"
  | "tabname={no | yes}"
  | "nulltags={no | yes}"
  | "dateformat={y | m | d}"

The specified file must reside in device devname or, if no device is given, in the user's device. In the first form of the insert from file statement, each text line in the file holds one row, with the column values delimited by a comma or the specified "delimiter" character and specified just as they would be in the values clause of an insert values statement.

The insert from xml file form imports the data from an XML-formatted file. You can specify a variety of options that describe the format of the XML file to be imported. Note that these options are specified in a string with no spaces allowed between the option elements. Also note that the default setting is the first option setting specified in the list; hence, the default blobs option is no. You can also specify just "y" for "yes" or "n" for "no". The option string is case-insensitive. Each of these options is described in the following table.

Table 8-1. XML Import Option Descriptions

Option      Description
blobs       Set to "yes" to import the translation string specified for long varbinary column data.
tags        Set to "numbers" when column tags are identified by their ordinal position in the result set rather than by name (e.g., <COLUMN-2>).
attribs     Set to "only" when each result row is a single text line in which the column values are specified as attributes (e.g., <ROW sale_id="BNF" name="Flores, Bob" />).
tabname     Set to "yes" when each result row is tagged with its table name (e.g., <salesperson>) rather than <ROW>.
nulltags    Set to "yes" when an empty column entry is given for null-valued columns.
dateformat  Set to "y" when dates are in "YYYY-MM-DD" format (default), to "m" for dates in "MM-DD-YYYY" format, and to "d" for dates in "DD-MM-YYYY" format.

Each text line (ending in a newline character, '\n') in file "filename" corresponds to one row in the table. Each value in the text line is specified just as it would be in the values clause of an insert values statement. Each column value is separated by the "delimiter" character, which is by default a comma (",").

Any errors encountered during the processing of the inserts will result in an appropriate error return and will discard any rows inserted prior to the occurrence of the error. The with auto commit clause can be specified to indicate that the system is to perform a commit on each row that is inserted into the table from the specified file, which preserves all rows that were inserted up to the one in which the error was detected.

If the number of rows to be inserted is very large, your application should either explicitly open the database in exclusive mode or issue an exclusive table lock on the table being accessed. Otherwise, the server is forced to maintain a growing number of record locks for the table, which can cause severe performance degradation on the server.

The following example lists the contents of file "outlet.txt" located in the catdev device.

    "SEA", "Seattle", "WA", 0
    "LAX", "Los Angeles", "CA", 0
    "DAL", "Dallas", "TX", 3
    "BOS", "Boston", "MA", 1
    "CHI", "Chicago", "IL", 2
    "KCM", "Kansas City", "MO", 2
    "STL", "St. Louis", "MO", 2
    "NYC", "New York", "NY", 1
    "ATL", "Atlanta", "GA", 3
    "MIN", "Minneapolis", "MN", 2
    "DEN", "Denver", "CO", 0
    "WDC", "Washington", "DC", 1

All these values can be loaded into our example outlet table using the following insert statement.

    insert into outlet file "outlet.txt" on catdev;
    commit;

You can ensure the fastest possible import processing by first opening the database in exclusive access mode (no locks required) with transaction logging turned off (see example below). Of course, the price paid for this performance is the loss of recoverability in case the server crashes (for example, in a power failure) while the insert statement is being processed. If any integrity constraints are violated, the insert statement terminates, but the rows that have already been inserted cannot be rolled back. No rollback capability exists at all in this case, because the changes are not logged.

The following code illustrates insert from file statements issued for the product, outlet, and on_hand tables. Notice the use of the update stats statement following the bulk load. It is always a good practice to execute update stats after making substantial modifications to a database, such as bulk loads. Executing this statement ensures that the SQL query optimizer is generating access plans based on reasonable data usage statistics.

    open invntory exclusive with trans off;
    insert into product from file "product.txt" on catdev;
    insert into outlet from file "outlet.txt" on catdev;
    insert into on_hand from file "on_hand.txt" on catdev;
    commit;
    update stats on invntory;

You use the insert from file statement to import XML data into a database table. The general format of an XML file is as follows:

    <?xml version="1.0" encoding="UTF-8"?>
    <anytagname>
        <anyrowname anyattributename="value" ...>
            <anytagname>value</anytagname>
            ...
        </anyrowname>
        ...
    </anytagname>

The first-level tags (anyrowname) are assumed to enclose row values. The tag name is ignored. When anytagname or anyattributename matches a column in the table named in the insert statement, the value will be assigned to that column. A row will be created if at least one column is specified and the resulting insert of the row is valid according to SQL rules, such as key uniqueness and referential integrity. Tags nested at deeper levels will be ignored. If a column is missing, it will be inserted as a null column value. Note that if the column is not nullable, the insert will fail. If a column is identified more than once in a single row element (one or more attributes with the same name, and/or one or more elements with the same name), only the first value will be used. The remaining values will be ignored. In the following example, the sale_id column will have the value "one".

    <?xml version="1.0" encoding="UTF-8"?>
    <RAIMA-SQL>
        <ROW sale_id="one" dob="1954-05-30" sale_id="two">
            <SALE_ID>three</SALE_ID>
            <SALE_NAME>Flores, Bob</SALE_NAME>
            <SALE_ID>four</SALE_ID>
        </ROW>
    </RAIMA-SQL>

8.2.4 Exporting Data from a Table

The insert into file statement can be used to export data from a table (or tables) into either a comma-delimited formatted file or an XML formatted file. The syntax for this statement is given below.

insert_into_file:
    insert into [ascii | unicode] file "filename" [, "delimiter"] [on devname] [from] select
  | insert into xml file "filename" [, xml_option]...
        [on devname] [from] select

xml_option:
    "blobs={no | yes}"
  | "header={noversion | version}"
  | "tags={columnnames | numbers}"
  | "attribs={no | only}"
  | "tabname={no | yes}"
  | "nulltags={no | yes}"
  | "dtd={no | yes}"
  | "schema={no | yes}"
  | "dateformat={y | m | d}"

This statement will export the result set returned from the specified select statement in the specified format (ascii, unicode, or xml) into a text file named "filename", which will be stored in the device named devname or in the home device of the user executing the statement. The first form (non-xml) of the insert into file statement will store the result rows from the specified select statement in a comma-delimited file in which the data is stored either as ascii-coded (default) or Unicode-coded (UTF-8) characters. You can use the "delimiter" clause to change the delimiter from a comma to some other special character (e.g., "|").

The xml form of the insert into file statement allows one or more xml control options to be specified. Note that these options are specified in a string with no spaces allowed between the option elements. Also note that the default setting is the first option setting specified in the list; hence, the default blobs option is no. You can also specify just "y" for "yes" or "n" for "no". The option string is case-insensitive. Each of these options is described in the following table.

Table 8-2. XML Export Option Descriptions

Option      Description
blobs       Set to "yes" to include a translation string for long varbinary column data.
header      Set to "version" to include in the generated xml file a <!-- --> header line containing the version of RDM Server that executed the statement.
tags        Set to "numbers" to have each column tag identified by its ordinal position in the result set rather than its name (e.g., <COLUMN-2>).
attribs     Set to "only" to have each result row output as a single text line with the column values specified as attributes (e.g., <ROW sale_id="BNF" name="Flores, Bob" />).
tabname     Set to "yes" to have each result row tagged with its table name (e.g., <salesperson>) rather than <ROW>.
nulltags    Set to "yes" to output an empty column entry for null-valued columns.
dtd         Set to "yes" to output a DTD (Document Type Definition) header for the xml file.
schema      Set to "yes" to output a header containing the schema for the result xml table.
dateformat  Set to "y" to output dates in "YYYY-MM-DD" format (default), to "m" to output dates in "MM-DD-YYYY" format, and to "d" to output dates in "DD-MM-YYYY" format.

Examples of a variety of export options of the outlet table (invntory database) are provided below. Each insert statement is followed by the contents of the generated file.

    insert into file "outlet.txt" on catdev from select * from outlet;

    "ATL","Atlanta","GA",3
    "BOS","Boston","MA",1
    "CHI","Chicago","IL",2
    "DAL","Dallas","TX",3
    "DEN","Denver","CO",0
    "KCM","Kansas City","MO",2
    "LAX","Los Angeles","CA",0
    "MIN","Minneapolis","MN",2
    "NYC","New York","NY",1
    "SEA","Seattle","WA",0
    "STL","St. Louis","MO",2
    "WDC","Washington","DC",1

    insert into file "outlet.txt", "|" on catdev from select * from outlet;

    "ATL"|"Atlanta"|"GA"|3
    "BOS"|"Boston"|"MA"|1
    "CHI"|"Chicago"|"IL"|2
    "DAL"|"Dallas"|"TX"|3
    "DEN"|"Denver"|"CO"|0
    "KCM"|"Kansas City"|"MO"|2
    "LAX"|"Los Angeles"|"CA"|0
    "MIN"|"Minneapolis"|"MN"|2
    "NYC"|"New York"|"NY"|1
    "SEA"|"Seattle"|"WA"|0
    "STL"|"St. Louis"|"MO"|2
    "WDC"|"Washington"|"DC"|1

    insert into xml file "outlet.xml" on catdev from select * from outlet;

    <?xml version="1.0" encoding="UTF-8"?>
    <RAIMA-SQL>
        <ROW>
            <loc_id>ATL</loc_id>
            <city>Atlanta</city>
            <state>GA</state>
            <region>3</region>
        </ROW>
        <ROW>
            <loc_id>BOS</loc_id>
            <city>Boston</city>
            <state>MA</state>
            <region>1</region>
        </ROW>
        <ROW>
            <loc_id>CHI</loc_id>
            <city>Chicago</city>
            <state>IL</state>
            <region>2</region>
        </ROW>
        <ROW>
            <loc_id>DAL</loc_id>
            <city>Dallas</city>
            <state>TX</state>
            <region>3</region>
        </ROW>
        <ROW>
            <loc_id>DEN</loc_id>
            <city>Denver</city>
            <state>CO</state>
            <region>0</region>
        </ROW>
        <ROW>
            <loc_id>KCM</loc_id>
            <city>Kansas City</city>
            <state>MO</state>
            <region>2</region>
        </ROW>
        <ROW>
            <loc_id>LAX</loc_id>
            <city>Los Angeles</city>
            <state>CA</state>
            <region>0</region>
        </ROW>
        <ROW>
            <loc_id>MIN</loc_id>
            <city>Minneapolis</city>
            <state>MN</state>
            <region>2</region>
        </ROW>
        <ROW>
            <loc_id>NYC</loc_id>
            <city>New York</city>
            <state>NY</state>
            <region>1</region>
        </ROW>
        <ROW>
            <loc_id>SEA</loc_id>
            <city>Seattle</city>
            <state>WA</state>
            <region>0</region>
        </ROW>
        <ROW>
            <loc_id>STL</loc_id>
            <city>St. Louis</city>
            <state>MO</state>
            <region>2</region>
        </ROW>
        <ROW>
            <loc_id>WDC</loc_id>
            <city>Washington</city>
            <state>DC</state>
            <region>1</region>
        </ROW>
    </RAIMA-SQL>

    insert into xml file "outlet.xml","attribs=only","tabname=y" on catdev from select * from outlet;

    <?xml version="1.0" encoding="UTF-8"?>
    <RAIMA-SQL>
        <outlet loc_id="ATL" city="Atlanta" state="GA" region="3" />
        <outlet loc_id="BOS" city="Boston" state="MA" region="1" />
        <outlet loc_id="CHI" city="Chicago" state="IL" region="2" />
        <outlet loc_id="DAL" city="Dallas" state="TX" region="3" />
        <outlet loc_id="DEN" city="Denver" state="CO" region="0" />
        <outlet loc_id="KCM" city="Kansas City" state="MO" region="2" />
        <outlet loc_id="LAX" city="Los Angeles" state="CA" region="0" />
        <outlet loc_id="MIN" city="Minneapolis" state="MN" region="2" />
        <outlet loc_id="NYC" city="New York" state="NY" region="1" />
        <outlet loc_id="SEA" city="Seattle" state="WA" region="0" />
        <outlet loc_id="STL" city="St. Louis" state="MO" region="2" />
        <outlet loc_id="WDC" city="Washington" state="DC" region="1" />
    </RAIMA-SQL>

The last example shown above is an insert into xml file that specifies two xml options. Any number of the xml options can be specified in an insert statement.

8.3 Updating Data

The update statement is used to modify the values of one or more columns of one or more rows in a table.

update:
    update [dbname.]tabname
        set colname = {expression | null}[, colname = {expression | null}]...
        [where cond_expr]

The value to which each named column in the set clause is assigned is the evaluated result of its specified expression. The table to be updated is named tabname which, if more than one database has a table of that name, should be qualified with its dbname. The column values in table tabname referenced by the expressions are the pre-update column values. The rows that are updated are those for which the conditional expression is true. If no where clause is specified, every row in table tabname will be updated. If the update of any of the selected rows results in an integrity constraint violation, the update is aborted and the changes to the rows that had already been modified are discarded.

Note that you can only update a primary key of those rows for which there are either no foreign key references or for which a create join has been declared on all of the foreign keys that reference this primary key. Updates of foreign key columns will be checked to ensure that referential integrity is preserved (i.e., the referenced primary key row exists).
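The rule that set-clause expressions see the pre-update column values has a classic consequence: a single update can swap two columns. A minimal demonstration, using SQLite as a stand-in since the semantics are standard SQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (a int, b int)")
conn.execute("insert into t values (1, 2)")

# Both right-hand sides are evaluated against the original row,
# so this swaps the two columns rather than copying b into both.
conn.execute("update t set a = b, b = a")
row = conn.execute("select a, b from t").fetchone()
print(row)  # (2, 1)
```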
The following example shows a basic update statement that sets the commission to eight percent for the salesperson with sale_id "SWR" (Stephanie Robinson). This update modifies only a single row of a table.

    start transaction;
    update salesperson set commission = 0.08 where sale_id = "SWR";
    commit;

The next example gives each non-manager salesperson a 10 percent increase in commission rate.

    update salesperson set commission = commission + 0.10*commission where mgr_id is null;
    commit;

Assume that Rams Data Processing, Inc., has moved to a new address. The next statement modifies the relevant columns in the customer table of our sales database.

    update customer
        set address = "17512 SW 123rd St.", city = "Tustin", zip = "90121"
        where cust_id = "LAN";
    commit;

The statements below illustrate another update example. Eliska Wyman ("ERW") has left the company. Until her replacement is hired, Eliska's New York and New Jersey customers are to be serviced by Greg Porter ("GAP"), and her other customers will be handled by Sidney McGuire ("SKM").

    start trans;
    update customer set sale_id = "GAP"
        where sale_id = "ERW" and state in ("NY", "NJ");
    update customer set sale_id = "SKM"
        where sale_id = "ERW" and state not in ("NY", "NJ");
    commit;

The following example uses the if column selection function. This function allows the application to do in a single statement the modifications requiring two update statements in the previous example.

    start trans;
    update customer set sale_id = if(state in ("NY","NJ"), "GAP", "SKM")
        where sale_id = "ERW";
    commit;

8.4 Deleting Data

The delete statement is used to delete one or more rows from a table. The syntax for the delete statement is as follows.

delete:
    delete from [dbname.]tabname [where cond_expr]

The table whose rows are to be deleted is named tabname which, if more than one database has a table of that name, should be qualified with its dbname.
The rows to be deleted from tabname are those for which the conditional expression specified in the where clause returns true. If no where clause is specified, all of the rows in the table will be deleted. The delete statement will fail and return an error if it attempts to delete a row that is referenced by other foreign key rows, in which case no rows will be deleted.

The following example shows how the delete statement is used to try to delete the salesperson row with sale_id equal to "ERW".

    delete from salesperson where sale_id = "ERW";

However, since there are five customers serviced by this salesperson that have not been deleted, the system (in this case, the rsql utility) returns the following error.

    ****RSQL Diagnostic 3713: non-zero references on primary/unique key

In the next example, sales manager Chris Blades has left the company and his salespersons are to be reassigned to Bill Stouffer. Before deleting the salesperson row for Chris Blades (sale_id = "CMB"), an update statement must first be executed to reassign the salesperson rows with mgr_id = "CMB" to mgr_id "BPS".

    start trans;
    update salesperson set mgr_id = "BPS" where mgr_id = "CMB";
    *** 2 rows affected
    delete from salesperson where sale_id = "CMB";
    ****RSQL Diagnostic 3713: non-zero references on primary/unique key

Oops. There are still some foreign keys somewhere that reference the salesperson row with sale_id = "CMB", but there are no customers assigned to Blades since he is a manager. There are, however, notes. So, the statements below will successfully complete the transaction. Note that the update salesperson statement is still active in the transaction even though the above delete statement failed.
    delete from note_line where sale_id = "CMB";
    *** 29 rows affected
    delete from note where sale_id = "CMB";
    *** 11 rows affected
    delete from salesperson where sale_id = "CMB";
    *** 1 rows affected
    commit;

9. Database Triggers

A trigger is a procedure associated with a table that is executed (i.e., fired) whenever that table is modified by the execution of an insert, update, or delete statement. A non-standard trigger mechanism has been available in RDM Server SQL through the use of a User-Defined Function that gets called via the execution of a check condition specified in the create table statement. The SQL standard now provides the ability for triggers to be specified using SQL statements. This section describes how standard SQL triggers are implemented in RDM Server SQL.

9.1 Trigger Specification

The create trigger statement is used to create a trigger on a specified database table. The syntax for this statement is given below.

create_trigger:
    create trigger trigname ["description"]
        {before | after} {insert | delete | update [of colname[, colname]...]}
        on tabname
        [referencing {old | new} [row] [as] corname [{new | old} [row] [as] corname]]
        [for each {row [when (search_condition)] | statement}]
        trigger_stmts

trigger_stmts:
    trig_stmt
  | begin [atomic] trig_stmt... end

trig_stmt:
    open | close | flush | initialize_database | insert | delete | update
  | lock_table | call | initialize_table | notify

The trigname is the unique name of the trigger and must conform to a standard identifier. The tabname is the name of the table with which the trigger is to be associated. If there is more than one database with a table named tabname, then you can qualify tabname with the name of its database dbname. An optional string containing a description or comment about the trigger can be specified. This string is stored along with the trigger definition in the system catalog.
The trigger is defined to be fired either before or after the changes are made by the specified insert, update, or delete statement (called the trigger event). The firing of an update trigger can be restricted to occur only when the values of the column names specified in the update of clause are updated. If no columns are specified, then an update trigger will be fired upon the execution of every update statement on tabname.

Two types of triggers can be created. A statement-level trigger is created by specifying for each statement in the trigger declaration. If no for each clause is specified, for each statement is the default. A statement-level trigger fires once for each execution of the insert, update, or delete statement indicated in the specified trigger event. Thus, for example, an update statement that modifies 100 rows in a table will execute a statement-level update trigger on the table only once.

A row-level trigger is created by specifying for each row in the trigger declaration. Row-level triggers fire once for each table row that is changed by the insert, update, or delete statement. Row-level triggers are the more useful of the two types in that they can reference the old and/or new column values for each row. The referencing clause is used to specify a correlation name for either the old table row values or the new table row values. This clause can only be specified with row-level triggers. The when clause can be used to specify a condition that must evaluate to true in order for the trigger to fire. Note that the only table values that can be referenced in the when conditional expression (search_condition) are those available through the referencing old and/or new row correlation names. The new or old column values of each row can be referenced in the trigger's SQL statements through the correlation names specified in the referencing clause.
However, references to blob type columns (long varchar/varbinary/wvarchar) are not allowed. Note that insert triggers only have new column values and delete triggers only have old column values, while update triggers have both old and new column values.

The SQL statement to be executed when the trigger fires is specified last. If more than one statement is needed, the statements must be placed within a begin [atomic] and end block. The SQL standard offers no explanation as to why it chose to include the word "atomic." It normally is used to mean that a sequence of statements is not interruptable. However, since the execution of a trigger can cause other data modifications to occur that also have triggers (they can be nested), this cannot be the case with triggers. We have interpreted it to mean that either all of the SQL statements succeed or, if any one fails, the state is restored to its pre-trigger execution condition. Regardless of why they chose to include this term, it does tend to make one not want to use triggers for fear of nuking the database!

There are some restrictions on the kinds of SQL statements that can be included in a trigger. No select, DDL, or create statements are allowed in a trigger. A trigger cannot create another trigger. A stored procedure cannot create a trigger. Also, since it is necessary that any database modifications made by a trigger be included as part of the user's transaction, no transaction statements are allowed in a trigger definition. While stored procedures and user-defined procedures can be executed within a trigger, great care must be exercised to ensure that no harmful side effects occur from the execution of these procedures inside a trigger.

A trigger begins to take effect immediately upon the successful execution of the create trigger statement.
Thus, it should be considered more of a DDL than a DML statement, since trigger creation should occur immediately after the DDL statements are issued that define the database tables with which the triggers are associated. Triggers that are created on an existing database may require that the conditions and data relationships being maintained by the triggers be externally established at trigger creation time. See the "Summary Statistics" section below for an example.

9.2 Trigger Execution

A trigger that has been defined on a table will be executed based on the {before | after} trigger event specification. Any changes that are made by the SQL statements specified in a before trigger will remain intact even when the triggering data modification statement fails (e.g., with an integrity violation). The triggered SQL statements defined in an after trigger are only executed when the triggering data modification statement succeeds.

A before statement-level trigger will execute before any changes are made by the associated (triggering) insert, update, or delete statement. An after statement-level trigger will execute after all changes have been successfully made by the associated insert, update, or delete statement. A before row-level trigger will execute prior to each row modification made by the triggering insert, update, or delete statement. An after row-level trigger executes after each row has been successfully modified by the triggering insert, update, or delete statement. If a when clause has been specified with a row-level trigger, the trigger will only fire on those rows where the evaluation of the when's conditional expression returns true.

All changes made by the SQL statement(s) defined by the trigger are included as part of the user's transaction. Thus, the triggered database modifications will be committed when the user subsequently issues a commit statement, or they will be rolled back should the user subsequently execute a rollback statement.
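The point that triggered changes belong to the user's transaction can be verified with a small experiment. The sketch below uses SQLite as a stand-in (the t and audit tables and the trigger are invented for the demo): the trigger's insert is visible inside the transaction, and rolling back the transaction also undoes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # manual transactions
conn.executescript("""
create table t (x int);
create table audit (x int);
create trigger t_ins after insert on t
for each row begin
    insert into audit values (new.x);
end;
""")

conn.execute("begin")
conn.execute("insert into t values (42)")
fired = conn.execute("select count(*) from audit").fetchone()[0]  # trigger ran
conn.execute("rollback")

# the rollback removed both the user's row and the trigger's row
left_t = conn.execute("select count(*) from t").fetchone()[0]
left_audit = conn.execute("select count(*) from audit").fetchone()[0]
print(fired, left_t, left_audit)  # 1 0 0
```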
There is no limit to the number of triggers that can be defined on a table. There can even be multiple triggers with the same trigger event specified on a table. Multiple triggers are executed sequentially in the order in which they were defined.

The SQL trigger_stmts can themselves make changes to tables on which other triggers have been defined. Thus, trigger execution can be nested. Note that any rows that are modified by a trigger remain locked until the user either commits or rolls back the transaction.

Any trigger can be disabled and subsequently re-enabled through use of the alter trigger statement.

    alter trigger trigname {enable | disable}

The altered trigger status takes effect immediately upon successful execution of the alter trigger statement.

9.3 Trigger Security

The ability for non-administrator users to create triggers is included in the create database command-level privilege. This privilege can be set by executing the following grant statement.

    grant create database to user_id [, user_id ]...

The create database privilege can be removed by executing the following revoke statement.

    revoke create database from user_id [, user_id ]...

A user must either be an administrator or have the create database command privilege in order to create, alter, or drop triggers. In addition to having the proper command privilege, a non-administrator user must also have been granted trigger privilege on any tables on which the user will be creating triggers. Trigger privileges are set using the following grant statement.

    grant trigger on [dbname.]tabname to user_id [, user_id ]...

Trigger privilege is required for a user to create, alter, or drop a trigger on the specified table. Trigger privileges can be revoked by issuing the following statement.

    revoke trigger on [dbname.]tabname from user_id [, user_id ]...

Revoking trigger privileges does not affect any triggers that may have already been created by the specified user.
Triggers execute under the authority of the user who created the trigger and not that of the user who executed the original insert, update, or delete statement that caused the trigger to fire. Thus, the user who issues the create trigger statement must have the proper security privileges on any table that is to be accessed or modified by the trigger's SQL statements. Later changes to the security settings for the user who created the trigger will not affect the execution of the trigger. Please refer to Chapter 11, "SQL Database Access Security" for details.

A trigger can be dropped by executing the drop trigger statement.

    drop trigger trigname

All triggers that have been defined on a particular table are automatically dropped when the table is dropped.

9.4 Trigger Examples

The use of triggers in a database system necessarily means that modifications made to the tables on which triggers have been defined will have side effects that are hidden from the user who issued the original SQL modification statement. Generally, side effects are not a good thing to have in a software system. Yet, triggers are an important and useful feature for certain kinds of processing requirements. The examples in this section illustrate two such uses. Triggers are particularly useful in maintaining certain kinds of statistics, such as usage or summary stats. Triggers are also very useful in maintaining various kinds of audit trails.

Summary Statistics

The query below returns the sales totals for each customer in the sales database.

    set double display(12, "#,#.##");
    select cust_id, sum(amount) from sales_order group by 1;

    cust_id    sum(amount)
    ATL         113,659.75
    BUF         263,030.36
    CHI         160,224.65
    . . .
    SEA          60,756.36
    SFF         112,345.66
    TBB         104,038.25
    WAS          63,039.90

An alternative approach, which does not require running a query that scans through the entire sales_order table each time, can be implemented with triggers.
A new column named sales_tot of type double is declared in the customer table. The following three triggers can be defined on the sales_order table to keep the related customer's sales total amount up to date.

    create trigger InsSalesTot
        after insert on sales_order
        referencing new row as new_order
        for each row
            update customer set sales_tot = sales_tot + new_order.amount
                where cust_id = new_order.cust_id;

    create trigger UpdSalesTot
        after update of amount on sales_order
        referencing old row as old_order new row as new_order
        for each row
            update customer
                set sales_tot = sales_tot + (new_order.amount - old_order.amount)
                where cust_id = new_order.cust_id;

    create trigger DelSalesTot
        before delete on sales_order
        referencing old row as old_order
        for each row
            update customer set sales_tot = sales_tot - old_order.amount
                where cust_id = old_order.cust_id;

The first trigger, InsSalesTot, executes an update on the customer table after each successful insert on the sales_order table, adding the new sales_order's amount, referenced through the correlation name new_order, to the current value of the customer's sales_tot. The second trigger is fired only when an update is executed that changes the value of the amount column in the sales_order table. When that occurs, the customer's sales_tot column needs to subtract out the old amount and add in the new one. The DelSalesTot trigger fires whenever a sales_order row is deleted, causing its amount to be subtracted from the customer's sales_tot.

Now suppose you want to also maintain the sales totals for each salesperson in addition to each customer. You can add a sales_tot column of type double to the salesperson table and use a trigger to update it as well as the customer sales_tot column. The simplest way to do this is to modify the above triggers to also update the row of the salesperson who manages the account of the customer whose sales_order is being modified, as shown below.
    create trigger InsSalesTot after insert on sales_order
        referencing new row as new_order
        for each row
        begin atomic
            update customer set sales_tot = sales_tot + new_order.amount
                where cust_id = new_order.cust_id;
            update salesperson set sales_tot = sales_tot + new_order.amount
                where sale_id = (select sale_id from customer
                                    where cust_id = new_order.cust_id)
        end;

    create trigger UpdSalesTot after update of amount on sales_order
        referencing old row as old_order new row as new_order
        for each row
        begin atomic
            update customer
                set sales_tot = sales_tot + (new_order.amount - old_order.amount)
                where customer.cust_id = new_order.cust_id;
            update salesperson
                set sales_tot = sales_tot + (new_order.amount - old_order.amount)
                where sale_id = (select sale_id from customer
                                    where cust_id = new_order.cust_id)
        end;

    create trigger DelSalesTot before delete on sales_order
        referencing old row as old_order
        for each row
        begin atomic
            update customer set sales_tot = sales_tot - old_order.amount
                where customer.cust_id = old_order.cust_id;
            update salesperson set sales_tot = sales_tot - old_order.amount
                where sale_id = (select sale_id from customer
                                    where cust_id = old_order.cust_id)
        end;

Since each trigger contains two SQL update statements, they must be enclosed between the begin atomic and end pairs. Also note that the subquery is needed to locate the salesperson row to be updated through the customer row, based on the cust_id column in the sales_order table. (Because a before-delete trigger references only the old row, the subquery in DelSalesTot uses old_order.cust_id.)

The same result can also be achieved, not by modifying the original triggers, but by introducing one new trigger that updates the salesperson's sales_tot whenever a related customer's sales_tot column is updated. Note that the salesperson sales_tot does not need to be updated when a new customer row is inserted (because the sales_tot is initially zero) or when a customer row is deleted (because the sales_order rows associated with the customer must first be deleted, which causes the customer's sales_tot to be updated).
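For readers who want to experiment with this bookkeeping pattern outside of RDM Server, the sketch below reproduces the single-table customer-total triggers in SQLite via Python's sqlite3 module. This is a hedged translation, not RDM Server syntax: SQLite has no referencing clause, so its implicit NEW and OLD row names stand in for new_order and old_order, and the table layouts are reduced to just the columns the triggers touch.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customer (cust_id TEXT PRIMARY KEY, sales_tot REAL DEFAULT 0);
CREATE TABLE sales_order (ord_num INTEGER PRIMARY KEY, cust_id TEXT, amount REAL);

-- after insert: add the new order's amount to its customer's total
CREATE TRIGGER ins_sales_tot AFTER INSERT ON sales_order
BEGIN
  UPDATE customer SET sales_tot = sales_tot + NEW.amount
   WHERE cust_id = NEW.cust_id;
END;

-- after update of amount: apply the delta (new minus old)
CREATE TRIGGER upd_sales_tot AFTER UPDATE OF amount ON sales_order
BEGIN
  UPDATE customer SET sales_tot = sales_tot + (NEW.amount - OLD.amount)
   WHERE cust_id = NEW.cust_id;
END;

-- before delete: subtract the doomed order's amount
CREATE TRIGGER del_sales_tot BEFORE DELETE ON sales_order
BEGIN
  UPDATE customer SET sales_tot = sales_tot - OLD.amount
   WHERE cust_id = OLD.cust_id;
END;
""")

con.execute("INSERT INTO customer(cust_id) VALUES ('ATL')")
con.execute("INSERT INTO sales_order VALUES (1, 'ATL', 100.0)")
con.execute("INSERT INTO sales_order VALUES (2, 'ATL', 50.0)")
con.execute("UPDATE sales_order SET amount = 80.0 WHERE ord_num = 2")
con.execute("DELETE FROM sales_order WHERE ord_num = 1")
total = con.execute(
    "SELECT sales_tot FROM customer WHERE cust_id = 'ATL'").fetchone()[0]
print(total)  # 100 + 50, then +30 from the update, then -100 from the delete: 80.0
```

The sequence of an insert, an update, and a delete leaves the running total at 80.0, matching what a fresh sum(amount) query would report.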
The definition of this new trigger is as follows. (Since customer has no amount column, the trigger is defined on updates of sales_tot, and its update expression uses the old and new sales_tot values.)

    create trigger UpdSPSalesTot after update of sales_tot on customer
        referencing old row as old_cust new row as new_cust
        for each row
        update salesperson
            set sales_tot = sales_tot + (new_cust.sales_tot - old_cust.sales_tot)
            where sale_id = new_cust.sale_id;

This trigger fires whenever an update is executed on the sales_tot column in the customer table. That will only occur when one of the earlier triggers fires due to the execution of an insert, delete, or update of the amount column on the sales_order table. Thus, this is an example of a nested trigger: a trigger that fires in response to the firing of another trigger.

The sales database example is delivered with the sales_tot column already declared in the salesperson and customer tables, but without the triggers having been declared. Now, however, you want to create the triggers that will maintain the sales_tot values for each customer and salesperson, but data already exists in the database. So, the sales totals somehow need to be initialized at the time the triggers are created. To do this, the database should be opened for exclusive access to ensure that no updates occur between the time the triggers are first installed and the sales_tot values in the customer table are initialized. The following rsql script shows how this can be done.

    open sales exclusive;
    set double display(12, "#,#.##");
    select sale_id, sales_tot from salesperson;

    sale_id    sales_tot
    BCK             0.00
    BNF             0.00
    BPS             0.00
    ...
    WAJ             0.00
    WWW             0.00

    select cust_id, sales_tot from customer;

    cust_id    sales_tot
    ATL             0.00
    BUF             0.00
    CHI             0.00
    ...
    TBB             0.00
    WAS             0.00

    create trigger InsSalesTot ...
    create trigger UpdSalesTot ...
    create trigger DelSalesTot ...
    create trigger UpdSPSalesTot ...
    update customer set sales_tot =
        query("select sum(amount) from sales_order where cust_id = ?", cust_id);
    *** 28 rows affected

    select cust_id, sales_tot from customer;

    cust_id    sales_tot
    ATL       113,659.75
    BUF       263,030.36
    CHI       160,224.65
    ...
    TBB       104,038.25
    WAS        63,039.90

    commit;
    select sale_id, sales_tot from salesperson;

    sale_id    sales_tot
    BCK       237,392.56
    BNF       112,345.66
    BPS             0.00
    ...
    WAJ       141,535.34
    WWW        49,461.20

    close sales;

Note that the update statement that sets the sales_tot values for each row in the customer table uses the query system function (a copy has also been included as an example user-defined function called "subquery").

Audit Trails

Audit trails keep track of certain changes that are made to a database, along with an identification of the user who initiated each change and a timestamp as to when the change occurred. Suppose we want to keep track of changes made to the sales_order table. The following statement creates a table called orders_log that will contain one row per sales_order change, and grants insert (only) privileges on it to all users.

    create table sales.orders_log(
        chg_desc char(30),
        chg_user char(32) default user,
        chg_timestamp timestamp default now
    );
    commit;
    grant insert on orders_log to public;

Six statement-level triggers are needed to track all successful and unsuccessful attempts to change the sales_order table: three before triggers to track all attempts and three after triggers to track only those changes that succeed. Note that should the transaction that contains the sales_order change statement be rolled back, the changes to orders_log will also be rolled back. Thus, unsuccessful change attempts will only be logged in the orders_log table when they occur within subsequently committed transactions. The declarations of the triggers are given below.
    create trigger bef_ord_ins before insert on sales_order for each statement
        insert into orders_log(chg_desc) values "insert attempted";
    create trigger bef_ord_upd before update on sales_order
        insert into orders_log(chg_desc) values "update attempted";
    create trigger bef_ord_del before delete on sales_order
        insert into orders_log(chg_desc) values "delete attempted";
    create trigger aft_ord_ins after insert on sales_order
        insert into orders_log(chg_desc) values "insert successful";
    create trigger aft_ord_upd after update on sales_order
        insert into orders_log(chg_desc) values "update successful";
    create trigger aft_ord_del after delete on sales_order
        insert into orders_log(chg_desc) values "delete successful";

By the way, as you can see from the above trigger declarations, the for each statement clause is optional; it is the default when no for each clause is specified. The rsql script below creates a couple of new users who each make several changes to the sales_order table in order to see the results of the firing of the associated triggers. Note also that the original row-level triggers are still operative.

    create user kirk password "tiberius" on sqldev;
    grant all commands to kirk;
    grant select on orders_log to kirk;
    create user jones password "tough" on sqldev;
    grant all commands to jones;
    grant select on orders_log to jones;
    .c 2 server kirk tiberius
    insert into sales_order values "IND",2400,today,now,10000.00,0,null;
    *** 1 rows affected
    commit;
    .c 3 server jones tough
    update sales_order set amount = 1000.00 where ord_num = 2400;
    *** 1 rows affected
    delete from sales_order where ord_num = 2210;
    ****RSQL Diagnostic 3713: non-zero references on primary/unique key
    commit;
    select * from orders_log;

    chg_desc             chg_user    chg_timestamp
    insert attempted     kirk        2009-07-27 11:58:17.9460
    insert successful    kirk        2009-07-27 11:58:17.9460
    update attempted     jones       2009-07-27 11:59:48.2900
    update successful    jones       2009-07-27 11:59:48.2900
    delete attempted     jones       2009-07-27 12:00:06.3680

    .c 2
    *** using statement handle 1 of connection 2
    delete from sales_order where ord_num = 2400;
    *** 1 rows affected
    select * from orders_log;

    chg_desc             chg_user    chg_timestamp
    insert attempted     kirk        2009-07-27 11:58:17.9460
    insert successful    kirk        2009-07-27 11:58:17.9460
    update attempted     jones       2009-07-27 11:59:48.2900
    update successful    jones       2009-07-27 11:59:48.2900
    delete attempted     jones       2009-07-27 12:00:06.3680
    delete attempted     kirk        2009-07-27 12:05:10.0710
    delete attempted     kirk        2009-07-27 12:05:49.9620
    delete successful    kirk        2009-07-27 12:05:49.9620

    rollback;
    select * from orders_log;

    chg_desc             chg_user    chg_timestamp
    insert attempted     kirk        2009-07-27 11:58:17.9460
    insert successful    kirk        2009-07-27 11:58:17.9460
    update attempted     jones       2009-07-27 11:59:48.2900
    update successful    jones       2009-07-27 11:59:48.2900
    delete attempted     jones       2009-07-27 12:00:06.3680

9.5 Accessing Trigger Definitions

Trigger definitions are stored in the system catalog. Two predefined stored procedures are available for accessing trigger definitions.
Procedure ShowTrigger returns a result set containing a single char column, with one row for each line of text from the original declaration of the trigger named in the procedure argument. Procedure ShowAllTriggers returns two columns: the trigger name and a line of text from the original declaration. Example calls and their respective result sets are shown in the example below.

    exec ShowTrigger("UpdSalesTot");

    TRIGGER DEFINITION
    create trigger UpdSalesTot after update of amount on sales_order
        referencing old row as old_order new row as new_order
        for each row
        update customer
            set sales_tot = sales_tot + (new_order.amount - old_order.amount)
            where customer.cust_id = new_order.cust_id;

    exec ShowAllTriggers;

    NAME           DEFINITION
    InsSalesTot    create trigger InsSalesTot after insert on sales_order
    InsSalesTot    referencing new row as new_order
    InsSalesTot    for each row
    InsSalesTot    update customer set sales_tot = sales_tot + new_order.amount
    InsSalesTot    where customer.cust_id = new_order.cust_id;
    UpdSalesTot    create trigger UpdSalesTot after update of amount on sales_order
    UpdSalesTot    referencing old row as old_order new row as new_order
    UpdSalesTot    for each row
    UpdSalesTot    update customer
    UpdSalesTot    set sales_tot = sales_tot + (new_order.amount - old_order.amount)
    UpdSalesTot    where customer.cust_id = new_order.cust_id;
    DelSalesTot    create trigger DelSalesTot before delete on sales_order
    DelSalesTot    referencing old row as old_order
    DelSalesTot    for each row
    DelSalesTot    update customer set sales_tot = sales_tot - old_order.amount
    DelSalesTot    where customer.cust_id = old_order.cust_id;
    UpdSPSalesTot  create trigger UpdSPSalesTot after update of sales_tot on customer
    UpdSPSalesTot  referencing old row as oldc new row as newc
    UpdSPSalesTot  for each row
    UpdSPSalesTot  update salesperson
    UpdSPSalesTot  set sales_tot = sales_tot + (newc.sales_tot - oldc.sales_tot)
    UpdSPSalesTot  where sale_id = newc.sale_id;
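The audit-trail pattern of section 9.4 can also be tried outside of RDM Server. The sketch below is a deliberately simplified SQLite version (via Python's sqlite3): SQLite has no per-user defaults or grant statements, and a failed statement undoes its own trigger effects, so only the three "successful" after triggers are shown. It does demonstrate the key transactional property noted above: rolling back the enclosing transaction also rolls back the log rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE sales_order (ord_num INTEGER PRIMARY KEY, amount REAL);
CREATE TABLE orders_log (
  chg_desc      TEXT,
  chg_timestamp TEXT DEFAULT CURRENT_TIMESTAMP
);

CREATE TRIGGER aft_ord_ins AFTER INSERT ON sales_order
BEGIN INSERT INTO orders_log(chg_desc) VALUES ('insert successful'); END;

CREATE TRIGGER aft_ord_upd AFTER UPDATE ON sales_order
BEGIN INSERT INTO orders_log(chg_desc) VALUES ('update successful'); END;

CREATE TRIGGER aft_ord_del AFTER DELETE ON sales_order
BEGIN INSERT INTO orders_log(chg_desc) VALUES ('delete successful'); END;
""")

# one change of each kind: three log rows appear
con.execute("INSERT INTO sales_order VALUES (2400, 10000.0)")
con.execute("UPDATE sales_order SET amount = 1000.0 WHERE ord_num = 2400")
con.execute("DELETE FROM sales_order WHERE ord_num = 2400")
log = [row[0] for row in con.execute("SELECT chg_desc FROM orders_log")]
print(log)  # ['insert successful', 'update successful', 'delete successful']

# a rolled-back transaction takes its log rows with it
con.commit()
con.execute("INSERT INTO sales_order VALUES (2500, 5.0)")
con.rollback()
n = con.execute("SELECT count(*) FROM orders_log").fetchone()[0]
print(n)  # still 3: the rollback removed the fourth log row
```

As in the rsql session above, the log entry for the rolled-back insert disappears along with the insert itself.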
10. Shared (Multi-User) Database Access

An RDM Server database is designed to be efficiently accessed by multiple, concurrent users. In such a multi-user environment, the DBMS must use some method to protect against attempts by multiple users to update the same data at the same time. RDM Server SQL applies locks on shared data in order to restrict changes to one user at a time. Why locking is needed is explained by the following example.

Table 10-1 shows the sequence of actions of two connections trying to update the same table row at approximately the same time without using locks. At time t1, connection 1 reads the row from the database. Connection 2 reads the row at t2. Both connections then update and write the row back to the database, with connection 1 going first for each operation. However, at the end of the last write, the row copy written by connection 2 does not include the changes from connection 1 (those changes occurred after connection 2 read the row), so connection 1's update is lost.

Table 10-1. Multi-User Database Access without Locks

    Time    Connection 1    Connection 2
    t1      read row
    t2                      read row
    t3      update row
    t4                      update row
    t5      write row
    t6                      write row

In this case, connection 2 could only have seen connection 1's changes by reading the row after time t5. What is necessary is a lock that serializes updates to the shared data. Table 10-2 illustrates the sequence of operations for the two example connections synchronized by the use of locks. Note that once the lock request for connection 1 is granted at time t2, connection 2 must wait for the row to be unlocked before continuing. When connection 1 completes its updates, it frees the lock at time t6. This action triggers RDM Server to grant the lock to connection 2, after which connection 2 can read the row (including connection 1's changes) and then make its own changes.

Table 10-2. Multi-User Database Access with Locks

    Time    Connection 1        Connection 2
    t1      request row lock
    t2      lock granted        request row lock
    t3      read row
    t4      update row
    t5      write row
    t6      free lock
    t7                          read row
    t8                          update row
    t9                          write row
    t10                         free lock

As the above example illustrates, an important feature of a multi-user DBMS is to provide the shared database access control necessary to ensure that no data is lost and that the data is logically consistent (that is, the required inter-data relationships exist). RDM Server SQL automatically manages the needed row-level locking for you. Yet it is important to know how this is done and which features of RDM Server SQL give you control over how SQL manages it. This is the subject of the rest of this chapter.

10.1 Locking in SQL

10.1.1 Row-Level Locking

Two types of locks are used by RDM Server. A read lock (sometimes called a share or shared lock) is issued by SQL for each row that is fetched from a select statement. Any number of connections can have a read lock on the same row. A write lock (sometimes called an exclusive lock) is requested for each row to be modified by an update statement or deleted by a delete statement. Once the write lock request has been granted to one connection, no other connection can lock (read or write) that row until the write lock is freed upon execution of a commit or rollback statement. RDM Server SQL implicitly places a write lock on rows created by execution of an insert statement. Note that SQL will also request and place a write lock on any existing rows that are referenced by foreign key values in the newly inserted row.

Locks are managed by SQL in conjunction with transactions. All locks that are issued outside of a transaction are read locks.
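The lost-update interleaving of Table 10-1, and the serialized schedule of Table 10-2, can be reproduced in miniature with ordinary threads. The sketch below is an analogy, not RDM Server code: a threading.Lock stands in for the row-level write lock, and the hypothetical Row class stands in for a database row. Holding the lock across the read-modify-write sequence guarantees that no increment is lost; removing the `with row.lock:` line reintroduces exactly the race of Table 10-1.

```python
import threading

class Row:
    """Stand-in for one database row plus its row-level lock."""
    def __init__(self):
        self.amount = 0
        self.lock = threading.Lock()  # plays the role of the write lock

def add(row, delta, times):
    for _ in range(times):
        with row.lock:                      # request lock / lock granted
            current = row.amount            # read row
            row.amount = current + delta    # update row / write row
                                            # lock freed on leaving the block

row = Row()
t1 = threading.Thread(target=add, args=(row, 1, 100_000))
t2 = threading.Thread(target=add, args=(row, 1, 100_000))
t1.start(); t2.start()
t1.join(); t2.join()
print(row.amount)  # 200000 every time; without the lock, updates can be lost
```

Because each read-modify-write is performed while the lock is held, the two "connections" are serialized just as in Table 10-2, and the final total is always the sum of both threads' work.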
After a transaction is started, as noted above, write lock requests are issued for the rows that are accessed by each insert, update, or delete statement executed as part of the transaction. Within a transaction, the row-level locks used by a select statement depend on the current transaction mode (see section 10.2 below). Normally, a read lock is placed on the current row (i.e., the most recently fetched row) and freed when the next row is read. This is called cursor stability transaction mode. In read repeatability transaction mode, the read locks are kept in place until the transaction ends. This ensures that a previously fetched row does not change during the transaction.

The with exclusive lock clause of the select statement requests that the system apply write locks instead of read locks to the rows of the result set. If no transaction is active when the statement is executed, SQL will automatically start a transaction. A select statement that contains the with exclusive lock clause must be updateable, which means that it:

- Does not contain a distinct, group by or order by clause, or a subquery,
- Does not contain any column expressions, and
- Has a from clause that refers to only a single table.

The with exclusive lock clause follows the where clause in the specification of a select statement, as shown in the following syntax and example.

    select:
        select [first | all | distinct] {* | select_item [, select_item]...}
            from table_ref [, table_ref]...
            [where cond_expr] [with exclusive lock]
            [group by col_ref [, col_ref]... [having cond_expr]]
            [order by col_ref [asc | desc] [, col_ref [asc | desc]]...]

    select * from salesperson where mgr_id is not null with exclusive lock;

10.1.2 Table-Level Locking

The lock table and unlock table statements allow you to lock an entire table. Two lock modes are provided. Shared mode allows you (and others) read-only access to the table.
Exclusive mode allows you to modify the table while denying all other users access to it. The syntax for lock table is shown below.

    lock_table:
        lock table lock_spec[, lock_spec]...

    lock_spec:
        [dbname.]tabname[, [dbname.]tabname]... [in] {share | exclusive} [mode]

All table locks are automatically freed whenever a transaction is committed or rolled back. Shared mode table locks can be freed explicitly with the unlock table statement shown below.

    unlock_table:
        unlock table [dbname.]tabname[, [dbname.]tabname]...

For example, you can issue the following lock table statement to lock the salesperson and customer tables in shared mode and the sales_order and item tables in exclusive mode.

    lock table salesperson, customer share, sales_order, item exclusive;

A typical use for table locks is to place an exclusive lock on a table in order to do a bulk load, as illustrated in the following example, which loads the inventory database from comma-delimited text files stored on the RDM Server catdev device. By the way, notice the update stats statement following the bulk load. It is always good practice to execute update stats after making substantial modifications to a database to ensure that the optimizer generates access plans based on reasonable usage statistics.

    lock table product, outlet, on_hand exclusive;
    insert into product from file "product.txt" on catdev;
    insert into outlet from file "outlet.txt" on catdev;
    insert into on_hand from file "onhand.txt" on catdev;
    commit;
    update stats on invntory;

10.1.3 Lock Timeouts and Deadlock

RDM Server SQL issues lock requests that are either granted or denied. Lock requests are normally queued, waiting for the current lock on the row (or table) to be freed, at which time the request at the front of the queue is granted. Associated with each connection is a lock timeout value that specifies how long an ungranted lock request can wait in the queue.
This is an important feature for preventing deadlock, in which two connections each hold locks on rows for which the other connection has a lock request (this is the simplest form of deadlock; there are many ways in which deadlock can occur among multiple users). In order to avoid deadlock, when a timeout error is returned from the execution of an insert, update, or delete statement, the proper procedure is to roll back the transaction and start over. This frees that transaction's locks, allowing another connection's competing transaction to proceed.

The set timeout statement can be used to set the lock request timeout value for a connection.

    set_timeout:
        set timeout [to | =] numseconds

The numseconds is an integer constant that specifies the minimum number of seconds a lock request is to wait. The system default is 30 seconds. Setting the timeout value to 0 causes lock requests that cannot be granted to time out immediately. Setting the timeout value to -1 causes lock requests to wait indefinitely.

WARNING: Do not disable timeouts (set timeout = -1) on a deployed/operational database unless you are absolutely certain that there is no way a deadlock can occur in your application. If you are using row-level locking, it is highly unlikely that you can be certain your application is deadlock free. Disabling timeouts is a feature intended primarily for diagnosis and testing.

10.2 Transaction Modes

RDM Server SQL automatically controls locking of accessed rows during the processing of select, insert, update, and delete statements. Several methods are provided by RDM Server SQL that allow you to control the behavior of this locking. The following two set statements are used to establish the desired multi-user operational behavior.
    set_read_repeatability:
        set read repeatability [to | =] {on | off}

    set_transaction:
        set trans[action] isolation [to | =] {on | off}

The effects of these two statements are described in Table 10-3 below.

Table 10-3. Transaction Control Settings Part I

    transaction  repeatable
    isolation    reads        description
    on           on           This is called read repeatability mode. Changes from
                              other connections (users) are not visible until
                              committed. All rows that are read are locked. Read
                              locks within a transaction are kept until the
                              transaction commits or is rolled back.
    on           off          Called cursor stability mode; this is the default
                              mode for RDM Server SQL. Changes are not visible to
                              other connections until committed. A read lock is
                              kept for the current row only. When the cursor is
                              advanced to the next row, the current row is freed.
    off          on           Allows dirty reads outside of a transaction, whereby
                              uncommitted changes from other connections are
                              visible and read locks are not required to read data
                              from the database. Inside a transaction, behavior is
                              identical to read repeatability mode.
    off          off          Allows dirty reads outside of a transaction, whereby
                              uncommitted changes from other connections are
                              visible and read locks are not required to read data
                              from the database. Inside a transaction, behavior is
                              identical to cursor stability mode.

Regardless of the mode you are in, all rows that are modified through execution of an insert, update, or delete statement are write-locked and remain so until the transaction is ended through either a commit or a rollback. Because of this, it is a good idea to keep the sizes of your transactions small. The more rows that are changed within a transaction, the more locks the server must manage, and the overhead associated with this lock management can become excessive. A commit or rollback frees all of the write locks. Thus, short transactions can increase system throughput.
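The "changes are not visible to other connections until committed" behavior described in Table 10-3 can be observed in any transactional engine. The sketch below uses SQLite through Python's sqlite3 module (an assumption: SQLite's default rollback-journal behavior, which is roughly comparable to read-committed visibility, not RDM Server's lock manager). A writer connection inserts a row inside an open transaction; a second connection cannot see the row until the writer commits.

```python
import os
import sqlite3
import tempfile

# Two connections to the same on-disk database (hypothetical demo file).
path = os.path.join(tempfile.mkdtemp(), "demo.db")
writer = sqlite3.connect(path)
reader = sqlite3.connect(path)

writer.execute("CREATE TABLE t (x INTEGER)")
writer.commit()

# Writer opens a transaction and inserts, but does not commit yet.
writer.execute("INSERT INTO t VALUES (1)")

# Reader on a separate connection still sees the old (empty) table.
before = reader.execute("SELECT count(*) FROM t").fetchall()[0][0]

writer.commit()  # now the change becomes visible to other connections

after = reader.execute("SELECT count(*) FROM t").fetchall()[0][0]
print(before, after)  # 0 1

writer.close()
reader.close()
```

The uncommitted insert is invisible to the second connection (count 0) and appears only after the commit (count 1), which is the committed-read behavior of RDM Server's cursor stability and read repeatability modes; the dirty-read modes in Table 10-3 deliberately relax exactly this guarantee.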
Read repeatability mode is the strictest form of transaction isolation available. In this mode, every row that is read within a transaction is read-locked and kept locked until the transaction ends. Thus, rows that are re-fetched inside a transaction are guaranteed to have the same values.

Cursor stability mode is the RDM Server SQL default. In this mode, a read lock is placed on each row as it is fetched. When the next row is fetched, the lock on the current row is freed and the new row is locked. Thus, only the current row is locked at any time.

So-called "dirty read" mode is useful in situations where the preciseness of the data is not particularly important. It could be used, for example, when you are looking for a particular row of a table of the "I'll know it when I see it" variety. Its advantage is that it does not place any locks and, therefore, neither gets blocked by any write-locked rows in its path nor blocks other write-lock requests.

The transaction and read repeatability modes can also be set using the form of the set transaction statement shown below.

    set_transaction:
        set trans[action] trans_mode[, trans_mode]

    trans_mode:
        isolation level {read uncommitted | read committed | repeatable read}

This form adheres to standard SQL, with the modes set as indicated in the table below. (The settings follow from Table 10-3: read uncommitted allows dirty reads, read committed corresponds to cursor stability, and repeatable read corresponds to read repeatability.)

Table 10-4. Transaction Control Settings Part II

    mode setting        transaction isolation    repeatable reads
    read uncommitted    off                      off
    read committed      on                       off
    repeatable read     on                       on

11. Stored Procedures and Views

11.1 Stored Procedures

11.1.1 Create a Stored Procedure

An RDM Server SQL stored procedure is a precompiled group of one or more SQL (DML) statements stored in the system catalog. A stored procedure is defined using the create procedure statement, which conforms to the following syntax.

    create_procedure:
        create proc[edure] procname ["description"] [(arg_spec[, arg_spec]...)]
            as proc_stmt...
            end proc[edure]
      | create proc[edure] procname ["description"] in libname on devname

    arg_spec:
        argname type_spec [default constant]

    proc_stmt:
        open | close | flush | initialize_database | insert | delete
      | update | select | lock_table | call | initialize_table
      | begin_trans | start_trans | commit | rollback | mark
      | set | notify | update_stats

    insert:
        insert_values | insert_from_select

The name of the stored procedure, procname, must be unique, since stored procedures have system-wide scope. An optional "description" string can be specified and will be stored with the procedure definition in the system catalog.

Procedures can have arguments. Associated with each argument is a name, a data type, and an optional default value. The argname is a case-insensitive identifier that can be any name, except that it must be unique in the argument list of this procedure. The data type declared for each argument can be any of the specified type_spec entries. Note that it is not necessary to specify a length for a character argument, since the length is determined from the actual value passed to the procedure at the time it is invoked. The same is true for the precision and scale of decimal type arguments.

The one or more SQL statements that comprise the body of the stored procedure are placed, in the order in which they will be executed, between the as and end procedure clauses. They do not need to be separated by semicolons. Only the specified SQL statements can be contained in a stored procedure. The syntax for each of those statements is defined elsewhere in this manual and/or the SQL Language Reference.

The order_report procedure illustrated below is a typical example of how you might use a stored procedure. The arguments supply the date range over which a standard report is produced. If a value for end_date is not supplied when the procedure is executed, the end date will be set to the current date, as defined by its default clause.
    create procedure order_report(
        start_date date, end_date date default today)
    as
        select sale_name, company, ord_num, ord_date, amount, tax
            from salesperson, customer, sales_order
            where salesperson.sale_id = customer.sale_id
              and customer.cust_id = sales_order.cust_id
              and ord_date between start_date and end_date
    end procedure;

Assuming that the salesperson's sale_id is the same as the user name, the check_tickle procedure below retrieves all of a salesperson's notes in date order for the specified note_id. Note that you can abbreviate procedure as proc.

    create proc check_tickle(id char)
    as
        select note_date, cust_id, textln
            from note, note_line
            where note_id = id and sale_id = user()
              and note.note_id = note_line.note_id
              and note.note_date = note_line.note_date
            order by 1, 2;
    end proc;

The preceding examples contain only one statement, but a stored procedure can contain any number of statements. The example procedure below, product_summary, uses two select statements. This stored procedure shows the total amount of a particular product stored at all outlets, followed by the total amount of that same product that has been ordered.

    create proc product_summary(pid smallint)
    as
        select prod_id, prod_desc, sum(quantity) total_available
            from product, on_hand
            where prod_id = pid and product.prod_id = on_hand.prod_id
        select prod_id, sum(quantity) total_ordered
            from item
            where prod_id = pid
    end proc;

11.1.2 Call (Execute) a Stored Procedure

An RDM Server SQL stored procedure is invoked through a call (execute) statement that references the procedure and conforms to the syntax shown below.

    call:
        {call | exec[ute]} procname [(arg_value [, arg_value]...)]

    arg_value:
        constant | ? | argname | corname.colname

An argument value, arg_value, can be one of the following:

- a constant that is compatible with its declared argument type,
- a parameter marker (?),
  but not if the procedure is being called from within another procedure or a trigger,
- the name of an argument of the stored procedure containing this call statement, or
- a reference to an old or new column value within a trigger definition.

RDM Server SQL returns control to the calling application after it completes processing. When an error occurs in the execution of any of the statements in the stored procedure, the procedure immediately terminates and returns an error code to the calling application.

If the stored procedure has arguments, the call statement must specify a value (or a placeholder) for each one, and they must be specified in the same order as they were defined in the create procedure statement. The application does not have to supply a value for an argument that has a default value, but it does need to supply a comma as a placeholder for that argument, as the following example illustrates.

    call myproc(17002,,1);

The statement in the next example invokes the order_report stored procedure created in the previous section.

    call order_report(date "06-01-1997", date "06-30-1997");

Since the next statement is executed on 6/30/97, it will produce the same results as the preceding statement because of the default value specified for end_date. Note the use of exec (the alternate form of the execute statement) and the comma placeholder for the final, defaulted argument.

    exec order_report(date "06-01-1997",);

The following example invokes the check_tickle stored procedure defined in the prior section.

    call check_tickle("PROSPECT");

11.2 Views

11.2.1 Create View

A view is a table derived from the results of the select statement that defines the view. A view can be used just like any table, but it does not contain any rows of its own. Instead, it is solely composed of the rows returned by its select statement on the underlying base tables. The syntax for the create view statement is shown below.
    create_view:
        create view [dbname.]viewname ["description"]
            [(colname [, colname]...)]
        as select expression[, expression]...
            from table_ref [, table_ref]...
            [where cond_expr]
            [group by col_ref [, col_ref]... [having cond_expr]]
        [with check option]

The table defined by the view is the one that results from executing the specified query. The select expressions determine the visible columns in the view. The where clause constrains the rows of the view to only those that satisfy its condition. If a list of column names is specified, there must be one column name (colname) for each select expression. The value associated with each column is the value of its respective select expression (for example, the value of the fourth column is the result of the fourth expression). If a column name list is not specified, the column names of the view are the same as the column names in the select statement. If any of the columns in the select statement are expressions, or if two columns have the same name, a column name list must be specified. Note that the select statement defining the view cannot have an order by clause.

In the following example, the create view statement defines a view that provides a summary of the total order amounts per salesperson per month for the current year. The sales_summary view contains three columns: sales_month (the month in which the order was taken), sale_name, and order_tot (total orders for the salesperson for the month).

    create view sales_summary(sales_month, sale_name, order_tot) as
        select month(ord_date), sale_name, sum(amount)
            from salesperson, customer, sales_order
            where year(ord_date) = year(curdate())
              and salesperson.sale_id = customer.sale_id
              and customer.cust_id = sales_order.cust_id
            group by 1, 2;

11.2.2 Retrieving Data from a View

Use a view in exactly the same way you would use any table. For example, the select statement shown below uses the view defined in the previous section.
    select * from sales_summary where sales_month in (1,2);

    SALES_MONTH    SALE_NAME              ORDER_TOT
    1              Flores, Bob              19879.5
    1              Jones, Walter           76887.87
    1              Kennedy, Bob           328070.83
    1              McGuire, Sidney           3437.5
    1              Nash, Gail                136250
    1              Porter, Greg             74034.9
    1              Robinson, Stephanie     29073.51
    1              Stouffer, Bill          15901.61
    1              Warren, Wayne           79957.98
    1              Williams, Steve         32094.75
    1              Wyman, Eliska          173878.57
    2              Flores, Bob              8824.56
    2              Jones, Walter              86065
    2              Kennedy, Bob            103874.8
    2              McGuire, Sidney          9386.25
    2              Nash, Gail                3927.9
    2              Robinson, Stephanie    164816.47
    2              Stouffer, Bill           4049.09
    2              Warren, Wayne           11265.92
    2              Williams, Steve            62340
    2              Wyman, Eliska            74851.2

The next example includes an order by clause. Notice that although the order_tot value is calculated using an aggregate function (sum), the comparison is specified in the where clause and not in a having clause. If the comparison were defined as part of the view, it would need to be in the having clause of the create view's select statement.

    select order_tot, sales_month, sale_name from sales_summary
        where order_tot > 10000.0 order by 1 desc;

    ORDER_TOT    SALES_MONTH    SALE_NAME
    328070.83    1              Kennedy, Bob
    252425       4              Porter, Greg
    173878.57    1              Wyman, Eliska
    164816.47    2              Robinson, Stephanie
    143375       3              Kennedy, Bob
    137157.05    4              Wyman, Eliska
    136250       1              Nash, Gail
    104019.5     6              Nash, Gail
    103874.8     2              Kennedy, Bob
    103076.79    6              Wyman, Eliska

11.2.3 Updateable Views

An updateable view can be the table referenced in an insert, delete, or update statement. A view is considered updateable when the select statement defining the view meets all of the following conditions:

- It does not contain a subquery or a distinct, group by or order by clause.
- It does not contain any column expressions.
- It has a from clause referring to only a single table.

A view having a with check option specification must be updateable.
When specified, the with check option requires that any insert or update statement referencing the view satisfy the where condition of the view's defining select statement. The following create view defines a view that restricts the outlet table to only those rows located in western region states.

    create view west_outlets as
        select * from outlet where state in ("CA","CO","OR","WA")
        with check option;

If you attempted to insert a row into the west_outlets view with a state value other than one of the states listed in the where clause, the insert statement would be rejected, as in the example below. If the with check option had been omitted from the view definition, the row would have been stored.

    insert into west_outlets values("SAL","Salem","MA",0);
    *** integrity constraint violation: check

11.2.4 Drop View

When a view is no longer needed, you can delete the view from the system by issuing a drop view statement.

drop_view:

    drop view [dbname.]viewname [cascade | restrict]

The cascade option (the default) causes all other views referencing this view to be automatically dropped by the system. The restrict option prohibits dropping viewname if any other views exist that reference this view.

The example code below creates two views to help illustrate the use of the drop view statement.

    create view acct_orders as
        select sale_id, sale_name, cust_id, ord_num, ord_date, amount
        from salesperson, customer, sales_order
        where salesperson.sale_id = customer.sale_id
            and customer.cust_id = sales_order.cust_id;

    create view sales_summary(sales_month, sale_name, order_tot) as
        select month(ord_date), sale_name, sum(amount)
        from acct_orders
        where year(ord_date) = year(curdate())
        group by 1, 2;

The following statement will be rejected by SQL because a dependent view (sales_summary) exists.

    drop view acct_orders restrict;

The next statement, however, will drop not only the acct_orders view but sales_summary as well.
    drop view acct_orders cascade;

The same result will occur using the next statement, because cascade is the default action.

    drop view acct_orders;

Thus, when in doubt, always specify restrict. You cannot undo (or roll back) a drop view.

11.2.5 Views and Database Security

One of the more important uses of views is in conjunction with database security. Views can have permissions assigned to them just as with any table. If it is important to be able to restrict which columns from a base table users can have access to, you can simply define a view that includes only those columns. The view would be accessible to those users, whereas the base table would not. For example, the following view could be used to hide personal information about salespersons, such as date of birth and commission rate, from unauthorized eyes.

    create view sales_staff as
        select sale_id, sale_name, region, office, mgr_id
        from salesperson;

Once the proper permissions have been established, the dob and commission columns would not be accessible to normal users.

You can also use views to restrict rows from certain users. The west_outlets view in the last section could be set up so that those salespersons from the western region could only access information (for example, inventory quantities) from offices located in those particular states.

12. SQL Database Access Security

Database security provides the ability to restrict user access to database information through restrictions on the database columns and tables, or on the kinds of statements that a particular user can use. The RDM Server system has two classes of users. Administrator users have full access rights to all system capabilities and databases. Normal users have only the access rights granted to them by administrators and database owners. A system will typically have only a single administrator user (often referred to as the system or database administrator).
RDM Server does not, however, require that there be only one administrator.

RDM Server SQL provides two classes of access privileges: command privileges and database access privileges. Command privileges allow an administrator to specify the kinds of commands that a particular user is allowed to use. Database access privileges allow an administrator or database owner to specify the database information and operations that a particular user is allowed to access.

User access rights are assigned for an RDM Server SQL database using the grant and revoke statements. To manipulate an RDM Server SQL database, a user must have both table and command privileges. For example, RDM Server does not allow a user without delete command privileges to issue a delete statement, even if delete data access privileges on the table have been granted to that user. Attempts to execute an SQL statement by a user for which proper access privileges have not been granted will result in an access rights violation error returned from RDM Server SQL. Changes to a user's security settings do not take effect until the next time that user logs in to RDM Server.

12.1 Command Access Privileges

12.1.1 Grant Command Access Privileges

Command privileges specify the kinds of RDM Server SQL statements available to a user for database manipulation. The form of the grant statement that is used to do this is defined by the following syntax.

grant:

    grant cmd_spec to user_id[, user_id]...

cmd_spec:

      all commands [but command[, command]...]
    | commands command[, command]...

command:

      create {database | view | proc[edure] | trigger}
    | insert | update | delete
    | lock table | unlock table

The user_id is an identifier that is case-sensitive and must exactly match the user id for the desired user. Two methods of granting command privileges can be used. You can grant all commands but and list only those commands the user cannot execute, or you can grant commands followed by the list of only those commands the user can issue.
The specific command privilege classes that can be granted (or not granted) are given in the table below. All other commands (including select) can be issued by any user.

Table 12-1. Command Privilege Definitions

    Command Class               Description
    create database             Allows user to issue any DDL statement or create, alter, or drop trigger statement.
    create view                 Allows user to define his/her own views.
    create procedure            Allows user to define his/her own stored procedures.
    create trigger              Allows user to define triggers.
    insert, update, or delete   Allows user to issue insert, update, or delete statements.
    lock table                  Allows user to issue lock and unlock table statements.

The example below grants permission for all users to issue any statements except DDL statements. It allows only the users George and Martha to create databases.

    grant all commands but create database to public;
    grant all commands to George, Martha;

The next example restricts the user Jack to issuing select, update, and create view statements.

    grant commands create view, update to "Jack";

12.1.2 Revoke Command Access Privileges

To rescind command privileges, an administrator can issue a revoke statement that identifies the specific commands that a user can no longer issue. As with grant, two methods of specification are allowed, as shown below. One form identifies the commands from the restricted list that the user cannot use. The other form (all but) identifies the commands from the restricted list that the user can still use.

revoke:

    revoke cmd_spec from user_id[, user_id]...

cmd_spec:

      all commands [but command[, command]...]
    | commands command[, command]...

command:

      create {database | view | proc[edure] | trigger}
    | insert | update | delete
    | lock table | unlock table

The privileges that are being revoked must have been previously granted. The specified privileges can be revoked from all users (public) or from only the users listed in the revoke command.
The example below revokes George's permission to create databases. The next statement rescinds all of Jack's command privileges except update.

    revoke commands create database from George;
    revoke all commands but update from Jack;

12.2 Database Access Privileges

12.2.1 Grant Table Access Privileges

Database access privileges allow an administrator or database owner to specify the database information and operations that a particular user is allowed to access. You can assign user access privileges to database tables, views, and columns using the following form of the grant statement.

grant:

    grant item_spec to {public | user_id[, user_id]...}
        [with grant option] [cascade | restrict]

item_spec:

    {privilege[, privilege]... | all [privileges]} on [dbname.]tabname

privilege:

    select | delete | insert | update [(colname[, colname]...)] | trigger

The creator of a database (that is, the user who issued the create database statement) is the owner of that database. When a database is created, only the owner and administrator users are allowed to access that database. The owner can grant other users certain access privileges to the database. The grant statement is used to assign these access privileges to other users. Particular privileges can be granted to specific users or to all users (public).

The with grant option grants the specified users the right to issue other grant statements on the specified table. The cascade option indicates that the access privilege is to cascade down to the RDM Server core-level access rights settings for the user. This only matters where the specified user(s) will be executing application components that perform core-level access to the SQL database. The restrict option applies only to SQL usage and is the default.
The types of access privileges are defined in the following table.

Table 12-2. Database Access Privilege Definitions

    Privilege                        Description
    all privileges                   Allows user all of the following access privileges on the table.
    select                           Allows user to issue select statements on the table.
    insert                           Allows user to insert rows into the table.
    delete                           Allows user to delete rows from the table.
    update                           Allows user to update any column of any row in the table.
    update (colname[, colname]...)   Allows user to update only the listed columns of any row in the table.
    trigger                          Allows user to create, alter, or drop a trigger on the table.

Note that users who are granted a trigger privilege on a table must also have the create database command privilege.

In the example below, the system administrator or database owner is allowing all users privileges to issue select statements to query invntory database tables. Only users George and Martha have permissions to modify the database.

    grant select on invntory.product to public;
    grant select on invntory.outlet to public;
    grant select on invntory.on_hand to public;
    grant all on invntory.product to George, Martha;
    grant all on invntory.outlet to George, Martha;
    grant all privileges on invntory.on_hand to George, Martha;

The following example illustrates how you can use a view to restrict access to a portion of a database table.

    create view skk_customers as
        select * from customer where sale_id = "SKK"
        with check option;

    grant all privileges on skk_customers to Sidney;

12.2.2 Revoke Table Access Privileges

The revoke statement is used to rescind a user's database table access privileges that had been previously granted. The syntax for the revoke statement is shown below.

revoke:

    revoke item_spec from {public | user_id[, user_id]...}
        [with grant option] [cascade | restrict]

item_spec:

    {privilege[, privilege]...
    | all [privileges]} on [dbname.]tabname

privilege:

    select | delete | insert | update [(colname[, colname]...)] | trigger

The specified privileges can be revoked from all users (public) or from only the users specified in the revoke command. As with grant, the cascade option indicates that the access privilege is to cascade down to the RDM Server core-level access rights settings for the user. This only matters where the specified user(s) will be executing application components that perform core-level access to the SQL database. The restrict option applies only to SQL usage and is the default.

In the example below, the system administrator or owner is revoking George's access privileges for several tables of the invntory database.

    revoke insert, update, delete on product from George;
    revoke insert, update, delete on outlet from George;
    revoke insert, update, delete on on_hand from George;

The next example shows an rsql script in which user Martha loses access to the home_sales view when her select privilege on that view is revoked.

    .c 1 RDS Admin xyzzy
    create view home_sales as
        select sale_name from salesperson where office = "SEA";
    grant select on home_sales to Martha;
    .c 2 RDS Martha HipposAreHip
    select * from home_sales;
    SALE_NAME
    Flores, Bob
    Porter, Greg
    Stouffer, Bill
    Blades, Chris
    .d 2
    .c 1
    revoke select on home_sales from Martha;
    .c 2 RDS Martha HipposAreHip
    select * from home_sales;
    ****RSQL Diagnostic 4200: user access rights violation: home_sales

13. Using SQL in a C Application Program

You use the RDM Server SQL system from a C application program by making calls to the RDM Server SQL application programming interface (API) library functions. The RDM Server SQL API is based on the industry-standard Open Database Connectivity (ODBC) API specification developed by Microsoft.
A complete description of the ODBC standard is available on the Web at http://msdn.microsoft.com/en-us/library/ms710252(v=vs.85).aspx.

SQL statements are dynamically compiled and executed, and result sets are retrieved from the RDM Server, through these function calls. Raima has also included a variety of additional functions in order to support RDM Server-specific capabilities.

The RDM Server ODBC functions allow you to connect to one or more RDM Servers on the network, as depicted below in Figure 13-1. A given client application program can have any number of active connections. You can even have more than one connection from a client to one server. Since each connection has its own individual context, you can simultaneously be processing active statements in multiple connections.

Figure 13-1. RDM Server Client-Server Application Architecture

SQL statements are compiled and executed using different functions. Once compiled, a statement can be repeatedly executed without having to be recompiled. Statements can contain parameter markers that serve as placeholders for constant values that are bound to program variables when the statement is executed. New sets of parameter values are assigned by simply changing the value of the program variable before re-executing the statement.

The set of select statement result rows (the result set) is retrieved a row at a time. The result columns can either be bound to program variables or individually retrieved one column at a time after each result row has been fetched.

Cursors can be defined for a select statement to support positioned updates and deletes. An update or delete statement can refer to the cursor associated with an active select statement to modify a particular row of a table.

Several functions are provided through which you can interrogate RDM Server SQL about the nature of a compiled statement.
For example, you can find out how many columns are in the result set as well as information about each one (such as the column name, type, and length).

RDM Server SQL also provides additional function calls to utilize RDM Server SQL enhancements or to simply provide information not included in the standard ODBC API. For example, RDM Server SQL includes a non-ODBC function that will tell you the type of statement after the statement has been compiled (prepared).

One of the most powerful features of RDM Server SQL is its extensibility, provided through its server-based programming capabilities. The RDM Server SQL API that is used in server-based programming has additional functions to support, for example, User-Defined Function (UDF) and User-Defined Procedure (UDP) implementations as described in Developing SQL Server Extensions.

13.1 Overview of the RDM Server SQL API

This section contains summary descriptions of all of the RDM Server SQL functions. The functions are organized into tables from the following usage categories:

- Connecting to RDM Server database servers
- Setting and retrieving RDM Server SQL options
- Preparing (compiling) SQL statements
- Executing SQL statements
- Retrieving result information and data
- Terminating statements and transactions
- Terminating RDM Server connections
- System catalog access functions
- RDM Server SQL support functions
- ODBC support functions

Each client application program accesses RDM Server SQL client interface functions through handles, which are initially allocated through a call to function SQLAllocHandle. There are four types of handles used in the ODBC API. An environment handle is used to keep track of the RDM Server connections utilized by the client application. Each connection has an associated connection handle that is first allocated and then passed to function SQLConnect to log in to a specified RDM Server SQL server.
All statements that are to be executed on that server are associated with that particular connection handle. A statement handle is used to keep track of all of the information related to the compilation and execution of an SQL statement. A descriptor handle is used for keeping track of information about columns and parameters.

The functions used to establish a connection with an RDM Server are described below.

Table 13-1. Server Connection Functions

    Function Name      Purpose
    SQLAllocHandle     Allocates an environment, connection, statement, or descriptor handle. Only one environment handle is used by each client program. Each environment handle can support multiple connections. A connection handle manages information related to one RDM Server connection. Statement handles manage information related to one RDM Server SQL statement. Descriptor handles are used to hold information about SQL statement parameters and result columns.
    SQLConnect         Connects and logs in to the specified Raima Database Server with the specified user name and password.
    SQLDriverConnect   Connects and logs in to the specified Raima Database Server with the specified user name and password. May prompt the user for further information.
    SQLSessionId       Called by the application to get the RDM Server session id associated with an SQL connection handle. The session id is used with the RDM Server remote procedure call function rpc_emCall to call an RDM Server extension module from a client application.
    SQLConnectWith     Called by an extension module, UDF, or UDP to get the RDM Server SQL connection handle associated with an RDM Server session id.

The RDM Server SQL system supports a variety of runtime operational control options (attributes). These include four levels of multi-user locking and transaction isolation control.
These options can be set for all of the statements executed on a particular connection or for a single statement, and are managed using the following functions.

Table 13-2. SQL Control Attribute Functions

    Function Name       Purpose
    SQLGetEnvAttr       Returns an environment attribute setting.
    SQLSetEnvAttr       Sets an environment attribute.
    SQLGetConnectAttr   Returns a current connection attribute setting.
    SQLSetConnectAttr   Sets a connection attribute.
    SQLSetStmtAttr      Sets a statement attribute.
    SQLGetStmtAttr      Returns a current statement attribute setting.

SQL statements are submitted to an RDM Server SQL server as text strings. As such, they need to be compiled into a form that is suitable for efficient execution. All functions that involve some kind of operation on a specific SQL statement use the same statement handle, which must first be allocated via a call to SQLAllocHandle. The functions that are called before a statement can be executed are listed below in Table 13-3.

Table 13-3. SQL Statement Preparation Functions

    Function Name      Purpose
    SQLAllocHandle     Allocates a statement handle.
    SQLGetCursorName   Returns the cursor name associated with the statement handle.
    SQLSetCursorName   Sets the cursor name for the statement handle.
    SQLPrepare         Prepares an RDM Server SQL statement for execution.
    SQLBindParameter   Binds a client program variable to a particular SQL parameter marker.

Execution of a previously prepared SQL statement is performed through a call to the SQLExecute function. You can both prepare and execute a statement with a single call to the SQLExecDirect function. The execution control functions are listed in Table 13-4.

Table 13-4. Statement Execution Functions

    Function Name      Purpose
    SQLExecute         Executes a previously prepared statement.
    SQLExecDirect      Prepares and executes a statement.
    SQLNumParams       Returns the number of parameter markers in a statement.
    SQLDescribeParam   Returns a description (e.g., data type) associated with a parameter marker.
    SQLParamData       Used along with SQLPutData to provide parameter (usually blob data) values.
    SQLPutData         Assigns a specific value (or the next chunk of a blob) for a parameter.

Much of the work performed by an RDM Server SQL application will be associated with the processing of the data actually retrieved from the database on the server. This work entails making inquiries to RDM Server SQL about the characteristics of a compiled or executed statement, fetching results, and processing errors. The functions used in this regard are summarized below.

Table 13-5. Results Processing Functions

    Function Name      Purpose
    SQLRowCount        Returns the number of rows affected by the last statement (insert, update, or delete).
    SQLNumResultCols   Returns the number of columns in the select statement result set.
    SQLDescribeStmt    Returns the type of statement that is associated with the specified statement handle.
    SQLDescribeCol     Returns a description of a column in the select statement result set.
    SQLColAttribute    Returns additional attribute descriptions of a column in the select statement result set.
    SQLBindCol         Specifies the location of a client program variable into which a column result is to be stored.
    SQLFetch           Retrieves the next row of the select statement result.
    SQLFetchScroll     Retrieves a rowset of select statement result rows.
    SQLSetPos          Sets the cursor position within a static cursor.
    SQLGetData         Returns a column value from the result set.
    SQLMoreResults     Determines if there are more result sets to be processed and, if so, executes the next statement to initialize the result set.
    SQLGetDiagField    Retrieves the current value of a field in the diagnostic record associated with the statement.
    SQLGetDiagRec      Retrieves the current value of the diagnostic record associated with the statement.
    SQLWhenever        Registers the address of a function in the client program that is to be called by RDM Server SQL whenever the specified error occurs.
    SQLError           Returns error or status information.

The processing of a statement is terminated using several functions, depending on the desired results. Database modification statements (that is, insert, update, or delete) are terminated by either committing or rolling back the changes made during a transaction. When you have finished your use of a statement handle, you should free the handle so that the system can free all of the memory associated with it. The functions that perform these operations are described below.

Table 13-6. Statement Termination Functions

    Function Name    Purpose
    SQLFreeStmt      Ends statement processing, closes the associated cursor, and discards pending results.
    SQLCloseCursor   Closes the cursor on the statement handle.
    SQLFreeHandle    Frees the statement handle and all resources associated with the statement handle.
    SQLCancel        Cancels an SQL statement.
    SQLEndTran       Commits or rolls back a transaction.

A client application program ends by disconnecting from all servers that it is connected to and then freeing the connection handles and the environment handle using the following functions.

Table 13-7. Connection Termination Functions

    Function Name   Purpose
    SQLDisconnect   Closes the connection.
    SQLFreeHandle   Frees the connection or environment handle.

Several functions are provided which allow an SQL application to retrieve database definition information from the SQL system catalog. These functions each automatically execute a system-defined select statement or stored procedure that returns a result set that can be accessed using SQLFetch.

Table 13-8. Catalog Access Functions

    Function Name    Purpose
    SQLTables        Retrieves a result set of table definitions.
    SQLColumns       Retrieves a result set of column definitions.
    SQLForeignKeys   Retrieves information about a table's foreign key columns.
    SQLPrimaryKeys   Retrieves information about a table's primary key columns.
    SQLSpecialColumns   Retrieves a result set of columns that optimally access table rows.
    SQLProcedures       Retrieves a result set of available stored procedures.
    SQLStatistics       Retrieves a result set of statistics about a table and/or indexes.

The RDM Server SQL support functions are provided to facilitate use of the direct access capabilities of RDM Server. These functions assist in retrieving rowid values that are automatically assigned by RDM Server SQL, as well as allowing SQL programs to easily utilize the low-level RDM Server (Core API) function calls when necessary.

Table 13-9. RDM Server SQL Support Functions

    Function Name   Purpose
    SQLRowId        Returns the rowid of the current row.
    SQLRowDba       Returns the RDM Server database address of the current row.
    SQLDBHandle     Returns the RDM Server database handle for an open SQL database.
    SQLRowIdToDba   Converts an SQL rowid to an RDM Server database address.
    SQLDbaToRowId   Converts an RDM Server database address to an SQL rowid.

With the Microsoft ODBC specification, third-party front-end tool vendors can call functions that provide information describing the capabilities that are supported by a back-end database. SQLGetInfo, SQLGetTypeInfo, and SQLGetFunctions can be called to discover the ODBC features that are supported in RDM Server SQL.

Table 13-10. ODBC Support Functions

    Function Name     Purpose
    SQLNativeSql      Translates an ODBC SQL statement into RDM Server SQL.
    SQLGetFunctions   Retrieves information about RDM Server SQL-supported functions.
    SQLGetInfo        Retrieves information about RDM Server SQL-supported ODBC capabilities.
    SQLGetTypeInfo    Retrieves a result set of RDM Server SQL data types.

13.2 Programming Guidelines

This section gives an overview of the calling sequences for accessing the RDM Server SQL server through the RDM Server SQL C API. Most of the function calls must be made in a particular sequence. In most cases, the sequence is quite natural.
The guidelines given below illustrate the calling sequences for several of the standard types of operations. Actual programming examples are provided in subsequent sections.

Figure 13-2 shows the sequence of calls required to connect to a particular RDM Server. The first call to SQLAllocHandle allocates an environment handle, which is then passed to the next call to SQLAllocHandle to allocate a connection handle. The connection handle is passed to SQLConnect, which in turn connects to the specified RDM Server. All of the activity associated with a particular connection is identified by the connection handle. RDM Server SQL allows an application to open any number of connections to any number of RDM Server systems.

Figure 13-2. Connecting to RDM Server Flow Chart

Function SQLDisconnect will close (log out from) the RDM Server connection. An error is returned if any uncommitted transactions are pending on the connection. Any active statements are automatically freed for the specified connection; even so, before calling SQLDisconnect you should explicitly close (SQLFreeStmt) all active statements for a particular connection. The connection handle can be reused in another SQLConnect or released by a call to SQLFreeHandle. When all connections have been closed and freed, a final call to SQLFreeHandle is made to free the environment handle.

A flow chart that gives a typical sequence of calls for processing a select statement is shown in Figure 13-3.
A statement is associated with a statement handle allocated by the call to SQLAllocHandle. Any number of statement handles (i.e., separate SQL statements) can be active at a time for a given connection. A statement handle is analogous to a cursor when the SQL statement associated with it is a select statement.

Function SQLPrepare is used to compile (but not execute) an SQL statement. If SQLPrepare is successful, you can call functions SQLNumResultCols and SQLDescribeCol to get information about the result columns, such as the column name, data type, and length. This information is used so that the appropriate host variables can be set up (through calls to SQLBindCol) to hold the column values for each result row.

Figure 13-3. Select Statement Processing Flow Chart

SQL statements can have embedded parameter markers. A parameter marker is specified by a '?' in a position that would normally take a literal constant. The host variable for each parameter value must be specified by a call to SQLBindParameter before the statement is executed by SQLExecute.

Each row of the result set is retrieved one at a time through the call to SQLFetch. When all rows have been fetched, SQLCloseCursor is called to terminate processing of the select statement. In this example, the handle is then freed by the call to SQLFreeHandle, so it can no longer be used. Alternatively, the statement could just be closed, terminating the current select statement execution but still allowing the statement to be re-executed at a later time.

Notice in this example that statement compilation and execution are performed by separate functions. This allows the same statement to be executed multiple times without having to recompile it. For example, you might specify a different set of parameter values for each subsequent execution.
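This prepare-once, execute-many pattern with '?' parameter markers is common to most database APIs, not just the ODBC C interface. As a runnable analogy (Python's built-in sqlite3 module with a made-up table, not the RDM Server API), re-executing one compiled statement with a new parameter value each time looks like this:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE outlet (loc_id TEXT, state TEXT)")
conn.executemany("INSERT INTO outlet VALUES (?, ?)",
                 [("SEA", "WA"), ("DEN", "CO"), ("SAL", "MA")])

# One statement text with a '?' parameter marker. sqlite3 caches the compiled
# statement, so each execute() below re-runs the same compiled query with a
# new parameter value -- the same idea as binding a host variable with
# SQLBindParameter and then calling SQLExecute repeatedly.
query = "SELECT loc_id FROM outlet WHERE state = ?"
found = [conn.execute(query, (state,)).fetchone()[0] for state in ("WA", "CO")]
print(found)   # ['SEA', 'DEN']
```

The key point carried over from the ODBC flow is that only the parameter value changes between executions; the statement itself is compiled once.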
The flow chart shown below shows a modified segment of the prior flow chart to indicate how this is done.

Figure 13-4. Select Statement Re-Execution Flow Chart

Figure 13-5 gives a flow chart showing a sequence of calls that perform a positioned update. A positioned update statement involves the use of two statement handles. The one associated with the select statement is the cursor. The update statement is executed through the other statement handle once the cursor has been positioned to the desired row.

The particular cursor on which the update is performed can be specified in two ways. In this example, function SQLSetCursorName is called to specify a user-defined cursor name. That name would then need to be referenced in the where current of clause in the update statement text compiled by the call to SQLPrepare. Alternatively, function SQLGetCursorName could be called to retrieve a system-generated cursor name, which would need to be incorporated into the update statement string prior to the call to SQLPrepare used to compile it.

When all updates have been completed, function SQLEndTran is called to commit the changed rows to the database. This could also be done by a call to SQLExecDirect to compile and execute a commit statement.

Figure 13-5. Positioned Update Flow Chart

13.3 ODBC API Usage Elements

This section describes the basic elements that are used by the RDM Server SQL ODBC API functions. Included are descriptions of the standard header files and the data type and constant definitions contained in those header files that are used in the function calls for argument types, indicator and descriptor variables, and status return codes.

13.3.1 Header Files

Your RDM Server SQL C application must include at least one of the three standard header files described below in Table 13-11. The files can be found in the RDM Server include directory.
RDM Server SQL Header Files

  File      Description
  sql.h     Standard ODBC Core-level header file. Includes prototypes, data and
            constant definitions for the ODBC Core-level functions.
            Automatically included by sqlext.h.
  sqlext.h  Microsoft ODBC levels 1 and 2 extensions header file. Includes
            prototypes and data definitions for the ODBC level 1 and 2
            functions. Automatically included by sqlrds.h.
  sqlrds.h  RDM Server SQL main header file. Includes prototypes and data
            definitions for all functions used with RDM Server SQL.

Inclusion of sqlrds.h in your application provides access to all RDM Server SQL capabilities. You can include sql.h with your application to ensure its conformance to only the ODBC Core-level specification. Include sqlext.h to ensure conformance to the full ODBC specification. The sqlext.h file automatically includes sql.h, and the sqlrds.h file automatically includes sqlext.h.

The sql.h and sqlext.h files in the RDM Server include directory are our own developed versions of the files of the same names that are part of the Microsoft ODBC SDK.

13.3.2 Data Types

The SQL API uses a special set of type definitions. Rather than relying on the base types defined in the ANSI C programming language, ODBC defines data types that map onto the standard C data types, and the function arguments are specified using these ODBC-defined types. The application variables that you pass to the functions must be declared with the proper data type. Types defined as either int64 or int32 are int64 on a 64-bit RDM Server installation and int32 otherwise. There are also some RDM Server-specific data types that are not included in the table. These are described in the SQL API Reference Manual in the descriptions of the functions that use them.

Table 13-12. SQL API Data Type Descriptions

  Type Name         Description
  SQLHANDLE         Generic handle.
                    Can be any one of the four handle types (SQLHENV, SQLHDBC,
                    SQLHSTMT, or SQLHDESC): "void *".
  SQLHENV           Environment handle.
  SQLHDBC           Connection handle.
  SQLHSTMT          Statement handle.
  SQLHDESC          Descriptor handle.
  SQLPOINTER        Generic pointer variable: "void *".
  SQLLEN            Signed buffer/string length variable: int64 or int32.
  SQLULEN           Unsigned buffer/string length variable: int64 or int32.
  SQLCHAR           Standard character: unsigned char.
  SQLWCHAR          Wide character: usually wchar_t.
  SQLSMALLINT       int16.
  SQLUSMALLINT      uint16.
  SQLINTEGER        int32.
  SQLUINTEGER       uint32.
  SQLBIGINT         int64.
  SQLUBIGINT        uint64.
  SQLREAL           float.
  SQLFLOAT          double.
  SQLDOUBLE         double.
  SQLDECIMAL        unsigned char (byte array).
  SQLNUMERIC        unsigned char (byte array).
  SQLDATE           unsigned char (string).
  SQLTIME           unsigned char (string).
  SQLTIMESTAMP      unsigned char (string).
  SQLVARCHAR        unsigned char (string).
  SQLRETURN         Function return code: int16.
  DATE_STRUCT,
  SQL_DATE_STRUCT   Unpacked date struct.
  TIME_STRUCT,
  SQL_TIME_STRUCT   Unpacked time struct.
  TIMESTAMP_STRUCT,
  SQL_TIMESTAMP_STRUCT  Unpacked timestamp struct.

13.3.3 Use of Handles

The RDM Server SQL application uses several ODBC-defined handles. Introduced in ODBC 3, the SQLAllocHandle function is used to allocate all of the handles. An environment handle (SQLHENV type) is allocated by passing SQL_HANDLE_ENV as the handle type. Although only one environment handle is required, more may be allocated if needed. Before executing any other ODBC function that uses the environment handle, SQLSetEnvAttr should be called with SQL_ATTR_ODBC_VERSION to set the version of ODBC that the application will use.

A connection handle (SQLHDBC type) is allocated by passing SQL_HANDLE_DBC as the handle type. The application can open any number of connections to any number of RDM Servers, with each connection referenced through a separate connection handle.

A statement handle (SQLHSTMT type) is allocated by passing SQL_HANDLE_STMT as the handle type.
There is no restriction on the number of RDM Server SQL statement handles your application can use. However, to conserve server memory, it is good practice to keep the number of active statement handles to a minimum. Lastly, a descriptor handle (SQLHDESC type) is allocated by passing SQL_HANDLE_DESC as the handle type.

When your RDM Server SQL application needs to call a Core API function or a server-side extension module, it must use the Core session handle (RDM_SESS type) associated with the active server connection handle. (Each connection corresponds to a single RDM Server login session.) The application calls SQLSessionId to retrieve the session handle. For a particular server connection, your RDM Server SQL application might also need to call SQLDBHandle to obtain the database handle (RDM_DB type) that RDM Server uses for a Core database. With this handle, the application can use the runtime API (bypassing RDM Server SQL) to access database information.

13.3.4 Buffer Arguments

The usage rules for passing buffer arguments are as follows:

- Each RDM Server function argument pointing to a string or data buffer has an associated length argument.
- An input length argument contains the actual length of a string or buffer. If the application specifies a length value of SQL_NTS (an ODBC-specified negative constant meaning "null-terminated string"), the pointer must address a null-terminated string.
- A length greater than or equal to zero implies the input string is not null-terminated (for example, if your application is written in Pascal).
- Two length arguments are used for an output buffer. The first argument provides the size (in bytes) of the output buffer. The second argument is a pointer to a variable in which RDM Server returns the number of bytes actually written to the buffer.
- If the value is null, the function sets the result length to the ODBC negative constant SQL_NULL_DATA.
All RDM Server API functions that fill output buffers with character data write null-terminated strings. The buffers your application provides to receive this data must be long enough to hold the terminating null byte. A null pointer can be passed for the result output argument as long as the database schema does not allow a null value for the result field. If the schema does allow a null value, ODBC 3.51 requires that the result output argument be supplied; if it is not, errNOINDVAR is returned.

13.4 SQL C Application Development

13.4.1 RDM Server SQL and ODBC

RDM Server includes an ODBC driver and related files so that RDM Server can be accessed by Microsoft Windows applications through the ODBC Driver Manager (DM). The driver is installed through the instodbc utility for Microsoft Windows. Connecting to and using a driver through the ODBC DM is described in detail in the Microsoft ODBC manual. Details of driver installation are given where applicable in Installing the Server Software and Installing RDM Server Client Software, in the RDM Server Installation/Administration Guide.

Note that your application does not need to operate with the ODBC DM if RDM Server SQL is the only type of data source used. Since the ODBC API is the native API for RDM Server SQL, you can simply link an ODBC-compliant application directly to the RDM Server SQL API in the client library, eliminating the overhead of the ODBC Driver Manager.

13.4.2 Connecting to RDM Server

An application uses the following basic steps to establish an RDM Server connection:

1. Call SQLAllocHandle to allocate the environment handle for the program.
2. Call SQLAllocHandle to allocate a connection handle.
3. Call SQLConnect to connect the user to a specific RDM Server.

Perform the following basic steps to end your RDM Server session:

1. Call SQLDisconnect to terminate the client connection to the server.
2.
Call SQLFreeHandle to free the connection handle.
3. Call SQLFreeHandle to free the environment handle.

The example below illustrates the use of these function calls. Note that the type definitions for the environment and connection handles are declared in the standard header file sql.h. Also note the use of the constant SQL_NTS for the length arguments in the call to SQLConnect to indicate that each of the character arguments is a standard C null-terminated string.

    #include "sql.h"

    char user[15];     /* user name */
    char pw[15];       /* password */
    SQLHENV eh;        /* environment handle */
    SQLHDBC ch;        /* connection handle */
    SQLHSTMT sh;       /* statement handle */
    SQLRETURN stat;    /* return status */
    ...
    SQLAllocHandle(SQL_HANDLE_ENV, NULL, &eh);
    SQLAllocHandle(SQL_HANDLE_DBC, eh, &ch);

    /* fetch user name and password */
    ClientLogin(user, pw);

    /* connect to MIS server */
    stat = SQLConnect(ch, "MISserver", SQL_NTS, user, SQL_NTS, pw, SQL_NTS);
    if (stat != SQL_SUCCESS)
        return ErrHandler();
    ...
    /* run MIS application */

    SQLDisconnect(ch);
    SQLFreeHandle(SQL_HANDLE_DBC, ch);
    SQLFreeHandle(SQL_HANDLE_ENV, eh);

You can establish connections to any RDM Server system that is available on the network. RDM Server SQL maintains each connection in a separate task context. You can even have multiple connections to the same RDM Server system where, for example, you may need separate task contexts on the same server.

13.4.3 Basic SQL Statement Processing

An application often uses the following basic steps in processing RDM Server SQL statements:

- Calls SQLAllocHandle (with SQL_HANDLE_STMT) to allocate a statement handle. This statement handle is used in all the following steps.
- Calls SQLPrepare to compile the statement.
- If processing a select statement, calls SQLBindCol to bind column results to host program variables.
- Calls SQLBindParameter, if necessary, to associate a host variable with a parameter referenced in the statement.
- Calls SQLExecute to execute the statement. This is usually the end of processing for any statement other than select.
- For select statements, calls SQLFetch or SQLFetchScroll to retrieve the result set.
- When finished with the handle, calls SQLFreeHandle (with SQL_HANDLE_STMT) to free it. If reusing the handle, calls SQLCancel or SQLFreeStmt with the SQL_CLOSE option. Alternatively, calls SQLCloseCursor to close an open cursor so the handle can be reused.

An application can call SQLExecDirect instead of making separate calls to SQLPrepare and SQLExecute, but only when a single call to SQLExecute would be needed.

The following example shows a simple statement execution sequence that opens the example sales and invntory databases. It consists of a call to SQLAllocHandle to allocate a statement handle, a call to SQLExecDirect to compile and execute the open statement, and a call to SQLFreeHandle to drop the statement handle. The OpenDatabases function assumes that the server connection handle is valid. If it is not valid, SQLAllocHandle returns the SQL_INVALID_HANDLE error code.

    #include "sqlext.h"

    SQLRETURN OpenDatabases(SQLHANDLE dbc)
    {
        SQLRETURN stat;
        SQLHANDLE hstmt;

        if ((stat = SQLAllocHandle(SQL_HANDLE_STMT, dbc, &hstmt)) == SQL_SUCCESS) {
            stat = SQLExecDirect(hstmt, "open sales, invntory", SQL_NTS);
            SQLFreeHandle(SQL_HANDLE_STMT, hstmt);
        }
        return stat;
    }

13.4.4 Using Parameter Markers

To save processing time, your application can compile a statement once, by calling SQLPrepare, and then execute the statement multiple times with calls to SQLExecute. If the statement has embedded parameter markers ("?"), different values can be substituted for these parameters before each execution. The application calls the SQLBindParameter function before statement execution to associate host program variables with parameter markers.
Each time the application calls SQLExecute for the statement, the current values of the bound variables are substituted for the parameter markers.

The next example uses parameter markers with an insert statement to insert rows into the product table from data input by a user. The input is gathered by the local function GetValues, which sets the bound variables to the appropriate values. The SQLExecute call then executes the insert statement using the current values in the variables. In this example, note that the call to the SQLGetDiagRec function retrieves the sqlstate code and error message in the event that SQLExecute returns an error. The example also illustrates the use of SQLEndTran to commit or roll back database changes based on the occurrence of an error.

    #include <stdio.h>
    #include <stdlib.h>
    #include "sqlext.h"

    char insert[] =
        "insert into product(prod_id, prod_desc, price, cost) values(?,?,?,?)";

    SQLHANDLE henv;
    SQLHANDLE hdbc;
    SQLHANDLE hstmt;
    SQLHANDLE error_handle;
    int16     prod_id;
    char      prod_desc[40];
    double    price, cost;
    int16     handle_type;

    int main(void)
    {
        FILE *txtFile;
        SQLRETURN stat;
        char sqlstate[6], emsg[80];
        char user[15], pw[8];
        int32 lineno = 0;
        SQLUSMALLINT txtype;

        if ((txtFile = fopen("product.txt", "r")) == NULL) {
            printf("unable to open file\n");
            exit(1);
        }

        SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &henv);
        SQLSetEnvAttr(henv, SQL_ATTR_ODBC_VERSION, (SQLPOINTER) SQL_OV_ODBC3,
                      SQL_IS_INTEGER);
        SQLAllocHandle(SQL_HANDLE_DBC, henv, &hdbc);

        /* fetch user name and password */
        ClientLogin(user, pw);

        /* connect to MIS server */
        if ((stat = SQLConnect(hdbc, "MIS", SQL_NTS, user, SQL_NTS,
                pw, SQL_NTS)) != SQL_SUCCESS) {
            handle_type = SQL_HANDLE_DBC;
            error_handle = hdbc;
            goto quit;
        }
        SQLAllocHandle(SQL_HANDLE_STMT, hdbc, &hstmt);
        SQLPrepare(hstmt, insert, SQL_NTS);
        SQLBindParameter(hstmt, 1, SQL_PARAM_INPUT, SQL_C_SHORT, SQL_SMALLINT,
                         0, 0, &prod_id, 0, NULL);
        SQLBindParameter(hstmt, 2, SQL_PARAM_INPUT,
                         SQL_C_CHAR, SQL_CHAR, 40, 0, prod_desc, 40, NULL);
        SQLBindParameter(hstmt, 3, SQL_PARAM_INPUT, SQL_C_DOUBLE, SQL_FLOAT,
                         0, 0, &price, 0, NULL);
        SQLBindParameter(hstmt, 4, SQL_PARAM_INPUT, SQL_C_DOUBLE, SQL_FLOAT,
                         0, 0, &cost, 0, NULL);

        handle_type = SQL_HANDLE_STMT;
        error_handle = hstmt;
        while (GetValues(&prod_id, prod_desc, &price, &cost)) {
            ++lineno;
            if ((stat = SQLExecute(hstmt)) != SQL_SUCCESS)
                break;
        }

    quit:
        if (stat == SQL_SUCCESS)
            txtype = SQL_COMMIT;
        else {
            SQLGetDiagRec(handle_type, error_handle, 1, sqlstate, NULL,
                          emsg, 80, NULL);
            printf("***Line %d - ERROR(%s): %s\n", lineno, sqlstate, emsg);
            txtype = SQL_ROLLBACK;
        }
        /* commit or roll back transaction */
        SQLEndTran(SQL_HANDLE_DBC, hdbc, txtype);
        return 0;
    }

13.4.4 Premature Statement Termination

Your application terminates a database modification statement (insert, update, or delete) by either committing or rolling back the changes made during the transaction. When the application finishes using a statement handle, it should free the handle so that RDM Server can free all associated memory. The application terminates processing of a select statement by a call to SQLFreeStmt with the SQL_CLOSE option, SQLCloseCursor, or SQLCancel (see the following example). Any result rows that the application has not fetched are discarded at this time.

    #include "sqlext.h"
    ...
    /* Print all rows of a table */
    SQLRETURN PrintTable(
        SQLHDBC svr,       /* server connection handle */
        char *tabname)     /* name of table whose rows are to be printed */
    {
        char stmt[80];
        SQLHSTMT sh;
        ...
        /* set up and compile select statement */
        sprintf(stmt, "select * from %s", tabname);
        SQLAllocHandle(SQL_HANDLE_STMT, svr, &sh);
        if (SQLExecDirect(sh, (SQLCHAR *)stmt, SQL_NTS) != SQL_SUCCESS)
            return ErrHandler();
        ...
        /* print all rows in table */
        while (SQLFetch(sh) == SQL_SUCCESS) {
            ...
            if (cancelled_by_user)
                SQLCancel(sh);   /* or, SQLFreeStmt(sh, SQL_CLOSE); */
            ...
        }
        return SQL_SUCCESS;
    }

13.4.5 Retrieving Date/Time Values

Your application can access and manipulate RDM Server SQL-specific date and time values at the runtime (d_) level with the VAL functions found in the RDM Server SQL API. These functions enable you to translate a date or time value from its native packed format (the *_VAL types) into the ODBC format (the *_STRUCT types), or vice versa. To use the date or time manipulation functions, your application must include the sqlrds.h file. Error codes that can be returned by these functions are defined in the valerrs.h header file, which is automatically included when you include sqlrds.h. These files are found in the RDM Server include directory.

Caution: Records changed via the RDM Server runtime API bypass SQL constraint checking. Therefore, if your application defines column constraints, it must validate the values before writing to the database.

Note that the *_VAL data types (DATE_VAL, TIME_VAL, etc.) are always associated with the internal storage format, while the *_STRUCT data types (DATE_STRUCT, etc.) are standard ODBC types. Both sets of types are defined in the RDM Server Reference Manual. Note also that the structure definitions are not currently produced by the ddlproc schema compiler; your application must declare them.

13.4.6 Retrieving Date and Time Data

RDM Server SQL provides support for the ODBC date, time, and timestamp data types. Database columns of those types can be returned in struct variables of type DATE_STRUCT, TIME_STRUCT, or TIMESTAMP_STRUCT. These structure types are declared in sqlext.h as shown below.
    typedef struct tagDATE_STRUCT {
        SQLSMALLINT  year;     /* year (>= 1 A.D., for example, 1993) */
        SQLUSMALLINT month;    /* month number: 1 to 12 */
        SQLUSMALLINT day;      /* day of month: 1 to 31 */
    } DATE_STRUCT;

    typedef struct tagTIME_STRUCT {
        SQLUSMALLINT hour;     /* hour of day: 0 to 23 */
        SQLUSMALLINT minute;   /* minute of hour: 0 to 59 */
        SQLUSMALLINT second;   /* second of minute: 0 to 59 */
    } TIME_STRUCT;

    typedef struct tagTIMESTAMP_STRUCT {
        SQLSMALLINT  year;     /* year (>= 1 A.D., for example, 1993) */
        SQLUSMALLINT month;    /* month number: 1 to 12 */
        SQLUSMALLINT day;      /* day of month: 1 to 31 */
        SQLUSMALLINT hour;     /* hour of day: 0 to 23 */
        SQLUSMALLINT minute;   /* minute of hour: 0 to 59 */
        SQLUSMALLINT second;   /* second of minute: 0 to 59 */
        SQLUINTEGER  fraction; /* billionths of a second: 0 to 999,900,000
                                  (RDM Server SQL accurate to 4 places only) */
    } TIMESTAMP_STRUCT;

    typedef DATE_STRUCT      SQL_DATE_STRUCT;
    typedef TIME_STRUCT      SQL_TIME_STRUCT;
    typedef TIMESTAMP_STRUCT SQL_TIMESTAMP_STRUCT;

The DATE_STRUCT name was changed in ODBC 3 to SQL_DATE_STRUCT, but either name can be used.

Use of date and time data is shown in the example below, which prints the year-to-date sales orders for a particular customer. In this example, SQLBindCol is called to request the column values in their native data types.

13.4.7 Retrieving Decimal Data

The RDM Server SQL support module stores decimal values in a proprietary BCD format. RDM Server provides a library of functions (BCD- prefixed) that your application can call to manipulate values stored in this format. The functions allow the application to convert between a string representation of a decimal value (for example, "123.4567") and the internal RDM Server BCD format, as well as to perform all of the usual decimal arithmetic.
To call a decimal manipulation function, your application must first allocate a BCD environment handle, specifying the maximum precision and scale for the values you will manipulate, as shown in the code example below. The application passes this handle to any of the decimal manipulation functions that it calls. The application can set any BCD value needed. However, if the application must store BCD values directly in the database, the maximum precision and scale used must be identical to those specified by the RDM Server SQL support module. The following code fragment shows how your application can determine from the syscat (system catalog) database what the system values for these parameters are.

    ...
    int16 maxprecision, maxscale, bcd_len;
    char *bcd_buf;
    BCD_HENV hBcd;

    /* determine the max precision and scale on the server */
    SQLExecDirect(hStmt, "select maxprecision, maxscale from sysparms", SQL_NTS);
    SQLBindCol(hStmt, 1, SQL_SMALLINT, &maxprecision, sizeof(int16), NULL);
    SQLBindCol(hStmt, 2, SQL_SMALLINT, &maxscale, sizeof(int16), NULL);
    SQLFetch(hStmt);
    SQLFreeStmt(hStmt, SQL_CLOSE);
    printf("max precision = %hd, max scale = %hd\n", maxprecision, maxscale);

    /* allocate a BCD environment corresponding to the server configuration */
    BCDAllocEnv((unsigned char)maxprecision, (unsigned char)maxscale, &hBcd);

    /* allocate a buffer to contain the decimal string */
    bcd_len = maxprecision + 3;   /* sign, decimal point, and null byte */
    bcd_buf = malloc(bcd_len);
    ...

13.4.8 Status and Error Handling

RDM Server returns to your RDM Server SQL application the codes and messages described in the Return Codes and Error Messages section. RDM Server SQL API return code constants are defined in sql.h; these return codes are prefixed with "SQL_". Each RDM Server SQL API function returns a code indicating the success or failure of the operation. If an error occurs, your application must call SQLGetDiagField or SQLGetDiagRec for details about the error.
The RDM Server SQL API provides a nonstandard function called SQLSetErrorFcn that your application can call to specify its own error handler. SQLSetErrorFcn can be called any number of times to specify the same or different handlers for different handles or error codes.

Your application error handler is called by any RDM Server SQL API function that produces an error. The following is the prototype for the application error handler function (from the sqlrds.h file).

    int32 REXTERNAL ErrorHandler(int16 handleType, SQLHANDLE handle, int32 code)

where:

    handleType  (input)  Specifies the type of the input handle.
    handle      (input)  Specifies the input handle.
    code        (input)  Specifies the status/error code.

The calling RDM Server SQL function passes the appropriate handle type and handle for the given error into ErrorHandler. For example, if SQLExecute detects an error, it passes SQL_HANDLE_STMT as the handle type along with the statement handle associated with the error. The RDM Server SQL API function also provides the status or error code to the ErrorHandler function; the error handler can then call SQLGetDiagField or SQLGetDiagRec to retrieve detailed information about the status or error. The return value from ErrorHandler becomes the status code returned by the originally called RDM Server SQL API function. Normally, the return value is simply equal to the value of the code parameter.

The following example shows the simplest use of the SQLSetErrorFcn and ErrorHandler functions. The call to SQLSetErrorFcn passes a valid connection handle and SQL_ERROR as the error code. This causes the error handler to be called automatically by RDM Server SQL API functions for all errors associated with the specified connection, including functions that reference statement handles allocated from the connection.
    #include <stdio.h>
    #include "sqlrds.h"    /* SQLSetErrorFcn is a Birdstep extension */

    int32 REXTERNAL ErrHandler(
        int16 hType,
        SQLHANDLE handle,
        int32 code)
    {
        SQLUINTEGER rsqlcode;
        SQLCHAR buf[80], sqlstate[6];

        SQLGetDiagRec(hType, handle, 1, sqlstate, &rsqlcode, buf, 80, NULL);
        printf("****RSQL Error %ld: %s\n", rsqlcode, buf);
        return code;
    }
    ...
    SQLSetErrorFcn(SQL_HANDLE_DBC, hdbc, SQL_ERROR, ErrHandler);
    ...

The next example shows how your application can specify separate error handlers for different errors. The first call to SQLSetErrorFcn registers the standard error handler (ErrHandler) from the previous example. The second call to SQLSetErrorFcn registers a handler for a specific error code, errINVCONVERT. When an error occurs, the RDM Server SQL support module checks whether a handler is registered for the associated error code and statement handle. If not, it checks for a handler for the code and the connection handle. If a handler is still not found, the support module checks for a handler for the return code (for example, SQL_ERROR) corresponding to the statement handle or connection handle.

    #include <stdio.h>
    #include "sqlrds.h"

    /* ================================================================
        Invalid data type conversion
    */
    int32 REXTERNAL BadConvert(
        int16 hType,
        SQLHANDLE handle,
        int32 code)
    {
        /* My error message is better */
        printf("**** A type conversion specified in SQLBindCol ");
        printf("or SQLBindParameter call is not valid.\n");
        return code;
    }
    ...
    /* Register standard error handler */
    SQLSetErrorFcn(hType, handle, SQL_ERROR, ErrHandler);

    /* Register invalid conversion handler */
    SQLSetErrorFcn(hType, handle, errINVCONVERT, BadConvert);
    ...

In this example, an errINVCONVERT error on any statement handle associated with the server connection handle (hdbc) will result in a call to BadConvert.
For any other error on that connection, the ErrHandler function is called.

Caution: This particular case was created only to illustrate the use of separate error handlers. You should not define a separate handler for each error code just to output a more readable error message. It is far more efficient to use a lookup table in a single error handler.

13.4.9 Select Statement Processing

The application associates a select statement with a statement handle allocated by a call to SQLAllocHandle. After allocating the statement handle, the application calls SQLPrepare to compile (but not execute) the statement. When compilation is successful, the application can call SQLDescribeCol and SQLNumResultCols to get information about the result columns, such as the column name and data type. This information can be used in the calls to SQLBindCol that set up host variables to hold the column values for each result row.

Before statement execution, your application can call the SQLBindParameter function to associate host program variables with parameter markers. These markers are placeholders for constant values in the SQL statement and are specified with a question mark (?). The application then calls SQLExecute to run the select statement. During execution, the values from the host program variables are substituted for the parameter markers. The following statement illustrates how parameter markers are used.

    select company, ord_num, ord_date, amount
    from customer, sales_order
    where customer.cust_id = sales_order.cust_id and ord_date = ?;

If you need to execute the same select statement multiple times without recompiling it, use separate calls to SQLPrepare and SQLExecute.

When statement execution is complete, your RDM Server SQL application calls SQLFetch to retrieve the rows of the result set, one at a time.
When all rows have been fetched, SQLFreeHandle can be called to free the select statement handle and drop it so it can no longer be used. Alternatively, the application can call SQLCancel to close the handle, terminating statement execution but still allowing the statement to be re-executed at a later time. It can also call SQLFetchScroll to retrieve multiple rows in a single call.

The following example illustrates the processing of a select statement summarizing the year-to-date total sales for each salesperson in the sales database. The SalesSummary function is called with an open connection handle to the server containing the database. This function allocates its own statement handle and calls SQLExecDirect to compile and execute the select statement. Calls to SQLBindCol bind the two column results to the character buffers sale_name and amount. These calls pass the buffer size as well as SQL_C_CHAR, indicating that the result is to be converted to a character string. Note that a buffer size of SQL_NTS is invalid for these calls, since the buffers are for output only. The last parameter passed to SQLBindCol is the address of an integer (SQLLEN) variable to contain the output result length. For both calls in this example this parameter is NULL, indicating that the application does not need the result length.

This example retrieves each row of the result set by calling SQLFetch. Each call retrieves the next row and stores the column results in the program locations specified in the SQLBindCol calls.
    #include "sql.h"

    char stmt[] =
        "select sale_name, sum(amount) from salesperson, customer, sales_order "
        "where salesperson.sale_id = customer.sale_id "
        "and customer.cust_id = sales_order.cust_id "
        "group by sale_name";

    SQLRETURN SalesSummary(
        SQLHDBC hdbc)   /* connection handle to sales database server */
    {
        char sale_name[31];   /* salesperson name */
        char amount[20];      /* formatted sales order amount */
        SQLHSTMT sh;          /* statement handle */
        SQLRETURN stat;       /* SQL status code */

        if ((stat = SQLAllocHandle(SQL_HANDLE_STMT, hdbc, &sh)) != SQL_SUCCESS)
            return stat;

        if ((stat = SQLExecDirect(sh, stmt, SQL_NTS)) == SQL_SUCCESS) {
            SQLBindCol(sh, 1, SQL_C_CHAR, sale_name, 31, NULL);
            SQLBindCol(sh, 2, SQL_C_CHAR, amount, 20, NULL);
            while ((stat = SQLFetch(sh)) == SQL_SUCCESS)
                printf("Acct manager %s has a total of $%s in orders\n",
                       sale_name, amount);
        }
        return stat;
    }

In the next example, the PrintTable function outputs all columns and rows contained in the specified table. Unlike the prior example, which has a fixed number of columns in the result set, this example can have a varying number of result columns. Thus the code calls SQLNumResultCols to get the number of columns in the result set. An array of column result descriptors (cols) is allocated to contain the definition and result information for each column. Function SQLDescribeCol is called to retrieve the name, type, and display size of each result set column. Each result value buffer is dynamically allocated and bound to its result column through the call to SQLBindCol.

As each row is retrieved by SQLFetch, each column value and its result length are stored in the COL_RESULT container for that column. The length of a null column value is returned as SQL_NULL_DATA; in this case (see the following example), the program displays NULL.
    #include "sql.h"

    /* result container */
    typedef struct {
        SQLCHAR name[33];    /* column name */
        void *value;         /* column value */
        SQLSMALLINT type;    /* column type */
        SQLLEN len;          /* result value length */
    } COL_RESULT;

    /* Print all rows of a table */
    SQLRETURN PrintTable(
        SQLHDBC svr,      /* server connection handle */
        char *tabname)    /* name of table whose rows are to be printed */
    {
        char stmt[80];
        SQLHSTMT sh;
        SQLSMALLINT tot_cols;
        COL_RESULT *cols;
        SQLUINTEGER size;
        SQLSMALLINT i;
        int32 row;

        /* set up and compile select statement */
        sprintf(stmt, "select * from %s", tabname);
        SQLAllocHandle(SQL_HANDLE_STMT, svr, &sh);
        if (SQLExecDirect(sh, stmt, SQL_NTS) != SQL_SUCCESS)
            return ErrHandler();

        /* allocate column results container */
        SQLNumResultCols(sh, &tot_cols);
        cols = (COL_RESULT *)calloc(tot_cols, sizeof(COL_RESULT));

        /* fetch column names and bind column results */
        for (i = 0; i < tot_cols; ++i) {
            SQLDescribeCol(sh, i+1, cols[i].name, 33, NULL, &cols[i].type,
                           &size, NULL, NULL);
            cols[i].value = malloc(size+1);
            SQLBindCol(sh, i+1, SQL_C_CHAR, cols[i].value, size+1, &cols[i].len);
        }

        /* print all rows in record-oriented format */
        printf("========== %s ==========", stmt);
        for (row = 1; SQLFetch(sh) == SQL_SUCCESS; ++row) {
            printf("**** row %ld:\n", row);
            for (i = 0; i < tot_cols; ++i) {
                printf("  %32.32s: %s\n", cols[i].name,
                       cols[i].len == SQL_NULL_DATA ? "NULL" : cols[i].value);
            }
        }

        /* drop statement handle and free allocated memory */
        SQLFreeHandle(SQL_HANDLE_STMT, sh);
        for (i = 0; i < tot_cols; ++i)
            free(cols[i].value);
        free((void *)cols);
        return SQL_SUCCESS;
    }

The next example presents an application function that invokes the myproc stored procedure mentioned previously. After compiling and executing the initial statement, the application calls SQLNumResultCols to determine whether that statement was a select statement.
If there are result columns, the application can call SQLDescribeCol and SQLBindCol to set up processing of the result set. Then the program calls SQLFetch until it returns SQL_NO_DATA. If there are no result columns, the initial statement was either insert, update, or delete. The application can call SQLRowCount to count the number of rows affected by the modification statement. After processing the result, the application calls the SQLMoreResults function to determine if there are any more stored procedure statements to be processed and, if so, to execute the next one. A stored procedure containing more than one select statement requires that the application call the SQLMoreResults function after SQLFetch returns SQL_NO_DATA. This call determines if any more result sets exist and initializes their processing. SQLMoreResults can be called repeatedly to execute each subsequent statement in the procedure. If the statement is a select statement, SQLFetch can then be called repeatedly to fetch the latest result set. When there are no more statements in the procedure, SQLMoreResults returns SQL_NO_DATA. 
#include "sqlext.h"

typedef struct col_result {
    char        name[33];
    char       *value;
    SQLUINTEGER prec;
} COL_RESULT;

void RunProc(SQLHSTMT hstmt)
{
    SQLSMALLINT nocols, col;
    SQLLEN      norows;
    SQLRETURN   stat;
    COL_RESULT *results;

    stat = SQLExecDirect(hstmt, (SQLCHAR *)"execute myproc()", SQL_NTS);
    while (stat == SQL_SUCCESS) {
        SQLNumResultCols(hstmt, &nocols);
        if (nocols > 0) {
            /* set up and fetch result set */
            results = (COL_RESULT *)calloc(nocols, sizeof(COL_RESULT));
            for (col = 0; col < nocols; ++col) {
                COL_RESULT *rp = &results[col];
                SQLDescribeCol(hstmt, col+1, (SQLCHAR *)rp->name, 33, NULL,
                               NULL, &rp->prec, NULL, NULL);
                rp->value = malloc(rp->prec+1);
                SQLBindCol(hstmt, col+1, SQL_C_CHAR, rp->value, rp->prec+1, NULL);
            }
            while (SQLFetch(hstmt) != SQL_NO_DATA)
                DisplayResultRow(results, nocols);

            /* free results memory */
            for (col = 0; col < nocols; ++col)
                free(results[col].value);
            free(results);
        }
        else {
            /* report number of rows affected */
            SQLRowCount(hstmt, &norows);
            if (norows > 0)
                printf("*** %ld rows affected\n", (long)norows);
        }
        stat = SQLMoreResults(hstmt);
    }
}

13.4.10 Positioned Update and Delete

A cursor is a named, updateable select statement where the cursor position is the current row (that is, the row returned from the most recent call to SQLFetch). An updateable select statement does not include a group by or order by clause and refers to only a single table in the from clause. If the table is a view, that view must be updateable. Cursors are used in conjunction with positioned updates and deletes to allow the current row from a select statement to be updated or deleted. The general procedure for a positioned update (or delete) is as follows:

1. Call SQLAllocHandle to allocate the statement handle for the select statement.
2. Call SQLAllocHandle to allocate a statement handle for the update statement.
3. Call SQLPrepare with the first statement handle to compile the select statement.
4.
To specify your own cursor name, call SQLSetCursorName using the select statement handle. If necessary, this function can be called before step 3. To use a system-generated cursor name, skip this step.
5. Using the select statement handle, call SQLBindCol and SQLBindParameter as often as necessary and then call SQLExecute.
6. If the cursor name is system-generated, call SQLGetCursorName to copy the cursor name into the update statement (where current of clause).
7. Call SQLPrepare to compile the update statement.
8. Call SQLFetch repeatedly with the select statement handle until a row to be modified is retrieved.
9. To perform the update, assign the values to the desired parameters and call SQLExecute using the update statement handle. Repeat steps 8 and 9 until finished.
10. Call SQLEndTran to commit the changes.
11. Free the statement handles by calling SQLFreeHandle.

The following example illustrates positioned update processing. The RaiseComm function fetches and displays each row of the salesperson table so that the user (for example, the sales manager) can raise a salesperson's commission rate by 1 percent. RaiseComm uses SQLSetCursorName to give the select statement the cursor name "comm_raise". Note that if the statement associated with the cursor is not an updateable select statement, SQLSetCursorName returns an error code.

#include "sql.h"

static char SaleSel[] =
    "select sale_id, sale_name, commission, mgr_id from salesperson";
static char SaleUpd[] =
    "update salesperson set commission=commission+0.01 "
    "where current of comm_raise";

/* Raise commission for selected salespersons */
SQLRETURN RaiseComm(SQLHDBC svr)
{
    char   sale_id[4], mgr_id[4], sale_name[31];
    float  comm;
    SQLLEN mgrIdInd;
    char   sqlstate[6], errmsg[80];
    SQLHSTMT  sHdl, uHdl;
    SQLRETURN stat;

    /* step 1: allocate select statement handle */
    if ((stat = SQLAllocHandle(SQL_HANDLE_STMT, svr, &sHdl)) != SQL_SUCCESS)
        return stat;   /* this will catch connection handle problems */

    /* step 2: allocate update statement handle */
    SQLAllocHandle(SQL_HANDLE_STMT, svr, &uHdl);

    /* step 3: compile the select statement */
    SQLPrepare(sHdl, SaleSel, SQL_NTS);

    /* step 4: specify cursor name */
    SQLSetCursorName(sHdl, "comm_raise", SQL_NTS);

    /* step 5: bind select stmt columns and execute select statement */
    SQLBindCol(sHdl, 1, SQL_C_DEFAULT, sale_id, 4, NULL);
    SQLBindCol(sHdl, 2, SQL_C_DEFAULT, sale_name, 31, NULL);
    SQLBindCol(sHdl, 3, SQL_C_DEFAULT, &comm, sizeof(float), NULL);
    /* mgrIdInd will be SQL_NULL_DATA for managers */
    SQLBindCol(sHdl, 4, SQL_C_DEFAULT, mgr_id, 4, &mgrIdInd);
    if ((stat = SQLExecute(sHdl)) == SQL_SUCCESS) {
        /* step 6: compile positioned update statement */
        SQLPrepare(uHdl, SaleUpd, SQL_NTS);

        /* step 7: fetch each row and display, allowing user to update if desired */
        while ((stat = SQLFetch(sHdl)) == SQL_SUCCESS) {
            if (mgrIdInd != SQL_NULL_DATA &&
                DisplaySalesperson(sale_id, sale_name, comm) == UPDATED) {
                /* step 8: this salesperson gets the raise */
                if ((stat = SQLExecute(uHdl)) != SQL_SUCCESS)
                    break;
            }
        }
    }
    if (stat == SQL_ERROR) {
        SQLGetDiagRec(SQL_HANDLE_DBC, svr, 1, sqlstate, NULL, errmsg, 80, NULL);
        printf("***ERROR(%s): %s\n", sqlstate, errmsg);
        SQLEndTran(SQL_HANDLE_DBC, svr, SQL_ROLLBACK);
    }
    else {
        /* step 9: commit the changes */
        SQLEndTran(SQL_HANDLE_DBC, svr, SQL_COMMIT);
    }

    /* step 10: drop the statement handles */
    SQLFreeHandle(SQL_HANDLE_STMT, sHdl);
    SQLFreeHandle(SQL_HANDLE_STMT, uHdl);
    return stat;
}

Like a positioned update, a positioned delete can be used to delete the current row of a specified cursor. Execution of a positioned delete is identical to a positioned update except that no columns are updated and the row is simply deleted.
A positioned delete must first define a select statement cursor through SQLSetCursorName (or retrieve the system-generated name with SQLGetCursorName). Then the application issues a delete statement whose where clause uses the current of qualifier to specify the cursor. In this case, the delete statement removes only the row indicated by the cursor. The following statement deletes the salesperson indicated by the cursor named "comm_raise", as described in Processing a Positioned Update.

    delete salesperson where current of comm_raise;

13.5 Using Cursors and Bookmarks

13.5.1 Using Cursors

In ODBC, a user fetches data from a database by executing an SQL query (through SQLExecDirect or SQLExecute). The server determines a result set of rows that match the requested query and creates a cursor that points to a row in this result set. The user then fetches the data by calling SQLFetch or SQLFetchScroll.

Rowset

If the user calls SQLFetch, RDM Server returns the data for one row; if the user calls SQLFetchScroll, RDM Server returns a group of rows (a rowset), starting with the row pointed to by the cursor. The number of rows in a rowset is determined by the rowset size setting, set with the SQLSetStmtAttr function and the SQL_ROWSET_SIZE option. (The default is 1.) The user can fetch additional rowsets by calling these fetch functions again.

Types of Cursors

The five types of cursors available in ODBC can be divided into two categories:

- Non-scrollable cursors allow the user to fetch only the next rowset in the result set. When the end of the result set is reached, the fetch function returns SQL_NO_DATA_FOUND. There is one kind of non-scrollable cursor, the forward-only cursor.
- Scrollable cursors give users the choice of which rowset to fetch (for example, the next rowset, the previous rowset, or a rowset starting at an absolute row number). Scrollable cursor types include static, dynamic, keyset-driven, and mixed cursors.
RDM Server supports one kind of scrollable cursor, the static cursor.

13.5.2 Static Cursors

This cursor is called static because the result set's membership is determined when the query is executed and does not change for the life of the query. Therefore, if another user changes the data in a row that the first user has fetched, this change is unseen by the first user until that user re-executes the query. In essence, a snapshot of the result set is taken when the query is executed, and that snapshot does not change until the query is re-executed.

RDM Server caches result set data on the client side. When the data is requested through SQLFetchScroll, RDM Server fetches as many rows from the server as necessary to meet the request. Thus, if the user requests the first rowset, RDM Server only fetches that rowset. But if the user requests the last rowset, RDM Server must fetch all intervening rows from the server into the client-side cache before it can fetch the requested rowset. If the result set is large, this could take several minutes. However, once the data is on the client side, any request for a rowset is met quickly. When the cursor is freed (by calling SQLCloseCursor), the client-side cache is cleared.

Using Static Cursors

By default, all cursors are forward-only. To implement static cursors the user must, before executing the query, call SQLSetStmtAttr with the SQL_ATTR_CURSOR_TYPE option set to SQL_CURSOR_STATIC. If the user employs SQLFetch to retrieve data, the cursor is still restricted to forward-only movement; furthermore, a user cannot mix SQLFetchScroll and SQLFetch on a given cursor. However, if a user employs SQLExtendedFetch, the user can fetch any rowset from the result set in any order. (For instance, the user can fetch the last rowset in the result set or the rowset starting with the fiftieth row.)
Once a rowset is fetched, the user can call SQLSetPos (for static cursors only) to position the cursor at a particular row within the rowset. The row's data can then be retrieved into variables using the SQLGetData function. Alternatively, the data can be retrieved by binding columns to arrays of variables, just as with the forward-only cursor.

Limitations on Static Cursors

As explained in "Static Cursors" above, a static cursor cannot reflect changes to database data made after a query has been executed. Static cursors additionally have the following limitations:

Changing the default display string for a data type affects what can be retrieved. If you bind a column to SQL_C_CHAR that has a default display format (as set by the SQL statement set type display), the cursor caches it as a string using the format string. This means that you cannot subsequently rebind that column as a non-character type and fetch data until the query is re-executed. Nor can you call SQLGetData to fetch the data in its native form, since this function also retrieves the information from the client-side cache. Similarly, if you bind a column that has a specified display format as a non-character type, you cannot rebind it (or use SQLGetData) as a character type during the life of the cursor. This is because the default format information for the type is stored on the server, while the fetched data might be coming from the client-side cache, which has no access to this information. Therefore, RDM Server returns the information as specified when the cursor was first opened (i.e., on the first SQLExtendedFetch call). Note that there is no binding/SQLGetData limitation of this type if no default format string was specified for the data type.

It is possible in RDM Server to change, between fetches, the default format string for a data type used in a result set.
RDM Server, however, freezes the format string (if any) at cursor creation time (i.e., during the first SQLExtendedFetch), so the query must be re-executed to reflect the change. The new format will not be used for the data type until the query is re-executed. This rule also applies if a format string is created for a data type that did not have one at query execution time.

The following example demonstrates this limitation. Suppose the user employs static cursors and calls the following statement on the current connection:

    set real display(10, "$#,#.##");

Any columns of data type real that are bound to SQL_C_CHAR variables will be returned using the specified format string. Suppose the user executes the query and calls SQLExtendedFetch, having bound the only column of type SQL_REAL in the query to char. The resulting data will be returned in the dollar-sign format specified. However, if the user tries to call SQLGetData on the field as follows, an error results:

    SQLGetData(hstmt, 1, SQL_C_FLOAT, &fval, 0, NULL);

Because of this limitation, the user cannot convert the SQL_REAL column to a SQL_C_FLOAT column for the life of the static cursor.

Suppose the user re-executes the query, binds the column as SQL_C_FLOAT, calls SQLExtendedFetch, and tries to rebind the column as SQL_C_CHAR. The user will get another error, because now the column is returned as SQL_C_FLOAT and the client-side cache does not have access to the previously specified dollar-sign display format.

BLOB fields are handled differently in static cursor mode, because fetching huge BLOB fields into the client-side cache inhibits performance. (Note that this handling method does not meet ODBC specifications.) At cursor creation time (i.e., during the first SQLExtendedFetch), RDM Server only fetches BLOBs that have been bound (using SQLBindCol). Further, it only fetches up to the number of BLOB bytes necessary to fill the requested bound buffer.
Thus, if the user binds a BLOB column to a 50-byte field, a maximum of 50 bytes of that particular BLOB will be returned when SQLExtendedFetch is called. The user then cannot fetch more than those 50 bytes, because the data is not available on the client side. The user cannot retrieve the data from the server because a static cursor's data is set when the cursor is created. (The BLOB's data might have changed since the cursor was created with the SQLExtendedFetch call.) To retrieve more data, the user must re-execute the query, binding to a larger buffer on re-execution. Note that an increased bound buffer size could affect performance, because more data might be sent over from the server during the fetch. Also, if the user has not bound the BLOB column before the first call to SQLExtendedFetch, no BLOB data is available for that BLOB for the life of the cursor, unless the user has first used the SQL_FETCH_MAXBLOB option. (For details, see the description of SQLSetStmtAttr.)

13.5.3 Using Bookmarks

RDM Server supports the ODBC concept of bookmarks, which allows you to mark a row and then return to that specific row later. Bookmarks are identifiers for a particular row that can be used to re-fetch a given row, provided the statement has been fetched using static cursors and the SQLExtendedFetch function. Bookmarks are stored in integer buffers.

Activate a Bookmark

To use bookmarks (which are turned off by default), you must activate them on the statement handle with the SQLSetStmtAttr function. Use the SQL_ATTR_USE_BOOKMARKS option with the SQL_UB_ON setting. Once bookmarks are activated, execute a query, then fetch a rowset using SQLExtendedFetch (bookmarks do not work with SQLFetch). To set a current row within the rowset, call SQLSetPos. (By default, the first row in the rowset is the current row.)
Turn Off a Bookmark

To turn off bookmarks, use the SQLSetStmtAttr function and the SQL_ATTR_USE_BOOKMARKS option, the same as for activating, but use the SQL_UB_OFF setting instead. This option only works if the statement has previously been set up to use static cursors.

Retrieve a Bookmark

To retrieve the bookmark for a row, call SQLGetStmtAttr and specify the SQL_FETCH_BOOKMARK_PTR option. The bookmark is saved in a four-byte integer buffer. A bookmark can also be retrieved by using SQLBindCol or SQLGetData.

Return to a Bookmark

To return to a bookmark, call SQLFetchScroll with the fetch option set to SQL_FETCH_BOOKMARK and the rowNum parameter set to the previously fetched bookmark. The returned rowset will start with the row marked by the bookmark. When the statement is freed, the bookmarks become invalid and must be re-fetched for subsequent queries.

13.5.4 Retrieving Blob Data

Like other data types, columns of the long varchar, long wvarchar, or long varbinary type can be retrieved via a select statement. However, BLOB types cannot be used in where clauses or in any other expressions, except as parameters to a UDF. The data can be either bound or unbound, and either SQLFetch or SQLFetchScroll can be used to retrieve the data. The limitations involved with these methods are described below.

You can use the SQLBindCol function to bind a BLOB column to a buffer. If the buffer is not large enough to hold the entire BLOB, the BLOB will be truncated to fit the buffer. If you have provided an output length variable to SQLBindCol, this variable will contain the full length of the BLOB before truncation. The only way to retrieve the remainder of the BLOB is to use the SQLGetData function. Therefore, use SQLBindCol to bind BLOB columns only if you have relatively small BLOBs or if you only care about the first portion of a BLOB.
To retrieve a large number of rows in a rowset with SQLFetchScroll, you need to call SQLBindCol to bind an array of rowset-size buffers, which will require considerable memory if you are retrieving large BLOBs. As an option, you can allocate less memory for the buffer, which will result in the BLOB data being truncated.

You also can retrieve the BLOB data using SQLGetData. This function can be used to retrieve all the BLOB data in chunks of any size. However, you must use SQLFetch to retrieve the result set one row at a time, since you cannot use SQLGetData with SQLFetchScroll. You can call SQLGetData multiple times if necessary; each time it writes into the provided buffer up to the number of bytes specified in the buffer length parameter of the function. It also writes into the output length parameter (if the parameter is provided) the number of bytes that remained to be retrieved from the BLOB before the current call to SQLGetData. The next time you call SQLGetData, the next chunk of the BLOB is returned into the buffer. If truncation has occurred, SQLGetData returns SQL_SUCCESS_WITH_INFO. When it returns the last part of the data, SQLGetData returns SQL_SUCCESS. If called after this, SQLGetData returns SQL_NO_DATA.

The following example retrieves all data in the CDAlbum table associated with a specific composer, Beethoven. First, we execute a statement:

    SQLExecDirect(hstmt,
        "select * from cdalbum "
        "where composer = 'Beethoven, Ludwig Von';", SQL_NTS);

Next, if we are going to bind the columns, we call SQLBindCol:

    SQLBindCol(hstmt, 6, SQL_C_CHAR, notes, sizeof(notes), NULL);
    SQLBindCol(hstmt, 7, SQL_C_BINARY, jacketpic, sizeof(jacketpic), NULL);

Both notes and jacketpic are character arrays; by binding the long varbinary column as SQL_C_BINARY, we eliminate the need for a null terminator. Next, call SQLFetch or SQLFetchScroll. If we call SQLFetchScroll, the notes and jacketpic buffers must be two-dimensional arrays where the first dimension equals the rowset size.
For example, if the rowset size is 50, the notes and jacketpic buffers might be declared as follows.

    char notes[50][NOTES_SIZE];
    char jacketpic[50][JACKETPIC_SIZE];

After calling the fetch function, the buffer will contain up to sizeof(buffer) bytes of the BLOB. At this point, if you previously called SQLFetch, you can call SQLGetData to get the remainder of the BLOB data. Note that the first time you call SQLGetData, it will refetch the data you already have in the buffer (the first bytes), which were bound to that buffer when you called SQLFetch.

Alternatively, if we only use SQLGetData to retrieve the data, we call SQLFetch to fetch each row and then call SQLGetData multiple times to retrieve all the BLOB data. This approach might look like the following:

    #define JPIC_SIZE  1000
    #define NOTES_SIZE  100

    char       jpic[JPIC_SIZE], notes[NOTES_SIZE];
    SQLINTEGER buflen;
    SQLRETURN  status;
    int32      len, offset;

    SQLExecDirect(---);   /* as above */
    while (SQLFetch(hstmt) == SQL_SUCCESS) {
        offset = 0;
        do {
            status = SQLGetData(hstmt, 6, SQL_C_CHAR, notes, NOTES_SIZE, &buflen);
            if (status == SQL_SUCCESS || status == SQL_SUCCESS_WITH_INFO) {
                /* Copy data elsewhere, as our buffer will be overwritten
                   the next time we call SQLGetData. */
                len = (int32)(buflen < sizeof(notes) ? buflen : sizeof(notes)-1);
                memcpy(somebuf+offset, notes, len);
                offset += len;
                status = SQL_SUCCESS;
            }
        } while (status == SQL_SUCCESS);

        /* Then do the same thing for the jacketpic blob. */
        do {
            status = SQLGetData(hstmt, 7, SQL_C_BINARY, jpic, JPIC_SIZE, &buflen);
            if (status == SQL_SUCCESS || status == SQL_SUCCESS_WITH_INFO) {
                /* Same idea as above, except this buffer has no null
                   terminator in it. */
                ...
            }
            ...
        } while (status == SQL_SUCCESS);
    }

For most applications, however, the usual way to insert or update BLOB data is to use parameter markers and the SQLParamData and SQLPutData functions to put the data into the BLOB in chunks (or all at once if you wish).
To insert or update the data this way, first prepare a statement containing a parameter marker for the BLOB column, bind the parameter, call SQLExecute, call SQLParamData, then repeatedly call SQLPutData to put the data into the BLOB. Finally, call SQLParamData again to prepare the next BLOB for insert/update, or to complete the modifications if there are no further BLOBs. For example, if we have an external data file containing a copy of the album's jacket picture, we must prepare an insert statement:

    SQLPrepare(hstmt,
        "insert into cdalbum values(,'Eine Kleine Nachtmusik', "
        "'Mozart, Wolfgang', 'Classical', ?, null);", SQL_NTS);

Before executing this statement, we first must bind a variable to the parameter marker for the BLOB column. BLOB parameters must be bound as SQL_DATA_AT_EXEC parameters, meaning that data for the parameter will be provided after statement execution. In RDM Server, the SQL_LEN_DATA_AT_EXEC(length) macro indicates that the parameter is DATA_AT_EXEC. The length argument of this macro must be non-negative (usually 0) and is ignored by RDM Server. The last parameter in SQLBindParameter is a pointer to an SQLINTEGER variable equal to the result of this macro.

Our next requirement concerns the variable we bind to the parameter: it must be a 4-byte value, either a scalar value or a pointer. For example, we might bind to the BLOB parameter a pointer to a string containing the name of the file that holds the jacket picture, as shown below.

    status = SQLBindParameter(hstmt, 1, SQL_PARAM_INPUT, SQL_C_BINARY,
                              SQL_LONGVARBINARY, 0, 0, picFileName, 0, &bloblen);

Here, picFileName points to the string "c:\albums\jackets\mozart733.jpg", the name of the file that contains the album's jacket picture data, and bloblen = SQL_LEN_DATA_AT_EXEC(0).
With all parameters in place, we can execute the statement:

    status = SQLExecute(hstmt);

The value of status after this call will not be SQL_SUCCESS but SQL_NEED_DATA, indicating that statement execution is not complete. Thus, our next step is adding the BLOB data to this record. First, call SQLParamData, which takes two parameters (the second is a pointer). RDM Server will return into the pointer the value associated with the first bound DATA_AT_EXEC parameter it finds. The SQL_NEED_DATA status code is returned if any DATA_AT_EXEC parameters are found. RDM Server first searches for and returns all non-BLOB DATA_AT_EXEC parameters (if any), then returns the BLOB parameters. Each call to SQLParamData returns the next parameter, one parameter per call. When there are no more to return, SQLParamData returns SQL_SUCCESS. In our example, we only have one DATA_AT_EXEC parameter. Therefore, after we call SQLParamData, ptr will point to the jacket picture's file path string ("c:\albums\jackets\mozart733.jpg") that we bound earlier with SQLBindParameter:

    status = SQLParamData(hstmt, &ptr);

Next, call SQLPutData as many times as necessary to put all the data into the BLOB field. When finished, call SQLParamData again to move to the next DATA_AT_EXEC parameter. If there is another DATA_AT_EXEC parameter, SQLParamData will return SQL_NEED_DATA. Otherwise, it will return SQL_SUCCESS, indicating the insert is now complete. In our example, we call fopen using the value in ptr set by SQLParamData, and read the data out of the file. We will send the data in chunks of 1024 bytes.
We call SQLPutData multiple times until all the data is sent, then call SQLParamData again:

    #define BUFSIZE 1024

    while ((status = SQLParamData(hstmt, &ptr)) == SQL_NEED_DATA) {
        if ((fn = fopen(ptr, "rb")) != NULL) {
            do {  /* put next block of data from file */
                buflen = fread(buf, 1, BUFSIZE, fn);
                status = SQLPutData(hstmt, buf, buflen);
            } while (buflen == BUFSIZE && status != SQL_ERROR);
            fclose(fn);
            /* check here if status == SQL_ERROR or SQL_SUCCESS */
            if (status == SQL_ERROR) {
                ...
            }
        }
    }

It is useful to have the pointer bound in SQLBindParameter represent something that uniquely identifies the BLOB, particularly if there is more than one BLOB in the record. You must insert the data into the BLOBs in the order requested by RDM Server (via SQLParamData); RDM Server returns the BLOBs in the order they are placed in the table.

Similarly, in an update statement, you cannot use the BLOB in the where clause to identify which rows to update (unless you have a UDF that takes BLOB parameters). It is useful to define another column that uniquely identifies which records you want to update. In our example database, cd_id is a unique primary key that can be used. When the update occurs, the entire new BLOB must be inserted into the database, completely replacing the BLOB already in the database (if any). You cannot simply append changes onto the end of a BLOB.

As mentioned earlier, you cannot directly reference columns of the long varchar, long wvarchar, and long varbinary type in the where clause of a select, update, or delete statement. You can, however, pass a BLOB column as an argument to a user-defined function (UDF). One of the many uses for a UDF is doing fast low-level database lookups. Inside the UDF, these low-level operations can be used to manipulate BLOBs. For instance, a UDF could return the BLOB's size, or whether the BLOB is NULL.
You might write a "BLOB grep" function to return whether a supplied string occurs in the BLOB. You can also pass BLOB data types into UDFs (or UDPs) as parameters. As a simple example, if we write a UDF called blobgrep, we might execute the following select statement to retrieve the names of composers whose biographies contain the string "violin".

    select composer from cdalbum where blobgrep(notes, "violin") = 1;

The blobgrep function itself could use runtime BLOB functions to search the current BLOB for the requested string, returning 1 if the string is found, 0 if not.

14. Developing SQL Server Extensions

SQL server extensions are application-specific, C language modules that are extensions of RDM Server and are called from RDM Server SQL. SQL server extensions include the following:

- C-based, user-defined functions (UDFs)
- C-based, user-defined procedures (UDPs)
- Transaction trigger functions

All these modules extend the capabilities of RDM Server SQL. Called by the RDM Server SQL system during the processing of SQL statements, these modules run in DLLs or shared libraries on the RDM Server. They are easy to develop and provide a powerful tool for development of high-performance RDM Server SQL database applications.

14.1 User-Defined Functions (UDF)

A UDF is an application-specific function used just like the RDM Server SQL scalar and aggregate functions, but developed to meet the specific needs of your SQL application. After you have completed development of your UDF, you need to register it with the RDM Server SQL system. This is done using the create function statement, as shown in the following syntax.

    create_function:
        create [scalar | aggregate] function[s]
            fcnname ["description"] [, fcnname ["description"]]...
        in libname on devname

A scalar UDF operates on a single row and returns a single value.
An aggregate function performs computations on the sets of rows that result from a select statement, usually specified with the group by clause. For example, the following statements register three user-defined aggregate functions contained in a DLL called "statpack.dll" on an RDM Server database device named add_ins. The select statement calls the standard SQL aggregate function, avg, as well as the user-defined aggregate function, geomean.

    create aggregate function
        devsq   "compute sum of the squares of deviations",
        stddev  "compute standard deviation",
        geomean "compute the geometric mean"
    in statpack on add_ins;

    select state, avg(amount), geomean(amount)
        from customer, sales_order
        where customer.cust_id = sales_order.cust_id
        group by state;

User-defined functions have a variety of uses, such as:

- Translating coded values into easy-to-read strings.
- Performing special-purpose computations.
- Adding new aggregate functionality.
- Doing fast, low-level database lookups (including manipulation of BLOBs).
- Implementing triggers called when tables are updated.

You will find sample UDF code in the examples/udf directory. This code includes a sample module (udf.c) with some source code, which will be used throughout this section. Instructions for using the sample code are provided in Executing Your RDM Server Programs. The sample UDF module defines the six user-defined functions listed in Table 13-1.

Table 13-1. Functions Defined in the Sample UDF Module

    Function     Description
    HaveProduct  Trigger.
    OkayToShip   Trigger.
    subquery     Takes a string containing a select statement that retrieves
                 a single-value result.
    std          Computes exact standard deviation.
    stds         Computes sampled standard deviation.
    udfcount     Performs exactly the same operation as the RDM Server SQL
                 built-in count function.

These functions can be compiled using the provided udf.mak makefile. The resulting DLL is called udf.dll.
After the DLL is created, connect to RDM Server SQL and enter the following create function statement. The examples following this statement illustrate the use of these functions.

    create aggregate function
        std      "actual standard deviation",
        stds     "sampled standard deviation",
        udfcount "alternate count function"
    scalar function
        subquery "selectable subquery function"
    in udf on rdsdll;

You could show the average sales amounts and their standard deviations per salesperson using the following query.

    select sale_name, avg(amount), std(amount)
        from salesperson, customer, sales_order
        where salesperson.sale_id = customer.sale_id
        and customer.cust_id = sales_order.cust_id
        group by sale_name;

The count and udfcount functions should return identical results in each of the following two queries.

    select sale_name, count(cust_id), udfcount(cust_id)
        from salesperson, customer
        where salesperson.sale_id = customer.sale_id
        group by sale_name;

    select sale_name, count(distinct state), udfcount(distinct state)
        from salesperson, customer
        where salesperson.sale_id = customer.sale_id
        group by sale_name;

The next example uses the subquery function to return a percentage of total values in a single select statement.

    select state, 100*sum(amount)/subquery("select sum(amount) from sales_order") pct
        from customer, sales_order
        where customer.cust_id = sales_order.cust_id
        group by state;

The implementation of these UDFs is described in subsequent sections.

14.1.1 UDF Implementation

Keep the following concepts in mind when programming your UDF module.

- The module must include a load function named udfDescribeFcns which identifies all of the UDFs implemented in the module.
- Each UDF can optionally include an initialization function of type UDFINIT. If you define this function, SQL calls it when the UDF begins executing.
- Each UDF must include a function of type UDFCHECK that performs type checking on the UDF's arguments.
- UDFs can take any number of arguments of any type, and the value returned can be of any data type, except long varchar, long wvarchar, or long varbinary.
- The main UDF function, of type UDFFUNC, performs processing for the UDF.
- An aggregate UDF must include a reset function of type UDFRESET that is called by SQL when the group by value changes in order to reset the aggregate calculations.
- The UDF can optionally include a cleanup function of type UDFCLEANUP. If defined, this function is called by SQL each time UDF execution is completed.
- If the UDF is running on Microsoft Windows, the UDF must include LibMain.
- A scalar UDF minimally must include a type checking function (UDFCHECK) and the processing function (UDFFUNC) itself.
- Each UDF should declare REXTERNAL in its function definition.

There are other specialized functions that can be used for implementing UDFs. The SQL UDF support functions (SYS prefix) allow UDFs to perform low-level database operations, associate SQL modification commands with client application transactions, and use the decimal arithmetic capabilities of RDM Server SQL. UDF implementations also can use the SQL date and time manipulation functions (VAL prefix). By connecting into SQL's internal arithmetic functions, these functions allow the UDFs to include mixed-mode arithmetic operations. The results of mixed-mode arithmetic operations follow standard C-language rules.

UDF Module Header Files

Your UDF module code must include the header file named emmain.h. This header addresses platform-specific implementation and also includes all other standard header files (e.g., sqlrds.h and sqlsys.h) that you will need in your UDF module. In order to use this header, you must precede the #include with two #define declarations. The first #define specifies the name of the UDF module (in uppercase). The second #define identifies the module type (the emmain.h file is used for all types of server extensions).
The following code fragment shows the use of emmain.h for the sample UDF module.

    /* Definitions and header to setup EM ------------ */
    /* (all EMs must have a code block like this) ---- */
    #define EM_NAME UDF   /* the uppercased module name */
    #define EMTYPE_UDF    /* EMTYPE_EM, EMTYPE_UDF, EMTYPE_UDP, EMTYPE_IEF */
    #include "emmain.h"   /* must follow the above defs and OS #includes */

The header files that contain definitions used in UDF modules are listed in Table 13-2. The files can be found in the include directory.

Table 13-2. UDF Module Header Files

    Header File   Description
    -----------   ------------------------------------------------------------
    emmain.h      RDM Server standard extension module header. Use #include to
                  add it to all extension, UDF, UDP, and IEF modules.
                  Automatically includes the sqlrds.h and sqlsys.h files for
                  UDF, UDP, and IEF modules.
    sqlrds.h      RDM Server SQL extensions header file. Includes prototypes
                  and data definitions for the C-language extension module
                  functions used with RDM Server SQL. Provides access to all
                  RDM Server SQL capabilities. This file automatically
                  includes the sqlext.h file.
    sqlsys.h      RDM Server SQL UDF header file. Includes UDF function type
                  declarations, UDF-specific data type definitions, and SYS
                  function prototypes.

Function udfDescribeFcns

Each UDF library module must contain a function named udfDescribeFcns that has arguments declared as shown in the prototype specification below. This function is called when the first SQL statement that contains a reference to one of the functions in the library is compiled. The responsibility of udfDescribeFcns is to return a pointer to a function description table containing all of the entry point information. In addition, an optional module description string can be returned that will be displayed on the RDM Server system console indicating that the UDF library module has been loaded.
    /* ============================================================
       User function description, called when statement is prepared
    */
    void REXTERNAL udfDescribeFcns(
        unsigned short *NumFcns,      /* out: number of functions in module */
        PUDFLOADTABLE  *UDFLoadTable, /* out: points to UDFLOADTABLE array  */
        char          **fcn_descr)    /* out: optional description string   */
    {
        *NumFcns      = RLEN(UdfTable);
        *UDFLoadTable = UdfTable;
        *fcn_descr    = "Sample of SQL user-defined functions";
    }

The UDFLoadTable is a struct array of type UDFLOADTABLE. There must be one entry defined in the array for each UDF supported by the module. The declaration for UDFLOADTABLE is contained in the header file sqlsys.h and is shown below.

    typedef struct udfloadtable {
        char       udfName[33];  /* name of user function            */
        UDFFUNC    udfCall;      /* address of user function         */
        UDFCHECK   udfCheck;     /* type checking call               */
        UDFINIT    udfInit;      /* initialization for user function */
        UDFCLEANUP udfCleanup;   /* cleanup for user function        */
        UDFRESET   udfReset;     /* reset for user function          */
    } UDFLOADTABLE, *PUDFLOADTABLE;

Each element of the UDFLOADTABLE struct is described in the following table.

Table 13-3. UDFLOADTABLE Struct Element Descriptions

    Element     Description
    ----------  ------------------------------------------------------------
    udfName     The name of the function. Must conform to a standard SQL
                identifier. It is case-insensitive and unique (system-wide).
    udfCall     Pointer to the call processing function.
    udfCheck    Pointer to the argument type checking function. Assign to
                NULL if there are no arguments.
    udfInit     Pointer to the pre-execution initialization function.
    udfCleanup  Pointer to the post-execution cleanup function.
    udfReset    Pointer to the function that resets the group calculation
                values for an aggregate function. Assign to NULL for scalar
                functions.
Each UDFLOADTABLE entry must specify the name of the UDF and the address of at least two functions: the function (udfCall) that actually performs the operation, and another function (udfCheck) that is called during the compilation of an SQL statement that uses the UDF to perform type checking. The types of the argument expressions are passed into this function, which must validate them and return the result type. In addition, you can optionally specify: 1) the address of a function (udfInit) that is called when the statement is first executed to perform any necessary initialization, and 2) the address of a function (udfCleanup) that is called after the execution has completed (for example, after SQLFetch returns SQL_NO_DATA_FOUND). Aggregate functions are also required to provide the address of a function (udfReset) that resets the accumulator variables when the grouping value changes. Unused function entries should be NULL.

The code that defines the UDFLOADTABLE and the udfDescribeFcns code for the examples given in the udf.c module is shown below.

    /*---------------------------------------------------------------------
       Function prototypes
    ----------------------------------------------------------------------*/
    /* user functions for udfcount */
    UDFCHECK   CntCheck;
    UDFFUNC    CntFunc;
    UDFINIT    CntInit;
    UDFCLEANUP CntCleanup;
    UDFRESET   CntReset;

    /* user functions for standard deviation */
    UDFCHECK   TypeCheck;
    UDFFUNC    StdFunc;
    UDFINIT    StdInit;
    UDFCLEANUP StdCleanup;
    UDFRESET   StdReset;

    /* user function for sample standard deviation */
    UDFFUNC    StdsFunc;
    /* user functions for subquery function */
    UDFCHECK   QueryCheck;
    UDFFUNC    QueryFunc;
    UDFINIT    QueryInit;
    UDFCLEANUP QueryCleanup;

    /* user functions for HaveProduct trigger */
    UDFCHECK   InvCheck;
    UDFFUNC    InvFunc;
    UDFINIT    InvInit;
    UDFCLEANUP InvCleanup;

    /* user functions for OKayToShip trigger */
    UDFCHECK   ShipCheck;
    UDFFUNC    ShipFunc;
    UDFINIT    ShipInit;
    UDFCLEANUP ShipCleanup;

    /*--------------------------------------------------------------------
       Table of user-defined functions for this module
    ---------------------------------------------------------------------*/
    /* table of user functions callable from within an sql expression */
    static UDFLOADTABLE UdfTable[] = {
    /*  name           UDFFUNC    UDFCHECK    UDFINIT    UDFCLEANUP    UDFRESET */
    /*  -------------  ---------  ----------  ---------  ------------  -------- */
       {"std",         StdFunc,   TypeCheck,  StdInit,   StdCleanup,   StdReset},
       {"stds",        StdsFunc,  TypeCheck,  StdInit,   StdCleanup,   StdReset},
       {"SubQuery",    QueryFunc, QueryCheck, QueryInit, QueryCleanup, NULL    },
       {"udfCount",    CntFunc,   CntCheck,   CntInit,   CntCleanup,   CntReset},
       {"HaveProduct", InvFunc,   InvCheck,   InvInit,   InvCleanup,   NULL    },
       {"OKayToShip",  ShipFunc,  ShipCheck,  ShipInit,  ShipCleanup,  NULL    }
    };

    /* =====================================================================
       User function description, called when statement is prepared
    */
    void REXTERNAL udfDescribeFcns(
        uint16        *NumFcns,      /* out: number of functions in module */
        PUDFLOADTABLE *UDFLoadTable, /* out: points to UdfTable above      */
        char         **fcn_descr)    /* out: optional description string   */
    {
        *NumFcns      = RLEN(UdfTable);
        *UDFLoadTable = UdfTable;
        *fcn_descr    = "Sample of SQL user-defined functions";
    }

SQL Data VALUE Container Description

The VALUE data type that is passed to both the type checking and the processing function is a multi-type value container declared as shown below. The type field contains the standard SQL_* data type constant (for example, SQL_INTEGER).
The vt union declares a container variable for values of each SQL data type.

    typedef struct _value {
        int16 type;                 /* data type of value (SQL_*)          */
        int16 cmpfcn;               /* INTERNAL USE ONLY                   */
        union {
            int8            tv;     /* SQL_TINYINT | SQL_BIT               */
            int16           sv;     /* SQL_SMALLINT                        */
            int32           lv;     /* SQL_INTEGER | SQL_DATE | SQL_TIME   */
            int64           llv;    /* SQL_BIGINT                          */
            float           fv;     /* SQL_REAL                            */
            double          dv;     /* SQL_FLOAT                           */
            const BCD_X    *bv;     /* SQL_DECIMAL/SQL_NUMERIC (unpacked)  */
            const BCD_Z    *zv;     /* SQL_DECIMAL/SQL_NUMERIC (packed)    */
            BINVAR          xv;     /* SQL_BINARY | SQL_VARBINARY          */
            LONGVAR         lvv;    /* SQL_LONGVAR(CHAR|BINARY)            */
            TIMESTAMP_VAL   tsv;    /* SQL_TIMESTAMP                       */
            const char     *cv;     /* SQL_CHAR | SQL_VARCHAR              */
            const DB_WCHAR *wcv;    /* SQL_WCHAR | SQL_WVARCHAR            */
        } vt;
    } VALUE;

Function udfInit

The code for udfCount will be used to explain how you would use each of the five functions. Function CntInit, shown below, is called to initialize processing of a udfcount reference in a specific SQL statement. Initialization functions are passed two arguments. The first is the system handle that is used by SQL to identify and maintain the context of the executing statement. The second argument is the address of a void pointer into which you may return a function context pointer that you allocate. The allocated buffer will be stored by SQL with the statement context associated with hSys. It can contain anything you want. In this example, COUNT_CTX contains the memory allocation tag and a long that will contain the current count value.

Although you can use the standard malloc and free memory allocation functions, we recommend that you use the RDM Server resource manager memory allocation function rm_getMemory. An SQL UDF support function called SYSMemoryTag returns the memory allocation tag that you should use in your calls to rm_getMemory. Memory allocated with this tag remains active for the life of the statement that contains the call to the UDF.
When the statement has terminated, the memory will be automatically freed by SQL. In the rare event that the server does not have enough memory for your rm_getMemory request, SQL will gracefully abort the statement execution and return status SQL_ERROR (errSRVMEMORY) to the application. The following example shows the CntInit initialization function for the sample aggregate UDF, udfCount. Note the COUNT_CTX structure defining the UDF context.

    /* used by udfcount */
    typedef struct count_cxt {
        RM_MEMTAG mTag;
        int32     count;
    } COUNT_CTX;

    /* ============================================================
       Initialization function for CntFunc()
    */
    int16 REXTERNAL CntInit(
        HSYS   hSys,  /* in: system handle             */
        void **cxtp)  /* in: statement context pointer */
    {
        COUNT_CTX *cnt;
        RM_MEMTAG  mTag;

        SYSMemoryTag(hSys, &mTag);
        cnt = *cxtp = rm_getMemory(sizeof(COUNT_CTX), mTag);
        cnt->mTag  = mTag;
        cnt->count = 0L;
        return SQL_SUCCESS;
    }

Function udfCheck

The udfCheck function performs type checking on the argument expression(s) that are passed to the function. Function CntCheck, shown below, does this for udfCount. In this case the job is quite simple, in that the result is independent of the data type of the argument and is always an integer (int32). You typically need to check both the number of arguments and the types of the arguments required by the function. If either is incorrect, the function returns status SQL_ERROR, and the result is assigned a character string value containing a specific error message to be returned to the user that submitted the erroneous call. The CntCheck function shown below ensures that only one argument expression has been passed. If not, the result container is used to return an error message and the function returns status SQL_ERROR indicating the fault. UNREF_PARM is an RDM Server macro that references an unused function parameter to meet compiler requirements.
Note the absence of a ";" at the end of the calls to this macro.

    int16 REXTERNAL CntCheck(
        HSYS         hSys,    /* in:  system handle                   */
        int16        noargs,  /* in:  number of arguments to function */
        const VALUE *args,    /* in:  array of arguments              */
        VALUE       *result,  /* out: result value                    */
        int16       *len)     /* out: max length result string        */
    {
        int16 status = SQL_SUCCESS;

        UNREF_PARM(hSys)
        UNREF_PARM(args)
        UNREF_PARM(len)

        if (noargs != 1) {
            result->type  = SQL_CHAR;
            result->vt.cv = "only 1 argument expression is allowed";
            status = SQL_ERROR;
        }
        else
            result->type = SQL_INTEGER;

        return status;
    }

Type checking for the subquery UDF involves compiling the statement to ensure that it does not have any errors. If there are errors, the result value is set to type SQL_SMALLINT and the result value is the RDM Server SQL error code returned from SQLError (retrieved by the call to SQLGetDiagRec). When SQL_ERROR is returned from a type checking function, if the result type is SQL_CHAR then SQL understands it to be a descriptive error message. If the result type is SQL_SMALLINT then SQL understands it to be a specific RDM Server SQL error code (for example, errMISMATCH). In the latter case, this will be the error code returned to the calling program. You can see that QueryCheck utilizes both methods of UDF error communication.

In order to call a standard RDM Server SQL API function from a UDF, it is necessary to establish a connection handle that corresponds to the connection handle of the statement that is executing the subquery reference. Function SYSSessionId returns the RDS session identifier associated with the SQL system handle. Function SQLConnectWith is then called with that session handle to return the proper connection handle. All of the SQL functions will be passed this connection handle, which is identified by the SQL system as the same connection as that of the invoking user.
    int16 REXTERNAL QueryCheck(
        HSYS         hSys,    /* in:  system handle                   */
        int16        noargs,  /* in:  number of arguments to function */
        const VALUE *args,    /* in:  array of arguments              */
        VALUE       *result,  /* out: result value                    */
        int16       *len)     /* out: max length result string        */
    {
        /* NOTE: The argument to subquery MUST be a string literal
           in order for this to work.
        */
        SQLHENV     hEnv;   /* environment handle for SQL calls */
        SQLHDBC     hDbc;   /* connection handle for SQL calls  */
        RDM_SESS    hSess;  /* RDM session id                   */
        SQLHSTMT    hStmt;
        SQLRETURN   ret;
        SQLSMALLINT colcount, parms;
        int16       status = SQL_SUCCESS;

        UNREF_PARM(noargs)
        UNREF_PARM(len)

        SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &hEnv);
        SQLAllocHandle(SQL_HANDLE_DBC, hEnv, &hDbc);
        SYSSessionId(hSys, &hSess);
        SQLConnectWith(hDbc, hSess);
        SQLAllocHandle(SQL_HANDLE_STMT, hDbc, &hStmt);

        if ((ret = SQLPrepare(hStmt, (SQLCHAR *)args[0].vt.cv, SQL_NTS)) != SQL_SUCCESS) {
            result->type = SQL_SMALLINT;
            SQLGetDiagRec(SQL_HANDLE_STMT, hStmt, 1, NULL, &result->vt.lv,
                          NULL, 0, NULL);
            status = SQL_ERROR;
        }
        else {
            SQLNumResultCols(hStmt, &colcount);
            if (colcount > 1) {
                result->type  = SQL_CHAR;
                result->vt.cv = "more than one result column";
                status = SQL_ERROR;
            }
            else {
                SQLNumParams(hStmt, &parms);
                if (parms) {
                    result->type  = SQL_CHAR;
                    result->vt.cv = "no argument markers allowed";
                    status = SQL_ERROR;
                }
                else
                    SQLDescribeCol(hStmt, 1, NULL, 0, NULL,
                                   &result->type, NULL, NULL, NULL);
            }
        }
        SQLFreeHandle(SQL_HANDLE_STMT, hStmt);
        SQLDisconnect(hDbc);
        SQLFreeHandle(SQL_HANDLE_DBC, hDbc);
        SQLFreeHandle(SQL_HANDLE_ENV, hEnv);
        return status;
    }

Function udfFunc

The UDF processing function is called by SQL from the udfCall entry in UDFLOADTABLE during execution of the SQL statement that references the UDF. It is called once for each row that is retrieved by the statement. The function result is returned in the VALUE container pointed to by argument result.
The following example illustrates the aggregate UDF processing function, CntFunc, defined for udfCount. Note that the result value returns the current count for each row processed, even though only the final aggregate value is used. Aggregate calculations require that a running result be returned from every call to the processing function, because there is no way of knowing from within the UDF when RDM Server will call the function for the last time. The result SQL type and value (in this case, the type is SQL_INTEGER and the value is the current count) are returned in the result output argument, and SQL_SUCCESS is returned.

    int16 REXTERNAL CntFunc(
        HSYS         hSys,    /* in:  system handle                   */
        void       **cxtp,    /* in:  statement context pointer       */
        int16        noargs,  /* in:  number of arguments to function */
        const VALUE *args,    /* in:  array of arguments              */
        VALUE       *result)  /* out: result value                    */
    {
        COUNT_CTX *cnt = *cxtp;

        UNREF_PARM(hSys)
        UNREF_PARM(noargs)

        result->type = SQL_INTEGER;
        if (args[0].type != SQL_NULL)
            result->vt.lv = ++cnt->count;
        else
            result->vt.lv = cnt->count;
        return SQL_SUCCESS;
    }

The processing function for the subquery UDF is shown below. Even though QueryCheck (see above) compiled the specified select statement, QueryFunc needs to compile it as well, because the statement containing the subquery reference may be contained in a precompiled stored procedure. QueryFunc is therefore called in a (much) different context than QueryCheck was. A NULL context pointer is the signal to allocate the context and then compile, execute, and fetch the subquery result. Notice that all of the work occurs on the first call to QueryFunc. All subsequent calls simply return the subquery's result value.
    int16 REXTERNAL QueryFunc(
        HSYS         hSys,    /* in:  system handle                   */
        void       **cxtp,    /* in:  statement context pointer       */
        int16        noargs,  /* in:  number of arguments to function */
        const VALUE *args,    /* in:  array of arguments              */
        VALUE       *result)  /* out: result value                    */
    {
        SUBQ_CTX   *sqp = *cxtp;  /* local context                    */
        SQLHENV     hEnv;         /* environment handle for SQL calls */
        SQLHDBC     hDbc;         /* connection handle for SQL calls  */
        RDM_SESS    hSess;        /* RDM session id                   */
        SQLHSTMT    hStmt;
        SQLUINTEGER prec;
        SQLPOINTER  ptr;
        RM_MEMTAG   mTag;
        int16       status = SQL_SUCCESS;

        UNREF_PARM(noargs)

        if (sqp == NULL) {
            SYSMemoryTag(hSys, &mTag);
            sqp = *cxtp = rm_getMemory(sizeof(SUBQ_CTX), mTag);
            sqp->mTag = mTag;

            SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &hEnv);
            SQLAllocHandle(SQL_HANDLE_DBC, hEnv, &hDbc);
            SYSSessionId(hSys, &hSess);
            SQLConnectWith(hDbc, hSess);
            SQLAllocHandle(SQL_HANDLE_STMT, hDbc, &hStmt);

            SQLPrepare(hStmt, (UCHAR *)args[0].vt.cv, SQL_NTS);
            SQLDescribeCol(hStmt, 1, NULL, 0, NULL,
                           &sqp->result.type, &prec, NULL, NULL);
            if (sqp->result.type == SQL_CHAR || sqp->result.type == SQL_VARCHAR)
                ptr = sqp->result.vt.cv = rm_getMemory(prec, mTag);
            else
                ptr = &sqp->result.vt;
            SQLBindCol(hStmt, 1, SQL_C_DEFAULT, ptr, prec, NULL);

            SQLExecute(hStmt);
            SQLFetch(hStmt);
            *result = sqp->result;
            if (SQLFetch(hStmt) != SQL_NO_DATA) {
                result->type  = SQL_CHAR;
                result->vt.cv = "subquery() must return single row";
                status = SQL_ERROR;
            }
            else
                sqp->result = *result;

            SQLFreeHandle(SQL_HANDLE_STMT, hStmt);
            SQLDisconnect(hDbc);
            SQLFreeHandle(SQL_HANDLE_DBC, hDbc);
            SQLFreeHandle(SQL_HANDLE_ENV, hEnv);
        }
        else
            *result = sqp->result;

        return status;
    }

Function udfReset

The udfReset function is only used in an aggregate UDF to perform a reset after the grouping value changes. In the following example, the CntReset function for the udfCount UDF clears the accumulator variable, restarting the count for the next group at zero.
    int16 REXTERNAL CntReset(
        HSYS   hSys,  /* in: system handle             */
        void **cxtp)  /* in: statement context pointer */
    {
        COUNT_CTX *cnt = *cxtp;

        UNREF_PARM(hSys)

        cnt->count = 0L;
        return SQL_SUCCESS;
    }

Function udfCleanup

Your UDF can optionally include a cleanup function in the udfCleanup entry for each UDF defined in the UDFLOADTABLE. When SQL statement processing is complete, SQL calls this function to free memory allocated by the udfInit function, or any memory allocated during statement execution. For the sample UDF, udfCount, the cleanup function is called CntCleanup. As shown below, CntCleanup simply frees the context pointer.

Never call rm_freeTagMemory within udfCleanup using the memory tag you acquired with SYSMemoryTag. This tag is associated with aspects of the statement's memory that RDM Server uses after udfCleanup returns. Instead, free the memory "manually" using rm_freeMemory.

    void REXTERNAL CntCleanup(
        HSYS   hSys,  /* in: system handle             */
        void **cxtp)  /* in: statement context pointer */
    {
        COUNT_CTX *cnt = *cxtp;

        UNREF_PARM(hSys)

        rm_freeMemory(cnt, cnt->mTag);
        *cxtp = NULL;
    }

14.1.2 Using a UDF as a Trigger

The definition and use of standard SQL triggers was described in Chapter 8, where a trigger was defined as "a procedure associated with a table that is executed (i.e., fired) whenever that table is modified by the execution of an insert, update, or delete statement." The standard database triggers described in that chapter are implemented using SQL statements only. If a trigger implementation requires more complex processing than can be done with SQL statements, then either the standard trigger must call a user-defined procedure (see section 14.2) to do the work, or the trigger can be implemented through use of a UDF in conjunction with the table's check clause, as described in this section.
In the database schema, you can define a trigger UDF in the check clause of the create table statement for a particular table. The UDF returns a value (usually 1 for true and 0 for false) that is checked in the check condition. If the result of the condition is true, SQL allows the modification to occur. If the result is false, the modification is rejected.

The example UDF module (udf.c) includes two trigger UDFs: HaveProduct and OkayToShip. The create table schema statements that reference them are given below. Note that the prod_id and loc_id columns in the item table of the sales database reference the corresponding primary keys in the product and outlet tables in the invntory database.

    create table item (
        ord_num  smallint not null references sales_order,
        prod_id  smallint not null references invntory.product,
        loc_id   char(3)  not null references invntory.outlet,
        quantity integer  not null "number of units of product ordered",
        check(HaveProduct(ord_num, prod_id, loc_id, quantity) = 1)
    ) in salesd1;

    create table ship_log (
        ord_date    timestamp default now "date/time when order was entered",
        ord_num     smallint not null "order number",
        prod_id     smallint not null "product id number",
        loc_id      char(3)  not null "outlet location id",
        quantity    integer  not null "quantity of item to be shipped from loc_id",
        backordered smallint default 0 "set to 1 when item is backordered",
        check(OKayToShip(ord_num, prod_id, loc_id, quantity, backordered) = 1)
    ) in salesd0;

The HaveProduct UDF automatically manages the invntory database and the ship_log table. When your RDM Server SQL application executes an insert statement, HaveProduct looks up the on_hand record for the specified prod_id and loc_id columns. If there are enough items available, HaveProduct subtracts the ordered amount of the item from the quantity in the on_hand record and inserts a row in the ship_log table, from which a packing list will be created.
If there are not enough items available, HaveProduct assigns the quantity that is available to the order (that is, it sets the on_hand quantity to zero) and inserts a row in ship_log for that quantity. It then inserts an additional row in ship_log, with the backordered flag (for the OkayToShip UDF) set to 1, specifying the remaining amount needed to fill the order. When the RDM Server SQL application uses HaveProduct with a delete statement, the UDF adds the number of items ordered back to the on_hand record and sets quantity in ship_log to 0 for the appropriate rows.

The application can delete rows in ship_log when an order is actually shipped. When the application executes a delete statement for this table, OkayToShip checks the backordered flag. If the flag is set, the UDF rechecks the inventory to see if there are now enough items from which to fill the order. If there are still not enough, the trigger UDF rejects the delete request. If enough items are available, OkayToShip updates the inventory and processes the required number of items.

The HaveProduct and OkayToShip trigger UDFs use the SQL statements shown below, along with the SALES_CTX structure containing the UDF statement context data. The statements are compiled in udfInit for each trigger UDF. The needed statements are executed by the udfFunc processing functions.

    /* HaveProduct & OKayToShip SQL statements: */
    static char inv_cursor[] =
        "select quantity from on_hand where prod_id=? and loc_id=?;";
    static char inv_update[] =
        "update on_hand set quantity = ? where current of inv_cursor";
    static char shp_insert[] =
        "insert into ship_log values(now, ?, ?, ?, ?, ?)";
    static char shp_update[] =
        "update ship_log set ord_date = now, quantity = 0 where "
        "ord_num=? and prod_id=? and loc_id=?";
    static char ord_update[] =
        "update sales_order set ship_date = now where ord_num=?";

    /* HaveProduct and OKayToShip context data */
    typedef struct sales_ctx {
        RM_MEMTAG mTag;    /* system memory allocation tag        */
        int16     stype;   /* statement type (e.g. sqlINSERT)     */
        SQLHENV   henv;    /* SQL environment handle              */
        SQLHDBC   hdbc;    /* SQL connection handle               */
        SQLHSTMT  hInvSel; /* SQL statement handle for inv_cursor */
        SQLHSTMT  hInvUpd; /* SQL statement handle for inv_update */
        SQLHSTMT  hShpIns; /* SQL statement handle for shp_insert */
        SQLHSTMT  hShpUpd; /* SQL statement handle for shp_update or ord_update */
    } SALES_CTX;

The InvInit function shown below is the initialization function (type UDFINIT) for HaveProduct. It calls SQLConnectWith to use the same connection handle as the calling application, ensuring that the database changes made by the UDF are included in the transaction of the calling application. Thus, if the application executes a rollback statement for the transaction, the HaveProduct changes will be rolled back as well. Note also the use of SYSDescribeStmt to determine which type of operation (insert, delete, etc.) the application is performing on the table.

    static int16 REXTERNAL InvInit(
        HSYS   hSys,  /* in: system handle             */
        void **cxtp)  /* in: statement context pointer */
    {
        SALES_CTX *stp;
        RDM_SESS   hsess;
        RM_MEMTAG  mTag;
        int16      status = SQL_SUCCESS;

        SYSMemoryTag(hSys, &mTag);
        stp = *cxtp = rm_getMemory(sizeof(SALES_CTX), mTag);
        stp->mTag = mTag;
        SYSDescribeStmt(hSys, &stp->stype);
        if (stp->stype == sqlINSERT || stp->stype == sqlDELETE) {
            /* connect to calling statement's connection */
            SQLAllocHandle(SQL_HANDLE_ENV, NULL, &stp->henv);
            SQLSetEnvAttr(stp->henv, SQL_ATTR_ODBC_VERSION,
                          (SQLPOINTER)SQL_OV_ODBC3, SQL_IS_INTEGER);
            SQLAllocHandle(SQL_HANDLE_DBC, stp->henv, &stp->hdbc);
            SYSSessionId(hSys, &hsess);
            SQLConnectWith(stp->hdbc, hsess);
            SQLAllocHandle(SQL_HANDLE_STMT, stp->hdbc, &stp->hInvSel);
            SQLSetCursorName(stp->hInvSel, inv_cursor_name, SQL_NTS);
            SQLPrepare(stp->hInvSel, inv_cursor, SQL_NTS);
            SQLAllocHandle(SQL_HANDLE_STMT, stp->hdbc, &stp->hInvUpd);
            SQLPrepare(stp->hInvUpd, inv_update, SQL_NTS);
            if (stp->stype == sqlINSERT) {
                SQLAllocHandle(SQL_HANDLE_STMT, stp->hdbc, &stp->hShpIns);
                SQLPrepare(stp->hShpIns, shp_insert, SQL_NTS);
            }
            else {
                SQLAllocHandle(SQL_HANDLE_STMT, stp->hdbc, &stp->hShpUpd);
                SQLPrepare(stp->hShpUpd, shp_update, SQL_NTS);
            }
        }
        return status;
    }

The following example illustrates the InvCheck type checking function for HaveProduct. InvCheck verifies that the application is passing the correct number and types of parameters to HaveProduct.

    static int16 REXTERNAL InvCheck(
        HSYS         hSys,    /* in:  system handle                   */
        int16        noargs,  /* in:  number of arguments to function */
        const VALUE *args,    /* in:  array of arguments              */
        VALUE       *result,  /* out: result value                    */
        int16       *len)     /* out: max length result string        */
    {
        int16 status = SQL_ERROR;

        UNREF_PARM(hSys)
        UNREF_PARM(len)

        /* validate arguments */
        if (noargs != 4)
            result->vt.cv = "HaveProduct: requires 4 arguments";
        else if (args[0].type != SQL_SMALLINT)
            result->vt.cv = "HaveProduct: ord_num must be 1st arg";
        else if (args[1].type != SQL_SMALLINT)
            result->vt.cv = "HaveProduct: prod_id must be 2nd arg";
        else if (args[2].type != SQL_CHAR)
            result->vt.cv = "HaveProduct: loc_id must be 3rd arg";
        else if (args[3].type != SQL_INTEGER)
            result->vt.cv = "HaveProduct: quantity must be 4th arg";
        else {
            result->type = SQL_SMALLINT;
            status = SQL_SUCCESS;
        }
        return status;
    }

In the following example, the InvFunc function (type UDFFUNC), which is only used in conjunction with an insert or delete statement, performs the actual processing for HaveProduct. First, InvFunc opens a cursor to the row in the on_hand table with the matching prod_id and loc_id values.
Then, for an insert statement, InvFunc binds the parameters for the ship_log rows and inserts one or two rows, depending on the available quantity in inventory. The function also updates the on_hand row. If processing a delete statement, with quantity set to 0 for the previously entered ship_log rows, InvFunc updates the on_hand record to include the non-backordered item quantity.

    static int16 REXTERNAL InvFunc(
        HSYS         hSys,    /* in:  system handle                   */
        void       **cxtp,    /* in:  statement context pointer       */
        int16        noargs,  /* in:  number of arguments to function */
        const VALUE *args,    /* in:  array of arguments              */
        VALUE       *result)  /* out: result value                    */
    {
        int16 stat;
        int16 backordered;
        int32 quantity;
        int32 diff;
        const SALES_CTX *stp = *cxtp;

        UNREF_PARM(hSys)
        UNREF_PARM(noargs)

        if (stp->stype != sqlINSERT && stp->stype != sqlDELETE) {
            result->type  = SQL_CHAR;
            result->vt.cv = "cannot update item table - delete and re-insert";
            return SQL_ERROR;
        }
        else {
            /* look up on_hand record */
            SQLBindCol(stp->hInvSel, 1, SQL_C_DEFAULT, &quantity,
                       sizeof(quantity), NULL);
            SQLBindParameter(stp->hInvSel, 1, SQL_PARAM_INPUT, SQL_C_SHORT,
                             SQL_SMALLINT, 0L, 0, (void *)&args[1].vt.sv, 0, NULL);
            SQLBindParameter(stp->hInvSel, 2, SQL_PARAM_INPUT, SQL_C_CHAR,
                             SQL_CHAR, 3L, 0, (void *)args[2].vt.cv, 0, NULL);
            SQLExecute(stp->hInvSel);
            stat = SQLFetch(stp->hInvSel);
            if (stat != SQL_SUCCESS) {
                SQLFreeStmt(stp->hInvSel, SQL_CLOSE);
                result->type  = SQL_CHAR;
                result->vt.cv = "missing inventory record";
                return SQL_ERROR;
            }
            /* set up on_hand update parameter */
            SQLBindParameter(stp->hInvUpd, 1, SQL_PARAM_INPUT, SQL_C_LONG,
                             SQL_INTEGER, 0L, 0, &diff, 0, NULL);
            if (stp->stype == sqlINSERT) {
                /* set up ship_log insert parameters */
                SQLBindParameter(stp->hShpIns, 1, SQL_PARAM_INPUT, SQL_C_SHORT,
                                 SQL_SMALLINT, 0L, 0, (void *)&args[0].vt.sv, 0, NULL);
                SQLBindParameter(stp->hShpIns, 2, SQL_PARAM_INPUT, SQL_C_SHORT,
                                 SQL_SMALLINT, 0L, 0, (void *)&args[1].vt.sv, 0, NULL);
            SQLBindParameter(stp->hShpIns,3,SQL_PARAM_INPUT,SQL_C_CHAR,SQL_CHAR,
                3L,0,(void *)args[2].vt.cv,0,NULL);
            SQLBindParameter(stp->hShpIns,4,SQL_PARAM_INPUT,SQL_C_LONG,SQL_INTEGER,
                0L,0,(void *)&quantity,0,NULL);
            SQLBindParameter(stp->hShpIns,5,SQL_PARAM_INPUT,SQL_C_SHORT,SQL_SMALLINT,
                0L,0,(void *)&backordered,0,NULL);

            diff = quantity - args[3].vt.lv;
            if ( diff >= 0 ) {
                /* all needed inventory is available */
                backordered = 0;
                quantity = args[3].vt.lv;
                SQLExecute(stp->hShpIns);   /* insert ship_log row */
                SQLExecute(stp->hInvUpd);   /* update inventory amount */
            }
            else {
                /* there are not enough items available in inventory --
                   use what is there and backorder the rest */

                /* insert ship_log row of used items (all remaining inventory) */
                backordered = 0;
                SQLExecute(stp->hShpIns);

                /* insert ship_log row of backordered items */
                backordered = 1;
                quantity = args[3].vt.lv - quantity;
                SQLExecute(stp->hShpIns);

                /* set inventory amount to zero */
                diff = 0;
                SQLExecute(stp->hInvUpd);
            }
        }
        else {
            /* delete item row */
            /* put items back into inventory */
            diff = args[3].vt.lv + quantity;
            SQLExecute(stp->hInvUpd);

            /* ship_log.quantity == 0 => order has been changed */
            SQLBindParameter(stp->hShpUpd,1,SQL_PARAM_INPUT,SQL_C_SHORT,SQL_SMALLINT,
                0L,0,(void *)&args[0].vt.sv,0,NULL);
            SQLBindParameter(stp->hShpUpd,2,SQL_PARAM_INPUT,SQL_C_SHORT,SQL_SMALLINT,
                0L,0,(void *)&args[1].vt.sv,0,NULL);
            SQLBindParameter(stp->hShpUpd,3,SQL_PARAM_INPUT,SQL_C_CHAR,SQL_CHAR,
                3L,0,(void *)args[2].vt.cv,0,NULL);
            SQLExecute(stp->hShpUpd);
        }
    }
    SQLFreeStmt(stp->hInvSel, SQL_CLOSE);
    result->type = SQL_SMALLINT;
    result->vt.sv = 1;
    return SQL_SUCCESS;   /*lint !e438 */
}

The InvCleanup cleanup function is shown below for HaveProduct. InvCleanup frees all RDM Server SQL handles used by the trigger UDF, as well as the context memory previously allocated.
static void REXTERNAL InvCleanup (
    HSYS hSys,     /* in: system handle */
    void **cxtp)   /* in: statement context pointer */
{
    const SALES_CTX *stp = *cxtp;

    UNREF_PARM(hSys);

    if ( stp->stype == sqlINSERT || stp->stype == sqlDELETE ) {
        SQLFreeHandle(SQL_HANDLE_STMT, stp->hInvSel);
        SQLFreeHandle(SQL_HANDLE_STMT, stp->hInvUpd);
        if ( stp->stype == sqlINSERT )
            SQLFreeHandle(SQL_HANDLE_STMT, stp->hShpIns);
        else
            SQLFreeHandle(SQL_HANDLE_STMT, stp->hShpUpd);
        SQLDisconnect(stp->hdbc);
        SQLFreeHandle(SQL_HANDLE_DBC, stp->hdbc);
        SQLFreeHandle(SQL_HANDLE_ENV, stp->henv);
    }
    rm_freeMemory(stp, stp->mTag);   /*lint !e449 */
    *cxtp = NULL;
}

The OkayToShip UDF is called from the check clause defined on the ship_log table. A delete on the ship_log table indicates that the item is to be shipped to the customer. The OkayToShip initialization function, ShipInit, is shown below. This function allocates the UDF context memory and the needed SQL handles. It then calls SQLPrepare to compile the SQL statements that execute the desired trigger actions.
static int16 REXTERNAL ShipInit (
    HSYS hSys,     /* in: system handle */
    void **cxtp)   /* in: statement context pointer */
{
    SALES_CTX *stp;
    RDM_SESS hsess;
    RM_MEMTAG mTag;
    int16 status = SQL_SUCCESS;

    SYSMemoryTag(hSys, &mTag);
    stp = *cxtp = rm_getMemory(sizeof(SALES_CTX), mTag);
    stp->mTag = mTag;
    SYSDescribeStmt(hSys, &stp->stype);
    if ( stp->stype == sqlDELETE ) {
        /* connect to calling statement's connection */
        SQLAllocHandle(SQL_HANDLE_ENV, NULL, &stp->henv);
        SQLSetEnvAttr(stp->henv, SQL_ATTR_ODBC_VERSION,
            (SQLPOINTER)SQL_OV_ODBC3, SQL_IS_INTEGER);
        SQLAllocHandle(SQL_HANDLE_DBC, stp->henv, &stp->hdbc);
        SYSSessionId(hSys, &hsess);
        SQLConnectWith(stp->hdbc, hsess);

        SQLAllocHandle(SQL_HANDLE_STMT, stp->hdbc, &stp->hInvSel);
        SQLPrepare(stp->hInvSel, inv_cursor, SQL_NTS);
        SQLSetCursorName(stp->hInvSel, inv_cursor_name, SQL_NTS);

        SQLAllocHandle(SQL_HANDLE_STMT, stp->hdbc, &stp->hInvUpd);
        SQLPrepare(stp->hInvUpd, inv_update, SQL_NTS);

        SQLAllocHandle(SQL_HANDLE_STMT, stp->hdbc, &stp->hShpUpd);
        SQLPrepare(stp->hShpUpd, ord_update, SQL_NTS);
    }
    return status;
}

The cleanup function for OkayToShip frees the allocated SQL handles.

static void REXTERNAL ShipCleanup (
    HSYS hSys,     /* in: system handle */
    void **cxtp)   /* in: statement context pointer */
{
    const SALES_CTX *stp = *cxtp;

    UNREF_PARM(hSys);

    if ( stp->stype == sqlDELETE ) {
        SQLFreeHandle(SQL_HANDLE_STMT, stp->hInvSel);
        SQLFreeHandle(SQL_HANDLE_STMT, stp->hInvUpd);
        SQLFreeHandle(SQL_HANDLE_STMT, stp->hShpUpd);
        SQLDisconnect(stp->hdbc);
        SQLFreeHandle(SQL_HANDLE_DBC, stp->hdbc);
        SQLFreeHandle(SQL_HANDLE_ENV, stp->henv);
    }
    rm_freeMemory(stp, stp->mTag);   /*lint !e449 */
    *cxtp = NULL;
}

OkayToShip takes all of the ship_log columns except ord_date as arguments. Function ShipCheck, shown below, ensures that the correct number and types of arguments have been specified.
static int16 REXTERNAL ShipCheck (
    HSYS        hSys,      /* in:  system handle */
    int16       noargs,    /* in:  number of arguments to function */
    const VALUE *args,     /* in:  array of arguments */
    VALUE       *result,   /* out: result value */
    int16       *len)      /* out: max length result string */
{
    int16 status = SQL_ERROR;

    UNREF_PARM(hSys);
    UNREF_PARM(len);

    /* validate arguments */
    if ( noargs != 5 )
        result->vt.cv = "OkayToShip: requires 5 arguments";
    else if ( args[0].type != SQL_SMALLINT )
        result->vt.cv = "OkayToShip: ord_num must be 1st arg";
    else if ( args[1].type != SQL_SMALLINT )
        result->vt.cv = "OkayToShip: prod_id must be 2nd arg";
    else if ( args[2].type != SQL_CHAR )
        result->vt.cv = "OkayToShip: loc_id must be 3rd arg";
    else if ( args[3].type != SQL_INTEGER )
        result->vt.cv = "OkayToShip: quantity must be 4th arg";
    else if ( args[4].type != SQL_SMALLINT )
        result->vt.cv = "OkayToShip: backordered must be 5th arg";
    else {
        result->type = SQL_SMALLINT;
        status = SQL_SUCCESS;
    }
    return status;
}

Function ShipFunc performs the OkayToShip trigger operations. The on_hand row associated with the warehouse from which the item will be shipped is rechecked for backordered items to see if there is now a sufficient quantity for filling the order. If there is, on_hand.quantity is decremented by the backordered quantity and the delete is allowed. If there is still not enough inventory, the delete is rejected. If all is okay, the ship_date column in the sales_order table is updated indicating that (at least part of) the order has been shipped.
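The recheck arithmetic just described can be isolated from the server handles and examined on its own. The helper below is a sketch only; its name and signature are ours, not part of the RDM Server API. It decides whether a previously backordered item can now ship and, if so, decrements the on-hand count.

```c
#include <assert.h>

/* Hypothetical helper mirroring ShipFunc's quantity logic: returns 1 and
 * decrements *on_hand when the order can ship, or 0 when inventory is
 * still short for a backordered item. */
static int ok_to_ship(long *on_hand, long ordered, int was_backordered)
{
    if (was_backordered) {
        if (*on_hand < ordered)
            return 0;              /* still can't ship */
        *on_hand -= ordered;       /* inventory now has enough */
    }
    return 1;                      /* non-backordered items always ship */
}
```

A non-backordered item ships unconditionally; only a backordered one triggers the inventory recheck, which is exactly the branch structure of ShipFunc below.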
static int16 REXTERNAL ShipFunc (
    HSYS        hSys,     /* in:  system handle */
    void        **cxtp,   /* in:  statement context pointer */
    int16       noargs,   /* in:  number of arguments to function */
    const VALUE *args,    /* in:  array of arguments */
    VALUE       *result)  /* out: result value */
{
    int16 stat;
    int32 quantity;
    int32 diff;
    const SALES_CTX *stp = *cxtp;

    UNREF_PARM(hSys);
    UNREF_PARM(noargs);

    if (stp->stype == sqlDELETE) {
        if (args[4].vt.sv == 1) {
            /* item was backordered -- see if inventory now has enough */
            SQLBindCol(stp->hInvSel,1,SQL_C_DEFAULT,&quantity,sizeof(quantity),NULL);
            SQLBindParameter(stp->hInvSel,1,SQL_PARAM_INPUT,SQL_C_SHORT,SQL_SMALLINT,
                0L,0,(void *)&args[1].vt.sv,0,NULL);
            SQLBindParameter(stp->hInvSel,2,SQL_PARAM_INPUT,SQL_C_CHAR,SQL_CHAR,
                3L,0,(void *)args[2].vt.cv,0,NULL);
            SQLExecute(stp->hInvSel);
            stat = SQLFetch(stp->hInvSel);
            if ( stat != SQL_SUCCESS ) {
                SQLFreeStmt(stp->hInvSel, SQL_CLOSE);
                result->type = SQL_CHAR;
                result->vt.cv = "missing inventory record";
                return SQL_ERROR;
            }
            if (quantity >= args[3].vt.lv) {
                /* inventory now has enough to ship! */
                /* set up on_hand update parameter */
                SQLBindParameter(stp->hInvUpd,1,SQL_PARAM_INPUT,SQL_C_LONG,
                    SQL_INTEGER,0L,0,(void *)&diff,0,NULL);
                diff = quantity - args[3].vt.lv;
                SQLExecute(stp->hInvUpd);
                SQLFreeStmt(stp->hInvSel, SQL_CLOSE);
            }
            else {
                /* still can't ship */
                SQLFreeStmt(stp->hInvSel, SQL_CLOSE);
                result->type = SQL_CHAR;
                result->vt.cv = "can't delete(i.e. ship) backordered item";
                return SQL_ERROR;
            }
        }
        /* update sales_order's ship_date */
        SQLBindParameter(stp->hShpUpd, 1, SQL_PARAM_INPUT, SQL_C_SHORT,
            SQL_SMALLINT, 0L, 0, (void *)&args[0].vt.sv, 0, NULL);
        SQLExecute(stp->hShpUpd);
    }
    result->type = SQL_SMALLINT;
    result->vt.sv = 1;
    return SQL_SUCCESS;   /*lint !e438 */
}

14.1.3 Invoking a UDF

Before your application can use a UDF, it must register the module using a create function statement, as shown in the following example.
This statement registers the UDF module with the syscat database. The following statement creates three aggregate UDFs contained in a DLL called statpack on an RDM Server device named add_ins.

create aggregate function
    devsq "compute sum of the squares of deviations",
    stddev "compute standard deviation",
    geomean "compute the geometric mean"
in statpack on add_ins;

Once the module is registered, a UDF can be called from SQL statements, just like the built-in RDM Server SQL-callable functions. Examples are given below for using an aggregate UDF and a scalar UDF. The following example illustrates entry of statements to call the sample scalar UDF SubQuery and the sample aggregate UDFs std and stds. The example uses the sales and invntory databases.

create aggregate functions std, stds scalar function subquery in udf on sqlsamp;

select sale_name, avg(amount), std(amount), stds(amount)
    from salesperson, customer, sales_order
    where salesperson.sale_id = customer.sale_id and
          customer.cust_id = sales_order.cust_id
    group by 1;

SALE_NAME            AVG(AMOUNT)     STD(AMOUNT)     STDS(AMOUNT)
Flores, Bob          19233.557778    21767.832956    23088.273442
Jones, Walter        28170.703333    22055.396667    22829.504456
Kennedy, Bob         61362.110000    75487.487619    78844.109392
McGuire, Sidney      18948.373636    16888.086829    17712.374895
Nash, Gail           34089.695556    35751.014170    37919.676831
Porter, Greg         87869.300000    87370.831661    97683.559422
Robinson, Stephanie  24993.631333    28766.406110    29776.059184
Stouffer, Bill        3631.662500     2731.390470     2919.979236
Warren, Wayne        21263.850000    24150.207498    25456.553886
Williams, Steve      27464.443333    16696.742874    17709.570165
Wyman, Eliska        23617.375417    31511.044841    32188.779254

select state, 100*sum(amount)/subquery("select sum(amount) from sales_order") pct_of_sales
    from customer, sales_order
    where customer.cust_id = sales_order.cust_id
    group by state;

STATE  PCT_OF_SALES
AZ     6.386350
CA     13.108034
CO     13.422859
FL     3.591970
GA     3.057682
IL     4.310374
IN     0.781594
LA     4.993924
MA     3.233216
MI     11.819327
MN     1.330608
MO     3.807593
NJ     0.425850
NY     10.425037
OH     4.414228
PA     3.911350
TX     3.259824
VA     1.695903
WA     1.634471
WI     4.389806

Calling an Aggregate UDF

After module registration, as described above, the application can call an aggregate UDF from SQL statements. For example, the select statement shown below calls the aggregate UDF geomean, defined in the previous section. Note that the code also calls the built-in aggregate function avg.

select state, avg(amount), geomean(amount)
    from customer, sales_order
    where customer.cust_id = sales_order.cust_id
    group by state;

Calling a Scalar UDF

Your RDM Server SQL application calls a scalar UDF from SQL statements, just as it calls the built-in functions. The next example illustrates the use of the sample UDF SubQuery to retrieve a percentage of total values. Note the power of this UDF: only a single select statement is needed in the application.

select state, 100*sum(amount)/subquery("select sum(amount) from sales_order") pct_of_sales
    from customer, sales_order
    where customer.cust_id = sales_order.cust_id
    group by state;

The select statement in the next example calls a scalar UDF called tax_rate, which returns the tax rate for a given city.

select company, city, state, tax_rate(city, state) tax_rate from customer;

The tax_rate UDF looks up the tax rate for a locale in an internal table or a database table. An application can use this UDF as shown below to display sales orders with tax amounts that do not correspond to the going rate.

select company, city, state, ord_num, ord_date, amount, tax
    from customer, sales_order
    where customer.cust_id = sales_order.cust_id and
          not equal_float(convert(tax, float), amount * tax_rate(city, state), 0.005);

Note that this example also uses a UDF called equal_float that returns TRUE if two floating-point values differ by less than the value of the third parameter.
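The manual does not show equal_float's implementation; a minimal sketch of such a tolerance comparison is below. The function name and signature here are assumed for illustration and are not part of the sample UDF module.

```c
#include <math.h>

/* Illustrative tolerance comparison: nonzero when a and b differ by less
 * than tol, matching the behavior described for the equal_float UDF. */
static int floats_equal(double a, double b, double tol)
{
    return fabs(a - b) < tol;
}
```

An absolute tolerance like this suits the tax comparison above, where the operands share a known scale; comparisons across widely varying magnitudes would call for a relative tolerance instead.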
Note also the use of the built-in function convert to change the value of the column tax from type real to type float.

14.1.4 UDF Support Library

A library of support functions for SQL user-defined functions allows UDFs to perform low-level database operations, associate SQL modification commands with the client's transactions, and use the SQL system's data type arithmetic capabilities. The available functions are listed in the table below.

Table 13-1. UDF Support Library Functions

Function         Description
SYSDBHandle      Retrieve the RDM_DB handle for a specified database
SYSDbaToRowId    Convert an RDM DB_ADDR to an SQL_DBADDR (RowId)
SYSMemoryTag     Return the UDF memory allocation tag
SYSRowIdToDba    Convert an SQL_DBADDR for a table to an RDM DB_ADDR
SYSRowDba        Get the DB_ADDR for the current row of the specified table
SYSRowId         Get the RowId (SQL_DBADDR) for the current row of the specified table
SYSSessionId     Get the RDM_SESS (SessionId) associated with an HSYS
SYSDescribeStmt  Get the statement type associated with an HSYS
SYSValChgSign    Change the sign of a VALUE
SYSValCompare    Compare two VALUEs
SYSValAdd        Add two VALUEs
SYSValSub        Subtract two VALUEs
SYSValMult       Multiply two VALUEs
SYSValDiv        Divide two VALUEs

The SYS-prefixed functions are used to access SQL-maintained statement context information. Most of the SYS functions have SQL-prefixed counterparts that provide the same functionality for the connection handle (HDBC) instead of the system handle (HSYS). These functions are provided so that a UDF can access the information associated with the SQL statement that invokes it. The SYSVal-prefixed functions are provided so that, if needed, you can do mixed-mode arithmetic in your UDFs. Each of the SYSVal functions is passed arguments of type VALUE, as described earlier.
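The SYSVal helpers mirror C's usual arithmetic conversions. As a reminder of what those rules give, here is a plain-C illustration (unrelated to any RDM Server API; the helper names are ours):

```c
#include <assert.h>

/* C usual arithmetic conversions: int op int stays integral (division
 * truncates), while mixing int and double promotes the result to double. */
static int    int_div(int a, int b)      { return a / b; }
static double mixed_div(int i, double d) { return i / d; }
```

So dividing two integer VALUEs yields a truncated integer result, while dividing an integer VALUE by a floating VALUE yields a floating result.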
The functions hook into the internal SQL arithmetic routines to perform the mixed-mode arithmetic operations. The results of the mixed-mode arithmetic follow the standard C rules.

14.2 User-Defined Procedures

A UDP is an application-specific procedure written in C that is architecturally similar to a UDF but is invoked only through the RDM Server SQL call (execute) statement. UDPs can do anything that RDM Server SQL stored procedures can do (including retrieving result sets), but UDPs are more flexible than stored procedures because they are written in C and can support dynamic parameter lists. The example UDP module code (udp.c) described below is included in the examples/udp directory.

Before a UDP can be used from SQL, it must first be registered with the system using the following form of the create procedure statement.

create_procedure:
    create proc[edure] procname ["description"] in dllname on devname

The name of the procedure is given by the identifier procname, along with an optional description string. The UDP module in which the UDP is implemented is given by dllname, which must be located on the RDM Server device named devname.

The example UDP module defines three UDPs: tims_data, log_login, and log_logout. The tims_data UDP retrieves a result set consisting of data from the tims database. To illustrate how a UDP can retrieve multiple result sets, tims_data retrieves an additional result set that reports the total number of rows retrieved in the first result set. The sample log_login and log_logout UDPs are special login and logout procedures that keep a record of all users who have logged in to and out of RDM Server through the SQLConnect function.

Table 13-5. Procedures Defined in the Sample UDP Module

Function    Description
tims_data   Uses the RDM Server core-level (d_) API to retrieve a result set
            from the TIMS database example.
log_login   System login tracking procedure.
log_logout  System logout tracking procedure.

14.2.1 UDP Implementation

Keep the following concepts in mind when programming your UDP module.

- The UDP module must include a load function named udpDescribeFcns, which identifies all of the UDPs implemented in the module.
- UDP modules that contain the implementation of a transaction trigger (registered through a call to function SQLTransactTrigger) must include both a ModInit and a ModCleanup function.
- Each UDP in the module can optionally include an initialization function of type UDPINIT. This function normally allocates and initializes UDP-specific context memory containing statement handles and other operational data. The SQL system calls it at the start of the execution of the call statement that invokes the UDP.
- Each UDP that takes procedure arguments must have a type checking function of type UDPCHECK. UDPs can take any number of arguments of any type, and the value returned can be of any data type except long varchar, long wvarchar, or long varbinary.
- Each UDP must include a processing function of type UDPEXECUTE. This function executes the procedure and, if applicable, returns the first select statement result set.
- Each UDP can optionally include a function of type UDPMORERESULTS to obtain the next result set.
- Each UDP can optionally include a function of type UDPCOLDATA, which can be used with UDPs that return result sets to return a description of one of the result set columns.
- Each UDP can optionally include a cleanup function of type UDPCLEANUP. If defined, this function is called by RDM Server SQL when the procedure's statement is closed, or when the udpMoreResults function returns status SQL_NO_DATA (indicating there are no more result sets to be returned).

In coding your UDP, you must declare all functions exactly as shown in the function references (including the REXTERNAL attribute).
If a function declaration deviates at all, it will not match the UDP type (for example, UDPCHECK) used in the function declaration, causing a compilation error.

Function udpDescribeFcns

The UDP module must contain a function with the exact name udpDescribeFcns. This function is called by RDM Server SQL to fetch the definitions of the UDPs contained in the module from the UDPLOADTABLE struct array it returns. A typical udpDescribeFcns implementation is shown in the following example.

void REXTERNAL udpDescribeFcns (
    uint16 *NumProcs,             /* out: number of procedures in module */
    PUDPLOADTABLE *UDPLoadTable,  /* out: points to UdpTable above */
    const char **fcn_descr)       /* out: optional description string */
{
    *NumProcs = RLEN(UdpTable);   /* RLEN computes # of entries in struct array */
    *UDPLoadTable = UdpTable;
    *fcn_descr = "Sample of SQL C-based procedures";
}

There must be one entry in the UDPLOADTABLE array for each UDP that is implemented in the module. The declaration for UDPLOADTABLE is contained in header file sqlsys.h (included with emmain.h) and is shown below.

typedef struct udploadtable {
    uint8 version;                   /* version of this structure */
    char udpName[33];                /* name of user procedure */
    PUDPCHECK udpCheck;              /* type checking */
    PUDPEXECUTE udpExecute;          /* execute first result set */
    PUDPMORERESULTS udpMoreResults;  /* move to next result set */
    PUDPINIT udpInit;                /* initialization for user procedure */
    PUDPCLEANUP udpCleanup;          /* cleanup for user procedure */
    PUDPCOLDATA udpColData;          /* column description data */
} UDPLOADTABLE, *PUDPLOADTABLE;

Each element of the UDPLOADTABLE struct is described in the following table.

Table 13-6. UDPLOADTABLE Struct Element Descriptions

Element         Description
version         Must be assigned the RDM Server defined macro UDPTBLVERSION.
udpName         The name of the procedure. Must conform to a standard SQL identifier.
                It is case-insensitive and must be unique (system-wide).
udpCheck        Pointer to the argument type checking function. Assign NULL if there are no arguments.
udpExecute      Pointer to the execution function.
udpMoreResults  Pointer to the function that initiates processing of the next result set. Assign NULL if no more than one result set is returned.
udpInit         Pointer to the pre-execution initialization function.
udpCleanup      Pointer to the post-execution cleanup function.
udpColData      Pointer to the function that returns descriptions of the columns in the result set. Assign NULL if the UDP does not return a result set.

The udpDescribeFcns function must place one entry in the load table for each UDP in the module. You also can return an optional string describing the module, which is printed on the server console when the UDP module is loaded. NULL can be supplied if there is no description string. The following example shows the load table and udpDescribeFcns declarations for the sample UDP module (udp.c).

#include <stdio.h>
#include <string.h>

/* Definitions and header to set up the EM -------- */
/* (all EMs must have a code block like this) ----- */
#define EM_NAME UDP     /* the uppercased module name */
#define EMTYPE_UDP      /* EMTYPE_EM, EMTYPE_UDF, EMTYPE_UDP, EMTYPE_IEF */
#define DEF_ModInit     /* remove if no ModInit in this module */
#define DEF_ModCleanup  /* remove if no ModCleanup in this module */

#include "emmain.h"
...
/* must follow the above definitions and OS #includes */
static UDPCHECK       timsCheck;
static UDPEXECUTE     timsExecute;
static UDPMORERESULTS timsMoreResults;
static UDPINIT        timsInit;
static UDPCLEANUP     timsCleanup;
static UDPCOLDATA     timsColData;
static UDPINIT        logInit;
static UDPCLEANUP     logCleanup;
static UDPEXECUTE     logLogin;
static UDPEXECUTE     logLogout;

static TRANSACTTRIGGER TransactTrigger;

/*--------------------------------------------------------------------------
    Table of user-defined procedures in this module
---------------------------------------------------------------------------*/
static const UDPLOADTABLE UdpTable[] = {
    {UDPTBLVERSION, "tims_data", timsCheck, timsExecute, timsMoreResults,
        timsInit, timsCleanup, timsColData},
    {UDPTBLVERSION, "log_login", NULL, logLogin, NULL,
        logInit, logCleanup, NULL},
    {UDPTBLVERSION, "log_logout", NULL, logLogout, NULL,
        logInit, logCleanup, NULL}
};

Function ModInit

If your UDP module has a transaction trigger, you must include a ModInit (and ModCleanup) function. The server calls ModInit when the UDP module is loaded, passing a single module handle. Your ModInit function needs to save this handle in a global variable that is subsequently used by SQLTransactTrigger to register the transaction trigger function. The example below shows the ModInit function defined for the sample UDP module.

static HMOD ghMod = NULL;
...

int16 REXTERNAL ModInit(
    HMOD hMod)   /* in: module handle, used by SQLTransactTrigger() */
{
    ghMod = hMod;
    return S_OKAY;
}

Function udpInit

Your UDP can optionally include an initialization function of type UDPINIT. RDM Server SQL calls this function through the udpInit entry in UDPLOADTABLE to perform any initialization the UDP may require. Common tasks include allocating memory for context data and performing initializations such as handle allocations, connections, and compiling statements to be executed in udpExecute.
If needed, your udpInit function should allocate memory for the UDP context using an rm_cGetMemory call. The memory tag (mTag) argument passed to the function should be used in all dynamic memory allocations. RDM Server SQL uses this tag to free all of the memory allocated by the UDP in case an error occurs outside the UDP.

The err argument is a pointer to a standard RDM Server SQL VALUE structure (see SQL Data VALUE Container Description). The udpInit function uses this structure to pass error information back to the RDM Server SQL API for reporting to the application. The udpInit function sets the type field in the structure to either SQL_SMALLINT or SQL_CHAR. If the field is set to SQL_SMALLINT, the vt.sv field should be set to the RDM Server SQL error code to return to the RDM Server SQL API. If udpInit sets the type field in the VALUE structure to SQL_CHAR, then it must set the vt.cv field to point to a static string containing an error message that will be reported under the errGENERAL error code. In addition to setting the err parameter, the udpInit function should also return SQL_ERROR if an error occurs.

The timsInit function, shown below for the tims_data UDP, allocates the UDP_CTX structure for the UDP context and opens the tims database via the core (d_) API with dirty read mode enabled (no locking). Then the function allocates the necessary RDM Server SQL statement handles, executes two statements, and stores the information in the allocated UDP context. If any errors occur, the information is returned to RDM Server in the err argument.

typedef struct udp_ctx {
    SQLHENV hEnv;
    SQLHDBC hDbc;
    SQLHSTMT hStmt;
    RDM_SESS hSess;
    RDM_DB hDb;
    int16 finished;
} UDP_CTX;

static const char TimsCreate[] =
    "create temporary table timstab("
        "author char(31), "
        "id_code char(15), "
        "info_title char(48), "
        "publisher char(31), "
        "pub_date char(11), "
        "info_type smallint); ";

static const char TimsInsert[] =
Developing SQL Server Extensions "insert into timstab values(?,?,?,?,?,?)"; ... /* =================================================================== Initialization function for TIMS DB access */ int16 REXTERNAL timsInit( void **ctxp, /* in: proc context pointer */ int16 noargs, /* in: number of arguments passed */ VALUE *args, /* in: arguments, args[noargs-1] */ RDM_SESS hSess, /* in: current session id */ RM_MEMTAG mTag, /* in: memory tag for rm_ memory calls */ VALUE *err) /* out: container for error messages */ { int16 stat; /* allocate a cleared UDP context memory */ UDP_CTX *ctx = rm_cGetMemory(sizeof(UDP_CTX), mTag); ctx->hSess = hSess; if ((stat = d_open("tims", "s", hSess, &ctx->hDb)) != S_OKAY) err->type = SQL_CHAR; err->vt.cv = "unable to open TIMS"; rm_freeMemory(ctx, mTag); return SQL_ERROR; } ... { /* enable dirty reads */ d_rdlockmodes(1, 1, ctx->hSess); /* allocate and initialize SQL handles */ SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &ctx->hEnv); SQLSetEnvAttr(ctx->hEnv, SQL_ATTR_ODBC_VERSION,(SQLPOINTER)SQL_OV_ODBC3, SQL_IS_UINTEGER); SQLAllocHandle(SQL_HANDLE_DBC, ctx->hEnv, &ctx->hDbc); SQLConnectWith(ctx->hDbc, hSess); SQLAllocHandle(SQL_HANDLE_STMT, ctx->hDbc, &ctx->hStmt); SQLExecDirect(ctx->hStmt, TimsCreate, SQL_NTS); SQLPrepare(ctx->hStmt, TimsInsert, SQL_NTS); *ctxp = ctx; return SQL_SUCCESS; } The call to SQLExecDirect creates a temporary table (timstab) and the call to SQLPrepare compiles a statement to insert data into this table. The columns declared in timstab include the author name, which will be retrieved from the author record in the tims database and one column for each of the fields declared in the info record in tims. The insert statement includes a parameter marker for each of the columns declared in timstab. The last statement before the return in timsInit, (*ctxp = ctx;) must be specified so that the context pointer can be passed by SQL to the other UDP functions. SQL User Guide 202 14. 
Function udpCheck

When the application calls SQLPrepare to compile a call (execute) statement referencing a UDP, SQLPrepare calls the function in the udpCheck entry of UDPLOADTABLE to validate that the arguments specified in the execute statement are correct for the specified UDP. Like the udpInit function, udpCheck uses the err argument to return error information to RDM Server and returns SQL_ERROR as its return code if an error occurs. Unlike the udfCheck function used by UDFs, the udpCheck function uses only the err argument for error information. The following example shows the timsCheck type checking function for tims_data. Remember that the sample login and logout procedures do not take any arguments and, hence, do not need a type checking function.

int16 REXTERNAL timsCheck(
    int16 noargs,        /* in: number of arguments passed */
    const int16 *types,  /* in: type of each arg., types[noargs-1] */
    VALUE *err)          /* out: container for error messages */
{
    int16 arg;

    err->type = SQL_CHAR;
    err->vt.cv = "tims_data requires char arguments only";
    for (arg = 0; arg < noargs; ++arg) {
        if (types[arg] != SQL_CHAR)
            return SQL_ERROR;
    }
    return SQL_SUCCESS;
}

Function udpExecute

The execution function for the UDP (type UDPEXECUTE) is called by RDM Server SQL through the udpExecute entry in UDPLOADTABLE when SQLExecute is processing a call (execute) statement that references the UDP (after the udpInit and udpCheck functions, if any, have been called). If a result set is generated, the hstmt referencing this result set must be returned in the phStmt argument. This allows the client-side application to fetch the results by calling SQLFetch or SQLFetchScroll with the hstmt used in the procedure execution.

The timsExecute function for the tims_data UDP is described below. Since tims is not an SQL database, timsExecute uses the core API (d_ functions) to retrieve data from the tims database and stores the data in a temporary SQL table. The timsExecute function first calls SQLBindParameter several times to bind values to the parameter markers of the insert statement. The execution function then accesses the tims database. The d_setoo function call sets the current member of the author_list set to null so that the first d_findnm call will return the first member of the set.
The timsExecute function first calls SQLBindParameter several times to bind values to the parameter markers for the insert statements. The execution function then accesses the tims database. The d_setoo function call sets the current member of the author_list set to null so that the first d_findnm call will return the first member of the set. int16 REXTERNAL timsExecute( void **ctxp, /* in: int16 noargs, /* in: VALUE *args, /* in: RM_MEMTAG mTag, /* in: SQLHSTMT *phStmt, /* out: VALUE *err) /* out: { static char errmsg[65]; char author[32]; struct info ir; int16 stat, arg; SQL User Guide proc context pointer */ number of arguments to procedure */ array of arguments */ memory tag for rm_ memory calls */ hstmt for result set */ container for error messages */ 203 14. Developing SQL Server Extensions char UDP_CTX RDM_DB SQLHSTMT *author_arg = args->vt.cv; *ctx = *ctxp; hdb = ctx->hDb; hstmt = ctx->hStmt; /* set up insert parameter values */ SQLBindParameter(hstmt, 1, SQL_PARAM_INPUT, SQL_C_CHAR, SQL_CHAR, 31, 0, author, 0, NULL); SQLBindParameter(hstmt, 2, SQL_PARAM_INPUT, SQL_C_CHAR, SQL_CHAR, 15, 0, ir.id_code, 0, NULL); SQLBindParameter(hstmt, 3, SQL_PARAM_INPUT, SQL_C_CHAR, SQL_CHAR, 79, 0, ir.info_title, 0, NULL); SQLBindParameter(hstmt, 4, SQL_PARAM_INPUT, SQL_C_CHAR, SQL_CHAR, 31, 0, ir.publisher, 0, NULL); SQLBindParameter(hstmt, 5, SQL_PARAM_INPUT, SQL_C_CHAR, SQL_CHAR, 11, 0, ir.pub_date, 0, NULL); SQLBindParameter(hstmt, 6, SQL_PARAM_INPUT, SQL_C_SHORT, SQL_SMALLINT, 0, 0, &ir.info_type, 0, NULL); /* extract data from tims database & insert into SQL table */ d_setoo(AUTHOR_LIST, AUTHOR_LIST, hdb); for (;;) { while ((stat = d_findnm(AUTHOR_LIST, hdb)) == S_OKAY) { d_recread(AUTHOR, author, hdb); for (arg = 0; arg < noargs; ++arg) { /* check for author in argument list */ char *aname = args[arg].vt.cv; if (strncmp(author, aname, strlen(aname)) == 0) break; } if (noargs == 0 || arg < noargs) break; } if (stat != S_OKAY) break; d_setor(HAS_PUBLISHED, hdb); 
        while ((stat = d_findnm(HAS_PUBLISHED, hdb)) == S_OKAY) {
            d_recread(INFO, &ir, hdb);
            SQLExecute(hstmt);
        }
    }
    SQLFreeStmt(hstmt, SQL_RESET_PARAMS);

    /* return result set from SQL table */
    SQLExecDirect(hstmt, "select * from timstab", SQL_NTS);
    *phStmt = ctx->hStmt;
    return SQL_SUCCESS;
}

As each author record is retrieved from the author_list set, timsExecute compares the name with each string argument passed to the tims_data UDP. When the function finds a match, it fetches the publications for the author from the has_published set. The timsExecute function reads each info record and executes the insert statement to store the result in the timstab table. When all authors have been processed, the timsExecute function executes a select statement that will return the rows stored in timstab. The function returns the handle of the select statement in phStmt so that subsequent calls to SQLFetch can retrieve the results.

Function udpMoreResults

UDPs that need to return the result sets of more than one select statement must include a udpMoreResults function, which is called by RDM Server SQL through the udpMoreResults entry in UDPLOADTABLE when the application calls SQLMoreResults on the statement handle associated with the call (execute) statement. (SQLMoreResults is called after the initial statement execution, which takes place in udpExecute, has occurred.) The udpMoreResults function has the same arguments and executes the same way as the udpExecute function, also returning the statement handle associated with the currently executing select statement in phStmt. When there are no more result sets to return, udpMoreResults should return SQL_NO_DATA, informing SQL that UDP processing is complete and that it should call the UDP's udpCleanup function, if one exists.

In the next example, the first time timsMoreResults is called, the call to SQLExecDirect sets up a result set that retrieves the total number of rows in the timstab table.
The finished flag in the UDP context is set to indicate that the first call has been made, so that the next time SQLMoreResults is called, the function sees the set flag and returns SQL_NO_DATA. If a UDP omits the udpMoreResults function, SQLMoreResults automatically returns SQL_NO_DATA.

int16 REXTERNAL timsMoreResults(
    void     **ctxp,   /* in:  proc context pointer */
    int16      noargs, /* in:  number of arguments to procedure */
    VALUE     *args,   /* in:  array of arguments */
    RM_MEMTAG  mTag,   /* in:  memory tag for rm_ memory calls */
    SQLHSTMT  *phStmt, /* out: hstmt for result set */
    VALUE     *err)    /* out: container for error messages */
{
    UDP_CTX *ctx = *ctxp;

    if (ctx->finished)
        return SQL_NO_DATA;

    ctx->finished = 1;
    SQLCloseCursor(ctx->hStmt);
    SQLExecDirect(ctx->hStmt,
        "select count(*) 'TOTAL ROWS FOUND' from timstab", SQL_NTS);
    *phStmt = ctx->hStmt;
    return SQL_SUCCESS;
}

Function udpColData

This function, if specified, is called by SQL while processing a call to function SQLProcedureColumns in order to retrieve descriptions of the columns in the result set returned by the UDP. The function returns via the pColDescr output argument a description of the column specified by argument colno, where the first column is 0. The function must return status SQL_NO_DATA when the specified colno is invalid (either less than zero or greater than or equal to the number of columns in the UDP result set). The udpColData entry in the UDPLOADTABLE must be NULL if the UDP does not return a result set and can be NULL even if the UDP does return a result set. The pColDescr argument is a pointer to a struct of type UDPPROCCOLDATA, which is declared in header file sqlsys.h as shown below.
/* user defined procedure column data */
typedef struct udpproccoldata {
    char        dbname[33];
    char        tblname[33];
    char        procname[33];
    char        colname[33];
    int16       coltype;
    int32       datatype;
    char        sqltypename[33];
    int32       precision;
    int32       length;
    int16       scale;
    int16       radix;
    int16       nullable;
    const char *remarks;
    char        col_def[33];
    int32       sql_data_type;
    int32       sql_datetime_sub;
    int32       char_octet_len;
    int32       ordinal_pos;
    char        is_nullable[4];
    char        specific_name[33];
} UDPPROCCOLDATA;

Each element of the UDPPROCCOLDATA struct corresponds to a column of the result set returned by the SQLProcedureColumns ODBC function call, as described in the following table.

Table 13-7. UDPPROCCOLDATA Struct Element Descriptions

Element            Description
dbname             The name of the database accessed by the UDP or NULL.
tblname            The name of the table accessed by the UDP or NULL.
procname           The name of the UDP.
colname            The name of the result set column.
coltype            The type of column: only 5=result set column is supported.
datatype           SQL data type constant (e.g., SQL_SMALLINT).
sqltypename        RDM Server data type name (e.g., "smallint").
precision          The specified max size of a character column or the precision of a numeric column (e.g., integer, float, double).
length             The maximum length in bytes to contain the column values. For char data this includes the terminating null byte.
scale              For columns of type decimal, this contains the number of decimal places in the result. Zero otherwise.
radix              For numeric types, either 10 (decimal) or 2 (all others). Zero for non-numeric types.
nullable           Indicates if the column can accept a NULL value.
remarks            Either NULL or a udpColData-allocated (use rm_getMemory with the mTag argument) string containing a description of your choice.
col_def            A string containing the column's default value or "TRUNCATED" if the default value does not fit in the field.
sql_data_type      0 (unused).
sql_datetime_sub   0 (unused).
char_octet_len     Currently, this returns the same value as the length field.
ordinal_pos        The column's ordinal position in the result set, beginning with 1.
is_nullable        "YES" if the column can include nulls, "NO" if not, "" if it is unknown.
specific_name      Same as procname.

The tims_data version of udpColData is shown below, including its declaration of the UDPPROCCOLDATA table for the result set it returns.

...
static const UDPPROCCOLDATA timsDataCD[] = {
    {"","","tims_data","author",0,1,"CHAR",31,31,
        -1,10,2,NULL,"",0,0,31,1,"","tims_data"},
    {"","","tims_data","id_code",0,1,"CHAR",15,15,
        -1,10,2,NULL,"",0,0,15,2,"","tims_data"},
    {"","","tims_data","info_title",0,1,"CHAR",48,48,
        -1,10,2,NULL,"",0,0,48,3,"","tims_data"},
    {"","","tims_data","publisher",0,1,"CHAR",31,31,
        -1,10,2,NULL,"",0,0,31,4,"","tims_data"},
    {"","","tims_data","pub_date",0,1,"CHAR",11,11,
        -1,10,2,NULL,"",0,0,11,5,"","tims_data"},
    {"","","tims_data","info_type",0,5,"SMALLINT",5,2,
        0,10,2,NULL,"",0,0,0,6,"","tims_data"},
};
#define MAX_TIMSDATA_COLUMNS (sizeof(timsDataCD)/sizeof(UDPPROCCOLDATA))
...
static int16 REXTERNAL timsColData(
    const VALUE    *args,      /* in:  array of arguments */
    UDPPROCCOLDATA *pColDescr, /* out: procedure column data pointer */
    int16           colno,     /* in:  column number */
    RM_MEMTAG       mTag,      /* in:  memory tag for rm_ memory calls */
    VALUE          *err)       /* out: container for error messages */
{
    const char *procname = args[2].vt.cv;

    UNREF_PARM(mTag)
    UNREF_PARM(err)

    if (stricmp(procname, "tims_data") != 0)
        return SQL_NO_DATA;
    if ((uint16) colno >= MAX_TIMSDATA_COLUMNS)
        return SQL_NO_DATA;
    memcpy(pColDescr, &timsDataCD[colno], sizeof(UDPPROCCOLDATA));
    return SQL_SUCCESS;
}

Function udpCleanup

The udpCleanup function is called by SQL when UDP execution is completed (e.g., when SQL_NO_DATA is returned by udpExecute or udpMoreResults). It is used to free memory allocated in udpInit and/or udpExecute, close database
connections, and drop temporary tables. NEVER call rm_freeTagMemory within udpCleanup using the memory tag passed into the function. Instead, free any UDP-allocated memory using rm_freeMemory.

The following example shows the timsCleanup function from the sample tims_data UDP. This function first closes the active statement handle, then executes a drop table statement to drop the timstab temporary table. Afterwards, timsCleanup frees the statement handle, then closes and frees the connection and the SQL environment handle. Finally, the function closes the tims database and frees the context memory using rm_freeMemory.

void REXTERNAL timsCleanup(
    void    **ctxp, /* in: statement udp_ctx pointer */
    RM_MEMTAG mTag) /* in: memory tag for rm_ memory calls */
{
    UDP_CTX *ctx = *ctxp;

    SQLCloseCursor(ctx->hStmt);
    SQLExecDirect(ctx->hStmt, "drop table timstab", SQL_NTS);
    SQLFreeHandle(SQL_HANDLE_STMT, ctx->hStmt);
    SQLDisconnect(ctx->hDbc);
    SQLFreeHandle(SQL_HANDLE_DBC, ctx->hDbc);
    SQLFreeHandle(SQL_HANDLE_ENV, ctx->hEnv);
    d_rdlockmodes(0, 1, ctx->hSess);
    d_close(ctx->hDb);
    rm_freeMemory(ctx, mTag);
}

Function ModCleanup

If you include ModCleanup in the UDP module, the server calls it when unloading the module. The following example shows the ModCleanup function for the sample UDP.

int16 REXTERNAL ModCleanup(
    HMOD hMod) /* in: module handle, used by SQLTransactTrigger() */
{
    ghMod = NULL;
    return S_OKAY;
}

14.2.2 Calling a UDP

Once you have coded and successfully compiled the UDP module, it must be registered with the system so that SQL knows where to find it. The create procedure statement defined at the start of this section is used to do this. The following is an example of this statement issued for tims_data.

create procedure tims_data in "udp" on sqlsamp;

Once the module is registered, the application can call the UDP just like an SQL-based stored procedure.
The following example shows the call statement that executes the tims_data UDP. Each parameter specified for tims_data is the name (or partial name) of an author for whom the UDP is retrieving applicable publications contained in the tims database. Note that this output shows the results of both the udpExecute function and the udpMoreResults function.

14.3 Login or Logout UDP Example

Login and logout procedures are stored procedures (or UDPs) that are invoked when all or certain users log in to or log out from the server. These procedures do not have parameters nor retrieve result sets. An administrator initially creates and activates these procedures, but they are automatically invoked by other users when they log in or log out. Login and logout procedures can be used for setting user environment values (for example, display formats) or for performing specialized security functions.

Two UDPs are included in the sample UDP module (udp.c) provided with the system. The UDP named log_login is a login procedure and the one named log_logout is, you guessed it, a logout procedure. As login and logout procedures do not have arguments nor retrieve results, the udpCheck and udpMoreResults function entries in the UDPLOADTABLE are NULL. These UDPs both use logInit (type UDPINIT) as their initialization function, as shown in the following code. The logInit function allocates the memory for the UDP context (LOG_CTX), attaches to the client connection, and allocates and initializes the needed SQL handles. It also compiles an insert statement (LogInsert) used to store the record of a login or logout operation. This insert stores a new row in the table called activity_log, which must have been previously created or else SQLPrepare will fail. If this happens, logInit will create the activity_log database and table and call SQLPrepare again. The UDP context includes fields to contain the action and label strings.
These are associated (by SQLBindParameter) with the two parameter markers contained in LogInsert. The declarations of the user_name and stamp columns in activity_log specify default values that automatically store the user name and the current timestamp values (the current date and time) in the table.

...
typedef struct log_ctx {
    HENV     hEnv;
    HDBC     hDbc;
    HSTMT    hStmt;
    RDM_SESS hSess;
    char     action[24];
    char     label[33];
} LOG_CTX;
...
static const char LogCreateDb[] = "create database activity_log";
static char LogCreate[] =
    "create table activity_log("
        "action char(23),"
        "label char(32) default null,"
        "user_name char(32) default user,"
        "stamp timestamp default now)";
static char LogGrant[] = "grant insert on activity_log to public";
static char LogInsert[] =
    "insert into activity_log(action, label) values(?, ?)";
...
/* ======================================================================
    Initialization function for logging functions
*/
int16 REXTERNAL logInit(
    void     **ctxp,   /* in:  proc context pointer */
    int16      noargs, /* in:  number of arguments passed */
    VALUE     *args,   /* in:  arguments, args[noargs-1] */
    RDM_SESS   hSess,  /* in:  current session id */
    RM_MEMTAG  mTag,   /* in:  memory tag for rm_ memory calls */
    VALUE     *err)    /* out: container for error messages */
{
    int16    stat;
    HSTMT    hstmt;
    LOG_CTX *ctx = rm_getMemory(sizeof(LOG_CTX), mTag);

    ctx->hSess = hSess;
    SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &ctx->hEnv);
    SQLAllocHandle(SQL_HANDLE_DBC, ctx->hEnv, &ctx->hDbc);
    SQLConnectWith(ctx->hDbc, hSess);
    SQLAllocHandle(SQL_HANDLE_STMT, ctx->hDbc, &hstmt);
    if ((stat = SQLPrepare(hstmt, LogInsert, SQL_NTS)) != SQL_SUCCESS) {
        /* activity_log table has not been created - create it */
        SQLExecDirect(hstmt, LogCreateDb, SQL_NTS);
        SQLExecDirect(hstmt, LogCreate, SQL_NTS);
        SQLExecDirect(hstmt, "commit", SQL_NTS);
        SQLExecDirect(hstmt, LogGrant, SQL_NTS);
        SQLPrepare(hstmt, LogInsert, SQL_NTS);
    }
    SQLBindParameter(hstmt, 1, SQL_PARAM_INPUT, SQL_C_CHAR, SQL_CHAR,
            10, 0, ctx->action, 0, NULL);
    SQLBindParameter(hstmt, 2, SQL_PARAM_INPUT, SQL_C_CHAR, SQL_CHAR,
            20, 0, ctx->label, 0, NULL);
    ctx->hStmt = hstmt;
    *ctxp = ctx;
    return SQL_SUCCESS;
}

The udpExecute function for the log_login UDP, called logLogin, is given below. All that logLogin must do is copy "login" to the action string and place the session ID number in the label string. The procedure then calls SQLExecute to insert the row into activity_log. After this, a call is made to SQLEndTran to commit the new row to the database. The logLogout function for the log_logout procedure is identical to logLogin except that it copies "logout" (instead of "login") to the action string.
int16 REXTERNAL logLogin(
    void     **ctxp,   /* in:  proc context pointer */
    int16      noargs, /* in:  number of arguments to procedure */
    VALUE     *args,   /* in:  array of arguments */
    RM_MEMTAG  mTag,   /* in:  memory tag for rm_ memory calls */
    HSTMT     *phStmt, /* out: hstmt for result set */
    VALUE     *err)    /* out: container for error messages */
{
    LOG_CTX *ctx = *ctxp;
    LOG_CTX *ptr;

    /* record the login */
    strcpy(ctx->action, "login");
    sprintf(ctx->label, "session %d", ctx->hSess);
    SQLExecute(ctx->hStmt);
    SQLEndTran(SQL_HANDLE_DBC, ctx->hDbc, SQL_COMMIT);
    return SQL_SUCCESS;
}

Your login or logout procedure can optionally contain a cleanup function. The sample log_login and log_logout UDPs include the logCleanup function shown below. This function performs cleanup by freeing and disconnecting the appropriate SQL handles and then freeing the context memory.

void REXTERNAL logCleanup(
    void    **ctxp, /* in: statement udp_ctx pointer */
    RM_MEMTAG mTag) /* in: memory tag for rm_ memory calls */
{
    LOG_CTX *ctx = *ctxp;

    SQLFreeHandle(SQL_HANDLE_STMT, ctx->hStmt);
    SQLDisconnect(ctx->hDbc);
    SQLFreeHandle(SQL_HANDLE_DBC, ctx->hDbc);
    SQLFreeHandle(SQL_HANDLE_ENV, ctx->hEnv);
    rm_freeMemory(ctx, mTag);
}

Only an administrative user can set up a login or logout procedure for use. The first step is to issue a create procedure statement similar to the following.

create procedure log_login in udp on rdsdll;
create procedure log_logout in udp on rdsdll;

After the login or logout procedure is set up, the administrator must assign the procedure by using a set login proc statement, as shown below. The for clause is used to set up the login or logout procedure for use by all users or only for the list of specified user identifiers. Recall that user names (ids) are case-sensitive so that "Sam" and "sam" are considered to be different users.
set login proc for public to log_login;
set logout proc for public to log_logout;

With the login procedure (and logout procedure, if applicable) assigned, the administrator can turn the registered login or logout procedures on or off as needed. To do this, the administrator issues a set login statement with the on or off clause. The effect is system-wide and persists until the next set login statement is issued by any administrative user. Use of login procedures is initially turned off.

set login on

Now that the sample login and logout procedures are enabled, every login and logout by any user causes a row containing the action, user ID, time, etc., to be inserted into the activity_log table. After a few logins and logouts, the administrator can view the table by issuing the following select statement.

select * from activity_log;

To disable calls to a login or logout procedure, the administrator can issue another set login proc statement. This time, the administrator specifies null instead of the procedure name in the to clause.

set login proc for public to null;

14.4 Transaction Triggers

A transaction trigger is a server-side function, residing in a UDP module, that RDM Server calls whenever a transaction operation occurs (commit, rollback, savepoint, or rollback to a savepoint). The trigger can be registered to be called either for the next transaction operation only or for every transaction operation. The function is activated from within a UDP by a call to the SQL API function SQLTransactTrigger.

14.4.1 Transaction Trigger Registration

Transaction triggers are registered through a call to SQLTransactTrigger, usually issued from a login UDP. This function is only available on the server; it cannot be called from the client side. Thus, it must be called from a UDP (C-based procedure). The declaration for SQLTransactTrigger is shown below.
int16 REXTERNAL SQLTransactTrigger(
    HMOD              hMod,
    RDM_SESS          hSess,
    const char       *name,
    PTRANSACTTRIGGER  Trigger,
    void             *ptr,
    int16             mode);

The SQLTransactTrigger function arguments are described in the table below.

Argument   Description
hMod       The handle that uniquely identifies the UDP module. The hMod value is originally passed into ModInit in the UDP module when the module is first loaded. If you are using transaction triggers in your UDP, you must define the ModInit function so you can get the module handle and save it in a global variable for later use in SQLTransactTrigger.
hSess      The session handle of the user activating the transaction trigger. This handle is an argument to the udpInit function, which can either itself call SQLTransactTrigger or save the session handle in the UDP context so that the udpExecute function can call SQLTransactTrigger (see examples in this section).
name       A unique name to be associated with this transaction trigger.
Trigger    A pointer to the transaction trigger implementation function to be activated.
ptr        A pointer to any context memory needed by the transaction trigger. Note that this memory needs to survive as long as the connection (session) is open.
mode       Informs the SQL system as to how often the transaction trigger is to fire on transactions issued by the connection associated with hSess. Set to SYS_COMMIT_EVERY to indicate that the trigger is to fire after every transaction. Set to SYS_COMMIT_ONCE to indicate that the trigger is only to fire on the next transaction operation, after which the transaction trigger is deactivated and a subsequent SQLTransactTrigger call is required in order to reactivate the trigger.

Once the transaction trigger is registered, RDM Server "fires" (calls) the function any time the related connection executes a transaction commit, savepoint, or rollback.
This can happen through a call to either SQLEndTran or SQLExtendedTransact, through a commit, mark, or rollback statement, or (if the session is in auto-commit mode) when RDM Server automatically issues a transaction commit following an insert, update, or delete.

The code below shows a version of the logLogin function that implements a transaction trigger. After recording the login in the activity_log table, the function allocates a context of type LOG_CTX, calls SQLConnectWith to associate the invoking client's session handle with an SQL connection, allocates a statement handle, and compiles the LogInsert statement. It then calls SQLBindParameter to associate the action and label variables in the log context with LogInsert's parameter markers. Finally, logLogin calls SQLTransactTrigger, passing the global module handle set in the original call to ModInit. The call to SQLTransactTrigger also passes the session handle stored in the log_login context by logInit and names the transaction trigger activity_log. Additionally, the call provides the address of the transaction trigger function named TransactTrigger, a pointer to the context for the transaction trigger function, and a constant indicating that the transaction trigger function is to be called on every commit, savepoint, or rollback operation.

typedef struct log_action {
    int16              type;
    char              *label;
    struct log_action *next;
    struct log_action *prev;
} LOG_ACTION;

typedef struct log_ctx {
    HENV        hEnv;
    HDBC        hDbc;
    HSTMT       hStmt;
    RDM_SESS    hSess;
    RM_MEMTAG   mTag;
    char        action[16];
    char        label[21];
    LOG_ACTION *act_list;
} LOG_CTX;
...
/* ======================================================================
    Main for login procedure
*/
int16 REXTERNAL logLogin(
    void     **ctxp,   /* in:  proc context pointer */
    int16      noargs, /* in:  number of arguments to procedure */
    VALUE     *args,   /* in:  array of arguments */
    RM_MEMTAG  mTag,   /* in:  memory tag for rm_ memory calls */
    HSTMT     *phStmt, /* out: hstmt for result set */
    VALUE     *err)    /* out: container for error messages */
{
    LOG_CTX *ctx = *ctxp;
    LOG_CTX *ptr;

    /* record the login */
    strcpy(ctx->action, "login");
    sprintf(ctx->label, "session %d", ctx->hSess);
    SQLExecute(ctx->hStmt);
    SQLEndTran(SQL_HANDLE_DBC, ctx->hDbc, SQL_COMMIT);

    /* set the transaction trigger for this connection */
    /* allocate trigger's log context on global tag */
    ptr = rm_getMemory(sizeof(LOG_CTX), 0);
    SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &ptr->hEnv);
    SQLAllocHandle(SQL_HANDLE_DBC, ptr->hEnv, &ptr->hDbc);
    SQLConnectWith(ptr->hDbc, ctx->hSess);
    SQLAllocHandle(SQL_HANDLE_STMT, ptr->hDbc, &ptr->hStmt);
    SQLPrepare(ptr->hStmt, LogInsert, SQL_NTS);
    SQLBindParameter(ptr->hStmt, 1, SQL_PARAM_INPUT, SQL_C_CHAR, SQL_CHAR,
            10, 0, ptr->action, 0, NULL);
    SQLBindParameter(ptr->hStmt, 2, SQL_PARAM_INPUT, SQL_C_CHAR, SQL_CHAR,
            20, 0, ptr->label, 0, NULL);
    ptr->act_list = NULL;
    ptr->mTag = rm_createTag(NULL, 0, NULL, NULL, 0, RM_NOSEM);
    SQLTransactTrigger(ghMod, ctx->hSess, "activity_log", TransactTrigger,
            ptr, SYS_COMMIT_EVERY);
    return SQL_SUCCESS;
}

14.4.2 Transaction Trigger Implementation

Once SQLTransactTrigger has been called, the transaction trigger function will be called by RDM Server whenever a transaction operation occurs. The transaction trigger function is a function of type TRANSACTTRIGGER, with a prototype as shown below. Note that you can name this function whatever you want, as it is only identified from the function pointer passed in to the SQLTransactTrigger call that activated it.
void REXTERNAL TransactTrigger(
    int16  type,
    char  *label,
    char  *name,
    void  *ptr);

Each of the arguments is described in Table 13-9 below.

Table 13-9. Transaction Trigger Implementation Function Argument Descriptions

Argument   Description
type       See Table 13-10.
label      The transaction identifier specified with the savepoint or rollback.
name       The name of the transaction trigger being fired. Allows more than one transaction trigger to share the same implementation function.
ptr        A pointer to the transaction trigger context data passed in to the call to SQLTransactTrigger which activated this trigger.

The table below describes each of the possible values for the type function argument.

Table 13-10. Transaction Type Descriptions

Type           Description
SYS_SAVEPOINT  Triggered by a "savepoint label" operation.
SYS_ROLLBACK   Triggered by a "rollback [to savepoint label]" operation. The label argument will be an empty string ("") to indicate that the entire transaction is rolled back.
SYS_COMMIT     Triggered by a "commit" operation.
SYS_REMOVE     The trigger is being deleted. This happens because either the trigger was registered to fire only once and has already fired, or it was registered to fire for every transaction and the application is disconnecting from the server. This value indicates that the trigger should perform any necessary cleanup (e.g., freeing the allocated memory pointed to by the ptr argument).

The transaction trigger is always called AFTER the transaction operation has completed. Hence, if type is SYS_COMMIT, the TransactTrigger function is called after the commit has completed writing its changes to the database. The following example implements a transaction trigger (function TransactTrigger in udp.c) that is used in conjunction with login procedures and that records a log of every transaction, including the user id of the user issuing the transaction and the timestamp when it occurred.
The log is maintained in the activity_log database/table, which is created by the logInit function for the log_login UDP described earlier in section 14.3. TransactTrigger's operation depends on the value passed in through argument type. If type is SYS_REMOVE, then the transaction trigger is being deleted and the trigger function needs to clean up after itself by freeing its previously allocated handles and memory. Note that the call to SQLDisconnect breaks the association between the connection handle (ctx->hDbc) and the client session handle that was established by the call to SQLConnectWith issued by function logInit.

If type is SYS_SAVEPOINT, or SYS_ROLLBACK to a previous savepoint (indicated by its presence in the list of previously issued savepoint labels stored in the ctx->act_list linked list), the transaction action is saved in the activity list to be written to the activity_log table later, after the application's transaction has been committed (or rolled back). This is necessary in order to ensure that the rows inserted into the activity_log table are not included in the application's transaction, as they would get rolled back with an application's rollback, leaving no log of the transaction operation. The IsASavepoint function determines which type of rollback is in effect. When saving savepoint actions, the savepoint label is stored in the LOG_ACTION entry. A SYS_ROLLBACK that has a label equal to one of the previously saved savepoint labels indicates that a rollback to savepoint operation triggered this call to TransactTrigger, in which case IsASavepoint will return true. (Note that in RDM Server SQL the non-standard begin transaction statement can specify a transaction id which can be specified in a subsequent commit or rollback, in which case the label passed to TransactTrigger does not correspond to a savepoint.)
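The savepoint bookkeeping just described reduces to a prepend-only linked list that is searched by label. The following standalone sketch mirrors that logic with hypothetical names (fixed-size labels and plain malloc instead of LOG_ACTION and the rm_ memory routines); it illustrates the technique only and is not RDM Server code.

```c
#include <stdlib.h>
#include <string.h>

typedef struct act {
    char        label[32];  /* savepoint label (rm_Strdup'd in the real code) */
    struct act *next;
} Act;

/* prepend a savepoint label, newest first, as TransactTrigger does */
Act *act_push(Act *head, const char *label)
{
    Act *a = malloc(sizeof(Act));
    strncpy(a->label, label, sizeof(a->label) - 1);
    a->label[sizeof(a->label) - 1] = '\0';
    a->next = head;
    return a;
}

/* 1 if label names a previously issued savepoint, else 0 (IsASavepoint logic) */
int is_a_savepoint(const Act *head, const char *label)
{
    for (; head; head = head->next)
        if (strcmp(head->label, label) == 0)
            return 1;
    return 0;
}
```

A rollback whose label is found by the search is treated as a rollback to savepoint; any other label (for example, a transaction id from a non-standard begin transaction) falls through to the full-rollback path.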
When the type is SYS_COMMIT or SYS_ROLLBACK (not associated with a prior savepoint), a row is inserted into the activity_log table for each previously saved action as well as for the final commit or rollback itself. As TransactTrigger is called after the commit or rollback has completed, the rows inserted into the activity_log table are written in an independent transaction, which is committed by the call to SQLEndTran at the end of the function.

/* ======================================================================
    Function to find savepoint label in the list
*/
static int16 IsASavepoint(
    LOG_ACTION *lap,
    char       *label)
{
    LOG_ACTION *lgp;

    for (lgp = lap; lgp; lgp = lgp->next) {
        if (strcmp(lgp->label, label) == 0)
            return 1;
    }
    return 0;
}

/* ======================================================================
    Transact Trigger
*/
static void REXTERNAL TransactTrigger(
    int16       type,
    const char *label,
    const char *name,
    void       *ptr)
{
    LOG_CTX          *ctx = (LOG_CTX *)ptr;
    LOG_ACTION       *lap = NULL;
    const LOG_ACTION *lgp;

    UNREF_PARM(name);

    if (type == SYS_REMOVE) {
        rm_freeTagMemory(ctx->mTag, 1);
        ctx->mTag = NULL;
        SQLFreeHandle(SQL_HANDLE_STMT, ctx->hStmt);
        SQLDisconnect(ctx->hDbc);
        SQLFreeHandle(SQL_HANDLE_DBC, ctx->hDbc);
        SQLFreeHandle(SQL_HANDLE_ENV, ctx->hEnv);
        rm_freeMemory(ctx, TAG0);
    }
    else {
        if (type == SYS_SAVEPOINT ||
            (type == SYS_ROLLBACK && IsASavepoint(ctx->act_list, label))) {
            /* save the event info, to be committed later to the log */
            lap = (LOG_ACTION *)rm_getMemory(sizeof(LOG_ACTION), ctx->mTag);
            if (lap != NULL) {
                lap->type = type;
                lap->next = ctx->act_list;
                lap->prev = NULL;
                if (ctx->act_list)
                    ctx->act_list->prev = lap;
                lap->label = (label && *label) ?
                    rm_Strdup(label, ctx->mTag) : NULL;
                ctx->act_list = lap;
            }
        }
        else if (type == SYS_COMMIT || type == SYS_ROLLBACK) {
            /* process stored actions (if any) */
            if (ctx->act_list) {
                /* find first event (it's last in the list) */
                for (lap = ctx->act_list; lap->next; lap = lap->next)
                    ;
                for (lgp = lap; lgp; lgp = lgp->prev) {
                    switch (lgp->type) {
                    case SYS_SAVEPOINT:
                        strcpy(ctx->action, "savepoint");
                        break;
                    case SYS_ROLLBACK:
                        strcpy(ctx->action, "rollback to savepoint");
                        break;
                    default:
                        break;
                    }
                    strcpy(ctx->label, lgp->label);
                    SQLExecute(ctx->hStmt);
                }
                ctx->act_list = NULL;
                rm_freeTagMemory(ctx->mTag, 0);
            }
            switch (type) {
            case SYS_COMMIT:
                strcpy(ctx->action, "commit");
                break;
            case SYS_ROLLBACK:
                strcpy(ctx->action, "rollback");
                break;
            default:
                break;
            }
            strcpy(ctx->label, label);
            SQLExecute(ctx->hStmt);
            SQLEndTran(SQL_HANDLE_DBC, ctx->hDbc, SQL_COMMIT);
        }
    }
}

15. Query Optimization

The RDM Server SQL query optimizer is designed to generate efficient execution plans for database queries. The typical kinds of queries used in an embedded database application environment include standard joins, index lookup optimizations, and grouping and sorting. Some of the more sophisticated optimization techniques (for example, support of on-line analytical processing) are not included in the RDM Server optimizer. However, RDM Server does provide various access methods (which support very fast access to individual rows) with the direct access capabilities of predefined joins and rowid primary keys.

Overview of the Query Optimization Process

In SQL, queries are specified using the select statement, and many methods (or query execution plans) exist for processing a query. The goal of the optimizer is to discover, among many possible options, which plan will execute in the shortest amount of time.
The only way to guarantee that a specific plan is optimal is to execute every possibility and select the fastest one. As this defeats the purpose of optimization, other methods must be devised. The query optimizer must resolve two interrelated issues: how it will access each table referenced in the query, and in what order. To access requested rows in a table, the optimizer can choose from a variety of access methods, such as indexes or predefined joins. It determines the best execution plan by estimating the cost associated with each access method and by factoring in the constraints on these methods imposed by each possible access ordering. Note that the decisions made by the optimizer are independent of the listed order of the tables in the from clause or the location of the expressions in the where clause.

Consider the following example query from the sales database.

select company, ord_num, ord_date, amount
  from customer, sales_order
 where customer.cust_id = sales_order.cust_id
   and state = "CO" and ord_date = date "1993-04-01";

Two tables will be accessed: customer and sales_order. The first relational expression in the where clause specifies the join predicate, which relates the two tables based on their declared foreign and primary keys. The DDL for the sales database (file sales.sql) contains a create join called purchases on the sales_order foreign key, providing bidirectional direct access between the two tables. Note that the state column in the customer table is also the first column in the cust_geo index, and the ord_date column in the sales_order table is the first column in the order_ndx index. Thus the optimizer has choices of which index to use. All possible execution plans considered by the RDM Server query optimizer for this query are listed in the following table.

Table 15-1. Possible Execution Plans for Example Query
1. Scan customer table (that is, read all rows) to locate rows where state = "CO", then for each matching customer row, scan sales_order table to locate rows that match customer's cust_id and have ord_date = 1993-04-01.

2. Scan customer table to locate rows where state = "CO", then for each customer row, read each sales_order row that is connected through the purchases join, and return only those that have ord_date = 1993-04-01.

3. Use the cust_geo index to find the customer rows where state = "CO", then for each customer row, scan sales_order table to locate rows that match customer's cust_id and have ord_date = 1993-04-01.

4. Use the cust_geo index to find the customer rows where state = "CO", then for each customer row, read each sales_order row that is connected through the purchases join, and return only those that have ord_date = 1993-04-01.

5. Scan sales_order table to locate rows where ord_date = 1993-04-01, then for each sales_order row, scan customer table to locate rows that match sales_order's cust_id and have state = "CO".

6. Scan sales_order table to locate rows where ord_date = 1993-04-01, then for each sales_order row, read the customer row that is connected through the purchases join, and return only those that have state = "CO".

7. Use the order_ndx index to find the sales_order rows where ord_date = 1993-04-01, then for each sales_order row, scan customer table to locate rows that match sales_order's cust_id and have state = "CO".

8. Use the order_ndx index to find the sales_order rows where ord_date = 1993-04-01, then for each sales_order row, read the customer row that is connected through the purchases join, and return only those that have state = "CO".

Because the time (based on the number of disk accesses) required to scan an entire table is generally much greater than the time needed to locate a row through an index, plans 4 and 8 seem the best.
However, it is unclear which of the two plans is optimal. In fact, both are probably good enough to obtain acceptable performance. Additional information that helps make the best choice includes the number of rows in each table (28 customers, 127 sales_orders), the number of customers from Colorado (1), and the number of orders for April 1, 1993 (5). With this data we can deduce that plan 4 is better than plan 8. Plan 4 requires one index lookup to find the one customer from Colorado (about 3 reads) plus the average cost to read through an instance of the purchases set to retrieve and check the dates of the related sales_order records (average number of orders per customer = 127/28 = about 4). Thus, plan 4 uses about 7 disk accesses. Plan 8 will use the order_ndx index to find the 5 sales_order rows dated 1993-04-01 (about 8 reads) plus one additional read per row to fetch and check the related customer record through the purchases set (5 reads). Hence, plan 8 uses about 13 disk accesses.

Note that plans 1 and 5 perform what is called a Cartesian or cross-product: for each row of the first table accessed, all rows of the second table are retrieved. (Thus if the first table contained 500 rows and the second table contained 1000 rows, the query would read a total of 500,000 rows.) Cross-products are extremely inefficient and will never be considered by the optimizer except when a necessary join predicate has been omitted from the query. In our example, this would occur if the relational expression "customer.cust_id = sales_order.cust_id" were not specified. Necessary join predicates are often erroneously omitted when four or more tables are listed in the from clause and/or when multi-column join predicates (for compound foreign and primary keys) are required. The following diagram shows the basic operational phases of the query optimization process, as illustrated by the previous example.

Figure 15-1. Query Optimization Process

Using the information in the system catalog, the select statement is parsed, validated, and represented in a set of easily processed query description tables. These tables include a tree representation of the where clause expressions (called the expression tree) and information about the tables, columns, keys, indexes, and joins in the database. The system then analyzes those tables and constructs both the access rule table and the expression table. For the referenced tables, the analysis process uses the system catalog and the distribution statistics (collected by the update statistics statement). The access rule table contains a rule entry for each possible access method (for example, table scan or index lookup) for each table referenced in the from clause. The expression table has one entry for each conditional expression specified in the where clause. These tables drive the actual optimization process.

Finally, the optimizer determines the plan with the lowest total cost. An execution plan consists of a series of steps (one step for each table listed in the from clause), each specifying how the table at that plan step will be accessed. The possible access rules that can be applied at each step are sorted by their cost so that the first candidate rule is the cheapest. The optimizer's goal is to select one access rule for each step that minimizes the total cost of the complete execution plan. As the optimizer iterates through the steps, the cost of the candidate plan is updated. As soon as a candidate plan's cost exceeds the cost of the currently best complete plan, the candidate plan is abandoned at its current step and the next rule for that step is tested. Conditional expressions that are incorporated into the plan are deleted from the expression tree so that they are not redundantly executed.
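The pruning search described above can be sketched in a few lines. This is not RDM Server's actual code: the rule names and costs below are illustrative (loosely matching plan 4 of the example), and for brevity the sketch fixes the access order rather than permuting it as the real optimizer does.

```python
# Sketch of the pruning search described above: one cost-sorted rule list per
# plan step; a candidate plan is abandoned as soon as its running cost
# reaches the cost of the best complete plan found so far.

def cheapest_plan(rules_per_step):
    """rules_per_step: one (rule_name, cost) list per from-clause table,
    each pre-sorted by ascending cost."""
    best_cost = float("inf")
    best_plan = None

    def search(step, cost_so_far, chosen):
        nonlocal best_cost, best_plan
        if step == len(rules_per_step):
            best_cost, best_plan = cost_so_far, list(chosen)
            return
        for name, cost in rules_per_step[step]:
            if cost_so_far + cost >= best_cost:
                break          # rules are sorted: no cheaper rule follows
            chosen.append(name)
            search(step + 1, cost_so_far + cost, chosen)
            chosen.pop()

    search(0, 0.0, [])
    return best_plan, best_cost

# Illustrative costs for the two-table example query:
plan, cost = cheapest_plan([
    [("cust_geo index", 3), ("customer scan", 10)],      # step 1: customer
    [("purchases join", 4), ("sales_order scan", 145)],  # step 2: sales_order
])
# plan == ["cust_geo index", "purchases join"], cost == 7 (plan 4 above)
```

Note how the sorted rule lists let the search break out of a step entirely once the first too-expensive rule is seen, which is the same pruning the optimizer applies.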
Cost-Based Optimization

The cost to determine the execution plan is the time it takes the optimizer to find the "optimal" plan. An execution plan consists of n steps, where n is the number of tables listed in the from clause. Each step of the plan specifies the table to be accessed and the method to be used to access a row from that table. The cost increases factorially with the number of tables listed in the from clause (n!). The performance impact is noticeable in RDM Server for queries that reference more than about 8 tables. This is due to the increasing number of combinations of access orderings that must be considered (2 tables have 2 possible orderings, 3 have 6, 4 have 24, etc.). The cost to estimate each candidate plan also includes a factor linear in the number of access methods available at each step of the plan from which the optimizer must choose. More access methods mean the optimizer must do more work, but the odds of finding a good plan improve.

The cost to carry out an execution plan is the amount of I/O time required to read the database information from disk. In RDM Server, an estimate of this cost is based on an estimate of the total number of logical I/O accesses that will occur during execution. Because it is extremely difficult to accurately estimate the effects of caching performance and diverse database page sizes, physical I/O estimates are not possible. The logical I/O estimates are based on analysis of the logical I/O time required to access a record occurrence for each access method.

A heuristic optimizer selects an execution plan by using built-in heuristics, or general rules, about which particular access method will return the fewest rows. For example, heuristic optimizers automatically assume that a "col = value" condition will restrict the result set to fewer rows than would a "col > value" condition (which is assumed to restrict, on average, only half the rows).
In a case where 100 rows contain "value" and zero rows contain a value greater than "value", this assumption breaks down and the choice would not be optimal. A cost-based optimizer maintains data distribution statistics that it uses to more quantitatively determine the better of two access methods. A histogram is maintained of the distribution of the most commonly occurring data in each column. The percentage of the file that is covered by a given inequality expression involving an indexed column (for example, "where ord_date between date '1993-02-01' and date '1993-02-28'") is interpolated from the histogram, providing a more accurate assessment of the better alternative than a built-in heuristic.

The statistics maintained for use by cost-based optimizers are used to: 1) guide the choice between alternative access methods derived from the relational expressions specified in the where clause, 2) estimate the number of output rows that result from each plan step, and 3) estimate the number of logical I/Os incurred by each possible access method. The statistics used by the RDM Server cost-based optimizer include:

- Number of pages in a file
- Number of rows per page in a file
- Number of rows in a table
- Depth of an index's B-tree
- Number of keys per page in an index
- Frequency distribution histogram of the most commonly occurring values for each column

Update Statistics

The statistics are collected and stored in the SQL system catalog by executing the update statistics statement. The histogram for each column is collected from a sampling of the data files. The other statistics are maintained by the RDM Server runtime system. The histogram for each column contains a sampling of values (25 by default, controlled by the OptHistoSize configuration parameter) and a count of the number of times each value was found in the sampled number of rows (1000 by default, controlled by the OptSampleSize configuration parameter).
The sampled values are taken from rows evenly distributed throughout the table. When update statistics has not been performed on a database, RDM Server SQL uses default values that assume each table contains 1000 rows. It is highly recommended that you always execute update statistics on every production database. The execution time for an update statistics statement is not excessive in RDM Server and does not vary significantly with the size of the database. Therefore, we suggest regular executions (possibly once per week or month, or following significant changes to the database).

Restriction Factors

The histogram values are used to compute a restriction factor associated with a specified conditional expression. The restriction factor estimates the percentage of rows from the table that satisfy the conditional. For example, the restriction factor for a between conditional is equal to the frequency count total of the histogram values that satisfy the conditional, divided by the total of all sampled histogram values. When an equality comparison value is not found in the histogram, the restriction factor is based on the average frequency count of the five histogram values with the lowest frequency counts. The restriction factor for join predicates is based on the average frequency count of all histogram values for the foreign/primary key column (this results in an estimate of the average number of duplicates per value). The histogram for the prod_id column of the item table is shown below.

Table 15-2. Histogram for ITEM.PROD_ID

Entry #   PROD_ID Value   # of Occurrences
0         10320            4
1         10433           12
2         11333            8
3         11433           14
4         12325           10
5         13032            9
6         14020            6
7         15200            5
8         16300            3
9         16301           11
10        16311           15
11        17110            2
12        17214           23
13        17419           12
14        19100            4
15        19400            5
16        20200            6
17        20308            4
18        20400            9
19        21200            6
20        21500            3
21        23100           11
22        23200           24
23        23400           26
24        24200           17

The item table has 461 rows.
All rows were sampled by update statistics. The table shows the histogram counts for the first 25 distinct values sampled from the item table. A total of 249 rows from the item table contained one of those prod_id values. These values are used by the optimizer to compute restriction factors for prod_id comparisons specified in a where clause. The following table gives the restriction factor for some example expressions.

Table 15-3. Example Restriction Factors

Conditional Expression                  Restriction Factor   Cardinality Estimate   Actual Count
prod_id = 16311                         0.032538             15                     15
prod_id >= 21200                        0.349398             161                    143
prod_id between 11433 and 20200         0.502008             231                    246
prod_id = 10450                         0.006941             3                      7
prod_id in (15200,20200,21200,24200)    0.073753             34                     34

The restriction factor multiplied by the cardinality of the table (461) gives the cardinality estimate of the conditional expression (that is, an estimate of the number of rows from the table that satisfy the conditional). The count of the actual matching rows is also listed in the table. The accuracy of the estimates is very good, but that is primarily because all table rows were sampled.

The restriction factor for the prod_id = 16311 conditional is computed from the histogram count for entry 10 of Table 15-2, divided by the total number of sampled rows (461). Thus, 15/461 = 0.032538. Note that the cardinality estimate equals the histogram count value because all of the rows in the table were sampled, which will only be true for small tables. If 1000 rows had been sampled from a 50,000 row table, the restriction factor would have been 0.015 and the cardinality estimate 0.015*50,000 = 750 rows.

The restriction factor for inequality conditionals is estimated as the percentage of the histogram table that matches the conditional expression. Thus, the restriction factor for prod_id >= 21200 is equal to the sum of the histogram counts for the prod_id entries >= 21200, divided by the sum of all counts (249), or 87/249 = 0.349398.
Applying the same method to prod_id between 11433 and 20200 gives us 125/249 = 0.502008.

For equality comparisons with values not in the table (or when the comparison is against a parameter marker or stored procedure argument), an estimate of the average number of duplicates per value is computed. The estimate is equal to the average of the counts for the 5 least frequently occurring values in the histogram. Thus, the restriction factor for prod_id = 10450 is estimated as ((2+3+3+4+4)/5)/461 = 0.006941 (or an average of about 3 rows per value).

The restriction factor computation for the in conditional is simply the sum of the equality comparisons for each of the listed values. The restriction factor for prod_id in (15200,20200,21200,24200) is (5+6+6+17)/461 = 0.073753.

Table Access Methods

RDM Server provides a variety of methods for retrieving the rows in a table. Each of these access methods is described below, including how the cost is estimated for each method. The cost estimates use the above statistics as represented by the following values.

Table 15-4. Cost Estimate Value Definitions

Value   Definition
P       The number of pages in the file in which the table's rows are stored.
D       The depth of the B-tree index.
C       The cardinality of the table being accessed (that is, the number of rows in the table).
Cf      The cardinality of the table containing the referenced foreign key.
Cp      The cardinality of the table containing the referenced primary key.
K       The maximum number of key values per index page.
R       The restriction factor, an estimate (between 0 and 1) of the percentage of the rows of the table that satisfy the conditional expression. The restriction factor is determined from the frequency distribution histogram and the constant values specified in the conditional expression.

Database I/O in RDM Server is performed by reading data and index file pages.
A data file page contains at least one (usually more) table row, so each physical disk read will read that number of rows into the RDM Server cache. An index file page contains many keys per page, depending on the size of the page and the size of the index values. RDM Server uses a B-tree structure for its indexes, which guarantees that each index page is at least half full. On average, index pages are about 60-70% full. The depth of a B-tree indicates the number of index pages that must be read to locate a particular key value. Most B-trees have a depth of from 4 to 7 levels.

Sequential File Scan

Each row of a table is stored as a record in a file. In RDM Server, a data file can contain the rows from one or more tables. The most basic access method in RDM Server is a sequential scan of a file, in which the table's rows are retrieved by sequentially reading through the file. If the file contains rows from more than one table, only the rows from the needed table are returned. However, all of the rows from all of the tables stored in the file will be read (the rows are intermixed). Thus, the cost (measured in logical disk accesses) to perform a sequential scan of a table is equal to the number of pages in the file:

Cost of sequential file scan = P

A sequential file scan is used in queries where the where clause contains no optimizable conditional expressions that reference foreign key, primary key, or indexed columns. See the example below.

select sale_name, dob, region, office from salesperson where age(dob) >= 40;

Direct Access Retrieval

Direct access retrieval allows retrieval of an individual row based on the value of a rowid primary key. The rowid primary key value can be specified directly in the query or may result from a join with a table containing a referencing rowid foreign key.
The cost of a direct access retrieval is 1 (a single file read is all that is needed to retrieve the row based on its rowid value):

Cost of direct access retrieval = 1

Consider the following table declarations:

create table pktable(
    pkid rowid primary key,
    pktext char(50)
);
create table fktable(
    fkid rowid references pktable,
    fktext char(50)
);

The optimizer produces an execution plan that uses direct access retrieval to fetch a particular row from pktable for the following query:

select * from pktable where pkid = 10;

The execution plan for the query below consists of two steps. The first step is a sequential scan of fktable. In the second step, fkid is used to directly access the related pktable row for each fktable row.

select pkid, pktext, fktext from pktable, fktable where pkid = fkid;

Indexed Access Retrieval

Equality Conditionals

Indexed access retrieval allows retrieval of an individual row or set of matching rows based on the value of one or more columns contained in a single index. These values can be specified in the query directly or through a join predicate. For a unique index, the cost to access a single row is equal to the depth of the index's B-tree (seldom more than 4) + 1 (to read the row from the data file). For a non-unique index, the cost is based on an estimate of the average number of rows having the same index value, derived from the indexed column's histogram. The percentage of the table's rows that match the specified equality constraint is the restriction factor (R). Thus, the estimate of the number of matching rows is equal to the cardinality of the table multiplied by the restriction factor, or:

number of matching rows = C * R

The cost estimate (in logical page reads) of an indexed access retrieval is equal to the number of index pages that must be accessed plus the number of matching rows (1 logical page read per row), or:

Cost of index access = D + (C * R)/(.7 * K) + (C * R)

This assumes that each index page is on average 70% full (D = depth of B-tree, K = maximum number of keys per index page). Note that this formula works for both unique and non-unique indexes (for unique indexes, R = 1/C). In the following example, the optimizer uses the order_key index on the sales_order table to retrieve the specified row.

select * from sales_order where ord_num = 2310;

In the example below, the optimizer selects indexed access retrieval to find the item rows through the item_ids index and the related product rows through the prod_key index.

select prod_id, quantity, prod_desc
    from item, product
    where item.prod_id = 17214 and product.prod_id = 17214
        and item.prod_id = product.prod_id;

Notice that the where clause contains a redundant expression. Including redundant expressions provides the optimizer with more access choices. You can set an RDM Server configuration parameter called RedundantExprs to have the optimizer automatically add redundant expressions where appropriate, such as in the above query.

IN Conditionals

When the in operator is used, the restriction factor is equal to the sum of the equality restriction factors for each of the listed values. Thus, the cost is simply the sum of the costs for the individual values.

Cost of index access for: column in (v1, v2, ..., vn) = SUM(cost(column = vi)) for all i: 1..n

The optimizer will use the order_key index on the sales_order table to retrieve each of the rows specified in:

select * from sales_order where ord_num in (2210, 2215, 2242);

Index Scan

Inequality Conditionals

Index scans use an index to access the rows satisfying an inequality relational expression involving the major column of the index. The cost of an index scan is estimated exactly the same as for the indexed access method. The restriction factor is calculated as the percentage of the column's histogram values that match the specified inequality conditional.
Consider the following query:

select * from ship_log where ord_num between 2250 and 2270;

The ship_log table contains 558 rows. The optimizer computed a restriction factor of 0.166667, which estimates that 93 rows (0.166667*558) will pass when the between condition is applied. The cost to perform a sequential scan involves reading all 558 rows (145 pages) and is greater than the cost to use the index (D=2, C=558, R=0.166667, K=72 => cost = 95). Thus, in this example, the optimizer will choose to use the ship_order_key index.

Cost of index scan = D + (C * R)/(.7 * K) + (C * R)

LIKE Conditionals

Index scans are also used to access rows satisfying like expressions that compare the major column of an index with a literal string pattern. The restriction factor is calculated from the histogram values that match the specified pattern. If no matches are found, it is calculated from the average of the five lowest frequency counts in the histogram. Two types of scans are employed, depending on the position of the first wild card character (for example, "%" or "_") in the pattern. If the pattern starts with a wild card character, the entire index will be scanned and each key value will be compared with the specified pattern. Only those keys that match will be returned. The cost of this scan is equal to the cost of reading each index page plus the cost of reading the row associated with each matching index value, as given in the following formula:

Cost of full index like scan = D + (C/(.7 * K)) + (R * C)

The cost typically will be much less when the pattern does not begin with a wild card. This allows the SQL system to position within the index to those values having the same prefix (consisting of all characters up to the first wild card).

Cost of prefixed like index scan = D + (C * R)/(.7 * K) + (C * R)

Note that this is identical to the cost of an equality indexed access (although the restriction factor will be greater in this case).
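The cost formulas above are plain arithmetic, so the ship_log comparison can be checked directly. The sketch below uses hypothetical helper names of our own; the figures (D=2, C=558, R=0.166667, K=72, 145 data pages) come from the example above.

```python
# Sketch of the cost formulas above. index_access_cost applies
# D + (C*R)/(0.7*K) + (C*R); sequential_scan_cost is simply P.
# Helper names are illustrative, not part of RDM Server.

def index_access_cost(D, C, R, K):
    """Estimated logical reads for an indexed access or index scan."""
    matching_rows = C * R                     # estimated rows that qualify
    index_pages = matching_rows / (0.7 * K)   # index pages, assumed 70% full
    return D + index_pages + matching_rows    # plus 1 read per matching row

def sequential_scan_cost(P):
    """A sequential file scan reads every page in the file."""
    return P

scan = sequential_scan_cost(145)                         # 145 data pages
index = index_access_cost(D=2, C=558, R=0.166667, K=72)
# The index scan comes to roughly 95-97 logical reads, well under the
# 145-read full scan, so the optimizer picks the ship_order_key index.
```

The small gap between this result and the manual's rounded figure of 95 is just arithmetic rounding; what matters to the optimizer is only which candidate is cheaper.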
Primary To Foreign Key Join

The use of create join on a foreign key in the DDL establishes a predefined join set relationship between the referenced tables. Related rows in the two tables are connected using direct access pointers stored and maintained in each row's physical record storage. All rows from a foreign key table are linked (in a linked list) to the row from the primary key table to which they refer. Thus, the optimizer can generate execution plans that directly access the related foreign key table rows after having accessed the primary key row. This access method is only considered by the optimizer when the join predicate (which equates each foreign key column to its corresponding primary key column) is included in the where clause. The cost of a primary to foreign key join is equal to the average number of foreign key rows per primary key row:

Cost of primary to foreign key join = cardinality of the primary to foreign key join = Cf / Cp

Foreign To Primary Key Join

The foreign to primary key join is also made available through use of create join. This method allows the optimizer to generate execution plans that directly access the primary key row referenced from a previously accessed foreign key row. Again, this access method is only considered by the optimizer when the join predicate that equates each foreign key column to its corresponding primary key column is included in the where clause.

Cost of foreign to primary key join = 1 (each foreign key row references a single primary key row)

Note that both of these access methods become available through the presence of a join predicate in the where clause, as in the following example.
select sale_name, company, city, state
    from salesperson, customer
    where salesperson.sale_id = customer.sale_id;

The optimizer can choose either to access the salesperson table first and then the related customer rows using the primary to foreign key join access method (based on the accounts join), or to access the customer table first and then the related salesperson row using the foreign to primary key join method. The method chosen will depend on the costs involved with first accessing one or the other of the two tables.

Foreign Thru Indexed/Rowid Primary Key Predefined Join

This method is used to access a foreign keyed table in which the foreign key is used in an equality comparison in the where clause and for which the primary key table is not otherwise referenced. When the foreign key has an associated create join, the optimizer can generate a plan that accesses the matching foreign key rows through the primary key's index (or rowid). Look at the query below:

select company, city, state from customer where sale_id = "ERW";

The optimizer locates the matching salesperson row using the index on sale_id in the salesperson table, then retrieves the related customer rows through the accounts predefined join. This is equivalent to the following query:

select company, city, state
    from customer, salesperson
    where salesperson.sale_id = "ERW" and salesperson.sale_id = customer.sale_id;

By providing access to joined foreign key tables implicitly through the referenced primary key, faster access is also achieved in update statements, where a join is not possible. See the following example.

update customer set contact = null where sale_id = "ERW";

The cost of accessing a foreign key table through the primary key is equal to the cost of accessing the primary key row plus the cost of accessing the related foreign key rows.

Cost of accessing primary table through index = D + 1 (only one row is located).
Cost of accessing related foreign key rows = cost of primary to foreign key join (see above).

If the primary key is of type rowid, the cost to access the primary key row is 1. A summary of the table access methods used by the RDM Server optimizer is shown in Table 15-5.

Table 15-5. Table Access Methods Cost Formulas

Access Method                       Cost Estimate (logical I/Os)
Sequential File Scan                P
Direct Access                       1
Indexed Access (equality)           D + ((C * R)/(.7 * K)) + (C * R)
Indexed Access (in)                 SUM(Indexed Access Cost(column = vi)) for all i: 1..n
Index Scan (inequality)             D + ((C * R)/(.7 * K)) + (C * R)
Index Scan (like/no prefix)         D + (C/(.7 * K)) + (R * C)
Index Scan (like/with prefix)       D + ((C * R)/(.7 * K)) + (C * R)
Primary to foreign key join         Cf / Cp
Foreign to primary key join         1
Foreign thru indexed primary key    D + 1 + (Cf / Cp)
Foreign thru rowid primary key      1 + (Cf / Cp)

Optimizable Expressions

The RDM Server query optimizer is able to optimize a restricted set of relational expressions that are specified in the where clause of a select statement. Simple expressions involving a comparison between a simple column and a literal constant value (or a parameter marker or stored procedure argument) can be analyzed by the optimizer to determine whether any access methods exist that can retrieve rows satisfying that particular conditional. Expressions available for potential use by the optimizer in an execution plan are referred to as optimizable. Table 15-6 summarizes the optimizable relational expressions.

Table 15-6. Optimizable Relational Expressions

1. RowidPkCol = constant
2. NdxCol1 = constant [and NdxCol2 = constant]...
3. FkCol1 = constant [and FkCol2 = constant]...
4. FkCol1 = PkCol1 [and FkCol2 = PkCol2]...
5. NdxCol1 = Cola [and NdxCol2 = Colb]...
6. NdxCol1 in (constant[, constant]...)
7. NdxCol1 {> | >= | < | <=} constant
8. NdxCol1 {> | >=} constant [and NdxCol1 {< | <=} constant]
9. NdxCol1 between constant and constant
10. NdxCol1 like "pattern"

The constant is either a literal, a parameter marker ('?'), or a stored procedure argument (if the statement is contained in a stored procedure declaration). The RowidPkCol expression corresponds to a rowid primary key column. The NdxColi's refer to the i'th declared column in a given index. The FkColi's (PkColi's) refer to the i'th declared column in a foreign (primary) key. An equality comparison must be provided for all multi-column foreign and primary key columns in order for the optimizer to recognize a join predicate. Cola, Colb, etc., are columns from the same table that match (in type and length) NdxCol1, NdxCol2, etc., respectively.

These expressions are all written in the form: ColumnName relop expression. Note that expressions of the form expression relop ColumnName are recognized and reformed by the optimizer so that the ColumnName is always listed on the left hand side. This transformation may require modification of the relational operator. For example,

select ... from ... where 1000 > colname

would become

select ... from ... where colname < 1000

Depending on how the where clause is organized, an expression may or may not be optimizable. Conditional expressions composed in conjunctive normal form are optimizable. In conjunctive normal form, the where clause is constructed as follows:

C1 and C2 and ... Cn

Each Ci is a conditional expression comprised of a single relational comparison or of multiple or'ed relational comparisons. Only those Ci's that consist of a single optimizable relational expression are optimizable. In other words, relational expressions that are sub-branches of an or'ed conditional expression are not optimizable. The best possible optimization results are obtained when the desired conditions are and'ed together. Some or expressions can be rewritten in a form the optimizer can process. For example, because of the or expression in the following query, the optimizer will not use an index on the state column in the customer table.
select ... from customer where state = "CA" or state = "WA" or state = "AZ" or state = "OR";

However, for the equivalent query shown below, the optimizer would use the index on state.

select ... from customer where state in ("CA", "WA", "AZ", "OR");

Examples

These examples are all based on the example sales and inventory databases. Refer to the sales.sql and invntory.sql DDL files for the relevant declarations of the entities referenced below. The following select statement will locate the salesperson record for a particular salesperson ID code using the sale_key index.

select * from salesperson where sale_id = "GAP";

The optimizer will use the accounts predefined join to optimize the join predicate in the query below.

select * from salesperson, customer where salesperson.sale_id = customer.sale_id;

In the next example, the customers serviced by a specific salesperson are accessed through the accounts predefined join after the specified salesperson's row is located through the sale_key index.

select * from salesperson, customer
    where salesperson.sale_id = customer.sale_id and sale_id = "GAP";

Note that the optimizer would not use the comments join in the following example.

select note_id, note_date, txtln from note, note_line where note.note_id = note_line.note_id;

The comments join cannot be used because only one of the three foreign and primary key columns from the join is specified in the where clause. The note_id column is the second column in the note_key index, so the note_key index cannot be used either. Therefore, the optimizer has no good choices for resolving this query. The query will be processed with the candidate rows coming from a cross-product between the two tables, with the result rows being those that have matching note_id values. This result is not what the user wants. Note that the query below will produce (efficiently) the result set the user wants.
select sale_id, note_id, note_date, txtln
    from note, note_line
    where note.sale_id = note_line.sale_id
        and note.note_id = note_line.note_id
        and note.note_date = note_line.note_date;

In all of the following queries, the order_ndx index on sales_order is selected by the optimizer to access the rows that satisfy the specified condition.

select * from sales_order where ord_date = date '1993-02-12';
select * from sales_order where ord_date > date '1993-03-31';
select * from sales_order where ord_date >= date '1993-04-01' and ord_date < date '1993-05-01';
select * from sales_order where ord_date in (@'1993-01-02', @'1993-02-03', @'1993-03-04');
select * from sales_order where ord_date between date '1993-06-01' and date '1993-06-30';

In the following query, the optimizer cannot use either of the relational expressions specified.

select cust_id, ord_num, ord_date, amount
    from sales_order
    where ord_num = 2293 or ord_date = date '1993-06-18';

The or expression prohibits the use of either index, because using the index on ord_num would not retrieve those rows that have the specified ord_date, and vice versa. In this case, the optimizer selects an access method that retrieves all rows in the table (using a file scan or a complete index scan). This query is best performed using a separate query for each part. In general, when the table is large, a temporary table can be used as shown below.
create temporary table torders(
    cid char(4), onum smallint, odate date, oamt double );
insert into torders
    select cust_id, ord_num, ord_date, amount from sales_order where ord_num = 2293;
insert into torders
    select cust_id, ord_num, ord_date, amount from sales_order where ord_date = date '1993-06-18';
select distinct * from torders;

Pattern matching using the SQL like operator can be optimized by using an index on the character column, provided the column is the first (or only) declared column in an index and the pattern is a string literal whose first character is not a wild-card character. For example, the index on cust_id is used in the following query to quickly select only those customer rows whose cust_id begins with the letter "S".

select * from customer where cust_id like "S%";

If the query is written differently, as shown below, all of the cust_id values in the index must be checked to find the customer rows with a cust_id ending with the letter "S".

select * from customer where cust_id like "%S";

The condition is tested using the value from the index before the row is read, so if it does not match, there is no cost of reading the row from the data file.

How the Optimizer Determines the Access Plan

Selecting Among Alternative Access Methods

Consider the following query.

select * from ship_log where ord_num = 2284 and prod_id = 23400;

The optimizer can choose to use either the index on ord_num or the index on prod_id to process this query. It will select the index that it determines will execute the query in the fewest disk accesses. It estimates the required disk accesses using the data usage statistics accumulated and stored in the system catalog by the update statistics statement. The following table shows the relevant statistics and calculations used by the optimizer for each of the two relational expressions in the above query.

Table 15-7.
Optimizer Statistics Example #1

optimizer statistic/calculation            ord_num = 2284   prod_id = 23400
number of rows in table (C)                558              558
depth of B-tree (D)                        2                2
number of keys per page (K)                72               72
restriction factor (R)                     0.003584         0.046595
estimate of number of result rows (C * R)  2                26
cost estimate                              4                28

The estimate of the number of rows that match each expression is based on the operation (in this case "=") and the count of histogram matches. If the histogram count is zero, the restriction factor for an equality condition is equal to the average frequency count of the five lowest histogram entries divided by the cardinality of the table. In this example, for the "ord_num = 2284" condition, 2284 is not in the histogram. The average of the 5 lowest frequency counts was 2, so the restriction factor is 2/558, or 0.003584. For the "prod_id = 23400" condition, the value 23400 is in the histogram with a frequency count of 26; the restriction factor is therefore 26/558, or 0.046595. The optimizer will choose the ord_num index because of its lower cost estimate (use the formula from Table 15-5 under "Indexed Access (equality)" to calculate the costs).

Selecting the Access Order

When a query references more than one table, the optimization process becomes more complex because the optimizer must choose both a method to access each table and the order in which to access the tables. Many access methods rely only on the values specified in the conditional expression for the needed data. However, some access methods (those associated with join predicates) require that other tables have already been accessed. This places constraints on the possible orderings. The access methods available at the first step in the plan are those that do not depend on any other tables. From the possible access methods for the first plan step, the optimizer chooses the method with the lowest cost from a list of possible methods sorted by cost. The accessed table is then marked as bound.
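The restriction-factor and cost arithmetic behind Table 15-7 above can be reproduced in a few lines. This is a sketch, not RDM Server code: Table 15-5 is not reproduced in this section, so the "Indexed Access (equality)" cost is assumed here to behave as B-tree depth plus the estimated number of result rows, which is consistent with the figures shown in Table 15-7.

```python
def restriction_factor(hist_count, low5_avg, cardinality):
    """Restriction factor for an equality condition: the matching histogram
    frequency or, if the value is absent from the histogram, the average of
    the five lowest histogram frequencies, divided by the table cardinality."""
    matches = hist_count if hist_count > 0 else low5_avg
    return matches / cardinality

def equality_cost(depth, cardinality, rfactor):
    """Assumed form of the 'Indexed Access (equality)' cost from Table 15-5:
    B-tree depth plus the estimated number of result rows."""
    return depth + round(cardinality * rfactor)

# ord_num = 2284: value not in the histogram, average of 5 lowest counts is 2
r1 = restriction_factor(0, 2, 558)
# prod_id = 23400: histogram frequency count is 26
r2 = restriction_factor(26, 0, 558)

print(round(r1, 6), equality_cost(2, 558, r1))  # 0.003584 4
print(round(r2, 6), equality_cost(2, 558, r2))  # 0.046595 28
```

Both results match the restriction factors and cost estimates reported in Table 15-7, which is why the optimizer picks the ord_num index.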
The access methods available at the next step in the plan include the choices from the first step for the other tables, plus the methods that depend on the table bound by the first step. These too are ordered by cost. The optimizer continues in this manner until methods have been chosen for all steps in the plan. It then backtracks, selects the method with the next higher cost, and recursively evaluates a new plan. At any point in the process, if the cost of the plan being evaluated exceeds the total cost of the current best complete plan, that plan is abandoned and another is chosen. The entire optimizer algorithm is depicted in Figure 15-2 below.

Figure 15-2. Optimizer Algorithm

Sorting and Grouping

For select statements that include a group by or order by specification, the SQL optimizer performs two separate optimization passes. The first pass restricts the choice of usable access methods to those that produce or maintain the specified ordering. For example, an index scan retrieves its results in the order specified in the create index declaration; if that order matches the specified ordering, the index scan is included as a usable access method. This optimization pass is fast because, typically, very few plans produce the desired ordering without performing an external sort of the result set. Note that ordering clauses can be satisfied through the use of indexes and predefined sorted joins (that is, create join with order sorted). If a plan is produced by the first pass, it is saved along with its cost estimate, and a second optimization is performed without the ordering restriction. An estimate of the cost required to sort the result set, based on the optimizer's estimate of the result set's size, is added to the cost of the plan produced by the unrestricted pass. The optimizer then chooses the plan with the lowest cost.
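The cost-pruned, bound-table search described under "Selecting the Access Order" above can be sketched as a small branch-and-bound procedure. Everything in this sketch (the rule structure, the table and rule names, and the toy costs) is invented for illustration; the real optimizer's cost propagation and rule set are more involved.

```python
def best_plan(rules, tables):
    """Toy sketch of the optimizer's search: at each step consider only the
    rules whose 'uses' table (if any) is already bound, try them in cost
    order, and abandon any partial plan whose running cost already meets or
    exceeds the best complete plan found so far."""
    best = {"cost": float("inf"), "steps": []}

    def search(bound, steps, cost):
        if cost >= best["cost"]:          # prune: cannot beat the current best
            return
        if len(bound) == len(tables):     # complete plan: record it
            best["cost"], best["steps"] = cost, steps[:]
            return
        usable = [r for r in rules
                  if r["binds"] not in bound
                  and (r["uses"] is None or r["uses"] in bound)]
        for r in sorted(usable, key=lambda r: r["cost"]):
            search(bound | {r["binds"]}, steps + [r["name"]], cost + r["cost"])

    search(frozenset(), [], 0)
    return best

# Toy rule set: scanning B first makes the join into A cheap.
rules = [
    {"name": "scan A",   "binds": "A", "uses": None, "cost": 100},
    {"name": "scan B",   "binds": "B", "uses": None, "cost": 10},
    {"name": "join B>A", "binds": "A", "uses": "B",  "cost": 5},
]
plan = best_plan(rules, ["A", "B"])
print(plan)  # {'cost': 15, 'steps': ['scan B', 'join B>A']}
```

The pruning test is what keeps the search tractable: once "scan B" plus "join B>A" yields a complete plan costing 15, the branch that starts with "scan A" (cost 100) is cut off immediately.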
The estimate of the sort cost is based on the optimizer's cardinality estimate, the length of the sort key, and the sort index page size. The optimizer calculates the number of I/Os as two times the number of index pages needed to store the sort index (one pass to create each page and another to read each page in order), plus the number of result rows. Note that if both group by and order by clauses are specified, only the group by ordering can be satisfied by existing indexes and joins; a separate sort of the result set will always be required for the order by clause. If there is no index to satisfy the specified group by, then two sort passes will be needed.

For example, consider the following query on the ship_log table.

select * from ship_log where ord_num = 2269 order by prod_id;

The table below shows optimizer information for the two optimizer passes on the above query. Pass 1 must use the ship_prod_key index because it is the only method available that returns rows in the specified order. Pass 2 is free to choose any access method. The cost difference between these choices is large, and the optimizer is correct to choose the plan produced by pass 2, even though it will have to perform a sort on the result rows.

Table 15-8. Optimizer Statistics Example #2

Optimizer Statistic/Calculation            Pass 1                              Pass 2
Number of rows in table (C)                558                                 558
Depth of B-tree (D)                        2                                   2
Number of keys per page (K)                72                                  72
Restriction factor (R)                     1.0 (prod_id is not used in where)  0.017921
Estimate of number of result rows (C * R)  558                                 10
Cost estimate                              674                                 38 (includes sort cost)

Unfortunately, the sort cost estimate can be inaccurate, because it is based on a cardinality estimate derived from database-wide data distribution statistics that will not hold for some individual cases. RDM Server provides a configuration parameter, SortLimit, that can influence the sort decision.
For cardinality estimates greater than the specified SortLimit number, the optimizer will always choose the restricted ordering plan rather than incur the cost of the sort. If SortLimit is zero or the cardinality estimate is less than SortLimit, the optimizer's choice is based on its computed cost estimates. Unless you observe many instances where sorts are being performed when they should not be (or vice versa), it is best to leave SortLimit set to zero.

A user can also force the optimizer to always select the restricted ordering plan by specifying nosort at the end of the order by or group by clause. Thus, if a restricted order plan exists and nosort is specified, that plan will be executed. If no such plan exists, an external sort of the result set will still be performed.

The optimizer will only consider orderings involving the actual columns on which the sort clauses are declared in the create index or create join statements. The optimizer does not deduce additional ordering from the presence of join predicates in the where clause. For example, consider the following schema fragment.

create table A( a_pk integer primary key, ... );
create table B( b_pk integer primary key, b_fk integer references A, ... );
create join A_to_B order last on B(b_fk);
create table C( c_fk references B, c_date date ... );
create join B_to_C order sorted on C(c_fk) by c_date;

The optimizer will recognize the join ordering to resolve the following query without performing an external sort:

select * from A,B,C where a_pk = b_fk and b_pk = c_fk order by b_pk, c_date;

However, in the next query, the optimizer will perform an external sort even though it is possible to deduce from the join predicates that the sort is unnecessary.
select * from A,B,C where a_pk = b_fk and b_pk = c_fk order by a_pk, b_pk, c_date;

The optimizer looks just ahead of the sort column in the order by clause for the primary key column of the sorted join's referenced table; this is how it recognizes the order produced by the join access rule. To ensure that the predefined join preserves the sort order imposed by the columns preceding b_pk in the order by clause, the optimizer must know that those columns are unique. From this we derive the following two guidelines:

1. To use sorted joins, always include the referenced table's primary key column(s) prior to the sort columns in the order by clause.
2. Do not assume that the optimizer is smarter than you.

Outer Join Processing

The optimizer processes outer joins by first converting all outer joins into left outer joins (a right outer join becomes a left outer join by simply reversing the order of the tables). It then disables all access paths that require the right-hand table to be accessed before the left-hand table. If there is no access path (that is, through an index or predefined join) from the left-hand table to the right-hand table, the optimizer will simply perform an inner join (rather than doing a very expensive cross-product).

Returning the Number of Rows in a Table

The row count for each table in a database is maintained by the RDM Server engine. SQL recognizes queries of the following form:

select count(*) from tablename

For these, SQL generates a special execution plan that returns the current row count value for the specified table. No table or index scan is needed. However, if the query is specified as shown in the next box below, the optimizer performs a scan of the table or index (if colname is indexed) and counts the rows.

select count(colname) from tablename

Thus, if you need the row count of the entire table, use the first form and not the second.
However, note that the row count returned by the first form includes uncommitted rows that have been inserted by another user. The second form counts only the currently committed rows.

Select * From Table

ANSI standard SQL states that when an order by clause is not specified, the ordering of the result rows from a table is implementation dependent. Some notable ODBC-based front-end application development and report writer tools assume that a "select * from table" returns the rows in primary key order. To work effectively with these products, RDM Server SQL returns the rows in primary key order (or in the order defined by the first unique index on the table if there is no primary key).

Query Construction Guidelines

Some systems perform a great deal of work to convert poorly written queries into well written queries before submitting the query to the optimizer. This is particularly useful in systems where ad hoc querying (such as in decision-support environments) is performed by non-technical people. SQL itself is less user friendly, so this work is often performed by front-end tools. RDM Server does not perform complex query transformation analysis (it will do simple things such as converting an expression like "10 = quantity" into "quantity = 10"). Therefore, a thorough understanding of the information provided here will assist you in formulating queries that can be optimized efficiently by RDM Server SQL. Guidelines for writing efficient RDM Server SQL queries are listed below.

- Formulate where clauses in conjunctive normal form. Avoid using or.
- Formulate conditional expressions according to the forms listed in Table 15-4.
- Use literal constants as often as possible. The compile-time for most queries is insignificant compared to their execution time.
Thus, dynamically constructing and compiling queries containing literal constants (as opposed to parameter markers or stored procedures) allows the optimizer to make more intelligent access choices based on the histogram statistics.

- Include more (not fewer) conditional expressions in the where clause, including redundant expressions. For example, suppose foreign and primary keys exist between tables A and B, B and C, and A and C. Even though it is not strictly necessary (mathematically) to include a join predicate between A and C, doing so provides the optimizer with additional access path choices. Also, assuming that join predicates exist and a simple conditional is specified for the primary key, you can include the same conditional on the foreign key as well. Look at the following query:

select ... from A,B where A.pkey = B.fkey and A.pkey = 1000

You can improve this query by adding the conditional shown in the equivalent version below.

select ... from A,B where A.pkey = B.fkey and A.pkey = 1000 and B.fkey = 1000

- Make certain join predicates exist for all pairs of referenced tables that are related through foreign and primary keys.
- Avoid sorting queries with large result sets when no index is available to produce the desired ordering. If you have heavy report writing requirements, consider using the replication feature to maintain a redundant, read-only copy of the database on a separate server and run your reports from that copy. This allows the online server to provide the best response to update requests without blocking or being blocked by a high level of query activity.
- In defining your DDL, use create join where you would otherwise (that is, in other SQL systems) create an index on a foreign key for performance reasons.
- Do not include conditional expressions in the having clause that belong in the where clause.
Conditional expressions contained in the having clause should always include an aggregate function reference. Note that expressions in the having clause are not taken into consideration by the optimizer.

- Execute update statistics on your database(s) whenever changes have occurred that could have a significant effect on the distribution characteristics of the data. When in doubt, run update statistics.

User Control Over Optimizer Behavior

User-Specified Expression Restriction Factor

The restriction factor is the fraction of a table, between 0 and 1, that is returned as a result of applying a specific where condition. The lower the value, the greater the likelihood that the access method associated with that condition will be chosen by the optimizer. This factor is computed by the RDM Server optimizer from the data distribution statistics. Note that you can override the optimizer's estimate by using a non-standard RDM Server SQL feature: a relational expression, relexpr, can be written as "(relexpr, factor)", where factor is a decimal fraction between 0 and 1 indicating the fraction of the file restricted by relexpr. In the example below, where the optimizer would normally access the data using the index on ord_num, the user-specified restriction factors cause the optimizer to use the index on ord_date instead.

select * from sales_order where (ord_date > date '1996-05-20', 0.00001) and (ord_num = 2210, 1.0);

When the statistics used by the optimizer are not accurate enough for a given query and the result is unsatisfactory, you can use this feature to override the statistics-based restriction factor and substitute your own value. Be aware, however, that using this feature renders the query independent of future changes to the data distribution statistics.

User-Specified Index

If a column referenced in an optimizable conditional expression is used in more than one index, the optimizer will generate an access rule for each index and select the index that it sees as the best choice.
If the optimizer makes a poor choice, you can force its choice by specifying the index name in the column reference using the colname@index_name syntax. This is illustrated in the following example from the diskdir database.

select * from filetab where size@sizendx >= 100000;

Besides the sizendx index, the optimizer could have chosen to use sizenmndx or sizedatendx. By specifying the index name with the column name in the conditional expression, you restrict the optimizer to considering only that particular index. Be certain you know exactly what you are doing when you use this feature (as well as the one from the last section).

Optimizer Iteration Threshold (OptLimit)

The time required by the optimizer to determine the optimal execution plan for a query increases factorially with the number of tables referenced in the from clause. Thus, the time to compile and optimize a query can become noticeable when there are many (more than 8-10) referenced tables. The algorithm used by RDM Server SQL will often (but not always) find the best access plan (or a reasonably good one) early in the optimization phase. The optimizer algorithm therefore includes a failure-to-improve threshold based on the number of access plan step iterations. When the algorithm fails to generate a better access plan within the specified limit, the optimizer stops and uses the best plan found up to that point. The number of iterations that the algorithm processes depends on the number of tables being accessed and the number of usable access methods that can be chosen. The OptLimit configuration parameter can be used to specify this failure-to-improve value. When set, the optimizer stops after executing OptLimit iterations without finding a plan better than the current best plan, even though a better plan might still exist. (This is similar to how chess programs work in timed games.)
We recommend that you keep this parameter disabled (OptLimit = 0) unless you have an ongoing need to dynamically compile complex queries in which the optimization time degrades overall system performance. If you need to specify OptLimit, the value is the number of optimizer iterations (that is, candidate execution plan steps). You will typically choose a value greater than 1,000 and less than 10,000. The higher the number, the longer the optimizer may take, but the better the likelihood of finding the best plan. An administrator can use the set opt_limit SQL statement to change the value for a particular session. The OptLimit configuration parameter sets the limit for all sessions. All configuration parameters in rdmserver.ini are read only at initial server startup.

Enabling Automatic Insertion of Redundant Conditionals

A configuration parameter named RedundantExprs can be defined in rdmserver.ini (RedundantExprs=1) to allow the optimizer to include redundant expressions involving the foreign and primary key columns of a join predicate.

Checking Optimizer Results

Retrieving the Execution Plan (SQLShowPlan)

You can view the execution plan by calling the SQLShowPlan function from your C/C++ application program. You can also view the plan from the rsql utility program. SQLShowPlan returns a result set containing one row for each step in the execution plan (one step per table listed in the from clause). The result set columns are shown in Table 15-9.

Table 15-9. SQLShowPlan Result Set Definition

Column Name       Description
STEP_NUMBER       The step number in the execution plan. The first row is step 1, the second is step 2, etc.
DB_NAME           The name of the database in which the table is defined.
TABLE_NAME        The name of the base table being accessed.
ACCESS_METHOD     The method by which the table is accessed (see below).
ACCESS_NAME       The name of the index or predefined join used by the access method, if applicable.
STEP_CARDINALITY  Estimate of the number of rows returned from this step.
PLAN_CARDINALITY  Estimate of the total number of rows returned by the query.
PLAN_COST         Estimate of the total cost (in logical I/Os) to execute the query.
SORT_LEN          Length of the sort record for the order by clause (0 => no sort required).
GROUP_LEN         Length of the sort record for the group by clause (0 => no sort required).

The last four columns return the same values for each row in the result set. The access method is identified using the names in Table 15-10 below.

Table 15-10. SQLShowPlan Access Methods

Name Used in SQLShowPlan  Access Method
TABLE SCAN                Sequential file scan
DIRECT                    Direct access
INDEX FIND                Indexed access (equality)
INDEX LIST                Indexed access (in)
INDEX SCAN                Index scan (inequality)
INDEX LIKE                Index scan (like)
P-TO-F JOIN               Primary to foreign key join
F-TO-P JOIN               Foreign to primary key join
JOINED INDEX              Foreign thru indexed primary key
JOINED DIRECT             Foreign thru rowid primary key

SQLShowPlan is called with two statement handles. The first is the statement handle into which the execution plan result set will be returned. The second statement handle is for the statement whose execution plan will be retrieved. This second handle must be at least in the prepared state (that is, the statement must have already been compiled using SQLPrepare or SQLExecDirect). The prototype for SQLShowPlan is given below.

RETCODE SQLShowPlan(
    HSTMT thisHstmt,  // in: handle for SQLShowPlan result set
    HSTMT thatHstmt)  // in: handle of statement whose plan is to be fetched

SQLShowPlan will return an error if thisHstmt is not in a state compatible with SQLExecDirect (SQLShowPlan calls SQLExecDirect). An error will also be returned if thatHstmt is not a prepared or executed select, update, or delete statement. You can view a statement's execution plan from rsql using the ".X" command.
You must execute this command under a separate statement handle from the one whose execution plan you are interested in. The following "select office, count" example illustrates the use of the command.

002 rsql: select office, count(*) from salesperson, customer, sales_order
+ 002 rsql: where salesperson.sale_id = customer.sale_id
+ 002 rsql: and customer.cust_id = sales_order.cust_id
+ 002 rsql: and state in ("AZ","CA",'CO','WA','TX') group by 1 order by 2 desc;
OFFICE  COUNT(*)
LAX     15
DEN     12
SEA      9
DAL      6
004 rsql: .h 2
*** using statement handle 2 of connection 1
004 rsql: .t
*** table mode is off
004 rsql: .X 1
STEP_NUMBER     : 1
DB_NAME         : SALES
TABLE_NAME      : CUSTOMER
ACCESS_METHOD   : INDEX LIST
ACCESS_NAME     : CUST_GEO
STEP_CARDINALITY: 9
PLAN_CARDINALITY: 40.000000
PLAN_COST       : 54.000000
SORT_LEN        : 9
GROUP_LEN       : 24
004 rsql: .n
STEP_NUMBER     : 2
DB_NAME         : SALES
TABLE_NAME      : SALESPERSON
ACCESS_METHOD   : F-TO-P JOIN
ACCESS_NAME     : ACCOUNTS
STEP_CARDINALITY: 9
PLAN_CARDINALITY: 40.000000
PLAN_COST       : 54.000000
SORT_LEN        : 9
GROUP_LEN       : 24
004 rsql: .n
STEP_NUMBER     : 3
DB_NAME         : SALES
TABLE_NAME      : SALES_ORDER
ACCESS_METHOD   : P-TO-F JOIN
ACCESS_NAME     : PURCHASES
STEP_CARDINALITY: 40
PLAN_CARDINALITY: 40.000000
PLAN_COST       : 54.000000
SORT_LEN        : 9
GROUP_LEN       : 24
004 rsql: .n
*** no more rows

Using the SqlDebug Configuration Parameter

The ENVIRONMENT section of rdmserver.ini contains a parameter called SqlDebug. This parameter was implemented for internal use by Raima, but it can also be used by an SQL developer to discover the execution plan choices made by the RDM Server optimizer. Use this method only when you need more information than that provided by SQLShowPlan.

WARNING: Enabling the SqlDebug parameter causes many debug files to be created, which can quickly consume disk space.
Do not enable this parameter on a production server; it should be used strictly in a test environment with one user only.

Debug information for each query is written into a separate debug text file in the current directory within which the RDM Server is executing. The files are named debug.nnn, where nnn is 000 for the first file, 001 for the second, and so forth. The SqlDebug parameter is a bit-mapped value in which each bit setting controls the output of certain SQL internal tables. RDM Server SQL maintains the debug settings shown in Table 15-11 below. When more than one bit setting is specified, the output for each is written into a separate debug file.

Table 15-11. SqlDebug Parameter Values (setting 2 is currently unused)

SqlDebug  Debug File Output
1         Formatted dump of compiled statement
4         Formatted dump of query optimizer tables and execution plan
8         Formatted dump of query execution plan
5         1 and 4
9         1 and 8
12        4 and 8
13        1, 4 and 8

Setting 1 is of no interest with respect to query optimization. Setting 8 provides basically the same information as SQLShowPlan. Setting 4 produces a formatted dump of the internal tables that drive the optimizer's analysis. Included at the beginning of the debug file is a copy of the SQL select statement being optimized. A sample for the "select office, count" query in the Retrieving the Execution Plan section is shown below.
select office, count(*) from salesperson, customer, sales_order
where salesperson.sale_id = customer.sale_id
and customer.cust_id = sales_order.cust_id
and state in ("AZ","CA",'CO','WA','TX') group by 1 order by 2 desc;
----------------------------------------------------------------------------------
FROM Table:
 #  name         tableid  viewid  # rows  step #
 -  -----------  -------  ------  ------  ------
 0  SALESPERSON       36       0      14       1
 1  CUSTOMER          37       0      28       0
 2  SALES_ORDER       38       0     127       2

Referenced Column Table:
 #  name     tableno  colno  accmap  ndxname
 -  -------  -------  -----  ------  -------
 0  OFFICE         0      6  0x0010
 1  SALE_ID        0      1  0x0004
 2  SALE_ID        1      8  0x0002
 3  CUST_ID        1      1  0x0004
 4  CUST_ID        2      1  0x0002
 5  STATE          1      6  0x0004

Access Table:
 #  type  tableno/name   id  -----ref'd columns-----  binds  table updated
                             1  2  3  4  5  6  7  8
 -  ----  -------------  --  -----------------------  -----  -------------
 0  i     0/SALESPERSON   0   1 -1 -1 -1 -1 -1 -1 -1  no     no
 1  i     0/SALESPERSON   1  -1  0 -1 -1 -1 -1 -1 -1  no     no
 2  i     1/CUSTOMER      0   3 -1 -1 -1 -1 -1 -1 -1  no     no
 3  i     1/CUSTOMER      1   5 -1 -1 -1 -1 -1 -1 -1  no     no
 4  j     1/CUSTOMER      0   2 -1 -1 -1 -1 -1 -1 -1  no     no
 5  j     2/SALES_ORDER   0   4 -1 -1 -1 -1 -1 -1 -1  no     no

Expression Table:
 #  optable  col0  emult0    col1  emult1    join  operation
 -  -------  ----  --------  ----  --------  ----  ---------
 0        2     1  0.071429     2  0.035714  yes   eq
 1        2     3  0.035714     4  0.036220  yes   eq
 2        2     5  0.321429    -1  0.000000  no    in

Rule Table:
 #  method       binds  uses   # rows  cost  id     expr  binds   uses    sort
                 tab #  tab #                       #s    col #s  col #s  col #s
 -  -----------  -----  -----  ------  ----  -----  ----  ------  ------  ------
 0  FILE SCAN        0     -1   14.00   145     -1
 1  FILE SCAN        1     -1   28.00   145     -1
 2  FILE SCAN        2     -1  127.00   145     -1
 3  INDEX SCAN       0     -1   14.00    15      0
 4  INDEX SCAN       0     -1   14.00    15      1
 5  INDEX SCAN       1     -1   28.00    29      2
 6  INDEX SCAN       1     -1   28.00    29      3
 7  INDEX FIND       0      1    1.00     2      0  0     1       2
 8  INDEX FIND       1      2    1.01     2      2  1     3       4
 9  INDEX LIST       1     -1    9.00     9      3  2     5
10  JOIN OWNER       1      2    1.00     1  20002  1     3       4
11  JOIN MEMBER      2      1    4.54     4  20002  1     4       3
12  JOIN OWNER       0      1    1.00     1  20001  0     1       2
13  JOIN MEMBER      1      0    2.00     2  20001  0     2       1
----------------------------------------------------------------------------------
Best Access Plan: cost = 54 i/o's, cardinality estimate = 40 rows

 step  rule #  cost  rows in  rows out
 ----  ------  ----  -------  --------
    0       9     9        9         9
    1      12     9        9         9
    2      11    36       40        40

Number of optimizer iterations: 15

Table 15-12 below lists the tables referenced in the from clause of the statement.

Table 15-12. FROM Table

Column Heading  Description
#               Index into the FROM table; referred to in other tables as the "tableno".
name            The table name, view name, or correlation (alias) name, if specified.
tableid         SQL's permanent ID number for the table as assigned in the system catalog.
viewid          SQL's permanent ID number for the view as assigned in the system catalog.
# rows          The cardinality of the table when the statement was compiled.
step #          Identifies the step number in the best plan where this table is accessed.

Table 15-13 contains one entry for each column that is referenced in the statement.

Table 15-13. Referenced Column Table

Column Heading  Description
#               Referenced column number; used in the other tables to identify a specific column.
name            The base column name (not the alias, if an alias was specified).
tableno         FROM table index of the table where this column is declared.
colno           The column declaration number from its table (1 = first declared column in the table).
accmap          The column access type bit map (see Table 15-14 below).
ndxname         Identifies a user-specified index name.

Table 15-14.
Access Type Bit Map Values

Bit Map  Description
0x0001   Column is a rowid primary key (direct access)
0x0002   Column is the major (first) column in a joined foreign key
0x0004   Column is the major (first) column in an index
0x0008   Column is a minor (not the first) column in a joined foreign key
0x0010   Column is a minor (not the first) column in an index

Table 15-15 contains information about the indexes and joins that can potentially be used by the optimizer.

Table 15-15. Access Table

Column Heading  Description
#               Index number into this table; referenced in the "id" column of the Rule Table.
type            Access type: "i" for an index, "j" for a predefined join.
tableno/name    The FROM table number and name of the accessed table.
id              The index, foreign key, or primary/unique key entry for the accessed table (this value is used to index into internal tables attached to the table definition).
ref'd columns   Identifies the columns (up to 8) in the index or foreign key that are referenced in the statement. The non-negative values are indexes into the Referenced Column Table. A -1 indicates an undeclared or unreferenced column.
binds           Indicates whether all of the columns from the table that are referenced in the query are contained in the index. If yes, SQL does not have to read the row from the data file but can retrieve all the column values from the index key value.
table updated   Only used on update statements; indicates whether one of the columns in the index is being modified in the statement.

The Expression Table has one entry for each optimizable conditional expression.

Table 15-16. Expression Table

Column Heading  Description
#               Index number into this table; referenced in the "expr #s" column of the Rule Table.
optable         A value of 2 indicates that at least one efficient access method is associated with the expression.
                A value of 1 indicates that no efficient access method exists for retrieving the rows that satisfy the condition.
col0            The Referenced Column Table entry corresponding to the (left-hand) column referenced in the conditional.
emult0          The restriction factor multiplier value associated with the col0 expression.
col1            The Referenced Column Table entry corresponding to the (right-hand) column referenced in a join condition.
emult1          The restriction factor multiplier value associated with the col1 join condition. Two restriction factors are needed because either table of the join may be the one being accessed.
join            Indicates whether the expression is a join condition (that is, col0 = col1).
operation       The relational operator specified in the expression.

The heart of the optimization analysis is driven by the Rule Table (Table 15-17).

Table 15-17. Rule Table

Column Heading  Description
#               Index number into this table; referenced in the "rule #" column of the Best Access Plan.
method          The access method associated with this rule. See Table 15-10 for a list of the access methods.
binds tab #     The tableno of the table being accessed by this method. A table becomes "bound" at the step in the access plan where the rule is applied. Prior to that step, the table is unbound.
uses tab #      The tableno of the table that contains column values needed by this rule's access method. A value of -1 means that the rule does not depend on values from any other table. Rules that rely on values from another table (through join predicates) can only be used in plan steps that follow the rule that accesses the used table.
# rows          The optimizer's estimate of the number of rows from the table that will be returned by the rule. When a table depends on another table having first been bound (that is, "uses tab #" != -1), "# rows" is the average number returned for each row of the dependent table.
cost: Estimate of the number of logical disk reads required for each application of the rule, based on the formulas given in Table 15-5.
id: The Access Table entry that contributed to this rule, or the internal join identifier (i.e., a core-level d_ API set id constant). A value of -1 indicates that it is unused.
expr #s: List of the Expression Table entries that contributed to this rule.
binds col #s: The Referenced Columns (from the "binds tab #" table) that are accessed by the rule.
uses cols #s: The Referenced Columns (from the "uses tab #" table) that are used by the rule.
sort col #s: The Referenced Columns that specify the sort order in which the rule returns its rows. A negative value indicates that the column is returned in descending order (encoded as -colno - 1).

A summary of the optimizer's results follows in Table 15-18. The cost and cardinality estimates of the Best Access Plan are reported, followed by the plan itself. The plan lists the steps in the order of their execution.

Table 15-18. Best Access Plan

step: Access plan step number. The steps are executed in this order.
rule #: The Rule Table entry for the rule that the optimizer selected for this step.
cost: The cost of applying the rule at this step in the plan; it is equal to the prior step's rows out times the rule's cost from the Rule Table. For step 0, the cost is simply the rule's cost.
rows in: The number of rows produced by the applications of the rule at this step. Each row from the prior step invokes one application of the rule, so this is equal to the prior step's rows out times the "# rows" value from the Rule Table for the rule being applied at this step. For step 0, it is the "# rows" value from the Rule Table.
rows out: The optimizer's estimate of the number of the rows in that satisfy all conditionals from the where clause involving the table accessed at that step in the plan. It is computed from the restriction factors of all expressions that contribute to this rule.
When a group by or order by clause is specified and the optimizer selects an execution plan that satisfies the desired ordering without requiring a separate sort pass, a statement similar to the following is reported:

Plan produces target ordering for order by: 3 d 1 a

The numbers here are simply the result column ordinals. When an external sort is required, the sort costs are automatically incorporated into the reported plan cost and no notice is printed. Note that in the example above an external sort was required (the "target ordering..." message was not printed). The optimizer's estimate of the cost of the sort can be computed by subtracting the cost reported in the last step of the Best Access Plan from the plan's total cost. Finally, the total number of optimizer iterations needed to determine the best access plan is reported.

Limitations

Optimization of View References

Each view in RDM Server SQL is optimized at its creation time and stored in a compiled format. A view referenced in a select statement is accessed according to its precompiled execution plan. This can cause performance problems if a view is referenced in a query with extra conditionals or is joined with another table. Instead of "unraveling" the view definition and re-optimizing it along with the extra conditionals, SQL retrieves the view's rows and evaluates the additional constraints at execution time. Thus, it is best to avoid creating joins that involve views (although a view definition can itself include joins on base tables). An alternative is to use stored procedures, which are optimized at compile time but can be parameterized; the optimizer does incorporate stored procedure parameter references in its analysis.

Merge-Scan Join Operation Is Not Supported

A merge-scan join operation is a join processing technique in which indexes on the joined columns are merged and only the rows common to both indexes are returned.
Some optimizers even include the cost of creating an index when one of the joined columns is not indexed. RDM Server does not include this technique because of its ability to define direct-access joins using the create join statement; join processing based on these predefined joins is already optimal.

Subquery Transformation (Flattening) Unsupported

Some optimizers apply a transformation to nested, correlated subqueries in which the query is "flattened" into an equivalent query that replaces the subqueries with joins. This method is not implemented in RDM Server.