Transact-SQL by TechNet Wiki Community


About this eBook
This eBook is provided "as is". The information and views expressed in this eBook, including URL and
other web site references, may change without notice. You assume the entire risk of use.
This eBook does not grant you any legal rights to the ownership of Microsoft products, only to their
use, unless explicitly stated otherwise in this document.
You can copy and use this whitepaper for your projects, labs and other needs.
© 2014 TechNet Wiki. All rights reserved.
For more information, please contact:
Ed Price
Gokan Ozcifci
Durval Ramos
Naomi Nosonovsky
Saeid Hasani
What is TechNet Wiki?
The TechNet Wiki is a library of information about Microsoft technologies, written by the community for
the community. Whether you write code, manage servers, keep mission-critical sites up and running, or
just enjoy digging into details, we think you will be at home in the TechNet Wiki.
This is a community site. For official documentation, see the MSDN Library or TechNet Library, or contact Microsoft Support.
The Wiki is focused on Microsoft technologies. The community will edit or remove topics that get too far off track.
We are inspired by Wikipedia.
Anyone who joins can participate and contribute content.
How Can I Participate?
The simplest way to participate is to use the information in this Wiki. The community provides how-to
guides, troubleshooting tips and techniques, practical usage scenarios, scripting pointers, as well as
conceptual and technology overview topics.
Read the terms of use.
Sign in, upload an avatar and configure your profile.
Review the Code of Conduct. It takes after the Ubuntu Code of Conduct and guides our behavior.
Visit Getting Started and What Makes a Great Article to get the basics.
Find topics using search, the tag cloud or by visiting the article spotlight page.
Create a topic. Contribute boldly, edit gently!
We welcome your feedback. Head over to the TechNet Wiki Discussion forum, connect with us on the
Wiki, or Tweet feedback using #TNWiki (and follow WikiNinjas).
Help us write the future.
Articles used in this eBook
1. T-SQL Useful Links by Naomi N
2. T-SQL: Hierarchical Table Sorting with a Parent-Child Relation by Kev Riley
3. APPLY Operator in SQL Server by Ghouse Barq
4. T-SQL: Applying APPLY Operator by Naomi N
5. Fixing Missing Data Based on Prior Row Information by Naomi N
6. SQL Server PIVOT by Naomi N
7. T-SQL: Display Horizontal Rows Vertically by SathyanarrayananS
8. T-SQL: Dynamic Pivot on Multiple Columns by Naomi N
9. T-SQL: Create Report for Last 10 Years of Data by Naomi N
10. T-SQL: Relational Division by Naomi N
11. Microsoft SQL Server 2012 New Functions by Ahsan Kabir
12. EOMONTH() Function Usage in SQL Server 2012 and On by Kalman Toth
13. How SQL Server Determines Type of the Constant by Naomi N
14. Understanding NOLOCK Query Hint by Shanky
15. SET ANSI_PADDING Setting and Its Importance by Naomi N
16. All-at-Once Operations in T-SQL by Saeid Hasani
17. SQL Server Columnstore Index FAQ by Eric N. Hanson MSFT
18. SQL Server Columnstore Performance Tuning by Eric N. Hanson MSFT
19. T-SQL: Simplified CASE expression by Saeid Hasani
20. Structured Error Handling Mechanism in SQL Server 2012 by Saeid Hasani
21. Error Handling within Triggers Using T-SQL by Saeid Hasani
22. Custom Sort in Acyclic Digraph by Saeid Hasani
23. Patindex Case Sensitive Search by Naomi N
24. T-SQL: Remove Leading and Trailing Zeros by SathyanarrayananS
25. T-SQL: How to Find Rows with Bad Characters by Naomi N
26. T-SQL: Random String by Ronen Ariely (aka pituach)
27. Sort Letters in a Phrase using T-SQL by Saeid Hasani
28. T-SQL: Date-Related Queries by Naomi N
29. How To Find Various Day, Current Week, Two Week, Month, Quarter, Half Year and Year In SQL Server by kishhr
30. SQL Server: How to Find the First Available Timeslot for Scheduling by Arnie Rowland
31. T-SQL: Group by Time Interval by Naomi N
32. Avoid T (space) while generating XML using FOR XML clause by SathyanarrayananS
33. Generate XML with Same Node Names using FOR XML PATH by SathyanarrayananS
34. Generate XML - Column Names with their Values as text() Enclosed within their Column Name Tag by SathyanarrayananS
35. SQL Server XML: Sorting Data in XML Fragments by Stefan Hoffmann
36. How to Extract Data in XML to Meet the Requirements of a Schema by Durval Ramos
37. T-SQL Script to update string NULL with default NULL by SathyanarrayananS
38. T-SQL: FIFO Inventory Problem - Cost of Goods Sold by Naomi N
39. T-SQL: Gaps and Islands Problem by Naomi N
40. Crazy TSQL Queries play time by Ronen Ariely (aka pituach)
41. RegEx Class by Marc Noon
42. SQL Server Resource Re-Balancing in Failover Cluster by Andrew Bainbridge
43. SQL Server: Create Random String Using CLR by Ronen Ariely (aka pituach)
44. How to Compare Two Tables Definition / Metadata in Different Databases by SathyanarrayananS
45. T-SQL: Script to Find the Names of Stored Procedures that Use Dynamic SQL by SathyanarrayananS
46. T-SQL Script to Get Detailed Information about Index Settings by SathyanarrayananS
47. How to Check when Index was Last Rebuilt by Mohammad Nizamuddin
48. How to Generate Index Creation Scripts for all Tables in a Database using T-SQL by SathyanarrayananS
49. T-SQL: Fast Code for Relationship within the Database by DIEGOCTN
50. How to Check the Syntax of Dynamic SQL Before Execution by SathyanarrayananS
51. Using Bulk Insert to Import Inconsistent Data Format (Using Pure T-SQL) by Ronen Ariely (aka pituach)
Guest Authors
Ahsan Kabir
Microsoft Certified Professional
Andrew Bainbridge
SQL Server DBA based in
London, United Kingdom
Arnie Rowland
Microsoft Certified Trainer, recognized by Microsoft as a SQL Server MVP. He has been a Subject Matter Expert (SME) on SQL Server 2000 and SQL Server 2005 training courses and has recently been involved as an SME in the development of the Microsoft SQL Server 2008 Administrator and Developer certification exams.
Diego CTN
Durval Ramos
Database Administrator, MTA SQL Server 2012. Currently works with Microsoft database solutions such as Power View, BizTalk and SQL Server.
Eric N. Hanson
Principal Program Manager in
the Microsoft SQL Server Big
Data team.
Works on architecture and R&D
for data warehousing and Big
Data/data warehouse
interaction
Ghouse Barq
An Architect with MCA from
Bangalore University
Kalman Toth
Kalman Toth is a SQL Server and BI architect and trainer, and the author of many SQL Server books. His training website is www.SQLUSA.com
Kev Riley
SQL Server Specialist.
IT Professional with specific
experience in database
technology, mainly Microsoft
SQL Server since 1999.
MCITP 2008 Dev & DBA, MCSE Data Platform.
Kishhr
Program designer with the Qatar Government. Has worked on almost all Microsoft technologies, has executed various projects, and has been involved from scratch in many of them.
Marc Noon
Software Developer and
Website Designer
Mohammad
Nizamuddin
Microsoft Certified Technology
Specialist in SQL Server 2005
Microsoft Certified Technology
Specialist in C#, .Net
Framework 2.0
Microsoft Certified Professional
Developer in Web Development
Microsoft Certified IT
Professional in SQL Server 2005
Database Developer
Naomi Nosonovsky
IT professional with more than 15 years of experience in a variety of programming languages and technologies. Microsoft Community Award recipient and Personality of the Year at the UniversalThread.com forum in 2008, 2009, 2010 and 2011. Also a multi-time Gold Winner in the TechNet Guru competition.
Ronen Ariely
(aka pituach)
Professional programmer, SQL & BI Architect with more than 12 years of experience in a variety of programming languages and technologies, leading and managing development teams and BI projects.
Saeid Hasani
Works for an ERP software
company as a Senior Database
Developer. His main expertise is
T-SQL Programming and Query
Tuning.
He spends a lot of time in the TechNet and MSDN communities, including writing articles for the Wiki and participating in the SQL Server MSDN forums.
SathyanarrayananS
Born and brought up in Chennai, India; currently working as a database developer at an American multinational information technology company. MSDN Moderator for the SQL Server Reporting Services, Power View, Transact-SQL, SQL Server Integration Services and Getting Started with SQL Server forums.
Shanky
Won "TechNet Guru Awards" July 2013, August 2013,
September 2013, October 2013
IT Analyst from Mumbai (India).
Very patriotic person and
believes that best service one
can do is a service to
motherland.
Currently works for a big
Software service provider
company.
Stefan Hoffmann
Microsoft MVP SQL Server
Acknowledgement
This eBook was created from the Transact-SQL articles published on TechNet Wiki by many authors. We
are very grateful for their work and dedication.
This book was put together by Saeid Hasani with the help of Durval Ramos Junior, Naomi Nosonovsky and
Ronen Ariely (aka pituach).
The editors of this eBook thank all TechNet Wiki members who contributed their content to the Microsoft
TechNet Wiki.
Contents
T-SQL USEFUL LINKS ............................................................................................................................................. 20
SELECT TOP N ROWS PER GROUP ....................................................................................................................................20
PERFORMANCE OPTIMIZATION .......................................................................................................................................20
EXECUTE VS SP_EXECUTESQL ......................................................................................................................................20
SQL SERVER INTERNALS ................................................................................................................................................20
DYNAMIC SEARCH ........................................................................................................................................................21
OPTION RECOMPILE......................................................................................................................................................21
DATES ........................................................................................................................................................................21
CALENDAR TABLE .........................................................................................................................................................21
GAPS AND ISLANDS .......................................................................................................................................................21
CONCURRENCY ............................................................................................................................................................21
PARAMETER SNIFFING ...................................................................................................................................................22
CURSORS ....................................................................................................................................................................22
INFORMATION ABOUT ALL OBJECTS ..................................................................................................................................22
STRING MANIPULATIONS ...............................................................................................................................................22
STRING SPLIT ...............................................................................................................................................................22
XML .........................................................................................................................................................................22
CONCATENATE ROWS ...................................................................................................................................................22
COMMON TABLE EXPRESSION.........................................................................................................................................23
CTE PERFORMANCE .....................................................................................................................................................23
CTE SYNTACTIC SUGAR ..................................................................................................................................................23
CTE VERSUS TEMP TABLE ..............................................................................................................................................23
PIVOT .......................................................................................................................................................................23
UNPIVOT ..................................................................................................................................................................23
RUNNING TOTAL ..........................................................................................................................................................23
ASP.NET ...................................................................................................................................................................23
OTHER TOPICS .............................................................................................................................................................24
HIERARCHICAL TABLE SORTING WITH A PARENT-CHILD RELATION ...................................................................... 26
PROBLEM ...................................................................................................................................................................26
SOLUTION ...................................................................................................................................................................26
APPLY OPERATOR IN SQL SERVER ........................................................................................................................ 30
INTRODUCTION ............................................................................................................................................................30
APPLY OPERATORS ......................................................................................................................................................30
USING THE CODE .........................................................................................................................................................30
TOP OPERATOR ..........................................................................................................................................................31
T-SQL: APPLYING APPLY OPERATOR .................................................................................................................... 32
PROBLEM DESCRIPTION .................................................................................................................................................32
SOLUTION ...................................................................................................................................................................33
SQL SERVER 2012 SOLUTION ........................................................................................................................................35
CONCLUSION ...............................................................................................................................................................35
ADDENDUM ................................................................................................................................................................35
FIXING MISSING DATA BASED ON PRIOR ROW INFORMATION ............................................................................ 36
SQL SERVER PIVOT ............................................................................................................................................... 39
PROBLEM DEFINITION ...................................................................................................................................................39
COMMON PROBLEM .....................................................................................................................................................40
OTHER BLOGS .............................................................................................................................................................40
T-SQL: DISPLAY HORIZONTAL ROWS VERTICALLY ................................................................................................ 41
HOW TO DISPLAY DYNAMICALLY HORIZONTAL ROWS VERTICALLY............................................................................................41
T-SQL: DYNAMIC PIVOT ON MULTIPLE COLUMNS ................................................................................................ 44
HOW TO MAKE A DYNAMIC PIVOT ON MULTIPLE COLUMNS .................................................................................................44
ADDITIONAL RESOURCES ...............................................................................................................................................46
T-SQL: CREATE REPORT FOR LAST 10 YEARS OF DATA .......................................................................................... 47
PROBLEM DEFINITION ...................................................................................................................................................47
SOLUTION ...................................................................................................................................................................49
CONCLUSION ...............................................................................................................................................................50
T-SQL: RELATIONAL DIVISION .............................................................................................................................. 52
INTRODUCTION ............................................................................................................................................................52
PROBLEM DEFINITION ...................................................................................................................................................52
SOLUTIONS .................................................................................................................................................................52
BEST EXACT MATCH SOLUTION .......................................................................................................................................56
SLIGHT VARIATION OF THE ORIGINAL PROBLEM..................................................................................................................57
CONCLUSION ...............................................................................................................................................................60
MICROSOFT SQL SERVER 2012 NEW FUNCTIONS ................................................................................................. 62
EOMONTH ...............................................................................................................................................................62
CHOOSE ...................................................................................................................................................................62
CONCAT ...................................................................................................................................................................62
LAST_VALUE AND FIRST_VALUE ...............................................................................................................................62
LEAD ........................................................................................................................................................................63
EOMONTH() FUNCTION USAGE IN SQL SERVER 2012 AND ON ............................................................................. 64
HOW SQL SERVER DETERMINES TYPE OF THE CONSTANT .................................................................................... 66
PROBLEM DEFINITION ...................................................................................................................................................66
EXPLANATION ..............................................................................................................................................................66
CONCLUSION ...............................................................................................................................................................66
UNDERSTANDING NOLOCK QUERY HINT ............................................................................................................. 67
SET ANSI_PADDING SETTING AND ITS IMPORTANCE ........................................................................................... 73
PROBLEM DESCRIPTION .................................................................................................................................................73
INVESTIGATION ............................................................................................................................................................73
RESOLUTION ...............................................................................................................................................................73
SCRIPT TO CORRECT PROBLEM IN THE WHOLE DATABASE ......................................................................................................75
DEFAULT DATABASE SETTINGS ........................................................................................................................................76
ALL-AT-ONCE OPERATIONS IN T-SQL .................................................................................................................... 77
INTRODUCTION ............................................................................................................................................................77
DEFINITION .................................................................................................................................................................78
PROS AND CONS ..........................................................................................................................................................81
CAUTION .................................................................................................................................................................83
EXCEPTION..................................................................................................................................................................89
CONCLUSION ...............................................................................................................................................................89
SQL SERVER COLUMNSTORE INDEX FAQ .............................................................................................................. 90
CONTENTS ..................................................................................................................................................................90
1. OVERVIEW ..............................................................................................................................................................90
2. CREATING A COLUMNSTORE INDEX ..............................................................................................................................91
3. LIMITATIONS ON CREATING A COLUMNSTORE INDEX ...............................................................................................95
4. MORE DETAILS ON COLUMNSTORE TECHNOLOGY ...........................................................................................................95
5. USING COLUMNSTORE INDEXES...................................................................................................................................99
6. MANAGING COLUMNSTORE INDEXES .........................................................................................................................102
7. BATCH MODE PROCESSING ......................................................................................................................................107
SQL SERVER COLUMNSTORE PERFORMANCE TUNING ....................................................................................... 110
INTRODUCTION ..........................................................................................................................................................110
FUNDAMENTALS OF COLUMNSTORE INDEX-BASED PERFORMANCE ......................................................................................110
DOS AND DON'TS FOR USING COLUMNSTORES EFFECTIVELY .............................................................................................111
MAXIMIZING PERFORMANCE AND WORKING AROUND COLUMNSTORE LIMITATIONS ..............................................................112
ENSURING USE OF THE FAST BATCH MODE OF QUERY EXECUTION.......................................................................................112
PHYSICAL DATABASE DESIGN, LOADING, AND INDEX MANAGEMENT ....................................................................................112
MAXIMIZING THE BENEFITS OF SEGMENT ELIMINATION .....................................................................................................112
ADDITIONAL TUNING CONSIDERATIONS ..........................................................................................................................112
T-SQL: SIMPLIFIED CASE EXPRESSION ................................................................................................................ 114
INTRODUCTION ..........................................................................................................................................................114
DEFINITION ...............................................................................................................................................................114
DETERMINE OUTPUT DATA TYPE ....................................................................................................................................117
DETERMINE OUTPUT NULL-ABILITY ...............................................................................................................................118
PERFORMANCE ..........................................................................................................................................................125
IS NULL AND OR.......................................................................................................................................................125
CASE ......................................................................................................................................................................126
COALESCE ..............................................................................................................................................................127
ISNULL ...................................................................................................................................................................128
DYNAMIC SQL...........................................................................................................................................................129
COALESCE ..............................................................................................................................................................131
ISNULL ...................................................................................................................................................................132
XML .......................................................................................................................................................................132
CHOOSE .................................................................................................................................................................133
UDF FUNCTION .........................................................................................................................................................134
PERMANENT LOOKUP TABLE ........................................................................................................................................135
MORE READABILITY ....................................................................................................................................................136
CONCLUSION .............................................................................................................................................................137
STRUCTURED ERROR HANDLING MECHANISM IN SQL SERVER 2012 .................................................................. 138
PROBLEM DEFINITION..................................................................................................................................................138
INTRODUCTION ..........................................................................................................................................................138
SOLUTION .................................................................................................................................................................138
CORRECT LINE NUMBER OF THE ERROR! ..........................................................................................................................147
EASY TO USE..............................................................................................................................................................148
COMPLETE TERMINATION.............................................................................................................................................148
INDEPENDENCE OF SYS.MESSAGES .................................................................................................................................150
XACT_ABORT .........................................................................................................................................................155
@@TRANCOUNT ...................................................................................................................................................155
CONCLUSION .............................................................................................................................................................155
ERROR HANDLING WITHIN TRIGGERS USING T-SQL ........................................................................................... 156
PROBLEM DEFINITION..................................................................................................................................................156
SOLUTION .................................................................................................................................................................157
CONCLUSION .............................................................................................................................................................161
CUSTOM SORT IN ACYCLIC DIGRAPH ................................................................................................................. 162
PROBLEM DEFINITION..................................................................................................................................................162
VOCABULARY ............................................................................................................................................................162
SOLUTION .................................................................................................................................................................162
PATINDEX CASE SENSITIVE SEARCH ................................................................................................................... 166
REMOVE LEADING AND TRAILING ZEROS ........................................................................................................... 167
T-SQL: HOW TO FIND ROWS WITH BAD CHARACTERS........................................................................................ 168
CONCLUSION.............................................................................................................................................................170
RANDOM STRING .............................................................................................................................................. 171
INTRODUCTION ..........................................................................................................................................................171
SOLUTIONS ...............................................................................................................................................................171
CONCLUSIONS AND RECOMMENDATIONS ........................................................................................................................178
SORT LETTERS IN A PHRASE USING T-SQL .......................................................................................................... 180
PROBLEM DEFINITION..................................................................................................................................................180
INTRODUCTION ..........................................................................................................................................................180
SOLUTION .................................................................................................................................................................180
LIMITATIONS .............................................................................................................................................................182
T-SQL: DATE-RELATED QUERIES ......................................................................................................................... 184
FINDING DAY NUMBER FROM THE BEGINNING OF THE YEAR ...............................................................................................184
FINDING BEGINNING AND ENDING OF THE PREVIOUS MONTH .............................................................................................184
HOW TO FIND VARIOUS DAY, CURRENT WEEK, TWO WEEK, MONTH, QUARTER, HALF YEAR AND YEAR IN SQL
SERVER .............................................................................................................................................................. 185
DATE COMPUTATION ..................................................................................................................................................185
FINDING CURRENT DATE..............................................................................................................................................185
FINDING START DATE AND END DATE OF THE WEEK .........................................................................................................185
FINDING END DATE OF THE WEEK .................................................................................................................................185
FINDING START DATE AND END DATE OF THE TWO WEEKS ................................................................................................186
FINDING START DATE AND END DATE OF THE CURRENT MONTH .........................................................................................186
FINDING START DATE AND END DATE OF THE CURRENT QUATER .........................................................................................187
FINDING START DATE AND END DATE FOR HALF YEAR.......................................................................................................187
FINDING START DATE AND END DATE FOR YEAR ..............................................................................................................188
SQL SERVER: HOW TO FIND THE FIRST AVAILABLE TIMESLOT FOR SCHEDULING................................................ 189
CREATE SAMPLE DATA ................................................................................................................................................189
T-SQL: GROUP BY TIME INTERVAL...................................................................................................................... 191
SIMPLE PROBLEM DEFINITION ......................................................................................................................................191
SOLUTION .................................................................................................................................................................191
COMPLEX PROBLEM DEFINITION AND SOLUTION ..............................................................................................................191
AVOID T (SPACE) WHILE GENERATING XML USING FOR XML CLAUSE ................................................................ 193
GENERATE XML WITH SAME NODE NAMES USING FOR XML PATH .................................................................... 195
GENERATE XML - COLUMN NAMES WITH THEIR VALUES AS TEXT() ENCLOSED WITHIN THEIR COLUMN NAME TAG
.......................................................................................................................................................................... 197
SQL SERVER XML: SORTING DATA IN XML FRAGMENTS ..................................................................................... 198
PROBLEM DEFINITION .................................................................................................................................................198
APPROACHES .............................................................................................................................................................198
PROBLEM SOLUTION ...................................................................................................................................................200
CONCLUSION .............................................................................................................................................................202
TERMINOLOGY ...........................................................................................................................................................202
HOW TO EXTRACT DATA IN XML TO MEET THE REQUIREMENTS OF A SCHEMA ................................................. 203
INTRODUCTION ..........................................................................................................................................................203
PROBLEM .................................................................................................................................................................203
CAUSES ....................................................................................................................................................................204
DIAGNOSTIC STEPS .....................................................................................................................................................204
BUILDING THE SCENARIO OF THE PROBLEM .....................................................................................................................204
SOLUTION .................................................................................................................................................................205
ADDITIONAL INFORMATION ..........................................................................................................................................206
CREDITS ...................................................................................................................................................................206
REFERENCES ..............................................................................................................................................................207
TECHNET LIBRARY ......................................................................................................................................................207
T-SQL SCRIPT TO UPDATE STRING NULL WITH DEFAULT NULL ........................................................................... 209
FIFO INVENTORY PROBLEM - COST OF GOODS SOLD ......................................................................................... 211
DIFFERENT METHODS OF CALCULATING COST OF GOODS SOLD IN THE INVENTORY CALCULATION ...............................................211
IMPLEMENTING FIFO COST OF GOODS SOLD IN OUR APPLICATION.......................................................................................211
CURRENT PROCEDURE TO CALCULATE COST OF GOODS ON HAND ........................................................................................215
FIFO COST OF GOODS SOLD ........................................................................................................................................223
THE COST OF GOODS SOLD FIFO PROCEDURE .................................................................................................................233
SUMMARY ................................................................................................................................................................252
T-SQL: GAPS AND ISLANDS PROBLEM ................................................................................................................ 253
PROBLEM DEFINITION .................................................................................................................................................253
SOLUTION .................................................................................................................................................................253
CRAZY TSQL QUERIES PLAY TIME ....................................................................................................................... 255
BACKGROUND ...........................................................................................................................................................255
PLAYING WITH JOIN & UNION ...................................................................................................................................255
UNION USING JOIN ..................................................................................................................................................255
INNER JOIN USING SUB QUERY ................................................................................................................................256
LEFT JOIN USING SUB QUERY & UNION ...................................................................................................................256
RIGHT JOIN WE CAN QUERY USING LEFT JOIN ..............................................................................................................257
FULL OUTER JOIN USING "LEFT JOIN" UNION "RIGHT JOIN" ....................................................................................257
FULL OUTER JOIN USING SUB QUERY & UNION .......................................................................................................257
PLAYING WITH NULL ..................................................................................................................................................258
ISNULL USING COALESCE .........................................................................................................................................258
COALESCE using ISNULL ................................................................................................................................258
PLAYING WITH CURSOR AND LOOPS ...............................................................................................................................258
CURSOR USING WHILE LOOP (WITHOUT USING CURSOR) ...................................................................................................258
REFERENCES & RESOURCES ..........................................................................................................................................260
REGEX CLASS...................................................................................................................................................... 262
SQL SERVER RESOURCE RE-BALANCING IN FAILOVER CLUSTER .......................................................................... 264
SQL SERVER: CREATE RANDOM STRING USING CLR ........................................................................................... 267
INTRODUCTION ..........................................................................................................................................................267
RESOURCES ...............................................................................................................................................................268
HOW TO COMPARE TWO TABLES DEFINITION / METADATA IN DIFFERENT DATABASES .................................... 270
T-SQL: SCRIPT TO FIND THE NAMES OF STORED PROCEDURES THAT USE DYNAMIC SQL ................................... 272
T-SQL SCRIPT TO GET DETAILED INFORMATION ABOUT INDEX SETTINGS .......................................................... 273
HOW TO CHECK WHEN INDEX WAS LAST REBUILT ............................................................................................. 277
SQL SCRIPT FOR REBUILDING ALL THE TABLES’ INDEXES ..........................................................................................277
HOW TO GENERATE INDEX CREATION SCRIPTS FOR ALL TABLES IN A DATABASE USING T-SQL.......................... 278
T-SQL: FAST CODE FOR RELATIONSHIP WITHIN THE DATABASE ......................................................................... 280
HOW TO CHECK THE SYNTAX OF DYNAMIC SQL BEFORE EXECUTION ................................................................. 281
USING BULK INSERT TO IMPORT INCONSISTENT DATA FORMAT (USING PURE T-SQL) ....................................... 284
INTRODUCTION ..........................................................................................................................................................284
THE PROBLEM............................................................................................................................................................284
OUR CASE STUDY .......................................................................................................................................................284
THE SOLUTION: ..........................................................................................................................................................285
STEP 1: IDENTIFY THE IMPORT FILE FORMAT ...................................................................................................................285
STEP 2: INSERT THE DATA INTO TEMPORARY TABLE ..........................................................................................................290
STEP 3: PARSING THE DATA INTO THE FINAL TABLE ............................................................................................................292
SUMMARY ................................................................................................................................................................293
COMMENTS ..............................................................................................................................................................294
RESOURCES ...............................................................................................................................................................295
CHAPTER 1:
T-SQL Useful Links
T-SQL Useful Links
This article shares a collection of links covering various aspects of the Transact-SQL language. Many of
these links come in very handy when answering questions in SQL Server related forums.
Select Top N Rows per Group
Optimizing TOP N per Group Queries - blog by Itzik Ben-Gan explaining various optimization ideas.
Including an Aggregated Column's Related Values - this blog presents several solutions of the problem with explanations for each.
Including an Aggregated Column's Related Values - Part 2 - the second blog in the series with use cases for the previous blog.
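To complement the links above, here is a minimal sketch of the most common technique, selecting the top N rows per group with ROW_NUMBER(). The dbo.Sales table and its columns are hypothetical, chosen only for illustration.

;WITH Ranked AS
(
    SELECT CustomerID, OrderID, OrderDate,
           ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY OrderDate DESC) AS rn
    FROM dbo.Sales                 -- hypothetical table
)
SELECT CustomerID, OrderID, OrderDate
FROM Ranked
WHERE rn <= 3                      -- top 3 most recent orders per customer
ORDER BY CustomerID, rn;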
Performance Optimization
Speed Up Performance And Slash Your Table Size By 90% By Using Bitwise Logic - interesting and novel blog by Denis Gobo.
Only In A Database Can You Get 1000% + Improvement By Changing A Few Lines Of Code - very impressive blog by Denis Gobo.
Slow in the Application, Fast in SSMS? - comprehensive long article by Erland Sommarskog.
Performance consideration when using a Table Variable - Peter Larsson article.
LEFT JOIN vs NOT EXISTS - performance comparison by Gail Shaw.
EXECUTE vs sp_ExecuteSQL
Avoid Conversions In Execution Plans By Using sp_executesql Instead of Exec - by Denis Gobo.
Changing exec to sp_executesql doesn't provide any benefit if you are not using parameters correctly - by Denis Gobo.
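As a brief, hedged illustration of the point those posts make, the sketch below contrasts a concatenated EXEC call with a properly parameterized sp_executesql call; the dbo.Customer table and the @CustName value are hypothetical.

DECLARE @CustName varchar(100)
SET @CustName = 'Alex'

-- Concatenated EXEC: the literal is embedded in the string, so plans are rarely reused and injection is possible
EXEC ('SELECT * FROM dbo.Customer WHERE CustName = ''' + @CustName + '''')

-- Parameterized sp_executesql: one reusable plan, the value travels as a parameter
EXEC sp_executesql
     N'SELECT * FROM dbo.Customer WHERE CustName = @name',
     N'@name varchar(100)',
     @name = @CustName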
SQL Server Internals
How SQL Server stores data - by Dmitri Korotkevich.
Inside the Storage Engine: Anatomy of a record - by Paul Randal.
Advanced T-SQL Tuning - Why Internals Knowledge Matters - very interesting article by Paul White.
Brad's Sure Guide to SQL Storage Compress
Do not use spaces or other invalid characters in your column names - helpful tip by George Mastros.
Dynamic Search
Do you use ISNULL(...). Don't, it does not perform - short blog by Denis Gobo.
Dynamic Search Conditions in T-SQL Version for SQL 2008 (SP1 CU5 and later) - long and comprehensive article by Erland Sommarskog.
Catch All Queries - short blog by Gail Shaw.
Sunday T-SQL tip: How to select data with unknown parameter set - nice blog by Dmitri Korotkevich.
Relevant MSDN forum's thread
Is this worth the effort - Discussion about NULL integer parameters.
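For orientation, here is a compact sketch of the catch-all pattern these links analyze, with OPTION (RECOMPILE) so each call gets a plan for the parameters actually supplied. The procedure name, table and parameters are hypothetical.

CREATE PROCEDURE dbo.SearchOrders
    @CustomerID int  = NULL,
    @FromDate   date = NULL,
    @ToDate     date = NULL
AS
SELECT OrderID, CustomerID, OrderDate, Amount
FROM dbo.Orders
WHERE (@CustomerID IS NULL OR CustomerID = @CustomerID)
  AND (@FromDate   IS NULL OR OrderDate >= @FromDate)
  AND (@ToDate     IS NULL OR OrderDate <  @ToDate)
OPTION (RECOMPILE)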
Option Recompile
Option recompile discussion thread
Dates
Dear ISV: You’re Keeping Me Awake Nights with Your VARCHAR() Dates
The ultimate guide to the datetime datatypes - very long and comprehensive article by Tibor Karaszi.
Bad habits to kick: mis-handling date / range queries - from the Aaron Bertrand series of Bad Habits to Kick.
Date Range WHERE Clause Simplification - article by Erik E.
Weekly data thread
T-SQL: Date Related Queries - Naomi's TechNet Wiki article.
How to get the first and last day of the Month, Quarter, Year
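A small example of the safe date-range pattern discussed in several of these links (the dbo.Orders table is hypothetical, and the date type assumes SQL Server 2008 or later): use an inclusive lower bound and an exclusive upper bound instead of BETWEEN, so the predicate works regardless of the time part.

DECLARE @start date, @end date
SET @start = '20140101'
SET @end   = '20140201'

SELECT OrderID, OrderDate
FROM dbo.Orders
WHERE OrderDate >= @start
  AND OrderDate <  @end      -- exclusive upper bound: the whole of January 2014, whatever the time part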
Calendar Table
Why should I consider a Calendar table?
Gaps and Islands
T-SQL: Gaps and Islands Problem
MSDN Thread with Hunchback solution
Refactoring Ranges - blog by Plamen Ratchev.
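As a minimal, hedged sketch of the classic islands technique covered by these links, the query below uses the row-number difference trick; the dbo.Numbers table (a set of distinct integers) is hypothetical.

;WITH G AS
(
    SELECT n,
           n - ROW_NUMBER() OVER (ORDER BY n) AS grp   -- constant within a run of consecutive values
    FROM dbo.Numbers                                   -- hypothetical table of distinct integers
)
SELECT MIN(n) AS island_start, MAX(n) AS island_end
FROM G
GROUP BY grp
ORDER BY island_start;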
Concurrency
Patterns that do not work as expected - by Alex Kuznetsov.
Developing Modifications that Survive Concurrency - very long and interesting article by Alex Kuznetsov.
Parameter Sniffing
Parameter Sniffing - blog by Plamen Ratchev.
Cursors
The Truth about Cursors - Part 1
The Truth about Cursors - Part 2
The Truth about Cursors - Part 3
A series of blogs about cursors by Brad Schulz.
Information about All objects
How to get information about all databases without a loop
How to search a value in all columns in all tables
How to script all stored procedures in a database
Find All Tables With Triggers In SQL Server
Find all Primary and Foreign Keys In A Database
String Manipulations
Handy String Functions - several functions emulating VFP functions by Brad Schulz.
MSDN thread about RegEx in T-SQL
CLR RegEx - interesting series about CLR RegEx
Create Random String - 7 different options including CLR code.
String Split
Arrays & Lists in SQL Server - long article by Erland Sommarskog.
Integer List Splitting
Splitting list of integers - another roundup
Tally OH! An Improved SQL 8K “CSV Splitter” Function - by Jeff Moden.
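As one hedged example of the splitting approaches covered above, here is a simple XML-based splitter that works on the SQL Server versions this book targets (on SQL Server 2016 and later the built-in STRING_SPLIT function is an alternative). The list value is just a sample.

DECLARE @list varchar(200), @xml xml
SET @list = '10,20,30,40'
SET @xml  = CAST('<i>' + REPLACE(@list, ',', '</i><i>') + '</i>' AS xml)

-- One row per element; values containing &, < or > would need escaping first
SELECT x.i.value('.', 'int') AS item
FROM @xml.nodes('/i') AS x(i)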
XML
XML get related tables info
XML Shred Issues
XML Performance
MSDN Thread about XML Update in a loop
SQL Server - (XML, XQUERY, XPATH)
Jacob Sebastian XML Blogs
Concatenate Rows
MSDN thread about concatenating rows
Making a list and checking it twice
Concatenating Rows - Part 1
Concatenating Rows - Part 2
String concatenation techniques
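A minimal sketch of the FOR XML PATH concatenation technique that most of these links discuss; the dbo.Department and dbo.Employee tables and their columns are hypothetical.

-- One comma-separated list of employee names per department
SELECT d.DepartmentID,
       STUFF((SELECT ',' + e.EmployeeName
              FROM dbo.Employee AS e
              WHERE e.DepartmentID = d.DepartmentID
              ORDER BY e.EmployeeName
              FOR XML PATH(''), TYPE).value('.', 'varchar(max)'), 1, 1, '') AS Employees
FROM dbo.Department AS d;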
Common Table Expression
CTE and hierarchical queries
CTE: Coolest T-SQL Enhancement - interesting blog by Brad Schulz.
CTE Performance
Umachandar Jayachandran ideas
MS Feedback Suggestion by Adam Machanic
CTE syntactic sugar
MSDN related thread
Another related thread by Umachandar Jayachandran
CTE versus Temp Table
MSDN Thread by Umachandar Jayachandran
MSDN thread by Adam Haines
PIVOT
Understanding SQL Server 2000 Pivot with Aggregates
Dynamic Pivot on multiple columns
T-SQL: Dynamic Pivot on Multiple Columns
SQL Server Pivot
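To complement the links above, here is a minimal static PIVOT sketch; the dbo.SalesByYear data is hypothetical, and the dynamic versions in the linked articles build the IN list at run time instead of hard-coding it.

SELECT ProductID, [2012], [2013], [2014]
FROM
(
    SELECT ProductID, OrderYear, Amount
    FROM dbo.SalesByYear          -- hypothetical source table
) AS src
PIVOT
(
    SUM(Amount) FOR OrderYear IN ([2012], [2013], [2014])
) AS p;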
UNPIVOT
Spotlight on UNPIVOT, Part 1
Spotlight on UNPIVOT, Part 2
Running Total
MSDN thread with many helpful links
Lightning Fast Hybrid RUNNING TOTAL - Can you slow it down?
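For reference, on SQL Server 2012 and later the windowed SUM shown below is usually the simplest running-total technique; the linked threads cover older versions and faster set-based alternatives. The dbo.Transactions table and its columns are hypothetical.

SELECT AccountID, TranDate, Amount,
       SUM(Amount) OVER (PARTITION BY AccountID
                         ORDER BY TranDate
                         ROWS UNBOUNDED PRECEDING) AS RunningTotal
FROM dbo.Transactions              -- hypothetical table
ORDER BY AccountID, TranDate;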
ASP.NET
Getting the identity of the most recently added record - Mikesdotnetting blog.
How to insert information into multiple related tables and return ID using SQLDataSource
How to Avoid SQL Injection Attack - Long FAQ on ASP.NET forum.
SQL Server 2008 Table-Valued Parameters and C# Custom Iterators: A Match Made In Heaven!
Other Topics
Design Decisions
Surrogate vs. Natural Keys - Quiz question and answers.
DATABASE DESIGN - SURROGATE KEYS: PART 1 OF MANY (RULES FOR SURROGATE KEYS, E. F. CODD AND C J DATE RESEARCH AND PROBLEMS THEY SOLVE) - very good article by Tony Rogerson.
Many tables JOIN calculation problem
Aggregates with multiple tables
Question on query with sum using 5 tables
Blocking problems
Blocking sessions script
Structure change problem
Can not change structure
NOT IN problem
Why you should never use IN/NOT IN in SQL
JOIN problem
Why LEFT JOIN doesn't bring records from the LEFT table
Orphans check
Discussion about disabled constraints and finding orphan records
Update Records in batch
Update Records in Batch
BULK INSERT into a table with specific columns
Using Bulk Insert to import inconsistent data format (using pure T-SQL)
UPDATE FROM
Dear FROM clause
Questions and Surveys - random order
Randomize order interesting problem
CHAPTER 2:
CTE
Hierarchical Table Sorting with a Parent-Child Relation
Problem
Given the following table Accounts
AccountID  Name    ParentID
---------  ------  --------
1          Alex    0
2          John    1
3          Mathew  2
4          Philip  1
5          Shone   0
6          Shine   2
7          Tom     2
8          George  1
9          Jim     5
the requirement is to write a query that sorts the table based on the parent-child hierarchy; or, more
clearly, each child must appear directly under its parent, like the output below:
AccountID  Name    ParentID
---------  ------  --------
1          Alex    0
8          George  1
2          John    1
3          Mathew  2
6          Shine   2
7          Tom     2
4          Philip  1
5          Shone   0
9          Jim     5
Think of it as a depth-first search, where the children are sorted in the alphabetical order.
Go as far down the left-most branch as you can, then move one branch to the right. So the children of
John have to be listed before carrying on listing the children of Alex.
Solution
This solution uses a recursive CTE to build the hierarchy and, at each level, orders by name. If you leave the [path]
column in the final select, you will see how the hierarchy has been built up; the numeric x column derived
alongside it is used to order the final result set.
declare @Accounts table (AccountID int, name varchar(50), ParentID int)
insert into @Accounts select 1,'Alex',0
insert into @Accounts select 2,'John',1
insert into @Accounts select 3,'Mathew',2
insert into @Accounts select 4,'Philip',1
insert into @Accounts select 5,'Shone',0
insert into @Accounts select 6,'Shine',2
insert into @Accounts select 7,'Tom',2
insert into @Accounts select 8,'George',1
insert into @Accounts select 9,'Jim',5
;with cte as
(
select
Accountid,
name,
parentid,
cast(row_number()over(partition by parentid order by name) as varchar(max))
as [path],
0 as level,
row_number()over(partition by parentid order by name) / power(10.0,0) as x
from @Accounts
where parentid = 0
union all
select
t.AccountID,
t.name,
t.ParentID,
[path] + '-' + cast(row_number() over(partition by t.parentid order by t.name) as varchar(max)),
level+1,
x + row_number()over(partition by t.parentid order by t.name) /
power(10.0,level+1)
from
cte
join @Accounts t on cte.AccountID = t.ParentID
)
select
Accountid,
name,
ParentID,
[path],
x
from cte
order by x
this gives
Accountid  name    ParentID  path   x
---------  ------  --------  -----  --------------------
1          Alex    0         1      1.000000000000000000
8          George  1         1-1    1.100000000000000000
2          John    1         1-2    1.200000000000000000
3          Mathew  2         1-2-1  1.210000000000000000
6          Shine   2         1-2-2  1.220000000000000000
7          Tom     2         1-2-3  1.230000000000000000
4          Philip  1         1-3    1.300000000000000000
5          Shone   0         2      2.000000000000000000
9          Jim     5         2-1    2.100000000000000000
The [path] column explains where the account sits in the hierarchy. For example, 'Shine' has path 1-2-2, which reading right-to-left means the second child of the second child of the first root; in other words, the second child of the second child of Alex, i.e. the second child of John.
CHAPTER 3:
Apply Operator
APPLY Operator in SQL Server
Introduction
The APPLY operator is a new feature in SQL Server 2005, and TOP received some enhancements in SQL Server 2005 as well. We will discuss these two operators in this article.
APPLY Operators
The APPLY operator is a new feature in SQL Server 2005 used in the FROM clause of a query. It allows you to call a table-valued function for each row of your outer table, and you can pass the outer table's columns as function arguments.
It has two forms:
1. CROSS APPLY
2. OUTER APPLY
CROSS APPLY does not return the outer table's row if the function returns no rows for it, whereas OUTER APPLY still returns the outer row, with NULL values in the function's columns.
The query below returns all records of the Customer table that match on cust.CustomerID. To execute the code, you need the two database tables listed below with some data in them.
CREATE TABLE Customer(CustomerID INT, CustName VARCHAR(max))
CREATE TABLE Orders(OrderID int IDENTITY(1,1) NOT NULL,
CustomerID int, SalesPersonID int, OrderDate datetime, Amount int)
Using the Code
--Function returning an OUTER query result in a table
CREATE FUNCTION fnGetCustomerInfo (@custid int)
RETURNS TABLE
AS
RETURN
(
--Outer Query
SELECT *
FROM Orders
WHERE customerid = @custid)
--Use APPLY
SELECT * FROM Customer cust
CROSS APPLY
fnGetCustomerInfo(cust.CustomerID)
ORDER BY cust.CustName
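For completeness, here is a small sketch (not from the original article) of the OUTER APPLY counterpart, reusing the Customer table and fnGetCustomerInfo function defined above; customers without orders are still returned, with NULLs in the order columns.
--Sketch: OUTER APPLY keeps customers that have no matching orders
SELECT * FROM Customer cust
OUTER APPLY
fnGetCustomerInfo(cust.CustomerID)
ORDER BY cust.CustName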
TOP Operator
In SQL Server 2005, TOP is used to restrict the number of rows returned, as a number or a percentage, in SELECT, UPDATE, DELETE or INSERT statements. Earlier this was possible only with SELECT queries. This enhanced feature replaces SET ROWCOUNT, which had performance issues.
Syntax: TOP <literal> or (<expression>) [PERCENT]
Note: the expression should be of type bigint when PERCENT is not specified, and float (in the range 0 through 100) when PERCENT is specified.
SELECT TOP 10 * FROM Orders
SELECT TOP (SELECT count(*) FROM Customer) * FROM Orders
DECLARE @NOROWS AS FLOAT
SET @NOROWS = 70
SELECT TOP (@NOROWS) PERCENT * FROM Orders
T-SQL: Applying APPLY Operator
This article originates from the following MSDN Transact SQL Forum's question: Complex logic to be
implemented in SQL - Please help! and I hope I made a pun with its title.
In my solution to the problem presented by the thread's originator I am going to show how to
use OUTER APPLY operator to solve common problems.
Problem Description
The problem to be solved was the following:
Given this table:
CREATE TABLE Enrollments (
StudentId INT NOT NULL
,Enroll_Date DATE NOT NULL
,Class VARCHAR(30) NOT NULL
)
ALTER TABLE Enrollments ADD CONSTRAINT PK_Enrollments_StudentID_Enroll_Date PRIMARY KEY (
StudentId
,Enroll_Date
)
INSERT INTO Enrollments (
StudentId
,Enroll_Date
,Class
)
VALUES (
1001
,'20130101'
,'Dance'
)
,(
1001
,'20130401'
,'Swimming'
)
,(
1001
,'20130601'
,'Karate'
)
We would need to produce the following output:
Solution
The first idea that comes to mind is that since we need to expand ranges of dates, we need a Calendar table with all the months. There are many common date-related query scenarios that benefit from a permanent Calendar table in each database, as well as a Numbers table. You may want to check this excellent article explaining why it is important to have such a Calendar table: Why should I consider a Calendar table? For this particular problem we only need one row per month, so we can either generate such a table on the fly or select from an existing Calendar table. While working on this article I discovered that the database I used to create the Enrollments table didn't have a permanent Calendar table, so I used this quick script to generate one for the purpose of solving the original problem:
IF OBJECT_ID('tempdb..#Tally', N'U') IS NOT NULL DROP TABLE #Tally;
SELECT TOP 2000000 IDENTITY(INT, 1, 1) AS N
INTO #Tally
FROM Master.dbo.SysColumns sc1
,Master.dbo.SysColumns sc2
CREATE UNIQUE CLUSTERED INDEX cx_Tally_N ON #Tally (N);
SELECT CAST(dateadd(month, N-1, '19000101') AS DATE) AS the_date
INTO dbo.Calendar
FROM #Tally T
WHERE N <= datediff(month, '19000101', '20200101');
So with that script we prepared the Calendar table with one row per month, from January 1, 1900 through December 1, 2019.
With that table in place I can now proceed with solving the problem at hand.
We need to create the start and end date for each enrollment and then join with the Calendar table to expand the ranges. The start date is obviously the enrollment date, and the end date is either the date one month prior to the next enrollment date for that student or the first day of the current month. Therefore I used the obvious idea here, which I have used many times in the past for similar kinds of problems:
;WITH cte
AS (
SELECT S.StudentId
,S.Enroll_Date AS Start_Date
,COALESCE(DATEADD(month, - 1, N.Enroll_Date), DATEADD(month,
DATEDIFF(month, '19000101',CURRENT_TIMESTAMP), '19000101')) AS End_Date
,S.Class
FROM Enrollments S
OUTER APPLY (
SELECT TOP (1) Enroll_Date
FROM Enrollments E
WHERE E.StudentId = S.StudentId
AND E.Enroll_Date > S.Enroll_Date
ORDER BY Enroll_Date
) N)
SELECT *
FROM cte;
I've added SELECT * FROM cte so we can examine our intermediate result and verify that the logic is correct.
Now we only need to add a JOIN to Calendar table to get the desired result with expanded ranges:
;WITH cte
AS (
SELECT S.StudentId
,S.Enroll_Date AS Start_Date
,COALESCE(DATEADD(month, - 1, N.Enroll_Date), DATEADD(month,
DATEDIFF(month, '19000101',CURRENT_TIMESTAMP), '19000101')) AS End_Date
,S.Class
FROM Enrollments S
OUTER APPLY (
SELECT TOP (1) Enroll_Date
FROM Enrollments E
WHERE E.StudentId = S.StudentId
AND E.Enroll_Date > S.Enroll_Date
ORDER BY Enroll_Date
) N)
SELECT S.StudentId, Cal.the_date AS Enroll_Date, S.Class
FROM cte S INNER JOIN dbo.Calendar
Cal ON Cal.the_date BETWEEN S.Start_Date AND S.End_Date;
SQL Server 2012 Solution
SQL Server 2012 and up offers a simpler alternative to the OUTER APPLY solution. In SQL Server 2012
the LEAD() and LAG() functions were introduced that allow us to avoid correlated subquery and
transform that solution into this code:
;WITH cte
AS (
SELECT S.StudentId
,S.Enroll_Date AS Start_Date
,DATEADD(month, -1,LEAD(S.Enroll_Date, 1, DATEADD(day, 1,
EOMONTH(CURRENT_TIMESTAMP))) OVER
(PARTITION BY S.StudentId ORDER BY S.Enroll_Date)) AS End_Date
,S.Class
FROM Enrollments S
)
SELECT S.StudentId, Cal.the_date AS Enroll_Date, S.Class
FROM cte S INNER JOIN dbo.Calendar
Cal ON Cal.the_date BETWEEN S.Start_Date AND S.End_Date;
In this solution I also used the new EOMONTH() function in order to advance one month from the
current month for the default value in the LEAD function. Then we're subtracting one month from that
expression as a whole.
Conclusion
In this article we learned how to apply simple T-SQL tricks to solve a problem. We learned two solutions: one which only works in SQL Server 2012 and above, and another that may be used in prior versions of SQL Server.
Addendum
Today's Transact-SQL MSDN Forum post "Dynamic Columns with some additional logic" is an interesting continuation of this article's theme, and also of my other article, T-SQL: Dynamic Pivot on Multiple Columns. In my reply to the thread's originator I hinted at a possible solution using the ideas from both articles. Please leave a comment on this article if you want that case to become a new article or part of this one.
Fixing Missing Data Based on Prior Row Information
One of the commonly asked problems in the Transact-SQL forum is how to provide missing
information based on the information in the first prior row that has data (or alternatively in the next row
(by date)). One of the examples where this problem was discussed is this thread .
In this thread the original poster was kind enough to provide the DDL and the DML (data sample), so it was easy to define a solution based on OUTER APPLY:
CREATE TABLE [dbo].[test_assign] (
[name] [varchar](25) NULL
,[datestart] [date] NULL
,[dateEnd] [date] NULL
,[assign_id] [int] IDENTITY(1, 1) NOT NULL
,CONSTRAINT [PK_test_assign] PRIMARY KEY CLUSTERED ([assign_id] ASC) WITH (
PAD_INDEX = OFF
,STATISTICS_NORECOMPUTE = OFF
,IGNORE_DUP_KEY = OFF
,ALLOW_ROW_LOCKS = ON
,ALLOW_PAGE_LOCKS = ON
) ON [PRIMARY]
) ON [PRIMARY]
CREATE TABLE [dbo].[test_measure] (
[name] [varchar](25) NULL
,[measurementDate] [date] NULL
,[measure_id] [int] IDENTITY(1, 1) NOT NULL
,CONSTRAINT [PK_test_measure] PRIMARY KEY CLUSTERED
([measure_id] ASC) WITH (
PAD_INDEX = OFF
,STATISTICS_NORECOMPUTE = OFF
,IGNORE_DUP_KEY = OFF
,ALLOW_ROW_LOCKS = ON
,ALLOW_PAGE_LOCKS = ON
) ON [PRIMARY]
) ON [PRIMARY]
INSERT INTO Test_Measure (
NAME
,Measurementdate
)
SELECT 'Adam'
,'1/1/2001'
INSERT INTO Test_Measure (
NAME
,Measurementdate
)
SELECT 'Adam'
,'2/2/2002'
INSERT INTO Test_assign (
NAME
,DateStart
,DateEnd
)
SELECT 'Adam'
,'1/15/2001'
,'12/31/2001'
INSERT INTO Test_assign (
NAME
,DateStart
,DateEnd
)
SELECT 'Adam'
,'2/15/2002'
,'12/31/2002'
INSERT INTO Test_assign (
NAME
,DateStart
,DateEnd
)
SELECT 'Adam'
,'3/15/2003'
,'12/31/2003'
-- Solution starts now
SELECT TA.*
,M.MeasurementDate
FROM Test_Assign TA
OUTER APPLY (
SELECT TOP (1) *
FROM Test_Measure TM
WHERE TM.NAME = TA.NAME
AND TM.MeasurementDate <= TA.Datestart
ORDER BY TM.MeasurementDate DESC
) M
The idea of this solution is to use a correlated OUTER APPLY subquery to get the first measurement date that is prior to the start date of the main table.
A similar problem is also described in this thread and the solution will also be a variation of CROSS
APPLY solution. So, you can see that this problem is very common.
CHAPTER 4:
Pivot
SQL Server PIVOT
Problem Definition
Recently in this thread I helped solve a relatively simple problem. I will quote my solution and then explain the main problem people often encounter with PIVOT solutions.
;WITH CTE_STC_DETAIL_CODES
AS
(
SELECT
    [STC_Detail].[STCDTLID],  -- carried through so the outer query can select it
    [Code_V_2].[CODE_CAT],
    [Code_V_2].[DESCRIPTION]
FROM [dbo].[STC_Detail]
    INNER JOIN [STC_Header_V_2]
        ON [STC_Header_V_2].[STCID] = [STC_Detail].[STCID]
    INNER JOIN [STC_Code]
        ON [STC_Code].[STCDTLID] = [STC_Detail].[STCDTLID]
    INNER JOIN [Code_V_2]
        ON [Code_V_2].[CodeID] = [STC_Code].[CodeID]
WHERE [STC_Header_V_2].[STC] = '33 '
)
SELECT [STCDTLID],
    [SN] AS 'Sub Net',
    [NT] AS 'Network Indicator',
    [CV] AS 'Coverage Level',
    [TQ] AS 'Time Period Qualifier',
    [AI] AS 'Authorization Indicator',
    [CS] AS 'Cost Share Type',
    [IC] AS 'Insurance Certificate Code',
    [QQ] AS 'Quantity Qualifier Code'
FROM CTE_STC_DETAIL_CODES
PIVOT
(
    MAX([DESCRIPTION])
    FOR CODE_CAT IN
    (
        [SN],
        [NT],
        [CV],
        [TQ],
        [AI],
        [CS],
        [IC],
        [QQ]
    )
) AS Pvt
Common Problem
The pivot solution by itself is not complex; it is a simple static PIVOT. But the thread originator was having a problem arriving at it. The main thing to understand is that all columns of the pivot source that are not mentioned in the PIVOT clause become implicit grouping columns. So if the source has a column with unique values and it is not mentioned in the PIVOT clause, it becomes part of the grouping, and the result will have as many rows as there are unique values in that column, defeating the main purpose of the PIVOT.
This is something I wanted to emphasize.
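To make this concrete, here is a small illustrative sketch (the table and data are made up for this example, not taken from the original thread):
-- OrderID stays in the pivot source, so it becomes an implicit grouping column:
-- the result has one row per OrderID, defeating the purpose of the PIVOT.
DECLARE @Sales TABLE (OrderID INT, Region VARCHAR(10), Amount INT);
INSERT @Sales VALUES (1, 'East', 10), (2, 'East', 20), (3, 'West', 30);
SELECT OrderID, [East], [West]
FROM @Sales
PIVOT (SUM(Amount) FOR Region IN ([East], [West])) pvt;
-- Removing OrderID from the pivot source collapses the result to a single row.
SELECT [East], [West]
FROM (SELECT Region, Amount FROM @Sales) src
PIVOT (SUM(Amount) FOR Region IN ([East], [West])) pvt;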
Other Blogs
There are two blog posts that may help understanding PIVOT better:
Understanding SQL Server 2000 Pivot with Aggregates
by George Mastros
and also my own blog post which is a bit advanced
Dynamic PIVOT on multiple columns
T-SQL: Display Horizontal Rows Vertically
This article is an outcome of my answer to this
question on MSDN forum.
Consider this scenario:
Table 1:
DEPARTMENT  EMPID  ENAME  SALARY
A/C         1      TEST1  2000
SALES       2      TEST2  3000
Table 2:
ColumnName  1      2
DEPARTMENT  A/C    SALES
EMPID       1      2
ENAME       TEST1  TEST2
SALARY      2000   3000
We are required to transform a result set in the Table 1 format into the Table 2 format.
How to display dynamically horizontal rows vertically
To display horizontal rows vertically when the columns are not known in advance, I have used the technique of dynamic unpivoting (using XQuery and the nodes() method) and then dynamic pivoting.
The code block below transforms a result set in the Table 1 format into the Table 2 format.
DECLARE @EMPLOYEE TABLE (DEPARTMENT VARCHAR(20), EMPID INT, ENAME VARCHAR(20), SALARY INT)
INSERT @EMPLOYEE SELECT 'A/C',01,'TEST1',2000
INSERT @EMPLOYEE SELECT 'SALES',02,'TEST2',3000
SELECT * FROM @EMPLOYEE
DECLARE @Xmldata XML = (SELECT * FROM @EMPLOYEE FOR XML PATH('') )
--Dynamic unpivoting
SELECT * INTO ##temp FROM (
SELECT
ROW_NUMBER()OVER(PARTITION BY ColumnName ORDER BY ColumnValue) rn,* FROM (
SELECT i.value('local-name(.)','varchar(100)') ColumnName,
i.value('.','varchar(100)') ColumnValue
FROM @xmldata.nodes('//*[text()]') x(i) ) tmp ) tmp1
--SELECT * FROM ##temp
--Dynamic pivoting
DECLARE @Columns NVARCHAR(MAX),@query NVARCHAR(MAX)
SELECT @Columns = STUFF(
(SELECT ', ' +QUOTENAME(CONVERT(VARCHAR,rn)) FROM
(SELECT DISTINCT rn FROM ##temp ) AS T FOR XML PATH('')),1,2,'')
SET @query = N'
SELECT ColumnName,' + @Columns + '
FROM
(
SELECT * FROM ##temp
) i
PIVOT
(
MAX(ColumnValue) FOR rn IN ('
+ @Columns
+ ')
) j ;';
EXEC (@query)
--PRINT @query
DROP TABLE ##temp
T-SQL: Dynamic Pivot on Multiple Columns
How to make a dynamic PIVOT on multiple columns
The problem of transposing rows into columns is one of the most common problems discussed in the MSDN Transact-SQL forum. Many times the problem of creating a dynamic pivot comes to light. One thing that many people who ask this question forget is that such transposing is much easier to perform on the client side than on the server, where we need to resort to a dynamic query. However, if we want to make such a pivot dynamically, the important thing to understand is that writing a dynamic query is only slightly more difficult than writing a static query. In fact, when I am presented with the problem of a dynamic pivot, I first figure out how the static query should look. Then making that query dynamic becomes a rather trivial task.
I had written on the topic of dynamic pivot on multiple columns before in this blog post: Dynamic PIVOT
on multiple columns .
I don't want to re-tell what I already told in that blog so this article will show another example from the
most recent thread on the topic of dynamic pivot.
In that thread I presented the following solution to the problem of dynamic pivot for unknown number
of columns
USE tempdb
CREATE TABLE tblTest (
Id INT
,Col_1 INT
)
INSERT INTO tblTest
VALUES (
1
,12345
)
,(
1
,23456
)
,(
1
,45678
)
,(
2
,57823
)
,(
2
,11111
)
,(
2
,34304
)
,(
2
,12344
)
DECLARE @MaxCount INT;
SELECT @MaxCount = max(cnt)
FROM (
SELECT Id
,count(Col_1) AS cnt
FROM tblTest
GROUP BY Id
) X;
DECLARE @SQL NVARCHAR(max)
,@i INT;
SET @i = 0;
SET @SQL = '';
WHILE @i < @MaxCount
BEGIN
SET @i = @i + 1;
SET @SQL = @Sql + ',
MAX(CASE WHEN RowNo = ' + cast(@i AS NVARCHAR(10)) + ' THEN Col_1 END) AS Col' + cast(@i AS NVARCHAR(10));
END
SET @SQL = N';WITH CTE AS (
SELECT ID, Col_1, row_number() OVER (PARTITION BY ID ORDER BY Col_1) AS
rowno
FROM
tblTest
)
SELECT ID ' + @SQL + N'
FROM
CTE
GROUP BY ID';
PRINT @SQL;
EXECUTE (@SQL);
In this solution, the first step was figuring out the static solution using the ROW_NUMBER() with PARTITION approach. This is a CASE-based pivot, although we could have used the true PIVOT syntax here instead. A CASE-based pivot is easier to use if we need to transpose multiple columns. Once we knew the static pivot, we were able to easily turn it into a dynamic one using a WHILE loop.
Just for completion, I also show the same problem solved using PIVOT syntax:
DECLARE @MaxCount INT;
SELECT @MaxCount = max(cnt)
FROM (
SELECT Id
,count(Col_1) AS cnt
FROM tblTest
GROUP BY Id
) X;
DECLARE @SQL NVARCHAR(max)
,@i INT;
SET @i = 0;
WHILE @i < @MaxCount
BEGIN
SET @i = @i + 1;
SET @SQL = COALESCE(@Sql + ', ', '') + 'Col' + cast(@i AS NVARCHAR(10));
END
SET @SQL = N';WITH CTE AS (
SELECT ID, Col_1, ''Col'' + CAST(row_number() OVER (PARTITION BY ID ORDER
BY Col_1) AS Varchar(10)) AS RowNo
FROM
tblTest
)
SELECT *
FROM
CTE
PIVOT (MAX(Col_1) FOR RowNo IN (' + @SQL + N')) pvt';
PRINT @SQL;
EXECUTE (@SQL);
As you see, the code is very similar to the first solution, but using PIVOT syntax instead of CASE based
pivot.
I hope to add more samples to this article as new opportunities present themselves.
There was another recent question about dynamic PIVOT where this article solution was right on target.
This entry participated in the Technology Guru TechNet WiKi for May contest and won the Gold prize.
Additional Resources
Dynamic PIVOT on multiple columns
T-SQL: Create Report for Last 10 Years of Data
Recently in the MSDN Transact-SQL forum thread Please help with dynamic pivot query/ CTE/ SSRS I provided a solution for a very common scenario: generating a report for the last N (10 in that particular case) years (months, days, hours, etc.) of data.
Problem Definition
In the course of the thread the topic starter has provided the following definitions of the tables:
CREATE TABLE [dbo].[_Records](
[ID] [varchar](255) NULL,
[FirstName] [varchar](255) NULL,
[LastName] [varchar](255) NULL
) ON [PRIMARY]
GO
INSERT INTO [dbo].[_Records]
([ID], [FirstName], [LastName])
VALUES
('1', 'A1', 'B1'),
('2', 'A2', 'B2'),
('3', 'A3', 'B3'),
('4', 'A4', 'B4'),
('5', 'A5', 'B5')
GO
CREATE TABLE [dbo].[_RecordDetails](
[RecordID] [varchar](255) NULL,
[Address] [varchar](255) NULL,
[Phone] [varchar](255) NULL
) ON [PRIMARY]
GO
INSERT INTO [dbo].[_RecordDetails]
([RecordID]
,[Address]
,[Phone])
VALUES
('1', 'Add1', 'P1'),
('2', 'Add2', 'P2'),
('3', 'Add3', 'P3'),
('4', 'Add4', 'P4'),
('5', 'Add5', 'P5')
GO
CREATE TABLE [dbo].[_Money](
[RecordID] [varchar](255) NULL,
[Date] [varchar](255) NULL,
[Amount] [numeric](20, 4) NOT NULL,
) ON [PRIMARY]
GO
INSERT INTO [dbo].[_Money]
([RecordID]
,[Date]
,[Amount])
VALUES
('1', '1/1/2004', '5'),
('1', '2/1/2004', '10'),
('1', '4/1/2006', '4'),
('1', '6/1/2007', '6'),
('1', '3/1/2010', '8'),
('2', '3/1/2004', '4'),
('2', '4/1/2004', '6'),
('2', '5/1/2005', '7'),
('2', '6/1/2011', '8'),
('3', '1/1/2005', '5'),
('3', '2/1/2005', '10'),
('3', '3/1/2007', '4'),
('3', '4/1/2008', '6'),
('3', '5/1/2008', '8'),
('3', '6/1/2009', '4'),
('3', '7/1/2012', '6'),
('3', '8/1/2012', '7'),
('3', '9/1/2012', '8'),
('4', '1/1/2006', '5'),
('4', '2/1/2006', '10'),
('4', '3/1/2008', '4'),
('4', '4/1/2008', '6'),
('4', '5/1/2008', '8'),
('4', '6/1/2010', '4'),
('4', '7/1/2011', '6'),
('4', '8/1/2011', '7'),
('4', '9/1/2011', '8'),
('4', '10/1/2012', '5'),
('4', '11/1/2012', '10'),
('4', '7/1/2013', '4'),
('4', '8/1/2013', '6'),
('4', '9/1/2013', '8'),
('5', '4/1/2008', '4'),
('5', '6/1/2010', '6'),
('5', '6/1/2011', '7'),
('5', '7/1/2011', '8'),
('5', '8/1/2011', '5'),
('5', '9/1/2012', '10'),
('5', '10/1/2012', '4'),
('5', '11/1/2013', '6'),
('5', '7/1/2013', '8'),
('5', '8/1/2013', '4'),
('5', '9/1/2013', '6'),
('5', '10/1/2013', '7'),
('5', '11/1/2013', '8')
GO
Given these 3 tables with data we wanted the following output (click on the link below to view it in a
separate window):
Solution
The idea here is to use a dynamic PIVOT. To generate the last 10 years of data I am going to use a loop.
The reason I am using a simple direct loop, instead of the more commonly used approach of querying the table and generating the column list with an XML PATH solution (a sketch of that alternative appears after the main solution below), is that:
1) In theory we may have missing data in our table (this is more of a theoretical concern with years, but it is not uncommon with months or days).
2) A direct loop allows us to be more flexible and add more columns if needed. For example, it is easy to adjust the solution to show not only a year column, but also the percent difference between this year and the prior year.
DECLARE @StartDate DATETIME
,@EndDate DATETIME
SET @StartDate = dateadd(year, - 10 +
datediff(year, '19000101', CURRENT_TIMESTAMP), '19000101');
SET @EndDate = dateadd(year, 11, @StartDate); -- this is used as an open-ended range date, thus I am adding 11 years
DECLARE @Columns NVARCHAR(max)
,@Year INT;
SET @Columns = '';
SET @Year = datepart(year, CURRENT_TIMESTAMP) - 10; -- starting year
WHILE @year <= datepart(year, CURRENT_TIMESTAMP)
BEGIN
SET @Columns = @Columns + ', ' + quotename(cast(@year AS NVARCHAR(max)))
SET @year = @year + 1;
END
SET @Columns = STUFF(@Columns, 1, 2, '');
--SELECT @Columns;
declare @SQL nvarchar(max);
SET @SQL = ';WITH CTE AS (SELECT R.[ID], R.[FirstName], R.[LastName],
RD.Address, RD.Phone
FROM dbo._Records R LEFT JOIN dbo._RecordDetails RD on R.ID = RD.RecordID),
cte2 AS
(SELECT cte.ID, cte.FirstName, cte.LastName, cte.Address, cte.Phone,
M.RecordID, datepart(year,M.[Date]) as yDate, M.Amount
FROM CTE INNER JOIN dbo._Money M ON cte.ID = M.RecordID
WHERE M.[Date] >=@StartDate and M.Date < @EndDate)
SELECT * FROM cte2 PIVOT (SUM(Amount) FOR yDate IN (' + @Columns + ')) pvt'
execute sp_ExecuteSQL @SQL, N'@StartDate datetime, @EndDate datetime',
@StartDate, @EndDate
So, you can see we used dynamic PIVOT to generate desired output and then sp_ExecuteSQL system
stored procedure to run our query with 2 date parameters.
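As mentioned above, a hedged sketch of the more common alternative for building the column list - querying the data itself and concatenating with the XML PATH trick - might look like this; it only produces columns for years actually present in the _Money table:
DECLARE @Columns NVARCHAR(MAX);
SELECT @Columns = STUFF((
        SELECT DISTINCT ', ' + QUOTENAME(CAST(DATEPART(year, M.[Date]) AS VARCHAR(4)))
        FROM dbo._Money M
        FOR XML PATH('')
        ), 1, 2, '');
SELECT @Columns; -- e.g. [2004], [2005], ..., only for years present in the data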
Conclusion
I showed how easily we can generate a report for last N years (months, days, hours) of data and how
easily we can add more columns to the output using direct loop solution.
CHAPTER 5:
Relational Division
T-SQL: Relational Division
In this article I am going to discuss one of the problems of relational algebra which was recently
brought up in this Transact-SQL MSDN Forum T-sql - finding all sales orders that have similar
products.
Introduction
There are certain kinds of problems in relational databases which may be solved using principles from Relational Division. There are many articles on the Internet about Relational Division and Relational Algebra. I list just a few very interesting ones - Divided We Stand: The SQL of Relational Division by Celko and Relational division by Peter Larsson - and suggest readers take a look at them and at other articles on this topic. Peter also pointed me to the new and very interesting article Relationally Divided over EAV, which I am going to study in the next couple of days.
Problem Definition
In the aforementioned thread the topic starter first wanted to find all orders that have similar products. He provided the table definition along with a few rows of data.
Rather than using data from that thread I want to consider the same problem using the AdventureWorks database instead. So, I'll first show a solution for the problem of finding orders that have the same products.
Solutions
This problem has several solutions. The first two are true relational division solutions, and the last one is a non-portable, T-SQL-only solution based on de-normalization of the table. The first solution in the script was suggested by Peter Larsson after I asked him to check this article. Here is the script I ran to compare all three solutions:
USE AdventureWorks2012;
SET NOCOUNT ON;
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
PRINT 'PESO Solution';
SELECT t1.SalesOrderID AS OrderID
,t2.SalesOrderID AS SimilarOrderID
FROM (
SELECT SalesOrderID
,COUNT(*) AS Items
,MIN(ProductID) AS minProdID
,MAX(ProductID) AS maxProdID
FROM Sales.SalesOrderDetail
GROUP BY SalesOrderID
) AS v
INNER JOIN Sales.SalesOrderDetail AS t1 ON t1.SalesOrderID = v.SalesOrderID
INNER JOIN Sales.SalesOrderDetail AS t2 ON t2.ProductID = t1.ProductID
AND t2.SalesOrderID > t1.SalesOrderID
INNER JOIN (
SELECT SalesOrderID
,COUNT(*) AS Items
,MIN(ProductID) AS minProdID
,MAX(ProductID) AS maxProdID
FROM Sales.SalesOrderDetail
GROUP BY SalesOrderID
) AS w ON w.SalesOrderID = t2.SalesOrderID
WHERE w.minProdID = v.minProdID
AND w.maxProdID = v.maxProdID
AND w.Items = v.Items
GROUP BY t1.SalesOrderID
,t2.SalesOrderID
HAVING COUNT(*) = MIN(v.Items);
PRINT 'Common Relational Division /CELKO/Naomi solution';
SELECT O1.SalesOrderId AS OrderID
,O2.SalesOrderID AS SimilarOrderID
FROM Sales.SalesOrderDetail O1
INNER JOIN Sales.SalesOrderDetail O2 ON O1.ProductID = O2.ProductID
AND O1.SalesOrderID < O2.SalesOrderID
GROUP BY O1.SalesOrderID
,O2.SalesOrderID
HAVING COUNT(O1.ProductID) = (
SELECT COUNT(ProductID)
FROM Sales.SalesOrderDetail SD1
WHERE SD1.SalesOrderID = O1.SalesOrderID
)
AND COUNT(O2.ProductID) = (
SELECT COUNT(ProductID)
FROM Sales.SalesOrderDetail SD2
WHERE SD2.SalesOrderID = O2.SalesOrderID
);
PRINT 'XML PATH de-normalization solution';
WITH cte
AS (
SELECT SalesOrderID
,STUFF((
SELECT ', ' + CAST(ProductID AS VARCHAR(30))
FROM Sales.SalesOrderDetail SD1
WHERE SD1.SalesOrderID = SD.SalesOrderID
ORDER BY ProductID
FOR XML PATH('')
), 1, 2, '') AS Products
FROM Sales.SalesOrderDetail SD
GROUP BY SD.SalesOrderID
)
SELECT cte.SalesOrderID AS OrderID
,cte1.SalesOrderID AS SimilarOrderID
,cte.Products
FROM cte
INNER JOIN cte AS cte1 ON cte.SalesOrderID < cte1.SalesOrderID
AND cte.Products = cte1.Products;
SET STATISTICS IO OFF;
The first solution joins with the number of items in each order and the MIN/MAX product in each order. This solution is based on the idea Peter proposed in the closed MS Connect item Move T-SQL language closer to completion with a DIVIDE BY operator.
The second solution self-joins the table based on ProductID using an extra condition of O1.SalesOrderID < O2.SalesOrderID (we're using < instead of <> in order to avoid opposite combinations), then groups by both OrderID columns and uses a HAVING clause to make sure the number of matching products is the same as the number of products in each individual order. This HAVING idea is very typical for the Relational Division problem.
Interestingly, the number of combinations in the AdventureWorks database is 1,062,238 (more than the number of rows in the SalesOrderDetail table itself). This is due to the fact that many orders consist of only a single product.
The last solution is rather straightforward and uses the XML PATH approach to get all products in one row for each order ID, then self-joins based on this new Products column. This solution is not portable to other relational database dialects but is specific to T-SQL. Interestingly, it performs better than the second 'true' Relational Division solution, as you can see in this picture.
As you can see, the first query takes 0%, the second 60%, while the last takes 40% of the execution time.
The last solution, however, is also not very flexible and is only suitable for finding exact matches.
These are results I got on SQL Server 2012 SP1 64 bit (they are much better on SQL Server 2014 CTP
according to Peter):
PESO Solution
Table 'SalesOrderDetail'. Scan count 3410626, logical reads 7265595, physical reads 0, read-ahead reads
0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 855922, logical reads 3462746, physical reads 0, read-ahead reads 0, lob
logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 35272 ms, elapsed time = 114920 ms.
Common Relational Division /CELKO/Naomi solution
Table 'SalesOrderDetail'. Scan count 36, logical reads 3292, physical reads 0, read-ahead reads 0, lob
logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 266, logical reads 907592, physical reads 0, read-ahead reads 0, lob logical
reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 478703 ms, elapsed time = 214748 ms.
XML PATH de-normalization solution
Table 'Worktable'. Scan count 0, logical reads 12971, physical reads 0, read-ahead reads 0, lob logical
reads 8764, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderDetail'. Scan count 62932, logical reads 194266, physical reads 0, read-ahead reads 0,
lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 5054 ms, elapsed time = 14069 ms.
Best Exact Match Solution
Peter sent yet another variation of the solution for the integer Product ID (this solution will not work if
the product ID /Item ID uses character or GUID key).
SELECT  v.SalesOrderID AS OrderID,
        w.SalesOrderID AS SimilarOrderID,
        v.Items
FROM (
        SELECT  SalesOrderID,
                COUNT(ProductID) AS Items,
                MIN(ProductID) AS minProdID,
                MAX(ProductID) AS maxProdID,
                SUM(ProductID) AS sumProdID,
                CHECKSUM_AGG(10000 * ProductID) AS cs
        FROM    Sales.SalesOrderDetail
        GROUP BY SalesOrderID
     ) AS v
INNER JOIN (
        SELECT  SalesOrderID,
                COUNT(ProductID) AS Items,
                MIN(ProductID) AS minProdID,
                MAX(ProductID) AS maxProdID,
                SUM(ProductID) AS sumProdID,
                CHECKSUM_AGG(10000 * ProductID) AS cs
        FROM    Sales.SalesOrderDetail
        GROUP BY SalesOrderID
     ) AS w ON w.Items = v.Items
        AND w.minProdID = v.minProdID
        AND w.maxProdID = v.maxProdID
        AND w.cs = v.cs
        AND w.sumProdID = v.sumProdID
WHERE   w.SalesOrderID > v.SalesOrderID
This solution joins two sets of aggregated information, including a CHECKSUM_AGG value. Checking all of these aggregates together is enough to conclude whether the orders consist of the same products or not. This is the simplest and most ingenious query, and it performs the best among the variations I tried. The limitation of this query is that it assumes an integer key for the product ID.
I got the following results for this solution:
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0,
lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderDetail'. Scan count 2, logical reads 2492, physical reads 0, read-ahead reads 0, lob
logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 562 ms, elapsed time = 9791 ms.
Slight Variation of the Original Problem
In that thread the topic starter also wanted to compare orders based on partial similarity. You may
recognize this problem as 'Customers who bought this item also bought...' as you often can see in
different websites.
Say, we want to find orders, that have 2/3 or more of the products matching. We will only consider
orders with more than 2 items (3 and up) for this problem. The first solution can be easily adjusted for
this new problem:
WITH cte
AS (
SELECT SalesOrderID
,ProductID
,COUNT(ProductID) OVER (PARTITION BY SalesOrderID) AS ProductsCount
FROM Sales.SalesOrderDetail
)
SELECT O1.SalesOrderId AS OrderID
,O2.SalesOrderID AS SimilarOrderID
FROM cte O1
INNER JOIN cte O2 ON O1.ProductID = O2.ProductID
AND O1.SalesOrderID < O2.SalesOrderID
WHERE O1.ProductsCount >= 3
AND O2.ProductsCount >= 3
GROUP BY O1.SalesOrderID
,O2.SalesOrderID
HAVING COUNT(O1.ProductID) >= (
(
SELECT COUNT(ProductID)
FROM Sales.SalesOrderDetail SD1
WHERE SD1.SalesOrderID = O1.SalesOrderID
) * 2.0
) / 3.0
AND COUNT(O2.ProductID) >= (
(
SELECT COUNT(ProductID)
FROM Sales.SalesOrderDetail SD2
WHERE SD2.SalesOrderID = O2.SalesOrderID
) * 2.0
) / 3.0
ORDER BY OrderID
,SimilarOrderID;
We can verify our results back for the few first rows:
SELECT SalesOrderID
,stuff((
SELECT ', ' + cast(ProductID AS VARCHAR(30))
FROM Sales.SalesOrderDetail SD1
WHERE SD1.SalesOrderID = SD.SalesOrderID
ORDER BY ProductID
FOR XML PATH('')
), 1, 2, '') AS Products
FROM Sales.SalesOrderDetail SD
WHERE SalesOrderID IN (
43659,
43913,
44528,
44566,
44761,
46077)
GROUP BY SalesOrderID
ORDER BY SalesOrderID
I will show two variations of the solutions for the similar orders problem. While I am getting better reads
for the second query, the execution time is much better for the first query:
SET STATISTICS TIME ON;
SET STATISTICS IO ON;
DECLARE @Percentage DECIMAL(10, 2);
SET @Percentage = 0.75;
WITH cte
AS (
SELECT SalesOrderID
,ProductID
,COUNT(ProductID) OVER (PARTITION BY SalesOrderID) AS ProductsCount
FROM Sales.SalesOrderDetail
)
SELECT O1.SalesOrderId AS OrderID
,O2.SalesOrderID AS SimilarOrderID
FROM cte O1
INNER JOIN cte O2 ON O1.ProductID = O2.ProductID
AND O1.SalesOrderID < O2.SalesOrderID
WHERE O1.ProductsCount >= 3
AND O2.ProductsCount >= 3
GROUP BY O1.SalesOrderID
,O2.SalesOrderID
HAVING COUNT(O1.ProductID) >= (
SELECT COUNT(ProductID)
FROM Sales.SalesOrderDetail SD1
WHERE SD1.SalesOrderID = O1.SalesOrderID
) * @Percentage
AND COUNT(O2.ProductID) >= (
SELECT COUNT(ProductID)
FROM Sales.SalesOrderDetail SD2
WHERE SD2.SalesOrderID = O2.SalesOrderID
) * @Percentage
ORDER BY OrderID
,SimilarOrderID;
WITH cte
AS (
SELECT SalesOrderID
,COUNT(ProductID) AS Items
FROM Sales.SalesOrderDetail
GROUP BY SalesOrderID
)
SELECT O1.SalesOrderId AS OrderID
,MIN(C1.Items) AS [Products 1]
,O2.SalesOrderID AS SimilarOrderID
,MIN(C2.Items) AS [Products 2]
FROM Sales.SalesOrderDetail O1
INNER JOIN cte C1 ON O1.SalesOrderID = C1.SalesOrderID
INNER JOIN Sales.SalesOrderDetail O2 ON O1.ProductID = O2.ProductID
AND O1.SalesOrderID < O2.SalesOrderID
INNER JOIN cte C2 ON O2.SalesOrderID = C2.SalesOrderID
GROUP BY O1.SalesOrderID
,O2.SalesOrderID
HAVING COUNT(*) >= MIN(C1.Items) * @Percentage
AND COUNT(*) >= MIN(C2.Items) * @Percentage
AND MIN(C1.Items)>=3 AND MIN(C2.Items) >=3
ORDER BY OrderID
,SimilarOrderID;
I will be interested in your results and ideas for this type of the query for partial (percentage) match.
Conclusion
In this article I showed that some common relational division problems can be solved using set-based
solutions. These solutions may not perform well, however, on the big datasets. I encourage the readers
to provide their ideas for the listed problems and their Pros/Cons.
CHAPTER 6:
SQL Server in general
Microsoft SQL Server 2012 New Functions
EOMONTH
We used to have a problem whenever we wanted to identify the end date of a month: there was no built-in function. That problem is solved in SQL Server 2012. The EOMONTH function returns the last day of the month.
SELECT EOMONTH('05/02/2012') AS 'EOM Processing Date'
Output: 2012-02-29
You can specify a number of months in the past or future with the EOMONTH function.
SELECT EOMONTH ( Getdate(), -1 ) AS 'Last Month'
Output: 2012-01-31
CHOOSE
Use this to select a specific item from a list of values.
SELECT CHOOSE ( 4, 'CTO', 'GM', 'DGM', 'AGM', 'Manager' )
Output: AGM
CONCAT
This function concatenates two or more strings.
SELECT CONCAT( emp_name, ' Joining Date ', joingdate)
Output: Rahman Joining Date 01/12/2001
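Because the table behind that example is not shown, here is a small self-contained sketch (the table and values are illustrative only):
-- Illustrative only: CONCAT implicitly converts non-string arguments
-- and treats NULL arguments as empty strings.
DECLARE @emp TABLE (emp_name VARCHAR(50), joingdate DATE);
INSERT @emp VALUES ('Rahman', '20011201');
SELECT CONCAT(emp_name, ' Joining Date ', joingdate) AS Info FROM @emp;
-- Rahman Joining Date 2001-12-01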
LAST_VALUE and FIRST_VALUE
Using LAST_VALUE you can get the last value in an ordered set of values, according to the specified ordering and partitioning criteria. FIRST_VALUE returns the first value in an ordered set of values.
INSERT INTO result(Department, ID, Marks) VALUES (1,103,70), (1,104,58),
(2,203,65), (2,201,85)
SELECT Department, Id, Marks, LAST_VALUE(Marks) OVER
(PARTITION BY Department ORDER BY Marks) AS
'Marks Sequence', FIRST_VALUE(Marks) OVER
(PARTITION BY Department ORDER BY Marks) AS 'First value'
FROM result
Output
Department  Id   Marks  Marks Sequence  First value
1           104  58     58              58
1           103  70     70              58
2           203  65     65              65
2           201  85     85              65
LEAD
Using the function you can access data from a subsequent row in the same result set without the use of
a self-join.
SELECT EntityID, YEAR(QuotaDate) AS SalesYear, SalesQuota AS CurrentQuota,
LEAD(SalesQuota, 1,0) OVER (ORDER BY YEAR(QuotaDate)) AS PreviousQuota
FROM Sales.SalesPersonQuotaHistory
WHERE BusinessEntityID = 275 and YEAR(QuotaDate) IN ('2005','2006');
Output
EntityID  SalesYear  CurrentQuota  PreviousQuota
--------  ---------  ------------  -------------
275       2005       367000.00     556000.00
275       2005       556000.00     502000.00
275       2006       502000.00     550000.00
275       2006       550000.00     1429000.00
275       2006       1429000.00    1324000.00
275       2006       1324000.00    0.00
File Group Enhancement
A FILESTREAM filegroup can contain more than one file.
EOMONTH() Function Usage in SQL Server 2012 and On
The EOMONTH() function is new in SQL Server 2012.
BOL link: http://technet.microsoft.com/en-us/library/hh213020.aspx
In the previous version (SQL Server 2008), a popular albeit obscure way to get the end of the month:
SELECT CONVERT(DATE, dateadd(mm, datediff(mm,0, current_timestamp)+1,-1));
-- 2013-06-30
Using the new function which returns DATE:
SELECT EOMONTH(current_timestamp);
-- 2013-06-30
We can add an optional parameter to get the end date for other months:
SELECT EOMONTH(current_timestamp, +1); -- 2013-07-31
SELECT EOMONTH(current_timestamp, -1); -- 2013-05-31
Using a dynamic parameter, we can get the last day of previous year:
SELECT EOMONTH(current_timestamp, -MONTH(current_timestamp)); -- 2012-12-31
Applying the DATEADD function we can obtain the first day of the current year:
SELECT DATEADD(DD, 1, EOMONTH(current_timestamp, -MONTH(current_timestamp))); -- 2013-01-01
Applying the DATEDIFF function we can calculate today's Julian date (day of the year):
SELECT DATEDIFF(DD, EOMONTH(current_timestamp, -MONTH(current_timestamp)), current_timestamp);
-- 163
The first parameter can be local variable:
DECLARE @dt date = current_timestamp;
SELECT EOMONTH(@dt, -1);
-- 2013-05-31
We can use EOMONTH() in a query:
SELECT SalesOrderID, OrderDate, EOMONTH(OrderDate) AS MonthEnd
FROM Sales.SalesOrderHeader ORDER BY OrderDate, SalesOrderID;
/*
SalesOrderID OrderDate MonthEnd
....
43841 2005-07-31 00:00:00.000 2005-07-31
43842 2005-07-31 00:00:00.000 2005-07-31
43843 2005-08-01 00:00:00.000 2005-08-31
43844 2005-08-01 00:00:00.000 2005-08-31
....
*/
How SQL Server Determines Type of the Constant
Problem Definition
There was an interesting question asked recently in the Transact-SQL forum: "Basic doubt in Round function".
The problem was stated as following:
SELECT ROUND(744, -3)
produced 1000 while
SELECT ROUND(744.0, -3)
gave an error "Arithmetic overflow error converting expression to data type numeric."
Explanation
So, what is happening here? Why are we getting this error? The explanation lies in the way SQL Server determines the type of the constant. In this particular case it figures that it can use precision 4 and scale 1 (one digit after the decimal point). That precision is not enough to hold the value 1000, and thus we get the error.
We can verify the type, precision and scale using the following query:
SELECT
    SQL_VARIANT_PROPERTY(744.0, 'BaseType')  AS BaseType,
    SQL_VARIANT_PROPERTY(744.0, 'Precision') AS Precision,
    SQL_VARIANT_PROPERTY(744.0, 'Scale')     AS Scale,
    SQL_VARIANT_PROPERTY(744.0, 'MaxLength') AS MaxLength
which returns:
BaseType  Precision  Scale  MaxLength
numeric   4          1      5
This page in BOL shows what types the constants can be. It does not, however, explain the rules for how SQL Server figures them out.
All constants have datatypes. Integer constants are given datatype int, decimal values are given datatype
numeric(p,q) where p is the number of digits (not counting leading zeros) in the number, and q is the
number of digits to the right of the decimal point (including trailing zeroes).
Conclusion
As shown in this article it is better to explicitly CAST to the desired type rather than rely on SQL Server
making the decision.
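For example, a minimal sketch of the explicit CAST suggested above (the target precision here is an arbitrary choice):
-- Give the constant enough precision so that ROUND(..., -3) can hold the value 1000.
SELECT ROUND(CAST(744.0 AS NUMERIC(10, 1)), -3); -- 1000.0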
Understanding NOLOCK Query Hint
In our day-to-day T-SQL querying we use a lot of query hints to modify the way a particular query is executed.
When we specify a query hint, SQL Server produces an optimized plan using that hint. This can be dangerous if it is not tested in UAT first, because the plan SQL Server builds with its optimizer - its prized possession - is generally the best one.
The low-level algorithm behind the optimizer is not known to ordinary people; how it produces the best, most cost-effective plan is not known to the outside world, but we know it does.
Query hints specify that the indicated hints should be used throughout the query, and they affect all operators in the statement. One such query hint is NOLOCK. As the name suggests, many users feel that when this hint is specified the operation does not take any locks. This is not the case!
I will demonstrate this with a simple query. I create a simple table with "e_id" as the primary key column, plus "name", "address" and "cell no" columns.
BEGIN TRAN
SELECT * FROM dbo.employee WHERE e_id = 'a1'
EXEC sp_lock
As you can see below, this transaction has SPID 55, which is the ID for the code just executed. It has taken two locks: IS and S.
In the Mode column:
S = Shared lock
IS = Intent Shared
In the Type column:
DB = Database
TAB = Table
Now let us run same query with NOLOCK query hint and see if it actually takes any lock.
BEGIN TRAN
SELECT * FROM dbo.employee WITH(NOLOCK) WHERE e_id = 'a1'
EXEC sp_lock
As you can see, the same locks (IS and S) are taken on the same table (see the Objid in both figures; they are the same, 1131151075).
So the point is: what is the difference between executing a query with the NOLOCK hint and one without it?
The difference shows up when both queries try to select data from a table on which an exclusive lock has been taken, that is, when the query is trying to access a table locked by an INSERT/UPDATE statement.
I will show this with a query: let us run an update command on the same table for the same row.
BEGIN TRAN
UPDATE dbo.employee SET e_name='SHASHANK' WHERE e_id = 'a1'
EXEC sp_lock
Now I run the same queries Query1 and Query2
Query 1 - Running with query hint NOLOCK
Query 2 - Now other query which is not using any query hint
Now we see the difference: the query with the NOLOCK hint produced output, but the plain query with no hint produces no output. It is blocked, which can be seen by running sp_who2; I ran this query and the result is below.
As you can see, SPID 56 is blocking SPID 55. I then ran the DBCC INPUTBUFFER command to find the text corresponding to these SPIDs; below is the result.
From the above output it is clear that when we use the NOLOCK query hint, a transaction can read data from a table that is locked by an UPDATE/INSERT/DELETE statement holding an exclusive lock (an exclusive lock is not compatible with any other lock). But if in the same situation we do not use the NOLOCK hint, the query will be blocked by the update statement.
The drawback of NOLOCK is dirty reads, so it is not advised to use it in a production environment. It can, however, be used to read data from a table partition which will not be updated while the select is running. For example, you can run a query to select data from the table partition containing January 2013 data, assuming no records will be updated for January.
SET ANSI_PADDING Setting and Its Importance
Problem Description
Recently I got an interesting escalation to solve for the client. Our VFP based application was getting the
following SQL Server error: "Violation of PRIMARY KEY constraint 'rep_crit_operator_report'.
Cannot insert duplicate key in object 'dbo.rep_crit' The duplicate key value is (ADMIN,
REPORT_PERIOD_SALES)."
Investigation
I started my investigation of the problem by checking VFP code and finding it to be a bit sloppy with no
good error handling (the code was issuing a TABLEUPDATE without checking its return status).
I then connected to the client through TeamViewer and observed that error in action. I then also fired SQL
Server Profiler and found that the tableupdate command was attempting to do an insert instead of
UPDATE and therefore was failing with the above error. At that point I was afraid that we would not be
able to solve the problem without fixing the source code.
In the VFP source code we were always padding the report column which was defined as varchar(20) to
20 characters. I am not sure why we were doing it this way and why in this case we were not using
CHAR(20) instead of VARCHAR(20) since the value was always saved with extra spaces at the end. But
since this code was there for a long time, I didn't try to question its validity.
At that point I decided to test what was the actual length of report column saved in the table. So, I ran the
following query
SELECT *, DATALENGTH(Report) as Report_Length FROM dbo.rep_crit
To my surprise I saw values less than 20. I ran the same code in my local database and got expected
value 20 for all rows. The strange behavior on the client was a bit perplexing.
I then thought I'll try to fix the problem and ran the following UPDATE statement:
UPDATE dbo.rep_crit SET report = LEFT(RTRIM(report) + SPACE(20),20)
to pad the column with spaces at the end. Again, I verified that code locally first. I ran that code on the
client and then ran the first select statement and got the same result as before - the column still showed
length less than 20 characters.
Resolution
To be honest, I should have guessed what was happening by myself. But I must admit I still didn't; I sent an e-mail to my colleagues asking what they thought about that strange behavior, and I also posted the thread Weird problem with the client. My colleague immediately recognized the problem as one he had already experienced with another client, and Latheesh NK also pointed to the SET ANSI_PADDING setting as the possible culprit.
So, somehow several tables had been saved while the wrong ANSI_PADDING setting was in effect, and therefore the columns' setting overrode the session settings.
Recently I made a change in our VFP applications to save varchar columns as varchar (prior to that all
varchar columns were automatically padded with spaces to their length). This caused the above
mentioned problem when the client upgraded the software to the recent release version.
The solution to that particular error was to run ALTER TABLE statement to alter report column to be the
same width as the original column but using SET ANSI_PADDING ON before running the statement. This
fixed the wrong padding on the column.
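A hedged sketch of that fix (the exact column definition is an assumption; the article only states that report is varchar(20)):
-- Re-issue the column definition with the correct ANSI_PADDING setting in effect.
SET ANSI_PADDING ON;
ALTER TABLE dbo.rep_crit ALTER COLUMN report VARCHAR(20) NOT NULL;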
This is how we can check column's status in design mode when we right click on the column and check its
properties:
ANSI Padding Status is close to the bottom in the designer.
After the problem was identified, we wanted to check the scope of the problem and also correct the
problem for other columns that have been saved with wrong ANSI_PADDING setting.
Script to correct problem in the whole database
I came up with the following script to correct the problem:
;WITH cte
AS (
SELECT c.is_nullable
,c.object_id AS table_id
,OBJECT_NAME(c.object_id) AS TableName
,c.max_length
,c.NAME column_name
,CASE c.is_ansi_padded
WHEN 1
THEN 'On'
ELSE 'Off'
END AS [ANSI_PADDING]
,T.NAME AS ColType
FROM sys.columns c
INNER JOIN sys.types T ON c.system_type_id = T.system_type_id
WHERE T.NAME IN ('varbinary', 'varchar')
)
SELECT 'ALTER TABLE dbo.' + quotename(cte.TableName) + ' ALTER COLUMN ' +
QUOTENAME(cte.column_name) + ' ' + cte.ColType + '(' + CASE
WHEN cte.max_length = - 1
THEN 'max'
ELSE CAST(cte.max_length AS VARCHAR(30))
END + ')' + CASE
WHEN cte.is_nullable = 1
THEN ' NULL '
ELSE ' NOT NULL'
END
FROM cte
INNER JOIN (
SELECT objname
FROM fn_listextendedproperty('SIRIUS_DefaultTable', 'user', 'dbo', 'table',
NULL, NULL, NULL)
) st ON st.objname = cte.TableName
AND cte.ANSI_PADDING = 'Off'
In this code the extra INNER JOIN is done to perform the update only on our tables in the database. In
generic case you don't need this extra JOIN.
We need to run the code above using Query results to Text option from the Query menu. Then we can
copy the output of that statement into new query window and run it to fix this problem.
Default Database Settings
I discussed this problem in one more thread, SET ANSI_PADDING setting. That thread provides additional insight into the importance of the correct setting.
It would be logical to expect that when we create a new database, the default settings have correct values for SET ANSI_NULLS and SET ANSI_PADDING. However, this is not the case even in SQL Server 2012. If we don't change the database defaults, they all come up wrong. See them here:
Therefore, if we want correct settings at the database level, it may be a good idea to fix them at the moment we create a new database. However, these settings are not critically important, since they are overridden by the session settings.
As noted in the Comments, another interesting case of varbinary truncation due to this wrong setting is
found in this Transact-SQL forum 's thread.
All-at-Once Operations in T-SQL
I remember when I read about this concept in a book by Itzik Ben-Gan in 2006; I was so excited that I could not sleep until daylight. When I encountered a question about this concept in the MSDN Forum, I answered it with the same passion with which I had read about this mysterious concept. So I decided to write an article about it. I ask you to be patient and not to follow the link to the question until you have finished reading this article. Please wait even if you know this concept completely, because I hope this will be an amazing trip.
Introduction
Each SQL query statement is made up of several clauses, and each clause helps us achieve the expected result. Simply put, in one SELECT query we have some of these clauses:
 SELECT
 FROM
 WHERE
 GROUP BY
 HAVING
Each of these performs one logical query processing phase. T-SQL is based on sets and logic. When we run a query against a table, the expected result is in fact a sub-set of that table. With each phase we create a smaller sub-set until we get our expected result, and in each phase we perform a process over the whole sub-set's elements. The next figure illustrates this:
Definition
All-at-Once
"All-at-Once Operations" means that all expressions in the same logical query process phase are
evaluated logically at the same time.
I explain this with an example using the following code:
-- create a test table
DECLARE @Test TABLE ( FirstName NVARCHAR(128), LastName NVARCHAR(128));
-- populate with sample data
INSERT @Test
( FirstName, LastName )
VALUES ( N' Saeid ',
-- FirstName
N'Hasani Darabadi' -- LastName
) ;
-- query
SELECT
LTRIM( RTRIM( FirstName ) ) + ' ' AS [Corrected FirstName],
[Corrected FirstName] + LastName AS FullName
FROM @Test ;
As illustrated with this figure, after executing we encounter this error message:
Invalid column name 'Corrected FirstName'.
This error message means that we cannot use an alias in the next column expression of the SELECT clause. In the query we create a corrected first name and we want to use it in the next column to produce the full name, but the all-at-once operations concept tells us we cannot do this, because all expressions in the same logical query processing phase (here, SELECT) are evaluated logically at the same time.
Why is this concept essential?
Because T-SQL is a query language over a relational database system (Microsoft SQL Server), it deals with sets instead of variables. Therefore, a query must operate on a set of elements. Now I want to show another example to illustrate this.
-- drop test table
IF OBJECT_ID( 'dbo.Test', 'U') IS NOT NULL
DROP TABLE dbo.Test ;
GO
-- create a test table
CREATE TABLE dbo.Test
(
Id
INT PRIMARY KEY ,
ParentId
INT ,
CONSTRAINT FK_Self_Ref
FOREIGN KEY ( ParentId )
REFERENCES dbo.Test ( Id )
);
GO
-- insert query
INSERT dbo.Test
( Id, ParentId )
VALUES ( 1, 2 ),
( 2, 2 ) ;
-- there is not any id = 2 in table
SELECT * FROM dbo.Test ;
-- update query
UPDATE dbo.Test
SET Id = 7,
ParentId = 7
WHERE Id = 1 ;
-- there is not any id = 7 in table
SELECT * FROM dbo.Test ;
After executing this code, as shown in the following figure, we see that even though there is no (Id = 2) row in the table yet, we can still insert 2 as a foreign key value. This is because of all-at-once operations.
As illustrated in the next figure, this behavior is repeated for the UPDATE query. If we did not have the all-at-once operations feature, we would have to first insert or update the primary key of the table and then modify the foreign key.
Many programmers who are experts in non-SQL languages, like C# and VB, are confused by this behavior at first, because they are in the habit of processing a variable in the first line of code and using the processed variable in the next line. They expect to do something like that in T-SQL. But as I noted earlier, T-SQL is a query language over a relational database system (Microsoft SQL Server), and it deals with sets instead of variables. Therefore, the query must operate on a set of elements at the same time. Moreover, in each logical query processing phase, all expressions are processed logically at the same point in time.
Pros and Cons
This concept has an impact on every situation in T-SQL querying. Sometimes it makes things hard to do, and sometimes it enables a fantastic result that we do not expect. To illustrate these impacts I explain four real situations with their examples.
Silent Death
One of the problems that a lack of attention to the all-at-once operations concept might produce is writing code that can hit an unexpected error.
We know that the square root of a negative number is undefined. So in the code below we put two conditions inside the WHERE clause; the first condition checks that Id1 is greater than zero. This query might still encounter an error, because the all-at-once operations concept tells us that these two conditions are evaluated logically at the same point in time. It is not guaranteed that, if the first expression evaluates to FALSE, SQL Server will short-circuit and evaluate the whole WHERE clause as FALSE. SQL Server can evaluate the conditions in the WHERE clause in an arbitrary order, based on the estimated execution plan.
-- drop test table
IF OBJECT_ID( 'dbo.Test', 'U') IS NOT NULL
DROP TABLE dbo.Test ;
GO
-- create a test table
CREATE TABLE dbo.Test ( Id1 INT, Id2 INT)
GO
-- populate with sample data
INSERT dbo.Test
( Id1, Id2 )
VALUES ( 0, 0 ), ( 1, 1 ), ( -1, -1 )
GO
-- query
SELECT *
FROM dbo.Test
WHERE
id1 > 0
AND
SQRT(Id1) = 1
If after executing the above code you do not receive any error, we need to perform some changes on our
code to force SQL Server to choose another order when evaluating conditions in the WHERE clause.
-- drop test table
IF OBJECT_ID( 'dbo.Test', 'U') IS NOT NULL
DROP TABLE dbo.Test ;
GO
-- create a test table
CREATE TABLE dbo.Test ( Id1 INT, Id2 INT)
GO
-- populate with sample data
INSERT dbo.Test
( Id1, Id2 )
VALUES ( 0, 0 ), ( 1, 1 ), ( -1, -1 )
GO
-- create a function that returns zero
CREATE FUNCTION dbo.fnZero ()
RETURNS INT
AS
BEGIN
DECLARE @Result INT;
SET @Result = ( SELECT TOP (1) Id2 FROM dbo.Test WHERE Id1 < 1 );
RETURN @Result;
END
GO
-- query
SELECT *
FROM dbo.Test
WHERE
id1 > dbo.fnZero()
AND
SQRT(Id1) = 1
As illustrated in the next figure we encounter an error.
One way to avoid encountering error in this query is using CASE like this query:
-- query
SELECT *
FROM dbo.Test
WHERE
CASE
WHEN Id1 < dbo.fnZero() THEN 0
WHEN SQRT(Id1) = 1 THEN 1
ELSE 0
END = 1;
CAUTION
After publishing this article, Naomi Nosonovsky pointed out to me that "even CASE does not provide a deterministic order of evaluation with short circuiting".
For more information please see these links:
Don’t depend on expression short circuiting in T-SQL (not even with CASE)
Aggregates Don't Follow the Semantics Of CASE
Now let us see another example. Although we add a condition in the HAVING clause to check that Id2 is not equal to zero, because of the all-at-once operations concept there is still a chance of encountering an error.
-- drop test table
IF OBJECT_ID( 'dbo.Test', 'U') IS NOT NULL
DROP TABLE dbo.Test ;
GO
-- create a test table
CREATE TABLE dbo.Test ( Id1 INT, Id2 INT)
GO
-- populate with sample data
INSERT dbo.Test
( Id1, Id2 )
VALUES ( 0, 0 ), ( 1, 1 ), ( 2, 1 )
GO
-- query
SELECT Id2, SUM(Id1)
FROM dbo.Test
GROUP BY Id2
HAVING
id2 <> ( SELECT Id2 FROM dbo.Test WHERE Id1 < 1 ) /* this subquery returns zero */
AND
SUM(Id1) / Id2 = 3 ;
As illustrated in the next figure we encounter an error.
Therefore, the lack of attention to All-at-Once operations concept in T-SQL might result in encountering
the unexpected errors!
Code complexity
Moreover, this concept adds complexity when debugging T-SQL code. Suppose we have a table "Person" with two columns, "FirstName" and "LastName". For some reason the values within these columns are mixed with extra characters. The problem is to write a query that retrieves a new column as the full name. This code produces our test data:
-- drop test table
IF OBJECT_ID( 'dbo.Person', 'U') IS NOT NULL
DROP TABLE dbo.Person ;
GO
-- create a test table
CREATE TABLE dbo.Person
(
PersonId INT IDENTITY PRIMARY KEY ,
FirstName NVARCHAR(128) ,
LastName NVARCHAR(128)
);
GO
-- populate table with sample data
INSERT dbo.Person
( FirstName, LastName )
VALUES ( N'   Saeid   123   ', -- FirstName
         N'   Hasani   '       -- LastName
       ) ;
GO
As illustrated in this figure, the problem with the “FirstName” column is that it is mixed with extra numbers
that should be removed, and the problem with the “LastName” column is that it is padded with extra space
characters before and after the real last name. Here is the code to do this:
SELECT PersonId ,
LEFT( LTRIM( RTRIM( FirstName ) ) , CHARINDEX( N' ' , LTRIM( RTRIM( FirstName ) ) ) - 1 )
+ N' ' + LTRIM( RTRIM( LastName ) ) AS [FullName]
FROM dbo.Person ;
Because of All-at-Once operations, we cannot reference a column alias from one expression in another
expression of the same SELECT clause. As a result, the code above can be very hard to debug.
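For instance, the following query (a small sketch against the same dbo.Person table; the alias name FirstNameTrim is ours) fails with an "Invalid column name" error, because an alias defined in one expression of a SELECT list is not visible to the other expressions of that same SELECT list:
-- This fails: a column alias defined in a SELECT list cannot be referenced
-- by another expression in the same SELECT list.
SELECT PersonId ,
       LTRIM( RTRIM( FirstName ) ) AS FirstNameTrim ,
       LEFT( FirstNameTrim , CHARINDEX( N' ' , FirstNameTrim ) - 1 ) AS [FirstNameOnly]
FROM dbo.Person ;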
I found that one way to ease this problem is to use a good code style with extra comments. The next code
is a well-formatted version of the former query; it produces the same output and is much easier to debug.
SELECT PersonId ,
       /*
       Prototype:
           [FullName] ::: LEFT( [FirstName Trim],
                                [Index of first space character in FirstName Trim] - 1 )
                          + ' ' + [Corrected LastName]
       Elements:
           [FirstName Trim] ::: LTRIM( RTRIM( FirstName ) )
           [Index of first space character in FirstName Trim] ::: CHARINDEX( N' ' , [FirstName Trim] )
           [Corrected LastName] ::: LTRIM( RTRIM( LastName ) )
       */
       LEFT( LTRIM( RTRIM( FirstName ) )                             --[FirstName Trim]
             , CHARINDEX( N' ' , LTRIM( RTRIM( FirstName ) ) ) - 1   --[Index of first space character in FirstName Trim]
           )
       + N' '
       + LTRIM( RTRIM( LastName ) )                                  --[Corrected LastName]
       AS [FullName]
FROM dbo.Person ;
Other solutions are creating modular views or using a derived table or CTE. I showed the "modular view"
approach in this forum thread.
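As a rough sketch of the derived table / CTE approach (using the same dbo.Person table; the CTE and column alias names are ours), the intermediate expressions are named once and then reused in the outer query:
-- Name the intermediate expressions once in a CTE, then reuse them in the outer SELECT.
WITH Trimmed AS
(
    SELECT PersonId ,
           LTRIM( RTRIM( FirstName ) ) AS FirstNameTrim ,
           LTRIM( RTRIM( LastName ) )  AS CorrectedLastName
    FROM dbo.Person
)
SELECT PersonId ,
       LEFT( FirstNameTrim , CHARINDEX( N' ' , FirstNameTrim ) - 1 )
       + N' ' + CorrectedLastName AS [FullName]
FROM Trimmed ;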
Impact on Window Functions
This concept explains why we cannot use window functions in the WHERE clause. We can use a reductio ad
absurdum argument, like those used in mathematics. Suppose that we could use window functions in the
WHERE clause, and consider the following code.
IF OBJECT_ID('dbo.Test', 'U') IS NOT NULL
DROP TABLE dbo.Test ;
CREATE TABLE dbo.Test ( Id INT) ;
GO
INSERT INTO dbo.Test
VALUES ( 1001 ), ( 1002 ) ;
GO
SELECT Id
FROM dbo.Test
WHERE
Id = 1002
AND
ROW_NUMBER() OVER(ORDER BY Id) = 1;
All-at-Once operations tell us that these two conditions are evaluated logically at the same point in time.
Therefore, SQL Server could evaluate the conditions in the WHERE clause in an arbitrary order, based on the
estimated execution plan. So the main question here is which condition would be evaluated first.
We can think about these two orders:
 SQL Server checks ( Id = 1002 ) first, then checks ( ROW_NUMBER() OVER(ORDER BY Id) = 1 ).
In this order the output would be ( 1002 ).
 SQL Server checks ( ROW_NUMBER() OVER(ORDER BY Id) = 1 ) first, which means ( Id = 1001 ), then checks ( Id = 1002 ).
In this order the output would be empty.
So we have a paradox.
This example shows why we cannot use window functions in the WHERE clause. You can think about this
further and see why window functions are allowed only in the SELECT and ORDER BY clauses!
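If you do need to filter on the result of a window function, a common workaround (a sketch, not part of the original example) is to compute it in a CTE or derived table and filter in the outer query, which makes the order of the two logical phases explicit:
-- Compute ROW_NUMBER in a CTE, then filter on it in the outer query.
WITH Numbered AS
(
    SELECT Id , ROW_NUMBER() OVER ( ORDER BY Id ) AS rn
    FROM dbo.Test
)
SELECT Id
FROM Numbered
WHERE rn = 1 AND Id = 1002 ;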
Magic Update
This is the part of this article that I love the most. The question is: how do we swap the values of two
columns in a table without using a temporary table? This code provides sample data for us:
-- drop test table
IF OBJECT_ID( 'dbo.Person', 'U') IS NOT NULL
DROP TABLE dbo.Person ;
GO
-- create a test table
CREATE TABLE dbo.Person
(
PersonId INT IDENTITY PRIMARY KEY,
FirstName NVARCHAR(128) ,
LastName NVARCHAR(128)
);
GO
-- populate table with sample data
INSERT dbo.Person
( FirstName, LastName )
VALUES
( N'Hasani', N'Saeid' ) ,
( N'Nosonovsky', N'Naomi' ) ,
( N'Price', N'Ed' ) ,
( N'Toth', N'Kalman' )
GO
Consider that in most non-SQL languages, we have to use a temporary variable to swap the values of two
variables. If we look at the problem from a non-SQL programmer's point of view, we would write something
like this prototype:
update Person
set @swap = Firstname
set Firstname = Lastname
set Lastname = @swap
If we look at the problem as a SQL programmer, we can translate the above prototype by using a
temporary table “#swap”. The code would look like this:
SELECT PersonId,
FirstName ,
LastName
INTO #swap
FROM dbo.Person ;
UPDATE dbo.Person
SET FirstName = a.LastName ,
LastName = a.FirstName
FROM #swap a
INNER JOIN dbo.Person b ON a.PersonId = b.PersonId
This code works fine. But the main question is how long the above script would take to run if we had
millions of records.
If we are familiar with the All-at-Once operations concept in T-SQL, we can do this job with a single UPDATE
statement, using the following simple code:
UPDATE dbo.Person
SET FirstName = LastName ,
LastName = FirstName ;
This behavior is amazing, isn't it?
Exception
In the definition section I noted that a query operates on a set of elements. What happens if a
query deals with multiple tables? In such queries we use table operators like JOIN and APPLY inside the
FROM clause. However, these operators are logically evaluated from left to right. Because we have
multiple sets, we first need to combine them into a single set, and only then does the All-at-Once
operations concept apply. Therefore, this concept is not applicable to the table operators in the FROM clause.
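For example, in the following sketch (reusing the dbo.Person table from earlier) the derived table after CROSS APPLY can reference the alias p that was introduced to its left, precisely because table operators in the FROM clause are evaluated left to right:
-- Left-to-right evaluation of table operators: the APPLY input may reference
-- the alias p defined earlier in the FROM clause.
SELECT p.PersonId , x.NameLength
FROM dbo.Person AS p
CROSS APPLY ( SELECT LEN( p.FirstName ) AS NameLength ) AS x ;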
Conclusion
All-at-Once operations is one of the most important concepts in T-SQL, and it has a significant impact on our T-SQL programming, code style, and performance tuning solutions.
SQL Server Columnstore Index FAQ
The SQL Server in-memory columnstore index (formerly called xVelocity) stores data by columns instead
of by rows, similar to a column-oriented DBMS. The columnstore index speeds up data warehouse query
processing in SQL Server 2012 and SQL Server 2014, in many cases by a factor of 10 to 100. We'll be
posting answers to frequently asked questions here.
SQL Server 2012 introduced nonclustered columnstore indexes. For more information, see the 2012
version of Columnstore Indexes on MSDN.
SQL Server 2014 has both clustered and nonclustered columnstore indexes; the clustered columnstore
index is updateable. For more information, see the 2014 pre-release version of Create Columnstore Index
(Transact-SQL) and Columnstore Indexes.
For both SQL Server 2012 and SQL Server 2014, see the wiki article SQL Server Columnstore
Performance Tuning on Technet.
Contents
1. Overview
2. Creating a Columnstore Index
3. Limitations on Creating a Columnstore Index
4. More Details on Columnstore Technology
5. Using Columnstore Indexes
6. Managing Columnstore Indexes
7. Batch Mode Processing
1. Overview
What are Microsoft's in-memory technologies?
Microsoft SQL Server has a family of in-memory technologies. These are all next-generation technologies
built for extreme speed on modern hardware systems with large memories and many cores. The in-memory
technologies include the in-memory analytics engine (used in PowerPivot and Analysis Services)
and the in-memory columnstore index (used in the SQL Server database).
SQL Server 2012, SQL Server 2014, and SQL Server PDW all use in-memory technologies to accelerate
common data warehouse queries. SQL Server 2012 introduced two new features: a nonclustered
columnstore index and a vector-based query execution capability that processes data in units called
"batches." Now, SQL Server 2014 adds updateable clustered columnstore indexes.
What is a columnstore?
A columnstore is data that is logically organized as a table with rows and columns, and physically stored
in a columnar data format. Relational database management systems traditionally store data in row-wise fashion. The values comprising one row are stored contiguously on a page. We sometimes refer to
data stored in row-wise fashion as a rowstore.
What is a columnstore index?
A columnstore index is a technology for storing, retrieving and managing data by using a columnar data
format, called a columnstore. The data is compressed, stored, and managed as a collection of partial
columns, called column segments. You can use a columnstore index to answer a query just like data in
any other type of index.
A columnstore index appears as an index on a table when examining catalog views or the Object
Explorer in Management Studio. The query optimizer considers the columnstore index as a data source
for accessing data just like it considers other indexes when creating a query plan.
What do I have to do to use a columnstore index?
For nonclustered columnstore indexes, all you have to do is create a nonclustered columnstore index on
one or more tables in your database. The query optimizer will decide when to use the columnstore
index and when to use other types of indexes. The query optimizer will also choose when to use the new
batch execution mode and when to use row execution mode.
For clustered columnstore indexes, you need to first create a table as a heap or clustered index, and
then use the CREATE CLUSTERED COLUMNSTORE INDEX statement to convert the existing table to a
clustered columnstore index. If your existing table has indexes, you need to drop all indexes, except for
the clustered index, before creating a clustered columnstore index. Since the clustered columnstore
index is the data storage mechanism for the entire table, the clustered columnstore index is the only
index allowed on the table.
When can I try columnstore indexes?
Nonclustered columnstore indexes are available in SQL Server 2012. Clustered columnstore indexes are
in the preview releases of SQL Server 2014 and will ship in the final release.
Are columnstore indexes available in SQL Azure?
No, not yet.
2. Creating a Columnstore Index
How do I create a nonclustered columnstore index?
You can create a nonclustered columnstore index by using a slight variation on existing syntax for
creating indexes. To create an index named mycolumnstoreindex on a table named mytable with three
columns, named col1, col2, and col3, you would use the following syntax:
CREATE NONCLUSTERED COLUMNSTORE INDEX mycolumnstoreindex ON mytable (col1, col2, col3);
To avoid typing the names of all the columns in the table, you can use the Object Explorer in
Management Studio to create the index as follows:
1. Expand the tree structure for the table and then right click on the Indexes icon.
2. Select New Index and then Nonclustered columnstore index
3. Click Add in the wizard and it will give you a list of columns with check boxes.
4. You can either choose columns individually or click the box next to Name at the top, which will
put checks next to all the columns. Click OK.
5. Click OK.
How do I create a clustered columnstore index?
When you create a clustered columnstore index, there is no need to specify columns since all columns in
the table are included in the index. This example converts a clustered index called myindex into a
clustered columnstore index.
CREATE CLUSTERED COLUMNSTORE INDEX myindex ON mytable WITH (DROP_EXISTING = ON);
Does it matter what order I use when listing the columns in the CREATE INDEX statement?
No. When the columnstore index is created, it uses a proprietary algorithm to organize and compress
the data.
Does the columnstore index have a primary key?
No. There is no notion of a primary key for a columnstore index.
How many columns should I put in my columnstore index?
Typically, you will put all the columns in a table in the columnstore index, although it is not necessary to
include all the columns. The limit on the number of columns is the same as for other indexes (1024
columns). If you have a column that has a data type that is not supported for columnstore indexes, you
must omit that column from the columnstore index.
What data types can be used with columnstore indexes?
A columnstore index can include columns with the following data types: int, bigint, smallint, tinyint,
money, smallmoney, bit, float, real, char(n), varchar(n), nchar(n), nvarchar(n), date, datetime,
datetime2, smalldatetime, time, datetimeoffset with precision <= 2, decimal or numeric with precision
<= 18.
What data types cannot be used in a columnstore index?
The following data types cannot be used in a columnstore index: decimal or numeric with precision > 18,
datetimeoffset with precision > 2, binary, varbinary, image, text, ntext, varchar(max), nvarchar(max),
cursor, hierarchyid, timestamp, uniqueidentifier, sql_variant, xml.
How long does it take to create a columnstore index? Is creating a columnstore index a parallel
operation?
Creating a columnstore index is a parallel operation, subject to the limitations on the number of CPUs
available and any restrictions set on MaxDOP. Creating a columnstore index takes on the order of 1.5
times as long as building a B-tree on the same columns.
My MAXDOP is greater than one but the columnstore index was created with DOP = 1. Why was it
not created using parallelism?
If your table has less than one million rows, SQL Server will use only one thread to create the
columnstore index. Creating the index in parallel requires more memory than creating the index
serially. If your table has more than one million rows, but SQL Server cannot get a large enough memory
grant to create the index using MAXDOP, SQL Server will automatically decrease DOP as needed to fit
into the available memory grant. In some cases, DOP must be decreased to one in order to build the
index under constrained memory.
How much memory is needed to create a columnstore index?
The memory required for creating a columnstore index depends on the number of columns, the number
of string columns, the degree of parallelism (DOP), and the characteristics of the data. SQL Server will
request a memory grant before trying to create the index. If not enough memory is available to create
the index in parallel with the current max DOP, SQL Server will reduce the DOP as needed to get an
adequate memory grant. If SQL Server cannot get a memory grant to build the index with DOP = 1, the
index creation will fail.
A rule of thumb for estimating the memory grant that will be requested for creating a columnstore index
is:
Memory grant request in MB = [(4.2 * Number of columns in the CS index) + 68] * DOP + (Number of
string columns * 34)
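For example, under this rule of thumb, a hypothetical fact table with 30 columns, 5 of them string columns, built at DOP 8 would request roughly [(4.2 * 30) + 68] * 8 + (5 * 34) = 194 * 8 + 170, which is about 1,722 MB.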
What can I do if I do not have enough memory to build the columnstore index?
It's possible for creation of a columnstore index to fail either at the very beginning of execution if it can't
get the necessary initial memory grant, or later during execution if supplemental grants can't be
obtained. If the initial grant fails, you'll see error 8657 or 8658. You may get error 701 or 802 if memory
runs out later during execution. If out-of-memory error 8657 or 8658 occurs at the beginning of
columnstore index creation, first check your resource governor settings. The default setting for resource
governor limits a query in the default pool to 25% of available memory even if the server is otherwise
inactive. This is true even if you have not enabled resource governor. Consider changing the resource
governor settings to allow the CREATE INDEX statement to access more memory. You can do this using
T-SQL:
ALTER WORKLOAD GROUP [DEFAULT] WITH (REQUEST_MAX_MEMORY_GRANT_PERCENT=X)
ALTER RESOURCE GOVERNOR RECONFIGURE
GO
where X is the percent, say 50.
If you get error 701 or 802 later during the index build, that means that the initial estimate of memory
usage was too low, and additional memory was consumed during index build execution and memory ran
out. The only viable way to work around these errors in this case is to explicitly reduce DOP when you
create the index, reduce query concurrency, or add more memory.
For all these error conditions (701, 802, 8657, and 8658), adding more memory to your system may
help.
See SQL Server Books Online for ALTER WORKLOAD GROUP
for additional information.
Another way to deal with out-of-memory conditions during columnstore index build is to vertically
partition a wide table into two or more tables so that each table has fewer columns. If a query touches
both tables, the table will have to be joined, which will affect query performance. If you use this option,
you will want to allocate columns to the different tables carefully so that queries will usually touch only
one of the tables. This option would also affect any existing queries and loading scripts. Another option
is to omit some columns from the columnstore index. Good candidates are columns that are
infrequently touched by queries that require scanning large amounts of data.
In some cases, you may not be able to create a columnstore index due to insufficient memory soon after
the server starts up, but later on it may work. This is because SQL Server, by default, gradually requests
memory from the operating system as it needs it. So it may not have enough memory available to satisfy
a large memory grant request soon after startup. If this happens, you can make the system grab more
memory by running a query like "select count(*) from t" where t is a large table. Or, you can set both
the min server memory and max server memory to the same value using sp_configure, which will
force SQL Server to immediately grab the maximum amount of memory it will use from the operating
system when it starts up.
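A minimal sketch of that second option (the 16 GB value is just a placeholder; choose a value appropriate for your server):
-- Setting min and max server memory to the same value forces SQL Server to
-- acquire its full memory allocation from the operating system at startup.
EXEC sp_configure 'show advanced options', 1 ;
RECONFIGURE ;
EXEC sp_configure 'min server memory (MB)', 16384 ;
EXEC sp_configure 'max server memory (MB)', 16384 ;
RECONFIGURE ;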
Can I create a columnstore index on a compressed table?
Yes. The base table can have PAGE compression, ROW compression, or no compression. The
columnstore index will have its own compression, which cannot be specified by the user.
I tried to create a columnstore index with SQL Server Management Studio using the Indexes->New
Index menu and it timed out after 20 minutes. How can I work around this?
Run a CREATE NONCLUSTERED COLUMNSTORE INDEX statement manually in a T-SQL window instead of
using the graphical interface. This will avoid the timeout imposed by the Management Studio graphical
user interface.
3. Limitations on Creating a Columnstore Index
Can I create a filtered columnstore index?
No. A columnstore index must contain data from all the rows in the table.
Can I create a columnstore index on a computed column?
No. A computed column cannot be part of a columnstore index.
Can I create a columnstore index on a sparse column?
No. A sparse column cannot be part of a columnstore index.
Can I create a columnstore index on an indexed view?
No. A columnstore index cannot be created on an indexed view. You also cannot use a columnstore
index to materialize a view.
Can I create multiple columnstore indexes?
No. You can only create one columnstore index on a table. The columnstore index can contain data from
all, or some, of the columns in a table. Since the columns can be accessed independently from one
another, you will usually want all the columns in the table to be part of the columnstore index.
4. More Details on Columnstore Technology
What are the advantages and disadvantages of row stores and column stores?
When data is stored in column-wise fashion, the data can often be compressed more effectively than
when stored in row-wise fashion. Typically there is more redundancy within a column than within a row,
which usually means the data can be compressed to a greater degree. When data is more compressed,
less IO is required to fetch the data into memory. In addition, a larger fraction of the data can reside in a
given size of memory. Reducing IO can significantly speed up query response time. Retaining more of
your working set of data in memory will speed up response time for subsequent queries that access the
same data.
When data is stored column-wise, it is possible to access the column individually. If a query only
references a few of the columns in the table, it is only necessary for a subset of the columns to be
fetched from disk into memory. For example, if a query references five columns from a table with 50
columns (i.e. 10% of the columns), IO is reduced by 90% (in addition to any benefits from compression).
On the other hand, storing columns in independent structures means that the data must be recombined
to return the data as a row. When a query touches only one (or a few) rows, having all the data for one
row stored together can be an advantage if the row can be quickly located with a B-tree index. Row
stores may offer better query performance for very selective queries, such as queries that lookup a
single row or a small range of rows. Updating data is also simpler in a row store.
What is the difference between a pure column store and a hybrid column store?
SQL Server columnstore indexes are pure column stores. That means that the data is stored and
compressed in column-wise fashion and individual columns can be accessed separately from other
columns. A hybrid columnstore stores a set of rows together, but within that set of rows, data is
organized and compressed in column-wise fashion. A hybrid column store can achieve good
compression from a column-wise organization within the set of rows, but when data is fetched from
disk, the pages being fetched contain data from all the columns in each row. Even if a query references
only 10% of the columns in a table, all the columns must be fetched from disk, and unused columns also
take up space in main memory. SQL Server columnstore indexes require less I/O and give better main-memory buffer pool hit rates than a hybrid columnstore.
Is a columnstore index better than a covering index that has exactly the columns I need for a query?
The answer depends on the data and the query. Most likely the columnstore index will be compressed
more than a covering row store index. If the query is not too selective, so that the query optimizer will
choose an index scan and not an index seek, scanning the columnstore index will be faster than scanning
the row store covering index. In addition, depending on the nature of the query, you can get batch
mode processing when the query uses a columnstore index. Batch mode processing can substantially
speed up operations on the data in addition to the speed up from a reduction in IO. If there is no
columnstore index used in the query plan, you will not get batch mode processing. On the other hand, if
the query is very selective, doing a single lookup, or a few lookups, in a row store covering index might
be faster than scanning the columnstore index.
Another advantage of the columnstore index is that you can spend less time designing indexes. A row
store index works well when it covers all the columns needed by a query. Changing a query by adding
one more column to the select list can render the covering index ineffective. Building one columnstore
index on all the columns in the table can be much simpler than designing multiple covering indexes.
Is the columnstore index the same as a set of covering indexes, one for each column?
No. Although the data for individual columns can be accessed independently, the columnstore index is a
single object; the data from all the columns is organized and compressed as an entity. While the
amount of compression achieved is dependent on the characteristics of the data, a columnstore index
will most likely be much more compressed than a set of covering indexes, resulting in less IO to read the
data into memory and the opportunity for more of the data to reside in memory across multiple
queries. In addition, queries using columnstore indexes can benefit from batch mode processing,
whereas a query using covering indexes for each column would not use batch mode processing.
Is columnstore index data still compressed after it is read into memory?
Yes. Column segments are compressed on disk and remain compressed when cached in memory.
Do columnstore indexes use bitmap indexes?
No. Columnstore indexes use a proprietary data representation based on Vertipaq. It’s not the same as a
bitmap index and doesn’t use one. But it has some similar benefits to bitmap indexes, such as reducing
the time it takes to filter on a column with a small number of distinct values.
I want to show other people how cool SQL Server columnstore indexes are. What can I show them?
OR
Where can I find more information (including documents and videos) about SQL Server columnstore
indexes?
White paper:
http://download.microsoft.com/download/8/C/1/8C1CE06B-DE2F-40D1-9C5C3EE521C25CE9/Columnstore%20Indexes%20for%20Fast%20DW%20QP%20SQL%20Server%2011.pdf
Product documentation:
http://msdn.microsoft.com/en-us/library/gg492088(SQL.110).aspx
SQL Server Columnstore FAQ:
http://social.technet.microsoft.com/wiki/contents/articles/sql-server-columnstore-index-faq.aspx
SQL Server Columnstore Performance Tuning Guide:
http://social.technet.microsoft.com/wiki/contents/articles/sql-server-columnstore-performancetuning.aspx
The Coming In-Memory Tipping Point, by David Campbell
http://blogs.technet.com/b/dataplatforminsider/archive/2012/04/09/the-coming-in-memory-databasetipping-point.aspx
Microsoft Virtual Academy talk video, 47 minutes, March 2012:
http://technet.microsoft.com/en-us/edge/Video/hh859842
TechEd 2011 talk video, Columnstore Indexes Unveiled, 1 hour, 9 minutes:
http://channel9.msdn.com/Events/TechEd/NorthAmerica/2011/DBI312
TechEd 2012 talk video, SQL Server Columnstore Performance Tuning, 1 hour, 15 minutes:
http://channel9.msdn.com/Events/TechEd/NorthAmerica/2012/DBI409
Columnstore performance and partition switching demo video, 9 minutes:
http://channel9.msdn.com/posts/SQL11UPD02-REC-02
Columnstore performance demo video, 4 minutes:
http://www.youtube.com/watch?v=vPN8_PCsJm4
ACM SIGMOD 2011 paper on SQL Server columnstore indexes:
http://dl.acm.org/citation.cfm?doid=1989323.1989448
IEEE Data Engineering Bulletin Paper on SQL Server columnstore indexes, March 2012:
http://sites.computer.org/debull/A12mar/apollo.pdf
VertiPaq vs ColumnStore: Performance Analysis of the xVelocity Engine, v1.0, rev 2, Aug 3, 2012.
http://www.sqlbi.com/wp-content/uploads/Vertipaq-vs-ColumnStore1.pdf
Microsoft SQL Server 2012 Columnstore for Real Time Reporting in Manufacturing Automation (COPADATA zenon Analyzer), 2012.
http://www.kreatron.ro/news/newsdetail_65.html
Case Study (bwin.party):
http://www.microsoft.com/casestudies/Microsoft-SQL-Server-2012/bwin.party/Company-CutsReporting-Time-by-up-to-99-Percent-to-3-Seconds-and-Boosts-Scalability/710000000087
Case Study (Motricity: Migration from Sybase IQ to xVelocity columnstore index):
http://www.microsoft.com/casestudies/Microsoft-SQL-Server-2012-Enterprise/Motricity/MobileAdvertiser-Makes-Gains-with-Easy-Migration-of-Sybase-Database-to-Microsoft/710000000170
Case Study (MS People):
http://www.microsoft.com/casestudies/Microsoft-SQL-Server-2012/Microsoft-Information-TechnologyGroup-MSIT/Microsoft-Cuts-Costs-and-Improves-Access-to-Information-with-Enhanced-DataWarehouse/4000011545
Case Study (Columnstore Indexes to Speed ETL):
http://prologika.com/CS/blogs/blog/archive/2011/12/07/columnstore-indexes-to-speed-etl.aspx
Case Study (Mediterranean Shipping Company):
http://www.microsoft.com/casestudies/Microsoft-SQL-Server-2012/Mediterranean-Shipping-CompanyMSC/Shipper-Supports-Expansion-by-Boosting-Speed-Control-and-Savings-withMicrosoft/4000011460
Case Study (Beth Israel Deaconess Medical Center):
http://www.microsoft.com/casestudies/Microsoft-SQL-Server-2012/Beth-Israel-Deaconess-MedicalCenter/Hospital-Improves-Availability-and-Speeds-Performance-to-Deliver-High-QualityCare/5000000011
Case Study (Belgacom)
http://www.microsoft.com/casestudies/Microsoft-SQL-Server-2012-Enterprise/BICS/Telecom-PerformsDatabase-Queries-Five-Times-Faster-Gains-Ability-to-Sustain-Growth/710000000579
Case Study (BNZ - New Zealand Bank)
http://www.microsoft.com/casestudies/Case_Study_Detail.aspx?CaseStudyID=710000000356
Case Study (RHI - Refractory Materials Manufacturer)
http://www.microsoft.com/casestudies/Microsoft-SQL-Server-2012-Enterprise/RHI/ManufacturerSpeeds-Queries-and-Improves-Business-Decisions-with-New-BI-Solution/710000001276
Case Study (Recall -- Records Management Firm)
http://www.microsoft.com/casestudies/Microsoft-SQL-Server-2012-Enterprise/Recall/RecordsManagement-Firm-Saves-1-Million-Gains-Faster-Data-Access-with-Microsoft-BI/710000001279
Slide deck on CDR (Telecom) application design loading 100M rows per day with 3 year retention
http://sqlug.be/media/p/1238.aspx
Internal Microsoft Columnstore Benchmark:
http://download.microsoft.com/download/7/2/E/72E63D2D-9F73-42BB-890FC1CA0931511C/SQL_Server_2012_xVelocityBenchmark_DatasheetMar2012.pdf
SQL Server Column-Store available for all major SAP BW releases
http://blogs.msdn.com/b/saponsqlserver/archive/2012/10/29/sql-server-column-store-generallyavailable-for-sap-bw.aspx
SQL Server 2012 and Tableau -- speeding things up
http://random-thunks.com/2012/11/23/sql-server-2012-and-tableau-speeding-things-up/
What determines how many segments there will be?
Each physical partition of a columnstore index is broken into one-million-row chunks called segments
(a.k.a. row groups). The index build process creates as many full segments as possible. Because multiple
threads work to build an index in parallel, there may be a few small segments (typically equal to the
number of threads) at the end of each partition with the remainder of the data after creating full
segments. That's because each thread might hit the end of its input at different times. Non-partitioned
tables have one physical partition.
5. Using Columnstore Indexes
How do I know whether the columnstore index is being used for my query?
You can tell whether a columnstore index is being used by looking at showplan. In graphical showplan,
there is a new icon for columnstore index scans. In addition, columnstore index scans have a new
property, storage, with the value ColumnStore.
How can I force the query to use a columnstore index?
Existing hints work with columnstore indexes. If you have a nonclustered columnstore index named
mycsindex on a table named mytable you could use a table hint such as
… FROM mytable WITH (INDEX (mycsindex)) …
How can I prevent the use of a columnstore index in my query?
You can either use a table hint to force the use of a different index, or you can use a new query hint:
IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX. This new hint will prevent the use of any
nonclustered columnstore indexes in the query.
Below is an example of using the hint to prevent use of any nonclustered columnstore index in a query:
SELECT DISTINCT (SalesTerritoryKey)
FROM dbo.FactResellerSales
OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX);
Are columnstore indexes an in-memory database technology?
SQL Server columnstores provide the performance benefits of a pure in-memory system with the
convenience and economics of a system that stores data on disk and caches recently used data in
memory. Columnstores hold data in memory in a different format than is kept on disk. This in-memory
representation is highly optimized to support fast query execution on modern processors. Not all data
has to fit in memory with a SQL Server columnstore index. But if all columnstore data does fit in
memory, SQL Server provides pure-in-memory levels of performance.
Why require all data to fit in memory (capping your database size or demanding a large budget to
purchase memory, and demanding slow system startup times) if you can get the best of both worlds,
that is, state-of-the-art query performance on economical hardware?
Does all the data have to fit in memory when I use a columnstore index?
No, a columnstore index is persisted on disk just like any other index. It is read into memory when
needed just like other types of indexes. The columnstore index is divided into units called segments,
which are the unit of transfer. A segment is stored as a LOB, and can consist of multiple pages. We
elected to bring columnstore index data into memory on demand rather than require that all data fits in
memory so customers can access databases much bigger than will fit in main memory. If all your data
fits in memory, you'll get reduced I/O and the fastest possible query performance. But it's not necessary
for all data to fit in memory, and that's a plus.
What determines whether the columnstore index is stored in memory?
A columnstore index is read into memory when needed just like other types of indexes.
Can I force a whole columnstore index to be loaded into memory?
You cannot force the columnstore index to be loaded, or kept, in memory but you can warm the cache
by running a query that will cause the columnstore data to be read into memory.
When should I build a columnstore index?
Columnstore indexes are designed to accelerate data warehouse queries, not OLTP workloads. Use
columnstore indexes when your query workload entails scanning and aggregating large amounts of data
or joining multiple tables, especially in a star join pattern. The restrictions on how you update the data
will also affect your choice. Columnstore indexes will be easiest to manage if you have a read-mostly
workload and if partition switching to update the data will fit into your workflow. Partition switching for
handling updates is easier if most updates consist of appending new data to the existing table and can
be placed in a staging table that can be switched into the table during periodic load cycles.
Typically you will want to build a columnstore index on large fact tables and maybe on large dimension
tables as well. You can build a columnstore index on very small tables, but the performance advantage
is less noticeable when the table is small. If you frequently update your dimension tables, and they are
not too large, you may find the maintenance effort outweighs the benefit of a columnstore index.
When should I not build a columnstore index?
If you frequently update the data in a table, or if you need to update a large table but partition switching
does not fit your workflow, you might not want to create a columnstore index. If most of your queries
are small lookup queries, seeking into a B-tree index may be faster and you may not find a columnstore
index to be beneficial. If you test a columnstore index and it does not benefit your workload, you can
drop or disable the index.
Can you do trickle load and real-time query with a columnstore index?
Yes. Even though tables with a columnstore index are read-only, you can maintain two tables: the one
with the columnstore index, and a second table with the same schema structured as a B-tree or heap. The
second table, called a differential file, holds newly inserted rows. You query the combined data by
modifying your queries to aggregate results from the two tables separately, and then combining them. This is
called local-global aggregation. Periodically (say, during a nightly batch window) you move data from the
row-structured table to the columnstore table. See here for details and an example of how to do trickle
load.
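As a rough sketch of local-global aggregation (both table names are hypothetical; assume dbo.FactSales has the columnstore index and dbo.FactSales_Delta is the row-structured differential table with the same schema):
-- Aggregate each table separately, then combine the partial results.
SELECT ProductKey, SUM(SalesAmount) AS SalesAmount
FROM (
    SELECT ProductKey, SUM(SalesAmount) AS SalesAmount
    FROM dbo.FactSales              -- columnstore-indexed, read-only table
    GROUP BY ProductKey
    UNION ALL
    SELECT ProductKey, SUM(SalesAmount) AS SalesAmount
    FROM dbo.FactSales_Delta        -- small B-tree/heap table holding new rows
    GROUP BY ProductKey
) AS combined
GROUP BY ProductKey ;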
6. Managing Columnstore Indexes
Do columnstore indexes work with Transparent Data Encryption?
Yes.
Can I compress the columnstore index?
The columnstore index is compressed when it is created. You cannot apply PAGE or ROW compression
to a columnstore index. When a columnstore index is created, it uses the VertiPaq™ compression
algorithms, which compress the data more than either PAGE or ROW compression. There is no user
control over compression of the columnstore index.
What is the difference in storage space used between the base table and the columnstore index?
Based on our experiments with a variety of different data sets, columnstore indexes are about 4X to 15X
smaller than an uncompressed heap or clustered B-tree index, depending on the data.
Do columnstore indexes work on partitioned tables?
Yes, you can create a columnstore index on a partitioned table. The columnstore index must be
partition-aligned with the base table. If you do not specify a partition scheme when you create the
columnstore index, the index will be automatically created using the same partition scheme as the base
table. You can switch a partition in and out of a partitioned table with the same requirements regarding
matching indexes as exist for other types of clustered and nonclustered indexes.
Can I partition a columnstore index?
Yes, you can partition a columnstore index, but the base table must also be partitioned and the
columnstore index must be partition-aligned with the base table.
How do I add to, or modify, the data in a table with a columnstore index?
Once you create a columnstore index on a table, you cannot directly modify the data in that table. A
query with INSERT, UPDATE, DELETE, or MERGE will fail and return an error message. To add or modify
the data in the table, you can do one of the following:
 Disable or drop the columnstore index. You can then update the data in the table. If you disable
the columnstore index, you can rebuild the columnstore index when you finish updating the
data. For example:
ALTER INDEX mycolumnstoreindex ON mytable DISABLE;
-- update the data --
ALTER INDEX mycolumnstoreindex ON mytable REBUILD;
Now the columnstore index is ready to use again.
 Load data into a staging table that does not have a columnstore index. Build a columnstore
index on the staging table. Switch the staging table into an empty partition of the main table
(see the sketch after this list).
 Switch a partition from the table with the columnstore index into an empty staging table. If
there is a columnstore index on the staging table, disable the columnstore index. Perform any
updates. Build (or rebuild) the columnstore index. Switch the staging table back into the (now
empty) partition of the main table.
See also the question about trickle load.
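A rough sketch of the staging-table approach from the second bullet (all object names and the partition number are hypothetical; the staging table must match the main table's schema and constraints, reside on the appropriate filegroup, and contain only rows that satisfy the target partition's boundaries):
-- Build the columnstore index on the loaded staging table, then switch it
-- into an empty partition of the main table as a metadata-only operation.
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_FactSales_Staging
    ON dbo.FactSales_Staging ( DateKey, ProductKey, SalesAmount ) ;

ALTER TABLE dbo.FactSales_Staging
    SWITCH TO dbo.FactSales PARTITION 42 ;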
What happens if I try to update a table that has a columnstore index?
The update will fail and return an error message.
Can I disable and rebuild the index on a single partition?
No. You can only disable or rebuild a columnstore index on the entire table. If you want to rebuild only
one partition, you should switch the partition into an empty staging table, disable/rebuild the index on
the staging table, and switch the staging table back into the main table. There is no need to rebuild the
index except when you want to modify the data in the table.
How can I tell whether there is a columnstore index on my table?
There are two ways to determine whether a columnstore exists on a table. In Management Studio, you
can look at the Object Explorer. Each table has an entry for Indexes. Columnstore indexes are included in
the list of indexes and have their own icon and description. You can also look at various catalog tables. In
sys.indexes, a columnstore index has type = 6 and type_desc = “NONCLUSTERED COLUMNSTORE.” A
new catalog table, sys.column_store_index_stats, has one row for each columnstore index.
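As a quick check from T-SQL (a minimal sketch; substitute your own table name):
-- List any nonclustered columnstore indexes defined on a given table.
SELECT name, type, type_desc
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'dbo.FactResellerSales')   -- hypothetical table name
  AND type_desc = 'NONCLUSTERED COLUMNSTORE' ;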
How can I find out more about my columnstore indexes? Is there metadata?
There are two new catalog tables with data about columnstore indexes:


sys.column_store_segments
sys.column_store_dictionaries
VIEW DEFINITIONS permission on a table is required to see information in the catalog tables about a
columnstore index on that table. In addition, a user must have SELECT permission on the table to see
data in the following columns:
sys.column_store_segments:
has_nulls, base_id, magnitude, min_data_id, max_data_id, null_value, data_ptr
sys.column_store_dictionaries:
last_id, entry_count, data_ptr
A user who does not have SELECT permission on a table will see NULL as the value in the columns listed
above.
Does the columnstore compression algorithm compress each partition separately?
Yes, each partition is compressed separately. Each partition has its own dictionaries. All segments within
a partition share dictionaries. Dictionaries for different partitions are independent. This allows partition
switching to be a metadata-only operation.
How big are my columnstore indexes?
You can use the new catalog tables or sys.dm_db_partition_stats to determine how big the columnstore
indexes are on disk. A relatively simple query to get the size of one columnstore index is:
SELECT SUM(s.used_page_count) / 128.0 on_disk_size_MB
FROM sys.indexes AS i
JOIN sys.dm_db_partition_stats AS S
ON i.object_id = S.object_id
and I.index_id = S.index_id
WHERE i.object_id = object_id('<tablename>')
AND i.type_desc = 'NONCLUSTERED COLUMNSTORE'
Here are some other queries that total up column store component sizes.
-- total size
with total_segment_size as (
SELECT
SUM (css.on_disk_size)/1024/1024 AS segment_size_mb
FROM sys.partitions AS p
JOIN sys.column_store_segments AS css
ON p.hobt_id = css.hobt_id
)
,
total_dictionary_size as (
SELECT SUM (csd.on_disk_size)/1024/1024 AS dictionary_size_mb
FROM sys.partitions AS p
JOIN sys.column_store_dictionaries AS csd
ON p.hobt_id = csd.hobt_id
)
select
segment_size_mb,
dictionary_size_mb,
segment_size_mb + isnull(dictionary_size_mb, 0) as total_size_mb
from total_segment_size
left outer join total_dictionary_size
on 1 = 1
go
-- size per index
with segment_size_by_index AS (
SELECT
p.object_id as table_id,
p.index_id as index_id,
SUM (css.on_disk_size)/1024/1024 AS segment_size_mb
FROM sys.partitions AS p
JOIN sys.column_store_segments AS css
ON p.hobt_id = css.hobt_id
group by p.object_id, p.index_id
) ,
dictionary_size_by_index AS (
SELECT
p.object_id as table_id,
p.index_id as index_id,
SUM (csd.on_disk_size)/1024/1024 AS dictionary_size_mb
FROM sys.partitions AS p
JOIN sys.column_store_dictionaries AS csd
ON p.hobt_id = csd.hobt_id
group by p.object_id, p.index_id
)
select
object_name(s.table_id) table_name,
i.name as index_name,
s.segment_size_mb,
d.dictionary_size_mb,
s.segment_size_mb + isnull(d.dictionary_size_mb, 0) as total_size_mb
from segment_size_by_index s
JOIN sys.indexes AS i
ON i.object_id = s.table_id
and i.index_id = s.index_id
left outer join dictionary_size_by_index d
on s.table_id = d.table_id
and s.index_id = d.index_id
order by total_size_mb desc
go
-- size per table
with segment_size_by_table AS (
SELECT
p.object_id as table_id,
SUM (css.on_disk_size)/1024/1024 AS segment_size_mb
FROM sys.partitions AS p
JOIN sys.column_store_segments AS css
ON p.hobt_id = css.hobt_id
group by p.object_id
) ,
dictionary_size_by_table AS (
SELECT
p.object_id AS table_id,
SUM (csd.on_disk_size)/1024/1024 AS dictionary_size_mb
FROM sys.partitions AS p
JOIN sys.column_store_dictionaries AS csd
ON p.hobt_id = csd.hobt_id
group by p.object_id
)
select
t.name AS table_name,
s.segment_size_mb,
d.dictionary_size_mb,
s.segment_size_mb + isnull(d.dictionary_size_mb, 0) as total_size_mb
from dictionary_size_by_table d
JOIN sys.tables AS t
ON t.object_id = d.table_id
left outer join segment_size_by_table s
on d.table_id = s.table_id
order by total_size_mb desc
go
-- size per column
with segment_size_by_column as (
SELECT
p.object_id as table_id,
css.column_id,
SUM (css.on_disk_size)/1024/1024.0 AS segment_size_mb
FROM sys.partitions AS p
JOIN sys.column_store_segments AS css
ON p.hobt_id = css.hobt_id
GROUP BY p.object_id, css.column_id
),
dictionary_size_by_column as (
SELECT
p.object_id as table_id,
csd.column_id,
SUM (csd.on_disk_size)/1024/1024.0 AS dictionary_size_mb
FROM sys.partitions AS p
JOIN sys.column_store_dictionaries AS csd
ON p.hobt_id = csd.hobt_id
GROUP BY p.object_id, csd.column_id
)
-- It may be that not all the columns in a table will be or can be included
-- in a nonclustered columnstore index,
-- so we need to join to the sys.index_columns to get the correct column id.
Select Object_Name(s.table_id) as table_name, C.column_id,
col_name(S.table_id, C.column_id) as column_name, s.segment_size_mb,
d.dictionary_size_mb, s.segment_size_mb + isnull(d.dictionary_size_mb, 0) total_size_mb
from segment_size_by_column s
join
sys.indexes I -- Join to Indexes system table
ON I.object_id = s.table_id
join
sys.index_columns c --Join to Index columns
ON c.object_id = s.table_id
And I.index_id = C.index_id
and c.index_column_Id = s.column_id --Need to join to the index_column_id with the
column_id
left outer join
dictionary_size_by_column d
on s.table_id = d.table_id
and s.column_id = d.column_id
Where I.type_desc = 'NONCLUSTERED COLUMNSTORE'
order by total_size_mb desc
go
Why is a columnstore index built from a heap larger than a columnstore index built on the same
data from a clustered B-tree?
The columnstore index has to store an extra bookmark column (containing the record id, or rid, for the
row) when the base table is a heap. The bookmark is 8 bytes long and unique. Hence, if you have 1
million rows, that's an extra 8MB to store, since the columnstore index cannot compress distinct values.
So, please keep that in mind when you build a columnstore index directly on top of a heap. If
compression is a high priority, consider building a clustered index before you build a nonclustered
columnstore index.
Are there statistics for columnstore indexes?
The query optimizer uses table statistics to help choose query plans. Tables with a columnstore index
can have statistics. The statistics are gathered from the underlying B-tree or heap on the table with the
columnstore, not from the columnstore itself. No statistics are created as a byproduct of creating a
columnstore index. This is different from creation of a B-tree, where statistics are created for the B-tree
key. See here for additional information about statistics and columnstore indexes.
Is there a best practice about putting columnstore indexes on filegroups?
For columnstore indexes in large data warehouses, we recommend you use the same best practices for
file group management as for clustered indexes for large fact tables described in the Fast Track 3.0
guidelines here: http://msdn.microsoft.com/en-us/library/gg605238.aspx . As the Fast Track guidelines
evolve, we expect to provide explicit guidance for filegroup placement of columnstore indexes.
Can columnstore indexes be used with FILESTREAM?
Yes. Although a FILESTREAM column can't be included in a columnstore index, other columns of the
table can.
I am running out of space in my PRIMARY file group with columnstores. How can I avoid this?
Metadata for each row group is kept in the primary file group in a set of internal system tables, even if
your tables are kept in other file groups. Every time a new row group is created, a little more space is
used in the primary file group. A row group typically contains about one million rows, although smaller
row groups can be created under certain conditions.
Each row in the column segment system table is 96 bytes. Total space for a rowgroup = Number of
columns * 96 bytes.
Each row in the dictionary system table is 64 bytes. Total space per rowgroup = Number of dictionaries
(primary + secondary) in the HoBt * 64.
Query sys.column_store_dictionaries and sys.column_store_segments to see how much row group
metadata you have.
Make sure to provide enough space in your primary file group to accommodate this metadata. For
example, a 300 column table could use close to 50,000 bytes per row group. If this table has ten billion
rows it will have about ten thousand row groups. This could take up to 500MB for the row group
metadata in the primary file group. Provision plenty of space in advance for the primary file group, or
leave autogrow on and provide enough raw disk space to accommodate the growth.
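As a quick check (a sketch using the catalog views mentioned above):
-- Rough count of row group metadata rows; each segment row is about 96 bytes
-- and each dictionary row about 64 bytes in the primary file group.
SELECT ( SELECT COUNT(*) FROM sys.column_store_segments )     AS segment_rows ,
       ( SELECT COUNT(*) FROM sys.column_store_dictionaries ) AS dictionary_rows ;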
7. Batch Mode Processing
What is batch mode processing?
Batch mode processing uses a new iterator model for processing data a-batch-at-a-time instead of a-row-at-a-time. A batch typically represents about 1000 rows of data. Each column within a batch is
stored as a vector in a separate area of memory, so batch mode processing is vector-based. Batch mode
processing also uses algorithms that are optimized for the multicore CPUs and increased memory
throughput that are found on modern hardware. Batch mode processing spreads metadata access costs
and other types of overhead over all the rows in a batch, rather than paying the cost for each
row. Batch mode processing operates on compressed data when possible and eliminates some of the
exchange operators used by row mode processing. The result is better parallelism and faster
performance.
How do I know whether batch mode processing is being used for my query?
Batch mode processing is only available for certain operators. Most queries that use batch mode
processing will have part of the query plan executed in row mode and part in batch mode. You can tell
whether batch mode processing is being used for an operator by looking at showplan. If you look at the
properties for a scan or other operator in the Actual Execution Plan, you will see two new properties:
EstimatedExecutionMode and ActualExecutionMode. Only EstimatedExecutionMode is displayed in the
Estimated Execution Plan. The values for these two properties can be either row or batch. There is also a
new operator for hash joins when they are being executed in batch mode.
The BatchHashTableBuild operator appears in graphical showplan and has a new icon.
Can EstimatedExecutionMode and ActualExecutionMode be different? When and why?
The query optimizer chooses whether to use batch mode processing when it formulates the query plan.
Most of the time, EstimatedExecutionMode and ActualExecutionMode will have the same value,
either batch or row. At run time, two things can cause a query plan to be executed in row mode instead
of batch mode: not enough memory or not enough threads. The most common reason for the
ActualExecutionMode to be row when the EstimatedExecutionMode was batch is that there was a large
hash join and all the hash tables could not fit in memory. Batch mode processing uses special in-memory
hash tables. If the hash tables do not fit in memory, execution of the query reverts to using row mode
and traditional hash tables that can spill to disk. The other reason for changing to row mode is when not
enough threads are available for parallel execution. Serial execution always occurs in row mode. You can
tell that a fall back to serial execution occurred if the estimated query plan shows parallel execution but
the actual query plan is executed serially.
If the query executes in parallel but falls back to row mode processing, you can infer that memory was
the problem. There is also an xevent (batch_hash_table_build_bailout) that is fired when there is not
enough memory during hash join and the query falls back to row mode processing. If this happens,
incorrect cardinality estimation may have contributed to the problem. Check the cardinality estimation
and consider updating statistics on the table.
Is a parallel query plan required to get batch mode processing?
Yes. Batch mode processing occurs only for parallel query execution. If the cost of the query plan is
small, the optimizer may choose a serial plan that is "good enough." When experimenting with
columnstore indexes you may need a large data set to see the effects of batch mode processing. Check
the degree of parallelism if you see that a query was executed in row mode when you expected batch
mode.
Can I get batch mode processing even if I don’t have a columnstore index?
No. Batch mode processing only occurs when a columnstore index is being used in the query.
What query execution plan operators are supported in batch mode in Denali?
Filter
Project
Scan
Local hash (partial) aggregation
Hash inner join
(Batch) hash table build
What about the parallelism operators in batch mode hash joins? Why are they always in row
mode?
Some of the parallelism operators in query plans for batch mode hash joins are not needed in batch
mode. Although the operator appears in the query plan, the number of rows for the operator is zero
and the query does not incur the cost of redistributing rows among different threads. The operator
remains in the query plan because, if the hash join must spill to disk (if all the hash tables do not fit into
the memory allotted for the query), the query reverts to row mode when it spills to disk. The parallelism
operators are required for executing the query in row mode. If the hash join spills to disk you will see
the warning "Operator used tempdb to spill data during execution." If you look at the properties for the
parallelism operators (Repartition Streams), you will see that the actual number of rows is greater than
zero if the hash join has spilled.
SQL Server Columnstore Performance Tuning
Introduction
SQL Server columnstore indexes are new in the SQL Server 2012 release. They are designed to improve
query performance for data warehouses and data marts. This page describes query performance tuning
for columnstores.
Fundamentals of Columnstore Index-Based Performance
Columnstore indexes can speed up some queries by a factor of 10X to 100X on the same hardware
depending on the query and data. These key things make columnstore-based query processing so fast:
 The columnstore index itself stores data in a highly compressed format, with each column
kept in a separate group of pages. This reduces I/O a lot for most data warehouse queries
because many data warehouse fact tables contain 30 or more columns, while a typical query
might touch only 5 or 6 columns. Only the columns touched by the query must be read from
disk. Only the more frequently accessed columns have to take up space in main memory. The
clustered B-tree or heap containing the primary copy of the data is normally used only to build
the columnstore, and will typically not be accessed for the large majority of query processing.
It'll be paged out of memory and won't take main memory resources during normal periods of
query processing.
 There is a highly efficient, vector-based query execution method called "batch processing"
that works with the columnstore index. A "batch" is an object that contains about 1000 rows.
Each column within the batch is represented internally as a vector. Batch processing can reduce
CPU consumption 7X to 40X compared to the older, row-based query execution
methods. Efficient vector-based algorithms allow this by dramatically reducing the CPU
overhead of basic filter, expression evaluation, projection, and join operations.
 Segment elimination can skip large chunks of data to speed up scans. Each partition in a
columnstore index is broken into one million row chunks called segments. Each segment has
metadata that stores the minimum and maximum value of each column for the segment. The
storage engine checks filter conditions against the metadata. If it can detect that no rows will
qualify, it skips the entire segment without even reading it from disk.
 The storage engine pushes filters down into the scans of data. This eliminates data early
during query execution, improving query response time.
The columnstore index and batch query execution mode are deeply integrated into SQL Server. A
particular query can be processed entirely in batch mode, entirely in the standard row mode, or with a
combination of batch and row-based processing. The key to getting the best performance is to make
sure your queries process the large majority of data in batch mode. Even if the bulk of your query
can't be executed in batch mode, you can still get significant performance benefits from columnstore
indexes through reduced I/O, and through pushing down of predicates to the storage engine.
To tell if the main part of your query is running in batch mode, look at the graphical showplan, hover the
mouse pointer over the most expensive scan operator (usually a scan of a large fact table) and check the
tooltip. It will say whether the estimated and actual execution mode was Row or Batch. See here for an
example.
DOs and DON'Ts for using Columnstores Effectively
Obeying the following do's and don'ts will help you get the most out of columnstores for your decision
support workload.
DOs
 Put columnstore indexes on large tables only. Typically, you will put them on your fact tables
in your data warehouse, but not the dimension tables. If you have a large dimension table,
containing more than a few million rows, then you may want to put a columnstore index on it as
well.
 Include every column of the table in the columnstore index. If you don't, then a query that
references a column not included in the index will not benefit from the columnstore index
much or at all.
 Structure your queries as star joins with grouping and aggregation as much as possible.
Avoid joining pairs of large tables. Join a single large fact table to one or more smaller
dimensions using standard inner joins. Use a dimensional modeling approach for your data as
much as possible to allow you to structure your queries this way.
 Use best practices for statistics management and query design. This is independent of
columnstore technology. Use good statistics and avoid query design pitfalls to get the best
performance. See the white paper on SQL Server statistics for guidance. In particular, see the
section "Best Practices for Managing Statistics."
DON'Ts
(Note: we are already working to improve the implementation to eliminate limitations associated with
these "don'ts" and we anticipate fixing them sometime after the SQL Server 2012 release. We're not
ready to announce a timetable yet.) Later, we'll describe how to work around the limitations.
 Avoid joins and string filters directly on columns of columnstore-indexed tables. String
filters don't get pushed down into scans on columnstore indexes, and join processing on strings
is less efficient than on integers. Filters on number and date types are pushed down. Consider
using integer codes (or surrogate keys) instead of strings in columnstore-indexed fact tables. You
can move the string values to a dimension table. Joins on the integer columns normally will be
processed very efficiently.
 Avoid use of OUTER JOIN on columnstore-indexed tables. Outer joins don't benefit from
batch processing. Instead, SQL Server 2012 reverts to row-at-a-time processing.
 Avoid use of NOT IN on columnstore-indexed tables. NOT IN (<subquery>) (which internally
uses an operator called "anti-semi-join") can prevent batch processing and cause the system to
revert to row mode. NOT IN (<list of constants>) typically works fine though. A sketch of one
possible rewrite follows this list.
 Avoid use of UNION ALL to directly combine columnstore-indexed tables with other
tables. Batch processing doesn't get pushed down over UNION ALL. So, for example, creating a
view vFact that does a UNION ALL of two tables, one with a columnstore index and one
without, and then querying vFact in a star join query, will not use batch processing.
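As a hedged sketch of the NOT IN workaround mentioned above (table names are illustrative assumptions), the anti-semi-join can often be rewritten as a left outer join with an IS NULL filter, which keeps the large fact-table scan in a join-based plan that is more likely to run in batch mode:
-- Original form: NOT IN (<subquery>) can force row-mode processing.
SELECT SUM(f.SalesAmount)
FROM dbo.FactSales AS f
WHERE f.ProductKey NOT IN (SELECT p.ProductKey FROM dbo.DimDiscontinuedProduct AS p);
-- Possible rewrite: left outer join plus IS NULL filter. The two forms are only
-- equivalent when p.ProductKey is not nullable, because NOT IN behaves differently
-- when the subquery returns NULLs.
SELECT SUM(f.SalesAmount)
FROM dbo.FactSales AS f
LEFT JOIN dbo.DimDiscontinuedProduct AS p ON f.ProductKey = p.ProductKey
WHERE p.ProductKey IS NULL;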
Maximizing Performance and Working Around Columnstore Limitations
Follow the links to the topics listed below about how to maximize performance with columnstore
indexes, and work around their functional and performance limitations in SQL Server 2012.
Ensuring Use of the Fast Batch Mode of Query Execution
 Parallelism (DOP >= 2) is Required to Get Batch Processing
 Use Outer Join and Still Get the Benefit of Batch Processing
 Work Around Inability to get Batch Processing with IN and EXISTS
 Perform NOT IN and Still Get the Benefit of Batch Processing
 Perform UNION ALL and Still Get the Benefit of Batch Processing
 Perform Scalar Aggregates and Still get the Benefit of Batch Processing
 Maintaining Batch Processing with Multiple Aggregates Including one or More DISTINCT
Aggregates
 Using HASH JOIN hint to avoid nested loop join and force batch processing
Physical Database Design, Loading, and Index Management
 Adding Data Using a Drop-and-Rebuild Approach
 Adding Data Using Partition Switching
 Trickle Loading with Columnstore Indexes
 Avoid Using Nonclustered B-tree Indexes
 Changing Your Application to Eliminate Unsupported Data Types
 Achieving Fast Parallel Columnstore Index Builds
Maximizing the Benefits of Segment Elimination
 Understanding Segment Elimination
 Verifying Columnstore Segment Elimination
 Ensuring Your Data is Sorted or Nearly Sorted by Date to Benefit from Date Range Elimination
 Multi-Dimensional Clustering to Maximize the Benefit of Segment Elimination
Additional Tuning Considerations
 Work Around Performance Issues for Columnstores Related to Strings
 Force Use or Non-Use of a Columnstore Index
 Workarounds for Predicates that Don't Get Pushed Down to Columnstore Scan (Including OR)
 Using Statistics with Columnstore Indexes
T-SQL: Simplified CASE expression
Introduction
SQL Server 2012 introduces these two new functions which simplify CASE expression:
 Conditional function (IIF)
 Selection function (CHOOSE)
We have also been working with COALESCE, an older NULL-related statement that simplifies a CASE
expression, since early versions. Although ISNULL is a function that logically simplifies a CASE
expression, it never translates to a CASE expression behind the scenes (in the execution plan). We
will also cover ISNULL in this article, as it is an alternative to COALESCE. The goal of this article is
to provide an in-depth tutorial about these statements:
1. ISNULL
2. COALESCE
3. IIF
4. CHOOSE
I prefer using the term “statement” because although they do a similar job, they are not in the same
category by purpose. For example, ISNULL is a function while COALESCE is an expression.
As we will see later, the main purpose of introducing these statements is improving code readability and
achieving cleaner code. Using these statements may result in poor performance in some situations.
Therefore, we will also discuss alternative solutions.
This article targets all levels of readers: from newbies to advanced. So, if you are familiar with these
statements, you may prefer skipping Definition section.
Definition
ISNULL
ISNULL(expr_1, expr_2)
If expr_1 is NULL, then the ISNULL function returns expr_2; otherwise it returns expr_1. The following
example shows its functionality.
DECLARE @expr_1 NVARCHAR(10) ,
@expr_2 NVARCHAR(10) ;
SET @expr_1 = NULL ;
SET @expr_2 = N'Saeid' ;
SELECT @expr_1 AS expr_1,
@expr_2 AS expr_2,
ISNULL(@expr_1, @expr_2) AS [ISNULL Result]
Output:
When the data types of the two arguments are different, SQL Server converts one to the other if they are
implicitly convertible; otherwise it returns an error. Executing the following code results in an error, as
illustrated in the output figure.
DECLARE @Val_1 INT ,
@Val_2 NVARCHAR(10) ;
SET @Val_1 = NULL ;
SET @Val_2 = 'Saeid' ;
SELECT @Val_1 AS [Value 1],
@Val_2 AS [Value 2],
ISNULL(@Val_1, @Val_2) AS [ISNULL Result]
Output:
Changing the value of variable @Val_2 to ‘500’, we do not encounter any error, because this value is
convertible to the numeric data type INT. The following code shows this:
DECLARE @Val_1 INT ,
@Val_2 NVARCHAR(10) ;
SET @Val_1 = NULL ;
SET @Val_2 = '500' ;
SELECT @Val_1 AS [Value 1],
@Val_2 AS [Value 2],
ISNULL(@Val_1, @Val_2) AS [ISNULL Result]
Implicit conversion may lead to data truncation. This happens if the length of the expr_1 data type is
shorter than the length of the expr_2 data type, so it is better to convert explicitly if needed. In the next
example, the first output column suffers from value truncation while the second does not.
DECLARE @Val_1 NVARCHAR(2) ,
@Val_2 NVARCHAR(10) ;
SET @Val_1 = NULL ;
SET @Val_2 = 'Saeid' ;
SELECT ISNULL(@Val_1, @Val_2) AS [ISNULL Result],
ISNULL(CONVERT(NVARCHAR(10), @Val_1),
@Val_2) AS [ISNULL Result with explicit convert]
Determine output data type
There are a few rules that determine the data type of the output column generated via ISNULL. The next
code illustrates these rules:
IF OBJECT_ID('dbo.TestISNULL', 'U') IS NOT NULL
DROP TABLE dbo.TestISNULL ;
DECLARE @Val_1 NVARCHAR(200) ,
@Val_2 DATETIME ;
SET @Val_1 = NULL ;
SET @Val_2 = GETDATE() ;
SELECT ISNULL('Saeid', @Val_2) AS Col1,
ISNULL(@Val_1, @Val_2) AS Col2,
ISNULL(NULL, @Val_2) AS Col3,
ISNULL(NULL, NULL) AS Col4
INTO dbo.TestISNULL
WHERE 1 = 0 ;
GO
SELECT COLUMN_NAME ,
DATA_TYPE ,
CHARACTER_MAXIMUM_LENGTH
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = N'dbo'
AND TABLE_NAME = N'TestISNULL' ;
Output:
Determine output NULL-ability
The following code illustrates the rules that determine the NULL-ability of the output column generated via ISNULL:
IF OBJECT_ID('dbo.TestISNULL', 'U') IS NOT NULL
DROP TABLE dbo.TestISNULL ;
DECLARE @Val_1 NVARCHAR(200) ,
@Val_2 DATETIME ;
SET @Val_1 = NULL ;
SET @Val_2 = GETDATE() ;
SELECT ISNULL('Saeid', @Val_2) AS Col1,
ISNULL(@Val_1, @Val_2) AS Col2
INTO dbo.TestISNULL
WHERE 1 = 0 ;
GO
SELECT COLUMN_NAME ,
IS_NULLABLE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = N'dbo'
AND TABLE_NAME = N'TestISNULL' ;
Output
COALESCE
COALESCE(expr_1, expr_2, ..., expr_n)   (for n >= 2)
COALESCE returns the first NOT NULL expression in the expression list. It needs at least two expressions.
Unlike the ISNULL function, COALESCE is not a function; rather, it is an expression. COALESCE always
translates to a CASE expression. For example,
COALESCE (expr_1, expr_2)
is equivalent to:
CASE
WHEN (expr_1 IS NOT NULL) THEN (expr_1)
ELSE (expr_2)
END
Therefore the database engine handles it the same way it handles a CASE expression, which is why it
belongs in our list of simplified CASE expression statements.
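Before comparing execution plans, here is a minimal usage example mirroring the earlier ISNULL one; it simply returns the first non-NULL argument in the list:
DECLARE @expr_1 NVARCHAR(10) ,
@expr_2 NVARCHAR(10) ,
@expr_3 NVARCHAR(10) ;
SET @expr_1 = NULL ;
SET @expr_2 = NULL ;
SET @expr_3 = N'Saeid' ;
-- Returns N'Saeid', the first NOT NULL expression in the list
SELECT COALESCE(@expr_1, @expr_2, @expr_3) AS [COALESCE Result]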
Following code is one of many samples that could illustrate different execution plans for COALESCE and
ISNULL:
USE AdventureWorks2012 ;
GO
SELECT *
FROM Sales.SalesOrderDetail
WHERE ISNULL(ProductID, SpecialOfferID) = 3 ;
SELECT *
FROM Sales.SalesOrderDetail
WHERE coalesce(ProductID, SpecialOfferID) = 3 ;
By using COALESCE, we do not have the limitations discussed for the ISNULL function, neither for the
output column data type nor for the output column NULL-ability. There is also no more value
truncation. The next example revises the ISNULL section examples, replacing ISNULL with
COALESCE:
-- value truncation
DECLARE @Val_1 NVARCHAR(2) ,
@Val_2 NVARCHAR(10) ;
SET @Val_1 = NULL ;
SET @Val_2 = 'Saeid' ;
SELECT ISNULL(@Val_1, @Val_2) AS [ISNULL Result],
ISNULL(CONVERT(NVARCHAR(10), @Val_1),
@Val_2) AS [ISNULL Result with explicit convert],
COALESCE(@Val_1, @Val_2) AS [COALESCE Result]
GO
----------------------------------------------------------- output data type
IF OBJECT_ID('dbo.TestISNULL', 'U') IS NOT NULL
DROP TABLE dbo.TestISNULL ;
DECLARE @Val_1 NVARCHAR(200) ,
@Val_2 DATETIME ;
SET @Val_1 = NULL ;
SET @Val_2 = GETDATE() ;
SELECT COALESCE('Saeid', @Val_2) AS Col1,
COALESCE(@Val_1, @Val_2) AS Col2,
COALESCE(NULL, @Val_2) AS Col3
INTO dbo.TestISNULL
WHERE 1 = 0 ;
GO
SELECT COLUMN_NAME ,
DATA_TYPE ,
CHARACTER_MAXIMUM_LENGTH
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = N'dbo'
AND TABLE_NAME = N'TestISNULL' ;
GO
----------------------------------------------------------- NULL-ability
IF OBJECT_ID('dbo.TestISNULL', 'U') IS NOT NULL
DROP TABLE dbo.TestISNULL ;
DECLARE @Val_1 NVARCHAR(200) ,
@Val_2 DATETIME ;
SET @Val_1 = NULL ;
SET @Val_2 = GETDATE() ;
SELECT COALESCE('Saeid', @Val_2) AS Col1,
COALESCE(@Val_1, @Val_2) AS Col2
INTO dbo.TestISNULL
WHERE 1 = 0 ;
GO
SELECT COLUMN_NAME ,
IS_NULLABLE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = N'dbo'
AND TABLE_NAME = N'TestISNULL' ;
GO
Output
IIF
IIF( condition , x, y)
IIF is a logical function which was introduced in SQL Server 2012. It is like the conditional operator in the
C# language. When the condition is true, x is evaluated; otherwise y is evaluated. The following example
illustrates this function's usage.
DECLARE @x NVARCHAR(10) ,
@y NVARCHAR(10) ;
SET @x = N'True' ;
SET @y = N'False' ;
SELECT IIF( 1 = 0, @x, @y) AS [IIF Result]
Like COALESCE expression, IIF function always translates to CASE expression. For instance,
IIF ( condition, true_value, false_value )
is equivalent to:
CASE
WHEN (condition is true) THEN (true_value)
ELSE (false_value)
END
This example shows this translation.
USE AdventureWorks2012 ;
GO
SELECT *
FROM Sales.SalesOrderDetail
WHERE IIF ( OrderQty >= SpecialOfferID , OrderQty, SpecialOfferID ) = 1
CHOOSE
CHOOSE(index, val_1, val_2, ..., val_n)   (for n >= 1)
CHOOSE is a selection function which was introduced in SQL Server 2012. It is like the switch statement in
the C# language. If index (which must be convertible to the data type INT) is NULL or its value is not found,
the output will be NULL. This function needs at least two arguments, one for the index and one for a value.
The following code illustrates this function's usage.
DECLARE @index INT ;
SET @index = 2 ;
SELECT CHOOSE(@index, 'Black', 'White', 'Green')
Like COALESCE expression and IIF function, CHOOSE also always translates to CASE expression. For
example,
CHOOSE ( index, val_1, val_2 )
is equivalent to:
CASE
WHEN (index = 1) THEN val_1
WHEN (index = 2) THEN val_2
ELSE NULL
END
This simple code shows this translation.
USE AdventureWorks2012 ;
GO
SELECT *
FROM Sales.SalesOrderDetail
WHERE CHOOSE(OrderQty, 'Black', 'White', 'Green') = 'White'
Performance
Although the main purpose of simplified CASE expression statements is increasing readability and having
cleaner code, one important question is how these statements impact database performance.
Is there any performance difference between the CASE expression and these statements? As we will see, to
achieve the best performance it is usually better to find alternative solutions and avoid using CASE and these
statements.
Dynamic filtering
It is common to write reports which accept input parameters. To achieve better performance it is a good
practice to write their code within stored procedures, because a procedure stores the way it executes
as an execution plan and reuses it. There are several popular approaches to writing this type of
procedure.
IS NULL and OR
This is the most common solution. Let me start with an example and rewrite it with comparable solutions:
USE AdventureWorks2012;
GO
IF OBJECT_ID('Sales.SalesOrderDetailSearch', 'P') IS NOT NULL
DROP PROC Sales.SalesOrderDetailSearch ;
GO
CREATE PROC Sales.SalesOrderDetailSearch
@ModifiedDate AS DATETIME = NULL ,
@ShipDate AS DATETIME = NULL ,
@StoreID AS INT = NULL
AS
SELECT b.ShipDate ,
c.StoreID ,
a.UnitPriceDiscount ,
b.RevisionNumber ,
b.DueDate ,
b.ShipDate ,
b.PurchaseOrderNumber ,
b.TaxAmt ,
c.PersonID ,
c.AccountNumber ,
c.StoreID
FROM Sales.SalesOrderDetail a
RIGHT OUTER JOIN Sales.SalesOrderHeader b ON a.SalesOrderID = b.SalesOrderID
LEFT OUTER JOIN Sales.Customer c ON b.CustomerID = c.CustomerID
WHERE (a.ModifiedDate = @ModifiedDate OR @ModifiedDate IS NULL)
AND (b.ShipDate = @ShipDate OR @ShipDate IS NULL)
AND (c.StoreID = @StoreID OR @StoreID IS NULL)
GO
------------------------------------------------ now execute it with sample values
EXEC Sales.SalesOrderDetailSearch @ModifiedDate = '2008-04-30 00:00:00.000'
EXEC Sales.SalesOrderDetailSearch @ShipDate = '2008-04-30 00:00:00.000'
EXEC Sales.SalesOrderDetailSearch @StoreID = 602
Execution statistics:
The main problem here, as illustrated in the above figure, is that the same execution plan is used for all
three situations. It is obvious that the third one suffers from an inefficient execution plan.
CASE
We can change the combination of IS NULL and OR and translate it using CASE. Now we rewrite above
code like this one:
USE AdventureWorks2012;
GO
IF OBJECT_ID('Sales.SalesOrderDetailSearch', 'P') IS NOT NULL
DROP PROC Sales.SalesOrderDetailSearch ;
GO
CREATE PROC Sales.SalesOrderDetailSearch
@ModifiedDate AS DATETIME = NULL ,
@ShipDate AS DATETIME = NULL ,
@StoreID AS INT = NULL
AS
SELECT b.ShipDate ,
c.StoreID ,
a.UnitPriceDiscount ,
b.RevisionNumber ,
b.DueDate ,
b.ShipDate ,
b.PurchaseOrderNumber ,
b.TaxAmt ,
c.PersonID ,
c.AccountNumber ,
c.StoreID
FROM Sales.SalesOrderDetail a
RIGHT OUTER JOIN Sales.SalesOrderHeader b ON a.SalesOrderID = b.SalesOrderID
LEFT OUTER JOIN Sales.Customer c ON b.CustomerID = c.CustomerID
WHERE a.ModifiedDate
= CASE WHEN @ModifiedDate IS NOT NULL THEN @ModifiedDate ELSE a.ModifiedDate END
AND b.ShipDate
= CASE WHEN @ShipDate IS NOT NULL THEN @ShipDate ELSE b.ShipDate END
AND c.StoreID
= CASE WHEN @StoreID IS NOT NULL THEN @StoreID ELSE c.StoreID END
GO
------------------------------------------------ now execute it with sample values
EXEC Sales.SalesOrderDetailSearch @ModifiedDate = '2008-04-30 00:00:00.000'
EXEC Sales.SalesOrderDetailSearch @ShipDate = '2008-04-30 00:00:00.000'
EXEC Sales.SalesOrderDetailSearch @StoreID = 602
Execution
statistics:
Using CASE shows improvements over IS NULL and OR, but with more CPU cost for the first execution.
The Reads and Actual Rows also decreased in the first two executions. So it is better, but we still continue
our experiment.
COALESCE
We also can change CASE and translate it to COALESCE. Now we rewrite above code like this:
USE AdventureWorks2012;
GO
IF OBJECT_ID('Sales.SalesOrderDetailSearch', 'P') IS NOT NULL
DROP PROC Sales.SalesOrderDetailSearch ;
GO
CREATE PROC Sales.SalesOrderDetailSearch
@ModifiedDate AS DATETIME = NULL ,
@ShipDate AS DATETIME = NULL ,
@StoreID AS INT = NULL
AS
SELECT b.ShipDate ,
c.StoreID ,
a.UnitPriceDiscount ,
b.RevisionNumber ,
b.DueDate ,
b.ShipDate ,
b.PurchaseOrderNumber ,
b.TaxAmt ,
c.PersonID ,
c.AccountNumber ,
c.StoreID
FROM Sales.SalesOrderDetail a
RIGHT OUTER JOIN Sales.SalesOrderHeader b ON a.SalesOrderID = b.SalesOrderID
LEFT OUTER JOIN Sales.Customer c ON b.CustomerID = c.CustomerID
WHERE a.ModifiedDate = COALESCE(@ModifiedDate, a.ModifiedDate)
AND b.ShipDate = COALESCE(@ShipDate, b.ShipDate)
AND c.StoreID = COALESCE(@StoreID, c.StoreID)
GO
------------------------------------------------ now execute it with sample values
EXEC Sales.SalesOrderDetailSearch @ModifiedDate = '2008-04-30 00:00:00.000'
EXEC Sales.SalesOrderDetailSearch @ShipDate = '2008-04-30 00:00:00.000'
EXEC Sales.SalesOrderDetailSearch @StoreID = 602
Execution statistics:
It is obvious that because COALESCE translates to CASE internally, there is no difference between them.
ISNULL
Now we rewrite above code and use ISNULL instead of COALESCE:
USE AdventureWorks2012;
GO
IF OBJECT_ID('Sales.SalesOrderDetailSearch', 'P') IS NOT NULL
DROP PROC Sales.SalesOrderDetailSearch ;
GO
CREATE PROC Sales.SalesOrderDetailSearch
@ModifiedDate AS DATETIME = NULL ,
@ShipDate AS DATETIME = NULL ,
@StoreID AS INT = NULL
AS
SELECT b.ShipDate ,
c.StoreID ,
a.UnitPriceDiscount ,
b.RevisionNumber ,
b.DueDate ,
b.ShipDate ,
b.PurchaseOrderNumber ,
b.TaxAmt ,
c.PersonID ,
c.AccountNumber ,
c.StoreID
FROM Sales.SalesOrderDetail a
RIGHT OUTER JOIN Sales.SalesOrderHeader b ON a.SalesOrderID = b.SalesOrderID
LEFT OUTER JOIN Sales.Customer c ON b.CustomerID = c.CustomerID
WHERE a.ModifiedDate = ISNULL(@ModifiedDate, a.ModifiedDate)
AND b.ShipDate = ISNULL(@ShipDate, b.ShipDate)
AND c.StoreID = ISNULL(@StoreID, c.StoreID)
GO
------------------------------------------------ now execute it with sample values
EXEC Sales.SalesOrderDetailSearch @ModifiedDate = '2008-04-30 00:00:00.000'
EXEC Sales.SalesOrderDetailSearch @ShipDate = '2008-04-30 00:00:00.000'
EXEC Sales.SalesOrderDetailSearch @StoreID = 602
Execution statistics:
There is no change in Duration, but with more estimated rows.
Dynamic SQL
Using the above four solutions we could not achieve good performance, because we need a different
efficient execution plan for each combination of input parameters. So it is time to use an alternative
solution to overcome this problem.
USE AdventureWorks2012;
GO
IF OBJECT_ID('Sales.SalesOrderDetailSearch', 'P') IS NOT NULL
DROP PROC Sales.SalesOrderDetailSearch ;
GO
CREATE PROC Sales.SalesOrderDetailSearch
@ModifiedDate AS DATETIME = NULL ,
@ShipDate AS DATETIME = NULL ,
@StoreID AS INT = NULL
AS
DECLARE @sql NVARCHAR(MAX), @parameters NVARCHAR(4000) ;
SET @sql = '
SELECT b.ShipDate ,
c.StoreID ,
a.UnitPriceDiscount ,
b.RevisionNumber ,
b.DueDate ,
b.ShipDate ,
b.PurchaseOrderNumber ,
b.TaxAmt ,
c.PersonID ,
c.AccountNumber ,
c.StoreID
FROM
Sales.SalesOrderDetail a
RIGHT OUTER JOIN Sales.SalesOrderHeader b ON a.SalesOrderID =
b.SalesOrderID
LEFT OUTER JOIN Sales.Customer c ON b.CustomerID = c.CustomerID
WHERE
1 = 1 '
IF @ModifiedDate IS NOT NULL
SET @sql = @sql + ' AND a.ModifiedDate = @xModifiedDate '
IF @ShipDate IS NOT NULL
SET @sql = @sql + ' AND b.ShipDate = @xShipDate '
IF @StoreID IS NOT NULL
SET @sql = @sql + ' AND c.StoreID = @xStoreID '
SET @parameters =
'@xModifiedDate AS DATETIME ,
@xShipDate AS DATETIME ,
@xStoreID AS INT' ;
EXEC sp_executesql @sql, @parameters,
@ModifiedDate, @ShipDate, @StoreID ;
GO
------------------------------------------------ now execute it with sample values
EXEC Sales.SalesOrderDetailSearch @ModifiedDate = '2008-04-30 00:00:00.000'
EXEC Sales.SalesOrderDetailSearch @ShipDate = '2008-04-30 00:00:00.000'
EXEC Sales.SalesOrderDetailSearch @StoreID = 602
Execution statistics:
There is no doubt that this solution is the best one! Here is the comparison chart (lower is better).
You can find more information about the last solution on Erland Sommarskog's website.
Concatenate values in one column
This is another common problem that fits our discussion. In this example we cover only the COALESCE and
ISNULL solutions, and at the end we will see an alternative solution which performs better than the
CASE-based solutions.
COALESCE
The next code concatenates the values of the column “ProductID”, delimiting each with a comma separator.
USE AdventureWorks2012
GO
DECLARE @sql NVARCHAR(MAX);
SELECT @sql = COALESCE(@sql + ', ', '') + CONVERT(NVARCHAR(100), ProductID)
FROM Sales.SalesOrderDetail
WHERE SalesOrderID < 53000
Execution statistics:
This code executed in 13 seconds in our test system.
ISNULL
Now we rewrite above code and use ISNULL instead of COALESCE:
USE AdventureWorks2012
GO
DECLARE @sql NVARCHAR(MAX);
SELECT @sql = ISNULL(@sql + ', ', '') + CONVERT(NVARCHAR(100), ProductID)
FROM Sales.SalesOrderDetail
WHERE SalesOrderID < 53000
Execution statistics:
The duration decreased to 3 seconds.
XML
It is time to use an alternative solution to overcome this problem.
USE AdventureWorks2012
GO
DECLARE @sql NVARCHAR(MAX);
SELECT @sql = ( SELECT STUFF(( SELECT ',' + CONVERT(NVARCHAR(100),
ProductID) AS [text()]
FROM Sales.SalesOrderDetail
WHERE SalesOrderID < 53000
FOR
XML PATH('')
), 1, 1, '')
) ;
The duration decreased to 21 milliseconds. Here is the comparison chart (lower is better).
Note that XML runs at the lowest duration.
There is no doubt that this solution is the best one. But because it uses XML, this solution has some
limitations related to XML reserved characters like "<" or ">".
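A commonly used refinement, shown here as a hedged sketch rather than part of the original test, is to add the TYPE directive and read the XML back with the value() method, which returns the original characters instead of their entitized forms (for example '<' rather than '&lt;'):
USE AdventureWorks2012
GO
DECLARE @sql NVARCHAR(MAX);
SELECT @sql = STUFF(( SELECT ',' + CONVERT(NVARCHAR(100), ProductID) AS [text()]
FROM Sales.SalesOrderDetail
WHERE SalesOrderID < 53000
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)'), 1, 1, '') ;
This variant may run slightly slower than the plain FOR XML PATH version, but it avoids the reserved-character problem.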
Branch program execution based on switch between possible values
It is common to use the CHOOSE function to write cleaner code. But is it the best solution to achieve
optimal performance? In this section we will discuss this question.
CHOOSE
Let’s start with an example that uses CHOOSE as its solution.
USE AdventureWorks2012 ;
GO
SELECT *
FROM Sales.SalesOrderDetail
WHERE CHOOSE(OrderQty, 'J', 'I', 'H', 'G', 'F', 'E', 'D', 'C', 'B', 'A')
IN ( 'J', 'Q', 'H', 'G', 'X', 'E', 'D', 'Y', 'B', 'A', NULL )
GO
Execution statistics:
This code executed in 352 milliseconds in our test system.
UDF function
Now we rewrite the above code and use a Table-Valued Function to produce the CHOOSE list:
USE AdventureWorks2012 ;
GO
CREATE FUNCTION ufnLookup ()
RETURNS TABLE
AS
RETURN
SELECT 1 AS Indexer, 'J' AS val
UNION ALL
SELECT 2, 'I'
UNION ALL
SELECT 3, 'H'
UNION ALL
SELECT 4, 'G'
UNION ALL
SELECT 5, 'F'
UNION ALL
SELECT 6, 'E'
UNION ALL
SELECT 7, 'D'
UNION ALL
SELECT 8, 'C'
UNION ALL
SELECT 9, 'B'
UNION ALL
SELECT 10, 'A'
GO
SELECT *
FROM Sales.SalesOrderDetail a
JOIN dbo.ufnLookup() b ON a.OrderQty = b.Indexer
WHERE b.val IN ( 'J', 'Q', 'H', 'G', 'X', 'E', 'D', 'Y', 'B', 'A', NULL ) ;
Execution statistics:
The duration decreased to 195 milliseconds.
Permanent Lookup Table
It is time to use an alternative solution to overcome this problem.
USE AdventureWorks2012 ;
GO
CREATE TABLE LookupTable
( id INT PRIMARY KEY, val CHAR(1) ) ;
GO
INSERT dbo.LookupTable
( id, val )
SELECT 1 AS Indexer, 'J' AS val
UNION ALL
SELECT 2, 'I'
UNION ALL
SELECT 3, 'H'
UNION ALL
SELECT 4, 'G'
UNION ALL
SELECT 5, 'F'
UNION ALL
SELECT 6, 'E'
UNION ALL
SELECT 7, 'D'
UNION ALL
SELECT 8, 'C'
UNION ALL
SELECT 9, 'B'
UNION ALL
SELECT 10, 'A' ;
GO
SELECT *
FROM Sales.SalesOrderDetail a
JOIN dbo.LookupTable b ON a.OrderQty = b.Id
WHERE b.val IN ( 'J', 'Q', 'H', 'G', 'X', 'E', 'D', 'Y', 'B', 'A', NULL )
The duration decreased to 173 milliseconds. The next figure shows the comparison chart between these
solutions (lower is better).
This solution is the best one. As the number of values in the parameter list of the CHOOSE function
increases, performance decreases. So by using a permanent lookup table that benefits from a physical
index, we can achieve the best performance.
More Readability
The most important goal of using these simplified CASE statements is achieving cleaner code. Many times
the code is so large that the SELECT list becomes more than a hundred lines of code, so there is a
significant reason to use these statements. I faced a simple problem just a few years ago. At first sight it
seemed that the solution should be very simple, but after writing the code using CASE, I found that I was
in trouble. The problem was simple: assume that a department store has two discount plans, one based on
the purchase amount, and the other based on the distance from the customer's home to the store, but the
rule is that only the greater of the two discounts is applicable. The next code shows two solutions, the
first using CASE and the second using IIF.
IF OBJECT_ID('tempdb..#temp', 'U') IS NOT NULL
DROP TABLE #temp ;
CREATE TABLE #temp ( CustomerId INT, Bill MONEY, Distance INT ) ;
INSERT #temp
( CustomerId, Bill, Distance )
VALUES ( 1, 30.00, 3 ),
( 2, 10.00, 8 ),
( 3, 5.00, 14 ),
( 4, 20.00, 21 ),
( 5, 25.00, 23 ),
( 6, 5.00, 27 ) ;
SELECT *
FROM #temp
-- solution using CASE
SELECT
CASE
WHEN
CASE WHEN Bill < 10.00 THEN 10 ELSE 20 END > CASE WHEN Distance <
10 THEN 7 ELSE 13 END
THEN CASE WHEN Bill < 10.00 THEN 10 ELSE 20 END
ELSE CASE WHEN Distance < 10 THEN 7 ELSE 13 END
END AS Discount
FROM #temp
--solution using IIF
SELECT
IIF( IIF( Bill < 10.00 , 10 ,20 ) > IIF( Distance < 10 , 7 , 13 )
,IIF( Bill < 10.00 , 10 ,20 ) , IIF( Distance < 10 , 7 , 13 )
) AS Discount
FROM #temp
As this code illustrates, the IIF solution is more readable.
Conclusion
Using simplified CASE expression statements results in cleaner code and speeds up development
time, but they show poor performance in some situations. So if we are in the performance tuning phase of
software development, it is better to think about alternative solutions.
Structured Error Handling Mechanism in SQL Server 2012
The goal of this article is to provide a simple and easy to use error handling mechanism with minimum
complexity.
Problem definition
There are many questions in the MSDN forum and other Internet communities about Error Handling in
SQL Server, such as:
 Is there any structured Error Handling mechanism in SQL SERVER?
 Is it a good idea to use a general procedure as a modular Error Handler routine?
 What are the benefits of THROW, when we have RAISERROR?
 I want to check a condition in a TRY part. How can I control the flow of execution and raise an
error?
 Does the CATCH part automatically rollback the statements within the TRY part?
 Can someone use TRANSACTION in the TRY/CATCH block?
Introduction
There are many articles written by the best experts in this context and there are complete references
about Error Handling in SQL Server. The goal of this article is to provide a simple and easy to use error
handling mechanism with minimum complexity. Therefore I will try to address this topic from a problem
solving approach and particularly in SQL Server 2012 version. So the road map of this article is to cover
the above questions as well as providing a step by step tutorial to design a structured mechanism for error
handling in SQL Server 2012 procedures.
Solution
Is there any structured Error Handling mechanism in SQL Server?
Yes, there is. The TRY/CATCH construct is the structured mechanism for error handling in SQL Server 2005
and later. This construct has two parts; we can try executing some statements in TRY block and handling
errors in the CATCH block if they occur. Therefore, the simplest error handling structure can be like this:
 TRY
o Try executing statements
 CATCH
o Handle the errors if they occur
Here is a sample code to provide the above structure in the simplest form:
SET NOCOUNT ON;
BEGIN TRY
-- Start to try executing statements
SELECT 1 / 0;   /* Executing statements */
END TRY
-- End of trying to execute statements
BEGIN CATCH
-- Start to Handle the error if occurs
PRINT 'Error occurs!'   /* Handle the error */
END CATCH
-- End of Handling the error if occurred
--result
Will all statements in the TRY block try to execute?
When executing statements in the TRY block, if an error occurs the flow of execution transfers to
the CATCH block. So the answer is NO!
We can see this behavior with an example. After executing the following code, statement no. 3 does not
try to execute, because the flow of execution transfers to the CATCH block as soon as statement no. 2
raises an error.
SET NOCOUNT ON;
BEGIN TRY
-- Start to try executing statements
PRINT 'Before Error!'   -- Statement no1
SELECT 1 / 0;           -- Statement no2
PRINT 'After Error!'    -- Statement no3
END TRY
-- End of trying to execute statements
BEGIN CATCH
-- Start to Handle the error if occurs
PRINT 'Error occurs!'   /* Handle the error */
END CATCH
-- End of Handling the error if occurred
--result
Does the CATCH part automatically handle the errors?
No. The role of the TRY/CATCH construct is just providing a mechanism to try executing SQL statements.
Therefore, we need to use another construct or statements to handle the errors in the CATCH block that
I explain later. For instance, the following code will try to execute a divide by zero statement. It does not
automatically handle any errors. In fact, in this sample code, when an error occurs the flow control
immediately transfers to the CATCH block, but in the CATCH block we do not have any statement to tell
us that there was an error!
SET NOCOUNT ON;
BEGIN TRY
-- Start to try executing statements
SELECT 1 / 0;   -- Statement
END TRY
-- End of trying to execute statements
BEGIN CATCH
-- Start to Handle the error if occurs
END CATCH
-- End of Handling the error if occurred
--result
In the CATCH block we can handle the error and send the error message to the application. So we need
an element to show what error occurs. This element is RAISERROR. So the error handling structure could
be like this:
 TRY
o Try executing statements
 CATCH
o Handle the error if occurs
 RAISERROR
Here is sample code to produce the above structure:
SET NOCOUNT ON;
BEGIN TRY
-- Start to try executing statements
SELECT 1 / 0;   -- Statement
END TRY
-- End of trying to execute statements
BEGIN CATCH
-- Start to Handle the error if occurs
RAISERROR('Error!!!', 16, 1);
END CATCH
-- End of Handling the error if occurred
--result
The RAISERROR itself needs other elements to identify the error number, error message, etc. Now we
can complete the error handling structure:
 TRY
o Try executing statements
 CATCH
o Handle the error if occurs
 RAISERROR
 ERROR_NUMBER()
 ERROR_MESSAGE()
 ERROR_SEVERITY()
 ERROR_STATE()
 ERROR_PROCEDURE()
 ERROR_LINE()
Here is sample code to produce the above structure:
SET NOCOUNT ON;
BEGIN TRY
-- Start to try executing statements
SELECT 1 / 0;   -- Statement
END TRY
-- End of trying to execute statements
BEGIN CATCH
-- Start to Handle the error if occurs
DECLARE @ErrorMessage NVARCHAR(4000);
DECLARE @ErrorSeverity INT;
DECLARE @ErrorState INT;
SELECT
@ErrorMessage = ERROR_MESSAGE(),
@ErrorSeverity = ERROR_SEVERITY(),
@ErrorState = ERROR_STATE();
RAISERROR (@ErrorMessage, -- Message text.
@ErrorSeverity, -- Severity.
@ErrorState -- State.
);
END CATCH
-- End of Handling the error if occurred
--result
Is it a good idea to use a general procedure as a modular Error Handler routine?
From a modular programming approach it is recommended to create a stored procedure that does the
RAISERROR job. But I believe that using a modular procedure (I call it spErrorHandler) to re-raise errors
is not a good idea. Here are my reasons:
1. When we call RAISERROR in the procedure “spErrorHandler”, we have to add the name of the procedure
in which the error occurred to the Error Message. This will confuse the application end-users
(customers). A customer does not want to know which part of his car is damaged. He prefers that his car just
send him a simple message which tells him there is an error in its functions. In the software world it is
more important to send a simple (English) message to the customer, because if we send a complex error
message, he will be afraid of what will happen to his critical data!
2. If we accept the first reason and decide to resolve this issue, we need to send a simple message to the
client application. So we will lose the procedure name in which the error occurred, and other useful
debugging information, unless we insert this useful information into an Error-Log table.
You can test this scenario with the following code:
CREATE PROCEDURE spErrorHandler
AS
SET NOCOUNT ON;
DECLARE @ErrorMessage NVARCHAR(4000);
DECLARE @ErrorSeverity INT;
DECLARE @ErrorState INT;
SELECT
@ErrorMessage = ERROR_MESSAGE(),
@ErrorSeverity = ERROR_SEVERITY(),
@ErrorState = ERROR_STATE();
RAISERROR (@ErrorMessage, -- Message text.
@ErrorSeverity, -- Severity.
@ErrorState -- State.
);
go
-----------------------------------------
CREATE PROCEDURE spTest
AS
SET NOCOUNT ON;
BEGIN TRY
SELECT 1 / 0;
END TRY
BEGIN CATCH
-- Start to try executing statements
-- Statement
-- End of trying to execute statements
-- Start to Handle the error if occurs
EXEC spErrorHandler;
END CATCH
go
exec spTest;
-- End of Handling the error if occurred
--result
As is illustrated in this figure, when using spErrorHandler, the values of ERROR_PROCEDURE() and
ERROR_NUMBER() are changed in the output. This behavior is because of the RAISERROR functionality.
This function always re-raises the new exception, so spErrorHandler always shows that the value of
ERROR_PROCEDURE() simply is “spErrorHandler”. As I said before there are two workarounds to fix this
issue. First is concatenating this useful data with the error message and raise it, which I spoke about in
reason one. Second is inserting this useful data in another table just before we re-raise the error in
spErrorHandler.
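A hedged sketch of that second workaround is shown below; the table dbo.ErrorLog and its columns are illustrative names, not part of this article. The idea is to persist the diagnostic values inside the CATCH block before re-raising a simplified message:
CREATE TABLE dbo.ErrorLog
(
ErrorLogId INT IDENTITY(1, 1) PRIMARY KEY ,
ErrorNumber INT ,
ErrorMessage NVARCHAR(4000) ,
ErrorProcedure NVARCHAR(128) ,
ErrorLine INT ,
LoggedAt DATETIME DEFAULT GETDATE()
) ;
GO
BEGIN TRY
SELECT 1 / 0;
END TRY
BEGIN CATCH
-- keep the debugging details for the developer
INSERT dbo.ErrorLog ( ErrorNumber, ErrorMessage, ErrorProcedure, ErrorLine )
VALUES ( ERROR_NUMBER(), ERROR_MESSAGE(), ERROR_PROCEDURE(), ERROR_LINE() ) ;
-- send a simple message to the customer
RAISERROR('An error occurred. Please contact support.', 16, 1);
END CATCH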
Now, we test the above sample without using spErrorHandler:
CREATE PROCEDURE spTest
AS
SET NOCOUNT ON;
BEGIN TRY
SELECT 1 / 0;
END TRY
BEGIN CATCH
-- Start to try executing statements
-- Statement
-- End of trying to execute statements
-- Start to Handle the error if occurs
DECLARE @ErrorMessage NVARCHAR(4000);
DECLARE @ErrorSeverity INT;
DECLARE @ErrorState INT;
SELECT
@ErrorMessage = ERROR_MESSAGE(),
@ErrorSeverity = ERROR_SEVERITY(),
@ErrorState = ERROR_STATE();
RAISERROR (@ErrorMessage, -- Message text.
@ErrorSeverity, -- Severity.
@ErrorState -- State.
);
END CATCH
-- End of Handling the error if occurred
go
exec spTest;
--result
As you see in this figure, the procedure name and error number are correct. By the way, I prefer that if
one customer reports an error, I go for SQL Server Profiler, simulate the environment completely, and
test those SQL statements in SSMS to recreate the error and debug it based on the correct error number
and procedure name.
In the THROW section, I will explain that the main advantage of THROW over RAISERROR is that it shows
the correct line number of the code that raises the error, which is so helpful for a developer in
debugging his code.
3. Furthermore, with the THROW statement introduced in SQL SERVER 2012, there is no need to write
extra code in the CATCH block. Therefore there is no need to write a separate procedure except for
tracking the errors in another error log table. In fact this procedure is not an error handler, it's an error
tracker. I explain the THROW statement in the next section.
What are the benefits of THROW when we have RAISERROR?
The main objective of error handling is that the customer knows that an error occurred and reports it to
the software developer. Then the developer can quickly realize the reason for the error and improve his
code. In fact error handling is a mechanism that eliminates the blindness of both customer and
developer.
To improve this mechanism Microsoft SQL Server 2012 introduced the THROW statement. Now I will
address the benefits of THROW over RAISERROR.
Correct line number of the error!
As I said earlier this is the main advantage of using THROW. The following code will enlighten this great
feature:
create proc sptest
as
set nocount on;
BEGIN TRY
SELECT 1/0
END TRY
BEGIN CATCH
declare @msg nvarchar(2000) = error_message();
raiserror( @msg , 16, 1);
THROW
END CATCH
go
exec sptest
--result
As you can see in this figure, the line number of the error that RAISERROR reports to us is always the line
number of the RAISERROR statement itself in the code. But the error line number reported by THROW is
line 6 in this example, which is the line where the error occurred.
Easy to use
Another benefit of using the THROW statement is that there is no need for extra code in RAISERROR.
Complete termination
The severity level raised by THROW is always 16. But the more important feature is that when the
THROW statement in a CATCH block is executed, then other code after this statement will never run.
The following sample script shows how this feature protects the code compared to RAISERROR:
create proc sptest
as
set nocount on;
BEGIN TRY
SELECT 1/0
END TRY
BEGIN CATCH
declare @msg nvarchar(2000) = error_message();
raiserror( @msg , 16, 1);
CREATE TABLE #Saeid (id int)
INSERT #Saeid
VALUES ( 101 );
SELECT *
FROM #Saeid;
DROP TABLE #Saeid;
THROW
PRINT 'This will never print!!!';
END CATCH
go
exec sptest
--result
Independence of sys.messages
This feature makes it possible to re-throw custom message numbers without the need to
use sp_addmessage to add the number.
The feature is in real time, as you can see in this code:
create proc sptest
as
set nocount on;
BEGIN TRY
SELECT 1/0
END TRY
BEGIN CATCH
THROW 60000, 'This a custom message!', 1;
END CATCH
go
exec sptest
Tip
The statement before the THROW statement must be followed by the semicolon (;) statement
terminator.
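A minimal sketch of this rule (the PRINT statement is just a placeholder):
BEGIN TRY
SELECT 1 / 0;
END TRY
BEGIN CATCH
PRINT 'Re-raising the original error';  -- the semicolon before THROW is required
THROW
END CATCH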
I want to check a condition in the TRY block. How can I control the flow of execution and
raise the error?
This is a simple job! Now I change this question to this one:
“How can I terminate the execution of the TRY block?”
The answer is using THROW in the TRY block. Its severity level is 16, so it will terminate execution in the
TRY block. We know that when any statement in the TRY block terminates (encounters an error) then
immediately execution goes to the CATCH block. In fact the main idea is to THROW a custom error as in
this code:
create proc sptest
as
set nocount on;
BEGIN TRY
THROW 60000, 'This a custom message!', 1;
END TRY
BEGIN CATCH
THROW
END CATCH
go
exec sptest
--result
As you can see, we handle the error step by step. In the next session we will complete this structure.
Does the CATCH part automatically rollback the statements within the TRY part?
This is a misconception that I sometimes hear. I will explain this problem with a little example. After
executing the following code the table “dbo.Saeid” still exists. This demonstrates that the TRY/CATCH
block does not implement implicit transactions.
CREATE PROC sptest
AS
SET NOCOUNT ON;
BEGIN TRY
CREATE TABLE dbo.Saeid
( id int );
--No1
SELECT 1/0
--No2
END TRY
BEGIN CATCH
THROW
END CATCH
go
-------------------------------------------
EXEC sptest;
go
SELECT *
FROM dbo.Saeid;
--result
Can someone use TRANSACTION in the TRY/CATCH block?
The previous question showed that if we want to roll back all the statements in a TRY block, we need to
use an explicit transaction in the TRY block. But the main question here is:
“Where is the right place to commit and rollback? “
It’s a complex discussion that I would not like to jump into in this article. But there is a simple template
that we can use for procedures (not triggers!).
This is that template:
CREATE PROC sptest
AS
SET NOCOUNT ON;
BEGIN TRY
SET XACT_ABORT ON;          --set xact_abort option
BEGIN TRAN                  --begin transaction
CREATE TABLE dbo.Hasani
( id int );
SELECT 1/0
COMMIT TRAN                 --commit transaction
END TRY
BEGIN CATCH
IF @@TRANCOUNT > 0          --check if there is an open transaction
ROLLBACK TRAN;              --rollback transaction
THROW
END CATCH
go
EXEC sptest;
go
SELECT * FROM dbo.Hasani;
The elements of this structure are:
 TRY block
o XACT_ABORT
o Begin transaction
 Statements to try
o Commit transaction
 CATCH block
o Check @@TRANCOUNT and rollback all transactions
o THROW
Here is a short description of two parts of the above code:
XACT_ABORT
In general it is recommended to set the XACT_ABORT option to ON in our TRY/CATCH blocks in procedures.
With this option set to ON, when a run-time error occurs the entire user-defined transaction is rolled
back.
@@TRANCOUNT
We check this global variable to ensure there is no open transaction. If there is an open transaction it’s
time to execute rollback statements. This is a must in all CATCH blocks, even if you do not have any
transactions in that procedure. An alternative is to use XACT_STATE().
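As a hedged variation of the template above (the procedure and table names sptest2 and dbo.Hasani2 are illustrative assumptions), the CATCH block can test XACT_STATE() instead of @@TRANCOUNT; a return value of -1 means the transaction is open but uncommittable, 1 means it is open and committable, and 0 means there is no open transaction:
CREATE PROC sptest2
AS
SET NOCOUNT ON;
BEGIN TRY
SET XACT_ABORT ON;
BEGIN TRAN
CREATE TABLE dbo.Hasani2 ( id int );
SELECT 1/0
COMMIT TRAN
END TRY
BEGIN CATCH
IF XACT_STATE() <> 0
ROLLBACK TRAN;   -- any open transaction (committable or not) is rolled back
THROW
END CATCH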
Conclusion
Introduction of the THROW statement is a big feat in Error Handling in SQL Server 2012. This statement
enables database developers to focus on accurate line numbers of the procedure code. This article
provided a simple and easy to use error handling mechanism with minimum complexity using SQL Server
2012. By the way, there are some more complex situations that I did not cover in this article. If you need
to dive deeper, you can see the articles in the See Also section.
BOL link http://technet.microsoft.com/en-us/library/ms175976.aspx
Error Handling within Triggers Using T-SQL
The goal of this article is to provide a simple and easy to use error handling mechanism within triggers
context.
Problem definition
Triggers are strange objects that have their own rules!
 The first rule says that triggers are part of the invoking transaction (the transaction that fired them).
Yes, this is true, and it means that at the beginning of the trigger, both the value of @@trancount
and the return value of xact_state() are "1". So, if we use COMMIT or ROLLBACK inside the trigger,
their values will change to "0" just after executing these statements.
 The second strange rule is that if the transaction is ended inside the trigger, the database raises an
abort error. An example of this rule is executing COMMIT or ROLLBACK within the trigger.
Next code shows these rules:
-- create test table
IF OBJECT_ID('dbo.Test', 'U') IS NOT NULL
DROP TABLE dbo.Test ;
GO
CREATE TABLE dbo.Test
( Id INT IDENTITY PRIMARY KEY,
NAME NVARCHAR(128)
) ;
GO
-- create test trigger
CREATE TRIGGER dbo.TriggerForTest
ON dbo.Test
AFTER INSERT
AS
BEGIN
SET NOCOUNT ON;
-- declare variables
DECLARE @trancount CHAR(1) ,
@XACT_STATE CHAR(1) ;
-- fetch and print values at the beginning of the trigger
SET @trancount = @@TRANCOUNT ;
SET @XACT_STATE = XACT_STATE() ;
PRINT '-----------------------------------------------------------------------' ;
PRINT 'When trigger starts @@trancount value is (' + @trancount + ' ).';
PRINT 'When trigger starts XACT_STATE() return value is (' + @XACT_STATE
+ ' ).';
PRINT '-----------------------------------------------------------------------' ;
-- ending the transaction inside the trigger
COMMIT TRAN ;
-- fetch and print values again
SET @trancount = @@TRANCOUNT ;
SET @XACT_STATE = XACT_STATE() ;
PRINT 'After executing COMMIT statement, @@trancount value is (' +
@trancount + ' ).';
PRINT 'After executing COMMIT statement, XACT_STATE() return value is (' +
@XACT_STATE + ' ).';
PRINT '-----------------------------------------------------------------------' ;
END ;
GO
-- test time!
INSERT dbo.Test ( Name )
VALUES ( N'somthing' ) ;
 So, what is the Error Handling mechanism within Triggers?
Solution
Classic Solution
This solution uses the second rule to rollback trigger and raise an error. The following code shows this
mechanism:
-- create test table
IF OBJECT_ID('dbo.Test', 'U') IS NOT NULL
DROP TABLE dbo.Test ;
GO
CREATE TABLE dbo.Test
( Id INT IDENTITY PRIMARY KEY,
NAME NVARCHAR(128)
) ;
GO
-- create test trigger
CREATE TRIGGER dbo.TriggerForTest
ON dbo.Test
AFTER INSERT
AS
BEGIN
SET NOCOUNT ON;
IF 1 = 1
BEGIN
-- rollback and end the transaction inside the trigger
ROLLBACK TRAN ;
-- raise an error
RAISERROR ( 'Error Message!', 16, 1) ;
END
END ;
GO
-- test time!
INSERT dbo.Test ( Name )
VALUES ( N'somthing' ) ;
This solution works fine as long as RAISERROR is the last statement in the trigger. If we have some statements
after RAISERROR, they will execute, as shown in the next code:
-- create test table
IF OBJECT_ID('dbo.Test', 'U') IS NOT NULL
DROP TABLE dbo.Test ;
GO
CREATE TABLE dbo.Test( Id INT IDENTITY PRIMARY KEY, NAME NVARCHAR(128)
) ;
GO
-- create test trigger
CREATE TRIGGER dbo.TriggerForTest
ON dbo.Test
AFTER INSERT
AS
BEGIN
SET NOCOUNT ON;
IF 1 = 1
BEGIN
-- rollback and end the transaction inside the trigger
ROLLBACK TRAN ;
-- raise an error
RAISERROR ( 'Error Message!', 16, 1) ;
END
INSERT dbo.Test ( Name )
VALUES ( N'extra' ) ;
END ;
GO
-- test time!
INSERT dbo.Test ( Name ) VALUES ( N'somthing' ) ;
SELECT * FROM dbo.Test
Modern Solution
This solution is applicable to SQL Server 2012 and above versions. THROW statement enhances the error
handling in triggers. It rollback the statements and throw an error message. Next code shows this
mechanism:
-- create test table
IF OBJECT_ID('dbo.Test', 'U') IS NOT NULL
DROP TABLE dbo.Test ;
GO
CREATE TABLE dbo.Test( Id INT IDENTITY PRIMARY KEY,NAME NVARCHAR(128)
) ;
GO
-- create test trigger
CREATE TRIGGER dbo.TriggerForTest
ON dbo.Test
AFTER INSERT
AS
BEGIN
SET NOCOUNT ON;
IF 1 = 1
-- just throw!
THROW 60000, 'Error Message!', 1 ;
END ;
GO
-- test time!
INSERT dbo.Test ( Name ) VALUES ( N'somthing' ) ;
SELECT * FROM dbo.Test ;
Conclusion
As I explained in the former article, introducing the THROW statement was a revolutionary step in
SQL Server 2012 Error Handling. This article proves it again, this time with triggers.
Custom Sort in Acyclic Digraph
Problem definition
This article is derived from this MSDN forum post. This article addresses the task of how to present
a Tree in a custom order. In fact, the article title could be pre-order tree traversal.
Vocabulary
Digraph (Directed Graph)
A digraph is a graph, or a set of nodes connected by the edges, where the edges have a direction
associated with them.
Acyclic Graph
An acyclic graph is a graph without any cycle.
Acyclic Digraph
An acyclic digraph (directed acyclic graph), also known as a DAG, is a directed graph with no directed
cycles.
Topological Ordering
Every DAG has a topological ordering, an ordering of the vertices such that the starting endpoint of
every edge occurs earlier in the ordering than the ending endpoint of the edge.
Solution
The code below resolves the stated problem of how to present a non-topological ordering of a DAG (i.e.,
custom sorting an acyclic digraph). Executing the following script will create and populate a resultant
test table demonstrating the stated solution.
IF OBJECT_ID('tempdb..#test', 'U') IS NOT NULL
DROP TABLE #test;
GO
CREATE TABLE #test
(
Childid INT ,
parentid INT
);
GO
INSERT INTO #test
( Childid, parentid )
VALUES
( 100, 0 ),
( 102, 100 ),
( 103, 100 ),
( 104, 102 ),
( 105, 102 ),
( 106, 104 ),
( 107, 103 ),
( 109, 105 );
GO
The image below shows the sample data used in this solution.
The desired order is shown below.
The solution is to produce paths that differ from topological ordering. In the following code, changing
the ORDER BY list in the ROW_NUMBER function changes the sort order, producing paths that differ
from the topological ordering.
DECLARE @rootId AS INT = 100;
WITH
Subs
AS ( SELECT Childid ,
1 AS lvl ,
CAST(1 AS VARBINARY(MAX)) AS PathSort
FROM #test
WHERE Childid = @rootId
UNION ALL
SELECT C.Childid ,
P.lvl + 1 ,
P.PathSort + CAST(ROW_NUMBER() OVER (
PARTITION BY C.parentid ORDER BY C.Childid )AS BINARY(5))
FROM Subs AS P
JOIN #test AS C ON C.parentid = P.Childid
)
SELECT Childid ,
ROW_NUMBER() OVER ( ORDER BY PathSort ) AS CustomSort,
REPLICATE(' | ', lvl) + CAST(Childid AS NVARCHAR(100)) ChildInTree
FROM Subs
ORDER BY CustomSort;
The resulting output is shown in the following figure.
CHAPTER 7:
String Functions
Patindex Case Sensitive Search
This article is the result of quick research into the problem of using PATINDEX to perform a case sensitive
search against a case insensitive column. The BOL does not show examples of how to use a particular
collation with the PATINDEX function. A relevant thread in the MSDN Transact-SQL forum showed the
syntax.
Thanks to Jeff Moden I found that I can use Binary collation to be able to use ranges in the search.
So, if we want to split proper names such as JohnDoe, EdgarPo, etc. into two parts, we can use the
following code:
DECLARE @t TABLE (Col VARCHAR(20))
INSERT INTO @t
SELECT 'JohnDoe'
UNION ALL
SELECT 'AvramLincoln'
UNION ALL
SELECT 'Brad Pitt'
SELECT Col
,COALESCE(STUFF(col, NULLIF(PATINDEX('%[a-z][A-Z]%', Col COLLATE Latin1_General_BIN), 0) + 1, 0, ' '), Col) AS NewCol
FROM @t
Hopefully this article may help others looking for a case sensitive search solution in SQL Server.
Remove Leading and Trailing Zeros
In this post I have consolidated few of the methods to remove leading and trailing zeroes in a string.
Here is an example:
DECLARE @BankAccount TABLE (AccNo VARCHAR(15))
INSERT @BankAccount SELECT '01010'
INSERT @BankAccount SELECT '0010200'
INSERT @BankAccount SELECT '000103000'
SELECT * FROM @BankAccount
--Methods to remove leading zeros
-- 1.) converting to integer data type
SELECT CONVERT(INT,AccNo) AccNo FROM @BankAccount
-- NN - note, this method will only work if the data are clean
-- 2.) using SUBSTRING
SELECT SUBSTRING(AccNo,PATINDEX('%[^0]%',AccNo),LEN(AccNo)) AccNo FROM @BankAccount
-- 3.) using REPLACE, LTRIM & RTRIM
SELECT REPLACE(LTRIM(REPLACE(AccNo,'0',' ')),' ','0') AccNo FROM @BankAccount
--To remove both leading & trailing zeros
SELECT REPLACE(RTRIM(LTRIM(REPLACE(AccNo,'0',' '))),' ','0') AccNo FROM @BankAccount
T-SQL: How to Find Rows with Bad Characters
One of the commonly asked questions in the Transact-SQL Forum on MSDN is how to filter rows
containing bad characters. Also, oftentimes these bad characters are not known; say, in one of
the recent posts the question was to filter all the rows where characters were greater than ASCII 127.
The first step towards a solution is to realize that in order to quickly filter out something, we may want to
know the list of allowed characters first.
I will now show several samples of how important it is to know the "good" characters in order to filter out
the "bad" ones.
Let's suppose we only want alpha-numeric characters to remain and everything else should be
considered bad rows.
For all our examples let's create the following table variable:
DECLARE @TableWithBadRows TABLE (
Id INT identity(1, 1) PRIMARY KEY
,description VARCHAR(max)
);
INSERT INTO @TableWithBadRows (description)
VALUES ('test1'), ('I am OK'), ('Filter me, please.');
Our pattern then will be
SELECT * FROM @TableWithBadRows WHERE description LIKE '%[^a-z0-9]%';
where a-z means a range of all letters from a to z, 0-9 means a range of all numbers from 0 to 9, and ^
means everything which is not one of the listed characters.
The above code will return the last 2 rows. The second row is returned because it contains a space character
which was not included in the list of allowed characters.
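If the space character should also be allowed, we can simply add it to the character class (continuing the same example):
SELECT * FROM @TableWithBadRows WHERE description LIKE '%[^a-z0-9 ]%';
Now only the third row is returned, because of the comma and the period.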
Now, what should we do if we want to keep all the "normal" characters and only disallow characters which
are greater than ASCII 127? In this case, we may want to build the pattern in a loop.
Here is some code demonstrating this idea:
DECLARE @TableWithBadRows TABLE (
Id INT identity(1, 1) PRIMARY KEY
,description VARCHAR(max)
);
INSERT INTO @TableWithBadRows (description)
VALUES ('test1')
,('I am OK')
,('Filter me, please.')
,('Let them be & be happy')
,(CHAR(200))
,(CHAR(137))
,(CHAR(10) + CHAR(13) + 'Test more');
SELECT *
FROM @TableWithBadRows;
SELECT *
FROM @TableWithBadRows
WHERE description LIKE '%[^A-Z0-9%]%';
DECLARE @i INT = 32;
DECLARE @pattern VARCHAR(max) = '^a-Z0-9'
,@ch CHAR(1);
WHILE @i < 47
BEGIN
SET @ch = CHAR(@i)
IF @ch = '_'
SET @pattern = @pattern + '[' + @ch + ']';
ELSE
IF @ch = '['
SET @pattern = @pattern + @ch + @ch;
ELSE
SET @pattern = @pattern + @ch;
SET @i = @i + 1;
END
SET @i = 58;
WHILE @i < 65
BEGIN
SET @ch = CHAR(@i)
IF @ch = '_'
SET @pattern = @pattern + '[' + @ch + ']';
ELSE
IF @ch = '['
SET @pattern = @pattern + @ch + @ch;
ELSE
SET @pattern = @pattern + @ch;
SET @i = @i + 1;
END
SELECT @pattern
SELECT *
FROM @TableWithBadRows
WHERE description LIKE '%[' + @pattern +']%'
As you can see from the second select statement, the CHAR(200) (È) is not being filtered by the a-z filter
as it is apparently considered a letter.
We may try adding binary collation to treat that letter as bad, e.g.
SELECT *
FROM @TableWithBadRows
WHERE description LIKE '%[^A-Za-z0-9% ]%' COLLATE Latin1_General_BIN;
As you see, now this letter is considered a bad row.
This thread, "Getting records with special characters", shows how to create a pattern when the bad
characters are in a special table and also which characters ([, ^, -) we need to escape.
Conclusion
I have shown several examples of filtering bad rows using various patterns.
Random String
Introduction
In this article we are going to show several approaches to building a random string. This is very useful
for maintenance tasks like testing (populating large tables with random values), generating random
passwords and so on...
If you have any other way of doing it, then you are most welcome to edit this article and give us your
insight :-)
Solutions
Let's examine several solutions. These solutions came from forum users, and we will try to put them
into perspective in terms of advantages & disadvantages. We will close the article with conclusions and
recommendations. If you are just looking for the best solution then you can jump to the end.
1. Using NEWID as base string & NEWID to generate a random length
Basic idea
1. Create random string using the function NEWID (), this will give us a random 36 characters
string.
2. Create a random number using the function NEWID, as the string length.
3. Cut the string using the function LEFT
Code
DECLARE @StringMaxLen int = 12
SELECT TOP (1000)
LEFT (CAST (NEWID () AS NVARCHAR(MAX)) , ABS (CHECKSUM (NEWID ())) %
@StringMaxLen + 1)
FROM SYS.OBJECTS A
CROSS JOIN SYS.OBJECTS B
Advantages & Disadvantages
 A: Very fast executing
 A: Very fast writing the code
 A: No need to create UDF (if someone cares about this)
 D: NOT a Random solution!
o The converting of NEWID to NVARCHAR generates a string with a specific format:
XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX.
Therefore we are always going to get the same format in the string. For example, the ninth
character will always be a dash...
 D: NOT the same odds!
o As explained above, we get more chance of getting a dash sign.
 D: Limited to a maximum length of 36 characters!
o We can't choose a length greater than the NEWID length (36 characters), and we probably
want to build 4000 characters sometimes.
o Theoretically, we can always join chains in order to get the length we need. This will fix
the problem of "limited maximum length", but at the same time will create a problem of
inflexibility, as we will need to build the query according to the maximum length.
 D: Limit of characters that can appear in the results!
o We can get only characters that might be in a NEWID value. We can't get any other
character! The only characters that we can get are: uppercase English, numbers, and
the sign "-"
2. Using Clean NEWID as base string & NEWID to generate a random length.
Basic idea
1. Create random string using the function NEWID, this will give us a random 36 characters string.
2. Clean the dash character, this will give us a random 32 characters string.
3. Create a random number using the function NEWID, as the string length.
4. Cut the string using the function LEFT
Code
DECLARE @StringMaxLen int = 12
SELECT TOP (1000)
LEFT (REPLACE(CAST (NEWID () AS NVARCHAR(MAX)),'-','') , ABS (CHECKSUM
(NEWID ())) % @StringMaxLen + 1)
FROM SYS.OBJECTS A
CROSS JOIN SYS.OBJECTS B
GO
Advantages & Disadvantages
 A: Advantages are the same as before.
 D: Disadvantages are basically the same as before, except for the dash character problem, but the
maximum length is now only 32 characters.
3. Performing data manipulation on an existing data
Basic idea
We can use existing data, which is not random, as a base string. Then we use text manipulation
like "data scrambling", "data parsing", "random sorting" and so on, in order to get data that "looks like
random data".
* This idea can be improved significantly by using an existing random data table!
;WITH cte_1 AS(
SELECT ROW_NUMBER() OVER (ORDER BY NEWID() ASC) AS RN, t.name
FROM sys.tables AS t
CROSS JOIN sys.tables AS tt
),
cte_2 AS(
SELECT ROW_NUMBER() OVER (ORDER BY NEWID() ASC) AS RN, t.name
FROM sys.columns AS t
CROSS JOIN sys.columns AS tt
)
SELECT
cte_1.name + cte_2.name AS RandomString1,
REPLICATE(cte_1.name + cte_2.name,CASE WHEN ABS (CHECKSUM (NEWID ())) % 4 =
0 THEN 1 ELSE ABS(CHECKSUM (NEWID ())) % 4 + 1 END) AS RandomString2
FROM cte_1
INNER JOIN cte_2
ON cte_1.RN = cte_2.RN
In the example above we just used the table names in the system as base strings for manipulation. This
is only an example, as this idea (using existing data) can be applied in any way and on any tables that we
want.
Advantages & Disadvantages
D: NOT a Random solution & NOT the same odds!
The solution is based on existing data. The more manipulation we do, the more we can make it "look like"
random data.
D: Limited to the maximum length of the existing data!
D: Limit of characters that can appear in the results!
We can get only characters that exist in the existing data.
D: Slow and inefficient as the number of manipulations on the text grows.
4. Using Random CHAR-> Using Loop to build a flexible string length
Basic idea
We are using a UDF to create a single random string. The function gets 2 parameters: (A) the maximum
length of the string, and (B) whether we need to create a string of the maximum length or of a random length.
/******************************
* Version("2.0.0.0")
* FileVersion("2.0.0.0")
* WrittenBy("Ronen Ariely")
* WebSite("http://ariely.info/
")
* Blog("http://ariely.info/Blog/tabid/83/language/en-US/Default.aspx
")
******************************/
CREATE function [dbo].[ArielyRandomStringFunc_Ver2.0.0](@NumberOfChar int, @IsFixedLength bit = 1)
returns nvarchar(MAX)
WITH EXECUTE AS CALLER
AS
begin
    DECLARE @TotalNumberOfCharToReturn int

    IF (@IsFixedLength = 1)
        SET @TotalNumberOfCharToReturn = @NumberOfChar
    ELSE
        -- I am using my own random function.
        -- You can read more about the reasons here:
        -- Using side-effecting built-in functions inside a UDF (your function)
        -- http://ariely.info/Blog/tabid/83/EntryId/121/Using-side-effectingbuild-in-functions-inside-a-UDF-your-function.aspx
        SET @TotalNumberOfCharToReturn
            = CONVERT(int, (AccessoriesDB.dbo.ArielyRandFunc() * @NumberOfChar) + 1)

    declare @Out as nvarchar(MAX) = ''
    declare @QQ01 as int = 0

    while @QQ01 < @TotalNumberOfCharToReturn begin
        set @QQ01 += 1
        -- This is an insecure function, as we choose any Unicode character without filtering!
        -- I preferred to choose from a secured characters list, as shown in the previous function (ver 1.0.0).
        -- 65535: maximum Unicode character value.
        -- You can limit this value to your own language's values or your needs:
        --   Numbers:            48 - 58
        --   English uppercase:  65 - 91
        --   English lowercase:  97 - 123
        --   Hebrew:             1488 - 1515
        select @Out += ISNULL(NCHAR(CAST(65535 * AccessoriesDB.dbo.ArielyRandFunc() AS INT)), '')
    end
    --print @Out
    RETURN @Out
end
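A quick usage sketch (my addition; it assumes the helper function AccessoriesDB.dbo.ArielyRandFunc referenced above exists in your environment):
-- One fixed-length 20-character string and one random-length string of up to 20 characters
SELECT dbo.[ArielyRandomStringFunc_Ver2.0.0](20, 1) AS FixedLengthString,
       dbo.[ArielyRandomStringFunc_Ver2.0.0](20, 0) AS RandomLengthString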
Advantages & Disadvantages
 A: Relatively fast.
 A: Fully random.
 A: No length limit.
 A: No character limit.
 D: No filtering option for security.
5. Selecting from characters list-> Using Loop to build a flexible string length
Basic idea
The basic idea is the same as above, but with the option to filter characters, since we choose from a list. We
pick a random number in order to choose the character at that position in our list, and we use a loop to
build the entire string.
Code
/******************************
* Version("1.0.0.0")
* FileVersion("1.0.0.0")
* WrittenBy("Ronen Ariely")
* WebSite("http://ariely.info/
")
* Blog("http://ariely.info/Blog/tabid/83/language/en-US/Default.aspx
")
******************************/
CREATE function [dbo].[ArielyRandomStringFunc_Ver1.0.0](@NumberOfChar int, @IsFixedLength bit = 1)
returns nvarchar(MAX)
WITH EXECUTE AS CALLER
AS
begin
    DECLARE @TotalNumberOfCharToReturn int

    IF (@IsFixedLength = 1)
        SET @TotalNumberOfCharToReturn = @NumberOfChar
    ELSE
        -- I am using my own random function.
        -- You can read more about the reasons here:
        -- Using side-effecting built-in functions inside a UDF (your function)
        -- http://ariely.info/Blog/tabid/83/EntryId/121/Using-side-effectingbuild-in-functions-inside-a-UDF-your-function.aspx
        SET @TotalNumberOfCharToReturn
            = CONVERT(int, (AccessoriesDB.dbo.ArielyRandFunc() * @NumberOfChar) + 1)

    DECLARE @AllChar as nvarchar(MAX)
        = N'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890אבגדהוזחטיכלמסעפצקרשת'
    -- It is faster to use a table with an ID column and a char column,
    -- indexed by the ID column: order the table in random order and take
    -- the top @NumberOfChar records. But in that case we get a selection
    -- without repeating values! If we want "with repeating", then we have to
    -- re-choose each character separately.
    -- We could also just select a number and use CHAR(random number) to get a
    -- random char. It is faster, but it is insecure, and I preferred to choose
    -- from a secured characters list here.

    declare @MyRnd as int
    declare @Out as nvarchar(MAX) = ''
    declare @QQ01 as int = 0

    while @QQ01 < @TotalNumberOfCharToReturn begin
        set @QQ01 += 1
        set @MyRnd = (SELECT ((RandNumber * LEN(@AllChar)) + 1) FROM ArielyRandView)
        select @Out += SUBSTRING(@AllChar, @MyRnd, 1)
        --print SUBSTRING(@AllChar,@MyRnd,1)
    end
    --print @Out
    RETURN @Out
end
Advantages & Disadvantages
 A: Relatively fast.
 A: Fully random.
 A: No length limit.
 A: No character limit.
 A: Filtering option for security.
6. Building a fixed-length random string using NEWID and NCHAR -> Cut randomly using LEFT
Basic idea
We build a string manually by concatenating a fixed number of random characters.
Code
SELECT top 1000
    LEFT (
          NCHAR(CAST(1000 * RAND(ABS(CHECKSUM(CAST(NEWID() AS VARCHAR(100))))) AS INT))
        + NCHAR(CAST(1000 * RAND(ABS(CHECKSUM(CAST(NEWID() AS VARCHAR(100))))) AS INT))
        + NCHAR(CAST(1000 * RAND(ABS(CHECKSUM(CAST(NEWID() AS VARCHAR(100))))) AS INT))
        + NCHAR(CAST(1000 * RAND(ABS(CHECKSUM(CAST(NEWID() AS VARCHAR(100))))) AS INT))
        + NCHAR(CAST(1000 * RAND(ABS(CHECKSUM(CAST(NEWID() AS VARCHAR(100))))) AS INT))
        + NCHAR(CAST(1000 * RAND(ABS(CHECKSUM(CAST(NEWID() AS VARCHAR(100))))) AS INT))
        + NCHAR(CAST(1000 * RAND(ABS(CHECKSUM(CAST(NEWID() AS VARCHAR(100))))) AS INT))
        + NCHAR(CAST(1000 * RAND(ABS(CHECKSUM(CAST(NEWID() AS VARCHAR(100))))) AS INT))
        + NCHAR(CAST(1000 * RAND(ABS(CHECKSUM(CAST(NEWID() AS VARCHAR(100))))) AS INT))
        + NCHAR(CAST(1000 * RAND(ABS(CHECKSUM(CAST(NEWID() AS VARCHAR(100))))) AS INT))
        + NCHAR(CAST(1000 * RAND(ABS(CHECKSUM(CAST(NEWID() AS VARCHAR(100))))) AS INT))
        + NCHAR(CAST(1000 * RAND(ABS(CHECKSUM(CAST(NEWID() AS VARCHAR(100))))) AS INT))
        + NCHAR(CAST(1000 * RAND(ABS(CHECKSUM(CAST(NEWID() AS VARCHAR(100))))) AS INT))
        + NCHAR(CAST(1000 * RAND(ABS(CHECKSUM(CAST(NEWID() AS VARCHAR(100))))) AS INT))
        + NCHAR(CAST(1000 * RAND(ABS(CHECKSUM(CAST(NEWID() AS VARCHAR(100))))) AS INT))
        + NCHAR(CAST(1000 * RAND(ABS(CHECKSUM(CAST(NEWID() AS VARCHAR(100))))) AS INT))
        + NCHAR(CAST(1000 * RAND(ABS(CHECKSUM(CAST(NEWID() AS VARCHAR(100))))) AS INT))
        + NCHAR(CAST(1000 * RAND(ABS(CHECKSUM(CAST(NEWID() AS VARCHAR(100))))) AS INT))
        + NCHAR(CAST(1000 * RAND(ABS(CHECKSUM(CAST(NEWID() AS VARCHAR(100))))) AS INT))
        + NCHAR(CAST(1000 * RAND(ABS(CHECKSUM(CAST(NEWID() AS VARCHAR(100))))) AS INT))
        + NCHAR(CAST(1000 * RAND(ABS(CHECKSUM(CAST(NEWID() AS VARCHAR(100))))) AS INT))
        + NCHAR(CAST(1000 * RAND(ABS(CHECKSUM(CAST(NEWID() AS VARCHAR(100))))) AS INT))
        + NCHAR(CAST(1000 * RAND(ABS(CHECKSUM(CAST(NEWID() AS VARCHAR(100))))) AS INT))
        + NCHAR(CAST(1000 * RAND(ABS(CHECKSUM(CAST(NEWID() AS VARCHAR(100))))) AS INT))
        + NCHAR(CAST(1000 * RAND(ABS(CHECKSUM(CAST(NEWID() AS VARCHAR(100))))) AS INT))
        + NCHAR(CAST(1000 * RAND(ABS(CHECKSUM(CAST(NEWID() AS VARCHAR(100))))) AS INT))
        + NCHAR(CAST(1000 * RAND(ABS(CHECKSUM(CAST(NEWID() AS VARCHAR(100))))) AS INT))
        + NCHAR(CAST(1000 * RAND(ABS(CHECKSUM(CAST(NEWID() AS VARCHAR(100))))) AS INT))
        + NCHAR(CAST(1000 * RAND(ABS(CHECKSUM(CAST(NEWID() AS VARCHAR(100))))) AS INT))
        + NCHAR(CAST(1000 * RAND(ABS(CHECKSUM(CAST(NEWID() AS VARCHAR(100))))) AS INT))
        , 1 + CAST(29 * RAND(ABS(CHECKSUM(CAST(NEWID() AS VARCHAR(100))))) AS INT)
    ) AS Str
from sys.columns a CROSS JOIN sys.columns b
Advantages & Disadvantages
 A: Fast execution.
 D: Very hard to code if we need a long string.
   * We can build a dynamic query for this solution to make it more flexible.
 D: Length limited by the number of NCHAR expressions in the query.
 A: No character limit.
 D: No filtering option for security.
7. Using CLR Function
Basic idea
The option of using CLR functions was added in SQL Server 2005, and yet most DBAs do not use it. DBAs have to
internalize the dramatic improvement that can (sometimes) be achieved by using CLR! While SQL
Server works well with sets of data, CLR works much better for manipulating strings (split, regular
expressions, and so on).
Code
http://social.technet.microsoft.com/wiki/contents/articles/21219.sql-server-create-random-stringusing-clr.aspx
* Links to other versions can be seen on the resources
Advantages & Disadvantages
 A: VERY FAST.
 A: Extremely flexible.
   o No length limit
   o No character limit
 A: Filtering option for security.
 D: CLR must be enabled on the instance (hopefully you have done that already; see the snippet below).
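If CLR is not yet enabled, the standard server option does it (a minimal sketch; run it with sufficient permissions):
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;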
Conclusions and Recommendations
 Without a doubt, the CLR function is the best solution! If you can use it, then choose it. Tests have shown that this function can produce in less than 2 seconds what other functions could not produce in more than 20 minutes (the execution was terminated after 20 minutes). This solution meets any requirement.
 It is highly recommended not to use a solution without a filtering mechanism! Several Unicode characters might be harmful in some situations. You can get more information about the problematic CHAR zero, for example, in this link [Hebrew].
 If you need (A) a fast query, (B) flexible and unlimited length, or (C) a filtering mechanism to choose the characters that can be used, then use solution 5, or change solution 4 a bit to add filtering.
 If you need (A) a fast query and (B) a short maximum-length string and (C) you have to use the full character range, then you can use solution 6.
Sort Letters in a Phrase using T-SQL
Problem definition
This article arose from this MSDN forum post. The problem is: how can we sort the letters in a phrase
using only T-SQL? To clarify the question, the desired result for CHICAGO, for instance, must be ACCGHIO.
Introduction
Because SQL is a declarative language in a relational system, it does not have arrays. A table is a relational
variable that represents a relation; simply put, it is a set that has no order. But if someone needs to do this sort
in SQL Server, for example because of a need to sort and compare in a huge table, how can we handle it?
Solution
T-SQL has additional features that go beyond the relational model, so there is a way to solve this
problem. The first sub-problem is how to assign an array index to the letters in a phrase.
One answer is to use the spt_values helper table. The following sample code shows the functionality that we will
use later.
DECLARE @String VARCHAR(MAX)
SET @String = 'abc';
SELECT SUBSTRING(@String, 1 + Number, 1) [char] , number AS [Array Index]
FROM master..spt_values
WHERE Number < DATALENGTH(@String)
AND type = 'P';
The following figure shows the result of the code. It shows the array index assigned per letter.
Now it’s possible to solve the main problem. Next script produces the sample data.
/*Create sample table*/
IF OBJECT_ID('tempdb..#Test', 'U') IS NOT NULL
DROP TABLE #Test;
CREATE TABLE #Test
(
ID INT IDENTITY(1, 1) ,
Phrase VARCHAR(255)
);
/*Populate the table with sample data*/
INSERT #Test
( Phrase )
VALUES
( 'CHICAGO' ),
( 'NEW YORK' ),
( 'HOUSTON' ),
( 'SAN FRANCISCO' );
The following figure shows the sample data. The next piece of code is the final solution.
/*This is the final solution*/
;
WITH base
AS ( SELECT L.[char] ,
            T.ID ,
            T.Phrase
     FROM #Test T
     CROSS APPLY ( SELECT SUBSTRING(T.Phrase, 1 + Number, 1) [char]
                   FROM master..spt_values
                   WHERE Number < DATALENGTH(T.Phrase)
                     AND type = 'P'
                 ) L
   )
SELECT DISTINCT
       b1.Phrase ,
       REPLACE(( SELECT '' + [char]
                 FROM base b2
                 WHERE b1.Phrase = b2.Phrase
                 ORDER BY [char]
                 FOR XML PATH('')
               ), '&#x20;', ' ') AS columns2
FROM base AS b1;
The final result is shown in the following figure.
Limitations
Using this solution has two limitations that come from the spt_values helper table. These limits are:
1. Data Type
spt_values returns extra records for Unicode data types, so the data type cannot be a Unicode type such as NVARCHAR.
2. Data Length
The length of the data can be at most 2048 characters, because the numbers in spt_values (type 'P') range only from 0 to 2047.
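Both limits can be avoided by replacing spt_values with a tally of row numbers. A minimal sketch (my addition; it assumes the string is not longer than the number of rows in sys.all_objects, otherwise cross join that view with itself):
DECLARE @String NVARCHAR(MAX) = N'CHICAGO';
;WITH Numbers AS (
    SELECT TOP (LEN(@String)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS Number
    FROM sys.all_objects
)
SELECT SUBSTRING(@String, 1 + Number, 1) AS [char], Number AS [Array Index]
FROM Numbers;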
CHAPTER 8:
Dates Related
T-SQL: Date-Related Queries
In this article I plan to add various interesting date-related queries. This article will expand as new
problems present themselves in the Transact-SQL forum.
Finding Day Number from the Beginning of the Year
I want to start with this simple question that was posted today (May 31, 2013) - how to find today's
date's day number from the beginning of the year.
This is my solution, with a bit of explanation at the end:
DECLARE @curDate DATE = CURRENT_TIMESTAMP;
DECLARE @YearStart DATE = dateadd (
year
,datediff(year, '19000101', @curDate)
,'19000101'
);
SELECT datediff(day, @YearStart, @curDate) +
1 AS [Number of Days from the Year Start]
The @YearStart variable dynamically calculates the beginning of the year for any date based on the year
difference with any known date we use as anchor date.
However, there is a much simpler solution, as suggested by Gert-Jan Strick in the thread I referenced:
SELECT datepart(dayofyear, current_timestamp) AS [Number of Days]
Finding Beginning and Ending of the Previous Month
Today's Transact-SQL MSDN forum presented the following problem: Change date parameters to find
data from previous month.
I will give my solution to this problem from that thread:
DECLARE @MinDate DATETIME, @MaxDate DATETIME;
SET @MinDate = DATEADD(month,
DATEDIFF(month, '19000201', CURRENT_TIMESTAMP), '19000101');
SET @MaxDate = DATEADD(day,-1,DATEADD(month, 1, @MinDate)); -- for versions prior to SQL 2012
SET @MaxDate = EOMONTH(@MinDate); -- for SQL Server 2012 and up
How To Find Various Day, Current Week, Two Week, Month, Quarter,
Half Year and Year In SQL Server
Date Computation
I was working on one of the financial projects, on one of my own custom implementations for SQL Server.
I found date calculations to be extremely important; they are needed by most of the applications on
today's market, so I thought of publishing an article on the topic of dates. This will be needed for
almost all financial applications on today's market and is extremely important, as it has a wide range of
applications in the financial, retail and other industries.
This article provides a collection that will be extremely helpful for programmers who are using SQL
Server for their projects.
Finding Current Date
This one is extremely simple and is mostly needed by beginners.
select GETDATE()
Gets the current date from SQL Server.
Output:
2013-07-27 14:45:44.463
Finding Start Date and End Date of the Week
The following will give the start date of the current week. Assume the current date is 27th July 2013.
select DATEADD(wk, DATEDIFF(wk,0,GETDATE()), 0)
The output will be:
2013-07-22 00:00:00.000
Finding End Date of the Week
select DATEADD(dd, 6-(DATEPART(dw, GETDATE())), GETDATE())
The output will be:
2013-07-26 14:51:36.1
This assumes that the beginning of the week is Monday and the end is Friday, based on business days.
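Note that DATEPART(dw, ...) depends on the session's DATEFIRST setting, so the sample output above assumes the US English default. A minimal sketch (my addition) to check or pin it down:
SELECT @@DATEFIRST AS CurrentDateFirst;
SET DATEFIRST 7; -- Sunday as the first day of the week, matching the sample output above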
Finding Start Date and End Date of the Two Weeks
This part is pretty tricky, since the present day can be in either the first or the second half of the month,
and the month may contain 28, 29, 30 or 31 days.
We divide the month so that days 1-15 form the first half, as used by most financial institutions, and then,
based on where the date falls, we compute the two-week range.
The following code provides beginning and end dates for two weeks:
DECLARE @beginDate DATETIME, @endDate DATETIME -- added so the snippet runs standalone
if DAY(getdate()) <= 15
begin
    select @beginDate = DATEADD(mm, DATEDIFF(mm,0,GETDATE()), 0)
    select @endDate = DATEADD(mm, DATEDIFF(mm,0,GETDATE()), 14)
end
else
begin
    select @beginDate = DATEADD(mm, DATEDIFF(mm,0,GETDATE()), 15)
    select @endDate = DATEADD(s,-1,DATEADD(mm, DATEDIFF(m,0,GETDATE())+1,0))
end
This will output 1-14 or 15-end of month as begin and end dates
Finding Start Date and End Date of the Current Month
This part is pretty straight forward.
The following query provides start and end date of current month:
select @beginDate = DATEADD(mm, DATEDIFF(mm,0,GETDATE()), 0)
select @endDate = DATEADD(s,-1,DATEADD(mm, DATEDIFF(m,0,GETDATE())+1,0)) -- one second before the first of next month
Finding Start Date and End Date of the Current Quarter
The following query provides the start and end date of the current quarter:
select @beginDate = DATEADD(q, DATEDIFF(q, 0, GETDATE()), 0)
select @endDate = DATEADD(d, -1, DATEADD(q, DATEDIFF(q, 0, GETDATE()) + 1, 0))
Considering today's date as 27th July 2013:
The begin date will be:
2013-07-01 00:00:00.000
The End date will be:
2013-09-30 00:00:00.000
Finding Start Date and End Date For Half Year
This is quite a complicated part. We need to determine whether the date falls in the first or the second
half of the year, and there is no direct method available in SQL Server to do this.
The following query provides start and end dates for the half year:
select @beginDate = CAST(CAST(((((MONTH(GETDATE()) - 1) / 6) * 6) + 1) AS VARCHAR) + '-1-' + CAST(YEAR(GETDATE()) AS VARCHAR) AS DATETIME);
select @endDate = CAST(CAST(((((MONTH(GETDATE()) - 1) / 6) * 6) + 6) AS VARCHAR) + '-1-' + CAST(YEAR(GETDATE()) AS VARCHAR) AS DATETIME);
Considering today's date as 27th July 2013:
The begin date will be:
2013-07-01 00:00:00.000
The End date will be:
2013-12-01 00:00:00.000
Finding Start Date and End Date For Year
The following query finds start and end date for the current year:
select @beginDate = dateadd(d, -datepart(dy,getdate())+1, getdate())
select @endDate = dateadd(d, -datepart(d,getdate()), dateadd(m, 13-datepart(m,getdate()), getdate()))
Considering today's date as 27th July 2013:
The begin date will be:
2013-01-01 15:15:47.097
The End date will be:
2013-12-31 15:15:47.113
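On SQL Server 2012 and later, a sketch using DATEFROMPARTS (my addition) gives the same boundaries without carrying over the time of day from GETDATE():
SELECT DATEFROMPARTS(YEAR(GETDATE()), 1, 1)   AS YearStart,
       DATEFROMPARTS(YEAR(GETDATE()), 12, 31) AS YearEnd;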
SQL Server: How to Find the First Available Timeslot for Scheduling
In a scheduling application, it may be desirable to find the first available schedule time (timeslot) for a
new appointment. The new appointment must fit completely between existing appointments - without
overlap. As the schedule fills, new entries are assigned to the next first available schedule
time (timeslot). Alternatively, if desired, the first n available timeslots can be returned for selection.
In the sample data below, the Schedule table is pre-filled with a Start of Day record, and an End of Day
record. Normally, that information would be derived from a JOIN with a Calendar table.
A solution for SQL Server 2005 / SQL Server 2008 is provided below.
Create Sample Data
-- Suppress data loading messages
SET NOCOUNT ON
DECLARE @Schedule table
   ( AppID     int IDENTITY,
     AppTeam   varchar(20),
     AppStart  datetime,
     AppFinish datetime
   )
INSERT INTO @Schedule VALUES ( 'Start', NULL, '01/11/2007 09:00' )
INSERT INTO @Schedule VALUES ( 'Smith', '01/11/2007 09:00', '01/11/2007 09:30' )
INSERT INTO @Schedule VALUES ( 'Smith', '01/11/2007 10:00', '01/11/2007 10:15' )
INSERT INTO @Schedule VALUES ( 'Jones', '01/11/2007 11:00', '01/11/2007 12:00' )
INSERT INTO @Schedule VALUES ( 'Williams', '01/11/2007 12:00', '01/11/2007 14:45' )
INSERT INTO @Schedule VALUES ( 'Hsiao', '01/11/2007 15:30', '01/11/2007 16:00' )
INSERT INTO @Schedule VALUES ( 'Lopez', '01/11/2007 16:00', '01/11/2007 17:30' )
INSERT INTO @Schedule VALUES ( 'Green', '01/11/2007 17:30', '01/11/2007 18:30' )
INSERT INTO @Schedule VALUES ( 'Alphonso', '01/11/2007 20:00', '01/11/2007 20:30' )
INSERT INTO @Schedule VALUES ( 'End', '01/11/2007 21:00', NULL )
SQL Server 2005 / SQL Server 2008 Solution
-- Determine the Length of Time Required
DECLARE @AppNeed int
SET @AppNeed = 45
--Find FIRST Available Time Slot
;WITH CTE
AS ( SELECT *, RowNumber = ROW_NUMBER() OVER( ORDER BY AppStart ASC )
FROM @Schedule
)
SELECT FirstApptAvail = min( a.AppFinish )
FROM CTE a
INNER JOIN CTE b
ON a.RowNumber = b.RowNumber - 1
WHERE datediff( minute, a.AppFinish, b.AppStart) >= @AppNeed
FirstApptAvail
2007-01-11 10:15:00.000
--Find All Available Time Slots
;WITH CTE
AS ( SELECT
*,
RowNumber = ROW_NUMBER() OVER( ORDER BY AppStart ASC )
FROM @Schedule
)
SELECT TOP 3 ApptOptions = a.AppFinish
FROM CTE a
INNER JOIN CTE b
ON a.RowNumber = b.RowNumber - 1
WHERE datediff( minute, a.AppFinish, b.AppStart) >= @AppNeed
ApptOptions
2007-01-11 10:15:00.000
2007-01-11 14:45:00.000
2007-01-11 18:30:00.000
Additional Resources
A Calendar table is a very useful utility table that can benefit many data-querying situations.
For this example, two additional columns (AppStart, AppFinish) can be added to the Calendar table to handle
situations where business hours are not the same for all days.
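A minimal sketch of such a Calendar table (my addition; the column names are hypothetical and only illustrate the idea):
CREATE TABLE dbo.Calendar
   ( CalDate   date NOT NULL PRIMARY KEY,
     AppStart  datetime NOT NULL,  -- opening time for that day
     AppFinish datetime NOT NULL   -- closing time for that day
   );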
T-SQL: Group by Time Interval
Simple Problem Definition
A question was posted today in the Transact-SQL forum, "Counts by Time Interval". The thread
originator wanted to know how to find how many jobs were completed in each hour of a certain interval
for the current shift. The solution I implemented is based on the DATEPART function, which allows
getting the hour part of a datetime variable (or column).
Solution
This is the solution suggested:
SELECT datepart(hour, JobComplete) as [Hour], COUNT(JobId) as [Jobs Completed]
FROM dbo.Jobs
WHERE JobComplete between @StartTime and @EndTime
GROUP BY datepart(hour, JobComplete)
This solution assumes that the @StartTime and @EndTime variables will be set for the current day's interval
(otherwise we may want to add CAST(JobComplete AS DATE) to the select list and the GROUP BY list).
Complex Problem Definition and Solution
Now, this is a very straightforward problem. What if we need to solve slightly more complex problem of
grouping by every 15 (Nth) minutes? I discussed this problem before as a first problem in this blog post
"Interesting T-SQL problems ". Below is a solution from that blog post:
;With cte As
(Select DateAdd(minute, 15 * (DateDiff(minute, '20000101', SalesDateTime) / 15), '20000101') As SalesDateTime,
SalesAmount
From @Sales)
Select SalesDateTime, Cast(Avg(SalesAmount) As decimal(12,2)) As AvgSalesAmount
From cte
Group By SalesDateTime;
Finally, a few notes on the possibility of missing data. If we want to display data for all times in the predefined
interval, even when we have no data for a particular hour, we first need a Calendar table analogue and then
LEFT JOIN from that table of all needed time intervals to our summary solution, as sketched below.
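A minimal sketch of that LEFT JOIN idea (my addition; it assumes the @Sales table variable with SalesDateTime and SalesAmount columns from the example above is in scope, and uses spt_values only for brevity):
;With Slots As
(Select DateAdd(minute, 15 * v.number, '20000101') As SlotStart
 From master..spt_values v
 Where v.type = 'P' And v.number < 96)   -- 96 fifteen-minute slots in one day
Select s.SlotStart, Cast(Avg(sa.SalesAmount) As decimal(12,2)) As AvgSalesAmount
From Slots s
Left Join @Sales sa
  On sa.SalesDateTime >= s.SlotStart
 And sa.SalesDateTime < DateAdd(minute, 15, s.SlotStart)
Group By s.SlotStart
Order By s.SlotStart;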
CHAPTER 9:
XML
Avoid T (space) while generating XML using FOR XML clause
The following code shows an example of how to avoid the T (and get a space instead) when generating XML
using the FOR XML clause.
Sample Data:
DECLARE @Employee TABLE
(ID INT,
Name VARCHAR(100),
DOJ DATETIME)
INSERT @Employee SELECT 1,'Sathya','2013-06-08 08:50:52.687'
INSERT @Employee SELECT 2,'Madhu K Nair','2008-06-08 08:50:52.687'
INSERT @Employee SELECT 3,'Vidhyasagar','2008-06-08 08:50:52.687'
SELECT * FROM @Employee
--you will find T (instead of a space) if you are not converting the date column with the proper datetime style
SELECT * FROM @Employee
FOR XML PATH('Employee')
Output XML for above query :
<Employee>
<ID>1</ID>
<Name>Sathya</Name>
<DOJ>2013-06-08T08:50:52.687</DOJ>
</Employee>
<Employee>
<ID>2</ID>
<Name>Madhu K Nair</Name>
<DOJ>2008-06-08T08:50:52.687</DOJ>
</Employee>
<Employee>
<ID>3</ID>
<Name>Vidhyasagar</Name>
<DOJ>2008-06-08T08:50:52.687</DOJ>
</Employee>
--converting date column with proper datetime style (120/121)
SELECT ID,
Name,
CONVERT(VARCHAR(25),DOJ,121) DOJ
FROM @Employee
FOR XML PATH('Employee')
Output XML for above query :
<Employee>
<ID>1</ID>
<Name>Sathya</Name>
<DOJ>2013-06-08 08:50:52.687</DOJ>
</Employee>
<Employee>
<ID>2</ID>
<Name>Madhu K Nair</Name>
<DOJ>2008-06-08 08:50:52.687</DOJ>
</Employee>
<Employee>
<ID>3</ID>
<Name>Vidhyasagar</Name>
<DOJ>2008-06-08 08:50:52.687</DOJ>
</Employee>
Generate XML with Same Node Names using FOR XML PATH
In this post we are going to see how we can generate XML in the format shown below from relational data.
<row>
<column>1</column>
<column>1</column>
</row>
<row>
<column>2</column>
<column>2</column>
</row>
Here is an example:
--Sample data
DECLARE @Temp TABLE (Id1 INT, Id2 INT)
INSERT @Temp SELECT 1,1
INSERT @Temp SELECT 2,2
SELECT * FROM @Temp
--If we mention the same alias name for all columns, all column values will be merged
SELECT Id1 [column],
Id2 [column]
FROM @Temp
FOR XML PATH
/**XML result for above query
<row>
<column>11</column>
</row>
<row>
<column>22</column>
</row>
**/
--To overcome the above problem
-- Method 1 :
SELECT Id1 [column],
'',
Id2 [column]
FROM @Temp
FOR XML PATH
-- Method 2 :
SELECT Id1 [column],
NULL,
Id2 [column]
FROM @Temp
FOR XML PATH
/**XML result for above Method 1 & Method 2 query
<row>
<column>1</column>
<column>1</column>
</row>
<row>
<column>2</column>
<column>2</column>
</row>
**/
Generate XML - Column Names with their Values as text() Enclosed
within their Column Name Tag
The most commonly used XML format is the following (column names with their values as text(), enclosed
within a tag carrying the column name).
Let's find out how to generate the following XML for the table provided below:
<Employees>
<field Name="ID">1</field>
<field Name="Name">Sathya</field>
<field Name="Age">25</field>
<field Name="Sex">Male</field>
<field Name="ID">2</field>
<field Name="Name">Madhu K Nair</field>
<field Name="Age">30</field>
<field Name="Sex">Male</field>
<field Name="ID">3</field>
<field Name="Name">Vidhyasagar</field>
<field Name="Age">28</field>
<field Name="Sex">Male</field>
</Employees>
Here is an example:
DECLARE @Employee TABLE
(ID INT,
Name VARCHAR(100),
Age INT,
Sex VARCHAR(50))
INSERT @Employee SELECT 1,'Sathya',25,'Male'
INSERT @Employee SELECT 2,'Madhu K Nair',30,'Male'
INSERT @Employee SELECT 3,'Vidhyasagar',28,'Male'
SELECT * FROM @Employee
DECLARE @xmldata XML
SET @xmldata = (SELECT ID,Name,Age,Sex FROM @Employee FOR XML PATH (''))
SET @xmldata = (
SELECT ColumnName AS "@Name",
ColumnValue AS "text()"
FROM(
SELECT i.value('local-name(.)','varchar(100)') ColumnName,
i.value('.','varchar(100)') ColumnValue
FROM @xmldata.nodes('//*[text()]') x(i)) tmp
FOR XML PATH ('field'),root('Employees'))
SELECT @xmldata
SQL Server XML: Sorting Data in XML Fragments
Working with data sets has made us all aware of the fact that a set has no order. XML documents are not
data sets, so they always have a natural order.
So what should we do when we have XML fragments in the wrong order?
Problem Definition
The original problem was to sort an XML document with two levels: Get double-sorted xml document
from xml-document.
Approaches
A - Using T-SQL
Using T-SQL means that we need to deconstruct our data by parsing out the necessary values. We use
the nodes() method to extract the elements on the level we want to sort. We extract the order
criteria with the value() method so that we can sort with the ORDER BY clause. Finally, we reconstruct the
XML fragment by using FOR XML with the PATH mode.
Here is the trivial case. We are completely deconstructing a flat hierarchy and use only the data in T-SQL
to reconstruct the XML:
DECLARE @Data XML = N'
<element name="c" />
<element name="b" />
<element name="a" />
';
WITH Deconstructed AS (
SELECT Element.value('@name', 'NVARCHAR(255)') AS ElementName
FROM @Data.nodes('/element') [Elements] ( Element )
)
SELECT ElementName AS [@name]
FROM Deconstructed
ORDER BY ElementName
FOR XML PATH('element');
And this is the result:
<element name="a" />
<element name="b" />
<element name="c" />
A more complex case is the following. We are still working with a flat hierarchy, but now our elements
are no longer trivial. The deconstruction process must now provide us with the sort criteria and the rest
of the XML fragments per element:
DECLARE @Data XML = N'
<element name="c" >
<subelement name="t" />
</element>
<element name="b">
<subelement name="s" />
</element>
<element name="a" />
';
WITH Deconstructed AS (
SELECT Element.value('@name', 'NVARCHAR(255)') AS ElementName,
Element.query('.') AS ElementContent
FROM @Data.nodes('/element') [Elements] ( Element )
)
SELECT ElementContent AS '*'
FROM Deconstructed
ORDER BY ElementName
FOR XML PATH('');
Here is the result:
<element name="a" />
<element name="b">
<subelement name="s" />
</element>
<element name="c">
<subelement name="t" />
</element>
B - Using XQuery
Using XQuery means that we use the order by clause of a FLWOR statement.
Here is the trivial case again:
DECLARE @Data XML = N'
<element name="c" />
<element name="b" />
<element name="a" />
';
SELECT Fragment.query('
    for $element in /element
    order by $element/@name ascending
    return $element
')
FROM @Data.nodes('.') Fragment ( Fragment );
And the expected result:
<element name="a" />
<element name="b" />
<element name="c" />
As the XQuery FLWOR statement already works on nodes, we already have a solution for the more
complex case:
DECLARE @Data XML = N'
<element name="c" >
<subelement name="t" />
</element>
<element name="b">
<subelement name="s" />
</element>
<element name="a" />
';
SELECT Fragment.query('
for $element in /element
order by $element/@name ascending
return $element
')
FROM @Data.nodes('.') Fragment ( Fragment );
And here is the result:
<element name="a" />
<element name="b">
<subelement name="s" />
</element>
<element name="c">
<subelement name="t" />
</element>
Problem Solution
Sorting the first level of the list:
DECLARE @Data XML = N'
<level1 name="3">
<level2 name="f" />
<level2 name="e" />
<level2 name="d" />
</level1>
<level1 name="2">
<level2 name="c" />
<level2 name="b" />
</level1>
<level1 name="1">
<level2 name="a" />
</level1>
';
SELECT Levels.query('
for $level1 in /level1
order by $level1/@name ascending
return $level1
')
FROM @Data.nodes('.') Levels ( Levels );
Here is the result, the list is only sorted on the top level:
<level1 name="1">
<level2 name="a" />
</level1>
<level1 name="2">
<level2 name="c" />
<level2 name="b" />
</level1>
<level1 name="3">
<level2 name="f" />
<level2 name="e" />
<level2 name="d" />
</level1>
Here we already see that we need a kind of nested sort, because we have only sorted the outer levels.
In a FLWOR statement we can use complex return expressions, especially we can use further FLWOR
statements:
DECLARE @Data XML = N'
<level1 name="3">
<level2 name="f" />
<level2 name="e" />
<level2 name="d" />
</level1>
<level1 name="2">
<level2 name="c" />
<level2 name="b" />
</level1>
<level1 name="1">
<level2 name="a" />
</level1>
';
SELECT Levels.query('
    for $level1 in /level1
    order by $level1/@name ascending
    return
        <level1 name="{$level1/@name}">{
            for $level2 in $level1/level2
            order by $level2/@name ascending
            return $level2
        }</level1>
')
FROM @Data.nodes('.') Levels ( Levels );
Now we have our double-sorted list:
<level1 name="1">
<level2 name="a" />
</level1>
<level1 name="2">
<level2 name="b" />
<level2 name="c" />
</level1>
<level1 name="3">
<level2 name="d" />
<level2 name="e" />
<level2 name="f" />
</level1>
Conclusion
Using the T-SQL approach means that we need to handle the conversion from and to XML to overcome
the barrier between XML and T-SQL. While this is only a small step, it simply means more code. And
more code is more complex per se.
The XQuery FLWOR expression on the other hand allows us to use a more compact notation. And this
kind of XQuery processing was exactly built for these kinds of manipulation. It is the better choice in our
case.
Terminology
Fragment: Part of an XML document
 http://www.w3.org/TR/xml-fragment.html#defn-fragment
A fragment is not a document, thus it is not well-formed.
 http://www.validome.org/xml/validate/
FLWOR: FOR, LET, WHERE, ORDER BY, RETURN (XQuery)
 http://en.wikipedia.org/wiki/FLWOR
 http://www.w3.org/TR/xquery/#id-flwor-expressions
How to Extract Data in XML to Meet the Requirements of a Schema
Introduction
This article is based on a question posted on the TechNet Forum Brazil for SQL Server - XML - Mode
EXPLICIT, CDATA - and will provide you with a solution to a common problem: formatting the result of a
T-SQL query into XML that adequately meets the conditions of an XML Schema (XSD) or a Document Type
Definition (DTD).
This is one of the possible solutions to this problem. If you know other T-SQL options that
meet the needs of this problem, feel free to add your content to this article.
Problem
During my reading of the threads in the SQL forum, I found the following question being discussed.
The question was: "I'm trying to generate XML using EXPLICIT mode because I need to use CDATA in
some fields. The problem is that an XML Schema requires that my XML have some AttributeName, such
as "g:AttributeName". And WITH XMLNAMESPACES is not compatible with the EXPLICIT mode of T-SQL."
It is clear that the person who asked the question, even with some difficulty in asking, explains that the
need is to get the XML data in the following format:
 All contents of the structure must belong to the namespace "g"
 Each record in the XML must be under the "item" tag
 The root tag must remain as "xml"
The XML expected by the poster should result in something similar to this content:
<?xml version="1.0" encoding="utf-8"?>
<xml>
<item>
<g:id>1</g:id>
<g:Name>test 1</g:Name>
</item>
<item>
<g:id>2</g:id>
<g:Name>test 2</g:Name>
</item>
</xml>
Causes
Typically, the use of an XML Schema or DTD aims to verify and validate the exchange and/or receipt of
information between two companies or departments that use different system platforms. All these
validation criteria exist to maintain data integrity between the supplier system and the receiver
system.
This also occurs in environments with similar platforms, but to a lesser extent. This need for data integration
between companies is very old. Even different departments/branches need to ensure that their shared
data is always up to date. Today, SQL Server 2012 has the resources to handle the kinds of data processing
that we will present, but these same features can be obtained with greater depth through BizTalk
Server.
Diagnostic Steps
Once we diagnose the cause of the problem, we can move to its proper resolution. There may be other
solutions as alternatives, but the one indicated at the end of this article answers the question posted
in the forum in the simplest and most practical way possible.
Building the Scenario of the Problem
So that we can accurately simulate the problem and propose its solution, we build a table with a little
data, similar to the situation shown in the forum thread (Figure 1):
CREATE TABLE dbo.test (
Id INT IDENTITY,
CD_Product INT,
NM_Product VARCHAR(50)
)
GO
INSERT INTO dbo.test ( CD_Product , NM_Product )
VALUES ( 1,'test 1'),(2,'test 2')
GO
Figure 1 - Creating the table to demonstrate the solution
Solution
To structure the solution, one must be clear about all conditions of the XML Schema referred to in the
question, even though the schema itself was not posted.
Although the person asking was trying to get the desired XML format via a T-SQL query in EXPLICIT mode,
that mode does not support the XML Schema's requirement of prefixing elements with the "g" namespace,
so we present the solution with a T-SQL query using the RAW mode.
To set the "g" namespace prefix, we alias the table fields with this prefix, using the standard XML Schema
separator character.
Each row must produce a tag called "item"; in RAW mode this is done by renaming the default "row" tag to
"item".
To complete all the requirements stipulated in the question, we use the ROOT option so that the root tag of
the whole XML document has the defined name "xml".
The code of the proposed solution is the following:
WITH XMLNAMESPACES ('uri' as g)
SELECT
CD_Product as 'g:ID',
NM_Product as 'g:NAME'
FROM dbo.test
FOR XML RAW('item'), ROOT('xml'), ELEMENTS XSINIL
The result is displayed as expected by the person asking the question (Figure 2):
Figure 2 - XML Structured as defined in XML Schema
Additional Information
If you want to know how to consume and validate the contents of an XML document through XSD or DTD
using the VB.Net or C# programming language, I recommend reading Knowledge Base (KB) articles
315533 and 318504.
Credits
This article was inspired by the following articles:
 Wiki: Templates For Converting a Forum Thread Into a New Wiki Article
 Wiki: Technical Editing
 Wiki: Best Practices for Source References and Quotes
Thanks Sandro, Naomi, and Peter for the constant guidance in your articles. This motivated me to create
this article!
To strengthen your knowledge about XML, XSD and DTD, I recommend reading the following:
 XML and XML Schemas in BizTalk Server Concepts
 XML Schema Examples
 Extraindo informações de arquivo XML para o SQL Server (pt-BR)
References
Read some advanced articles:
 W3Schools - XML Schema (XSD)
 W3Schools - Document Type Definition (DTD)
TechNet Library
Read the following topics:
 What is XML Schema (XSD)?
 Understanding on XML Schema
 XML Schemas Collections (SQL Server)
 How to: Create an XML Schema from an XML Document
 XML Schema Syntax
 Creating XML Schemas from XML Files
CHAPTER 10:
Miscellaneous
T-SQL Script to update string NULL with default NULL
Problem
It is common to have nullable columns in a table, but if we populate those nullable columns with the
string 'NULL' instead of a real (default) NULL, a problem arises.
Effects of the Problem
If we populate nullable columns with the string 'NULL', we cannot make use of the NULL-related functions
available in SQL Server.
For Example:
USE [AdventureWorks2012]
GO
--Create test table with two columns to hold string & default NULL
CREATE TABLE Test_Null(Id INT IDENTITY(1,1),StringNull VARCHAR(10)
,DefaultNull VARCHAR(10))
INSERT Test_Null (StringNull) SELECT 'NULL'
INSERT Test_Null SELECT 'NULL',NULL
--Execute the two queries below to find how "IS NULL" works with string & default NULL
SELECT * FROM Test_Null WHERE StringNULL IS NULL
SELECT * FROM Test_Null WHERE DefaultNull IS NULL
--Execute the two queries below to find how "ISNULL" works with string & default NULL
SELECT ISNULL(StringNULL,0) StringNULL FROM Test_Null
SELECT ISNULL(DefaultNull,0) DefaultNull FROM Test_Null
Solution
USE [AdventureWorks2012]
GO
SET NOCOUNT ON
DECLARE @query NVARCHAR(MAX),
@table_count INT,
@column_count INT,
@tablename VARCHAR(100),
@Columnname VARCHAR(100),
@Schemaname VARCHAR(100) = 'HumanResources', --schema names to be passed
@i INT = 1,
@j INT = 1
DECLARE @MyTableVar TABLE(Number INT IDENTITY(1,1),
Table_list VARCHAR(200));
DECLARE @MyColumnVar TABLE(Number INT IDENTITY(1,1),
Column_list VARCHAR(200));
INSERT INTO @MyTableVar
SELECT name
FROM sys.tables
WHERE TYPE = 'U' AND SCHEMA_NAME(SCHEMA_ID) = @Schemaname
SELECT @table_count = MAX(Number) from @MyTableVar
WHILE @i <= @table_count
BEGIN
SELECT @tablename = Table_list FROM @MyTableVar WHERE Number = @i
INSERT @MyColumnVar
SELECT C.name
FROM SYS.columns C
INNER JOIN SYS.tables T ON T.object_id = C.object_id
INNER JOIN SYS.types TY ON TY.user_type_id =
C.user_type_id AND TY.system_type_id = C.system_type_id
WHERE SCHEMA_NAME(T.SCHEMA_ID) = @Schemaname
AND OBJECT_NAME(T.OBJECT_ID) = @tablename AND T.type = 'U'
AND C.is_nullable = 1
AND TY.system_type_id IN (167,175,231,239) --only character columns
ORDER BY C.column_id
SELECT @column_count = MAX(Number) FROM @MyColumnVar
WHILE @j <= @column_count
BEGIN
SELECT @Columnname = Column_list FROM @MyColumnVar WHERE Number = @j
SET @query = 'UPDATE ['+@Schemaname+'].['+@tablename+'] SET ['+@Columnname+']
= NULL WHERE ['+@Columnname +'] = ''NULL''' + CHAR(10) + 'GO'
SET @j = @j + 1
PRINT @query
--To execute the generated Update scripts
--EXEC (@query)
END
SET @i = @i + 1
END
Note:
i) The above code generates UPDATE scripts for tables that belong to the schema name passed in the variable @Schemaname.
ii) The above code generates UPDATE scripts only for character columns (VARCHAR, CHAR, NVARCHAR, NCHAR).
iii) The code is tested and working with SQL Server 2008 and SQL Server 2012.
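If you prefer to run the statements instead of printing them, a minimal sketch (my addition) is to build the statement inside the inner loop without the batch separator (GO is a client-side keyword, not T-SQL) and execute it:
SET @query = N'UPDATE [' + @Schemaname + N'].[' + @tablename + N'] SET [' + @Columnname + N'] = NULL WHERE [' + @Columnname + N'] = ''NULL''';
EXEC sp_executesql @query;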
FIFO Inventory Problem - Cost of Goods Sold
In this article I am going to explain the FIFO (first in first out) algorithm for calculating cost of goods sold.
This is the real business problem I am working on now.
Different methods of calculating Cost of Goods Sold in the Inventory Calculation
There are many articles on the Internet explaining concepts of Calculating Cost of Goods On Hand and
Cost of Goods Sold in the inventory calculation. I will give just a few of them and quote a bit of material
from these articles to provide a brief overview. I suggest readers of this article review the mentioned
articles or just do a Google search on the terms "FIFO Cost of Goods Inventory calculation".
Inventory Valuation Method - FIFO Vs. Moving Average
How to Calculate Cost of Goods Sold (CoGS)
Inventory and Cost of Goods Sold
Chapter 4 by Hugo Kornelis in the "SQL Server MVP Deep Dives" book (first book) talks a bit about
Running Total problem, so it may be useful to read this chapter as well.
There are several valuation methods, but for small businesses it is generally restricted to FIFO and
Moving Average.
In our application we have two methods of calculating inventory: RWAC (Running Weighted Average
Cost) and FIFO. The preferred method of the calculation can be set in the Inventory Preference form.
Implementing FIFO Cost of Goods Sold in our application
After we have briefly discussed the theory, I am going to talk about implementing the FIFO algorithm of
calculating the Cost of Goods in our software. Historically, our application had only the simpler RWAC
method (and not even true RWAC, but rather just an average cost method). A few years ago the
company management team decided that it was time to offer our clients a FIFO method of calculating Cost
of Goods On Hand and Cost of Goods Sold. My first task was to identify all places in our software where
we may need adjustments, and my colleague was tasked with creating the necessary T-SQL functions.
I need to describe the Inventory table used in our application.
Here is its DDL:
CREATE TABLE [dbo].[i_invent](
[pri_key] [int] IDENTITY(1,1) NOT NULL,
[department] [char](10) NOT NULL,
[category] [char](10) NOT NULL,
[item] [char](10) NOT NULL,
[invent_id] [int] NOT NULL,
[trans_type] [char](1) NOT NULL,
[ref_no] [numeric](17, 0) NOT NULL,
[quantity] [numeric](8, 2) NOT NULL,
[unit_cost] [money] NOT NULL,
[locatn_id] [int] NOT NULL,
[message] [varchar](25) NOT NULL,
[exportd_on] [datetime] NULL,
[operator] [char](6) NOT NULL,
[salespoint] [char](6) NOT NULL,
[date_time] [datetime] NOT NULL,
[po_link] [int] NOT NULL,
[adj_type] [int] NOT NULL,
CONSTRAINT [i_invent_track_no] PRIMARY KEY CLUSTERED
(
[pri_key] ASC
)WITH (PAD_INDEX = ON, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR =
75) ON [SaleTransactions]
) ON [SaleTransactions]
CREATE NONCLUSTERED INDEX [date_time] ON [dbo].[i_invent]
(
[date_time] ASC
)WITH (PAD_INDEX = ON, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
DROP_EXISTING = OFF, ONLINE =OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS
= ON, FILLFACTOR = 75) ON [SaleTransactions]
CREATE NONCLUSTERED INDEX [department] ON [dbo].[i_invent]
(
[department] ASC,
[category] ASC,
[item] ASC,
[invent_id] ASC,
[quantity] ASC,
[locatn_id] ASC
)WITH (PAD_INDEX = ON, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
DROP_EXISTING = OFF, ONLINE =OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS
= ON, FILLFACTOR = 75) ON [SaleTransactions]
CREATE NONCLUSTERED INDEX [i_invent_po_link] ON [dbo].[i_invent]
(
[po_link] ASC
)WITH (PAD_INDEX = ON, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
DROP_EXISTING = OFF, ONLINE =OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS
= ON, FILLFACTOR = 75) ON [SaleTransactions]
CREATE NONCLUSTERED INDEX [locatn_id] ON [dbo].[i_invent]
(
[locatn_id] ASC
)WITH (PAD_INDEX = ON, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
DROP_EXISTING = OFF, ONLINE =OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS
= ON, FILLFACTOR = 75) ON [SaleTransactions]
CREATE NONCLUSTERED INDEX [ref_no] ON [dbo].[i_invent]
(
[ref_no] ASC
)WITH (PAD_INDEX = ON, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
DROP_EXISTING = OFF, ONLINE =OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS
= ON, FILLFACTOR = 75) ON [SaleTransactions]
GO
ALTER TABLE [dbo].[i_invent] WITH NOCHECK ADD CONSTRAINT [FK_i_invent_i_items]
FOREIGN KEY([invent_id])
REFERENCES [dbo].[i_items] ([invent_id])
NOT FOR REPLICATION
GO
ALTER TABLE [dbo].[i_invent] NOCHECK CONSTRAINT [FK_i_invent_i_items]
GO
ALTER TABLE [dbo].[i_invent] WITH NOCHECK ADD CONSTRAINT [FK_i_invent_i_locatn]
FOREIGN KEY([locatn_id])
REFERENCES [dbo].[i_locatn] ([locat_id])
GO
ALTER TABLE [dbo].[i_invent] CHECK CONSTRAINT [FK_i_invent_i_locatn]
GO
ALTER TABLE [dbo].[i_invent] WITH NOCHECK ADD CONSTRAINT [FK_i_invent_i_pchord]
FOREIGN KEY([po_link])
REFERENCES [dbo].[i_pchord] ([pri_key])
NOT FOR REPLICATION
GO
ALTER TABLE [dbo].[i_invent] NOCHECK CONSTRAINT [FK_i_invent_i_pchord]
GO
ALTER TABLE [dbo].[i_invent] WITH NOCHECK ADD CONSTRAINT [FK_i_invent_items] FOREIGN KEY([department], [category], [item])
REFERENCES [dbo].[items] ([department], [category], [item])
GO
ALTER TABLE [dbo].[i_invent] CHECK CONSTRAINT [FK_i_invent_items]
GO
Each inventory item is defined by these attributes: department, category, item, invent_id, locatn_id.
These 5 columns are used to identify a single inventory item in its current location. The quantity and
unit_cost columns are used to record each inventory movement. In the case of Sales or Returns
(Trans_Type = 'S') the unit_cost is 0 and has to be calculated. Trans_Type can be one of the following: P - purchase, A - adjustment, T - transfer, and S - Sale (negative quantity) or Return (positive quantity). The
ref_no column, in the case of sales / returns, provides a reference to the trans_no from the transactions table.
The date_time column is also important for our calculations. The other columns in the Inventory table are
used for other purposes and are not relevant to the calculation of Cost of Goods on Hand or Cost of Goods
Sold; a small sketch of how the key columns are typically used follows.
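As a minimal illustration (my addition, not code from the application) of how those five identifying columns and the quantity column are typically used together:
-- Current quantity on hand per inventory item and location, derived from the movement rows
SELECT department, category, item, invent_id, locatn_id,
       SUM(quantity) AS QuantityOnHand
FROM dbo.i_invent
GROUP BY department, category, item, invent_id, locatn_id;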
So, as I said, the first implementation of the Cost of Goods on Hand calculation was written by my
colleague as a multi-statement table-valued function that accepted many parameters (some of them
optional) plus the calculation method type (RWAC, FIFO or LIFO), and returned the result as a table.
I checked the date of the first implementation in our SourceSafe repository and it is August 2010.
It was quickly determined that using the multi-statement table-valued function in that way led to
very bad performance. Also, somehow the process of developing the functions (or procedures) to do
these calculations was turned over to me. I tried to change these functions into inline table-valued
functions for each method separately (one for FIFO and one for RWAC; we decided to drop the LIFO method
then), but the performance of these set-based functions was still really bad for clients with
substantial inventory movement.
In addition to discussing the FIFO calculation problem in the forum's threads I also had private e-mail
exchange with Peter Larsson who eventually helped me to adapt his solution from the Set-based Speed
Phreakery: The FIFO Stock Inventory SQL Problem for our table's structure and Cost of Goods on
Hand problem.
I discussed this problem in many threads in the Transact-SQL forum on MSDN. Here is one of the earliest
threads (from May 2011), where I found that an inline CTE-based solution, when we needed to use the same
CTE multiple times, was significantly slower than using temp tables to hold intermediate calculations:
Temp tables vs CTE
I just re-read that long thread. Essentially, I confirmed that using an inline UDF to calculate Cost of Goods
on Hand for the selected inventory and using CROSS APPLY with that function was very slow compared
to getting the inventory to work with into a temporary table first and then applying the calculations in a
stored procedure.
It also started to become clear that keeping our current table structure, with nothing pre-calculated, will
lead to bad performance, as we need to re-calculate the cost every time from the very beginning. About a
year or so ago I proposed a plan to re-design our inventory table by adding a few more tables that we would
update at the same time as transactions occur. Unfortunately, we haven't proceeded in this direction yet,
and I don't know if we are ever going to look into these ideas in order to make the calculation process easier.
In this article I planned to discuss the Cost of Goods Sold calculations, so I will just give the current code
of the Cost of Goods on Hand FIFO procedure without too many explanations.
Current procedure to calculate Cost of Goods on Hand
--==========================================================
/* SP that returns total quantity and cost of goods on hand
   by department, category, item, invent_id, and locatn_id,
   using FIFO (first in/first out) method of cost valuation.
   To retrieve the total (FIFO) cost of goods on hand
   for all inventory by location, by department:
   EXECUTE dbo.siriussp_CostOfGoodsOnHand_FIFO 1

   locatn_id   department QuantityOnHand CostOfGoodsOnHand
   ----------- ---------- -------------- -----------------
   999         RETAIL     2              0.90
   1           RETAIL     2359           31567.73
   3           RETAIL     1609           19001.21
*/
--=========================================================
ALTER PROCEDURE dbo.siriussp_CostOfGoodsOnHand_FIFO
(
@bIncludeZeroes BIT = 1
/* If 1, then include
records for items with zero on-hand */
)
AS
BEGIN
SET NOCOUNT ON;
WITH cteInventorySum
AS (SELECT department,
category,
item,
invent_ID,
locatn_ID,
SUM(quantity) AS TotalInventory,
MAX(date_time) AS LastDateTime
FROM #Inventory
GROUP BY department,
category,
item,
invent_ID,
locatn_ID),
cteReverseInSum
AS (/* Perform a rolling balance ( in reverse order ) through the
inventory movements in */
SELECT s.department,
s.category,
s.item,
s.invent_ID,
s.locatn_ID,
s.Fifo_Rank,
(SELECT SUM(i.quantity)
FROM #Inventory AS i
WHERE i.department = s.department
AND i.category = s.category
AND i.item = s.item
AND i.invent_id = s.invent_id
AND i.locatn_id = s.locatn_id
AND i.trans_Type IN ('P','A','T')
AND i.Fifo_Rank >= s.Fifo_Rank) AS RollingInventory,
SUM(s.Quantity) AS ThisInventory
FROM #Inventory AS s
WHERE s.Trans_Type IN ('P','A','T')
GROUP BY s.Department,
s.Category,
s.Item,
s.Invent_ID,
s.Locatn_ID,
s.Fifo_Rank),
cteWithLastTranDate
AS (SELECT w.Department,
w.Category,
w.Item,
w.Invent_ID,
w.Locatn_ID,
w.LastDateTime,
w.TotalInventory,
COALESCE(LastPartialInventory.Fifo_Rank,0)
AS Fifo_Rank,
COALESCE(LastPartialInventory.InventoryToUse,0)
AS InventoryToUse,
COALESCE(LastPartialInventory.RunningTotal,0)
AS RunningTotal,
w.TotalInventory
- COALESCE(LastPartialInventory.RunningTotal,0)
+COALESCE(LastPartialInventory.InventoryToUse,0) AS UseThisInventory
FROM cteInventorySum AS w
                     OUTER APPLY (SELECT TOP ( 1 ) z.Fifo_Rank,
                                         z.ThisInventory    AS InventoryToUse,
                                         z.RollingInventory AS RunningTotal
                                  FROM cteReverseInSum AS z
                                  WHERE z.Department = w.Department
                                    AND z.Category = w.Category
                                    AND z.Item = w.Item
                                    AND z.Invent_ID = w.Invent_ID
                                    AND z.Locatn_ID = w.Locatn_ID
                                    AND z.RollingInventory >= w.TotalInventory
                                  ORDER BY z.Fifo_Rank DESC) AS LastPartialInventory),
LastCost
AS (SELECT DISTINCT Cogs.department,
Cogs.category,
Cogs.item,
Cogs.invent_id,
LastCost.LastCost
FROM cteWithLastTranDate Cogs
CROSS APPLY
dbo.siriusfn_LastCostUpToDate(Cogs.department, Cogs.category, Cogs.item, Cogs.invent_id, Cogs.LastDateTime) LastCost
WHERE Cogs.UseThisInventory IS NULL
OR Cogs.UseThisInventory =
0 OR Cogs.TotalInventory IS NULL OR Cogs.TotalInventory = 0),
cteSource
AS (
SELECT y.Department,
y.Category,
y.Item,
y.Invent_ID,
y.Locatn_ID,
y.TotalInventory as QuantityOnHand,
SUM(CASE WHEN e.Fifo_Rank = y.Fifo_Rank
THEN y.UseThisInventory
ELSE e.Quantity END * Price.Unit_Cost) AS CostOfGoodsOnHand,
LastCost.LastCost
FROM
cteWithLastTranDate AS y
LEFT JOIN #Inventory AS e ON e.Department = y.Department
AND e.Category = y.Category
AND e.Item = y.Item
AND e.Invent_ID = y.Invent_ID
AND e.Locatn_ID = y.Locatn_ID
AND e.Fifo_Rank >= y.Fifo_Rank
AND e.Trans_Type IN ('P', 'A', 'T')
LEFT JOIN LastCost
ON y.Department = LastCost.Department
AND y.Category = LastCost.Category
AND y.Item = LastCost.Item
AND y.Invent_ID = LastCost.Invent_ID
OUTER APPLY (
/* Find the Price of the item in */
SELECT TOP (1) p.Unit_Cost
FROM #Inventory AS p
WHERE p.Department = e.Department and
p.Category = e.Category and
p.Item = e.Item and
p.Invent_ID = e.Invent_ID and
p.Locatn_ID = e.Locatn_ID and
p.Fifo_Rank <= e.Fifo_Rank and
p.Trans_Type IN ('P', 'A', 'T')
ORDER BY p.Fifo_Rank DESC
) AS Price
GROUP BY y.Department,
y.Category,
y.Item,
y.Invent_ID,
y.Locatn_ID,
y.TotalInventory,
LastCost.LastCost)
SELECT Department,
Category,
Item,
Invent_ID,
Locatn_ID,
CONVERT(INT,QuantityOnHand) as QuantityOnHand,
COALESCE(CostOfGoodsOnHand,0) AS CostOfGoodsOnHand,
COALESCE(CASE
WHEN QuantityOnHand <> 0
AND CostOfGoodsOnHand <> 0 THEN CostOfGoodsOnHand /
QuantityOnHand
ELSE LastCost
END, 0) AS AverageCost
FROM cteSource
WHERE @bIncludeZeroes = 1
OR (@bIncludeZeroes = 0
AND CostOfGoodsOnHand <> 0)
ORDER BY Department,
Category,
Item,
Invent_ID,
Locatn_ID;
END
GO
/* Test Cases
CREATE TABLE [dbo].[#Inventory](
[pri_key] [int] IDENTITY(1,1) NOT NULL,
[ref_no] [numeric](17, 0) NOT NULL,
[locatn_id] [int] NOT NULL,
[date_time] [datetime] NOT NULL,
[fifo_rank] [bigint] NULL,
[department] [char](10) NOT NULL,
[category] [char](10) NOT NULL,
[item] [char](10) NOT NULL,
[invent_id] [int] NOT NULL,
[trans_type] [char](1) NOT NULL,
[quantity] [numeric](8, 2) NOT NULL,
[unit_cost] [money] NOT NULL
) ON [PRIMARY]
SET IDENTITY_INSERT [dbo].[#Inventory] ON;
BEGIN TRANSACTION;
INSERT INTO [dbo].[#Inventory]([pri_key], [ref_no], [locatn_id], [date_time],
[fifo_rank], [department], [category], [item], [invent_id], [trans_type],
[quantity], [unit_cost])
SELECT 774, 0, 1, '20120627 11:58:26.000', 1, N'RETAIL    ', N'SUPPLIES  ', N'BUG_SPRAY ', 0, N'T', 10.00, 2.0000 UNION ALL
SELECT 775, 129005001, 1, '20120627 13:02:57.000', 2, N'RETAIL    ', N'SUPPLIES  ', N'BUG_SPRAY ', 0, N'S', -9.00, 0.0000 UNION ALL
SELECT 778, 0, 1, '20120627 13:06:07.000', 3, N'RETAIL    ', N'SUPPLIES  ', N'BUG_SPRAY ', 0, N'T', 10.00, 2.6667 UNION ALL
SELECT 779, 130005001, 1, '20120627 13:17:46.000', 4, N'RETAIL    ', N'SUPPLIES  ', N'BUG_SPRAY ', 0, N'S', -7.00, 0.0000 UNION ALL
SELECT 780, 131005001, 1, '20120627 13:18:16.000', 5, N'RETAIL    ', N'SUPPLIES  ', N'BUG_SPRAY ', 0, N'S', 3.00, 0.0000 UNION ALL
SELECT 772, 24, 3, '20120627 11:57:17.000', 1, N'RETAIL    ', N'SUPPLIES  ', N'BUG_SPRAY ', 0, N'P', 20.00, 2.0000 UNION ALL
SELECT 773, 0, 3, '20120627 11:58:26.000', 2, N'RETAIL    ', N'SUPPLIES  ', N'BUG_SPRAY ', 0, N'T', -10.00, 2.0000 UNION ALL
SELECT 776, 24, 3, '20120627 13:04:29.000', 3, N'RETAIL    ', N'SUPPLIES  ', N'BUG_SPRAY ', 0, N'P', 20.00, 3.0000 UNION ALL
SELECT 777, 0, 3, '20120627 13:06:07.000', 4, N'RETAIL    ', N'SUPPLIES  ', N'BUG_SPRAY ', 0, N'T', -10.00, 2.6667
COMMIT;
RAISERROR (N'[dbo].[#Inventory]: Insert Batch: 1.....Done!', 10, 1) WITH NOWAIT;
GO
SET IDENTITY_INSERT [dbo].[#Inventory] OFF;
PRINT 'FIFO Calculation Cost:'
declare @Time datetime2(7) = SYSDATETIME(), @Elapsed int
EXECUTE dbo.siriussp_CostOfGoodsOnHand_FIFO
1
set @Elapsed = DATEDIFF(microsecond,@time, getdate())
print 'Elapsed: ' + convert(varchar(10),@Elapsed) + ' microseconds'
go
*/
You can see that this procedure uses the #Inventory temporary table and that the table also has a fifo_rank
column, which is not present in the i_invent table in the database. I pre-select the rows I may be
interested in into the temporary #Inventory table and create the fifo_rank column using the ROW_NUMBER()
function, partitioning by the 5 columns that determine a single inventory item and ordering by the date_time
and po_link columns (a minimal sketch of this pre-selection step follows below). You can also see that this
procedure references the function siriusfn_LastCostUpToDate. This function calculates the last cost of the
item to date using an iterative approach - it first tries to calculate it for the specific invent_id (invent_id <> 0
is for the "matrix" items, e.g. items that may come in different sizes or colors). If there are no rows for the
specific invent_id, it tries to get the last cost for the item itself regardless of invent_id. If the cost is still
unknown, it checks the purchase orders table (i_pchord), again first for the invent_id and then for the item itself.
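Before looking at that function, here is a minimal sketch (my addition; the row filter is omitted for brevity) of how the #Inventory temp table and its fifo_rank column can be pre-selected from i_invent:
SELECT ref_no, locatn_id, date_time,
       ROW_NUMBER() OVER (PARTITION BY department, category, item, invent_id, locatn_id
                          ORDER BY date_time, po_link) AS fifo_rank,
       department, category, item, invent_id, trans_type, quantity, unit_cost
INTO #Inventory
FROM dbo.i_invent;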
Here is the current code of this function:
ALTER FUNCTION [dbo].[siriusfn_LastCostUpToDate] (
@cDepartment CHAR(10)
,@cCategory CHAR(10)
,@cItem CHAR(10)
,@iInventID INT
,@dtEnd DATETIME -- cut off date
)
RETURNS TABLE
--==============================================
/* Function that returns the last unit cost value for
   every matrix item within the given range. It evaluates,
   in order, until it finds the first applicable record:
   1. last received cost at the matrix level
   2. last received cost at the item level
   3. last ordered cost at the matrix level
   4. last ordered cost at the item level
   5. if no history is found, then last cost is zero.
*/
AS
RETURN
WITH cteItemsOnly AS (
SELECT i.department
,i.category
,i.item
,i.inventory
FROM dbo.items i
WHERE i.department = @cDepartment
AND i.category = @cCategory
AND i.item = @cItem
)
,cteItems AS (
SELECT i.department
,i.category
,i.item
,ISNULL(ii.invent_id, 0) AS invent_id
,inventory
FROM cteItemsOnly i
LEFT JOIN dbo.i_items ii ON i.inventory = 0
AND ii.department = i.department
AND ii.category = i.category
AND ii.item = i.item
AND ii.invent_id = @iInventID
)
,cteRcvdMatrix AS (
SELECT i.department
,i.category
,i.item
,i.invent_id
,F.unit_cost AS LastCost
FROM cteItems i
OUTER APPLY (
SELECT TOP 1 unit_cost
FROM dbo.i_invent ii
WHERE trans_type IN (
'P'
,'A'
,'T'
)
AND i.department = ii.department
AND i.category = ii.category
AND i.item = ii.item
AND i.invent_id = ii.invent_id
AND ii.date_time <= @dtEnd
ORDER BY ii.date_time DESC
,unit_cost DESC
) F
)
,cteRcvdItem AS (
SELECT *
FROM cteRcvdMatrix
WHERE LastCost IS NOT NULL
UNION ALL
SELECT i.department
,i.category
,i.item
,i.invent_id
,F.unit_cost AS LastCost
FROM cteRcvdMatrix i
OUTER APPLY (
SELECT TOP 1 unit_cost
FROM dbo.i_invent ii
WHERE trans_type IN (
'P'
,'A'
,'T'
)
AND i.department = ii.department
AND i.category = ii.category
AND i.item = ii.item
AND ii.date_time <= @dtEnd
ORDER BY ii.date_time DESC
,unit_cost DESC
) F
WHERE i.LastCost IS NULL
)
,ctePOMatrix AS (
SELECT *
FROM cteRcvdItem
WHERE LastCost IS NOT NULL
UNION ALL
SELECT i.department
,i.category
,i.item
,i.invent_id
,F.unit_cost AS LastCost
FROM cteRcvdItem i
OUTER APPLY (
SELECT TOP (1) unit_cost
FROM dbo.i_pchord ii
WHERE i.department = ii.department
AND i.category = ii.category
AND i.item = ii.item
AND i.invent_id = ii.invent_id
AND ii.date_time <= @dtEnd
ORDER BY ii.date_time DESC
,unit_cost DESC
) F
WHERE i.LastCost IS NULL
)
,ctePOItem AS (
SELECT *
FROM ctePOMatrix
WHERE LastCost IS NOT NULL
UNION ALL
SELECT i.department
,i.category
,i.item
,i.invent_id
,F.unit_cost AS LastCost
FROM ctePOMatrix i
OUTER APPLY (
SELECT TOP (1) unit_cost
FROM dbo.i_pchord ii
WHERE i.department = ii.department
AND i.category = ii.category
AND i.item = ii.item
AND ii.date_time <= @dtEnd
ORDER BY ii.date_time DESC
,unit_cost DESC
) F
WHERE i.LastCost IS NULL
)
SELECT i.department
,i.category
,i.item
,i.invent_id
,coalesce(i.LastCost, 0) AS LastCost
FROM ctePOItem i
GO
/* Test Cases
set statistics io on
SELECT * FROM dbo.siriusfn_LastCost('RT34HANDW','058GLOVEL','19599     ', 409)
SELECT * FROM dbo.siriusfn_LastCostUpToDate('RT34HANDW','058GLOVEL','19599     ', 409, '20040101')
-- select top (1) * from dbo.i_invent where invent_id = 409 and trans_type
in ('A','P','T') and quantity > 0 order by date_time desc
set statistics io off
*/
FIFO Cost of Goods Sold
Now I am going to discuss the procedure I am using to calculate Cost of Goods Sold using FIFO method.
About a year ago I spent a lot of time creating two versions of the procedure - one for SQL Server 2005-2008 and one for SQL Server 2012. I thought I had tested these procedures extensively and had them
working great. Our testers also tested them in various scenarios (I hope). It turned out I had not tested
them well enough and they were failing in a really simple scenario. Also, our client found a more
complex scenario, analyzed these procedures and showed their faults.
Therefore I needed to look at them again and fix the problems.
When I looked at them recently, I had to admit that I could not really understand what I was doing in
them. I think it would have helped if I had written this article then rather than now. So, by documenting
my line of thought now while creating this procedure, and by accepting revisions from other people, it
may help me (and others) to perfect this procedure in the future, or re-design it again if needed.
The scenario that my colleague found failing in the last implementation of the procedure was the
following:
1. Create a new retail tracking item if necessary
2. Receive 20 units of the item at $10.00 each
3. Receive another 20 units of the item at $5.00 each
4. Sell 30 units of the item
5. Sell 10 units of the item the next day (or set carryover 1 day forward)
6. Make sure the "Closing Cost Calculation Algorithm" under Retail Preferences in SysManager is set to
FIFO. Run the Profit and Loss Report against each day. For the second day (sale of 10 units), COGS
correctly shows $50.00. For the first day (sale of 30 units), COGS shows $150.00 (should be $250.00)
So, I decided to re-write this procedure from scratch rather than trying to figure out what that
procedure was doing and where the bug might be. I also found the MSDN thread SQL FIFO
Query, which I had already used in my prior attempts to solve the FIFO Cost of Goods Sold problem. This time
I concentrated on Peter Larsson's (SwePeso) solution in that thread.
In the procedure that is invoked before the FIFO Cost of Goods Sold procedure, I select
inventory items according to the user's selections (say, for the Profit and Loss report the user can select a particular
department (or department and category), may select a specific vendor, and also selects a date range). So, I select rows into the #Inventory temp table up to the end date of the selected date interval. I
again add the FIFO_RANK column and, for simplicity, also an InvNo numeric column using the DENSE_RANK()
function ordered by Department, Category, Item, Invent_ID, Locatn_ID. This is done in order to identify each
inventory item by a single integer column rather than by 5 columns. In my calculations I also
use the dbo.Numbers table, which has a single number column; in our database that table contains numbers
from about -100K to 100K. A minimal sketch of this pre-step is shown below.
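The sketch is an illustration only, not the production pre-step: dbo.i_invent, po_link and the end-of-interval cut-off come from the description above, while the column list, the @dtEnd variable name and the missing report filters are simplified assumptions.
-- Sketch only: derive fifo_rank and InvNo while pre-selecting rows up to the end of the interval
SELECT i.ref_no
      ,i.locatn_id
      ,i.date_time
      ,ROW_NUMBER() OVER (PARTITION BY i.department, i.category, i.item, i.invent_id, i.locatn_id
                          ORDER BY i.date_time, i.po_link) AS fifo_rank
      ,DENSE_RANK() OVER (ORDER BY i.department, i.category, i.item, i.invent_id, i.locatn_id) AS InvNo
      ,i.department
      ,i.category
      ,i.item
      ,i.invent_id
      ,i.trans_type
      ,i.quantity
      ,i.unit_cost
INTO #Inventory
FROM dbo.i_invent AS i
WHERE i.date_time < @dtEnd; -- end date of the selected interval (plus vendor/department filters in the real code)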
The idea of the new design of this procedure is to calculate the starting point of the inventory in one
step (up to Start Date - dtStart parameter) using Peter's idea and then process each individual sale or
return (and negative quantity transfers) within the selected date intervals. The final result should have
all sales and returns in the selected period (quantity and unit_cost).
So, I decided to introduce yet another temporary table I called #MovingInventory. In this table I have
InvNo column (this is artificial Item Id for each inventory item in a location I created in the pre-step),
fifo_rank column, quantity - the same quantity as in the #Inventory, CurrentQuantity (this column
should reflect the current remaining quantity), Removed (quantity removed) and Returned (quantity
returned). If we are to change our current Inventory process, we may create this table as a permanent
table in the database and update it on each inventory movement. We can also create InventorySales
table. Using these tables will significantly simplify the current calculation process.
Therefore, in the beginning of the procedure I now have this code:
IF OBJECT_ID('TempDB..#MovingInventory', N'U') IS NOT NULL
DROP TABLE #MovingInventory;
CREATE TABLE [dbo].[#MovingInventory] (
InvNo INT NOT NULL
,fifo_rank INT NOT NULL
,quantity INT
,unit_cost MONEY
,Removed INT
,Returned INT
,CurrentQuantity INT
,CONSTRAINT pkMovingInventory PRIMARY KEY (
InvNo
,fifo_rank
)
)
INSERT INTO #MovingInventory (
InvNo
,fifo_rank
,quantity
,unit_cost
,Removed
,Returned
,CurrentQuantity
)
SELECT InvNo
,fifo_rank
,quantity
,unit_cost
,0
,0
,quantity
FROM #Inventory
WHERE trans_type IN (
'P'
,'A'
,'T'
)
AND quantity > 0
ORDER BY InvNo
,fifo_rank;
So, we start with populating this new #MovingInventory temporary table with all positive additions to
the inventory with their unit_cost. I set CurrentQuantity to quantity and Returned and Removed to 0.
I have two more temporary tables in this procedure. #Sales will be used to generate our
final result; it will contain all sales and returns in the specified date range with the quantity sold
(or returned) and the unit cost used.
I also have a #Removed table. I could have used a table variable here instead, but I recall having some
problems with a table variable in my prior version of this procedure, so I decided to use a
temporary table again. This table will hold the items removed (or returned) on each iteration, and
it will be cleaned (truncated) on each iteration.
Here is the definition of these 2 temporary tables at the top of the procedure:
IF OBJECT_ID('TempDB..#Sales', N'U') IS NOT NULL
DROP TABLE #Sales;
CREATE TABLE [dbo].[#Sales] (
InvNo INT NOT NULL
,[trans_no] [numeric](17, 0) NOT NULL
,[locatn_id] [int] NOT NULL
,[date_time] [datetime] NOT NULL
,[department] [char](10) COLLATE DATABASE_DEFAULT NOT NULL
,[category] [char](10) COLLATE DATABASE_DEFAULT NOT NULL
,[item] [char](10) COLLATE DATABASE_DEFAULT NOT NULL
,[invent_id] [int] NOT NULL
,quantity INT
,unit_cost MONEY
)
IF OBJECT_ID('TempDB..#Removed', N'U') IS NOT NULL
DROP TABLE #Removed;
CREATE TABLE [dbo].[#Removed] (
unit_cost MONEY
,Removed INT
)
Now, I decided to use two cursor loops in my procedure - one to iterate through each inventory item
and an inner loop to go through each individual sale for that item. We all know that cursor-based
solutions are generally not recommended, as they normally perform much worse than set-based
solutions. However, for this problem I simply don't see a set-based solution, which is why I decided
to use cursors. I may eventually re-design this procedure as a CLR-based procedure, although I am not
sure CLR-based procedures can work with temporary tables in the first place.
So, my first step is to calculate prior inventory in one step. Here is the code I use for this:
WHILE (@@FETCH_STATUS = 0)
BEGIN
SELECT @fifo_rank = MAX(fifo_rank)
,@Removed = - 1 * SUM(quantity)
FROM #Inventory
WHERE date_time < @dtStart
AND (
trans_type = 'S'
OR quantity < 0
)
AND InvNo = @InvNo;
IF COALESCE(@Removed, 0) > 0 -- what to do when we start with returns? Unlikely to happen, though
BEGIN
IF @Debug = 1
PRINT 'Calculating starting inventory';;
WITH cteSource
AS (
SELECT TOP (@Removed) s.unit_Cost
,s.fifo_rank
,s.quantity
FROM #MovingInventory AS s
CROSS APPLY (
SELECT TOP (CAST(s.Quantity AS INT)) ROW_NUMBER() OVER (
ORDER BY number
) AS n
FROM dbo.numbers n5
WHERE number > 0
) AS f(n)
WHERE s.InvNo = @InvNo
AND s.fifo_rank < @fifo_rank
ORDER BY s.fifo_rank
)
,cteRemoved
AS (
SELECT unit_Cost
,fifo_rank
,quantity
,COUNT(*) AS Removed
FROM cteSource
GROUP BY unit_Cost
,fifo_rank
,quantity
)
UPDATE M
SET Removed = R.Removed
,CurrentQuantity = M.CurrentQuantity - R.Removed
FROM #MovingInventory M
INNER JOIN cteRemoved R ON M.fifo_rank = R.fifo_rank
WHERE M.InvNo = @InvNo;
-- We can also check if Removed = @Removed (if less, we have negative inventory - an unlikely situation)
END
Here I am attempting to calculate our current working inventory in one step. I get the total sold quantity
and the last date (fifo_rank) on which the item was sold prior to dtStart, and then distribute that sold quantity among
all prior additions to inventory. A standalone illustration of this distribution trick follows below.
I am not considering situations where we somehow already sold more than we originally had in the inventory,
or where we returned more than we sold (so the total quantity would be greater than 0). To be honest, I
am not 100% sure how to treat these situations, so I assume the possibility of them occurring is very
low.
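As a standalone illustration of that distribution trick, here is the same numbers-table pattern in isolation, applied to the failing scenario from above (two receipts of 20 units at $10.00 and $5.00, and 30 units sold). The @Receipts table variable is made up purely for this toy example; only dbo.Numbers is the real helper table.
-- Toy example: remove 30 units FIFO-style by exploding receipts into one row per unit
DECLARE @Removed INT = 30;
DECLARE @Receipts TABLE (fifo_rank INT, quantity INT, unit_cost MONEY);
INSERT @Receipts VALUES (1, 20, 10.00), (2, 20, 5.00);
WITH cteSource AS (
    SELECT TOP (@Removed) r.fifo_rank
          ,r.unit_cost
    FROM @Receipts AS r
    CROSS APPLY (
        SELECT TOP (r.quantity) number -- one row per unit on hand
        FROM dbo.Numbers
        WHERE number > 0
    ) AS f
    ORDER BY r.fifo_rank -- oldest receipts are consumed first
)
SELECT fifo_rank
      ,unit_cost
      ,COUNT(*) AS Removed
FROM cteSource
GROUP BY fifo_rank, unit_cost;
-- Expected result: 20 units at $10.00 and 10 units at $5.00, i.e. a COGS of $250.00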
Once we have the inventory up to the starting date (dtStart), I am ready to process each individual sale or
return. Here is how I do it for sales and negative transfers:
WHILE (@@FETCH_STATUS = 0)
BEGIN
IF @quantity < 0 -- Sale or transfer
BEGIN
IF @Debug = 1
BEGIN
SET @Message = 'Sale or transfer with quantity = ' + CAST(1 * @quantity AS VARCHAR(20))
RAISERROR (
@Message
,10
,1
)
WITH NOWAIT;
END
SELECT @Removed = - 1 * @quantity;
WITH cteSource
AS (
SELECT TOP (@Removed) s.unit_Cost
,s.fifo_rank
,s.CurrentQuantity
FROM #MovingInventory AS s
CROSS APPLY (
SELECT TOP (s.CurrentQuantity) ROW_NUMBER() OVER (
ORDER BY number
) AS n
FROM dbo.numbers n5
WHERE number > 0
) AS f(n)
WHERE s.InvNo = @InvNo
AND s.fifo_rank < @fifo_rank
AND s.CurrentQuantity > 0
ORDER BY s.fifo_rank
)
,cteRemoved
AS (
SELECT unit_Cost
,fifo_rank
,CurrentQuantity
,COUNT(*) AS Removed
FROM cteSource
GROUP BY unit_Cost
,fifo_rank
,CurrentQuantity
)
UPDATE I
SET CurrentQuantity = I.CurrentQuantity - R.Removed
,Removed = I.Removed + R.Removed
OUTPUT Inserted.unit_cost
,Inserted.Removed - deleted.Removed
INTO #Removed(unit_cost, Removed)
FROM #MovingInventory I
INNER JOIN cteRemoved R ON I.fifo_rank = R.fifo_rank
WHERE I.InvNo = @InvNo;
IF @Debug = 1
BEGIN
SELECT *
FROM #MovingInventory I
WHERE I.InvNo = @InvNo;
RAISERROR (
'Current Moving Inventory after Sale or Return'
,10
,1
)
WITH NOWAIT
END
IF @trans_type = 'S'
AND @date_time >= @dtStart
INSERT INTO #Sales (
trans_no
,InvNo
,locatn_id
,date_time
,department
,category
,item
,invent_id
,unit_cost
,quantity
)
SELECT @ref_no
,@InvNo
,@locatn_id
,@date_time
,@department
,@category
,@item
,@invent_id
,unit_cost
,Removed
FROM #Removed;
--- Need to check for situations when we sell more than is currently in the inventory (rare cases)
SELECT @Difference = @Removed - COALESCE((
SELECT SUM(Removed)
FROM #Removed
), 0);
IF @Difference > 0 -- Sold more than were in the inventory
BEGIN
IF @Debug = 1
BEGIN
SET @Message = 'Sold more than in the inventory Difference = ' + CAST(@Difference AS VARCHAR(10))
RAISERROR (
@Message
,10
,1
)
WITH NOWAIT;
END
SET @LastCost = 0;
SELECT @LastCost = LastCost.LastCost
FROM dbo.siriusfn_LastCostUpToDate(@department, @category,
@item, @invent_id, @date_time) LastCost;
INSERT INTO #Sales (
trans_no
,InvNo
,locatn_id
,date_time
,department
,category
,item
,invent_id
,unit_cost
,quantity
)
SELECT @ref_no
,@InvNo
,@locatn_id
,@date_time
,@department
,@category
,@item
,@invent_id
,@LastCost
,@Difference
So, for each sale (or negative transfer) I use the same idea as in calculating the starting inventory. I remove
the sold quantity, distributing it among rows where the current quantity > 0, ordered by the date_time
(fifo_rank) column. I then update the #MovingInventory table (the CurrentQuantity and Removed columns)
and capture the results of the UPDATE into the #Removed table using the OUTPUT clause. In addition, I populate
the #Sales table if the trans_type is 'S' (sale), to be used in the final select statement.
I also try to handle situations where we sold (or moved out) more than we have in the inventory. In this
case we use the Last Cost for the item.
Here lies another problem not currently considered - if we have a negative quantity balance, we need
to keep decrementing that difference after we receive the item again. This is not currently done in my
procedure, so we may get an incorrect Cost of Goods Sold in such scenarios. I may need to think more about how
to handle this problem.
For returns I use a similar process to the one for sales, but I try to put back what I've
already removed in the opposite direction (i.e. last removed - first returned). So, this is how I handle
returns:
SELECT @Returned = @quantity;
WITH cteSource
AS (
SELECT TOP (@Returned) s.unit_Cost
,s.fifo_rank
,s.quantity
FROM #MovingInventory AS s
CROSS APPLY (
SELECT TOP (s.Removed - s.Returned) ROW_NUMBER() OVER (
ORDER BY number
) AS n
FROM dbo.numbers n5
WHERE number > 0
) AS f(n)
WHERE s.InvNo = @InvNo
AND s.fifo_rank < @fifo_rank
AND (s.Removed - s.Returned) > 0
ORDER BY s.fifo_rank DESC -- returns in the LIFO order
)
,cteReturned
AS (
SELECT unit_Cost
,fifo_rank
,quantity
,COUNT(*) AS Returned
FROM cteSource
GROUP BY unit_Cost
,fifo_rank
,quantity
)
UPDATE I
SET CurrentQuantity = I.CurrentQuantity + R.Returned
,Returned = I.Returned + R.Returned
OUTPUT Inserted.unit_cost
,Inserted.Returned - deleted.Returned
INTO #Removed(unit_cost, Removed)
FROM #MovingInventory I
INNER JOIN cteReturned R ON I.fifo_rank = R.fifo_rank
WHERE I.InvNo = @InvNo;
IF @Debug = 1
BEGIN
SELECT *
FROM #MovingInventory I
WHERE I.InvNo = @InvNo;
RAISERROR (
'Result after return'
,10
,1
)
WITH NOWAIT;
END
IF @trans_type = 'S'
AND @date_time >= @dtStart
INSERT INTO #Sales (
trans_no
,InvNo
,locatn_id
,date_time
,department
,category
,item
,invent_id
,unit_cost
,quantity
)
SELECT @ref_no
,@InvNo
,@locatn_id
,@date_time
,@department
,@category
,@item
,@invent_id
,unit_cost
,(- 1) * Removed
FROM #Removed;-- handle returns
-- Need to check for situations when we return what we didn't have in the inventory before
IF @Debug = 1
BEGIN
SELECT *
FROM #Sales;
RAISERROR (
'Current Sales after return'
,10
,1
)
WITH NOWAIT;
END
SELECT @Difference = @Returned - COALESCE((
SELECT SUM(Removed)
FROM #Removed
), 0);
IF @Difference > 0 -- Returned more than was in the inventory originally, use Last Cost
BEGIN
IF @Debug = 1
BEGIN
SET @Message = 'Returned more than removed Difference= ' + CAST(@Difference AS VARCHAR(10)) + ' Last Cost = '
+ CAST(@LastCost AS VARCHAR(20));
RAISERROR (
@Message
,10
,1
)
WITH NOWAIT;
END
SET @LastCost = 0;
SELECT @LastCost = LastCost.LastCost
FROM dbo.siriusfn_LastCostUpToDate(@department, @category,
@item, @invent_id, @date_time) LastCost;
INSERT INTO #Sales (
trans_no
,InvNo
,locatn_id
,date_time
,department
,category
,item
,invent_id
,unit_cost
,quantity
)
SELECT @ref_no
,@InvNo
,@locatn_id
,@date_time
,@department
,@category
,@item
,@invent_id
,@LastCost
,- 1 * @Difference;
END
END
Here again, if we returned more than we originally removed, I use the last known cost for the item for the excess.
The Cost of Goods Sold FIFO procedure
Now I will give you the whole procedure code, and hopefully you will see my logic. I will also appreciate
comments or code corrections, as this is still a work in progress and hasn't been tested extensively yet.
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
SET NOCOUNT ON;
---------------- #Inventory test object creation so the script below doesn't complain about the #Inventory table ----------
IF OBJECT_ID('tempdb..#Inventory', N'U') IS NOT NULL
DROP TABLE #Inventory;
CREATE TABLE [dbo].[#Inventory] (
[ref_no] [numeric](17, 0) NOT NULL
,[locatn_id] [int] NOT NULL
,[date_time] [datetime] NOT NULL
,[fifo_rank] [bigint] NULL
,[InvNo] [bigint] NULL
,[department] [char](10) NOT NULL
,[category] [char](10) NOT NULL
,[item] [char](10) NOT NULL
,[invent_id] [int] NOT NULL
,[trans_type] [char](1) NOT NULL
,[quantity] [numeric](8, 2) NOT NULL
,[unit_cost] [money] NOT NULL
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
INSERT [dbo].[#Inventory] (
[ref_no]
,[locatn_id]
,[date_time]
,[fifo_rank]
,[InvNo]
,[department]
,[category]
,[item]
,[invent_id]
,[trans_type]
,[quantity]
,[unit_cost]
)
VALUES (
CAST(53 AS NUMERIC(17, 0))
,1
,CAST(0x0000A20000FF6D74 AS DATETIME)
,1
,1
,N'RETAIL    '
,N'BK-CHILD '
,N'DSCATTEST '
,0
,N'P'
,CAST(40.00 AS NUMERIC(8, 2))
,10.0000
)
INSERT [dbo].[#Inventory] (
[ref_no]
,[locatn_id]
,[date_time]
,[fifo_rank]
,[InvNo]
,[department]
,[category]
,[item]
,[invent_id]
,[trans_type]
,[quantity]
,[unit_cost]
)
VALUES (
CAST(53 AS NUMERIC(17, 0))
,1
,CAST(0x0000A20000FF6D74 AS DATETIME)
,2
,1
,N'RETAIL    '
,N'BK-CHILD '
,N'DSCATTEST '
,0
,N'P'
,CAST(40.00 AS NUMERIC(8, 2))
,5.0000
)
INSERT [dbo].[#Inventory] (
[ref_no]
,[locatn_id]
,[date_time]
,[fifo_rank]
,[InvNo]
,[department]
,[category]
,[item]
,[invent_id]
,[trans_type]
,[quantity]
,[unit_cost]
)
VALUES (
CAST(136005001 AS NUMERIC(17, 0))
,1
,CAST(0x0000A200011967D8 AS DATETIME)
,3
,1
,N'RETAIL    '
,N'BK-CHILD '
,N'DSCATTEST '
,0
,N'S'
,CAST(- 50.00 AS NUMERIC(8, 2))
,0.0000
)
INSERT [dbo].[#Inventory] (
[ref_no]
,[locatn_id]
,[date_time]
,[fifo_rank]
,[InvNo]
,[department]
,[category]
,[item]
,[invent_id]
,[trans_type]
,[quantity]
,[unit_cost]
)
VALUES (
CAST(54 AS NUMERIC(17, 0))
,1
,CAST(0x0000A200011967DA AS DATETIME)
,4
,1
,N'RETAIL    '
,N'BK-CHILD '
,N'DSCATTEST '
,0
,N'P'
,CAST(40.00 AS NUMERIC(8, 2))
,7.5000
)
INSERT [dbo].[#Inventory] (
[ref_no]
,[locatn_id]
,[date_time]
,[fifo_rank]
,[InvNo]
,[department]
,[category]
,[item]
,[invent_id]
,[trans_type]
,[quantity]
,[unit_cost]
)
VALUES (
CAST(136005002 AS NUMERIC(17, 0))
,1
,CAST(0x0000A200011967DE AS DATETIME)
,5
,1
,N'RETAIL    '
,N'BK-CHILD '
,N'DSCATTEST '
,0
,N'S'
,CAST(- 50.00 AS NUMERIC(8, 2))
,0.0000
)
GO
IF NOT EXISTS (
SELECT *
FROM INFORMATION_SCHEMA.ROUTINES
WHERE ROUTINE_NAME = 'siriussp_CostOfGoodsSold_FIFO'
AND ROUTINE_TYPE = 'PROCEDURE'
)
EXECUTE
('CREATE PROCEDURE dbo.siriussp_CostOfGoodsSold_FIFO AS SET NOCOUNT ON;');
GO
ALTER PROCEDURE dbo.siriussp_CostOfGoodsSold_FIFO (
@dtStart DATETIME
,@Debug BIT = 0
)
--=============================================================
/* SP that returns total quantity and cost of goods sold
by department, category, item, invent_id, and locatn_id,
using FIFO (First IN, First OUT) method of cost valuation.
Modified on 07/10/2012
Modified on 07/19/2013 - 7/26/2013
--=============================================================
*/
AS
BEGIN
SET NOCOUNT ON;
--IF CAST(LEFT(CAST(serverproperty('ProductVersion') AS VARCHAR(max)), 2) AS DECIMAL(10, 2)) >= 11
--    AND OBJECT_ID('dbo.siriussp_CostOfGoodsSold_FIFO_2012', 'P') IS NOT NULL
--BEGIN
--    PRINT 'Using 2012 version of the stored procedure'
--    EXECUTE sp_ExecuteSQL N'EXECUTE dbo.siriussp_CostOfGoodsSold_FIFO_2012 @dtStart, @Debug'
--        ,N'@dtStart DATETIME, @Debug BIT'
--        ,@dtStart, @Debug;
--    RETURN;
--END
--PRINT 'Using cursor based version of the stored procedure'
IF OBJECT_ID('TempDB..#Sales', N'U') IS NOT NULL
DROP TABLE #Sales;
CREATE TABLE [dbo].[#Sales] (
InvNo INT NOT NULL
,[trans_no] [numeric](17, 0) NOT NULL
,[locatn_id] [int] NOT NULL
,[date_time] [datetime] NOT NULL
,[department] [char](10) COLLATE DATABASE_DEFAULT NOT NULL
,[category] [char](10) COLLATE DATABASE_DEFAULT NOT NULL
,[item] [char](10) COLLATE DATABASE_DEFAULT NOT NULL
,[invent_id] [int] NOT NULL
,quantity INT
,unit_cost MONEY
)
IF OBJECT_ID('TempDB..#Removed', N'U') IS NOT NULL
DROP TABLE #Removed;
CREATE TABLE [dbo].[#Removed] (
unit_cost MONEY
,Removed INT
)
IF OBJECT_ID('TempDB..#MovingInventory', N'U') IS NOT NULL
DROP TABLE #MovingInventory;
CREATE TABLE [dbo].[#MovingInventory] (
InvNo INT NOT NULL
,fifo_rank INT NOT NULL
,quantity INT
,unit_cost MONEY
,Removed INT
,Returned INT
,CurrentQuantity INT
,CONSTRAINT pkMovingInventory PRIMARY KEY (
InvNo
,fifo_rank
)
)
INSERT INTO #MovingInventory (
InvNo
,fifo_rank
,quantity
,unit_cost
,Removed
,Returned
,CurrentQuantity
)
SELECT InvNo
,fifo_rank
,quantity
,unit_cost
,0
,0
,quantity
FROM #Inventory
WHERE trans_type IN (
'P'
,'A'
,'T'
)
AND quantity > 0
ORDER BY InvNo
,fifo_rank;
IF NOT EXISTS (
SELECT NAME
FROM TempDB.sys.sysindexes
WHERE NAME = 'idx_Inventory_fifo_rank'
)
CREATE INDEX idx_Inventory_fifo_rank ON #Inventory (
InvNo
,fifo_rank
);
DECLARE @InvNo INT
,@ref_no NUMERIC(17, 0)
,@locatn_id INT
,@date_time DATETIME
,@fifo_rank INT
,@department CHAR(10)
,@category CHAR(10)
,@item CHAR(10)
,@invent_id INT
,@trans_type CHAR(1)
,@quantity INT
,@unit_cost MONEY
,@LastCost MONEY
,@CurInvNo INT
,@Removed INT
,@Returned INT
,@Elapsed INT
,@StartTime DATETIME
,@Message VARCHAR(MAX)
,@Difference INT;
SET @StartTime = CURRENT_TIMESTAMP;
DECLARE curMainProcess CURSOR LOCAL FORWARD_ONLY STATIC READ_ONLY
FOR
SELECT DISTINCT InvNo
FROM #Inventory
ORDER BY InvNo;
OPEN curMainProcess;
FETCH NEXT
FROM curMainProcess
INTO @InvNo;
WHILE (@@FETCH_STATUS = 0)
BEGIN
SELECT @fifo_rank = MAX(fifo_rank)
,@Removed = - 1 * SUM(quantity)
FROM #Inventory
WHERE date_time < @dtStart
AND (
trans_type = 'S'
OR quantity < 0
)
AND InvNo = @InvNo;
IF COALESCE(@Removed, 0) > 0 -- what to do when we start with returns? Unlikely to happen, though
BEGIN
IF @Debug = 1
PRINT 'Calculating starting inventory';;
WITH cteSource
AS (
SELECT TOP (@Removed) s.unit_Cost
,s.fifo_rank
,s.quantity
FROM #MovingInventory AS s
CROSS APPLY (
SELECT TOP (CAST(s.Quantity AS INT)) ROW_NUMBER() OVER (
ORDER BY number
) AS n
FROM dbo.numbers n5
WHERE number > 0
) AS f(n)
WHERE s.InvNo = @InvNo
AND s.fifo_rank < @fifo_rank
ORDER BY s.fifo_rank
)
,cteRemoved
AS (
SELECT unit_Cost
,fifo_rank
,quantity
,COUNT(*) AS Removed
FROM cteSource
GROUP BY unit_Cost
,fifo_rank
,quantity
)
UPDATE M
SET Removed = R.Removed
,CurrentQuantity = M.CurrentQuantity - R.Removed
FROM #MovingInventory M
INNER JOIN cteRemoved R ON M.fifo_rank = R.fifo_rank
WHERE M.InvNo = @InvNo;
-- We can also check if Removed = @Removed (if less, we have negative inventory - an unlikely situation)
END
IF @Debug = 1
BEGIN
SELECT *
FROM #MovingInventory
WHERE InvNo = @InvNo;
RAISERROR (
'Done with the prior inventory - starting checking Sales we''re interested in'
,10
,1
)
WITH NOWAIT;
END
DECLARE curProcess CURSOR LOCAL FORWARD_ONLY STATIC READ_ONLY
FOR
SELECT InvNo
,ref_no
,date_time
,fifo_rank
,quantity
,unit_cost
,trans_type
,department
,category
,item
,invent_id
,locatn_id
FROM #Inventory
WHERE InvNo = @InvNo
AND (
trans_type = 'S'
OR quantity < 0
)
AND date_time >= @dtStart -- now process only the Sales we're interested in
ORDER BY InvNo
,fifo_rank
OPEN curProcess
FETCH NEXT
FROM curProcess
INTO @InvNo
,@ref_no
,@date_time
,@fifo_rank
,@quantity
,@unit_cost
,@trans_type
,@department
,@category
,@item
,@invent_id
,@locatn_id
WHILE (@@FETCH_STATUS = 0)
BEGIN
IF @quantity < 0 -- Sale or transfer
BEGIN
IF @Debug = 1
BEGIN
SET @Message = 'Sale or transfer with quantity = ' + CAST(1 * @quantity AS VARCHAR(20))
RAISERROR (
@Message
,10
,1
)
WITH NOWAIT;
END
SELECT @Removed = - 1 * @quantity;
WITH cteSource
AS (
SELECT TOP (@Removed) s.unit_Cost
,s.fifo_rank
,s.CurrentQuantity
FROM #MovingInventory AS s
CROSS APPLY (
SELECT TOP (s.CurrentQuantity) ROW_NUMBER() OVER (
ORDER BY number
) AS n
FROM dbo.numbers n5
WHERE number > 0
) AS f(n)
WHERE s.InvNo = @InvNo
AND s.fifo_rank < @fifo_rank
AND s.CurrentQuantity > 0
ORDER BY s.fifo_rank
)
,cteRemoved
AS (
SELECT unit_Cost
,fifo_rank
,CurrentQuantity
,COUNT(*) AS Removed
FROM cteSource
GROUP BY unit_Cost
,fifo_rank
,CurrentQuantity
)
UPDATE I
SET CurrentQuantity = I.CurrentQuantity - R.Removed
,Removed = I.Removed + R.Removed
OUTPUT Inserted.unit_cost
,Inserted.Removed - deleted.Removed
INTO #Removed(unit_cost, Removed)
FROM #MovingInventory I
INNER JOIN cteRemoved R ON I.fifo_rank = R.fifo_rank
WHERE I.InvNo = @InvNo;
IF @Debug = 1
BEGIN
SELECT *
FROM #MovingInventory I
WHERE I.InvNo = @InvNo;
RAISERROR (
'Current Moving Inventory after Sale or Return'
,10
,1
)
WITH NOWAIT
END
IF @trans_type = 'S'
AND @date_time >= @dtStart
INSERT INTO #Sales (
trans_no
,InvNo
,locatn_id
,date_time
,department
,category
,item
,invent_id
,unit_cost
,quantity
)
SELECT @ref_no
,@InvNo
,@locatn_id
,@date_time
,@department
,@category
,@item
,@invent_id
,unit_cost
,Removed
FROM #Removed;
--- Need to check for situations when we sell more than is currently in the inventory (rare cases)
SELECT @Difference = @Removed - COALESCE((
SELECT SUM(Removed)
FROM #Removed
), 0);
IF @Difference > 0 -- Sold more than were in the inventory
BEGIN
IF @Debug = 1
BEGIN
SET @Message = 'Sold more than in the inventory Difference = ' + CAST(@Difference AS VARCHAR(10))
RAISERROR (
@Message
,10
,1
)
WITH NOWAIT;
END
SET @LastCost = 0;
SELECT @LastCost = LastCost.LastCost
FROM dbo.siriusfn_LastCostUpToDate(@department, @category,
@item, @invent_id, @date_time) LastCost;
INSERT INTO #Sales (
trans_no
,InvNo
,locatn_id
,date_time
,department
,category
,item
,invent_id
,unit_cost
,quantity
)
SELECT @ref_no
,@InvNo
,@locatn_id
,@date_time
,@department
,@category
,@item
,@invent_id
,@LastCost
,@Difference
IF @Debug = 1
BEGIN
SET @Message = 'Last Cost = ' + CAST(@LastCost AS VARCHAR(10))
RAISERROR (
@Message
,10
,1
)
WITH NOWAIT;
SELECT *
FROM #Sales
RAISERROR (
'Currently in #Sales'
,10
,1
)
WITH NOWAIT;
END
END
END
ELSE -- Returns
BEGIN
IF @Debug = 1
BEGIN
SET @Message = 'Return with quantity = ' + CAST(@quantity AS VARCHAR(20));
RAISERROR (
@Message
,10
,1
)
WITH NOWAIT;
END
SELECT @Returned = @quantity;
WITH cteSource
AS (
SELECT TOP (@Returned) s.unit_Cost
,s.fifo_rank
,s.quantity
FROM #MovingInventory AS s
CROSS APPLY (
SELECT TOP (s.Removed - s.Returned) ROW_NUMBER() OVER (
ORDER BY number
) AS n
FROM dbo.numbers n5
WHERE number > 0
) AS f(n)
WHERE s.InvNo = @InvNo
AND s.fifo_rank < @fifo_rank
AND (s.Removed - s.Returned) > 0
ORDER BY s.fifo_rank DESC -- returns in the LIFO order
)
,cteReturned
AS (
SELECT unit_Cost
,fifo_rank
,quantity
,COUNT(*) AS Returned
FROM cteSource
GROUP BY unit_Cost
,fifo_rank
,quantity
)
UPDATE I
SET CurrentQuantity = I.CurrentQuantity + R.Returned
,Returned = I.Returned + R.Returned
OUTPUT Inserted.unit_cost
,Inserted.Returned - deleted.Returned
INTO #Removed(unit_cost, Removed)
FROM #MovingInventory I
INNER JOIN cteReturned R ON I.fifo_rank = R.fifo_rank
WHERE I.InvNo = @InvNo;
IF @Debug = 1
BEGIN
SELECT *
FROM #MovingInventory I
WHERE I.InvNo = @InvNo;
RAISERROR (
'Result after return'
,10
,1
)
WITH NOWAIT;
END
IF @trans_type = 'S'
AND @date_time >= @dtStart
INSERT INTO #Sales (
trans_no
,InvNo
,locatn_id
,date_time
,department
,category
,item
,invent_id
,unit_cost
,quantity
)
SELECT @ref_no
,@InvNo
,@locatn_id
,@date_time
,@department
,@category
,@item
,@invent_id
,unit_cost
,(- 1) * Removed
FROM #Removed;-- handle returns
-- Need to check for situations when we return what we didn't have in the inventory before
IF @Debug = 1
BEGIN
SELECT *
FROM #Sales;
RAISERROR (
'Current Sales after return'
,10
,1
)
WITH NOWAIT;
END
SELECT @Difference = @Returned - COALESCE((
SELECT SUM(Removed)
FROM #Removed
), 0);
IF @Difference > 0 -- Returned more than was in the inventory originally, use Last Cost
BEGIN
IF @Debug = 1
BEGIN
SET @Message = 'Returned more than removed Difference= ' + CAST(@Difference AS VARCHAR(10)) + ' Last Cost = ' +
CAST(@LastCost AS VARCHAR(20));
RAISERROR (
@Message
,10
,1
)
WITH NOWAIT;
END
SET @LastCost = 0;
SELECT @LastCost = LastCost.LastCost
FROM dbo.siriusfn_LastCostUpToDate(@department,
@category, @item, @invent_id, @date_time) LastCost;
INSERT INTO #Sales (
trans_no
,InvNo
,locatn_id
,date_time
,department
,category
,item
,invent_id
,unit_cost
,quantity
)
SELECT @ref_no
,@InvNo
,@locatn_id
,@date_time
,@department
,@category
,@item
,@invent_id
,@LastCost
,- 1 * @Difference;
END
END
TRUNCATE TABLE #Removed; -- done with this table for this iteration
FETCH NEXT
FROM curProcess
INTO @InvNo
,@ref_no
,@date_time
,@fifo_rank
,@quantity
,@unit_cost
,@trans_type
,@department
,@category
,@item
,@invent_id
,@locatn_id
END -- while
CLOSE curProcess
DEALLOCATE curProcess
FETCH NEXT
FROM curMainProcess
INTO @InvNo
END -- while
CLOSE curMainProcess
DEALLOCATE curMainProcess
IF @Debug = 1
BEGIN
SET @Elapsed = datediff(second, @StartTime, CURRENT_TIMESTAMP);
PRINT ' Finished with the creation of #Sales tables using cursor in ' +
cast(@Elapsed AS VARCHAR(30)) + ' seconds';
END
SELECT S.trans_no
,S.department
,S.category
,S.item
,S.invent_id
,S.locatn_id
,SUM(S.quantity) AS QuantitySold
,CAST(SUM(S.quantity * S.unit_cost) AS MONEY) AS CostOfGoodsSold
FROM #Sales S
GROUP BY S.trans_no
,S.department
,S.category
,S.item
,S.invent_id
,S.locatn_id;
IF @Debug = 1
BEGIN
SET @Elapsed = datediff(second, @StartTime, CURRENT_TIMESTAMP);
PRINT ' Finished with the final selection in ' + cast(@Elapsed AS
VARCHAR(30)) + ' seconds';
END
END
RETURN;
GO
/* Test Cases
IF OBJECT_ID('TempDB..#Inventory',N'U') IS NOT NULL DROP TABLE #Inventory;
CREATE TABLE [dbo].[#Inventory](
[InvNo] [int] NOT NULL,
[ref_no] [numeric](17, 0) NOT NULL,
[locatn_id] [int] NOT NULL,
[date_time] [datetime] NOT NULL,
[fifo_rank] [bigint] NULL,
[department] [char](10) NOT NULL,
[category] [char](10) NOT NULL,
[item] [char](10) NOT NULL,
[invent_id] [int] NOT NULL,
[trans_type] [char](1) NOT NULL,
[quantity] [numeric](8, 2) NOT NULL,
[unit_cost] [money] NOT NULL
)
;with cte as (SELECT N'25' AS [ref_no], N'1' AS [locatn_id], N'2012-06-29
16:48:39.000' AS [date_time], N'1' AS [fifo_rank], N'1' AS [InvNo], N'RETAIL'
AS [department], N'SUPPLIES' AS [category], N'BATT_TEST' AS [item], N'0' AS
[invent_id], N'P' AS [trans_type], N'100.00' AS [quantity], N'1.00' AS
[unit_cost] UNION ALL
SELECT N'133005001' AS [ref_no], N'1' AS [locatn_id], N'2012-06-29
17:00:13.000' AS [date_time], N'2' AS [fifo_rank], N'1' AS [InvNo], N'RETAIL'
AS [department], N'SUPPLIES' AS [category], N'BATT_TEST' AS [item], N'0' AS
[invent_id], N'S' AS [trans_type], N'-90.00' AS [quantity], N'0.00' AS
[unit_cost] UNION ALL
SELECT N'25' AS [ref_no], N'1' AS [locatn_id], N'2012-06-29 17:26:47.000' AS
[date_time], N'3' AS [fifo_rank], N'1' AS [InvNo], N'RETAIL' AS [department],
N'SUPPLIES' AS [category], N'BATT_TEST' AS [item], N'0' AS [invent_id], N'P'
AS [trans_type], N'100.00' AS [quantity], N'2.00' AS [unit_cost] UNION ALL
SELECT N'135005001' AS [ref_no], N'1' AS [locatn_id], N'2012-06-29
17:28:19.000' AS [date_time], N'4' AS [fifo_rank], N'1' AS [InvNo], N'RETAIL'
AS [department], N'SUPPLIES' AS [category], N'BATT_TEST' AS [item], N'0' AS
[invent_id], N'S' AS [trans_type], N'10.00' AS [quantity], N'0.00' AS
[unit_cost] UNION ALL
SELECT N'0' AS [ref_no], N'1' AS [locatn_id], N'2012-06-27 11:58:26.000' AS
[date_time], N'1' AS [fifo_rank], N'2' AS [InvNo], N'RETAIL' AS [department],
N'SUPPLIES' AS [category], N'BUG_SPRAY' AS [item], N'0' AS [invent_id], N'T'
AS [trans_type], N'10.00' AS [quantity], N'2.00' AS [unit_cost] UNION ALL
SELECT N'129005001' AS [ref_no], N'1' AS [locatn_id], N'2012-06-27
13:02:57.000' AS [date_time], N'2' AS [fifo_rank], N'2' AS [InvNo], N'RETAIL'
AS [department], N'SUPPLIES' AS [category], N'BUG_SPRAY' AS [item], N'0' AS
[invent_id], N'S' AS [trans_type], N'-9.00' AS [quantity], N'0.00' AS
[unit_cost] UNION ALL
SELECT N'0' AS [ref_no], N'1' AS [locatn_id], N'2012-06-27 13:06:07.000' AS
[date_time], N'3' AS [fifo_rank], N'2' AS [InvNo], N'RETAIL' AS [department],
N'SUPPLIES' AS [category], N'BUG_SPRAY' AS [item], N'0' AS [invent_id], N'T'
AS [trans_type], N'10.00' AS [quantity], N'2.6667' AS [unit_cost] UNION ALL
SELECT N'130005001' AS [ref_no], N'1' AS [locatn_id], N'2012-06-27
13:17:46.000' AS [date_time], N'4' AS [fifo_rank], N'2' AS [InvNo], N'RETAIL'
AS [department], N'SUPPLIES' AS [category], N'BUG_SPRAY' AS [item], N'0' AS
[invent_id], N'S' AS [trans_type], N'-7.00' AS [quantity], N'0.00' AS
[unit_cost] UNION ALL
SELECT N'131005001' AS [ref_no], N'1' AS [locatn_id], N'2012-06-27
13:18:16.000' AS [date_time], N'5' AS [fifo_rank], N'2' AS [InvNo], N'RETAIL'
AS [department], N'SUPPLIES' AS [category], N'BUG_SPRAY' AS [item], N'0' AS
[invent_id], N'S' AS [trans_type], N'3.00' AS [quantity], N'0.00' AS
[unit_cost] UNION ALL
SELECT N'24' AS [ref_no], N'3' AS [locatn_id], N'2012-06-27 11:57:17.000' AS
[date_time], N'1' AS [fifo_rank], N'3' AS [InvNo], N'RETAIL' AS [department],
N'SUPPLIES' AS [category], N'BUG_SPRAY' AS [item], N'0' AS [invent_id], N'P'
AS [trans_type], N'20.00' AS [quantity], N'2.00' AS [unit_cost] UNION ALL
SELECT N'0' AS [ref_no], N'3' AS [locatn_id], N'2012-06-27 11:58:26.000' AS
[date_time], N'2' AS [fifo_rank], N'3' AS [InvNo], N'RETAIL' AS [department],
N'SUPPLIES' AS [category], N'BUG_SPRAY' AS [item], N'0' AS [invent_id], N'T'
AS [trans_type], N'-10.00' AS [quantity], N'2.00' AS [unit_cost] UNION ALL
SELECT N'24' AS [ref_no], N'3' AS [locatn_id], N'2012-06-27 13:04:29.000' AS
[date_time], N'3' AS [fifo_rank], N'3' AS [InvNo], N'RETAIL' AS [department],
N'SUPPLIES' AS [category], N'BUG_SPRAY' AS [item], N'0' AS [invent_id], N'P'
AS [trans_type], N'20.00' AS [quantity], N'3.00' AS [unit_cost] UNION ALL
SELECT N'0' AS [ref_no], N'3' AS [locatn_id], N'2012-06-27 13:06:07.000' AS
[date_time], N'4' AS [fifo_rank], N'3' AS [InvNo], N'RETAIL' AS [department],
N'SUPPLIES' AS [category], N'BUG_SPRAY' AS [item], N'0' AS [invent_id], N'T'
AS [trans_type], N'-10.00' AS [quantity], N'2.6667' AS [unit_cost] UNION ALL
SELECT N'4' AS [ref_no], N'1' AS [locatn_id], N'2011-04-03 18:34:44.000' AS
[date_time], N'1' AS [fifo_rank], N'4' AS [InvNo], N'RETAIL' AS [department],
N'SUPPLIES' AS [category], N'GRANOLABAR' AS [item], N'0' AS [invent_id], N'T'
AS [trans_type], N'24.00' AS [quantity], N'0.75' AS [unit_cost] UNION ALL
SELECT N'11005001' AS [ref_no], N'1' AS [locatn_id], N'2011-04-07
09:57:51.000' AS [date_time], N'2' AS [fifo_rank], N'4' AS [InvNo], N'RETAIL'
AS [department], N'SUPPLIES' AS [category], N'GRANOLABAR' AS [item], N'0' AS
[invent_id], N'S' AS [trans_type], N'-1.00' AS [quantity], N'0.00' AS
[unit_cost] UNION ALL
SELECT N'33005001' AS [ref_no], N'1' AS [locatn_id], N'2011-04-07
10:04:39.000' AS [date_time], N'3' AS [fifo_rank], N'4' AS [InvNo], N'RETAIL'
AS [department], N'SUPPLIES' AS [category], N'GRANOLABAR' AS [item], N'0' AS
[invent_id], N'S' AS [trans_type], N'-1.00' AS [quantity], N'0.00' AS
[unit_cost] UNION ALL
SELECT N'103005001' AS [ref_no], N'1' AS [locatn_id], N'2011-07-06
17:55:17.000' AS [date_time], N'4' AS [fifo_rank], N'4' AS [InvNo], N'RETAIL'
AS [department], N'SUPPLIES' AS [category], N'GRANOLABAR' AS [item], N'0' AS
[invent_id], N'S' AS [trans_type], N'-1.00' AS [quantity], N'0.00' AS
[unit_cost] UNION ALL
SELECT N'108005001' AS [ref_no], N'1' AS [locatn_id], N'2011-07-06
17:55:47.000' AS [date_time], N'5' AS [fifo_rank], N'4' AS [InvNo], N'RETAIL'
AS [department], N'SUPPLIES' AS [category], N'GRANOLABAR' AS [item], N'0' AS
[invent_id], N'S' AS [trans_type], N'-1.00' AS [quantity], N'0.00' AS
[unit_cost] UNION ALL
SELECT N'115005001' AS [ref_no], N'1' AS [locatn_id], N'2011-08-01
17:47:11.000' AS [date_time], N'6' AS [fifo_rank], N'4' AS [InvNo], N'RETAIL'
AS [department], N'SUPPLIES' AS [category], N'GRANOLABAR' AS [item], N'0' AS
[invent_id], N'S' AS [trans_type], N'-1.00' AS [quantity], N'0.00' AS
[unit_cost] UNION ALL
SELECT N'41005001' AS [ref_no], N'1' AS [locatn_id], N'2011-09-04
11:24:03.000' AS [date_time], N'7' AS [fifo_rank], N'4' AS [InvNo], N'RETAIL'
AS [department], N'SUPPLIES' AS [category], N'GRANOLABAR' AS [item], N'0' AS
[invent_id], N'S' AS [trans_type], N'-2.00' AS [quantity], N'0.00' AS
[unit_cost] UNION ALL
SELECT N'48005001' AS [ref_no], N'1' AS [locatn_id], N'2011-09-04
11:38:31.000' AS [date_time], N'8' AS [fifo_rank], N'4' AS [InvNo], N'RETAIL'
AS [department], N'SUPPLIES' AS [category], N'GRANOLABAR' AS [item], N'0' AS
[invent_id], N'S' AS [trans_type], N'-3.00' AS [quantity], N'0.00' AS
[unit_cost] UNION ALL
SELECT N'65005001' AS [ref_no], N'1' AS [locatn_id], N'2011-09-04
11:59:59.000' AS [date_time], N'9' AS [fifo_rank], N'4' AS [InvNo], N'RETAIL'
AS [department], N'SUPPLIES' AS [category], N'GRANOLABAR' AS [item], N'0' AS
[invent_id], N'S' AS [trans_type], N'-1.00' AS [quantity], N'0.00' AS
[unit_cost] UNION ALL
SELECT N'1' AS [ref_no], N'1' AS [locatn_id], N'2012-06-26 17:02:19.000' AS
[date_time], N'10' AS [fifo_rank], N'4' AS [InvNo], N'RETAIL' AS
[department], N'SUPPLIES' AS [category], N'GRANOLABAR' AS [item], N'0' AS
[invent_id], N'A' AS [trans_type], N'5.00' AS [quantity], N'0.75' AS
[unit_cost] UNION ALL
SELECT N'0' AS [ref_no], N'1' AS [locatn_id], N'2012-06-26 17:09:46.000' AS
[date_time], N'11' AS [fifo_rank], N'4' AS [InvNo], N'RETAIL' AS
[department], N'SUPPLIES' AS [category], N'GRANOLABAR' AS [item], N'0' AS
[invent_id], N'A' AS [trans_type], N'5.00' AS [quantity], N'0.10' AS
[unit_cost] UNION ALL
SELECT N'0' AS [ref_no], N'1' AS [locatn_id], N'2012-06-26 17:15:05.000' AS
[date_time], N'12' AS [fifo_rank], N'4' AS [InvNo], N'RETAIL' AS
[department], N'SUPPLIES' AS [category], N'GRANOLABAR' AS [item], N'0' AS
[invent_id], N'T' AS [trans_type], N'5.00' AS [quantity], N'0.5469' AS
[unit_cost] UNION ALL
SELECT N'0' AS [ref_no], N'1' AS [locatn_id], N'2012-06-26 17:15:47.000' AS
[date_time], N'13' AS [fifo_rank], N'4' AS [InvNo], N'RETAIL' AS
[department], N'SUPPLIES' AS [category], N'GRANOLABAR' AS [item], N'0' AS
[invent_id], N'T' AS [trans_type], N'5.00' AS [quantity], N'0.5469' AS
[unit_cost] UNION ALL
SELECT N'125005001' AS [ref_no], N'1' AS [locatn_id], N'2012-06-26
18:00:26.000' AS [date_time], N'14' AS [fifo_rank], N'4' AS [InvNo],
N'RETAIL' AS [department], N'SUPPLIES' AS [category], N'GRANOLABAR' AS
[item], N'0' AS [invent_id], N'S' AS [trans_type], N'-10.00' AS [quantity],
N'0.00' AS [unit_cost] UNION ALL
SELECT N'126005001' AS [ref_no], N'1' AS [locatn_id], N'2012-06-26
18:01:05.000' AS [date_time], N'15' AS [fifo_rank], N'4' AS [InvNo],
N'RETAIL' AS [department], N'SUPPLIES' AS [category], N'GRANOLABAR' AS
[item], N'0' AS [invent_id], N'S' AS [trans_type], N'5.00' AS [quantity],
N'0.00' AS [unit_cost] UNION ALL
SELECT N'127005001' AS [ref_no], N'1' AS [locatn_id], N'2012-06-26
18:02:07.000' AS [date_time], N'16' AS [fifo_rank], N'4' AS [InvNo],
N'RETAIL' AS [department], N'SUPPLIES' AS [category], N'GRANOLABAR' AS
[item], N'0' AS [invent_id], N'S' AS [trans_type], N'-50.00' AS [quantity],
N'0.00' AS [unit_cost] UNION ALL
SELECT N'128005001' AS [ref_no], N'1' AS [locatn_id], N'2012-06-26
18:02:51.000' AS [date_time], N'17' AS [fifo_rank], N'4' AS [InvNo],
N'RETAIL' AS [department], N'SUPPLIES' AS [category], N'GRANOLABAR' AS
[item], N'0' AS [invent_id], N'S' AS [trans_type], N'30.00' AS [quantity],
N'0.00' AS [unit_cost] UNION ALL
SELECT N'5' AS [ref_no], N'3' AS [locatn_id], N'2011-04-03 16:41:21.000' AS
[date_time], N'1' AS [fifo_rank], N'5' AS [InvNo], N'RETAIL' AS [department],
N'SUPPLIES' AS [category], N'GRANOLABAR' AS [item], N'0' AS [invent_id], N'P'
AS [trans_type], N'60.00' AS [quantity], N'0.75' AS [unit_cost] UNION ALL
SELECT N'1' AS [ref_no], N'3' AS [locatn_id], N'2011-04-03 17:46:45.000' AS
[date_time], N'2' AS [fifo_rank], N'5' AS [InvNo], N'RETAIL' AS [department],
N'SUPPLIES' AS [category], N'GRANOLABAR' AS [item], N'0' AS [invent_id], N'A'
AS [trans_type], N'-2.00' AS [quantity], N'0.75' AS [unit_cost] UNION ALL
SELECT N'4' AS [ref_no], N'3' AS [locatn_id], N'2011-04-03 18:34:44.000' AS
[date_time], N'3' AS [fifo_rank], N'5' AS [InvNo], N'RETAIL' AS [department],
N'SUPPLIES' AS [category], N'GRANOLABAR' AS [item], N'0' AS [invent_id], N'T'
AS [trans_type], N'-24.00' AS [quantity], N'0.75' AS [unit_cost] UNION ALL
SELECT N'23' AS [ref_no], N'3' AS [locatn_id], N'2012-06-26 17:00:58.000' AS
[date_time], N'4' AS [fifo_rank], N'5' AS [InvNo], N'RETAIL' AS [department],
N'SUPPLIES' AS [category], N'GRANOLABAR' AS [item], N'0' AS [invent_id], N'P'
AS [trans_type], N'10.00' AS [quantity], N'0.75' AS [unit_cost] UNION ALL
SELECT N'23' AS [ref_no], N'3' AS [locatn_id], N'2012-06-26 17:04:59.000' AS
[date_time], N'5' AS [fifo_rank], N'5' AS [InvNo], N'RETAIL' AS [department],
N'SUPPLIES' AS [category], N'GRANOLABAR' AS [item], N'0' AS [invent_id], N'P'
AS [trans_type], N'20.00' AS [quantity], N'0.10' AS [unit_cost] UNION ALL
SELECT N'0' AS [ref_no], N'3' AS [locatn_id], N'2012-06-26 17:15:05.000' AS
[date_time], N'6' AS [fifo_rank], N'5' AS [InvNo], N'RETAIL' AS [department],
N'SUPPLIES' AS [category], N'GRANOLABAR' AS [item], N'0' AS [invent_id], N'T'
AS [trans_type], N'-5.00' AS [quantity], N'0.5469' AS [unit_cost] UNION ALL
SELECT N'0' AS [ref_no], N'3' AS [locatn_id], N'2012-06-26 17:15:47.000' AS
[date_time], N'7' AS [fifo_rank], N'5' AS [InvNo], N'RETAIL' AS [department],
N'SUPPLIES' AS [category], N'GRANOLABAR' AS [item], N'0' AS [invent_id], N'T'
AS [trans_type], N'-5.00' AS [quantity], N'0.5469' AS [unit_cost] )
insert #Inventory ([ref_no], [locatn_id], [date_time], [fifo_rank], [InvNo],
[department], [category], [item], [invent_id], [trans_type], [quantity],
[unit_cost])
SELECT [ref_no], [locatn_id], [date_time], [fifo_rank], [InvNo],
[department], [category], [item], [invent_id], [trans_type], [quantity],
[unit_cost]
from cte
--CREATE INDEX idx_Inventory_fifo_rank ON #Inventory (InvNo, fifo_rank)
SELECT * FROM #Inventory
DECLARE @Time datetime, @Elapsed int, @dtStart datetime
set @dtStart = '20120629'
SET @time = GETDATE()
EXECUTE dbo.siriussp_CostOfGoodsSold_FIFO_TEST @dtStart = '20010629'
set @Elapsed = DATEDIFF(second,@time, getdate())
print 'Elapsed for SQL 2005-2008: - cursor version ' + convert(varchar(10),@Elapsed) + ' seconds'
SET @time = GETDATE()
EXECUTE dbo.siriussp_CostOfGoodsSold_FIFO @dtStart = '20010629'
set @Elapsed = DATEDIFF(second,@time, getdate())
print 'Elapsed for SQL 2005-2008: - Prior cursor version ' + convert(varchar(10),@Elapsed) + ' seconds'
--SET @time = GETDATE()
--EXECUTE dbo.siriussp_CostOfGoodsSold_FIFO_2012 @dtStart = '20010629'
--set @Elapsed = DATEDIFF(second,@time, getdate())
--print 'Elapsed for SQL 2012: ' + convert(varchar(10),@Elapsed) + ' seconds'
go*/
At the top of the script I provided the #Inventory data for the original failing scenario in order to
confirm that it works correctly with the new code. I also kept a scenario I tested originally in the
comments after the stored procedure.
Summary
In this article I described the process of working on the complex problem of calculating Cost of Goods Sold
using the FIFO method and gave my current procedure code. I also showed potential problems and flaws in
that code. I will appreciate comments and ideas for improving this algorithm.
T-SQL: Gaps and Islands Problem
This article will consider a simple classical Gaps & Islands problem asked recently in the Transact-SQL
Forum at MSDN under the unoriginal title "Query Help".
Problem Definition
The thread originator was kind enough to provide DDL of the table and some data to describe the
task:
Create table T1
(Id int identity primary key,
VoucherNo varchar(4),
TransNo varchar(10)
)
Insert into T1 values ('V100','Trns1'),('V101','Trns1'),('V102','Trns1'),('V103','Trns1'),('V104','Trns1'),('V106','Trns1')
And he also provided the desired output:
TransNo FirstVoucher LastVoucher Quantity
Trns1   V100         V104        5
Trns1   V106         V106        1
The problem is to find consecutive vouchers (100-104, then 106).
Solution
As mentioned, this is a common problem in Transact-SQL; it was described by Itzik Ben-Gan
here and by Plamen Ratchev in the easy to understand blog post
Refactoring Ranges. Knowing the main idea of the solution, it is easy to provide one, assuming that all
voucher numbers come in the following format (the letter V followed by a 3-digit number):
;WITH cte
AS (
SELECT *
,CAST(SUBSTRING(VoucherNo, 2, 3) AS INT) - ROW_NUMBER() OVER (
ORDER BY VoucherNo
) AS Grp
FROM T1
)
SELECT TransNo
,min(VoucherNo) AS FirstVoucherNo
,max(VoucherNo) AS LastVoucherNo
,count(*) AS Quantity
FROM cte
GROUP BY TransNo
,Grp
So, the idea of this solution is to first group consecutive ranges using the ROW_NUMBER() function and
then apply aggregate functions based on that group identifier.
Note that it is easy to modify this query to work with different formats of the voucher number (say, some
combination of letters followed by a number of any length), as sketched below. This article concentrates on the problem
posted by the thread originator and solves it for that particular voucher number format. You may also
want to see some modifications of my solution suggested by Ronen Ariely in the original thread.
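For instance, here is a sketch that extracts a numeric suffix of any length with PATINDEX instead of a fixed SUBSTRING. It assumes every VoucherNo ends with digits; note that for truly variable-length numbers the MIN/MAX on the raw string may also need the same numeric extraction, which is omitted here for brevity.
;WITH cte AS (
    SELECT *
          ,CAST(SUBSTRING(VoucherNo, PATINDEX('%[0-9]%', VoucherNo), 10) AS INT)
           - ROW_NUMBER() OVER (PARTITION BY TransNo
                                ORDER BY CAST(SUBSTRING(VoucherNo, PATINDEX('%[0-9]%', VoucherNo), 10) AS INT)) AS Grp
    FROM T1
)
SELECT TransNo
      ,MIN(VoucherNo) AS FirstVoucherNo
      ,MAX(VoucherNo) AS LastVoucherNo
      ,COUNT(*) AS Quantity
FROM cte
GROUP BY TransNo, Grp;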
Crazy TSQL Queries play time
Background
Most of the articles in the WIKI try to bring us tutorials on a specific topic or the best solution for a specific
problem. This post is different! It has nothing to do with optimization, query cost, the best solution
(getting the best query) or a tutorial; instead, it is all about crazy queries for reproducing the most basic built-in features (an action or a function, for example) without using the built-in feature.
The idea for this post came from the many questions we can find in forums that look like they have no
reason to be asked in the first place (for example, this question from the MSDN SQL Hebrew
forum). These questions most likely came from job interviews, courses, exams, and riddles. For
example: "how can we build a UNION query using JOIN", "how can we build a JOIN operation without
using JOIN".
While none of these tricks should be used on a production server, they are a great way to make sure that we
really understand the operation/function we are trying to replace and the ones we use as the replacement.
Please feel free to add any idea, as crazy as it is, as long as it requires understanding of the
feature you are writing about :-)
Playing with JOIN & UNION
Learning about "UNION" is simple, learning about "JOIN" can be done in one hour, but how many of us
really understand the meaning, and able to convert "JOIN" to "UNION" and vice versa?
UNION using JOIN
/******************************************** DDL+DML */
CREATE TABLE invoices (custId int,docNo int,docSum smallmoney)
CREATE TABLE creditNotes (custId int,docNo int,docSum smallmoney)
GO
INSERT INTO invoices VALUES (1234,1,1000),(1234,2,987)
INSERT INTO creditNotes VALUES (1234,10,456),(1234,11,256),(1234,12,252),(1234,
13,253),(1234,14,254)
GO
/******************************************** UNION using JOIN */
-- UNION can be done using a FULL OUTER join
SELECT custId ,docNo ,docSum
FROM invoices
WHERE custId=1234
UNION
SELECT custId ,docNo ,docSum
FROM creditNotes
WHERE custId=1234
GO
SELECT
COALESCE(I.custId, C.custId) as custId
,COALESCE(I.docNo, C.docNo) as docNo
,COALESCE(I.docSum, C.docSum) as docSum
from invoices I
FULL OUTER JOIN creditNotes C ON 1=0
where I.custId = 1234 or C.custId = 1234
GO
INNER JOIN using SUB QUERY
/******************************************** DDL+DML */
CREATE TABLE UsersTbl (UserId int, Name nvarchar(100))
CREATE TABLE NotesTbl (UserId int,DocContent nvarchar(100))
GO
INSERT INTO UsersTbl VALUES (1,'A'),(2,'B'),(4,'N'),(11,'F')
INSERT INTO NotesTbl VALUES (1,'fgsdfgsg'),(2,'fgdgdfgs'),(1,'Ndfsgff sfg
fgds'),(9,'Ndfsgff sfg fgds')
GO
/******************************************** INNER JOIN using SUB QUERY */
select
N.UserId NUserId, N.DocContent NDocContent, U.UserId UUserId, U.Name UName
from UsersTbl U
INNER join NotesTbl N on U.UserId = N.UserId
GO
select
N.UserId NUserId,N.DocContent NDocContent,N.UserId
UUserId,(select Name from UsersTbl U where U.UserId = N.UserId) UName
from NotesTbl N
where N.UserId in (select UserId from UsersTbl)
GO
LEFT JOIN using SUB QUERY & UNION
/******************************************** LEFT JOIN using SUB QUERY
& UNION */
select
N.UserId NUserId, N.DocContent NDocContent, U.UserId UUserId, U.Name UName
from UsersTbl U
LEFT join NotesTbl N on U.UserId = N.UserId
GO
select
N.UserId NUserId,N.DocContent NDocContent,N.UserId
UUserId,(select Name from UsersTbl U where U.UserId = N.UserId) UName
from NotesTbl N
where N.UserId in (select UserId from UsersTbl)
UNION ALL
select NULL,NULL,UserId,Name
from UsersTbl
where UserId not in (select UserId from NotesTbl)
GO
* We are using the DDL+DML from above.
RIGHT JOIN using LEFT JOIN
* We use the above LEFT JOIN query idea, simply swapping the roles of the tables.
FULL OUTER JOIN using "LEFT JOIN" UNION "RIGHT JOIN"
* We can use the above queries and UNION to get both the LEFT JOIN and RIGHT JOIN result sets.
FULL OUTER JOIN using SUB QUERY & UNION
/******************************************** FULL OUTER JOIN using SUB QUERY
& UNION */
select
N.UserId NUserId, N.DocContent NDocContent, U.UserId UUserId, U.Name UName
from UsersTbl U
FULL OUTER join NotesTbl N on U.UserId = N.UserId
GO
-- using our "LEFT JOIN" query without the filter on first result set
select
N.UserId NUserId,N.DocContent NDocContent,(select U.UserId from UsersTbl
U where U.UserId = N.UserId) UUserId,(select Name from UsersTbl U where U.UserId
= N.UserId) UName
from NotesTbl N
UNION ALL
select NULL,NULL,UserId,Name
from UsersTbl
where UserId not in (select UserId from NotesTbl)
GO
Playing with NULL
The internet is full of questions about NULL.
What is so confusing about NULL that makes it such a great subject for debates?
NULL is not equal to NULL.
That makes it a great playground for us.
ISNULL using COALESCE
Let's start with a simple example. The function ISNULL replaces the first parameter with the specified
replacement value if it is NULL. The function COALESCE returns the value of the first expression in a list
that does not evaluate to NULL.
/******************************************** ISNULL using COALESCE */
declare @QQ01 as nvarchar(10) = 'd'
select ISNULL(@QQ01,'Yes it is NULL')
SELECT COALESCE(@QQ01,'Yes it is NULL')
GO
COALESCE using ISNULL
/******************************************** COALESCE using ISNULL */
declare @QQ01 as nvarchar(10) = NULL
declare @QQ02 as nvarchar(10) = 'B'
declare @QQ03 as nvarchar(10) = NULL
declare @QQ04 as nvarchar(10) = 'D'
select COALESCE(@QQ01,@QQ02,@QQ03,@QQ04)
select ISNULL(@QQ01,ISNULL(@QQ02,ISNULL(@QQ03,@QQ04)))
GO
Playing with Cursor and Loops
There are a lot of questions about the difference between a "Cursor" and a "While Loop". It is a
fundamental mistake to compare them at all. It's like comparing a car and a boat: we use a car to move
on land, and we use a boat to travel at sea. I would not recommend anyone try the opposite. That
could be another playground for us here.
Cursor Using While Loop (without using cursor)
use tempdb
GO
/******************************************** DDL+DML */
CREATE TABLE CursorAndLoopTbl(
ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
Txt NVARCHAR(100)
)
GO
INSERT INTO CursorAndLoopTbl (Txt)
SELECT top 10000 LEFT(REPLICATE(CAST(NEWID() AS VARCHAR(36)),30),100)
FROM sys.all_columns
CROSS JOIN sys.all_objects
GO
select * from CursorAndLoopTbl
GO
/******************************************** Cursor Using While Loop */
-- Using Cursor
DECLARE MyCursor CURSOR FAST_FORWARD
FOR SELECT Txt FROM CursorAndLoopTbl
GO
declare @MyVar as NVARCHAR(100)
OPEN MyCursor
FETCH NEXT
FROM MyCursor
INTO @MyVar
-- we need a "While Loop" in order to loop through all the table records
WHILE @@FETCH_STATUS = 0 BEGIN
PRINT @MyVar
FETCH NEXT
FROM MyCursor
INTO @MyVar
END
CLOSE MyCursor
GO
DEALLOCATE MyCursor
GO
-- Using Loop
DECLARE @Counter INT = 1
DECLARE @RowNum INT = (SELECT COUNT(*) FROM CursorAndLoopTbl)
DECLARE @MyVar as NVARCHAR(100) = (select Txt from CursorAndLoopTbl where ID =
1)
WHILE @Counter <= @RowNum BEGIN
PRINT @MyVar
SET @Counter += 1
SELECT @MyVar = (select Txt from CursorAndLoopTbl where ID = @Counter)
END
GO
DROP TABLE CursorAndLoopTbl
GO
References & Resources
* The idea for this post came from the question here (Hebrew):
http://social.technet.microsoft.com/Forums/he-IL/03fa90e1-1a2a-4756-8ca3-44ac3b015cf1/?forum=sqlhe
There are dozens of similar questions online :-)
* SQL basic JOIN tutorial
http://technet.microsoft.com/en-us/library/ms191517(v=sql.105).aspx
http://www.w3schools.com/sql/sql_join.asp
* SQL basic UNION tutorial
http://technet.microsoft.com/en-us/library/ms180026.aspx
http://www.w3schools.com/sql/sql_union.asp
* Cursor
http://technet.microsoft.com/en-us/library/ms181441.aspx
* WHILE
http://technet.microsoft.com/en-us/library/ms178642.aspx
* I highly recommend to check this link if you think about comparing "Cursor" and "While Loop"
http://ariely.info/Blog/tabid/83/EntryId/132/SQL-Server-cursor-loop.aspx
CHAPTER 11:
CLR
RegEx Class
Slightly boring class for doing some regex...
I've stored my notes in my pWord program, which I also use for tracking passwords; it will always be
open source: www.sourceforge.net/projects/pword .
using System;
using System.Text.RegularExpressions;
using Microsoft.SqlServer.Server;
using System.Data;
using System.Data.SqlTypes;
using System.Text;
namespace RegEx {
public class RX
{
[SqlFunction]
public static bool IsMatch(String input, String pattern)
{
    // Regex.Matches() finds all matches; any match means the input matches the pattern
    MatchCollection mc1 = Regex.Matches(input, pattern);
    if (mc1.Count > 0)
        return true;
    else
        return false;
}
[SqlFunction]
public static int GetMatchCount(String input, String pattern)
{
    // Count all matches of the pattern in the input
    MatchCollection mc1 = Regex.Matches(input, pattern);
    return mc1.Count;
}
[SqlFunction]
public static String GetMatch(String input, String pattern)
{
    MatchCollection mc1 = Regex.Matches(input, pattern, RegexOptions.Compiled);
    String output = "";
    if (mc1.Count > 0)
    {
        foreach (Match m1 in mc1)
        {
            output = m1.ToString();
            // only return the first occurrence ;)
            break;
        }
        return output;
    }
    else
        return "";
}
[SqlFunction]
public static String GetAllMatches(String input, String pattern)
{
    MatchCollection mc1 = Regex.Matches(input, pattern, RegexOptions.Compiled);
    StringBuilder output = new StringBuilder();
    output.Append("");
    if (mc1.Count > 0)
    {
        foreach (Match m1 in mc1)
        {
            // append every occurrence
            output.Append(m1.ToString());
        }
        return output.ToString();
    }
    else
        return "";
}
}
}
Check out the SQL Script for adding UDFs into SQL Server 2012.
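That registration script is not reproduced here, but a minimal sketch of what it typically looks like follows; the assembly path, the wrapper function names and the parameter lengths are examples only, and the EXTERNAL NAME parts assume the namespace and class shown above (RegEx.RX).
-- Sketch: enable CLR, register the assembly and expose two of the functions
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;
GO
CREATE ASSEMBLY RegEx
FROM 'C:\CLR\RegEx.dll' -- example path
WITH PERMISSION_SET = SAFE;
GO
CREATE FUNCTION dbo.RegExIsMatch (@input NVARCHAR(4000), @pattern NVARCHAR(4000))
RETURNS BIT
AS EXTERNAL NAME RegEx.[RegEx.RX].IsMatch;
GO
CREATE FUNCTION dbo.RegExGetMatch (@input NVARCHAR(4000), @pattern NVARCHAR(4000))
RETURNS NVARCHAR(4000)
AS EXTERNAL NAME RegEx.[RegEx.RX].GetMatch;
GO
SELECT dbo.RegExIsMatch(N'TechNet Wiki', N'W\w+'); -- returns 1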
SQL Server Resource Re-Balancing in Failover Cluster
The poster asked how to automatically adjust SQL Server's max server memory setting following a
cluster fail-over – see here . I provided the following script with suggestions for how it could be
tailored for their environment.
USE [master]
GO
/****** Object: StoredProcedure [dbo].[usp_Rebalance_RAM_in_Cluster] Script Date: 05/13/2013 16:12:36
******/
IF EXISTS (SELECT * FROM sys.objects WHERE object_id =
OBJECT_ID(N'[dbo].[usp_Rebalance_RAM_in_Cluster]') AND type in (N'P', N'PC'))
DROP PROCEDURE [dbo].[usp_Rebalance_RAM_in_Cluster]
GO
USE [master]
GO
/****** Object: StoredProcedure [dbo].[usp_Rebalance_RAM_in_Cluster] Script Date: 05/13/2013 16:12:36 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [dbo].[usp_Rebalance_RAM_in_Cluster]
AS
--WAITFOR DELAY '00:00:15'
DECLARE @Command VARCHAR(2000)
DECLARE @RAM INT
DECLARE @RAM_event NVARCHAR(10)
DECLARE @Link VARCHAR(50)
DECLARE @RemSQL VARCHAR(500)
/*
AUTHOR:  ANDREW BAINBRIDGE - MS SQL DBA
DATE:    13/05/2013
VERSION: 2.0
If there is a cluster failover that results in both SQL Server instances running on the same node, this
script will automatically rebalance the amount of RAM allocated to each instance. This is to prevent the
combined RAM allocated to SQL Server overwhelming the node.
If the D: drive and the H: drive are visible to the same host, this means that both instances are running
on the same node. In this event, the amount of RAM allocated to each of the SQL Servers will be 90% of
half the total amount of RAM in the server, e.g. (384GB / 2) * 0.9.
If only the D: drive or H: drive is visible, then 90% of the total amount of RAM available on the server is
allocated to the SQL Server instance.
This stored procedure will also set the max server memory of the other SQL Server instance in the
cluster. As this needs to be run across the linked server, and the sp_procoption startup procedure is
owned by SA (therefore can't use windows authentication), the stored procedure will be run on SQL
Server Agent startup, via a job.
*/
SET NOCOUNT ON;
BEGIN
IF (SELECT @@SERVERNAME) = 'MYSERVER\INSTANCE'
SET @Link = 'LINKED_SERVER_TO_OTHER_NODE'
ELSE
SET @Link = 'LINKED_SERVER_TO_MYSERVER\INSTANCE'
SET @Command = 'USE [master];
EXEC sp_configure ''show advanced options'', 1;
RECONFIGURE WITH OVERRIDE;
EXEC sp_configure ''max server memory (MB)'', $;
RECONFIGURE WITH OVERRIDE;'
IF OBJECT_ID('tempdb..#fd') IS NOT NULL
DROP TABLE #fd
CREATE TABLE #fd(drive CHAR(2), MBfree INT)
INSERT INTO #fd EXEC xp_fixeddrives
IF (SELECT COUNT(drive) FROM #fd WHERE drive IN ('D', 'H')) > 1
BEGIN
SET @RAM = (SELECT CONVERT(INT, ((physical_memory_in_bytes / 1024 /
1024) / 2) * 0.9) AS RAM_in_MB
FROM master.sys.dm_os_sys_info)
SET @Command = REPLACE(@Command, '$', @RAM)
SET @RAM_event = CONVERT(NVARCHAR(10), @RAM)
RAISERROR('MAX_SERVER_MEMORY set to %s', 0, 1, @RAM_event)
WITH NOWAIT, LOG
EXEC (@Command)
SET @RemSQL = 'EXEC (''' + REPLACE(@Command, '''', '''''') + ''') AT ' +
@Link
EXEC (@RemSQL)
END
ELSE
BEGIN
SET @RAM = (SELECT CONVERT(INT, ((physical_memory_in_bytes / 1024 /
1024)) * 0.9) AS RAM_in_MB
FROM master.sys.dm_os_sys_info)
SET @Command = REPLACE(@Command, '$', @RAM)
SET @RAM_event = CONVERT(NVARCHAR(10), @RAM)
RAISERROR('MAX_SERVER_MEMORY set to %s', 0, 1, @RAM_event)
WITH NOWAIT, LOG
EXEC(@Command)
SET @RemSQL = 'EXEC (''' + REPLACE(@Command, '''', '''''') + ''') AT ' +
@Link
EXEC (@RemSQL)
END
END
GO
The script is executed via a SQL Server Agent job that is configured to run when the Agent service starts
up.
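As a rough illustration (the job and schedule names below are placeholders, not part of the original article), such a job can be created with the msdb job procedures, using a schedule of type 64, which fires whenever the Agent service starts:

-- Sketch: run usp_Rebalance_RAM_in_Cluster every time SQL Server Agent starts.
USE msdb;
GO
EXEC dbo.sp_add_job @job_name = N'Rebalance RAM on Agent startup';
EXEC dbo.sp_add_jobstep
    @job_name      = N'Rebalance RAM on Agent startup',
    @step_name     = N'Run usp_Rebalance_RAM_in_Cluster',
    @subsystem     = N'TSQL',
    @database_name = N'master',
    @command       = N'EXEC dbo.usp_Rebalance_RAM_in_Cluster;';
EXEC dbo.sp_add_schedule
    @schedule_name = N'On Agent startup',
    @freq_type     = 64;   -- 64 = start whenever the SQL Server Agent service starts
EXEC dbo.sp_attach_schedule
    @job_name      = N'Rebalance RAM on Agent startup',
    @schedule_name = N'On Agent startup';
EXEC dbo.sp_add_jobserver @job_name = N'Rebalance RAM on Agent startup';
GO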
SQL Server: Create Random String Using CLR
Introduction
This article is a continuation of a similar article which shows several T-SQL solutions for creating a
random string. This article presents simple C# code that obtains the same results in a much more
efficient and faster manner. This is very useful for maintenance tasks such as testing (populating large
tables with random values), generating random passwords and so on...
using System;
using System.Collections;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Text;
using System.Linq;
/******************************
* Version("1.1.0.0")
* FileVersion("1.1.0.0")
* WrittenBy("Ronen Ariely")
******************************/
// AssemblyVersion attribute
using System.Reflection;
[assembly: AssemblyVersion("1.1.0.0")]
[assembly: AssemblyFileVersion("1.1.0.0")]
[assembly: AssemblyDescription("Creating Random string using CLR. Written by
Ronen Ariely")]
/// <summary>
/// How to compile:
/// 1. Open a CMD shell
/// 2. Move to the .NET Framework folder:
///    CD "C:\Windows\Microsoft.NET\Framework\v4.0.30319\"
/// 3. Compile using csc.exe:
///    csc.exe /target:library /out:"S:\Fn_RandomStringCLR_1.1.0.0.dll" "S:\Fn_RandomStringCLR_1.1.0.0.cs"
///
/// * LINQ is not supported by .NET 2.0 by default,
///   therefore this code targets .NET 4.
/// </summary>
public partial class UserDefinedFunctions
{
private static readonly Random _RandomSize = new Random();
private static readonly Random _random = new Random();
private static readonly int[] _UnicodeCharactersList =
    Enumerable.Range(48, 10)              // Numbers            48 - 57
    .Concat(Enumerable.Range(65, 26))     // English uppercase  65 - 90
    .Concat(Enumerable.Range(97, 26))     // English lowercase  97 - 122
    .Concat(Enumerable.Range(1488, 27))   // Hebrew             1488 - 1514
    .ToArray();
/// <summary></summary>
/// <param name="sMaxSize"></param>
/// <param name="IsFixed"></param>
[return: SqlFacet(MaxSize = -1)]
public static SqlString Fn_RandomStringCLR(
int sMaxSize,
int IsFixed
)
{
if (IsFixed == 0){
sMaxSize = _RandomSize.Next(1, sMaxSize);
}
StringBuilder builder = new StringBuilder();
char ch;
for (int i = 0; i < sMaxSize; i++)
{
ch = Convert.ToChar(
_UnicodeCharactersList[_random.Next(1,
_UnicodeCharactersList.Length)]
);
builder.Append(ch);
}
return builder.ToString();
}
};
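To use the function from T-SQL, the compiled DLL still has to be registered in the database. A minimal sketch, assuming the DLL was compiled to S:\Fn_RandomStringCLR_1.1.0.0.dll as shown in the comments above, and that CLR has already been enabled as in the earlier RegEx example (the assembly name RandomStringCLR is a placeholder):

-- Sketch: register the assembly and wrap the CLR method in a T-SQL function.
CREATE ASSEMBLY RandomStringCLR
FROM 'S:\Fn_RandomStringCLR_1.1.0.0.dll'
WITH PERMISSION_SET = SAFE;
GO
CREATE FUNCTION dbo.Fn_RandomStringCLR (@sMaxSize INT, @IsFixed INT)
RETURNS NVARCHAR(MAX)
AS EXTERNAL NAME RandomStringCLR.UserDefinedFunctions.Fn_RandomStringCLR;
GO
-- A string of random length up to 20 characters, and a fixed-length string of exactly 20 characters
SELECT dbo.Fn_RandomStringCLR(20, 0) AS RandomLength,
       dbo.Fn_RandomStringCLR(20, 1) AS FixedLength;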
Resources
This article is based on Ronen Ariely 's blog at:
http://ariely.info/Blog/tabid/83/EntryId/134/SQL-Random-String-using-CLR.aspx
* In addition to the code here, the blog includes an older version for .NET 2.0 (without using LINQ)
Random String - several T-SQL solutions
http://social.technet.microsoft.com/wiki/contents/articles/21196.t-sql-random-string.aspx
Deploying CLR Database Objects
http://technet.microsoft.com/en-us/library/ms345099.aspx
CHAPTER 12:
Meta-Data
How to Compare Two Tables Definition / Metadata in Different
Databases
This article provides an example of a T-SQL script to compare two tables' definition / metadata in
different databases.
The T-SQL script in this article can be used from SQL Server 2012 onwards, because it uses the function
sys.dm_exec_describe_first_result_set, which was introduced in SQL Server 2012.
Create sample databases:
IF EXISTS (SELECT name FROM master.sys.databases WHERE name = N'SQLServer2012')
BEGIN
DROP DATABASE SQLServer2012
END
CREATE DATABASE SQLServer2012
IF EXISTS (SELECT name FROM master.sys.databases WHERE name = N'SQLServer2014')
BEGIN
DROP DATABASE SQLServer2014
END
CREATE DATABASE SQLServer2014
Create sample tables in above created databases:
USE SQLServer2012
GO
CREATE Table Test1 (Id INT NOT NULL Primary Key,Name VARCHAR(100))
USE SQLServer2014
GO
CREATE Table Test2 (Id INT, Name VARCHAR(100), Details XML)
Below T-SQL Script can be used to compare two tables definition / metadata in different databases
USE SQLServer2012
GO
SELECT A.name DB1_ColumnName,
B.name DB2_ColumnName,
A.is_nullable DB1_is_nullable,
B.is_nullable DB2_is_nullable,
A.system_type_name DB1_Datatype,
B.system_type_name DB2_Datatype,
A.collation_name DB1_collation,
B.collation_name DB2_collation,
A.is_identity_column DB1_is_identity,
B.is_identity_column DB2_is_identity,
A.is_updateable DB1_is_updateable,
B.is_updateable DB2_is_updateable,
A.is_part_of_unique_key DB1_part_of_unique_key,
B.is_part_of_unique_key DB2_part_of_unique_key,
A.is_computed_column DB1_is_computed_column,
B.is_computed_column DB2_is_computed_column,
A.is_xml_document DB1_is_xml_document,
B.is_xml_document DB2_is_xml_document
FROM SQLServer2012.sys.dm_exec_describe_first_result_set (N'SELECT * FROM
Test1', NULL, 0) A
FULL OUTER JOIN SQLServer2014.sys.dm_exec_describe_first_result_set (N'SELECT
* FROM Test2', NULL, 0) B
ON A.name = B.name
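If only the differences are of interest, the same pair of DMV calls can be filtered; one possible variation (still run from the SQLServer2012 database context set above):

-- Variation of the script above: list only the columns that are missing on one side
-- or that differ in data type or nullability.
SELECT COALESCE(A.name, B.name) AS ColumnName,
       A.system_type_name AS DB1_Datatype,
       B.system_type_name AS DB2_Datatype,
       A.is_nullable AS DB1_is_nullable,
       B.is_nullable AS DB2_is_nullable
FROM SQLServer2012.sys.dm_exec_describe_first_result_set (N'SELECT * FROM Test1', NULL, 0) A
FULL OUTER JOIN SQLServer2014.sys.dm_exec_describe_first_result_set (N'SELECT * FROM Test2', NULL, 0) B
    ON A.name = B.name
WHERE A.name IS NULL
   OR B.name IS NULL
   OR A.system_type_name <> B.system_type_name
   OR A.is_nullable <> B.is_nullable;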
T-SQL: Script to Find the Names of Stored Procedures that Use Dynamic
SQL
This script was developed to answer the question in the thread "I need query find all the SPs that used
dynamic SQL".
We can execute dynamic SQL using sp_executesql or just with Exec / Execute.
To find the names of the stored procedures that may have used dynamic SQL, this script can be used:
SELECT Schema_name(Schema_id)+'.'+Object_Name(M.Object_id)
StoredProceduresWithDynamicSQL
FROM sys.sql_modules M
JOIN sys.objects O ON M.object_id = O.object_id
WHERE definition LIKE '%CREATE PROC%'
AND (definition LIKE '%SP_ExecuteSQL%' OR definition LIKE '%EXEC%')
But Exec / Execute can be used inside a stored procedure either to call another stored procedure or to
execute dynamic SQL. So, to eliminate the stored procedures that merely reference another stored
procedure and to find only those that use Exec / Execute to run dynamic SQL, the following script can be
used:
SELECT Schema_name(Schema_id)+'.'+Object_Name(M.Object_id)
StoredProceduresWithDynamicSQL
FROM sys.sql_modules M
JOIN sys.objects O ON M.object_id = O.object_id
WHERE definition LIKE '%CREATE PROC%'
AND (definition LIKE '%SP_ExecuteSQL%' OR definition LIKE '%EXEC%')
EXCEPT
SELECT StoredProcedure FROM (
SELECT Schema_name(Schema_id)+'.'+Object_Name(M.Object_id) StoredProcedure
FROM sys.sql_modules M
JOIN sys.objects O ON M.object_id = O.object_id
WHERE definition LIKE '%CREATE PROC%'
AND (definition LIKE '%SP_ExecuteSQL%' OR definition LIKE '%EXEC%')) tmp
CROSS APPLY sys.dm_sql_referenced_entities (StoredProcedure, 'OBJECT');
The above script will not work in the following scenarios: if Exec / Execute is used inside a stored
procedure for both purposes, i.e. to call another stored procedure and to execute dynamic SQL; or if
sp_executesql or Exec / Execute appears only in commented-out code inside a stored procedure. Still,
the above scripts are useful because there is no other direct way to find the names of the stored
procedures that use dynamic SQL.
This script also won't work for encrypted procedures.
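For completeness, a quick way to list the encrypted procedures (whose definitions cannot be searched this way) is a simple check like the following; it relies on the fact that sys.sql_modules.definition is NULL for encrypted modules:

-- List procedures created WITH ENCRYPTION; their definition is hidden from sys.sql_modules.
SELECT Schema_name(O.schema_id) + '.' + O.name AS EncryptedProcedure
FROM sys.objects O
JOIN sys.sql_modules M ON M.object_id = O.object_id
WHERE O.type = 'P'
  AND M.definition IS NULL;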
T-SQL Script to Get Detailed Information about Index Settings
This article is about a script which I wrote to get detailed information about index settings. The script in
this article does not show any information about missing indexes or index usage details. The script will
only show the information about settings made on an index using CREATE /ALTER INDEX statements.
Example
Just for a demonstration of the script, we will make use of the table Person.Address from the
AdventureWorks database.
Using the system stored procedures SP_HELP and SP_HELPINDEX, we can get only the index_name,
index_description and index_keys details.
USE AdventureWorks2012
GO
sp_help 'Person.Address'
GO
sp_helpindex 'Person.Address'
GO
Just for testing purpose I am going to create a NONCLUSTERED filtered index with included columns and
then alter the fill factor of the created index.
USE AdventureWorks2012
GO
CREATE NONCLUSTERED INDEX IX_Address_PostalCode
ON Person.Address (PostalCode DESC )
INCLUDE (AddressLine1, AddressLine2, City, StateProvinceID)
WHERE City = 'Seattle'
GO
ALTER INDEX IX_Address_PostalCode ON Person.Address
REBUILD WITH (FILLFACTOR = 80);
The below code block will get us the information about settings made on an index using CREATE /ALTER
INDEX statements:
USE AdventureWorks2012
GO
SELECT
CASE WHEN I.is_unique = 1 THEN ' UNIQUE ' ELSE '' END [Is_unique],
I.type_desc+' INDEX' IndexType,
I.name IndexName,
Schema_name(T.Schema_id)+'.'+T.name ObjectName,
KeyColumns,
IncludedColumns,
I.Filter_definition,
CASE WHEN I.is_padded = 1 THEN ' ON ' ELSE ' OFF ' END [PAD_INDEX],
I.Fill_factor,
' OFF ' [SORT_IN_TEMPDB] , -- default value
CASE WHEN I.ignore_dup_key = 1 THEN ' ON ' ELSE ' OFF ' END [Ignore_dup_key],
CASE WHEN ST.no_recompute = 0 THEN ' OFF ' ELSE ' ON ' END [Stats_Recompute],
' OFF ' [DROP_EXISTING] ,-- default value
' OFF ' [ONLINE] , -- default value
CASE WHEN I.allow_row_locks = 1 THEN ' ON ' ELSE ' OFF
' END [Allow_row_locks],
CASE WHEN I.allow_page_locks = 1 THEN ' ON ' ELSE ' OFF
' END [Allow_page_locks] ,
CASE WHEN ST.auto_created = 0 THEN ' Not Automatically Created ' ELSE '
Automatically Created ' END[Statistics_Creation],
CASE WHEN I.is_primary_key = 1 THEN 'Yes' ELSE 'NO' END 'Part of PrimaryKey',
CASE WHEN I.is_unique_constraint = 1 THEN 'Yes' ELSE 'NO' END 'Part of
UniqueKey',
CASE WHEN I.is_disabled = 1 THEN 'Disabled' ELSE 'Enabled' END IndexStatus,
CASE WHEN I.Is_hypothetical = 1 THEN 'Yes' ELSE 'NO' END Is_hypothetical,
CASE WHEN I.has_filter = 1 THEN 'Yes' ELSE 'NO' END 'Filtered Index',
DS.name [FilegroupName]
FROM sys.indexes I
JOIN sys.tables T ON T.Object_id = I.Object_id
JOIN sys.sysindexes SI ON I.Object_id = SI.id AND I.index_id = SI.indid
JOIN (SELECT * FROM (
SELECT IC2.object_id , IC2.index_id ,
STUFF((SELECT ' , ' +
C.name + CASE WHEN MAX(CONVERT(INT,IC1.is_descending_key)) = 1 THEN ' DESC
'ELSE ' ASC ' END
FROM sys.index_columns IC1
JOIN Sys.columns C
ON C.object_id = IC1.object_id
AND C.column_id = IC1.column_id
AND IC1.is_included_column = 0
WHERE IC1.object_id = IC2.object_id
AND IC1.index_id = IC2.index_id
GROUP BY IC1.object_id,C.name,index_id
ORDER BY MAX(IC1.key_ordinal)
FOR XML PATH('')), 1, 2, '') KeyColumns
FROM sys.index_columns IC2
WHERE IC2.Object_id = object_id('Person.Address') --Comment for all tables
GROUP BY IC2.object_id ,IC2.index_id) tmp3 )tmp4
ON I.object_id = tmp4.object_id AND I.Index_id = tmp4.index_id
JOIN sys.stats ST ON ST.object_id = I.object_id AND ST.stats_id = I.index_id
JOIN sys.data_spaces DS ON I.data_space_id=DS.data_space_id
--JOIN sys.filegroups FG ON I.data_space_id=FG.data_space_id
LEFT JOIN (SELECT * FROM (
SELECT IC2.object_id , IC2.index_id ,
STUFF((SELECT ' , ' + C.name
FROM sys.index_columns IC1
JOIN Sys.columns C
ON C.object_id = IC1.object_id
AND C.column_id = IC1.column_id
AND IC1.is_included_column = 1
WHERE IC1.object_id = IC2.object_id
AND IC1.index_id = IC2.index_id
GROUP BY IC1.object_id,C.name,index_id
FOR XML PATH('')), 1, 2, '') IncludedColumns
FROM sys.index_columns IC2
WHERE IC2.Object_id = object_id('Person.Address') --Comment for all tables
GROUP BY IC2.object_id ,IC2.index_id) tmp1
WHERE IncludedColumns IS NOT NULL ) tmp2
ON tmp2.object_id = I.object_id AND tmp2.index_id = I.index_id
WHERE I.Object_id = object_id('Person.Address') --Comment for all tables
Related Reference links:
http://technet.microsoft.com/en-us/library/ms188783.aspx
http://technet.microsoft.com/en-us/library/ms173760.aspx
http://technet.microsoft.com/en-us/library/ms190283.aspx
http://technet.microsoft.com/en-us/library/ms175105.aspx
http://www.microsoft.com/en-in/download/details.aspx?id=722
How to Check when Index was Last Rebuilt
I have executed the script (given at the link below) to rebuild all table indexes on SQL Server, but how do I
know whether the indexes were actually rebuilt or not?
SQL Script for rebuilding all the tables’ indexes
Now the problem is that SQL Server does not store information about when indexes were rebuilt;
however, it does store when statistics were last updated. There is a system catalog view, sys.stats, which
can be queried for this. Whenever an index rebuild operation happens on a database, sys.stats is updated
with the latest statistics update. So for a given index you can check when it was rebuilt by checking the
last stats updated date. However, what if we only updated statistics, but haven't rebuilt the indexes?
Below is the query you can use to get the last stats updated date; this query works on the
AdventureWorks database:
USE AdventureWorks;
GO
SELECT name AS Stats,
STATS_DATE(object_id, stats_id) AS LastStatsUpdate
FROM sys.stats
WHERE object_id = OBJECT_ID('Sales.SalesOrderDetail')
and left(name,4)!='_WA_';
GO
You can find the detailed post on this topic at http://insqlserver.com/Blog/how-check-when-indexrebuild-update-statistics-last-happened-sql-server
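As a small extension of the query above (not part of the original post), the auto-created _WA_ statistics can be excluded more directly by joining sys.stats to sys.indexes, since index statistics carry the same name as their index:

-- Sketch: last statistics update date for every index on the table.
USE AdventureWorks;
GO
SELECT I.name AS IndexName,
       STATS_DATE(S.object_id, S.stats_id) AS LastStatsUpdate
FROM sys.indexes I
JOIN sys.stats S
    ON S.object_id = I.object_id
   AND S.name = I.name                          -- only statistics that belong to an index
WHERE I.object_id = OBJECT_ID('Sales.SalesOrderDetail');
GO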
How to Generate Index Creation Scripts for all Tables in a Database
using T-SQL
The need often arises to create or recreate the indexes for all tables in a database, especially in
development and testing scenarios. This article presents a script to generate Index Creation Scripts for
all tables in a database using Transact-SQL (T-SQL).
The code block below will generate Index Creation Scripts for all tables in a database:
SELECT ' CREATE ' +
CASE WHEN I.is_unique = 1 THEN ' UNIQUE ' ELSE '' END +
I.type_desc COLLATE DATABASE_DEFAULT +' INDEX ' +
I.name + ' ON ' +
Schema_name(T.Schema_id)+'.'+T.name + ' ( ' +
KeyColumns + ' ) ' +
ISNULL(' INCLUDE ('+IncludedColumns+' ) ','') +
ISNULL(' WHERE '+I.Filter_definition,'') + ' WITH ( ' +
CASE WHEN I.is_padded = 1 THEN ' PAD_INDEX = ON ' ELSE ' PAD_INDEX = OFF
' END + ',' +
'FILLFACTOR = '+CONVERT(CHAR(5),CASE WHEN I.Fill_factor =
0 THEN 100 ELSE I.Fill_factor END) + ',' +
-- default value
'SORT_IN_TEMPDB = OFF ' + ',' +
CASE WHEN I.ignore_dup_key = 1 THEN ' IGNORE_DUP_KEY = ON ' ELSE '
IGNORE_DUP_KEY = OFF ' END + ',' +
CASE WHEN ST.no_recompute = 0 THEN ' STATISTICS_NORECOMPUTE = OFF ' ELSE '
STATISTICS_NORECOMPUTE = ON 'END + ',' +
-- default value
' DROP_EXISTING = ON ' + ',' +
-- default value
' ONLINE = OFF ' + ',' +
CASE WHEN I.allow_row_locks = 1 THEN ' ALLOW_ROW_LOCKS = ON ' ELSE '
ALLOW_ROW_LOCKS = OFF ' END + ',' +
CASE WHEN I.allow_page_locks = 1 THEN ' ALLOW_PAGE_LOCKS = ON ' ELSE '
ALLOW_PAGE_LOCKS = OFF ' END + ' ) ON [' +
DS.name + ' ] ' [CreateIndexScript]
FROM sys.indexes I
JOIN sys.tables T ON T.Object_id = I.Object_id
JOIN sys.sysindexes SI ON I.Object_id = SI.id AND I.index_id = SI.indid
JOIN (SELECT * FROM (
SELECT IC2.object_id , IC2.index_id ,
STUFF((SELECT ' , ' +
C.name + CASE WHEN MAX(CONVERT(INT,IC1.is_descending_key)) = 1 THEN ' DESC
'ELSE ' ASC ' END
FROM sys.index_columns IC1
JOIN Sys.columns C
ON C.object_id = IC1.object_id
AND C.column_id = IC1.column_id
AND IC1.is_included_column = 0
WHERE IC1.object_id = IC2.object_id
AND IC1.index_id = IC2.index_id
GROUP BY IC1.object_id,C.name,index_id
ORDER BY MAX(IC1.key_ordinal)
FOR XML PATH('')), 1, 2, '') KeyColumns
FROM sys.index_columns IC2
--WHERE IC2.Object_id = object_id('Person.Address') --Comment for all tables
GROUP BY IC2.object_id ,IC2.index_id) tmp3 )tmp4
ON I.object_id = tmp4.object_id AND I.Index_id = tmp4.index_id
JOIN sys.stats ST ON ST.object_id = I.object_id AND ST.stats_id =
I.index_id
JOIN sys.data_spaces DS ON I.data_space_id=DS.data_space_id
JOIN sys.filegroups FG ON I.data_space_id=FG.data_space_id
LEFT JOIN (SELECT * FROM (
SELECT IC2.object_id , IC2.index_id ,
STUFF((SELECT ' , ' + C.name
FROM sys.index_columns IC1
JOIN Sys.columns C
ON C.object_id = IC1.object_id
AND C.column_id = IC1.column_id
AND IC1.is_included_column = 1
WHERE IC1.object_id = IC2.object_id
AND IC1.index_id = IC2.index_id
GROUP BY IC1.object_id,C.name,index_id
FOR XML PATH('')), 1, 2, '') IncludedColumns
FROM sys.index_columns IC2
--WHERE IC2.Object_id = object_id('Person.Address') --Comment for all tables
GROUP BY IC2.object_id ,IC2.index_id) tmp1
WHERE IncludedColumns IS NOT NULL ) tmp2
ON tmp2.object_id = I.object_id AND tmp2.index_id = I.index_id
WHERE I.is_primary_key = 0 AND I.is_unique_constraint = 0
--AND I.Object_id = object_id('Person.Address') --Comment for all tables
--AND I.name = 'IX_Address_PostalCode' --comment for all indexes
T-SQL: Fast Code for Relationship within the Database
Sometimes one needs to find out all the relationships within a database. For example, if you are a
contractor and you go to a new company, even for only one day, just to make a new report requested
by the boss or some similar task, you probably need fast code that you can keep in your personal code
folder for a quick copy and paste:
;with cte as (
    select constraint_object_id, constraint_column_id,
           c.parent_object_id as parentobjectid, parent_column_id,
           referenced_object_id, referenced_column_id, name as parentname
    from sys.foreign_key_columns c
    inner join sys.tables on c.parent_object_id = object_id)
, cte2 as (
    select constraint_object_id, constraint_column_id, parentobjectid,
           referenced_object_id, parent_column_id, parentname, referenced_column_id,
           name as referencedname
    from cte ct
    inner join sys.tables on ct.referenced_object_id = object_id)
, cte3 as (
    select constraint_object_id, constraint_column_id, parentobjectid, parent_column_id,
           referenced_object_id, referenced_column_id, parentname, referencedname,
           name as parentcolumname
    from cte2
    inner join sys.all_columns cl on parentobjectid = cl.object_id
    where cl.column_id = parent_column_id)
select constraint_object_id, constraint_column_id, parentobjectid, parent_column_id,
       referenced_object_id, referenced_column_id,
       parentname as ParentTable, referencedname as ReferencedTable,
       parentcolumname as parentsColumn, name as ReferencedColumn
from cte3
inner join sys.all_columns cl on referenced_object_id = cl.object_id
where cl.column_id = referenced_column_id
order by ParentTable
Another purpose of this code is that, after saving the results in a table, they can be compared later.
Suppose you save the result in a table called, say, LastRelationship dated February 2013, and months
later you are called back for another contract in the same company because "maybe someone changed
something and now the software doesn't work or the statistics are wrong". You can run the same query,
building a new LastRelationship table dated October 2013, and after comparing the two tables you can
quickly find out whether someone touched the relationships (believe me, this can happen pretty
frequently).
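For instance, one possible way to take and compare such snapshots (the table names are just examples, and the snapshot query below is a simplified version of the relationship query above):

-- Sketch: save a snapshot of the current foreign-key relationships into a dated table.
SELECT OBJECT_NAME(parent_object_id)                        AS ParentTable,
       COL_NAME(parent_object_id, parent_column_id)         AS ParentColumn,
       OBJECT_NAME(referenced_object_id)                    AS ReferencedTable,
       COL_NAME(referenced_object_id, referenced_column_id) AS ReferencedColumn
INTO dbo.Relationships_Feb2013
FROM sys.foreign_key_columns;

-- Months later, take a second snapshot into dbo.Relationships_Oct2013 the same way, then:
SELECT * FROM dbo.Relationships_Feb2013
EXCEPT
SELECT * FROM dbo.Relationships_Oct2013;   -- relationships removed or changed since February

SELECT * FROM dbo.Relationships_Oct2013
EXCEPT
SELECT * FROM dbo.Relationships_Feb2013;   -- relationships added or changed since February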
So, I hope this code can help everyone to be faster in case of job contract issues.
How to Check the Syntax of Dynamic SQL Before Execution
This article is about the system function sys.dm_exec_describe_first_result_set, which can be used to
check the syntax of dynamic SQL before execution.
This system function was introduced in SQL Server 2012.
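Before wrapping it in a stored procedure, the behaviour can be seen by querying the function directly; a quick illustration (the statement passed in is deliberately broken):

-- A deliberately broken statement: the extra comma before FROM is a syntax error.
SELECT error_number, error_message
FROM sys.dm_exec_describe_first_result_set(N'SELECT *, FROM sys.objects', NULL, 0)
WHERE error_message IS NOT NULL;
-- For a valid statement the function returns one row per column of the first result set
-- and the error columns are NULL.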
Create sample table and insert sample data :
CREATE Table Test (Id INT NOT NULL Primary Key,Name VARCHAR(100))
INSERT Test SELECT 1 , 'Sathya'
GO
Create sample Stored procedure :
CREATE PROC TestProc
AS
BEGIN
DECLARE @SQL NVARCHAR(MAX) = 'SELECT *, FROM Test'
IF EXISTS (
SELECT 1 FROM sys.dm_exec_describe_first_result_set(@SQL, NULL, 0)
WHERE error_message IS NOT NULL
AND error_number IS NOT NULL
AND error_severity IS NOT NULL
AND error_state IS NOT NULL
AND error_type IS NOT NULL
AND error_type_desc IS NOT NULL )
BEGIN
SELECT error_message
FROM sys.dm_exec_describe_first_result_set(@SQL, NULL, 0)
WHERE column_ordinal = 0
END
ELSE
BEGIN
EXEC (@SQL)
END
END
GO
If you examine the dynamic SQL in the above stored procedure, you will notice incorrect syntax in that
query: an extra comma before the FROM clause.
To execute Stored procedure:
EXEC TestProc
GO
After removing the comma before the FROM clause in the @SQL variable, alter the stored procedure:
ALTER PROC TestProc
AS
BEGIN
DECLARE @SQL NVARCHAR(MAX) = 'SELECT * FROM Test'
IF EXISTS(
SELECT 1 FROM sys.dm_exec_describe_first_result_set(@SQL, NULL, 0)
WHERE error_message IS NOT NULL
AND error_number IS NOT NULL
AND error_severity IS NOT NULL
AND error_state IS NOT NULL
AND error_type IS NOT NULL
AND error_type_desc IS NOT NULL )
BEGIN
SELECT error_message
FROM sys.dm_exec_describe_first_result_set(@SQL, NULL, 0)
WHERE column_ordinal = 0
END
ELSE
BEGIN
EXEC (@SQL)
END
END
To execute Stored procedure:
EXEC TestProc
GO
CHAPTER 13:
Bulk-Data
Using Bulk Insert to Import Inconsistent Data Format (Using Pure T-SQL)
Introduction
Some third-party applications produce reports as CSV files where each line is in a different format. These
applications use an export format that parses each record on-the-fly, and each value in the record
separately. The format for each value is set by the value itself.
For example, if there is data in the field, then it will export the data inside a quotation mark, but if there
is no data then the data will be blank and without a quotation mark. Moreover some applications do not
use 'data type' when generating a report. If there is data and the data is numeric, then some
applications might not use any quotation marks, and in the same field on a different record data that is
not numeric will be exported with the quotation marks. We can imagine a single CSV file with a specific
column exported in 6 different formats.
In order to use bulk insert directly we have to make sure that all the data is consistent with one format.
The problem
We need to use bulk insert to import data from a CSV file into the SQL server database using pure T-SQL.
A Bulk insert operation can use only one format file and our metadata must remain consistent with it.
The first step in using bulk insert is to find a set of "bulk insert" rules (like End Of Line, End Of Column,
Collation…) that fit all of the data. As mentioned above, sometimes such a set does not exist.
If you were told in a forum that this can't be done using pure T-SQL, then remember that I always
say "never say never".
* In this article we are going to talk only about a pure T-SQL solution, as there are several solutions (such
as using SSIS, CLR or a third party app) that can do it in different ways; sometimes in a better way.
Our Case Study
Our application exports the data as a CSV Comma Delimited file without a consistent format. In this
example we will deal with a very common situation that fits these rules:
1. Our application uses column types (to keep this article simple we will focus on a string column). So a
numeric column will never use quotation marks, and a string column will use quotation marks on and off
according to these rules.
2. If there is data in the field, then it will export the data inside quotation marks (no matter whether the
data is numeric or not, since the column is a string type).
3. If there is no data, then the field will be blank and without quotation marks.
The original sample data that we use looks like this:

ID    Phone Number   First Name   Last Name
1     9999999        ronen        Ariely
2     8888888        xxx1, xxx2   yyy
2     8888888        xxx1, xxx2
3                                 yyy
4     7777777
      2222222        zzz          kkk
5     1111111        5000.5       5
According to the application export rules above, our CSV file looks like this:
1,9999999,"ronen","ariely"
2,8888888,"xxx1,xxx2",yyy
2,8888888,"xxx1,xxx2",
3,,,"yyy"
4,7777777,,
,2222222,zzz,kkk
5,1111111,"5000.5","5"
* We can see in the last line that our application uses column types, so even when our value is numeric it
will be inside quotation marks. But we have to remember that there are some more complex situations,
like applications that do not use column types; then the last line could look like [5,5000.5,5]. And it can be
more complex still if the culture formats numbers like 5,000.5; then our CSV line might look like
[5,5,000.5,5].
The solution:
* Remember that this is only a workaround for our specific case. For each set of data a slightly different
solution might fit. The idea of how to get to the solution is what is important here.
STEP 1: Identify the import file format
In this step we will run several tests with different format files. Our aim is to identify any potential
problems and to find the best format file, one which fits as many columns as possible from the start.
Finding the problematic columns and the consistent column format
First of all you have to find a record format that fits most of the data, as well as the columns that might
be inconsistent with this format. In order to do that we are going to run several tests, and then we
will implement the conclusions in the next step. We will start with a simple bulk insert and continue with
some more complex formats. Using the error messages and the results, we will identify the potential
problems.
Let's try this in practice
Open Notepad and copy our CSV data into the file.
1,9999999,"ronen","ariely"
2,8888888,"xxx1,xxx2",yyy
2,8888888,"xxx1,xxx2",
3,,,"yyy"
4,7777777,,
,2222222,zzz,kkk
5,1111111,"5000.5","5"
Save the file as "C:\ArielyBulkInsertTesting\Test01.csv"
* make sure that you use ANSI format when you save the file (you can use a different format like
UNICODE but for this example we shall use ANSI).
Open Notepad and copy our XML format data into the file.
* Using a format file can help with more complex formats. I highly recommend always using a format
file.
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <RECORD>
    <FIELD ID="1" xsi:type="CharTerm" TERMINATOR=","/>
    <FIELD ID="2" xsi:type="CharTerm" MAX_LENGTH="7" TERMINATOR=","/>
    <FIELD ID="3" xsi:type="CharTerm" COLLATION="Hebrew_CI_AS" MAX_LENGTH="15" TERMINATOR=","/>
    <FIELD ID="4" xsi:type="CharTerm" COLLATION="Hebrew_CI_AS" MAX_LENGTH="15" TERMINATOR="\r\n"/>
  </RECORD>
  <ROW>
    <COLUMN SOURCE="1" NAME="ID" xsi:type="SQLINT"/>
    <COLUMN SOURCE="2" NAME="PhoneNumber" xsi:type="SQLINT"/>
    <COLUMN SOURCE="3" NAME="FirstName" xsi:type="SQLNVARCHAR"/>
    <COLUMN SOURCE="4" NAME="LastName" xsi:type="SQLNVARCHAR"/>
  </ROW>
</BCPFORMAT>
Save the file as "C:\ArielyBulkInsertTesting\Test01.xml"
Open SSMS and run this DDL to create our table:
CREATE TABLE #test (
ID int
, PhoneNumber int
, FirstName varchar(15)
, LastName varchar(15)
)
GO
Try to use this simple bulk insert query to import our data:
BULK INSERT #test FROM 'C:\ArielyBulkInsertTesting\Test01.csv'
WITH (
FORMATFILE='C:\ArielyBulkInsertTesting\Test01.xml'
, MAXERRORS = 1
, KEEPNULLS
--, DATAFILETYPE = 'native'
--, CODEPAGE='RAW'
--, ROWTERMINATOR='\r\n'
--, FIRSTROW = 3
);
GO
No error… have you got a good feeling? Let's check our data
select * from #test
GO
Results:

ID     PhoneNumber   FirstName    LastName
1      9999999       "ronen"      "ariely"
2      8888888       "xxx1        xxx2,"yyy"
2      8888888       "xxx1        xxx2",
3      NULL          NULL         "yyy"
4      7777777       NULL         NULL
NULL   2222222       "zzz         kkk"
5      1111111       "5000.5"     "5"
Compare our results to the original data... Oops… that's bad…
In our case we can see that the first and second columns have no problem, but the problems start in the
third column and continue into the fourth. First of all we have some quotation marks in the results.
Moreover, in several records the third column was split and part of the data moved into the fourth
column. Actually, since our format file says that the third column ends at the comma, every time we have
a comma as part of the string data the value will be split. That makes sense.
When we have string data we surround the content with quotes. If our data had a consistent format then
all string data would be enclosed in quotes, even empty data.
Let's demonstrate a well-formatted CSV. Save this data as
"C:\ArielyBulkInsertTesting\Test02.csv"
1,9999999,"ronen","ariely"
2,8888888,"xxx1,xxx2","yyy"
2,8888888,"xxx1,xxx2",""
3,,"","yyy"
4,7777777,"",""
,2222222,"zzz","kkk"
5,1111111,"5000.5","5"
In that case the solution was very simple. We could use this format file:
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <RECORD>
    <FIELD ID="1" xsi:type="CharTerm" TERMINATOR=','/>
    <FIELD ID="2" xsi:type="CharTerm" MAX_LENGTH="7" TERMINATOR=',\"'/>
    <FIELD ID="3" xsi:type="CharTerm" COLLATION="Hebrew_CI_AS" MAX_LENGTH="15" TERMINATOR='\",\"'/>
    <FIELD ID="4" xsi:type="CharTerm" COLLATION="Hebrew_CI_AS" MAX_LENGTH="15" TERMINATOR='\"\r\n'/>
  </RECORD>
  <ROW>
    <COLUMN SOURCE="1" NAME="ID" xsi:type="SQLINT"/>
    <COLUMN SOURCE="2" NAME="PhoneNumber" xsi:type="SQLINT"/>
    <COLUMN SOURCE="3" NAME="FirstName" xsi:type="SQLNVARCHAR"/>
    <COLUMN SOURCE="4" NAME="LastName" xsi:type="SQLNVARCHAR"/>
  </ROW>
</BCPFORMAT>
Save the file as "C:\ArielyBulkInsertTesting\Test02.xml"
Clear our table of previous data:
truncate table #test
GO
Now execute the bulk insert and the data should be placed in the table correctly. If our data was
formatted in this way (with consistent format) then we would not need this article :-)
BULK INSERT #test FROM 'C:\ArielyBulkInsertTesting\Test02.csv'
WITH (
FORMATFILE='C:\ArielyBulkInsertTesting\Test02.xml'
, MAXERRORS = 1
--, KEEPNULLS
--, DATAFILETYPE = 'native'
--, CODEPAGE='RAW'
--, ROWTERMINATOR='\r\n'
--, FIRSTROW = 3
);
GO
Let's continue to work on our "real" data! Clear the data
truncate table #test
GO
In some cases we might build a format file which brings us error messages. We already know that the
format will not fit all records. This test will give us more information via the error message. Try to use
this format file (C:\ArielyBulkInsertTesting\Test03.xml):
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <RECORD>
    <FIELD ID="1" xsi:type="CharTerm" TERMINATOR=","/>
    <FIELD ID="2" xsi:type="CharTerm" MAX_LENGTH="7" TERMINATOR=',\"'/>
    <FIELD ID="3" xsi:type="CharTerm" COLLATION="Hebrew_CI_AS" MAX_LENGTH="15" TERMINATOR=','/>
    <FIELD ID="4" xsi:type="CharTerm" COLLATION="Hebrew_CI_AS" MAX_LENGTH="15" TERMINATOR='\r\n'/>
  </RECORD>
  <ROW>
    <COLUMN SOURCE="1" NAME="ID" xsi:type="SQLINT"/>
    <COLUMN SOURCE="2" NAME="PhoneNumber" xsi:type="SQLINT"/>
    <COLUMN SOURCE="3" NAME="FirstName" xsi:type="SQLNVARCHAR"/>
    <COLUMN SOURCE="4" NAME="LastName" xsi:type="SQLNVARCHAR"/>
  </ROW>
</BCPFORMAT>
Execute our bulk insert
BULK INSERT #test FROM 'C:\ArielyBulkInsertTesting\Test01.csv'
WITH (
FORMATFILE='C:\ArielyBulkInsertTesting\Test03.xml'
, MAXERRORS = 10
--, KEEPNULLS
--, DATAFILETYPE = 'native'
--, CODEPAGE='RAW'
--, ROWTERMINATOR='\r\n'
--, FIRSTROW = 3
);
GO
We get an error message which can help us a lot in this case:
Msg 4864, Level 16, State 1, Line 1 Bulk load data conversion error (type mismatch or invalid character
for the specified codepage) for row 4, column 2 (PhoneNumber).
Moreover, we can see that the rest of the records in our data were inserted (using SQL Server 2012). For
data with only a small number of inconsistent records this can be the best way, as most of the data is
inserted. Now we can just check which records do not exist in the table and fix them. The error
message includes the number of the first problematic row.
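Instead of hunting for the missing records manually, the rejected rows can also be captured with the ERRORFILE option of BULK INSERT; a sketch, reusing the same test files (the error file path is a placeholder):

-- Sketch: keep going past bad rows and write them to an error file.
-- SQL Server also creates a companion .Error.Txt file describing each rejected row.
BULK INSERT #test FROM 'C:\ArielyBulkInsertTesting\Test01.csv'
WITH (
    FORMATFILE = 'C:\ArielyBulkInsertTesting\Test03.xml'
  , MAXERRORS  = 10
  , ERRORFILE  = 'C:\ArielyBulkInsertTesting\Test01_rejected.txt'
);
GO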
In conclusion, the best format file we found succeeds in inserting the first and second columns without
any problem. We recognized that the problems start in the third column.
STEP 2: insert the data into temporary table
This is the main step, as now we can use bulk insert to import the data into SQL Server. Since we found
that our data does not have a consistent format, we are going to use a temporary table to import the
data.
* We don’t have to use a temporary table, as we can just use OPENROWSET to get the data and do the
parsing on-the-fly. I will show this in step 3.
The basic idea is to bring all the data before the problematic point (in our case the first and second
columns) into separate columns, as they should appear in the final table. Then the rest of the data, from
the problematic point to the end of the problematic area (or to the end of the line if there is no other
way), goes into one column. So in our case the third and fourth columns will be imported as one column.
Let's do it. We will use the following format file (save it as C:\ArielyBulkInsertTesting\Test04.xml), which is
similar to the "Test01.xml" file but without the third column:
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <RECORD>
    <FIELD ID="1" xsi:type="CharTerm" TERMINATOR=","/>
    <FIELD ID="2" xsi:type="CharTerm" MAX_LENGTH="7" TERMINATOR=","/>
    <FIELD ID="4" xsi:type="CharTerm" COLLATION="Hebrew_CI_AS" MAX_LENGTH="30" TERMINATOR='\r\n'/>
  </RECORD>
  <ROW>
    <COLUMN SOURCE="1" NAME="ID" xsi:type="SQLINT"/>
    <COLUMN SOURCE="2" NAME="PhoneNumber" xsi:type="SQLINT"/>
    <COLUMN SOURCE="4" NAME="FirstName_LastName" xsi:type="SQLNVARCHAR"/>
  </ROW>
</BCPFORMAT>
And execute this query (drop old table, create new table with 3 columns, bulk insert data, select and
show the data):
DROP TABLE #test
GO
CREATE TABLE #test (
ID int
, PhoneNumber int
, FirstName_LastName varchar(30)
)
GO
BULK INSERT #test FROM 'C:\ArielyBulkInsertTesting\Test01.csv'
WITH (
FORMATFILE='C:\ArielyBulkInsertTesting\Test04.xml'
, MAXERRORS = 10
--, KEEPNULLS
--, DATAFILETYPE = 'native'
--, CODEPAGE='RAW'
--, ROWTERMINATOR='\r\n'
--, FIRSTROW = 3
);
GO
select * from #test
GO
Our results look like this:

ID     PhoneNumber   FirstName_LastName
-----  ------------  ----------------------------------------
1      9999999       "ronen","ariely"
2      8888888       "xxx1,xxx2","yyy"
2      8888888       "xxx1,xxx2",
3      NULL          ,"yyy"
4      7777777       ,
NULL   2222222       "zzz","kkk"
5      1111111       "5000.5","5"
The goal of this article is to show an optimal use of pure T-SQL to import data which is not well
formatted into the database, in an appropriate and effective structure for parsing (step 3). I will not
elaborate on step 3, parsing the data; there can be hundreds of different ways to do this for each
case.
I'll show some sample solutions in step 3. Those are not necessarily optimal parsing solutions, but they
demonstrate the use of different string functions for parsing our data. Usually when parsing data in SQL
Server it is best to use CLR functions.
Step 3: Parsing the data into the final table
Now that we imported the data, all that we need to do is parse the last column. These queries can do
the job in our case:
select
ID, PhoneNumber , FirstName_LastName
, FN = case
when CHARINDEX('",', FirstName_LastName, 1) > 0
then LEFT (
RIGHT(FirstName_LastName, LEN(FirstName_LastName) - 1)
, CHARINDEX('",', FirstName_LastName, 1) - 2
)
else ''
END
, LN = case
when CHARINDEX(',"', FirstName_LastName, 1) > 0
then SUBSTRING(
FirstName_LastName
, CHARINDEX(',"', FirstName_LastName, 1) + 2
, LEN(FirstName_LastName) - CHARINDEX(',"', FirstName_LastName,
1) - 2 )
else ''
END
from #test
go
-- I use the @ char, but you should use any combination of chars that cannot appear in the data value!
-- I can clean out all " chars in one go, as I know they are not part of my data
select ID, PhoneNumber , FirstName_LastName
, SUBSTRING(Temp, 0, charindex('@',Temp) ) FN
, SUBSTRING(Temp, charindex('@',Temp) + 1, LEN(Temp) - charindex('@',Temp)) LN
from (
select
ID, PhoneNumber , FirstName_LastName
, Temp = REPLACE(REPLACE(REPLACE(REPLACE(FirstName_LastName, '","', '@'), '",', '@'), ',"', '@'), '"', '')
from #test
) T
go
After we have found a way to parse the data, we can use a simple SELECT INTO query to move the data
from the temporary table to the final table.
Usually, if this is not a one-time operation, I prefer to use one query that does it all without declaring a
temporary table. I still need these steps to find my BULK INSERT query & format (steps 1+2) and to find
the parsing function (step 3). Next I convert my queries into an OPENROWSET import query like this (in
our case study):
--FINAL TABLE
CREATE TABLE #FINAL (
ID int
, PhoneNumber int
, FirstName varchar(15)
, LastName varchar(15)
)
GO
insert #FINAL
select
ID, PhoneNumber --, FirstName_LastName
, FN = case
when CHARINDEX('",', FirstName_LastName, 1) > 0
then LEFT (
RIGHT(FirstName_LastName, LEN(FirstName_LastName) - 1)
, CHARINDEX('",', FirstName_LastName, 1) - 2
)
else ''
END
, LN = case
when CHARINDEX(',"', FirstName_LastName, 1) > 0
then SUBSTRING(
FirstName_LastName
, CHARINDEX(',"', FirstName_LastName, 1) + 2
, LEN(FirstName_LastName) - CHARINDEX(',"', FirstName_LastName,
1) - 2 )
else ''
END
FROM OPENROWSET(
BULK N'C:\ArielyBulkInsertTesting\Test01.csv'
, FORMATFILE = 'C:\ArielyBulkInsertTesting\Test04.xml'
) a
GO
select * from #FINAL
GO
Summary
The basic idea is to bring all the data in the problematic columns (or until the end of the line if there is no
other way) into one column. We can use a temporary table to store the data. Then we can parse the
temporary column in any way that suits us. We can use T-SQL functions or CLR functions like SPLIT.
We can clean some characters using REPLACE, find characters using CHARINDEX, and so on. This all
depends on your specific data; it has nothing to do with bulk insert anymore :-)
We must separate the operation into two parts:
1. Insert the data using bulk insert into the database (a temporary table, or using OPENROWSET) in such
a way that we will be able to use it for step two
2. Parse and split the text in the last column into the final columns
* This article elaborates on step 1.
Comments
* A more complex case study in which I used this logic can be seen in the MSDN forum in this link:
http://social.msdn.microsoft.com/Forums/en-US/5aab602e-1c6b-4316-9b7e-1b89d6c3aebf/bulk-inserthelp-needed
* Usually it is much better to do the parsing using CLR functions. If you are not convinced by my
recommendation then you can check this link: http://www.sqlperformance.com/2012/07/t-sqlqueries/split-strings
* If you can export the file in a consistent format that fits bulk insert, then you should do it! This is only a
workaround solution.
* If you can build a well formatted import file in advance, from the original import file, using a small
application which will format a new file, then do it! This is a much better solution as most languages do
a better job of parsing text than SQL Server (T-SQL).
* If you can manage the order of the columns during the export, then try to make sure that you move
all the problematic columns to the end. This will help us use bulk insert in a more optimal way, as
we will need to parse fewer columns in step 3.
* Why not import all the data into one column in a temp table instead of STEP 1 & STEP 2?
This is always an option but probably not a good one. In our case study we use a very simple table
structure with 4 columns and only 7 records, but in real life we might get a table with 20 columns or
more and several million records. If we have 2 columns (out of 20 columns) with potential problems and
we can order the columns so those columns come last, than we can import the most of the data (18
columns) into the final data structure, and we will need to import only the last two columns into one
column for parsing. It is much better to separate the data into as many columns as we can and minimize
the use of parsing. Parsing is a CPU intensive operation. Parsing the data after importing will probably
take longer.
When you have to use complex parsing it is much better to use CLR solutions. As I mentioned in the start
this is a pure T-SQL solution.
Resources
* This article is based on several forum questions (more than 15 that I found using Google, and I
checked only the first several pages of search results) that remained unanswered for too long. I did not
find any solutions or answers except my own based on this logic. This is a very easy solution, but we have
to think outside the box to get it :-)
There are no other references for this solution that I know of, and most forum questions that I found
were closed either by sending the questioner to a different solution such as SSIS or a third-party
application, or by saying that it cannot be done using bulk insert and pure T-SQL.