Document 6562051
QlikView for Ninjas
The first and ultimate step to the exciting world of QlikView

Rajesh Pillai

This book is for sale at http://leanpub.com/qlikviewforninjas

This version was published on 2014-10-10

This is a Leanpub book. Leanpub empowers authors and publishers with the Lean Publishing process. Lean Publishing is the act of publishing an in-progress ebook using lightweight tools and many iterations to get reader feedback, pivot until you have the right book and build traction once you do.

©2014 Rajesh Pillai

This book is dedicated to my son Rohan; my nephew Tanuv, who at just 9.5 years old has started with CSS3, HTML and JavaScript; my wife Radhika; and my parents and well-wishers. This book is also dedicated to all the participants who attended my QlikView training and gave some really constructive feedback, based on which this series of books has been structured.

Contents

Chapter 0 - Agenda
Case study - Adventure Works Cycles
1 - Introduction
2 - QlikView Development Methodology
3 - Developer Roles
4 - Data and Scripting
    QlikView Data Modelling
5 - Connect, Select and Load
7 - Green, White and Gray
SECTION 2 - Programming for QlikView Ninjas
    Variables
    Loop
    For..Next
    If..then..elseif..else..end if

Chapter 0 - Agenda

Let’s quickly glance through the table of contents.

SECTION 1 - QlikView Developer

• Introduction
• QlikView Development Methodology
• Data and Scripting
• Connect, Select and Load
• Loading Data from SQL Server
  – Executing Stored Procedures
  – Reading XML Data from SQL Server
• Green, White and Gray
• The Magic of Preceding Load
• Associations
• Loading Data from XML file
• Loading Data from Excel and CSV files
• Data Modelling Issues
• Referencing external files
• Basic Data Transformation
• Data generation in the QlikView script
• Scripting Best Practices
• Master Calendar
• Mapping Tables
• Data Model Optimization
• Link Tables
• Interval Match
• Cross Tables
• Advanced Calculations
• Alternate States
• Set Analysis
• Metadata
• Generating QVD’s
• QlikView Security
• Additional Load Types
• How to use QlikView Extensions
• Performance Tuning

SECTION 2 - Programming for QlikView Ninjas

SECTION 3 - Advanced Development

SECTION 4 - QlikView Designer

SECTION 5 - Basic Server Administration

Case study - Adventure Works Cycles

This case study is adapted from the Microsoft sample database files. The detailed description can be found here.

Adventure Works Cycles, the fictitious company on which the AdventureWorks sample databases are based, is a large, multinational manufacturing company. The company manufactures and sells metal and composite bicycles to North American, European and Asian commercial markets. While its base operation is located in Bothell, Washington with 290 employees, several regional sales teams are located throughout their market base.
In 2000, Adventure Works Cycles bought a small manufacturing plant, Importadores Neptuno, located in Mexico. Importadores Neptuno manufactures several critical subcomponents for the Adventure Works Cycles product line. These subcomponents are shipped to the Bothell location for final product assembly. In 2001, Importadores Neptuno became the sole manufacturer and distributor of the touring bicycle product group.

Coming off a successful fiscal year, Adventure Works Cycles is looking to broaden its market share by targeting their sales to their best customers, extending their product availability through an external Web site, and reducing their cost of sales through lower production costs.

In all the subsequent chapters, we will cover various scenarios that address some of Adventure Works’ concerns. We will build a data model for them and also build some visualizations around it. I have taken AdventureWorks as a case study because it is widely available and accessible, the domain is familiar to most people, and the data structure is complex enough to expose the various issues that arise when we transform an OLTP schema into an OLAP kind of schema.

1 - Introduction

Welcome to the “QlikView for Ninjas” series of books. This book is a handy step-by-step introduction to most of the QlikView features that will help you become a better “QlikView Developer”. This book is in “early access” mode and chapters will be published every week. Readers’ constructive feedback will also help shape the structure of the book, apart from the core agenda.

QlikView is a guided analytics BI tool developed by QlikTech (now Qlik). QlikView is more of a business discovery platform. It helps you find data quickly and extract meaning from it, so that you can make quick and efficient decisions with minimum support from the developer team.
Although the business user or end user can mostly work with QlikView directly, the support of the IT team is required for efficient administration and management. QlikView is also one of the foremost in-memory analytics tools, i.e. everything (all data, including filters) is stored in RAM once the document is loaded. This makes it very fast from a user experience perspective. But with more power comes more responsibility (remember the Spiderman movie). The application needs to be tuned as it scales, and many aspects of performance need to be taken into account, from chart optimization and data model design to server configuration, user load etc.

Associative Technology

What differentiates QlikView from other BI products is the associative user experience. Traditional BI tools have a fixed navigation path to explore data, but in QlikView you can start anywhere, with any column or field, and this gives your mind more opportunity to explore and deduce meaning from data. The below diagram visualizes the two experiences.

Some of the benefits of associative technology are outlined below.

• Works the way the mind works
  The user is not limited to following a predefined path to access the data. The business user can see hidden trends and make discoveries like with no other BI platform on the market.

• Delivers direct and indirect search
  The user can conduct both direct and indirect searches. For example, if a user wants to identify a sales rep but can’t remember his or her name, the user can search on other attributes instead. If the business user remembers that the sales rep sells laptops in the APAC region, he or she can search the Sales Rep list box for “APAC” and “Laptop” to get the names of the sales reps who meet those criteria.

• Delivers answers as fast as users can think up questions
  There are many ways in QlikView in which a user can ask questions, such as putting data in charts and graphs, maps, tables, sliders, calendars etc.
  The user can quickly see relationships and find meaning in the data.

• Puts meaning into the gray
  In QlikView, unassociated data is shown in gray. The user can easily see the data that is not associated, and this gives them additional insight into their data. Sometimes the “Aha” moment comes from looking at data that is not directly associated.

**Components of the QlikView Business Discovery platform**

The below figure depicts a simplified view of a QlikView deployment, showing the products that take part in the deployment.

QLIKVIEW DESKTOP

The QlikView Desktop is a Windows-based desktop tool that is used by developers and business analysts to create data models and to build the graphical user interface for QlikView apps. It is the Swiss army knife for the QlikView developer and designer, who can use all its features to create a robust data model and a very efficient user interface for clients. All the scripting is done through this application. The file type that is created using the QlikView Desktop is known as QVW (.qvw or QlikView file). One can also create a read-only QVD (QlikView Data) file, which is the format QlikView uses to store its compressed data.

QLIKVIEW SERVER (QVS)

The QVS is a server product that contains the in-memory analytics engine. The QVS handles all client/server communication between a QlikView client (i.e. desktop, AJAX, IE plugin or Mobile) and the server. It includes a management environment (the QlikView Management Console, which gives administrators control over all aspects of the server deployment, including security, distribution, clustering, authorization etc.) and also includes a web server to provide front-end access to the documents within.
The web server’s user portal is known as Access Point. It is also important to note that while QVS contains its own web server, one can also use Microsoft IIS (Internet Information Services) for this purpose. The QVS handles client authorization against existing directory providers like Microsoft Active Directory, LDAP etc. and also performs reads and writes to ACLs (Access Control Lists) for QVW documents.

QLIKVIEW PUBLISHER

The QlikView Publisher is also a server-side product that performs two important functions:

1. It is used to load data directly from data sources defined via connection strings in the QVW files.
2. It is also used as a distribution service to reduce data and applications from a source QVW file based on various rules, such as user authorization or data access privileges (these rules are based on fields from the data model), and to distribute these newly created documents to the appropriate QlikView servers or as static PDF reports via email.

QlikView Publisher is not a mandatory server component, but it is useful for large-scale enterprises. When QlikView Server is installed the Publisher component is also installed, but it is only activated when the Publisher license is enabled in the QlikView Management Console.

**How QlikView Works: A peek under the covers**

When a QlikView document is published to a QlikView Server, the content it contains becomes available for consumption by any user with the required privileges to access it. The flow is outlined below:

• When a user first opens a QlikView document, data is loaded in memory (server memory).
  The compressed and unaggregated dataset is loaded from the disk into QlikView Server’s RAM. This in-memory repository serves as the base dataset for this initial user and all other users requesting the same document. This repository stays in memory until no user activity has occurred within a defined time-out period.

• Users explore data via selections.
  The concept of a user-defined selection state is central to QlikView. As users click around in a QlikView document trying to demystify the maze of data, they indicate which subsets of data they are interested in analyzing and which subsets should be ignored. QlikView takes advantage of the highly indexed nature of the unaggregated dataset. QlikView dynamically presents a subset of all the data available to the QlikView document based on the selection state. This happens in real time.

• Upon selection, aggregates render instantly.
  On the fly, QlikView renders aggregates as intuitive and interactive user interface objects such as charts, graphs, tables, list boxes etc. Users interact with objects in QlikView documents through any supported client. Users can also create their own objects using the collaboration features of QlikView.

We will have a look at how data is structured in the QlikView memory model in later chapters.

2 - QlikView Development Methodology

Every company, and maybe every individual, has its own development methodology. QlikView doesn’t impose a hard and fast rule as to what kind of methodology works, but it is recommended to go with some kind of agile methodology to get the most from the project within time and budget.

Irrespective of the methodology selected, some coding practices and development guidelines need to be strictly followed to get optimum results. These practices are spread across the following major activities.

• Scripting Guidelines
• Requirements Gathering
• Understanding the key metrics and KPI’s
• Source control integrations
• Understanding the source data etc.

We will cover these topics incrementally over the next series of chapters.

3 - Developer Roles

For any successful QlikView implementation it is essential to have the correct team structure. This team could consist of a single person to start with, but eventually specialists could be appointed for each of the different roles.
Let’s look at the roles available in any QlikView implementation. The QlikView team structure consists primarily of a backend developer, a designer, a visualization expert and an administrator. Each role has its own responsibilities, but as I said, when you are starting out with a QlikView implementation, a single person may play all the roles. As you move forward in the project, a dedicated team needs to be set up to deal with the chores of each role. A successful implementation consists of a fine mixture of the above roles.

The responsibilities of each role are summarized in the below table.

Role | Description
Developer/BA | The QlikView Developer is responsible for building the data model and for scripting the ETL tasks. The Business Analyst (BA) role is shared and is played by either the developer or the designer, based on the team structure.
Designer/BA | The QlikView Designer works with charts and user interfaces. In small firms, or when QlikView is being evaluated, the developer and designer roles are usually played by a single person.
Visualization | The visualization expert deals with the overall UX of the QlikView application. Sometimes the designer and visualization roles are played by the same member, depending on their skill set.
Admin | The administrator is responsible for managing the QlikView server environment, managing licenses, reload tasks, security etc.
Business User | Business users work with QlikView applications mostly through the browser interface (Access Point) or even the desktop interface.

4 - Data and Scripting

All the data files used with this book will be available shortly. In the meantime you can grab the primary source from CodePlex. We will be using the AdventureWorks 2012 OLTP database for all our demo code. The required files can be found on the Leanpub QlikView for Ninjas home page.

QlikView Data Modelling

A data model is a conceptual model that describes how data are related and accessed in a system.
The two main concepts in data modelling are dimensions and facts.

**Dimensions and facts**

A dimension is used to categorize data, such as products, customers, region or area etc. It consists of one or more tables containing keys and attributes that describe the data values.

A fact table, on the other hand, generally contains the foreign keys of the dimension tables along with measures (numeric data). Each row of the fact table is defined by the set of dimension keys that identify its measures. Examples of facts would be sales revenue, number of issues resolved etc.

**Star schemas and snowflake schemas**

A schema is a way to graphically represent a model through a diagram. A data model with merged dimensions can be represented by a star schema. The name derives from the fact that a full-fledged star schema resembles a star, with a central point and other nodes radiating from it. This structure contains a single fact table surrounded by a set of dimension tables.

A snowflake schema, on the other hand, is more of a relational structure, in which the dimension tables are not fully merged.

The general rule of thumb is to use a star schema for analysis purposes and a snowflake schema for transactional databases.

We will build a data model that initially resembles the first figure below, and then through data modelling activities we will make the model look like the second figure.

Figure : 1

Figure : 2

Wishing you the best in your data modelling adventures; we will use many ninja-like tools to achieve our end objective.

At any point you can press “Ctrl+T” to bring up the table viewer in QlikView. The table viewer is a handy tool that shows the structure of the tables laid out in QlikView’s memory. It shows both the source structure and the internal table structure.
We will use the internal table structure view for troubleshooting most of our data modelling issues.

¹ AdventureWorks Sample Database: https://msftdbprodsamples.codeplex.com/releases/view/55330

5 - Connect, Select and Load

Connect, Select and Load is the basic mechanism through which data is loaded into QlikView. Data can be loaded from various sources like flat files (CSV, Excel, tab delimited), RDBMS or any other system. You can also get, buy or write custom providers for data sources for which QlikView doesn’t provide connectors.

Exercise 1. Load an Excel file.

This will be the only fully elaborated exercise, starting with creating a new document. The rest of the exercises will contain only the screenshots required for understanding.

1. Fire up QlikView and click on “New Document”.
2. You will be greeted with the “Getting started” wizard. Close this by clicking on the “X” button or by pressing the “ESC” key.
3. The next thing to do is to save this document. QlikView will automatically add the file extension “.qvw”.
4. Open up the “Script Editor” by clicking on the “Edit Script” toolbar button or by pressing “Ctrl+E” on your keyboard. The following screen will show up.
5. Create a new tab by clicking on the “Tab” menu and then on “Add Tab”, and enter a friendly name.
6. Check the “Relative Path” check box, then click on “Table Files” and select “special_offer.xlsx”.
7. You will be presented with the “File Wizard: Type” dialog. Look at the “File Type” section on the left side. The file type Excel (xlsx) should be automatically selected.
8. Click on the “Finish” button for now and the following screen should be presented to you.
9. Click on the “Reload” toolbar button to load the data into QlikView. You will get a progress dialog, which will either close automatically or have to be closed manually, based on your User settings.
   We will see these settings in later chapters.
10. On the subsequent screen click on “Add All” and then click on “OK”.
11. The following will be the output of your hard labour. (Don’t despair, this will get better.) Click on the “Layout” menu and then click on “Rearrange Sheet Objects”.

7 - Green, White and Gray

The colors green, white and gray have significant meaning within QlikView. Green represents the current selection or filter. White represents records matching the current selections, and gray represents records not matching the selections.

Don’t worry about how things are related. We will talk about relations in detail. For now, understand that if the names of fields match, QlikView creates an association between the tables automatically.

SECTION 2 - Programming for QlikView Ninjas

Programming is a way to control the computer, or rather in this case your QlikView applications. Programming is nothing more than a series of step-by-step instructions telling the computer to do a specific task. The following figure depicts the basic elements that constitute any given program.

Let’s dissect each of the above elements. They by no means represent the entire picture, but they form the most important part of the whole.

Variables

Variables are placeholders in memory. They hold whatever value (number, string, date) you put in them. You can set a variable to a specific value and also read the value back from it.

In QlikView there are two ways to define a variable: the SET statement and the LET statement.

The SET statement can be used to define variables for lazy evaluation, i.e. a variable assigned with SET is not evaluated immediately and is stored as is. This is useful for substituting strings, paths and drives, for storing formulas to be evaluated later etc.

    SET x = 3 + 2;

In the above statement the variable x holds the text 3 + 2, not the number 5.
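As a minimal sketch of this behaviour, the script below stores the same expression once with SET and, for contrast, once with LET (described next). The variable names vSetExample and vLetExample are illustrative and do not come from the book's sample files; TRACE writes its text to the Script Execution Progress dialog.

```qlikview
// Illustrative variable names, not part of the book's sample files.
SET vSetExample = 3 + 2;   // stored as the text 3 + 2
LET vLetExample = 3 + 2;   // right-hand side is evaluated, so 5 is stored

// Dollar-sign expansion inserts each variable's stored content.
TRACE vSetExample holds $(vSetExample);
TRACE vLetExample holds $(vLetExample);
```

After a reload, both variables can also be inspected under Settings->Variable Overview.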
The LET statement, on the other hand, evaluates whatever is on the right-hand side and stores the result in the variable. For example:

    LET y = 2 + 2;

In the above statement y contains the value 4.

Variables are a very powerful construct in QlikView (or rather in any programming environment). It is variables that make the system dynamic. You can change the behaviour of the application by changing the variables. We will be using variables heavily when creating master calendars and QVD generators, when reusing expressions, and even for multilingual design.

Loop

A loop represents repetition in the program. The loop is yet another powerful construct in any programming environment. It helps us maintain our program or script and keep it small by avoiding repetition. A loop executes as long as the condition defining the loop is met. There are various ways in which loops can be defined. Let’s examine them one at a time.

The do..loop control statement is a script iteration construct which executes one or several statements until a logical condition is met.

The syntax is:

    do[ ( while | until ) condition ] [statements]
    [exit do [ ( when | unless ) condition ] [statements]
    loop[ ( while | until ) condition ]

Where:

condition is a logical expression evaluating to true or false.
statements is any group of one or more QlikView script statements.

The while or until conditional clause must only appear once in any do..loop statement, i.e. either after do or after loop. Each condition is interpreted only the first time it is encountered but is evaluated every time it is encountered in the loop.

If an exit do clause is encountered inside the loop, the execution of the script will be transferred to the first statement after the loop clause denoting the end of the loop. An exit do clause can be made conditional by the optional use of a when or unless suffix.
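The syntax above can be sketched with a small do..loop that runs nine times. This is a hedged illustration, not necessarily the program shown in the book's figure; the counter name i is an assumption made for the example.

```qlikview
// Illustrative counter; a do..loop has no built-in counter of its own.
LET i = 1;

do while i <= 9
    // Any script statements can go here: loads, variable assignments etc.
    TRACE Pass $(i) of 9;
    LET i = i + 1;   // without this increment the loop would never end
loop
```

The condition is placed after do, so it is checked before each pass. Placing it after loop instead would run the body at least once before the first check.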
Since the do..loop statement is a control statement and as such is ended with either a semicolon or end-of-line, each of its three possible clauses (do, exit do and loop) must not cross a line boundary.

The below figure illustrates the logic.

Though the above program does nothing apart from looping 9 times, the concept can be applied to any area. Inside the loop you can load files, create variables, automate script creation etc.

For..Next

The for..next control statement is a script iteration construct with a counter. The statements inside the loop enclosed by for and next will be executed for each value of the counter variable between specified low and high limits.

The syntax is:

    for counter = expr1 to expr2 [ step expr3 ]
    [statements]
    [exit for [ ( when | unless ) condition ]
    [statements]
    next[counter]

Where:

counter is a variable name. If counter is specified after next it must be the same variable name as the one found after the corresponding for.
expr1 is an expression which determines the first value of the counter variable for which the loop should be executed.
expr2 is an expression which determines the last value of the counter variable for which the loop should be executed.
expr3 is an expression which determines the value indicating the increment of the counter variable each time the loop has been executed.
condition is a logical expression evaluating to true or false.
statements is any group of one or more QlikView script statements.

The expressions expr1, expr2 and expr3 are only evaluated the first time the loop is entered. The value of the counter variable may be changed by statements inside the loop, but this is not good programming practice.

If an exit for clause is encountered inside the loop, the execution of the script will be transferred to the first statement after the next clause denoting the end of the loop.
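A minimal sketch of the for..next syntax, assuming an illustrative counter named i: the loop counts from 0 to 10 in steps of 2 and uses exit for with a when suffix to leave early.

```qlikview
// Illustrative counter name and limits, chosen only for this example.
for i = 0 to 10 step 2
    // Leave the loop as soon as the counter passes 6.
    exit for when i > 6
    TRACE Counter is $(i);
next i
```

With these limits the TRACE line runs for 0, 2, 4 and 6; when the counter reaches 8 the exit for clause transfers execution to the statement after next.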
An exit for clause can be made conditional by the optional use of a when or unless suffix.

Since the for..next statement is a control statement and as such is ended with either a semicolon or end-of-line, each of its three possible clauses (for..to..step, exit for and next) must not cross a line boundary.

**Exercise: Loading variables defined in Excel**

Let’s dynamically create variables that are defined in Excel. Later on we will see how to extend this functionality. For this, use the file variables.xlsx in the DataSources folder.

Look at the figure below for the code. The full code in text format is given at the end of this section.

Here is a line-by-line explanation.

Line 1 indicates that the file being loaded uses a relative path. This is optional; QlikView adds it whenever you check the “Relative Paths” checkbox.
Line 2 gives a name to our table, i.e. VariableTable.
Lines 3 to 7 are the load statement.
Line 10 is a comment. Comments are ignored by QlikView and exist only for documentation purposes.
Line 12 starts a for loop with the variable “i”, which starts at 0 and continues up to the number of rows in the Excel sheet minus one. The NoOfRows('tablename') function takes a table name as a parameter and returns the count of rows. In our case there are 2 rows in the variables.xlsx file.
Line 13 creates a variable called “vname” that contains the name of the variable from the Excel sheet. The peek() function in QlikView is used to read records from an already loaded table. The first parameter is the name of the field to read, the second parameter is the record number (record numbers start at 0 in QlikView) and the third parameter is the name of the table.
Line 14 creates a dynamic variable using $ sign expansion, which contains the value of the variable.

So the first time the loop is executed, the following will be the state of the program.
    Let vName = "vSourceQvdPath";
    $(vName) = "c:\qlikview\apps\qvd\";

Line 15 will continue the loop.
Lines 18 and 19 remove the unwanted variables.
Line 21 drops the table (as it is no longer needed and the variables are already created).

When you reload this script and inspect the variables (by clicking on Settings->Variable Overview) you can see the variables that you created in the script.

Here is the full code.

    Directory;

    VariableTable:
    LOAD VariableName,
         VariableValue
    FROM
    DataSources\variables.xlsx
    (ooxml, embedded labels, table is Variables);

    // Set variables for each row in the sheet
    for i = 0 to NoOfRows('VariableTable') - 1
        LET vname = peek('VariableName', i, 'VariableTable');
        LET $(vname) = peek('VariableValue',i,'VariableTable');
    next i

    DROP Table VariableTable;

If..then..elseif..else..end if

The if..then control statement is a script selection construct forcing the script execution to follow different paths depending on one or several logical conditions.

The syntax is:

    if condition then
        [ statements ]
    { elseif condition then
        [ statements ] }
    [ else
        [ statements ] ]
    end if

Where:

condition is a logical expression which can be evaluated as true or false.
statements is any group of one or more QlikView script statements.

Since the if..then statement is a control statement and as such is ended with either a semicolon or end-of-line, each of its four possible clauses (if..then, elseif..then, else and end if) must not cross a line boundary.

**Exercise: Do an actual reload of the document only if the reload parameter is set in the reloadflag.txt file**

The reloadflag.txt file can be found in the DataSources folder. In a real deployment, this file may be stored in a remotely accessible shared location.

Please go through the line-by-line explanation of the script.
Line 1 assigns a table name to the load statement.
Line 2: since the text file doesn’t have a header, we tell QlikView that we only need to load the first record (First 1). QlikView will automatically assign the column name @1 if no column headers are present.
Line 15 creates a variable named reload, which contains the value from the text file.
Line 17: Trace simply outputs whatever is fed to it. The output can be viewed in the Script Execution Progress dialog.
Line 19 checks whether the value of the reload variable, after converting it to upper case, is equal to FALSE.
Line 20 exits the script if the condition on line 19 evaluates to true (reload is false).

NOTE: Sometimes the Script Execution Progress dialog closes automatically after the reload. You can change this behaviour by checking “Keep Progress Open After Reload” in the “Settings->User Preferences” menu.

Here is the full code.

    [ReloadFlag]:
    First 1
    LOAD *
    FROM
    DataSources\reloadflag.txt
    (txt, codepage is 1252, no labels, delimiter is '\t', msq);

    LET reload = Peek('@1',0,'ReloadFlag');

    // Just for troubleshooting. This value will be shown in the
    // Script Execution Progress Dialog. The $(reload) expansion is
    // needed because TRACE prints its text literally.
    TRACE $(reload);

    if upper(reload) = 'FALSE' then
        exit Script
    end if
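The exercise only uses the if..then and end if clauses. As a hedged sketch of the remaining clauses, the variation below also handles a TRUE flag and any unexpected value. The hard-coded flag value and the TRACE messages are assumptions made for illustration; in the exercise the value comes from reloadflag.txt.

```qlikview
// Hypothetical flag value for illustration; the exercise reads it with Peek().
LET reload = 'maybe';

if upper(reload) = 'FALSE' then
    TRACE Reload flag is FALSE, the script would stop here;
elseif upper(reload) = 'TRUE' then
    TRACE Reload flag is TRUE, the script would continue;
else
    // Neither TRUE nor FALSE: report the value instead of guessing.
    TRACE Unexpected reload flag value $(reload);
end if
```

Each clause sits on its own line, as required for control statements. Only the first branch whose condition is true is executed, and the else branch catches everything that matched none of the conditions.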